First Line Software is a premier provider of software engineering, software enablement, and digital transformation services. Headquartered in Cambridge, Massachusetts, the global staff of 450 technical experts serve clients across North America, Europe, Asia, and Australia.
How do you move from here… to… there? As humans, we can walk, as we always have, or we can get into or onto a vehicle and go. Animals travel too and, more often than not, make their migrations round trips. But you’ve been tasked by your company with moving information from one place to another, specifically to achieve a goal or to complete or initiate a particular task.
ETL vs. Data Migration
This is where you’ve likely come across the terms ETL and data migration. ETL stands for Extract, Transform, Load, a three-step process for moving information from one system into another data environment. Data migration and ETL are similar in that both move information from a source to a destination. The difference is that data migration does not change the fundamental structure of the source information, whereas ETL does… that’s where the word “transform” comes into play. ETL is usually the chosen route when an organization wants to get the most out of the data it already has by creating added value from the “transformed” data. Examples include (but are certainly not limited to) solving business problems through fundamentally new analytics, building new business applications, or finding unexploited competitive advantages.
A simple analogy is mining for gold. Extract is panning a river for nuggets of gold as big as a golf ball. Transform is melting the gold down, removing any impurities, and casting it into a standardized format like bullion. Load is moving the gold bricks to a jeweler to be made into jewelry later. Each step adds value to the one before it.
Data migration is typically used for purely technical purposes within an organization. Examples include transitioning to another platform, database maintenance, or backup and replication of existing data. The goal is usually to move the original information from one place to another with little to no transformation or manipulation of the data.
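To make the contrast concrete, here is a minimal Python sketch of both approaches. It is an illustration under stated assumptions, not a production design: the sample records, the field names, and the function names `migrate`, `extract`, `transform`, and `load` are all hypothetical.

```python
# Hypothetical source records; the field names are illustrative only.
SOURCE = [
    {"id": 1, "name": " Ada Lovelace ", "amount": "120.50", "currency": "usd"},
    {"id": 2, "name": "Alan Turing", "amount": "80", "currency": "USD"},
]


def migrate(source):
    """Data migration: copy each record to the target unchanged."""
    return [dict(record) for record in source]


def extract(source):
    """Extract: read the raw records from the source system."""
    yield from source


def transform(records):
    """Transform: clean the values and reshape them for the target system."""
    for record in records:
        yield {
            "customer_id": record["id"],
            "customer_name": record["name"].strip(),
            # Store money as integer cents instead of a free-form string.
            "amount_cents": round(float(record["amount"]) * 100),
            "currency": record["currency"].upper(),
        }


def load(records, target):
    """Load: write the transformed records into the target store."""
    target.extend(records)
    return target


migrated = migrate(SOURCE)                         # same structure as the source
warehouse = load(transform(extract(SOURCE)), [])   # new structure, cleaned values
```

The migrated copy keeps the source structure as-is, while the ETL path produces records with a different shape and normalized values. In a real project the transform step would carry the domain rules that make the loaded data valuable.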
Ultimately, the difference between ETL and data migration (and the choice of which to use) is determined by the business task you want to accomplish, not by the technological features of the original information. We’ll focus on ETL here, as that’s the process that adds real value for an organization that recognizes the need to extract all the information it can from the data it has collected.
How is ETL implemented?
Today, companies across every major industry accumulate data characterized by the three “V’s”: the volume of information, the velocity or speed at which it is collected and updated, and the variety or scope of the data points being covered. In this environment, the Transform part of ETL becomes perhaps the most critical, and it is the point of distinction between simple data migration and true ETL.
The transform process is an incredibly complex series of data manipulations that combines calculations with sophisticated algorithms to process the updated information, ultimately creating the valuable data that is loaded in the end.
When a company performs simple data migration, usually all that is required is a database engineer who can oversee the process and make sure all of the information is transferred smoothly from point A to point B. The Transform process of ETL is not that simple. It is virtually impossible without domain experts who have high-level knowledge of the information itself, which is absolutely necessary to ensure that the accuracy and consistency of the data are preserved. A domain expert is a valuable human resource who, depending on the area of expertise, is not always likely to be found within the company that needs to execute an ETL process. For instance, a doctor is an expert in the domain of healthcare. Developing healthcare-related software and manipulating its data during the ETL process requires high-level knowledge in two different domains: healthcare and software. For this reason, established software companies such as First Line Software have trained clinicians and medical informaticists on staff who work on every ETL project to ensure that data is clean, meaningful, and usable. The doctors and the programmers work together as one team because the speed and quality of the project depend on this symbiosis.
As businesses small and large recognize the value of ETL, new essential roles are emerging alongside domain experts: software developers and architects, business analysts, and dedicated testers. The ETL process itself also continues to evolve, with new stages and complexities being added. These include separate processing and analysis of the data to verify the completeness and accuracy of the information, analytical reports that monitor the ETL process as it is built, and automation, AI, testing, and various other internal tasks that ensure the actual user is more than satisfied with the end result.
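As a rough illustration of what verifying completeness and accuracy can mean in practice, the Python sketch below compares the extracted rows with the loaded rows. The function name `validate_load`, the field names, and the sample data are assumptions made for this example; real checks depend on the domain and on the rules the domain experts define.

```python
def validate_load(source_rows, loaded_rows, required_fields):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Completeness: every extracted row should arrive in the target.
    if len(source_rows) != len(loaded_rows):
        problems.append(
            f"row count mismatch: {len(source_rows)} extracted, "
            f"{len(loaded_rows)} loaded"
        )

    # Completeness: required fields must be present and non-empty.
    for index, row in enumerate(loaded_rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing value for '{field}'")

    # Accuracy: the total amount should survive the transformation.
    source_total = sum(round(float(row["amount"]) * 100) for row in source_rows)
    loaded_total = sum(row.get("amount_cents") or 0 for row in loaded_rows)
    if source_total != loaded_total:
        problems.append(
            f"amount mismatch: source {source_total} cents, "
            f"loaded {loaded_total} cents"
        )

    return problems


# Hypothetical sample data: one complete load and one that dropped a row.
source = [{"id": 1, "amount": "120.50"}, {"id": 2, "amount": "80"}]
good = [
    {"customer_id": 1, "amount_cents": 12050},
    {"customer_id": 2, "amount_cents": 8000},
]
bad = [{"customer_id": 1, "amount_cents": 12050}]

good_report = validate_load(source, good, ["customer_id", "amount_cents"])
bad_report = validate_load(source, bad, ["customer_id", "amount_cents"])
```

Here `good_report` comes back empty, while `bad_report` flags both the missing row and the resulting difference in totals. A report like this is the kind of analytical control that can be run after each load.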
So, what does that end result look like?
The newly constructed system uses the original, raw data in a new way, allowing the business to receive fundamentally new data from the system. This results in the highly desired (but often difficult to achieve) synergy effect, where the result of combining two or more parts is greater than the simple sum of those parts.
ETL is ultimately an evolutionary development of data migration. Data migration will continue to exist as long as we need to move large data sets around and as more organizations see the economic value in storing their information in the cloud rather than hosting it on local servers. But once that data has migrated to the cloud… you’re already one move closer to reaping the synergy benefits of ETL.
Your actual first step is to determine what business problem you want to solve… That will determine whether the next step is simply a technical data migration or a fully executed ETL project that will take your business to the next level.