Data migration is like everything else – there is a right way and a wrong way to do it. Actually there is a right way for every scenario, and many, many wrong ways! This article will help you consider the alternatives.
What is data migration?
Just to make sure we’re comparing apples to apples, we had better define our terms. At times like this, I turn to Wikipedia:
Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger/takeover).
Can you imagine doing this by hand?? Nowhere in the world are there workers desperate enough to take on that task uncoerced. Hence the many automated tools on the market – computers will do anything.
What to look for
When you’re looking for a solution to your data migration problem, make sure that you consider a few key features that can make the difference between success and disaster.
1. Enables schema changes
At the end of the day, you want your mission-critical data organized in a rational way that makes it easy to get the data out in a form that serves your business. Trouble is, most legacy data stores are anything but rational. They have grown organically over the past 20 years like English Ivy next to a leaky nuclear reactor – not in an organized way, but with a certain craziness. They tend to exhibit “archeological layering”, so you can see where a certain field switched from use A to use B without actually changing names. The bottom line is that if you have a chance to fix your underlying data model, you should do it. It will make your life much easier in the future. Look for a toolset that allows a flexible and powerful assortment of transformations, including table splitting and merging, rationalization of foreign key relationships, extraction of value lists, etc.
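To make "extraction of value lists" concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the function name `extract_value_list`, the `state` column and the sample rows are invented for illustration, and a real tool would do this against database tables rather than lists of dicts.

```python
def extract_value_list(rows, column):
    """Pull one free-text column out into a lookup table plus foreign keys."""
    lookup = {}    # normalized value -> surrogate id (hypothetical key scheme)
    migrated = []
    for row in rows:
        # Normalize so that "ny" and "NY " collapse to a single entry.
        value = row[column].strip().upper()
        value_id = lookup.setdefault(value, len(lookup) + 1)
        # Copy every other field and replace the text with a foreign key.
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[column + "_id"] = value_id
        migrated.append(new_row)
    return lookup, migrated


legacy = [
    {"id": 1, "state": "ny"},
    {"id": 2, "state": "NY "},
    {"id": 3, "state": "ca"},
]
lookup, migrated = extract_value_list(legacy, "state")
print(lookup)
print(migrated)
```

The three inconsistent spellings collapse into a two-entry lookup table, and each migrated row carries a `state_id` instead of free text.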
2. Declarative
I hesitate to trot out the comp-sci jargon, but you want a data migration tool that stores your transforms as a set of rules, not as a script. Writing a bunch of scripts is the old-fashioned way to get the job done, and those scripts can get pretty huge and ugly. After all, it’s a programming task in itself, and not a pretty one. Do yourself a big favor and use a specialized tool that has some understanding of the fundamentals of the problem and stores your migration rules as rules, not scripting statements.
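A rough sketch of the difference, in Python: the migration below is a table of rules (target field, source field, transform) plus one small generic engine, rather than a long script. The field names (`CUSTNO`, `NAME`, `STATUS`) and the function `apply_rules` are made up for this example; a real tool would keep its rules in its own format.

```python
# Each rule is data: (target field, source field, transform).
# All field names are hypothetical.
RULES = [
    ("customer_id", "CUSTNO", int),
    ("full_name", "NAME", lambda s: s.strip().title()),
    ("is_active", "STATUS", lambda s: s == "A"),
]


def apply_rules(legacy_row, rules=RULES):
    """Build one target row by applying each rule to its source field."""
    return {
        target: transform(legacy_row[source])
        for target, source, transform in rules
    }


row = apply_rules({"CUSTNO": "0042", "NAME": "  ada LOVELACE ", "STATUS": "A"})
print(row)
```

Because the rules are data, they can be listed, diffed, reviewed and reported on without reading any procedural code, which is the point of the declarative approach.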
3. Iterative
Admit it – you will not get this right the first time. Trying to get it right the first time would be a huge waste of time, and would probably lead you down the garden path to “analysis paralysis.” You need to get something done, see how it worked, learn from it, make changes and do it again. Lather, rinse, repeat. You want a toolset that lets you run your data migration, inspect the results, and refine your migration rules iteratively.
4. Auditable
In some environments this is not a nice-to-have, it’s the law. We can’t go losing our patient drug allergy records or our policy line items or (God forbid) our deposit and withdrawal records! And even in less regulated niches, it’s a good idea to be able to show where every record came from, and where every record went.
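As a sketch of what "where every record came from, and where every record went" can mean in practice, the Python below writes one audit entry per source record, whether it was migrated or rejected. The names `migrate_with_audit` and `to_target`, the `ACC-` key prefix and the sample data are assumptions made for illustration only.

```python
def to_target(row):
    """Hypothetical transform: new key format, amount parsed as an integer."""
    return {"id": "ACC-" + str(row["id"]), "amount": int(row["amount"])}


def migrate_with_audit(source_rows, transform):
    """Transform every row and record the fate of each one."""
    targets, audit = [], []
    for row in source_rows:
        try:
            new_row = transform(row)
        except (KeyError, ValueError) as exc:
            # A rejected record still gets an audit entry, so nothing vanishes.
            audit.append({"source_id": row.get("id"), "target_id": None,
                          "status": "rejected: " + repr(exc)})
            continue
        targets.append(new_row)
        audit.append({"source_id": row.get("id"), "target_id": new_row["id"],
                      "status": "migrated"})
    return targets, audit


source = [
    {"id": 1, "amount": "100"},
    {"id": 2, "amount": "n/a"},   # bad data: cannot be parsed as an integer
    {"id": 3, "amount": "250"},
]
targets, audit = migrate_with_audit(source, to_target)

# Reconciliation: every source record is accounted for exactly once.
assert len(audit) == len(source)
for entry in audit:
    print(entry)
```

The reconciliation check at the end is the important part: the audit log has exactly as many entries as there were source records, so a silently dropped row would show up as a mismatch.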
5. High performance
This is probably obvious, but when you’re cutting over from the old environment to the new, you will not have weeks and weeks to do it. You’ll be lucky to get 24 hours. And you may have hundreds of millions of records to read, cleanse, transform, re-insert, validate and audit in that all-too-brief period of time. Your tool had better be up to the task!
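One common way to get throughput, sketched here with Python's built-in `sqlite3` module, is to load rows in large batches with one transaction per batch instead of one insert and commit per record. This is only an illustration under assumptions: the `account` table, the batch size of 10,000 and the helper names `batched` and `bulk_load` are invented, and the right batch size depends entirely on your database and hardware.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_load(conn, rows, batch_size=10_000):
    """Insert rows in batches, committing once per batch."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO account (id, name) VALUES (?, ?)", chunk
            )
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)")

# A generator keeps memory flat: rows are produced as they are consumed.
rows = ((i, "name-" + str(i)) for i in range(25_000))
loaded = bulk_load(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(loaded, count)
```

Streaming the source through a generator and committing per batch keeps memory use flat and avoids paying transaction overhead on every single record.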
Integration: the killer feature while modernizing
It’s hard to overstate how useful a good data migration toolset is while you’re modernizing a large legacy system, especially if it’s integrated properly with your modernization toolset.
The first huge benefit is test data. While you’re developing, it makes a world of difference to be able to try out your new screens, batches and reports with actual (not just realistic, but actual) data. So the earlier it’s available in the life cycle, the better. Woe betide those who leave data migration to the end of the project, for they have missed a big opportunity.
Integration is a two-way street as well, because it’s extremely valuable to feed the inevitable screen and interface changes back into the modernized schema and the transformation rules, so that you can keep your test data current with limited fuss.
Yes, we built one
I’m sure you will not be shocked to discover that MAKE has just such a tool, designed from the ground up with data migration in mind and integrated with the rest of our modernization suite. But don’t take my word for it – use your newfound knowledge and perspective to evaluate all the data migration tools on the market before making any kind of decision.
- Tom Metzger




