- Time needed to perform the actual migration
- Cleaning up the data
- Validating the current data (consistency)
- Transforming the existing data into the new model
- Handling special cases
This list isn’t complete, but it shows that it is essential to have a plan for how to handle the migration. One thing I learned about data quality during several data migrations: it matters not only for the current application, but also for the future applications that will use the data.
Data as a business value
For some companies, data is all they have. Take an insurance company: it has no physical product (apart from the brochures). What it has is data — data about its customers and cases. Its whole business is built on this data, and the data’s quality is essential to whether the company makes a profit or not.
Consequences of poor data quality
What happens if the data quality isn’t good enough? Well, if your application can handle it, nothing. But that’s exactly the point: if the data quality isn’t good enough, your application has to compensate for it everywhere. Every dialog, validation rule, and report has to deal with it. That is a huge amount of work and risk!
As in every company, software companies have staff turnover, and turnover usually means a loss of know-how. If your data quality is problematic, you have to document it so that other people can build new features and know all the “special cases”. But we all know that documentation is not the best-made part of software development, and maintaining it is rarely done as well as it should be.
Not handling the risk of bad data quality is a wrong decision, and in some cases it also means a loss of profit.
Tackle bad data quality during design
One of the biggest problems is a data model that allows “special cases”. As a software engineer or architect, you have to fight against every dropped constraint or rule. Referential integrity is likewise non-negotiable. Be wary of changing a field’s attributes to allow NULL values, because that probably means you have found a “special case”. There are certainly other “smells” that open the door to bad data quality.
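As a minimal sketch of enforcing constraints in the database itself (the tables and columns are invented for illustration; SQLite is used here only because it ships with Python), NOT NULL and a foreign key turn would-be “special cases” into hard errors at insert time:

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default

conn.execute("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL            -- no "special case" of a nameless customer
    )""")
conn.execute("""
    CREATE TABLE insurance_case (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO insurance_case (id, customer_id) VALUES (10, 1)")

# Both "smells" are rejected by the database instead of becoming bad data:
try:
    conn.execute("INSERT INTO customer (id, name) VALUES (2, NULL)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # NOT NULL constraint failed

try:
    conn.execute("INSERT INTO insurance_case (id, customer_id) VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # FOREIGN KEY constraint failed
```

The design choice is that the database, not every dialog and report, is the last line of defense: a constraint violated once at write time is far cheaper than a “special case” handled forever at read time.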
Tackle bad data quality during introduction
Too often, data migration is underestimated or missing from the project plan entirely. Data migration is the most important task when you introduce new software based on existing data.
When you do a data migration, really try to use the new software to migrate the data. This means reusing the validation rules and some of the layers of your new software, for example the business layer and the data access layer. If you use parts of your new software for the migration, you create exactly the kind of data the application itself would create and is able to read. With this approach you avoid “special cases”. Another benefit is code reuse: you don’t have to implement the same rules again, so the risk of bugs decreases.
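A minimal sketch of this idea (all names here are invented for illustration): instead of inserting legacy rows directly, the migration pumps them through the same validation function the new application’s business layer uses, so only data the application could have created itself gets imported:

```python
from dataclasses import dataclass

# Hypothetical entity from the new application's model.
@dataclass
class Customer:
    name: str
    email: str

def validate_customer(c):
    """Hypothetical validation rule shared by the UI, the API, and the migration."""
    errors = []
    if not c.name.strip():
        errors.append("name must not be empty")
    if "@" not in c.email:
        errors.append("email looks invalid")
    return errors

def migrate(legacy_rows):
    """Transform legacy rows, then run them through the shared validation."""
    migrated, rejected = [], []
    for row in legacy_rows:
        candidate = Customer(name=row.get("NAME", ""), email=row.get("MAIL", ""))
        errors = validate_customer(candidate)
        if errors:
            # A "special case": fix it at the source, don't import it.
            rejected.append((row, errors))
        else:
            migrated.append(candidate)
    return migrated, rejected

ok, bad = migrate([
    {"NAME": "Alice", "MAIL": "alice@example.com"},
    {"NAME": "",      "MAIL": "broken"},
])
print(len(ok), len(bad))  # 1 1
```

Because `validate_customer` lives in the application, not in the migration script, the rule exists exactly once (DRY), and the migration can never produce data the application would refuse to create.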
Tackle bad data quality during production
First you have to know how good or bad your data quality actually is. For this purpose I recommend creating a data quality report. This report contains the results of several consistency checks. If you design the consistency checks properly, you create one class per consistency check (single responsibility principle). In some cases, you can even implement cleanup logic for the inconsistencies that are found. But this quality report, with its consistency checks and some cleanup logic, only cleans up the mess. You also have to improve the application by fixing the sources of the bad data quality.
Dealing with data is one of the most important tasks in enterprise development, and ensuring good data quality is an ongoing task. It starts during design, where you have to create a simple, consistent data model. Try to capture all the constraints and rules, and enforce referential integrity in the database (as a minimum).
I highly recommend using the new software to migrate the data and reusing your code — just follow the DRY principle.
I also really recommend a data quality report based on implemented consistency rules. It is worthwhile to try to automate the cleanup of the inconsistencies that are found. I created a little framework that makes it easier for an application to create and run consistency checks.
Not only you, your company, and your customers will profit from the better data quality — the future software engineers who have to work with the data will profit as well.