{"id":1033,"date":"2011-06-03T16:30:13","date_gmt":"2011-06-03T14:30:13","guid":{"rendered":"http:\/\/blog.eweibel.net\/?p=1033"},"modified":"2011-06-03T16:56:30","modified_gmt":"2011-06-03T14:56:30","slug":"data-quality-as-a-business-value","status":"publish","type":"post","link":"https:\/\/blog.eweibel.net\/?p=1033","title":{"rendered":"Data quality as a business value"},"content":{"rendered":"<p><a href=\"http:\/\/blog.eweibel.net\/wp-content\/uploads\/DataQuality.jpg\"><img loading=\"lazy\" decoding=\"async\" style=\"border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px\" title=\"DataQuality\" border=\"0\" alt=\"DataQuality\" align=\"right\" src=\"http:\/\/blog.eweibel.net\/wp-content\/uploads\/DataQuality_thumb.jpg\" width=\"272\" height=\"202\" \/><\/a> From time to time you will have to migrate data. If you are familiar with data migrations, you know that this isn\u2019t an easy job. There are several concerns:<\/p>\n<ul>\n<li>The time needed for the actual migration <\/li>\n<li>Cleaning up data <\/li>\n<li>Validating the current data (consistency) <\/li>\n<li>Transforming existing data into a new model <\/li>\n<li>Handling special cases <\/li>\n<li>etc. <\/li>\n<\/ul>\n<p>This list isn\u2019t complete, but it shows that it is essential to have a plan for handling these concerns. One thing I learned about data quality during several data migrations: it\u2019s not only about the current application, it is also about future applications that will use the data.<\/p>\n<p><strong>Data as a business value<\/strong><\/p>\n<p>For some companies, data is all they have. Take an insurance company, for example: they don\u2019t have a material product (besides the brochures). What they have is data. Their data is about their customers and cases. 
So the whole business is about this data, and the quality of that data decides whether the company makes a profit.<\/p>\n<p><strong>Consequences of poor data quality<\/strong><\/p>\n<p>What happens if the data quality isn\u2019t good enough? Well, if your application can handle it, nothing. But that\u2019s exactly the point. If the data quality isn\u2019t good enough, your application has to compensate for it everywhere. Every dialog, validation rule or report has to deal with it. This is a huge amount of work and risk!    <br \/>Like every company, software companies have staff turnover. This turnover also means a probable loss of know-how. If your data quality is problematic, you have to document it, so that other people can build new features and know all the \u201cspecial cases\u201d. But we all know that documentation isn\u2019t the strongest part of software development. The maintenance of the documentation also isn\u2019t always as good as it should be. <\/p>\n<p>Not handling the risk of bad data quality is a wrong decision and, in some cases, also means a loss of profit.<\/p>\n<p><strong>Tackle bad data quality during design<\/strong><\/p>\n<p>One of the biggest problems is a data model that allows \u201cspecial cases\u201d. As a software engineer or architect you have to fight against every dropped constraint or rule. Referential integrity is also non-negotiable. Be wary of changing a field to allow NULL values, because this probably means that you have found a \u201cspecial case\u201d. There are certainly other \u201csmells\u201d that open the door to bad data quality.<\/p>\n<p><strong>Tackle bad data quality during introduction<\/strong><\/p>\n<p>Too often data migrations are underestimated or missing in a project plan. Data migration is the most important task when you introduce new software based on existing data.&#160; <br \/>When you do a data migration, really try to use the new software to migrate the data. 
This means that you reuse the validation rules and also some of the layers of your new software, for example the business layer and the data access layer. If you use parts of your software, you create data that the software itself would also create and is able to read. With this procedure you can avoid \u201cspecial cases\u201d. Another benefit is code reuse: you don\u2019t have to code the same rules again, so the risk of bugs decreases.<\/p>\n<p><strong>Tackle bad data quality during production<\/strong><\/p>\n<p><a href=\"http:\/\/blog.eweibel.net\/wp-content\/uploads\/DataQuality3.jpg\"><img loading=\"lazy\" decoding=\"async\" style=\"border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px\" title=\"DataQuality3\" border=\"0\" alt=\"DataQuality3\" align=\"right\" src=\"http:\/\/blog.eweibel.net\/wp-content\/uploads\/DataQuality3_thumb.jpg\" width=\"183\" height=\"184\" \/><\/a> First you have to know how good or bad the quality of the data is. For this purpose I recommend creating a data quality report. This report contains the results of several consistency checks. If you design the consistency checks properly, you create one class per consistency check (single responsibility principle). In some cases you can even implement the cleanup logic for the inconsistencies found. But this quality report with its consistency checks and cleanup logic only cleans up a mess. You also have to improve the application by fixing the sources of bad data quality.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>Dealing with data is one of the most important tasks in enterprise development. Ensuring good data quality is an ongoing task. It starts during design, where you have to create a simple, consistent data model. Try to describe all the constraints and rules, and enforce referential integrity in the database (as a minimum).    
<br \/>I highly recommend using the new software to migrate data; try to reuse your code and follow the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Don%27t_repeat_yourself\" target=\"_blank\">DRY principle<\/a>.     <br \/>I also strongly recommend the data quality report based on implemented consistency rules. It is also worthwhile to try to automate the cleanup of the inconsistencies found. I created a little framework that makes it easier for the application to create and run consistency checks.<\/p>\n<p>Not only you, your company and your customer will profit from better data quality; the future software engineers who have to work with the data will profit as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It could happen, that you have to do some data migrations from time to time. If you are familiar with data migrations you know that it isn\u2019t an easy job. There are several concerns: Needed time to do the effective migration Cleaning up data Validate current data (consistency) Transform existing data in a new model Handle special cases etc This list isn\u2019t a complete one, but you see that it is essential to have a plan how to handle it&#8230;.<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/blog.eweibel.net\/?p=1033\"> Read More<span class=\"screen-reader-text\">  Read 
More<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[14],"tags":[],"class_list":["post-1033","post","type-post","status-publish","format-standard","hentry","category-software-engineering"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/plOV9-gF","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":1198,"url":"https:\/\/blog.eweibel.net\/?p=1198","url_meta":{"origin":1033,"position":0},"title":"Are stale data evil?","author":"Patrick","date":"27. Feb 2012","format":false,"excerpt":"When you're a software engineer who produces software for enterprises like banks or assurances, then it is normal you have huge databases (several gigabytes). 
Such systems have an operative application where users do the daily business of the company and there are more informative parts (or strategic parts) of the\u2026","rel":"","context":"In &quot;Software architecture&quot;","block_context":{"text":"Software architecture","link":"https:\/\/blog.eweibel.net\/?cat=4"},"img":{"alt_text":"Sexy young woman as devil in fire","src":"https:\/\/i0.wp.com\/blog.eweibel.net\/wp-content\/uploads\/Fotolia_37310173_S_thumb.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1182,"url":"https:\/\/blog.eweibel.net\/?p=1182","url_meta":{"origin":1033,"position":1},"title":"Anti-Pattern &#8216;Validation by Execute &#8216;n&#8217; Rollback&#8217;","author":"Patrick","date":"21. Feb 2012","format":false,"excerpt":"Recently in some reviews I saw an anti-pattern. First you have to know, in the code, there was a validation of the data before it was stored in the database. So far so good. But when I looked at the validation code, I saw the following: public void Validate() {\u2026","rel":"","context":"In &quot;Anti patterns&quot;","block_context":{"text":"Anti patterns","link":"https:\/\/blog.eweibel.net\/?cat=8"},"img":{"alt_text":"Fotolia_20233238_S","src":"https:\/\/i0.wp.com\/blog.eweibel.net\/wp-content\/uploads\/Fotolia_20233238_S_thumb1.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":367,"url":"https:\/\/blog.eweibel.net\/?p=367","url_meta":{"origin":1033,"position":2},"title":"Reason of silence","author":"Patrick","date":"13. Aug 2009","format":false,"excerpt":"Why is there no blog entry for the July and for the half of august? Well, the answer is, that my current project where I'm the project manager and and also the leader of a team of software engineers, comes to an end. 
Currently we implement the last features like\u2026","rel":"","context":"In &quot;Private&quot;","block_context":{"text":"Private","link":"https:\/\/blog.eweibel.net\/?cat=9"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":677,"url":"https:\/\/blog.eweibel.net\/?p=677","url_meta":{"origin":1033,"position":3},"title":"Round-up of a data centric architecture","author":"Patrick","date":"11. Apr 2010","format":false,"excerpt":"In my last big project we had to use a data centric architecture. There was a learning curve which architecture was the most appropriate one. The result is visible in the picture bellow: Lets explaining the diagram. The data (or state) is managed by the database layer and the common\u2026","rel":"","context":"In &quot;Software architecture&quot;","block_context":{"text":"Software architecture","link":"https:\/\/blog.eweibel.net\/?cat=4"},"img":{"alt_text":"Architektur","src":"https:\/\/i0.wp.com\/blog.eweibel.net\/wp-content\/uploads\/Architektur_thumb.jpg?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.eweibel.net\/wp-content\/uploads\/Architektur_thumb.jpg?resize=350%2C200 1x, https:\/\/i0.wp.com\/blog.eweibel.net\/wp-content\/uploads\/Architektur_thumb.jpg?resize=525%2C300 1.5x"},"classes":[]},{"id":17,"url":"https:\/\/blog.eweibel.net\/?p=17","url_meta":{"origin":1033,"position":4},"title":"Get the size of your tables","author":"Patrick","date":"17. Jan 2008","format":false,"excerpt":"Recently my boss asked me, why the databases (Microsoft SQL Server 2005) of our customers are so big. With the following SQL-Statements I could give more or less an answer to my boss. 
CREATE TABLE PW_SPACE( name varchar(255), rows int, reserved varchar(255), data varchar(255), index_size varchar(255), unused varchar(255) ) GO\u2026","rel":"","context":"In &quot;First experiencies&quot;","block_context":{"text":"First experiencies","link":"https:\/\/blog.eweibel.net\/?cat=7"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":160,"url":"https:\/\/blog.eweibel.net\/?p=160","url_meta":{"origin":1033,"position":5},"title":"When to use stored procedures","author":"Patrick","date":"13. May 2009","format":false,"excerpt":"Recently I discussed with a colleague when to use stored procedures. As exptected it was quite a religious conversation. A few days later I found the following screencast: The Pros and Cons of Stored Procedures Based on the discussion and the screencast I tried to summarize my Pros and Cons:\u2026","rel":"","context":"In &quot;Good practices&quot;","block_context":{"text":"Good practices","link":"https:\/\/blog.eweibel.net\/?cat=5"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/posts\/1033","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1033"}],"version-history":[{"count":15,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/posts\/1033\/revisions"}],"predecessor-version":[{"id":1052,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=\/wp\/v2\/posts\/1033\/revisions\/1052"}],"wp:attachment":[{"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1033"}],"wp:term":[{"taxonomy"
:"category","embeddable":true,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1033"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.eweibel.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1033"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}