“We’re not like your other organizations. Our data is terrible.” It is a familiar sentence, one we hear from most people who use or manage data in any organization. The good news for everyone is that no organization has perfect data.
If any organization claims its data is perfect, most likely that claim is not true, or the organization has not analyzed its data closely enough. The reality is that no organization can ever reach a perfect state of data; perfect data is an imaginary target. The good news, however, is that every organization can improve and make its data better.
In today’s data-driven world, organizations acquire and accumulate data from multiple sources on a continuous basis. Their data is constantly moving, drifting and changing. On top of that, as an organization’s priorities change, the insights it needs to generate from data change as well, which in turn requires the data team to transform, process and manage data differently.
So the question is: how should data teams prepare themselves to improve their data?
Well, getting to perfect data is a journey. Organizations need to take that journey and make it better by adopting the right strategy, tools and techniques.
Let’s look at the steps we need to take to make the journey to perfect data smoother and faster.
a) Develop criteria for perfect data – First, every organization should develop its own criteria for data quality. There is no single criterion that quantifies perfect data even across all datasets in the same organization. For example, the supply chain department may be fine with 98% accuracy in the sales numbers, because its analysis needs a directional number rather than an exact one; the finance or accounts receivable team needs sales data to reconcile 100% with the invoices raised in the month; and for the marketing team, having the key attributes used for marketing segments 100% populated is the critical criterion. A minimal sketch of how such department-specific criteria might be written down is shown below.
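As an illustration only, here is one way such department-specific criteria could be captured in code. The rule names, datasets, thresholds and the small helper function are hypothetical and are not 4DAlert’s rule format.

```python
# Hypothetical, minimal registry of department-specific data quality criteria.
# Owners, datasets and thresholds are illustrative only.
DATA_QUALITY_CRITERIA = [
    {"owner": "supply_chain", "dataset": "sales_fact",
     "rule": "sales_amount_accuracy", "threshold": 0.98},      # directional is fine
    {"owner": "finance", "dataset": "sales_fact",
     "rule": "reconciles_with_invoices", "threshold": 1.00},    # must match invoices
    {"owner": "marketing", "dataset": "customer_dim",
     "rule": "segment_attributes_populated", "threshold": 1.00},
]

def meets_criteria(rule: dict, measured_score: float) -> bool:
    """Return True when the measured score satisfies the rule's threshold."""
    return measured_score >= rule["threshold"]

# Example: 97.5% accuracy falls just short of the supply chain threshold,
# and would fail finance's 100% reconciliation requirement outright.
print(meets_criteria(DATA_QUALITY_CRITERIA[0], 0.975))  # False (below 0.98)
```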
b) Have a dedicated team maintain and store these rules at the enterprise level – Developing these rules in a silo is not a good idea. It is essential that the rules are managed with enterprise tools and that an owner is defined for each criterion. Data reconciliation and data quality tools such as 4DAlert, a cloud-based platform that provides an enterprise rule catalog along with customized rules, are a good fit for maintaining the criteria for perfect data.
c) Benchmark the criteria for perfect data at the beginning – Before starting the journey to perfect data, it is important to quantify where the organization stands today. Organizations should adopt a technique and process to measure how good their data is in its current state, and the result of that exercise should serve as the starting point; a minimal benchmarking sketch appears below.
Tools like 4DAlert can automatically group the criteria for perfect data (data quality rules) into different categories, presented as pre-built dashboards that give organizations a starting point. These dashboards within 4DAlert can be customized further for an organization’s specific needs.
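For teams that have not adopted such a tool yet, a very rough baseline can be computed by hand. The sketch below uses pandas on a hypothetical customer extract and measures the share of non-blank values per column as a crude starting benchmark; it is not how 4DAlert builds its dashboards.

```python
import pandas as pd

# Hypothetical customer extract used to quantify where the data stands today.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", "d@example.com"],
    "marketing_segment": ["SMB", "ENT", None, "SMB"],
})

def baseline_scores(df: pd.DataFrame) -> dict:
    """Share of non-blank values per column - a crude starting benchmark."""
    return {col: float(df[col].notna().mean()) for col in df.columns}

print(baseline_scores(customers))
# {'customer_id': 1.0, 'email': 0.75, 'marketing_segment': 0.75}
```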
d) Measure the criteria for perfect data on a continuous basis – Once the initial benchmark is complete, it is important to measure the criteria on an ongoing basis. As discussed at the beginning, getting to perfect data is a journey, so organizations need a process to measure the trend and improve it over time.
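One simple way to make the trend measurable is to append every measurement to a small history store. The sketch below uses SQLite purely for illustration; the table name and schema are assumptions, not part of any particular product.

```python
import sqlite3
from datetime import date

# Illustrative history store for data quality measurements; reviewing this
# table over time shows whether each metric is trending up or down.
conn = sqlite3.connect("dq_history.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS dq_metric_history (
        measured_on TEXT, dataset TEXT, metric TEXT, value REAL
    )
""")

def record_metric(dataset: str, metric: str, value: float) -> None:
    """Append today's measurement so the trend can be tracked continuously."""
    conn.execute(
        "INSERT INTO dq_metric_history VALUES (?, ?, ?, ?)",
        (date.today().isoformat(), dataset, metric, value),
    )
    conn.commit()

# Example: today, 200 customer records are missing marketing segment attributes.
record_metric("customer_dim", "blank_marketing_segment_count", 200)
```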
e) Get alerted when key data quality metrics show a downward trend – This is the most important step in the journey to perfect data. It is acceptable to have a small percentage of bad data in any analytics, but knowing when that 1% becomes 2% is important. For example, an organization could have data on 2 million customers, of which 200 may have blank attributes used for marketing segments, and that is probably acceptable to the team. But those 200 should not become 400 or 2,000. When that happens, the data team needs to be alerted so it can analyze the root cause and take steps to improve. Oftentimes it is hard to know the critical point at which alerts should be generated. This is why tools like 4DAlert adopt AI/ML to analyze historical and incoming data and decide when to raise alerts.
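How 4DAlert’s AI/ML decides this internally is beyond this post, but as a simplified stand-in, a basic control-limit check over the recorded history illustrates the idea: alert when today’s value sits far outside the historical norm. The three-sigma rule and minimum history length below are assumptions for illustration.

```python
import statistics

def should_alert(history: list, latest: float, sigmas: float = 3.0) -> bool:
    """Flag the latest value when it sits well above the historical norm.

    A deliberately simple stand-in for ML-based anomaly detection; the
    three-sigma threshold and minimum history length are assumptions.
    """
    if len(history) < 5:                       # not enough history to judge
        return False
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid a zero threshold
    return latest > mean + sigmas * stdev

# Around 200 blank records per day has been normal; 2,000 should raise an alert.
history = [198, 205, 201, 199, 210, 195, 203]
print(should_alert(history, 204))    # False - within the usual range
print(should_alert(history, 2000))   # True  - alert and investigate the root cause
```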
f) Go beyond simple quality checks – The traditional view of data quality has been to check whether data meets certain criteria: blank vs. non-blank checks, number range checks, email format checks, address validation checks, or data format checks. These are the older, more traditional methods.
To get a 360-degree perspective on data quality, organizations need to go beyond these simple checks to more advanced ones such as data reconciliation with the source. A dataset could pass blank vs. non-blank or number range criteria, but if it does not match the source data, meeting those criteria is of no use. That is why, along with traditional data quality rules, data teams need to adopt a methodology to reconcile the data in the analytics platform with the source data.
Tools such as 4DAlert go beyond traditional data quality checks. Data teams schedule ETL or ELT jobs to move data from one system to another. Many times an ETL job shows a green status, but not all of the data moves, or some data moves twice. The only way the data team finds out is when users complain, and by then it is too late to react: users lose confidence in the data while the data team scrambles to fix the issue. A simple data quality check run by a traditional tool would not catch this problem; instead, data teams need to verify the completeness of the data by reconciling it end to end. So the question becomes: how do you reconcile data on a daily basis after each load?
Tools such as 4DAlert allow you to reconcile data between two systems, and not only row counts but the actual data. For example, if you are loading sales data from an on-premise Oracle system to a cloud analytics platform, you could reconcile the sales data by day, month, plant, country, region or something equivalent. If the data reconciles, we know we processed it correctly. For details on how 4DAlert automates data reconciliation, see my earlier blog.
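To make the idea concrete, here is a minimal sketch of aggregate-level reconciliation. It uses two in-memory SQLite databases as stand-ins for the source and target systems; the table, columns and sample rows are hypothetical, and this is not 4DAlert’s implementation.

```python
import sqlite3

def make_db(rows):
    """Stand-in for a real system (e.g. on-premise Oracle or a cloud warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (sales_date TEXT, sales_amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    return conn

source = make_db([("2024-05-01", 100.0), ("2024-05-01", 50.0), ("2024-05-02", 75.0)])
target = make_db([("2024-05-01", 150.0), ("2024-05-02", 0.0)])  # 2024-05-02 loaded badly

# The same aggregate runs on both sides; grouping by day here, but it could
# just as well be by month, plant, country or region.
RECON_QUERY = """
    SELECT sales_date, ROUND(SUM(sales_amount), 2)
    FROM sales_fact
    GROUP BY sales_date
"""

def daily_totals(conn):
    return dict(conn.execute(RECON_QUERY).fetchall())

source_totals, target_totals = daily_totals(source), daily_totals(target)
mismatches = [
    day for day in sorted(set(source_totals) | set(target_totals))
    if source_totals.get(day) != target_totals.get(day)
]
print(mismatches)   # ['2024-05-02'] -> that day's load needs investigation
```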
g) Make AI/ML part of the data quality check – As organizations accumulate data from a variety of sources, applying rules manually to every dataset becomes increasingly difficult.
Therefore organizations need to leverage AI/ML technology to measure and improve data quality. A tool such as 4DAlert sits on top of an organization’s data assets and applies auto metrics such as freshness checks, row count checks and other data quality checks automatically. Without defining any rules manually and without manual intervention, organizations can automate roughly 70% of their data quality checks; only organization-specific checks still require manual setup.
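As a rough illustration of what such rule-free “auto metrics” look like (not 4DAlert’s code), the sketch below walks every table in a SQLite database and records its row count and latest update timestamp, assuming a hypothetical updated_at column where one exists.

```python
import sqlite3
from datetime import datetime, timezone

def auto_profile(conn) -> list:
    """Collect rule-free metrics (row count, freshness) for every table found."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    metrics = []
    for table in tables:
        row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        try:
            last_update = conn.execute(
                f"SELECT MAX(updated_at) FROM {table}").fetchone()[0]
        except sqlite3.OperationalError:
            last_update = None   # table has no updated_at column to judge freshness by
        metrics.append({
            "table": table,
            "row_count": row_count,
            "last_update": last_update,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return metrics

# Usage sketch:
# conn = sqlite3.connect("analytics.db")
# for metric in auto_profile(conn):
#     print(metric)
```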
In summary, organizations and data teams need to realize that no organization has perfect data. Getting to perfect data is a journey that requires continuous monitoring and continuous improvement, and there are tools and techniques data teams can adopt to keep improving their data.
Author – Nihar Rout, Managing Partner, 4DAlert
https://www.4DAlert.com
Contact nihar.rout@performalytic.com