Loading a lot of data into Google BigQuery? That is great. But do you reconcile the data after every load to make sure it is of good quality?

 

A large retail company has multiple ERPs and other systems. Every day, its analytics teams load large volumes of data from these systems into the Google BigQuery analytics platform (or, for that matter, into any analytics system such as Snowflake, Azure, or Databricks). The ETL jobs show green, but from time to time users find missing or duplicate data that throws the numbers off.

 

 

When this occurs, the analytics team starts taking corrective action, which sometimes takes days or weeks, and in the meantime users are not able to use the analytics system. The result is lost productivity, lost faith in the system, and so on.

All these issues could have been avoided if there were a system in place that reconciles the data in the analytics system periodically and takes action proactively rather than reactively. 

Let’s see how this could be achieved with 4DAlert, a cloud-based AI/ML solution that delivers automatic data reconciliation and data quality.

 

 

4DAlert – Automatic Data Reconciliation And Data Quality

 

4DAlert is a one-stop solution that connects to a diverse set of source systems within your landscape, such as SAP, Oracle, Salesforce, Workday, SQL Server, and data lakes, and automatically reconciles data from those sources with the data in your analytics system (which could be Google BigQuery, Snowflake, Redshift, or Databricks). Whenever the system detects an anomaly in the data, it proactively sends alerts to a pre-defined group of stakeholders and helps the analytics team take proactive action. Isn’t that cool?

 

Connectivity to Diverse Database Systems

 

4DAlert connects to most common modern databases using its API technology. Many customers use 4DAlert to reconcile data from SAP HANA, Oracle, SQL Server, MySQL, flat files, and cloud APIs (for data coming from Salesforce or Workday) with their data lake or analytics platform built on Azure Data Lake Storage (ADLS), AWS, Athena, Snowflake, Google BigQuery, Databricks, etc. These systems are mentioned here because they are the most commonly used.

 

 

Periodic Automatic Data Reconciliation

 

4DAlert provides multiple methods for configuring data reconciliation. Simple reconciliations can be set up using its wizard, which doesn’t require you to learn anything about databases or SQL. You simply select an entity and its attributes and then reconcile the data.

If you need special logic in your reconciliation, no problem. You can write your own SQL and embed it in your reconciliation setup. The solution runs these SQL statements periodically in your source and/or target system and reconciles the data.
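To make the idea of a custom SQL reconciliation concrete, here is a minimal sketch in Python. It is not 4DAlert’s implementation: the table names, the `reconcile` helper, and the use of an in-memory SQLite database standing in for both the source system and the analytics platform are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical reconciliation query: row count and amount total for one table.
RECON_SQL = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"


def reconcile(conn, source_table, target_table, tolerance=0.01):
    """Compare row counts and amount totals between two tables."""
    src_count, src_total = conn.execute(RECON_SQL.format(table=source_table)).fetchone()
    tgt_count, tgt_total = conn.execute(RECON_SQL.format(table=target_table)).fetchone()
    return {
        "count_diff": tgt_count - src_count,
        "total_diff": round(tgt_total - src_total, 2),
        "matched": src_count == tgt_count and abs(tgt_total - src_total) <= tolerance,
    }


# In-memory SQLite stands in for both the source ERP and the analytics platform.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_sales (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE target_sales (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO source_sales VALUES (?, ?)",
    [(1, 100.0), (2, 250.5), (3, 75.0)],
)
# The target is missing order 3 and has order 2 loaded twice.
conn.executemany(
    "INSERT INTO target_sales VALUES (?, ?)",
    [(1, 100.0), (2, 250.5), (2, 250.5)],
)

result = reconcile(conn, "source_sales", "target_sales")
print(result)
```

In this sample the row counts happen to match even though one order is missing and another is duplicated, so only the totals comparison reveals the problem. That is why a reconciliation usually checks more than one measure.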

Finally, an API method is available for scenarios in which you cannot connect to the database directly or the system doesn’t allow direct connectivity. In those cases, the solution provides APIs to read data from those systems and reconcile it.

 

 

How Do You Know If It’s An Anomaly?

 

When you compare data between source and analytics systems, how does the system know that a difference is an anomaly? There are several ways the system can detect anomalies. First, it uses AI/ML technology to learn from historical data issues and analyzes new data to flag anomalies. Second, users can provide their own anomaly criteria, and the system can use a combination of these user-defined parameters and its AI/ML algorithm to detect anomalies.

Once anomalies are detected, the system sends alerts through your preferred channels, which could be email, Microsoft Teams messages, or text messages. Most importantly, because the solution integrates with your organization’s Active Directory, when stakeholders leave their position, the system automatically stops sending alerts to them.

 

Is There Any Way I Could Be Alerted On Anomalies Even Without Connecting To Source Databases?

 

Often, for one reason or another, it is not possible to connect to source systems. In that case, 4DAlert connects to your Google BigQuery platform periodically, compares the new data with historical snapshots, and uses AI/ML technologies to detect anomalies. The idea is simple: if sales for a division within your company were supposed to be USD 2 billion a year, they can’t suddenly be USD 5 billion. I wish that were the case, but with 99.99% certainty we can say that is a data anomaly.
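The snapshot comparison can be illustrated with a very simple statistical rule. The sketch below uses a plain z-score test, which is a stand-in chosen for illustration and not the AI/ML method 4DAlert uses; the `is_anomaly` function, the threshold of 3 standard deviations, and the sample figures are all assumptions.

```python
from statistics import mean, stdev


def is_anomaly(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` sample standard
    deviations away from the mean of the historical snapshots."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All historical values are identical, so any change is suspicious.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


# Hypothetical yearly sales snapshots for one division, in USD billions.
history = [1.9, 2.0, 2.1, 2.0, 1.95, 2.05]

print(is_anomaly(history, 2.1))  # within normal variation
print(is_anomaly(history, 5.0))  # far outside the historical range
```

A value of 2.1 stays within the usual spread of the history and is not flagged, while a jump to 5.0 is dozens of standard deviations away and is flagged.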

 

4DAlert’s Data Reconciliation Is Cool. Does The Solution Do Any Other Data Quality Checks?

 

The answer is yes. You name a data quality check, and 4DAlert has it.

While Google BigQuery allows us to store and compute large sets of data in the analytics platform, it is imperative that we have a solution in place that checks the quality of that data. 4DAlert comes with a predefined data quality catalog for cloud analytics platforms such as Google BigQuery. Customers can leverage these rules without any additional effort to build their own. The predefined rules cover most data quality issues, including null checks, distinct checks, valid email, zip code, or address checks, enumeration checks, number range checks, date format checks, etc.
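For readers unfamiliar with these rule types, the following sketch shows what a few of them might look like over rows held in memory. The function names, the loose email pattern, and the sample rows are hypothetical and are not taken from 4DAlert’s rule catalog; each function returns the indexes of the rows that fail the check.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def null_check(rows, column):
    """Rows where the column is missing or empty."""
    return [i for i, r in enumerate(rows) if r.get(column) in (None, "")]


def distinct_check(rows, column):
    """Rows whose value in the column repeats an earlier row."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        value = r.get(column)
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return dupes


def email_check(rows, column):
    """Rows with a non-empty value that does not look like an email address."""
    return [i for i, r in enumerate(rows)
            if r.get(column) and not EMAIL_RE.match(r[column])]


def range_check(rows, column, low, high):
    """Rows with a value outside the inclusive range [low, high]."""
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not low <= r[column] <= high]


rows = [
    {"customer_id": 1, "email": "ann@example.com", "age": 34},
    {"customer_id": 2, "email": "bob-at-example.com", "age": 29},
    {"customer_id": 2, "email": None, "age": 210},
]

print(null_check(rows, "email"))            # [2]
print(distinct_check(rows, "customer_id"))  # [2]
print(email_check(rows, "email"))           # [1]
print(range_check(rows, "age", 0, 120))     # [2]
```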

Sometimes it could be necessary to build your own data quality rules for any special scenarios. In that case, 4DAlert provides a very intuitive template for you to copy the existing rules and then customize the rule for your needs. You could also write your own SQL snippet for very advanced data quality checks and 4DAlert would integrate these checks into its overall data quality rules.

 

Data Quality Checks Are Great, But Can I Be Alerted When There Is An Issue?

 

Anytime there is a data quality issue, 4DAlert categorizes it into one of several severity levels. Based on the criticality of the issue, users can be alerted instantly, or receive a daily summary of the issues or a weekly scorecard. This is handy, because not all issues are critical and nobody wants to be alerted 1000 times a day.
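Severity-based alerting of this kind amounts to routing each issue to a delivery cadence. A minimal sketch follows; the severity names, the cadence names, and the `route_issues` helper are assumptions made for illustration and do not reflect 4DAlert’s actual configuration.

```python
from collections import defaultdict

# Hypothetical mapping from severity level to alert cadence.
CADENCE_BY_SEVERITY = {
    "critical": "instant",
    "major": "daily_summary",
    "minor": "weekly_scorecard",
}


def route_issues(issues):
    """Group issue names by the cadence at which they should be reported."""
    buckets = defaultdict(list)
    for issue in issues:
        # Unknown severities fall back to the least noisy cadence.
        cadence = CADENCE_BY_SEVERITY.get(issue["severity"], "weekly_scorecard")
        buckets[cadence].append(issue["name"])
    return dict(buckets)


issues = [
    {"name": "sales totals mismatch", "severity": "critical"},
    {"name": "null customer email", "severity": "minor"},
    {"name": "duplicate order id", "severity": "major"},
    {"name": "invalid zip code", "severity": "minor"},
]

routed = route_issues(issues)
print(routed)
```

Only the critical mismatch would trigger an immediate alert here; the remaining issues are held for the daily summary or the weekly scorecard.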

 

Data Quality Dashboards

 

4DAlert provides pre-configured dashboards for a variety of needs. You can get daily, weekly, or monthly summaries, overall data quality scores and their trends, and repeat offenders or repeat issues. These dashboards help the leadership team get a grip on the issues and look for root causes and ways to improve quality on a regular basis.

If the pre-configured dashboards are not sufficient and you need a custom dashboard, that is no problem either. You can use any dashboarding tool, such as Power BI or Tableau, to build your own.

 

Integration With Data Catalog Tools Such As Alation Or Collibra

 

In some cases, customers use data catalog tools such as Alation or Collibra and need the data quality and data reconciliation output to be published to them. 4DAlert has APIs that integrate with these data catalog solutions and feed them data quality issues periodically. This output makes it easy for the larger user community to be aware of the overall data quality at the system level and/or the object level before they start using the data for their analysis.

 

Conclusion

 

As customers adopt Google BigQuery, Snowflake, Redshift, or Databricks as their enterprise analytics platform, it is essential that they have an automated data reconciliation solution that reconciles data on a regular basis (if possible, after each load) and proactively communicates any data quality issues. Automatic data reconciliation on a regular basis saves manual reconciliation effort, proactively alerts on data issues, and delivers higher value from the investment in the analytics platform.
