Data comes in many forms. We can love it or hate it, but we can’t leave it. In the corporate world, like it or not, many valuable data sets live in a variety of sources, including Excel files, Oracle, SQL Server, SAP HANA, and data lakes. With the arrival of cloud analytics, corporations now have abundant capacity to ingest data into analytics platforms built on Snowflake, Redshift, or Google BigQuery.
To get the full benefit of these data sets, companies ingest them into their analytics platform and integrate them with other data sets on a regular basis. But there are a couple of challenges in getting real value out of the data. The main reasons for these challenges are:
a) Data often comes from sources other than a well-governed ERP such as SAP or Oracle, so it doesn’t always arrive with good quality.
b) The ingested data sets are numerous and ad hoc in nature, so it is hard to reconcile them and confirm that all data arrived on time and in full.
To mitigate these challenges, companies adopt manual methods of reconciling data and checking data quality. Because these checks are manual, they require custom jobs and a heavy investment of people and time. Every time, the analytics team or business team writes ad hoc custom scripts to confirm that the data arrived in full and with good quality.
Now the question is, can we automate data reconciliation and avoid manual effort for each data set? Can we have a solution that puts checks in place and raises alerts when there are issues? The answer is 4DAlert, an automated data reconciliation and data quality solution. Let’s look in detail at how 4DAlert helps automate the manual data reconciliation steps.
4DAlert is a cloud-based solution that runs on a small virtual machine. Companies don’t need a large infrastructure: an Azure VM or EC2 instance with as few as 2 vCores is enough to run it. The solution can also run in Docker and Kubernetes environments. This lets companies run the solution with minimal infrastructure.
4DAlert uses an API-based architecture that allows the solution to integrate with most modern analytics platforms and source systems. Whether you are adopting Snowflake, Redshift, Azure Synapse, Databricks, or Google BigQuery as your analytics platform, or pulling data from Excel files stored on premises, Google Sheets, or SAP and Oracle ERPs, 4DAlert can connect to all of these systems.
Creating a new connection is easy. The solution provides a wizard that connects you to the system of your choice in no time. We also add connectivity to new systems frequently. If you have a system that we don’t connect to today, no worries. We would build a connector for you in as little as 2 weeks. Yes, you read that right: 2 weeks.
As soon as you establish a connection, the solution does the rest. 4DAlert automatically reads the data sets available in the system (of course, only the data sets to which you give access) and imports all metadata, i.e. entity names, columns, data types, etc. After the initial import, the solution syncs the metadata on a periodic basis and keeps the structure in sync with any additions or deletions of data sets. Isn’t that cool?
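To make the metadata import and periodic sync more concrete, here is a minimal sketch. It is not 4DAlert's implementation: it uses an in-memory SQLite database as a stand-in source system, and the function names `import_metadata` and `diff_metadata` are hypothetical. It reads table and column definitions into a catalog, then compares two catalog snapshots to spot added or dropped data sets.

```python
import sqlite3


def import_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for user tables."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


def diff_metadata(old, new):
    """Compare two catalog snapshots and report added and dropped tables."""
    return {
        "added": sorted(set(new) - set(old)),
        "dropped": sorted(set(old) - set(new)),
    }


# Stand-in source system (assumption: a real setup would connect to the
# actual platform instead of SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
first = import_metadata(conn)        # initial import

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
second = import_metadata(conn)       # later periodic sync
changes = diff_metadata(first, second)
print(changes)
```

A scheduled job could run the second import at a fixed interval and act on the reported changes.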
Once metadata is imported, the system automatically scans the data and uses its AI/ML algorithm to propose data quality rules. This feature saves a great deal of time and helps apply the appropriate rules to different data sets.
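The article does not describe the AI/ML algorithm, so the following is only an illustrative stand-in: a plain profiling heuristic that inspects a sample of column values and suggests not-null, uniqueness, and range rules. The function name `propose_rules` and the rule text format are assumptions made for this sketch.

```python
def propose_rules(column_name, values):
    """Suggest simple data quality rules from a sample of column values."""
    rules = []
    non_null = [v for v in values if v is not None]

    # No missing values in the sample: suggest a not-null rule.
    if values and len(non_null) == len(values):
        rules.append(f"{column_name} IS NOT NULL")

    # No repeated values in the sample: suggest a uniqueness rule.
    if non_null and len(set(non_null)) == len(non_null):
        rules.append(f"{column_name} IS UNIQUE")

    # All values numeric: suggest the observed range as a starting point.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and numeric:
        rules.append(f"{column_name} BETWEEN {min(non_null)} AND {max(non_null)}")

    return rules


sample = {
    "order_id": [1, 2, 3, 4],
    "discount": [0.0, 0.1, None, 0.1],
}
proposals = {col: propose_rules(col, vals) for col, vals in sample.items()}
for col, rules in proposals.items():
    print(col, rules)
```

Proposals like these are suggestions only; a reviewer would still accept, adjust, or reject each one.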
Along with automatic rules, the solution provides a rule catalog from which you can choose a rule and then customize it. The solution also lets you write your own custom SQL snippets to define complex rules for particular business scenarios.
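As an illustration of a custom SQL rule, the sketch below assumes a convention in which the snippet selects the rows that violate the rule, so an empty result means the rule passed. That convention, the `run_custom_rule` helper, and the `orders` table are all assumptions for this example, not 4DAlert's documented behavior.

```python
import sqlite3


def run_custom_rule(conn, rule_name, violation_sql):
    """Run a SQL snippet that selects violating rows; no rows means pass."""
    rows = conn.execute(violation_sql).fetchall()
    return {"rule": rule_name, "passed": len(rows) == 0, "violations": rows}


# Hypothetical data set used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "SHIPPED"), (2, -5.0, "SHIPPED"), (3, 40.0, "OPEN")],
)

# Business rule: shipped orders must have a positive amount.
result = run_custom_rule(
    conn,
    "shipped_orders_have_positive_amount",
    "SELECT order_id, amount FROM orders "
    "WHERE status = 'SHIPPED' AND amount <= 0",
)
print(result)
```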
Data reconciliation is a key feature of the solution. It lets you reconcile data between source systems (in this case an Excel file stored on a shared drive or a Google Sheet stored in the cloud) and the data ingested into your Snowflake, Google BigQuery, Redshift, or Azure analytics system.
There are multiple ways to configure your reconciliation setup: a) Wizard method, b) Custom Query method, c) API method.
a) Wizard method – This is a very simple way of configuring alerts. It provides a drag-and-drop way of configuring the entities, segments, and measures that you want to reconcile.
b) Custom Query method – In this method, you write your own SQL and embed any business logic or complex transformations in the query. This is a very powerful method: you configure it once and the system reconciles automatically.
c) API method – The API method allows you to read data from any system (e.g. Workday, Salesforce, or third-party systems such as D&B) using a simple API. Once data is read from the API, it is reconciled automatically with the source data.
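Whichever method is used to configure it, the core comparison can be pictured as follows. This is a minimal sketch under stated assumptions, not the product's logic: rows are plain dictionaries, the measure is summed per segment on both sides, and the `reconcile` function and its `tolerance` parameter are hypothetical names.

```python
def reconcile(source_rows, target_rows, segment, measure, tolerance=0):
    """Sum a measure per segment on both sides and return the mismatches."""

    def totals(rows):
        out = {}
        for row in rows:
            out[row[segment]] = out.get(row[segment], 0) + row[measure]
        return out

    src, tgt = totals(source_rows), totals(target_rows)
    mismatches = {}
    for key in sorted(set(src) | set(tgt)):
        s, t = src.get(key), tgt.get(key)
        # A segment missing on either side is always a mismatch.
        if s is None or t is None or abs(s - t) > tolerance:
            mismatches[key] = {"source": s, "target": t}
    return mismatches


# Hypothetical rows, e.g. read from a spreadsheet and from the warehouse.
source = [
    {"region": "EU", "amount": 100},
    {"region": "EU", "amount": 50},
    {"region": "US", "amount": 200},
    {"region": "APAC", "amount": 75},
]
target = [
    {"region": "EU", "amount": 150},
    {"region": "US", "amount": 180},
]

mismatches = reconcile(source, target, segment="region", measure="amount")
print(mismatches)
```

Here EU matches, US differs in amount, and APAC never arrived in the target, which covers both the "in full" and the "correct values" side of reconciliation.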
When the system reconciles data between source and target systems and finds a data anomaly, it automatically generates alerts and delivers them to the stakeholders designated for that particular reconciliation setup. Alerts can be delivered using your preferred method, including email, text, or a Microsoft Teams message. Anomalies are detected either by the AI/ML algorithm or against a predefined threshold that can be customized to the business requirement. Another key feature of the solution is automatic adaptation: as data changes over time, the system adapts to the changed data and adjusts its outlier detection methodology accordingly.
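The two detection modes described above can be sketched roughly as follows. This is an assumption-laden illustration, not the product's algorithm: the fixed threshold is modeled as a maximum percentage change against the previous load, and the adaptive mode is approximated by a z-score over the history, so the baseline shifts as new values are appended. The function names and default limits are hypothetical.

```python
from statistics import mean, stdev


def exceeds_threshold(previous, latest, max_pct_change):
    """Fixed rule: flag if the value moved more than max_pct_change percent."""
    if previous == 0:
        return latest != 0
    return abs(latest - previous) / abs(previous) * 100 > max_pct_change


def is_outlier(history, latest, z_limit=3.0):
    """Adaptive rule: flag values far from the mean of the history so far."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > z_limit


# Hypothetical daily row counts for one data set.
history = [1000, 1020, 990, 1010, 1005]

normal_load = is_outlier(history, 1012)
suspicious_load = is_outlier(history, 1500)
threshold_hit = exceeds_threshold(history[-1], 1500, max_pct_change=20)
print(normal_load, suspicious_load, threshold_hit)
```

Appending each accepted value to `history` before the next check is what makes the baseline follow gradual changes in the data.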
The 4DAlert solution comes with a set of predefined dashboards that provide an overall status of the health of the data at any particular time, or a trend over a period of time. These predefined dashboards are a good way of looking at the overall data reconciliation and data quality status and of understanding what changed, why, and when the state of the data improved or dropped. The dashboards can be customized to business needs, or companies can create new dashboards using the data collected by the solution.
Manual data reconciliation is labor intensive, error prone, and not repeatable. Without a solution like 4DAlert, companies dedicate people who spend hours and days reconciling data, and the result still does not cover all aspects of the reconciliation. 4DAlert automates the process end to end and provides a comprehensive capability to reconcile data and check data quality in any analytics system.