Business Use Case
Data comes in many forms. We can love it or hate it, but we can’t leave it. In the corporate world, whether we agree or not, many valuable data sets live in a variety of systems, including Excel, Oracle, SQL Server, SAP HANA, and data lakes. With the introduction of cloud analytics, corporations now have abundant capacity to ingest data into analytics platforms built on Snowflake, Redshift, or Google BigQuery.
To get the full benefit of these data sets, companies ingest them into their analytics platform and integrate them with other data sets on a regular basis. But there are a couple of challenges in getting real value out of the data. The main reasons are:
a) Data comes from source systems that are not always well-governed ERPs such as SAP or Oracle; therefore, the data doesn’t always arrive with great quality.
b) The ingested data sets are high in number and ad hoc in nature; therefore, it is hard to reconcile the data and confirm it all arrived on time and in full.
The Challenge
To mitigate these challenges, companies adopt manual methods of reconciling data and checking data quality. Because the reconciliation and quality checks are manual, they require custom jobs and a plethora of resources and time. Every time, the analytics or business team creates ad hoc custom scripts to be sure the data came in full and with good quality.
Now the question is: can we automate data reconciliation and avoid manual effort for each data set? Can we have a solution that puts checks in place and raises alerts when there are issues? The answer is 4DAlert, an automatic data reconciliation and data quality solution. Let’s look in detail at how 4DAlert automates these manual reconciliation steps.
A Cloud-Based Solution – 4DAlert
4DAlert is a cloud-based solution that runs within a small virtual machine. Companies don’t need huge infrastructure to run it; an Azure VM or EC2 instance with as few as 2 vCPU cores is enough. The solution can also run inside Docker or a Kubernetes architecture without any issue, giving companies the opportunity to run it on minimal infrastructure.
An API-Based Solution That Integrates with Most Modern Database Systems
4DAlert leverages an API-based architecture that allows it to integrate with most modern analytics platforms and source systems. Whether you are adopting Snowflake, Redshift, Azure Synapse, Databricks, or Google BigQuery as your analytics platform, or pulling data from Excel files stored on-premises, Google Sheets, or SAP and Oracle ERPs, 4DAlert can connect to all these systems seamlessly.
Creating a new connection is easy. The solution provides a wizard that lets you connect to the system of your choice in no time. We also add connectivity to new systems frequently. If you have a system that we don’t connect to today, no worries: we will build a connector for you in as little as two weeks. Yes, you read that right, two weeks.
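To make the idea of a connection definition concrete, here is a minimal sketch of the kind of information such a wizard collects and validates. The field names and supported system list below are illustrative assumptions, not 4DAlert’s actual API:

```python
# Hypothetical sketch of a 4DAlert-style connection definition; the field
# names and the set of supported systems are illustrative assumptions only.
def validate_connection(conn: dict) -> list[str]:
    """Return a list of problems with a connection definition (empty = OK)."""
    required = ["name", "system_type", "host", "credentials"]
    supported = {"snowflake", "redshift", "bigquery", "synapse",
                 "databricks", "oracle", "sap_hana", "excel", "google_sheets"}
    problems = [f"missing field: {f}" for f in required if f not in conn]
    if conn.get("system_type") not in supported:
        problems.append(f"unsupported system_type: {conn.get('system_type')}")
    return problems

conn = {
    "name": "sales_snowflake",
    "system_type": "snowflake",
    "host": "acme.snowflakecomputing.com",
    "credentials": {"user": "etl_user", "auth": "key_pair"},
}
print(validate_connection(conn))  # → []
```

A wizard-driven setup is essentially this kind of guided, validated form: the user never writes the definition by hand.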
Automatic Import of Metadata
As soon as you establish a connection, the solution does the rest. 4DAlert automatically reads the data sets available in the system (of course, only the data sets to which you grant access) and imports all metadata, i.e., entity names, columns, data types, etc. After the initial import, the solution syncs the metadata on a periodic basis and keeps the structure in line with any additions or deletions of data sets. Isn’t that cool?
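The general mechanism behind metadata import can be sketched in a few lines. Most warehouses expose this through `information_schema`; the runnable example below uses SQLite’s equivalent (`sqlite_master` and `PRAGMA table_info`) with a stub table, and is only an illustration of the idea, not 4DAlert’s implementation:

```python
import sqlite3

# Illustrative metadata import: read entity names, columns, and data types
# straight from a live database. The table created here is a stand-in.
def import_metadata(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    # PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
    return {t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
print(import_metadata(conn))
# → {'orders': [('id', 'INTEGER'), ('amount', 'REAL'), ('region', 'TEXT')]}
```

Re-running such a scan on a schedule is what keeps the imported structure in sync with additions or deletions of data sets.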
Once metadata is imported, the system automatically scans the data and leverages its AI/ML algorithms to propose data quality rules. This is a highly efficient feature that saves a great deal of time and helps apply the appropriate rules to different data sets.
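4DAlert’s actual rule-proposal algorithm is proprietary, but the basic intuition is simple: profile sample values and propose rules that the observed data already satisfies. A minimal, hedged sketch:

```python
# Illustrative profiling pass only: propose simple data quality rules
# (not-null, uniqueness) from observed column statistics. This is the
# general idea, not 4DAlert's actual AI/ML algorithm.
def propose_rules(column: str, values: list) -> list[str]:
    rules = []
    non_null = [v for v in values if v is not None]
    if len(non_null) == len(values):          # no nulls observed
        rules.append(f"{column} IS NOT NULL")
    if len(set(non_null)) == len(non_null):   # all observed values distinct
        rules.append(f"{column} IS UNIQUE")
    return rules

print(propose_rules("order_id", [1, 2, 3, 4]))
# → ['order_id IS NOT NULL', 'order_id IS UNIQUE']
print(propose_rules("region", ["EMEA", "APAC", None, "EMEA"]))
# → []
```

Proposed rules of this kind are then reviewed and accepted (or adjusted) by the user, which is far faster than authoring every rule from scratch.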
Custom Data Quality Rules
Along with automatic rules, the solution provides a rule catalog from which you can choose and then customize any particular rule. The solution also allows you to write your own custom SQL snippets to define complex rules for particular business scenarios.
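A common convention for SQL-based quality rules (and the one assumed in this sketch) is that the query selects violating rows, so the rule passes when it returns nothing. The table and rule below are illustrative:

```python
import sqlite3

# A custom data quality rule expressed as a SQL snippet, in the spirit of
# the custom-SQL feature described above. Convention assumed here: the
# query returns violating rows, so an empty result means the rule passes.
def run_rule(conn: sqlite3.Connection, rule_sql: str) -> bool:
    violations = conn.execute(rule_sql).fetchall()
    return len(violations) == 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 120.0), (2, -5.0)])

# Illustrative business rule: invoice amounts must be positive.
rule = "SELECT id FROM invoices WHERE amount <= 0"
print(run_rule(conn, rule))  # → False (one violating row: id 2)
```

Because the rule is just SQL, it can carry arbitrarily complex business logic: joins, aggregations, or cross-table comparisons.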
Automatic Data Reconciliation
This is a key feature of the solution: it allows you to reconcile data between source systems (in this case, an Excel file stored on a shared drive or a Google Sheet stored in the cloud) and the data ingested into your Snowflake, Google BigQuery, Redshift, or Azure analytics system.
There are multiple ways to configure your reconciliation setup: a) the Wizard method, b) the Custom Query method, and c) the API method.
a) Wizard method – This is a very simple way of configuring alerts. It provides a simple drag-and-drop method of configuring the entities, segments, and measures you want to reconcile.
b) Custom Query method – In this method, you write your own SQL and embed any business logic or complex transformations in the form of a SQL query. This is a very powerful method: you configure it once and the system reconciles automatically.
c) API method – The API method allows you to read data from any system (e.g., Workday, Salesforce, or third-party systems such as D&B) using a simple API. Once data is read from the API, it is reconciled automatically with the source data.
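Whichever method supplies the data, the core reconciliation check is the same: compare row counts and measure totals between the source extract and the ingested target, per segment. The data structures below are simplified stand-ins for the real systems:

```python
# Illustrative reconciliation check: each segment maps to (row_count, total).
# A small tolerance absorbs floating-point rounding between systems.
def reconcile(source: dict, target: dict, tolerance: float = 0.01) -> list[str]:
    issues = []
    for segment, (rows, total) in source.items():
        t_rows, t_total = target.get(segment, (0, 0.0))
        if rows != t_rows:
            issues.append(f"{segment}: row count {rows} vs {t_rows}")
        if abs(total - t_total) > tolerance:
            issues.append(f"{segment}: total {total} vs {t_total}")
    return issues

source = {"EMEA": (1200, 45000.50), "APAC": (800, 30250.00)}
target = {"EMEA": (1200, 45000.50), "APAC": (795, 30110.00)}
print(reconcile(source, target))
# → ['APAC: row count 800 vs 795', 'APAC: total 30250.0 vs 30110.0']
```

Any non-empty result is exactly the kind of discrepancy that should trigger an alert rather than be hunted down with ad hoc scripts.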