Proactive Data Monitoring – How to avoid incidents

Introduction

As data platforms become more complex, effective data monitoring becomes paramount. This is where proactive data monitoring comes into play. Unlike traditional reactive monitoring, which addresses issues after they occur, proactive data monitoring aims to identify and resolve potential data problems before they impact the business.

What is it?

Proactive data monitoring is the process of continuously monitoring a data platform to identify issues as they emerge. It requires automated monitoring to be integrated into the data platform so that an appropriate alert is sent to the responsible party when a problem is detected.

What it is not

It is not a manual process, it is not retrospective, and it is not a stream of alerts so noisy that the valuable ones are hard to find. If the only method used for communicating monitoring insights is a PowerPoint slide deck, it cannot be considered proactive.

Why is it important?

Genuinely proactive monitoring is vital in data platforms for the following reasons:

How do you know if you are doing it?

In my opinion, there is a single measurement of proactive data monitoring. If you are mostly being told about a data issue by a user logging an incident, then you are not proactively monitoring and alerting.

Challenges

Detecting issues that have happened is easy and is typically an out-of-the-box feature of most solutions. Capturing events that have not happened is where the challenge starts. An example is a pipeline that did not run because its trigger was disabled for a maintenance task and, by mistake, was never re-enabled.

The consequence of this can be that the overnight, 8-hour ETL job is only restarted when a business user realises their data is 35 hours old and raises an incident.
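One way to catch a run that never happened is to compare each pipeline's last successful completion against the schedule it is expected to keep. The sketch below is illustrative only: the function name `find_missed_runs`, the pipeline names and the 30-minute grace period are assumptions, and in practice the timestamps would come from your orchestrator's run history.

```python
from datetime import datetime, timedelta


def find_missed_runs(last_success, expected_interval, now,
                     grace=timedelta(minutes=30)):
    """Return (pipeline, age) pairs for pipelines that are overdue.

    last_success      -- dict of pipeline name -> datetime of last success
    expected_interval -- dict of pipeline name -> timedelta between runs
    grace             -- hypothetical tolerance before a run counts as missed
    """
    missed = []
    for name, finished_at in last_success.items():
        deadline = finished_at + expected_interval[name] + grace
        if now > deadline:
            # Nothing failed and nothing logged an error: the run is simply absent.
            missed.append((name, now - finished_at))
    return sorted(missed)


# Hypothetical data: the nightly job has been silent for 35 hours.
now = datetime(2024, 1, 3, 9, 0)
last_success = {
    "nightly_etl": datetime(2024, 1, 1, 22, 0),
    "hourly_sales": datetime(2024, 1, 3, 8, 30),
}
expected_interval = {
    "nightly_etl": timedelta(hours=24),
    "hourly_sales": timedelta(hours=1),
}
overdue = find_missed_runs(last_success, expected_interval, now)
```

Run on a schedule that is independent of the pipelines themselves, a check like this would flag the silent job shortly after its window closes rather than when a user notices stale data.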

Examples

There are many types of scenarios where issues need to be proactively captured and acted on. Here are just a handful of examples:

Costs that are exceeding budgeted levels.
A Power BI report that consumes excessive CPU, thereby impacting platform performance.
Errors when a pipeline fails.
A blocking scenario that is delaying pipeline completion.
Data that was not received from a third party.
Data ingestion run time that is increasing and will impact business reporting in weeks if not optimised.
Pipelines that fail to finish but do not fail with an error.
Row counts that are progressively diverging between source and target.
Pipelines that fail to start due to an orchestration issue.
Anomalies in the load metrics that imply a data source issue (e.g. a sudden spike in UPDATES).
Erratic completion times suggesting contention.
Users accessing sensitive data, which implies a security issue.
A transactional Fact that has not been updated in the last 24 hours.
A Dimension that has time variance issues.

These examples fall into the three categories below:

Key Components of Proactive Data Monitoring

Data Validation, Real-Time and Batch

Real-time monitoring and validation is the goal, but it is not always possible. A monitoring and reporting solution needs to be flexible enough to capture events as they happen and to run scheduled checks for asynchronous processes.
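A scheduled check can be as simple as comparing a few load metrics against agreed thresholds. The following is a minimal sketch, not a definitive implementation: the `TableSnapshot` type, the `validate` function, the 24-hour freshness limit and the 1% row drift tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical load metrics captured for one table."""
    name: str
    source_rows: int
    target_rows: int
    last_loaded_at: datetime


def validate(snapshot, now, max_age=timedelta(hours=24), max_row_drift=0.01):
    """Return a list of problem descriptions (empty when the table is healthy)."""
    problems = []

    # Freshness: has the table been loaded recently enough?
    age = now - snapshot.last_loaded_at
    if age > max_age:
        hours = age.total_seconds() / 3600
        problems.append(f"{snapshot.name}: stale, last loaded {hours:.0f}h ago")

    # Reconciliation: are source and target row counts drifting apart?
    drift = abs(snapshot.source_rows - snapshot.target_rows) / max(snapshot.source_rows, 1)
    if drift > max_row_drift:
        problems.append(f"{snapshot.name}: row counts differ by {drift:.1%}")

    return problems
```

The same function could be called from a batch schedule or from an event handler that fires when a load completes, which keeps the rules identical across both modes.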

Targeted Notifications

Notifications must be timely and succinct, and must reach the appropriate recipients. I have seen too many notifications ignored because the recipients were overwhelmed by the stream of messages in their inboxes. Less is more!
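Two simple techniques help keep notifications targeted: route each alert to the team that owns that category of problem, and suppress repeats of the same alert for a cooling-off period. The sketch below assumes a hypothetical `AlertRouter` class, made-up team names and a one-hour cooldown; actual delivery (email, chat, paging) is left out.

```python
from datetime import datetime, timedelta


class AlertRouter:
    """Route alerts by category and suppress repeats within a cooldown."""

    def __init__(self, routes, cooldown=timedelta(hours=1)):
        self.routes = routes          # category -> recipient; needs a "default" key
        self.cooldown = cooldown      # hypothetical quiet period per alert key
        self._last_sent = {}          # alert key -> time it was last sent

    def route(self, category, key, message, now):
        """Return (recipient, message) if the alert should be sent, else None."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return None  # same alert fired recently: stay quiet
        self._last_sent[key] = now
        recipient = self.routes.get(category, self.routes["default"])
        return recipient, message


router = AlertRouter({"cost": "finance-team", "default": "data-ops"})
start = datetime(2024, 1, 1, 8, 0)
first = router.route("cost", "budget-exceeded", "Spend is over budget", start)
repeat = router.route("cost", "budget-exceeded", "Spend is over budget",
                      start + timedelta(minutes=10))
```

Here `first` is delivered to the finance team and `repeat` is suppressed, so a condition that persists for an hour produces one message rather than a stream of them.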

Comprehensive Reporting

Even with the advent of AI, the ability to visualise emerging trends is integral to monitoring a data platform.
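A trend that is obvious on a chart can also be estimated numerically, for example to project when a slowly growing ingestion run time will exceed its window. This is a rough sketch under stated assumptions: the function names are hypothetical, run durations are in minutes, and a straight-line fit is only one of many possible models.

```python
import math


def runtime_trend(durations_min):
    """Least-squares slope of run duration, in minutes per run."""
    n = len(durations_min)
    if n < 2:
        raise ValueError("need at least two runs to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(durations_min) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(durations_min))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def runs_until_breach(durations_min, limit_min):
    """Estimate how many more runs until the duration reaches limit_min.

    Projects forward from the latest run using the fitted slope.
    Returns 0 if already at or over the limit, None if there is no upward trend.
    """
    latest = durations_min[-1]
    if latest >= limit_min:
        return 0
    slope = runtime_trend(durations_min)
    if slope <= 0:
        return None
    return math.ceil((limit_min - latest) / slope)


# Hypothetical history: the nightly load grows by two minutes per run.
history = [100, 102, 104, 106, 108]
remaining = runs_until_breach(history, limit_min=120)
```

Plotting the same history alongside the projected breach point gives the team both the visual trend and a concrete date to act before.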

Implementing Proactive Data Monitoring

To successfully implement proactive data monitoring, organisations need to:

Conclusion

The days of responding to issues only once a ticket has been raised are behind us. Clients expect Managed Data Service providers to avoid the impact of a data incident.

At Altis Consulting, we have a wide range of solutions available that help us manage your data so you can manage your business. If you want to leverage the advantages described in this blog post, please reach out to blah@Altis.com.au