Data Quality Series Part 2: Ensuring data quality is about finding the right balance—over-cleaning can remove valuable insights, while evolving data demands flexibility. This blog post explores how businesses can define quality thresholds, manage costs, and leverage AI-driven automation to maintain consistency and usability.
When we talk to customers about their data quality challenges and needs, regardless of industry or company size, we hear a few common themes: how to define quality and set the right thresholds, when cleaning goes too far, how the nature of data keeps changing, and what it costs to treat data as an asset.
In this blog, we’ll review each of these topics with guidance on where data leaders and their teams need to focus to build a strong and lasting data quality strategy.
Current Quality vs. Ideal Quality: Striking the Right Balance
The struggle between current quality and ideal quality often comes down to setting a threshold for the quality you actually need. In traditional data systems, quality was often assessed in a silo; today, businesses need to think about data quality in the context of how the data is used to achieve business outcomes. What’s the quality score threshold required to meet business needs?
Ultimately, the quality of data needs to be adequate to support correct decisions in the context of business goals. Pushing for higher quality is important, but it must be balanced against those goals: perfection is not always necessary if the data serves its purpose.
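To make the idea of a quality threshold concrete, here is a minimal sketch (not from the original post) of a “good enough” check: a simple completeness-based score compared against a per-use-case threshold. The field names, use cases, and threshold values are hypothetical.

```python
# Hypothetical thresholds: the minimum quality score a dataset must reach
# before it is considered fit for a given business purpose.
QUALITY_THRESHOLDS = {
    "regulatory_reporting": 0.98,
    "marketing_segmentation": 0.85,
    "exploratory_analysis": 0.70,
}

def quality_score(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    return complete / len(records)

def good_enough(records, required_fields, use_case):
    """Return (passes, score, threshold) for the given use case."""
    score = quality_score(records, required_fields)
    threshold = QUALITY_THRESHOLDS[use_case]
    return score >= threshold, score, threshold

# The same dataset can be fit for analysis yet unfit for regulatory reporting.
customers = [
    {"id": 1, "email": "a@example.com", "postcode": "90210"},
    {"id": 2, "email": "b@example.com", "postcode": "10001"},
    {"id": 3, "email": "c@example.com", "postcode": "73301"},
    {"id": 4, "email": "", "postcode": None},
]
print(good_enough(customers, ["email", "postcode"], "exploratory_analysis"))  # (True, 0.75, 0.70)
print(good_enough(customers, ["email", "postcode"], "regulatory_reporting"))  # (False, 0.75, 0.98)
```

In practice the score would combine more dimensions, such as accuracy, timeliness, and consistency, but the principle is the same: the threshold, not perfection, defines “good enough.”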
The Risks of Over-Cleaning Data
While cleaning data is necessary, there’s a risk of over-cleaning, especially when the cleaning process removes important details. A great example of this is middle initials in names. If you clean this data too aggressively, you might lose valuable information, potentially leading to bias in the data. Furthermore, customer data might be incorrectly excluded if there’s a mismatch in the golden record, causing critical information, like address changes, to be missed.
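As a contrived illustration of this failure mode (not the post’s methodology; the names and matching rule are hypothetical), an over-aggressive name-standardization step can strip the middle initial and then fail to match the customer against the golden record:

```python
import re

def aggressive_clean(name):
    """Over-zealous standardization: drops single-letter 'noise' tokens,
    which also discards middle initials."""
    tokens = [t.strip(".") for t in re.split(r"\s+", name.strip())]
    return " ".join(t for t in tokens if len(t) > 1).upper()

# Golden record keyed on the full standardized name, including the initial.
golden_record = {"JOHN A SMITH": {"customer_id": 42, "address": "12 New Street"}}

incoming = "John A. Smith"            # an address-change event for this customer
cleaned = aggressive_clean(incoming)  # -> "JOHN SMITH"

# The initial that disambiguated this customer is gone, the lookup fails,
# and the address change is silently missed.
print(cleaned, "->", golden_record.get(cleaned))  # JOHN SMITH -> None
```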
In some cases, too much cleaning could unintentionally eliminate valid records that would have been useful. It’s important to remember that data quality should not just be about removing “bad” data but also about understanding which data is valuable to retain.
The Changing Nature of Data
Over the past decade, the landscape of data has drastically changed. The concept of a golden record—a single source of truth—has become more complex. With the rise of social data and real-time interactions, organizations now need to be more flexible in how they collect and use data.
When organizations look back at their data from 10 years ago, they must acknowledge that it may no longer be as relevant. The world has changed, and so has the data we use to make decisions. The need for more dynamic and up-to-date data has become more critical.
Data as an Asset and Its Cost
Data is often referred to as the new oil, but it comes with significant challenges. Organizations must grapple with the balance between how much data they collect, the regulatory limitations surrounding it, the cost of storing and cleaning it, and whether it will ultimately be useful. Moreover, when models are trained using data from one region, they may not translate effectively to another. For instance, a model trained on US data may not perform well with EMEA data due to cultural and regulatory differences.
Creating the Conditions for Consistent Data Quality
These challenges, from defining quality and setting cleaning thresholds to the changing nature of data and the cost of cleaning it for different purposes, are only going to grow in complexity from here.
No organization can meet a 100% quality threshold; doing so would be prohibitively expensive and would grind operations to a halt. Data leaders need a consistent policy approach, with clear guidelines on what quality means for each use case and role.
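What such guidelines might look like when written down as a policy table rather than tribal knowledge is sketched below. The use cases, roles, dimensions, and numbers are hypothetical; the sketch extends the threshold idea above by adding role and multiple quality dimensions.

```python
# Hypothetical policy: what "fit for purpose" means per use case and role.
QUALITY_POLICY = {
    ("regulatory_reporting", "compliance_officer"): {
        "completeness": 0.99, "accuracy": 0.99, "max_age_days": 1,
    },
    ("customer_360", "marketing_analyst"): {
        "completeness": 0.90, "accuracy": 0.95, "max_age_days": 30,
    },
    ("trend_analysis", "data_scientist"): {
        "completeness": 0.75, "accuracy": 0.85, "max_age_days": 90,
    },
}

def required_quality(use_case, role):
    """Look up the quality guideline for a given use case and role."""
    return QUALITY_POLICY[(use_case, role)]

print(required_quality("customer_360", "marketing_analyst"))
# {'completeness': 0.9, 'accuracy': 0.95, 'max_age_days': 30}
```

Writing the policy down this way makes it auditable and gives every team the same definition of “good enough” for its role.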
Data leaders also need to consider how to leverage AI and machine learning to automate many of the processes that inform data quality: classification, scoring, and sensitivity. Solutions that automate these processes can do the heavy lifting while containing costs, letting the organization scale a consistent data quality framework across the business.
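As one hedged illustration of what that automation could look like (a sketch assuming scikit-learn is available; the record-profile features are hypothetical, and a data quality platform would do this at far greater scale), an off-the-shelf anomaly detector can score records and flag likely quality outliers for review:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-record profiles: [field_completeness, avg_value_length, days_since_update]
profiles = np.array([
    [1.0, 32.0, 10.0],
    [0.9, 30.0, 12.0],
    [1.0, 35.0, 8.0],
    [0.2, 3.0, 900.0],   # sparse, stale record
    [1.0, 31.0, 11.0],
])

model = IsolationForest(contamination=0.2, random_state=0).fit(profiles)
scores = model.decision_function(profiles)  # higher means more "normal"
flags = model.predict(profiles)             # -1 flags a likely quality outlier

for profile, score, flag in zip(profiles, scores, flags):
    status = "needs review" if flag == -1 else "ok"
    print(profile, round(float(score), 3), status)
```

The same pattern extends to classification and sensitivity labeling: a model proposes, thresholds decide, and people review only the exceptions.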
In our next blog on data quality, we’ll explore what data quality means in the age of GenAI and Agentic AI.