Data Quality Series Part 1: Discover how strong data quality fundamentals drive AI and GenAI success by ensuring accuracy, completeness, and consistency through end-to-end data management.
Per the Oxford English Dictionary, quality is defined as “the standard of something as measured against other things of a similar kind; the degree of excellence of something.”
Data quality is both a quantitative and a qualitative measure of the excellence of data. Together, these two perspectives provide real insight into the value of data. Quantitative measures, typically driven by statistical insights, are easier to compute, can be interpreted readily, and provide a level of clarity on the suitability of data.
Qualitative measures, when applied to data or information, are typically subjective and open to interpretation. I like to think of qualitative quality as ‘in context of’ or ‘in reference to’ when applied to data.
When breaking down data quality, the most common framework is quality dimensions. Quality dimensions mix quantitative and qualitative evaluation models that can be measured in isolation but are most useful and powerful when they are brought together. Consider completeness, uniqueness, and consistency as a starting point for quantitative dimensions.
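These quantitative dimensions are straightforward to compute directly against a dataset. As a minimal sketch, assuming a small pandas DataFrame of customer records (the column names and the digits-only phone format are hypothetical choices, not a prescribed standard), the three starting dimensions might be measured like this:

```python
import pandas as pd

# Hypothetical customer records with a duplicate ID and a missing phone.
records = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "phone": ["2125551234", None, "2125551234", "415-555-0199"],
})

# Completeness: share of non-null values per column.
completeness = records.notna().mean()

# Uniqueness: share of distinct values in a column expected to be unique.
uniqueness = records["customer_id"].nunique() / len(records)

# Consistency: share of phone values matching one agreed house format
# (here, digits only, as an example standard).
consistency = records["phone"].dropna().str.fullmatch(r"\d{10}").mean()

print(f"completeness:\n{completeness}")
print(f"uniqueness: {uniqueness:.2f}")
print(f"consistency: {consistency:.2f}")
```

Each of these scores is computed from the dataset alone, which is exactly why they are easy to automate, and exactly why they are not enough on their own.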
All of these lack external references, so by themselves they do not inform the appropriateness of data for a given use. This is where additional qualitative insights are needed, including accuracy, timeliness, and correctness (or validity). Timeliness captures the age of the data. Correctness ensures that, for instance, a phone number provided for an individual in the US is indeed a valid US phone number with 10 digits. Continuing with this example, accuracy determines whether the phone number given for an individual is their actual phone number. These are crucial elements that inform the policy design and application that feed data quality scores.
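To make the distinction concrete, here is a minimal sketch of the validity check described above, assuming NANP-style 10-digit US numbers (the function name and the leading-country-code handling are illustrative assumptions):

```python
import re

def is_valid_us_phone(raw: str) -> bool:
    """Validity: the value is structurally a 10-digit US phone number."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Strip a leading US country code if one is present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return len(digits) == 10

print(is_valid_us_phone("(212) 555-1234"))  # True: well-formed
print(is_valid_us_phone("555-1234"))        # False: too few digits
```

Validity like this can be automated inside a pipeline, but accuracy cannot be checked the same way: it demands an external reference, such as confirming the number with the customer. That is the qualitative step that requires context from outside the dataset.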
It becomes clear very quickly that without context, data quality efforts will fall far short of what organizations need, not only for core operations but also for AI and GenAI. This context often lives in unstructured data, which is crucial for AI and GenAI and which most organizations struggle to organize, classify, analyze, and activate.
The potential gaps in this one small example are writ large when you consider a mid-size or large enterprise with hundreds of thousands of customer records. This is why hospitals, banks, and commercial enterprises of any size struggle with data quality when they lack an end-to-end approach that uses automation to apply policies, lineage, traceability, and quality across the data estate.
Pentaho considers and accounts for all of the above in our platform. It’s why we’re so focused on the relationships between data, the importance of accurately classifying data at the source, and the importance of carrying metadata properties throughout the lifespan of data.
In the next blog post, we’ll explore how these fundamentals shape the considerations teams must account for to build a strong, scalable data quality strategy, how data quality is shifting in an AI world, and what data quality means when getting ‘data fit’ for AI.