Data Quality Series Part 1: Discover how strong data quality fundamentals drive AI and GenAI success by ensuring accuracy, completeness, and consistency through end-to-end data management.
Per the Oxford English Dictionary, quality is defined as “the standard of something as measured against other things of a similar kind; the degree of excellence of something.”
Data quality is both a quantitative and a qualitative measure of data’s excellence. Together, the two provide real insight into the value of data. Quantitative measures, typically driven by statistical insights, are easier to calculate, can be interpreted readily, and provide a level of clarity on the suitability of data.
Qualitative measures, when applied to data or information, are typically subjective and open to interpretation. I like to think of qualitative as ‘in context of’ or ‘in reference to’ when applied to data quality.
When breaking down data quality, the most common framework is quality dimensions. Quality dimensions mix quantitative and qualitative evaluation models that can be measured in isolation but are most useful and powerful when they are brought together. Consider completeness, uniqueness, and consistency as a starting point for quantitative dimensions.
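To make the quantitative dimensions concrete, here is a minimal sketch of how completeness and uniqueness could be scored on a small customer table. The column names and the pandas-based approach are illustrative assumptions, not a prescribed implementation:

```python
import pandas as pd

# Hypothetical customer records; columns and values are illustrative only.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 103, 105],
    "email": ["a@example.com", None, "c@example.com", "c@example.com", "e@example.com"],
    "phone": ["212-555-0143", "555-0199", None, "212-555-0143", "+1 (415) 555-0101"],
})

# Completeness: the share of non-null values in each column.
completeness = customers.notna().mean()

# Uniqueness: the share of rows that are not exact duplicates of an earlier row.
uniqueness = 1 - customers.duplicated().mean()

print(completeness)                     # per-column completeness scores
print(f"uniqueness: {uniqueness:.2f}")  # table-level uniqueness score
```

Both scores are computed purely from the data itself; nothing here says whether a present, unique value is actually right. That gap is exactly what the qualitative dimensions below address.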
All of these lack external references, so by themselves they do not inform the appropriateness of data for a given use. This is where additional qualitative insights are needed, including accuracy, timeliness, and correctness (or validity). Timeliness provides details on data’s age. Correctness ensures that, for instance, a phone number provided for an individual in the US is indeed a valid US phone number with 10 digits. Continuing with this example, accuracy determines whether the phone number given for an individual is their actual phone number. These are crucial elements that inform the design and application of the policies that feed data quality scores.
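Continuing the phone number example, a correctness (validity) rule can be expressed as a simple format check. The function below is a hypothetical sketch of such a rule; note that accuracy would still require verifying the number against an external reference:

```python
import re

def is_valid_us_phone(raw) -> bool:
    """Correctness/validity only: a US number must contain exactly 10 digits,
    optionally preceded by the +1 country code. Checks form, not ownership."""
    if not raw:
        return False
    digits = re.sub(r"\D", "", raw)       # strip punctuation, spaces, and '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]               # drop the country code
    return len(digits) == 10

print(is_valid_us_phone("+1 (415) 555-0101"))  # True: well-formed
print(is_valid_us_phone("555-0199"))           # False: too few digits
```

A number can pass this check and still belong to someone else; that is the accuracy question, and it is why validity rules alone cannot produce a complete quality score.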
It becomes very clear very quickly that without context, data quality efforts will fall far short of what organizations need, not only for core operations but also for AI and GenAI. In many cases, this context lives in unstructured data, which is crucial for AI and GenAI and which, as we know, most organizations struggle to organize, classify, analyze, and activate.
The potential gaps in this one small example are writ large when you consider a mid-sized or large enterprise with hundreds of thousands of customer records. This is why hospitals, banks, and commercial enterprises of any size struggle with data quality when they lack an end-to-end approach that leverages automation to apply policies, lineage, traceability, and quality across the data estate.
At Pentaho, we consider and account for all of the above in our platform. It’s why we’re so focused on the relationships between data, the importance of accurately classifying data at the source, and the importance of carrying metadata properties throughout the lifespan of data.
In the next blog post, we’ll explore how these fundamentals shape the considerations teams must account for to build a strong, scalable data quality strategy, how data quality is shifting in an AI world, and what it means to get ‘data fit’ for AI.