Definition
The term “Data quality” refers to the extent to which a given dataset serves its purpose. Accordingly, “high-quality data” is data that represents real-world scenarios in a consistently accurate way.
Data quality is a key component of the Data management process, and more specifically of the Data governance process. Why? Because inconsistent – and therefore misleading – data results in inaccurate Analytics, which in turn results in poor decision-making.
This can cause significant economic damage to the company, erode the trust that stakeholders place in data, and weaken the organization’s overall Data culture. A 2021 study by the consulting firm Gartner indicated that poor data quality represents a significant cost for organizations: $12.9 million per year on average!
This is why inaccurate data must be identified, documented and fixed as soon as possible: only then will business executives, data analysts and other end users be able to work with valuable information.
Data quality differs from Data integrity, which is a broader concept that refers to the combination of data governance and data protection mechanisms. Data integrity covers the overall accuracy, consistency and completeness of a given dataset, as well as its security and its compliance with GDPR and other regulations.
Several factors need to be taken into account if one wants to produce reliable and trustworthy datasets:
Data quality is measured with numerous metrics, among which:
Data errors – and many other data quality issues – are identified and fixed by Data quality managers, analysts and engineers.
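As a rough illustration of how such issues might be spotted in practice, the sketch below checks a small, made-up customer table for two common problems: missing values and repeated identifiers. The function names (`completeness`, `duplicate_keys`), the field names and the sample records are all assumptions chosen for this example; they do not come from any particular tool or standard.

```python
from collections import Counter


def completeness(rows, field):
    """Share of rows in which `field` is present and non-empty (hypothetical helper)."""
    if not rows:
        return 1.0  # an empty dataset has no missing values
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows, key):
    """Values of `key` that appear in more than one row (hypothetical helper)."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample data: two missing emails and one repeated id.
customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": None},
]

email_completeness = completeness(customers, "email")
repeated_ids = duplicate_keys(customers, "id")

print(f"email completeness: {email_completeness:.0%}")
print(f"duplicate ids: {repeated_ids}")
```

In a real setting, checks like these would typically run on a schedule against production tables, with the results documented so that the underlying records can be corrected.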
Data quality management is a set of practices whose goal is to maintain and protect the data’s quality. These include: