Data quality

What data quality is and why it matters

The term “Data quality” refers to the extent to which a given dataset serves its purpose. In practice, “high-quality data” is data that represents real-world scenarios accurately and consistently.

Data quality is a key component of the Data management process, and more specifically of the Data governance process. Why? Because inconsistent – and therefore misleading – data results in inaccurate Analytics, which in turn results in poor decision-making. 

This can cause significant financial losses for the company, erode the trust that stakeholders place in data, and weaken the organization’s overall Data culture. A study conducted in 2021 by the consulting firm Gartner indicated that poor data quality represents a significant cost for organizations: $12.9 million per year on average!

This is why inaccurate data must be identified, documented and fixed as early as possible: only then can business executives, data analysts and other end users work with reliable information.

Data quality isn't data integrity

Data quality differs from Data integrity, which is a broader concept that combines data governance and data protection mechanisms. Data integrity covers the overall accuracy, consistency and completeness of a given dataset, but also its security and its compliance with GDPR and other regulations.

Assessing Data quality

Several factors need to be taken into account if one wants to produce reliable and trustworthy datasets:

  • completeness: do datasets contain all the elements they are supposed to contain?
  • consistency: are there conflicts between equivalent data values in distinct datasets or systems?
  • uniqueness: are there duplicate data records?
  • timeliness and currency: is the data up to date, and is it available when needed?
  • validity: does the data contain the values it is supposed to contain, and does it have a proper structure? 
  • conformity: does the data match the standard data formats designed by the organization?
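
Several of the checks above can be automated. The following is a minimal sketch in plain Python, not a definitive implementation: the records, the field names (id, email, country), the email pattern and the two-letter country rule are all assumptions made up for illustration.

```python
import re

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "FR"},
    {"id": 2, "email": None, "country": "FR"},
    {"id": 2, "email": "bob@example.com", "country": "France"},
    {"id": 3, "email": "not-an-email", "country": "DE"},
]

REQUIRED_FIELDS = ("id", "email", "country")
# Deliberately simple pattern (an assumption, not a full email validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness_issues(rows):
    """Completeness: indexes of rows with a missing required field."""
    return [i for i, row in enumerate(rows)
            if any(row.get(f) in (None, "") for f in REQUIRED_FIELDS)]


def duplicate_ids(rows):
    """Uniqueness: id values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        if row["id"] in seen:
            dupes.add(row["id"])
        seen.add(row["id"])
    return dupes


def validity_issues(rows):
    """Validity: indexes of rows whose email is present but malformed."""
    return [i for i, row in enumerate(rows)
            if row.get("email") and not EMAIL_PATTERN.match(row["email"])]


def conformity_issues(rows):
    """Conformity: indexes of rows whose country is not a two-letter code."""
    return [i for i, row in enumerate(rows)
            if not re.fullmatch(r"[A-Z]{2}", row.get("country") or "")]


print(completeness_issues(records))  # [1]
print(duplicate_ids(records))        # {2}
print(validity_issues(records))      # [3]
print(conformity_issues(records))    # [2]
```

Each function answers one question from the list, so a failing check points directly at the dimension that needs attention. Consistency and timeliness are left out here because they require a second system or a timestamp to compare against.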

Data quality is measured with numerous metrics, including:

  • the number of data errors that have been identified within a month or quarter;
  • the accuracy rate in datasets;
  • the degree of data completeness.
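
As an illustration, these three metrics could be computed as follows. This is a sketch under stated assumptions: the function names, the idea of a trusted "reference" mapping used to judge accuracy, and the sample data are all hypothetical, and real definitions of these metrics vary from one organization to another.

```python
from collections import Counter
from datetime import date


def completeness_rate(rows, fields):
    """Share of expected cells that are filled (not None and not empty)."""
    total = len(rows) * len(fields)
    if total == 0:
        return 1.0  # assumption: an empty dataset counts as fully complete
    filled = sum(1 for row in rows for f in fields
                 if row.get(f) not in (None, ""))
    return filled / total


def accuracy_rate(rows, reference, key, field):
    """Share of rows whose `field` matches the trusted reference value."""
    compared = [row for row in rows if row[key] in reference]
    if not compared:
        return None  # nothing to compare against
    correct = sum(1 for row in compared
                  if row[field] == reference[row[key]])
    return correct / len(compared)


def errors_per_month(error_dates):
    """Number of logged data errors per (year, month)."""
    return Counter((d.year, d.month) for d in error_dates)


# Hypothetical sample data.
rows = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Lyon"},
    {"id": 4, "city": "Nice"},
]
reference = {1: "Paris", 3: "Lille", 4: "Nice"}  # assumed source of truth
error_log = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 3)]

print(completeness_rate(rows, ("id", "city")))       # 0.875
print(accuracy_rate(rows, reference, "id", "city"))  # 2 of 3 compared rows match
print(errors_per_month(error_log)[(2024, 1)])        # 2
```

Note that the accuracy rate only covers rows that have an entry in the reference: accuracy can only be measured against something that is already known to be correct.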

Improving Data quality

Data errors – and many other data quality issues – are identified and fixed by Data quality managers, analysts and engineers. 

Data quality management is a set of practices whose goal is to maintain and protect the data’s quality. These include: 

  • setting data quality rules, metrics and improvement goals;
  • designing and implementing data quality improvement methods and processes;
  • using data quality software tools to fix data errors, enhance datasets and perform data cleansing operations (adding missing values, deleting duplicates, validating new data, etc.);
  • conducting regular assessments to monitor data quality.
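
The cleansing operations mentioned above (deleting duplicates, adding missing values, validating new data) can be sketched in a few lines of plain Python. The helper names, the default values and the validation rules below are hypothetical and chosen only for illustration; dedicated data quality tools offer far richer versions of the same ideas.

```python
def drop_duplicates(rows, key):
    """Keep only the first row seen for each value of `key`."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def fill_missing(rows, defaults):
    """Return copies of the rows with missing values replaced by defaults."""
    return [
        {**row, **{f: v for f, v in defaults.items()
                   if row.get(f) in (None, "")}}
        for row in rows
    ]


def validate(row, rules):
    """Return the names of the fields that break their rule."""
    return [field for field, rule in rules.items()
            if not rule(row.get(field))]


# Hypothetical data and rules.
raw = [
    {"id": 1, "country": "FR"},
    {"id": 1, "country": "FR"},
    {"id": 2, "country": None},
]
rules = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}

cleaned = fill_missing(drop_duplicates(raw, "id"), {"country": "ZZ"})
print(cleaned)  # [{'id': 1, 'country': 'FR'}, {'id': 2, 'country': 'ZZ'}]
print(validate({"id": -1, "country": "FR"}, rules))  # ['id']
```

Filling gaps with a placeholder such as "ZZ" is a simplification: whether a missing value should be defaulted, inferred or left empty depends on the data quality rules the organization has set.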