- Abstract
- Introduction
- Data Quality
- Data Quality Frameworks
- Framework Development for the New Zealand Ministry of Health
- Aim of the Research
Abstract
This research in progress aims to develop a data quality strategy for the New Zealand Ministry of Health. Data quality requirements are increasing as a wider range of data becomes available and the technology to mine the data shows the value of data that is "fit for use". A data quality framework is the initial step towards an organisation-wide data quality strategy that aligns with the sector’s existing strategies and policies. The framework development builds on an existing framework developed by the Canadian Institute for Health Information, and takes into account current data quality literature and recognised Total Quality Management Principles. Further piloting of the framework through the assessment of national health collections will provide the information on which to base an effective data quality strategy for the Ministry of Health.
Introduction
Organisations are becoming more and more dependent on data. Virtually everything the modern organisation does both depends upon and creates enormous quantities of data. To meet the needs of the organisation, a comprehensive data management programme is essential.[ 1 ] Further, Levitin and Redman[ 2 ] see the need for a management science for data, as data differ from other resources and require different management techniques.
The organisational structures needed to address these issues often do not exist. A contributing factor may be that there is no simple way to place economic value on data resources. A comprehensive, organisation-wide, data quality strategy could address many of these issues. As with most organisations, no formal data quality strategy is yet documented for the New Zealand Ministry of Health. The proposed Data Quality Framework will inform the overall development of a data quality strategy for the Ministry of Health. All outcomes of the development of the strategy will be documented for each collection in a "Data Quality Documentation Folder" to enable internal data users access to all information about a collection in one location. Chapter headings in the Data Quality Documentation Folder include:
- Outline of collection, including uses of data and original purpose for the collection
- Roles and responsibilities of Ministry staff managing collections
- Results of assessment of collection using data quality framework
- Action plan for quality improvement
- Current and previous data quality initiatives
- Metadata relating to collection
- Correspondence relating to collection.
The documentation folder will be both paper- and web-based. This will build on the Ministry’s existing work on data quality, pulling together documentation and current practice and making this information immediately available in one location. To determine an appropriate framework, it is necessary to consider the meaning of data quality and previous work in this field as outlined below.
Data Quality
Klein and Rossin[ 3 ] note there is no single definition of data quality accepted by researchers working in the discipline. Data quality is a multidimensional concept[ 3 ] as data themselves are multidimensional.[ 4,5 ] Modern definitions of data quality have a wider frame of reference and many more attributes than the obvious characteristic of accuracy. Strong et al[ 6 ] take a consumer-focused view, where consumers are people or groups who have experience in using organisational data to make business decisions. In this view, quality "data is data that is fit for use", a position widely adopted in the literature.[ 6-9 ] Redman arrives at the following definition, based on Joseph Juran:[ 5 ] "Data are of high quality if they are fit for their intended uses in operations, decision-making, and planning. Data are fit for use if they are free of defects and possess desired features."[ 10 ]
Tayi and Ballou[ 11 ] concur, noting that data considered appropriate for one use may not possess sufficient quality for another use, citing the trend towards multiple uses of data through data warehouses.
Wang et al[ 7 ] see the consumer as integral to the meaning of data quality and consider that consumers have a much broader quality conceptualisation than information systems professionals realise. They analysed what data quality means to consumers through a survey of data quality attributes and their dimensions as defined by consumers. This resulted in a comprehensive framework of data quality from data consumers’ perspectives built into a hierarchical framework of data quality.[ 7 ]
Therefore, a data quality strategy in an organisation must take into account the end user and allow that user to define the level of quality required to make the data useful. Requirements may be different for corporate data than for local data. First steps in any improvement process must be to identify the uses made of the data and by whom. A data quality strategy also needs to look forward to the future potential uses of the data.
Larry English notes the emerging discipline of "Enterprise Data Quality Management" (EDQM) whereby an organisation develops and adopts a set of consistent technology processes, which institutionalise data quality as a strategic asset, and business processes to make it a consistent competitive advantage.[ 12 ] This is similar to Total Quality Management (TQM) principles.[ 12 ] Data (or information) quality is recognised as one aspect of the TQM movement.
Data Quality Frameworks
Seminal works in the area of quality have defined various extensive frameworks to review systems within organisations. The frameworks all seek to assess areas where poor quality workmanship or inefficiencies may reduce the profitability of an organisation. At its most basic, a data quality framework is a tool for the assessment of data quality within an organisation.[ 7 ] The framework can go beyond the individual elements of data quality assessment, becoming integrated within the processes of the organisation. Willshire and Meyen[ 13 ] describe a data quality framework as "a vehicle that an organisation can use to define a model of its data environment, identify relevant data quality attributes, analyse data quality attributes in their current or future context, and provide guidance for data quality improvement". Eppler and Wittig[ 14 ] add that a framework should not only evaluate, but also provide a scheme to analyse and solve data quality problems through proactive management.
In developing a data quality framework for the International Monetary Fund, Carson[ 16 ] notes that an assessment tool for data quality needs to be:
- Comprehensive in coverage of the dimensions of quality and the characteristics that might represent quality.
- Balanced between rigour desired by an expert and the bird’s eye view desired by a general data user.
- Structured but flexible enough to be applicable across a broad range of data collections.
- Able to lead to transparent results.
- Arrived at by drawing on best practice.
Both Willshire and Meyen[ 13 ] and Carson[ 16 ] found that a framework needs both objective and subjective attributes to be considered, using both objective and subjective measurement metrics, in order to reflect the contextual nature of data quality and the many potential users of that data.
Further steps in development include defining the data quality attributes and then collecting, measuring and analysing them. An evaluation should use at least two different methods, combining quantitative and qualitative measurements, to adequately determine the quality of data. The final step is to identify, evaluate and select effective remedies. Use of the framework is a step-by-step process. Steps include:
- Modelling existing data
- Defining data quality attributes
- Evaluating existing data quality levels
- Determining data quality priorities
- Identifying remedies
- Re-measuring and iterating.
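The evaluation and prioritisation steps above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a single objective attribute (completeness, measured here as the share of non-missing values in a field) and hypothetical field names, records and targets; none of these details come from the frameworks cited.

```python
from typing import Optional

# Hypothetical records; None marks a missing value.
Record = dict[str, Optional[str]]


def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def prioritise(records: list[Record], targets: dict[str, float]) -> list[tuple[str, float]]:
    """Rank fields by shortfall against their target, largest gap first.

    Fields that already meet their target are left out of the result.
    """
    gaps = {f: max(0.0, t - completeness(records, f)) for f, t in targets.items()}
    ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
    return [(f, gap) for f, gap in ranked if gap > 0]


# Illustrative data only (assumed, not taken from any real collection).
records = [
    {"patient_id": "A1", "dob": "1970-01-01", "ethnicity": None},
    {"patient_id": "A2", "dob": None, "ethnicity": None},
    {"patient_id": "A3", "dob": "1985-06-30", "ethnicity": "X"},
    {"patient_id": "A4", "dob": "1990-12-12", "ethnicity": None},
]
targets = {"patient_id": 1.0, "dob": 1.0, "ethnicity": 0.9}

# Fields ordered by how far they fall short of their target.
priorities = prioritise(records, targets)
```

After a remedy is applied, running `prioritise` again on the updated records corresponds to the re-measuring step. A fuller assessment would pair an objective metric like this with subjective measures such as user surveys.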
An extensive review of data quality frameworks from the last 10 years[ 14 ] analyses seven conceptual frameworks, identifying common elements, differences and missing components of such frameworks, and outlines future direction in the development of data quality frameworks. The study found that existing data quality frameworks are often domain specific and strong in either objective or subjective measurements, but thus far no framework has been developed that is strong in both at the same time. Frameworks also often fail to analyse the interdependencies between the various criteria within the framework.
Therefore, Eppler and Wittig[ 14 ] suggest the following should be included in any new framework development:
- A generic framework, not specific to a single application such as data warehouses or corporate communications
- A framework that shows interdependencies between the different quality criteria
- A framework that includes a list of problem areas and indicators, therefore going beyond a simple quality criteria list
- A framework that is at the same time theoretical and practical.
To assess the quality of organisational data, Wang and Strong[ 7 ] defined a data quality framework that contained 20 quality dimensions. These were later reduced to 15 and assembled into four categories, as shown in Table 1.
Table 1: Data quality categories and dimensions (Strong et al, 1997, p104)
| Category | Dimension |
| --- | --- |
| Intrinsic | Accuracy, Objectivity, Believability, Reputation |
| Accessibility | Accessibility, Access security |
| Contextual | Relevancy, Value-added, Timeliness, Completeness, Amount of data |
| Representational | Interpretability, Ease of understanding, Concise representation, Consistent representation |
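One way to make the hierarchy in Table 1 concrete is as a mapping from category to dimensions, with per-dimension ratings rolled up to category level. The sketch below is illustrative only: it encodes just two of the four categories for brevity, and the 1 to 5 rating scale, the example ratings and the unweighted mean are assumptions rather than part of the cited framework.

```python
from statistics import mean

# Two of the four categories from Table 1, kept short for illustration.
HIERARCHY = {
    "Intrinsic": ["Accuracy", "Objectivity", "Believability", "Reputation"],
    "Accessibility": ["Accessibility", "Access security"],
}


def category_scores(ratings, hierarchy=HIERARCHY):
    """Average the available dimension ratings within each category.

    Dimensions without a rating are skipped; categories with no rated
    dimensions are omitted from the result.
    """
    scores = {}
    for category, dimensions in hierarchy.items():
        rated = [ratings[d] for d in dimensions if d in ratings]
        if rated:
            scores[category] = mean(rated)
    return scores


# Hypothetical ratings on an assumed 1 (poor) to 5 (excellent) scale.
ratings = {
    "Accuracy": 4.0,
    "Objectivity": 3.0,
    "Believability": 5.0,
    "Reputation": 4.0,
    "Accessibility": 2.0,
}
scores = category_scores(ratings)
```

Keeping the hierarchy as data rather than hard-coding it in the function means the same roll-up could be reused if the set of dimensions is adapted for a particular organisation.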
The framework has been validated across a range of industrial and government locations. Wang and Strong[ 7 ] suggest several ways in which this hierarchical framework can be applied, including the use of a questionnaire to measure perceptions of data quality, development of quality improvement methods to improve data quality and as a checklist during data requirements analysis. The appropriateness of this framework was tested in a study of the US healthcare industry, in which respondents confirmed the importance of all 15 dimensions.[ 17 ] The assessment of frameworks by Eppler and Wittig[ 14 ] includes the Wang and Strong[ 7 ] framework noted above.
This framework was assessed against the following criteria: are there clear definitions, is there contextual positioning, are the criteria mutually exclusive and collectively exhaustive, is the framework concise, and does it provide examples and tools? Table 3 shows the conclusions made by Eppler and Wittig[ 14 ] when assessing the Wang and Strong[ 7 ] framework; overall, they found the framework to be generic and balanced. The Wang and Strong[ 7 ] framework was the only one of the seven assessed that included a means of measurement and tools to assist with using the framework, and it offers both a solid foundation in existing literature and practical applications. The framework also stands out as being the only one to strike a balance between theoretical consistency and practical applicability.
Taking into account the above literature, a data quality framework for the Ministry of Health can be defined as:
A point in time assessment and measurement tool, integrated into organisational process, providing a benchmark for the effectiveness of any future data quality improvement initiatives and a standardised template for information on data quality both for internal and external users. The framework takes into account the Ministry’s Data Governance Policy, Information Systems Strategic Plan, and WAVE.
Framework Development for the New Zealand Ministry of Health
New Zealand’s public and private health care providers forward data on health care provision and claims-based data to two business units within the Ministry of Health: Health Payments, Agreements and Compliance (HealthPAC) and the New Zealand Health Information Service (NZHIS). Further data are collected through the national cervical screening and breast screening registers. There are currently 30 national health information systems including data warehouses. The Public Health Intelligence Unit undertakes both regular and one-off surveys related to public health topics.
Health Payments, Agreements and Compliance (HealthPAC) services include:
- Payment and administration of agreements
- Payments to health providers for contracted services
- Clinical data collection from health provider claims
- Provision of information and reports relating to payment and other health data
- Audit and counter-fraud methodologies to ensure that health funds are applied legitimately and appropriately
- Patient eligibility administration.
Inconsistencies often exist across an organisation’s collections and business units in relation to data quality improvement initiatives, business rules, coding standards and tables, and data definitions. In particular, data moving from one collection to another can present considerable interoperability problems.
Uses of the Framework
The primary use of the framework is to assist in the assessment of cross-organisational data quality. Where the same data quality problems are found consistently across collections, the improvement measures likely to provide the best return on investment are those whose impact extends across all or most of the national collections. Data quality problems will be highlighted through the output report following assessment of a collection using the framework. Analysis of these reports from all the national collections will provide a prioritised list of "to do" projects.
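As a rough sketch of how assessment reports might be combined into such a prioritised list, the Python below ranks problems by the number of collections in which they appear. The collection names, issue labels and the ranking rule (most widespread first, ties broken alphabetically) are all assumptions made for illustration; the paper does not specify how the prioritisation is to be performed.

```python
from collections import Counter


def prioritise_issues(reports: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank issues by how many collections report them, most widespread first.

    `reports` maps a collection name to the set of issues found in it.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(issue for issues in reports.values() for issue in issues)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical assessment outputs for three collections.
reports = {
    "collection_a": {"missing dates", "inconsistent coding"},
    "collection_b": {"inconsistent coding", "duplicate records"},
    "collection_c": {"inconsistent coding", "missing dates"},
}

# Issues affecting the most collections appear first in the list.
todo = prioritise_issues(reports)
```

Counting collections is only a proxy for return on investment; in practice the ranking would probably also weigh the cost of each remedy and the importance of the affected data.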
A consistent assessment of quality over time will allow for the analysis of the effectiveness of data quality interventions, with assessment undertaken before and after the intervention.
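A before-and-after comparison of this kind could be as simple as the following sketch. The dimension names, the 0 to 100 scoring scale and the scores themselves are hypothetical, and the comparison covers only dimensions measured in both assessments.

```python
def score_changes(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in score for each dimension present in both assessments."""
    return {d: after[d] - before[d] for d in before if d in after}


# Hypothetical scores on an assumed 0 to 100 scale.
before = {"Accuracy": 82, "Timeliness": 60, "Completeness": 75}
after = {"Accuracy": 90, "Timeliness": 58, "Access security": 70}

changes = score_changes(before, after)

# Dimensions whose score rose after the intervention.
improved = sorted(d for d, delta in changes.items() if delta > 0)
```

The comparison is only meaningful if both assessments use the same framework and measurement methods, which is the reason for insisting on a consistent assessment.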
The framework will also be used to inform the developers of new data collections about the processes that need to be put in place during the development phase to support the collection of high quality data. Prevention is better than re-work.
Aim of the Research
The aim of the New Zealand Data Quality Evaluation Framework (DQF) project is to deliver:
A Data Quality Framework that allows for the consistent and accurate assessment of data quality in all national health data collections held by the Ministry of Health, which will enable improved decision making and policy development in the health sector.
The framework is being developed to provide a common, objective approach to assessing the data quality of all health information databases and registries. The framework enables the identification and measurement of major data quality issues, standardises information on data quality for users and helps to identify priorities, which in turn leads to continuous improvements.
This paper continues in Part 2.
| References | |
| --- | --- |
| 1. | Pautke RW, Redman TC. (2001). The organisation’s most important data issues. In: Piattini MG, Calero C, Genero MF (Editors). Information and Database Quality. Dordrecht: Kluwer. |
| 2. | Levitin AV, Redman TC. (1998). Data as a resource: properties, implications, and prescriptions. Sloan Management Review; 40(1): 89-108. |
| 3. | Klein B, Rossin DF. (1999). Data errors in neural network and linear regression models: An experimental comparison. Data Quality; 5(1): 25. |
| 4. | Fox CJ, Levitin AV. (1996). Data and data quality. Encyclopaedia of Library and Information Science; 57(Supplement 20): 100-122. |
| 5. | Juran JM, Godfrey AB. (1999). Juran’s Quality Handbook. New York: McGraw-Hill. |
| 6. | Strong DM, Lee YW, Wang RY. (1997). Data quality in context. Communications of the ACM; 40(5): 103-110. |
| 7. | Wang RY, Strong DM. (1996). Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems; 12(4): 5-34. |
| 8. | US Department of Defence. (2003). DOD guidelines on data quality management. US: Defence Information Systems Agency: 28. |
| 9. | Brown S. (2002). Data quality: Relatively critical and critically relative. Data Management Review; November. |
| 10. | Redman TC. (2001). Data quality: the field guide. Boston: Digital Press: 241. |
| 11. | Tayi GK, Ballou DP. (1998). Examining data quality. Communications of the ACM; 41(2): 54-57. |
| 12. | English L. (2000). Achieving enterprise wide data quality. Trillium Software System; 18. |
| 13. | Willshire MJ, Meyen D. (1997). A process for improving data quality. Data Quality; 3(1). |
| 14. | Eppler MJ, Wittig D. (2000). A review of information quality frameworks from the last ten years. In: Proceedings of IQ 2000, The Conference on Information Quality, Massachusetts Institute of Technology (MIT), Boston, USA. October 20-22. |
| 15. | Porter ME. (1991). Towards a dynamic theory of strategy. Strategic Management Journal; 12: 954. |
| 16. | Carson CS. (2001). Toward a framework for assessing data quality. International Monetary Fund; 56. |
| 17. | Gendron MS, D’Onofrio MJ. (2001). Data Quality in the healthcare industry. Data Quality; 7(1). |




















