Yearb Med Inform 2011; 20(01): 112-120
DOI: 10.1055/s-0038-1638748
Working Group Contributions
Georg Thieme Verlag KG Stuttgart

Key Concepts to Assess the Readiness of Data for International Research: Data Quality, Lineage and Provenance, Extraction and Processing Errors, Traceability, and Curation

Contribution of the IMIA Primary Health Care Informatics Working Group
S. de Lusignan (1), S.-T. Liaw (2), P. Krause (3), V. Curcin (4), M. Tristan Vicente (5), G. Michalakidis (6), L. Agreus (7), P. Leysen (8), N. Shaw (9), K. Mendis (10)

1   IMIA Primary Healthcare Working Group Co-Chair, Primary Care and Clinical Informatics, University of Surrey, UK
2   General Practice, University of New South Wales, Australia
3   Software Engineering, University of Surrey
4   Imperial College London
5   St. George’s University of London
6   Computing department, University of Surrey
7   Center for Family and Community Medicine, Karolinska Institutet, Stockholm
8   Faculty of Medicine, Dept. of Primary and Interdisciplinary Care, University of Antwerp
9   ESRI Canada Health Informatics Research Chair / Scientific Director, Health Informatics Institute, Algoma University, Ontario, Canada
10   IMIA Primary Healthcare Working Group Chair, University of Sydney, Australia
We thank Frank Sullivan and Mark McGilchrist for their comments on the manuscript, and IMIA and EFMI for supporting their primary care informatics working groups. TRANSFoRm is supported by the European Commission DG INFSO (FP7 2477).
Publication History

Publication Date:
06 March 2018 (online)

Summary

Objective

To define the key concepts which inform whether a system for collecting, aggregating and processing routine clinical data for research is fit for purpose.

Methods

Literature review and shared experiential learning from research using routinely collected data. We excluded socio-cultural, privacy, and security issues because our focus was on linking clinical data.

Results

Six key concepts describe data: (1) Data quality: the core overarching concept, which asks whether the data are fit for purpose. (2) Data provenance: how the data came to be, incorporating the concepts of lineage and pedigree. Mapping this process requires metadata, and new variables derived during data analysis have their own provenance. (3) Data extraction errors and (4) data processing errors, which are the responsibility of the investigator extracting the data but need to be quantified. (5) Traceability: the capability to identify the origins of any data cell within the final analysis table, which is essential for good governance and almost impossible without a formal system of metadata. (6) Curation: storing data and look-up tables in a way that allows future researchers to carry out further research or review earlier findings.
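
As an illustration of how provenance metadata can support traceability, the following is a minimal sketch, not a method described in this paper. The `Provenance` record, the `trace` function, and all variable and source names (`weight_kg`, `practice_system_A`, and so on) are hypothetical and chosen only for the example. Each variable carries a record of where it came from, and a derived variable lists the variables it was computed from, so any column in a final analysis table can be followed back to the originally extracted data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Metadata describing how one variable came to be (hypothetical schema)."""
    variable: str
    source: str          # system or script that produced the variable
    step: str            # "extraction" or "derivation"
    parents: tuple = ()  # variables this one was derived from, if any


def trace(variable, registry):
    """Return the originally extracted variables that `variable` depends on."""
    record = registry[variable]
    if not record.parents:
        # No parents: this variable was extracted directly from a source system.
        return [record.variable]
    origins = []
    for parent in record.parents:
        for origin in trace(parent, registry):
            if origin not in origins:
                origins.append(origin)
    return origins


# Hypothetical registry: two extracted variables and two derived ones.
registry = {
    "weight_kg": Provenance("weight_kg", "practice_system_A", "extraction"),
    "height_m": Provenance("height_m", "practice_system_A", "extraction"),
    "bmi": Provenance("bmi", "analysis_script", "derivation",
                      ("weight_kg", "height_m")),
    "obese": Provenance("obese", "analysis_script", "derivation", ("bmi",)),
}

print(trace("obese", registry))  # ['weight_kg', 'height_m']
```

In this sketch the derived variable `obese` has its own provenance record, in line with the point that new variables created during analysis have their own provenance. Without such records, the lookup from a final column back to its sources would have to be reconstructed by hand.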

Conclusion

There are common, distinct steps in processing data, and the quality of any metadata may be predictive of the quality of the process. Outputs based on routine data should include a review of the process from data origin to curation, and should publish information about their data provenance and processing method.