DOI: 10.1055/a-2291-1391
Manual Evaluation of Record Linkage Algorithm Performance in Four Real-World Datasets
Funding This work was supported by the Patient-Centered Outcomes Research Institute grant number ME-2017C1-6425.
Abstract
Objectives Patient data are fragmented across multiple repositories, yielding suboptimal and costly care. Record linkage algorithms are widely accepted solutions for improving completeness of patient records. However, studies often fail to fully describe their linkage techniques. Further, while many frameworks evaluate record linkage methods, few focus on producing gold standard datasets. This highlights a need to assess these frameworks and their real-world performance. We use real-world datasets and expand upon previous frameworks to evaluate a consistent approach to the manual review of gold standard datasets and measure its impact on algorithm performance.
Methods We applied the framework, which includes elements for data description, reviewer training and adjudication, and software and reviewer descriptions, to four datasets. Record pairs were formed, and between 15,000 and 16,500 records were randomly sampled from these pairs. After training, two reviewers determined the match status of each record pair. If the reviewers disagreed, a third reviewer made the final adjudication.
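The two-reviewer process with third-reviewer adjudication described in the Methods can be illustrated with a minimal sketch. It is not the authors' software: the function name `adjudicate`, the boolean encoding of match status, and the toy reviewer decisions are all assumptions made for illustration.

```python
def adjudicate(first, second, third):
    """Combine two reviewers' decisions into gold-standard labels.

    first, second: parallel lists of booleans (True = match), one per record pair.
    third: dict mapping a pair's index to the third reviewer's decision;
           it is consulted only when the first two reviewers disagree.
    Returns (gold_labels, discordance_rate).
    """
    if len(first) != len(second):
        raise ValueError("reviewers must label the same record pairs")

    gold = []
    discordant = 0
    for i, (a, b) in enumerate(zip(first, second)):
        if a == b:
            gold.append(a)          # reviewers agree: accept their label
        else:
            discordant += 1
            gold.append(third[i])   # reviewers disagree: third reviewer decides

    rate = discordant / len(first) if first else 0.0
    return gold, rate


# Hypothetical decisions for five record pairs (not data from the study).
reviewer_1 = [True, True, False, False, True]
reviewer_2 = [True, False, False, False, True]
reviewer_3 = {1: True}  # only pair 1 needed adjudication

gold, rate = adjudicate(reviewer_1, reviewer_2, reviewer_3)
```

Here the discordance rate is simply the share of sampled pairs on which the first two reviewers disagreed; whether the study computed its rates in exactly this way is not stated in the abstract.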
Results Across the four datasets, the discordance rate ranged from 1.8 to 13.6%. While reviewers' discordance rates typically ranged between 1 and 5%, one reviewer exhibited a 59% discordance rate, showing the importance of the third reviewer. The original analysis was compared with three sensitivity analyses. The original analysis most often exhibited the highest predictive values compared with the sensitivity analyses.
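As a rough illustration of how predictive values could be computed once a gold standard exists, the sketch below compares an algorithm's match decisions with gold-standard labels. It assumes that "predictive values" refers to positive and negative predictive value; the function name `predictive_values` and the toy labels are hypothetical and are not taken from the study.

```python
def predictive_values(gold, predicted):
    """Return (PPV, NPV) for parallel lists of booleans (True = match).

    PPV = TP / (TP + FP): share of algorithm-declared matches that are true matches.
    NPV = TN / (TN + FN): share of algorithm-declared non-matches that are true non-matches.
    A value is None when its denominator is zero.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same record pairs")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Hypothetical labels for five record pairs (not data from the study).
gold_labels = [True, True, False, False, True]
algorithm_labels = [True, False, False, True, True]

ppv, npv = predictive_values(gold_labels, algorithm_labels)
```

Because both measures depend on the gold-standard labels, a change in how reviewer disagreements are resolved changes the estimates, which is the kind of effect a sensitivity analysis would expose.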
Conclusion Reviewers vary in their assessment of a gold standard, which can lead to variation in estimates of matching performance. Our analysis demonstrates how a multireviewer process can be applied to create gold standards, identify reviewer discrepancies, and evaluate algorithm performance.
Protection of Human and Animal Subjects
No animals were used, and all humans involved were employees of Indiana University.
Data Availability Statement
The participants of this study did not give written consent for their data to be shared publicly, so, due to the sensitive nature of the research, supporting data are not available.
Authors' Contributions
S.J.G. and J.R.V. contributed to the conception, design, acquisition, and analysis for the work. H.X. and X.L. performed analysis and contributed to design. A.K.G. drafted the initial manuscript and contributed to analysis.
Publication History
Received: 26 September 2023
Accepted: 18 March 2024
Accepted Manuscript online: 20 March 2024
Article published online: 31 July 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany