Appl Clin Inform 2024; 15(03): 620-628
DOI: 10.1055/a-2291-1391
Research Article

Manual Evaluation of Record Linkage Algorithm Performance in Four Real-World Datasets

Agrayan K. Gupta (1), Huiping Xu (2), Xiaochun Li (2), Joshua R. Vest (3, 4), Shaun J. Grannis (1, 3)

Author Affiliations:
1. Indiana University School of Medicine, Indianapolis, Indiana, United States
2. Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, United States
3. Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, United States
4. Department of Health Policy and Management, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, Indiana, United States
Funding This work was supported by the Patient-Centered Outcomes Research Institute grant number ME-2017C1-6425.

Abstract

Objectives Patient data are fragmented across multiple repositories, yielding suboptimal and costly care. Record linkage algorithms are widely accepted solutions for improving completeness of patient records. However, studies often fail to fully describe their linkage techniques. Further, while many frameworks evaluate record linkage methods, few focus on producing gold standard datasets. This highlights a need to assess these frameworks and their real-world performance. We use real-world datasets and expand upon previous frameworks to evaluate a consistent approach to the manual review of gold standard datasets and measure its impact on algorithm performance.

Methods We applied the framework, which includes elements for data description, reviewer training and adjudication, and software and reviewer descriptions, to four datasets. Record pairs were formed, and between 15,000 and 16,500 pairs were randomly sampled from this pool. After training, two reviewers determined match status for each sampled pair. If the reviewers disagreed, a third reviewer provided final adjudication.
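To make the review workflow concrete, the following minimal Python sketch illustrates the two-reviewer-plus-adjudicator logic described above. All names (ReviewedPair, adjudicate, the match-status strings) are illustrative assumptions, not drawn from the study's software.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewedPair:
    pair_id: str
    reviewer1: str                   # "match" or "nonmatch"
    reviewer2: str                   # "match" or "nonmatch"
    reviewer3: Optional[str] = None  # consulted only on disagreement

def adjudicate(pair: ReviewedPair) -> str:
    """Return final match status: concordant reviews stand; discordant
    reviews are settled by the third reviewer's judgment."""
    if pair.reviewer1 == pair.reviewer2:
        return pair.reviewer1
    if pair.reviewer3 is None:
        raise ValueError(f"pair {pair.pair_id} requires third-reviewer adjudication")
    return pair.reviewer3

Under a scheme like this, a third judgment is needed only for discordant pairs, so the added review burden scales with the discordance rate.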

Results Across the four datasets, the discordance rate ranged from 1.8 to 13.6%. While reviewers' discordance rates typically fell between 1 and 5%, one reviewer exhibited a 59% discordance rate, underscoring the importance of the third reviewer. The original analysis was compared with three sensitivity analyses and most often exhibited the highest predictive values.
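For readers tallying these figures, the discordance rate can be read as the percentage of reviewed pairs on which the two primary reviewers disagreed. A hedged restatement follows (the paper's exact definition may differ); the numbers in the example are purely illustrative.

def discordance_rate(n_discordant: int, n_reviewed: int) -> float:
    # Percentage of reviewed pairs with disagreeing primary reviews.
    return 100.0 * n_discordant / n_reviewed

# Illustrative only: 300 disagreements among 15,000 reviewed pairs -> 2.0%
print(discordance_rate(300, 15_000))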

Conclusion Reviewers vary in their assessments of a gold standard, which can lead to variance in estimates of matching performance. Our analysis demonstrates how a multireviewer process can be applied to create gold standards, identify reviewer discrepancies, and evaluate algorithm performance.

Protection of Human and Animal Subjects

No animals were used, and all humans involved were employees of Indiana University.


Data Availability Statement

The participants of this study did not give written consent for their data to be shared publicly; due to the sensitive nature of the research, supporting data are not available.


Authors' Contributions

S.J.G. and J.R.V. contributed to the conception, design, acquisition, and analysis for the work. H.X. and X.L. performed analysis and contributed to design. A.K.G. drafted the initial manuscript and contributed to analysis.



Publication History

Received: 26 September 2023

Accepted: 18 March 2024

Accepted Manuscript online: 20 March 2024

Article published online: 31 July 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany