Methods Inf Med 2015; 54(01): 83-92
DOI: 10.3414/ME14-01-0046
Original Articles
Schattauer GmbH

Frequency Analysis of Medical Concepts in Clinical Trials and their Coverage in MeSH and SNOMED-CT[*]

J. Varghese
1   Institute of Medical Informatics, University of Muenster, Muenster, Germany
M. Dugas
1   Institute of Medical Informatics, University of Muenster, Muenster, Germany

Publication History

received: 22 April 2014

accepted: 05 October 2014

Publication Date:
22 January 2018 (online)

Summary

Background: Eligibility criteria (EC) of clinical trials play a key role in selecting appropriate study candidates and in the validity of trial outcomes. However, in most cases EC are provided in unstandardised forms such as free text, which poses significant challenges for machine-readability.

Objectives: To establish a list of the most frequent medical concepts in clinical trials with semantic annotations. This concept list contributes to the standardisation of EC and identifies relevant data items in electronic health records (EHRs) for clinical research. The coverage of the list in two major clinical vocabularies, MeSH and SNOMED-CT, will be assessed.

Methods: Four hundred and twenty-five clinical trials conducted between 2000 and 2011 at a German university hospital were analysed. 6671 EC were manually annotated by a medical coder using Concept Unique Identifiers (CUIs) provided by the Unified Medical Language System. Two physicians performed a semi-automatic CUI code revision. Concept frequency was analysed and clusters of concepts were manually identified. A binomial significance test was applied to quantify coverage differences of the most frequent concepts in MeSH and SNOMED-CT.
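The concept frequency analysis described in the Methods can be illustrated with a minimal sketch. It assumes the annotations are available as a flat list of CUI strings, one entry per concept occurrence. The function name `top_concepts`, the placeholder identifiers and the tie-handling rule (keep every concept that is as frequent as the one at rank `top_n`) are assumptions made for illustration and are not taken from the article.

```python
from collections import Counter


def top_concepts(annotations, top_n):
    """Rank concepts by frequency and report how much of the data they cover.

    annotations: list of concept identifiers, one per concept occurrence.
    top_n: nominal list length; concepts tied with rank top_n are also kept
           (an assumed rule, so the list can be longer than top_n).
    Returns (list of (concept, frequency), share of all occurrences covered).
    """
    counts = Counter(annotations)
    ranked = counts.most_common()  # sorted by descending frequency
    if not ranked:
        return [], 0.0
    # Frequency of the concept at the nominal cut-off position.
    cutoff = ranked[min(top_n, len(ranked)) - 1][1]
    top = [(concept, freq) for concept, freq in ranked if freq >= cutoff]
    coverage = sum(freq for _, freq in top) / sum(counts.values())
    return top, coverage


# Hypothetical toy data; the identifiers are placeholders, not real CUIs.
toy = ["CUI-A"] * 3 + ["CUI-B"] * 2 + ["CUI-C"] * 2 + ["CUI-D"]
top, coverage = top_concepts(toy, top_n=2)
print(top, coverage)
```

In the toy data, two concepts are tied at the cut-off, so a nominal top 2 list holds three entries. A rule of this kind is one possible reason why a nominal top 100 list can contain more than 100 concepts.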

Results: Based on manual medical coding of 425 clinical trials, 7588 concepts were identified, of which 5236 were distinct. A top 100 list containing the 101 most frequent medical concepts was established. The concepts of this list cover 25% of all concept occurrences in all analysed clinical trials. This list reveals six missing entries in SNOMED-CT and 12 in MeSH. The median number of EC per trial increased over the trial years (2000–2005: 8 EC/trial, 2011: 14 EC/trial).
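The abstract names a binomial significance test for the coverage difference between MeSH and SNOMED-CT but does not describe how it was set up. One common construction, shown here only as a sketch under that assumption, is an exact sign test: among the concepts covered by exactly one of the two vocabularies, test whether the split deviates from 50:50. The function name and the counts in the example are hypothetical. In particular, the example treats the 6 and 12 missing entries as non-overlapping, which the abstract does not state.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with pmf[k] are included.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical discordant counts (an assumption, see the text above):
missing_only_in_snomed = 6
missing_only_in_mesh = 12
n_discordant = missing_only_in_snomed + missing_only_in_mesh
p_value = binomial_two_sided_p(missing_only_in_snomed, n_discordant)
print(p_value)
```

If some concepts are missing from both vocabularies, they would drop out of the discordant counts and the result would change, so the value printed here should not be read as a result of the study.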

Conclusions: Relatively few concepts cover one quarter of concept occurrences that represent EC in recent studies. Therefore, these concepts can serve as candidate data elements for integration into EHRs to optimise patient recruitment in clinical research.

* Supplementary material is published on our website www.methods-online.com