Methods Inf Med 2017; 56(03): 230-237
DOI: 10.3414/ME16-01-0073
Paper
Schattauer GmbH

Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying

Stefan Kropf
1   Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Leipzig, Germany
2   Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Leipzig University, Leipzig, Germany
Peter Krücken
3   Institute of Pathology, Leipzig University, Leipzig, Germany
Wolf Mueller
4   Department of Neuropathology, Leipzig University, Leipzig, Germany
Kerstin Denecke
5   Institute for Medical Informatics, Bern University of Applied Sciences, Bern, Switzerland
Funding: This paper is mainly a result of the Digital Patient Model Project (ICCAS), funded by the BMBF (03Z1LN11).

Publication History

received: 15 June 2016

accepted in revised form: 10 January 2017

Publication Date: 24 January 2018 (online)

Summary

Background: Clinical information is often stored as free text, e.g., in discharge summaries or pathology reports. These documents are semi-structured by section headers, numbered lists, items and classification strings. Retrieving relevant documents nevertheless remains challenging, because keyword searches applied to complete documents without regard to their structure return many false positives.

Objectives: We concentrate on processing pathology reports as an example of unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that improves access to and retrieval of relevant data. The data are to be stored in a standardized, structured form so that they are accessible to queries applied to specific sections of a document (section-sensitive queries) and available for information reuse.

Methods: Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. To enable focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified by openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried with XQuery and rendered as HTML with XSLT.
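The idea behind this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the section keywords, the flat `report`/`section` XML layout and the function names are assumptions made for illustration, the XML does not follow any openEHR archetype, and the query uses the limited XPath support of Python's standard library in place of XQuery.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical section headers; the actual keyword list is not given here.
SECTION_KEYWORDS = ["Macroscopy", "Microscopy", "Diagnosis"]
_CANONICAL = {k.lower(): k for k in SECTION_KEYWORDS}
_HEADER = re.compile(
    r"^\s*(%s)\s*:\s*(.*)$" % "|".join(map(re.escape, SECTION_KEYWORDS)),
    re.IGNORECASE,
)


def split_sections(report: str) -> Dict[str, str]:
    """Assign each non-empty line to the most recently seen section header."""
    parts: Dict[str, List[str]] = {}
    current = None
    for line in report.splitlines():
        match = _HEADER.match(line)
        if match:
            current = _CANONICAL[match.group(1).lower()]
            parts.setdefault(current, [])
            if match.group(2):
                parts[current].append(match.group(2).strip())
        elif current is not None and line.strip():
            parts[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in parts.items()}


def to_xml(report_id: str, sections: Dict[str, str]) -> ET.Element:
    """Build a simplified XML record (assumed layout, not an openEHR PEHR)."""
    root = ET.Element("report", id=report_id)
    for name, text in sections.items():
        ET.SubElement(root, "section", name=name).text = text
    return root


def query_section(reports: List[ET.Element], section: str, term: str) -> List[str]:
    """Return ids of reports whose named section contains the term."""
    hits = []
    for rep in reports:
        for sec in rep.findall("./section[@name='%s']" % section):
            if term.lower() in (sec.text or "").lower():
                hits.append(rep.get("id"))
    return hits


# Two made-up reports: both mention the term, only one in the diagnosis.
reports = [
    to_xml("r1", split_sections(
        "Macroscopy: Tissue fragment, 2 cm.\nDiagnosis: Glioblastoma, WHO grade IV.")),
    to_xml("r2", split_sections(
        "Macroscopy: Suspected glioblastoma, fragment 1 cm.\nDiagnosis: Meningioma, WHO grade I.")),
]
print(query_section(reports, "Diagnosis", "glioblastoma"))
```

In this made-up example, a keyword search over the full text would match both reports, whereas the query restricted to the diagnosis section matches only the first; this is the kind of false positive that section-sensitive querying is meant to avoid.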

Results: Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. Information modelling with openEHR saves time in the modelling process because many archetypes can be reused. The resulting standardized, structured PEHRs give access to relevant data by retrieving the data that match user queries.

Conclusions: Mapping unstructured reports to a standardized information model is a practical way to improve access to data. Archetype-based XML enables section-sensitive retrieval and visualisation using well-established XML techniques. Restricting retrieval to particular sections has the potential to save retrieval time and improve retrieval accuracy.

 