DOI: 10.3414/ME13-02-0020
Toward a View-oriented Approach for Aligning RDF-based Biomedical Repositories
Publication History
received:
13 June 2013
accepted:
17 March 2014
Publication Date:
22 January 2018 (online)
Summary
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.
Background: The need for complementary access to multiple RDF databases has fostered new lines of research, but has also entailed new challenges due to disparities in data representation. While several approaches to RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique, known as element-to-element (e2e) mapping, is based on establishing 1:1 mappings between single primitive elements (e.g. concepts, attributes, relationships) belonging to the source and target schemas. However, because RDF is a representation language based on tuples of the form <subject, predicate, object>, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements, i.e. their semantics depend on their context. Such elements cannot be adequately represented in the target schema by resorting to the traditional e2e approach. Existing approaches cannot address this issue without explicitly modifying the target ontology, and therefore lack the expressiveness required to reflect the intended semantics in the alignment information.
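The limitation described in the Background can be illustrated with a minimal sketch. It is not taken from the article: the triples, the predicate names (hasValue, valueType) and the target terms are hypothetical, and plain Python tuples stand in for RDF triples. Under an e2e alignment each source predicate is translated in isolation, so a generic predicate receives the same target term whatever its context.

```python
# Hypothetical source triples (subject, predicate, object); names are invented.
source_triples = [
    ("patient1", "hasValue", "120"),
    ("patient1", "valueType", "systolicPressure"),
    ("patient2", "hasValue", "37.5"),
    ("patient2", "valueType", "bodyTemperature"),
]

# Assumed e2e alignment: one source predicate maps to exactly one target predicate.
E2E_MAPPING = {
    "hasValue": "target:measurement",
    "valueType": "target:kind",
}

def translate_e2e(triples, mapping):
    """Translate each triple on its own, ignoring every neighbouring triple."""
    return [(s, mapping[p], o) for s, p, o in triples if p in mapping]

translated = translate_e2e(source_triples, E2E_MAPPING)
# Both hasValue triples end up under the same target predicate, although one
# is a blood pressure and the other a temperature: the context is lost.
```

In this sketch the only way to tell the two measurements apart in the target is to add a generic predicate such as target:measurement to the target schema, which is the kind of modification of the target ontology the Background refers to.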
Objectives: To enhance existing RDF schema alignment techniques by providing a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by the existing approaches.
Methods: Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. This approach establishes mapping relationships between RDF subgraphs, which can be regarded as the equivalent of views in traditional databases, rather than between single schema elements. It enables users to represent scenarios involving context-dependent RDF elements that cannot be properly represented with existing approaches.
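A view-based mapping of the kind described in the Methods can be sketched as follows. This is an illustrative assumption, not the authors' tool or mapping format: a view is modelled as a list of triple patterns in which terms starting with "?" are variables, and all predicate and target names are hypothetical.

```python
def match(pattern, triples, binding=None):
    """Yield variable bindings under which every pattern triple occurs in `triples`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    head, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match(rest, triples, candidate)

def apply_view_mapping(source_view, target_view, triples):
    """Instantiate the target view once for every match of the source view."""
    result = []
    for binding in match(source_view, triples):
        for template in target_view:
            result.append(tuple(binding.get(term, term) for term in template))
    return result

# Hypothetical source data: hasValue means different things depending on valueType.
source_triples = [
    ("patient1", "hasValue", "120"),
    ("patient1", "valueType", "systolicPressure"),
    ("patient2", "hasValue", "37.5"),
    ("patient2", "valueType", "bodyTemperature"),
]

# Source view: a two-triple subgraph that fixes the context of hasValue.
source_view = [
    ("?p", "hasValue", "?v"),
    ("?p", "valueType", "systolicPressure"),
]
# Target view: the subgraph that expresses the same meaning in the target schema.
target_view = [("?p", "target:systolicBloodPressure", "?v")]

aligned = apply_view_mapping(source_view, target_view, source_triples)
```

Because the mapping is anchored on the whole subgraph, only the value belonging to patient1 is translated to the blood pressure term; the temperature reading of patient2 would need its own pair of views.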
Results: We developed a software tool implementing our view-based strategy. The tool is currently being used in the European Commission-funded p-medicine project, which aims to create a technological framework for integrating clinical and genomic data in order to facilitate the development of personalized drugs and therapies for cancer based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases, including different repositories of clinical trials and DICOM images, using the Health Data Ontology Trunk (HDOT) as the target schema.
Conclusions: The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area (e.g. identification of disease biomarkers, or design of personalized therapies) relies heavily on the availability of a technical framework that enables researchers to uniformly access disparate repositories. We present a novel alignment method, and a tool implementing it, specifically designed to support and enhance the integration of RDF-based data sources at the schema (metadata) level. This approach provides greater expressiveness than other existing solutions, and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.