Appl Clin Inform 2024; 15(05): 1056-1065
DOI: 10.1055/s-0044-1791487
Research Article

Evolution of a Graph Model for the OMOP Common Data Model

Authors

  • Mengjia Kang

    1   Division of Pulmonary and Critical Care Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States
  • Jose A. Alvarado-Guzman

    2   Neo4j, Inc., San Mateo, California, United States
  • Luke V. Rasmussen

    3   Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
  • Justin B. Starren

    3   Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
    4   University of Arizona Health Sciences, Tucson, Arizona, United States

Funding This work was supported by grant 5U19AI135964 from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health.
 

Abstract

Objective Graph databases for electronic health record (EHR) data have become a useful tool for clinical research in recent years, but there is a lack of published methods to transform relational databases to a graph database schema. We developed a graph model for the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that can be reused across research institutions.

Methods We created and evaluated four models, representing two different strategies, for converting the standardized clinical and vocabulary tables of OMOP into a property graph model within the Neo4j graph database. Taking the Successful Clinical Response in Pneumonia Therapy (SCRIPT) and Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) cohorts as test datasets with different sizes, we compared two of the resulting graph models with respect to database performance including database building time, query complexity, and runtime for both cohorts.

Results Utilizing a graph schema that was optimized for storing critical information as topology rather than attributes resulted in a significant improvement in both data creation and querying. The graph database for our larger cohort, CRITICAL, can be built within 1 hour for 134,145 patients, with a total of 749,011,396 nodes and 1,703,560,910 edges.

Discussion To our knowledge, this is the first generalized solution to convert the OMOP CDM to a graph-optimized schema. Despite being developed for studies at a single institution, the modeling method can be applied to other OMOP CDM v5.x databases. Our evaluation with the SCRIPT and CRITICAL cohorts and comparison between the current and previous versions show advantages in code simplicity, database building, and query speed.

Conclusion We developed a method for converting OMOP CDM databases into graph databases. Our experiments revealed that the final model outperformed the initial relational-to-graph transformation in both code simplicity and query efficiency, particularly for complex queries.


Background and Significance

Systems biology research often depends on combining multiple data types, ranging from clinical variables in the electronic health record (EHR) to multi-omics datasets. Modeling efforts within systems biology depend on the ability to represent and process the multitude of relationships among these concepts. While significant research has been done using traditional relational database management systems (RDBMS), graph databases have emerged as a promising technology for enabling and optimizing certain analyses, including graph algorithms such as centrality, community detection, path finding, and node embeddings.[1] [2] Leveraging graph algorithms alongside other nongraph approaches provides new opportunities to gain insights into biological processes.

Graph databases have been used successfully with biological data sources.[3] For example, queries against the graph implementation of Reactome[4] required 93% less time than queries against its RDBMS counterpart. However, clinical data sources have not received as much attention in the graph database community. Some notable examples include claims data, medications, and disease interaction,[5] [6] but not a full integration of all clinical data elements (labs, diagnoses, medications, visits, and procedures). One of the challenges is that EHR data are not stored natively in a graph format, and data warehouses for clinical operations and research typically convert the transactional database schema into an RDBMS schema optimized for analytics.[7] Furthermore, modeling data for use in a graph database requires careful preparation. A naïve row-to-node conversion is possible (each row is a node, each column is an attribute, and each foreign key is an edge), but the resulting graph is typically attribute heavy, resulting in suboptimal performance. This is because graph database engines are typically optimized to query knowledge that is represented in the topology of the graph, rather than in the attributes.[5] [8]
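The naïve row-to-node conversion described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `rows_to_graph`, the in-memory node and edge representation, and the sample identifiers and concept IDs are all hypothetical.

```python
from typing import Any


def rows_to_graph(table: str, rows: list[dict[str, Any]], primary_key: str,
                  foreign_keys: dict[str, str]):
    """Naive conversion: one node per row, one edge per foreign key.

    foreign_keys maps a column name to the label of the table it references.
    """
    nodes, edges = [], []
    for row in rows:
        node_id = (table, row[primary_key])
        # Every column that is not a key becomes a node attribute.
        attributes = {k: v for k, v in row.items()
                      if k != primary_key and k not in foreign_keys}
        nodes.append({"id": node_id, "label": table, "attributes": attributes})
        # Every non-null foreign key becomes an edge to the referenced row.
        for column, target_table in foreign_keys.items():
            if row.get(column) is not None:
                edges.append((node_id, column, (target_table, row[column])))
    return nodes, edges


# Illustrative rows only; the IDs and concept IDs are made up.
condition_rows = [
    {"condition_occurrence_id": 10, "person_id": 1, "visit_occurrence_id": 100,
     "condition_concept_id": 12345, "condition_start_date": "2020-01-02"},
    {"condition_occurrence_id": 11, "person_id": 1, "visit_occurrence_id": None,
     "condition_concept_id": 67890, "condition_start_date": "2020-03-04"},
]
nodes, edges = rows_to_graph(
    "Condition_Occurrence", condition_rows, "condition_occurrence_id",
    {"person_id": "Person", "visit_occurrence_id": "Visit_Occurrence"},
)
```

Note how most of the information ends up in the `attributes` dictionaries rather than in edges, which is the attribute-heavy shape the paragraph above warns about.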

Within the realm of biomedical research using EHR data, there has been a trend toward the use of common data models (CDMs). CDMs allow different organizations running different EHRs, or the same EHR configured differently, to share a common structure and semantics for how their EHR data are represented. Although it requires more work upfront to transform the EHR data into the CDM, over time it supports broader portability of work.[9] Among the CDMs, the Observational Medical Outcomes Partnership (OMOP) CDM[10] has emerged as the preferred choice for many national initiatives.[11] [12]

Previous efforts using graph databases have primarily focused on local or bespoke data models, with some preliminary work focused on a CDM approach.[13] For graph databases to be more accessible for biomedical research, developing an approach to transform a popular CDM into a graph structure would facilitate their adoption. While there have been multiple studies evaluating methods for converting and harmonizing various types of EHR data into the OMOP relational schema,[14] [15] [16] [17] [18] there has been relatively little work evaluating the conversion from OMOP to other schemas,[19] [20] especially in the case of graph schemas.


Objective

Our objective is to develop a conversion of OMOP CDM data into a graph schema that optimally leverages the unique capabilities of graph database engines. Toward this goal, we developed and evaluated a series of approaches for converting OMOP CDM data into a widely used graph database.[21] We sought to make a generalizable graph model that could be applied to a variety of OMOP instances.


Materials and Methods

Study Cohort

This work was conducted as part of the Successful Clinical Response in Pneumonia Therapy (SCRIPT) study,[22] a multiyear systems biology study integrating clinical, transcriptomic, metagenomic, and bacterial genomic data to support machine learning on host pathogen interaction and pneumonia episode outcomes.[23] The SCRIPT cohort was recruited and consented at the Northwestern Memorial Hospital (NMH) and included 590 participants as of March 2022. To scale up the evaluation of the graph model, we conducted identical tests on the Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) database, comprising 134,145 patients admitted to the intensive care unit (ICU) at NMH between January 1, 2002 and December 31, 2021. This work was reviewed and approved by the Northwestern University Institutional Review Board.


Graph Database Platform Selection

To select a graph database system, we compared several NoSQL databases, focusing primarily on Azure Cosmos DB[24] and Neo4j.[25] Cosmos DB is a scalable, multimodel NoSQL database developed by Microsoft, and Neo4j is an open-source graph database with numerous native graph analytic capabilities. Cosmos DB is a cloud-based solution charging by usage, although institutions may be able to get free credits to use it. Neo4j offers a free community version for download, allowing broader dissemination of this work. Our team found the Cypher query language used by Neo4j to be intuitive and well supported; also, Neo4j's database performance met our research needs. Cosmos DB had an overall lack of visualization features, and it uses the Gremlin API, which our team felt had a steeper learning curve and, being newer, less community support. Additionally, we evaluated graph-based visualization tools such as Cytoscape[26] and Gephi,[27] but found them lacking in the data storage and query features we needed.


Data Source

Data were prepared by the Northwestern Medicine Enterprise Data Warehouse (NMEDW)[28]—a joint effort between the Northwestern University Feinberg School of Medicine and Northwestern Memorial Healthcare Corporation. The NMEDW has the infrastructure to create OMOP tables from data marts populated by our primary EHR (Epic), as well as from ancillary and legacy systems.

SCRIPT Data Source

SCRIPT data from the NMEDW were available for 9 of the 15 v5.3 OMOP Standardized Clinical Data Tables: Person, Provider, Observation_Period, Visit_Occurrence, Condition_Occurrence, Drug_Exposure, Procedure_Occurrence, Measurement, and Observation. The remaining OMOP tables were not provided as they were not populated at that time by NMEDW, or were deemed irrelevant for the SCRIPT study. The database included all historical data from the EHR, not only the data collected during SCRIPT admission. The OMOP tables generated by the NMEDW contained an extra concept name column to facilitate human inspection, which was included in the graph version of each OMOP table. Data were provided as a limited data set as defined by the Health Insurance Portability and Accountability Act Privacy Regulations.


CRITICAL Data Source

In alignment with the SCRIPT dataset, we extracted an identical set of nine tables from the CRITICAL OMOP database. Since the CRITICAL OMOP database did not include the concept names in the event tables, we preprocessed the dataset to add the concept names and then imported it into the graph database. As with SCRIPT, this included all medical data for each individual in the cohort, not just data during ICU stays, and was also considered a limited data set.



Model Design

The mapping from OMOP CDM to the graph schema evolved over several iterations ([Table 1]).

Table 1 Evolution of graph models for OMOP CDM

| Version | Description and changes | Rationale for update | Strengths | Weaknesses |
| --- | --- | --- | --- | --- |
| Version 1 | Naïve interpretation of relational table to node and “HAS_EVENT” as edges with Person node as the network center | N/A | Intuitive structure | Network is not easily traversable |
| Version 2 | Added the ASSOCIATED_DURING_VISIT relationships with both Person and visit node as the center; updated demographics from node attributes to node | To directly link the encounter and event nodes | Easier network traversal than Version 1; smaller database size compared to later versions | Most information still stored as attributes |
| Version 3 | Added the corresponding concept nodes for instances table | To allow the database to support semantics queries | Most frequently queried information and relationships moved to graph edges. Vocabulary-based queries enabled | Increased database size on disk compared to Version 2 |
| Version 4 | Transformed the node from unique entity to entity occurrences; moved edge attributes to node attributes; added provider nodes and edges | To encode the patient events network to the graph topology as much as possible | Simplified terminology management. More complex ontology-based queries enabled | Increased database size on disk compared to Version 3 |

Abbreviations: OMOP, Observational Medical Outcomes Partnership; CDM, common data model.


We started by reviewing the OMOP CDM ([Fig. 1]) relational database tables and created one spreadsheet per table to list the column names and metadata with SCRIPT-specific details. Three team members (M.K., L.V.R., and J.B.S.) independently reviewed each table to decide whether each column should be a node or edge attribute, and whether it should be a unique index. Decisions were then reviewed and discussed during weekly team meetings to reach consensus. In general, we selected only the primary key and foreign keys, along with some important properties such as concept name, start date, and end date, to include in our graph schema. We used the arrows web app (http://www.apcjones.com/arrows/#) to record and visualize the schema versions. We started with a naïve graph model approach, transforming each table into a node type and the columns in each table into node attributes. The schema is similar to the star schema used in relational databases, with the Person table in the center and the instance tables around it. The instance tables (Condition, Drug_Exposure, Measurement, Procedure, Observation) were directly connected to Person and had no connection to Visit_Occurrence. Anticipating that there would be frequent research questions regarding demographics, we represented Gender, Race, and Ethnicity as independent nodes instead of as node properties. We designated this as Version 1 ([Fig. 2]).

Fig. 1 Observational Medical Outcomes Partnership (OMOP) common data model (CDM) v5.3.1. Highlighted are the OMOP tables included in the graph modeling.
Fig. 2 Graph schema Version 1. Version 1 represented a naïve and direct translation of the Observational Medical Outcomes Partnership (OMOP) relational structure into a graph structure.

Iterating on Version 1, in Version 2 ([Fig. 3]) we recognized that encounter (Visit_Occurrence) nodes are usually critical to clinical data analysis. To support this, we added an ASSOCIATED_DURING_VISIT relationship from the other instance nodes to the encounter nodes, making the Person and Visit_Occurrence nodes the center of the network.

Fig. 3 Graph schema Version 2. Version 2 added computed links to both Person and Visit_Occurrence to improve query efficiency.
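The difference between the Version 1 and Version 2 edge sets can be illustrated with a small Python sketch. It is a simplified in-memory model rather than the authors' implementation: the function names, the tuple-based edge representation, the direction assumed for HAS_EVENT (Person to event), and the sample rows are assumptions made for illustration.

```python
def link_events(events):
    """Build Version 1 style and Version 2 style edges for event rows."""
    edges = []
    for event in events:
        event_node = (event["table"], event["id"])
        # Version 1: Person is the only hub (direction assumed: Person -> event).
        edges.append((("Person", event["person_id"]), "HAS_EVENT", event_node))
        # Version 2: also link the event to its visit, when one is recorded.
        if event.get("visit_occurrence_id") is not None:
            edges.append((event_node, "ASSOCIATED_DURING_VISIT",
                          ("Visit_Occurrence", event["visit_occurrence_id"])))
    return edges


def events_during_visit(edges, visit_id):
    """Return the event nodes attached to one visit via Version 2 edges."""
    target = ("Visit_Occurrence", visit_id)
    return [src for src, rel, dst in edges
            if rel == "ASSOCIATED_DURING_VISIT" and dst == target]


# Illustrative rows only.
sample_events = [
    {"table": "Drug_Exposure", "id": 1, "person_id": 7, "visit_occurrence_id": 100},
    {"table": "Measurement", "id": 2, "person_id": 7, "visit_occurrence_id": 100},
    {"table": "Condition_Occurrence", "id": 3, "person_id": 7, "visit_occurrence_id": None},
]
sample_edges = link_events(sample_events)
visit_100_events = events_during_visit(sample_edges, 100)
```

With only the Version 1 edges, answering "what happened during visit 100" would require filtering every event of the person by a visit attribute; the added edge turns it into a direct lookup from the visit node.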

For Version 3 ([Fig. 4]), we added OMOP vocabulary (specifically concepts) to the model. This meant that individual instance nodes no longer had to duplicate the concept information. Although OMOP uses a central Concept table, we split the concepts by data type, creating Measurement_Concept, Observation_Concept, Visit_Concept, Procedure_Concept, Condition_Concept, and Drug_Concept nodes. Previous versions did not link the Provider node with the rest of the network. In this version, we added relationships between the instance nodes and the Provider nodes. Specifically, we linked from Provider to Measurement, Observation_Occurrence, and Procedure_Occurrence nodes through a RESPONSIBLE_FOR edge, linked to Condition_Occurrence through a CAPTURED edge, and linked to Drug_Exposure through an INITIATED edge. The Provider nodes were then directly connected to the instance nodes without the need to traverse from Visit_Occurrence nodes to other clinical events.

Fig. 4 Graph schema Version 3. Version 3 abstracted out concepts as edges rather than attributes.
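As a rough illustration of the Version 3 idea, the sketch below factors repeated concept information out of instance rows into shared concept nodes. The edge name `HAS_CONCEPT`, the function name, the column names, and the concept IDs are hypothetical; the article does not name the edge that connects an instance node to its concept node.

```python
def extract_concepts(rows, concept_id_column, concept_name_column):
    """Split instance rows into shared concept nodes, edges, and slimmer rows."""
    concepts = {}      # concept_id -> concept name, stored once per concept
    edges = []         # (instance id, edge type, concept id)
    slim_rows = []     # instance rows without the duplicated concept columns
    for row in rows:
        concept_id = row[concept_id_column]
        concepts.setdefault(concept_id, row[concept_name_column])
        edges.append((row["id"], "HAS_CONCEPT", concept_id))  # hypothetical edge name
        slim_rows.append({k: v for k, v in row.items()
                          if k not in (concept_id_column, concept_name_column)})
    return concepts, edges, slim_rows


# Illustrative drug exposure rows; the concept IDs are made up.
drug_rows = [
    {"id": 1, "drug_concept_id": 9001, "drug_concept_name": "dexamethasone", "start_date": "2020-01-01"},
    {"id": 2, "drug_concept_id": 9001, "drug_concept_name": "dexamethasone", "start_date": "2020-02-01"},
    {"id": 3, "drug_concept_id": 9002, "drug_concept_name": "vancomycin", "start_date": "2020-02-01"},
]
drug_concepts, concept_edges, slim_drug_rows = extract_concepts(
    drug_rows, "drug_concept_id", "drug_concept_name")
```

Three exposures collapse onto two concept nodes here, so a question such as "which exposures share a concept" becomes a traversal over edges instead of a comparison of attribute values.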

In Version 4 ([Fig. 5]), we revisited the decision to have multiple concept nodes, and aligned on a single Concept node that mimics the original OMOP design. This simplified the schema and database creation without affecting database performance. We also included the other OMOP standardized vocabulary tables (Vocabulary, Concept, Concept_Class) as nodes in our schema. To build the network between the vocabulary tables, we added two edges between each of the vocabulary nodes, allowing easy traversal from one vocabulary node to another and back. We also implemented self-directed relationships; the RELATED_TO edge captures the relationships in the concept_relationship table and can be used to describe semantic distances. (In this version, we chose a generalized relation, rather than the subtyped relations in the concept_relationship table.) The NEXT edge links the visit sequence and builds up the patient journey.

Fig. 5 Graph schema Version 4. Version 4 was designed to maximize the amount of information represented as graph topology and minimize the use of attributes. Note that for testing purpose, we only loaded the identical entities for Versions 1 and 4, which means that the vocabulary tables were not loaded during the performance test.
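The NEXT edge used in Version 4 to chain a patient's visits can be sketched as follows. This is a minimal sketch under one assumption that the article does not spell out: visits are ordered by start date (with the visit ID as a tie-breaker). The function name and sample rows are illustrative.

```python
from collections import defaultdict


def build_next_edges(visits):
    """Chain each person's visits in chronological order with NEXT edges."""
    by_person = defaultdict(list)
    for visit in visits:
        by_person[visit["person_id"]].append(visit)
    edges = []
    for person_visits in by_person.values():
        # Assumed ordering: start date first, visit ID to break ties.
        ordered = sorted(person_visits,
                         key=lambda v: (v["visit_start_date"], v["visit_occurrence_id"]))
        # Pair every visit with the one that follows it.
        for earlier, later in zip(ordered, ordered[1:]):
            edges.append((earlier["visit_occurrence_id"], "NEXT",
                          later["visit_occurrence_id"]))
    return edges


# Illustrative visits; ISO date strings sort chronologically.
sample_visits = [
    {"visit_occurrence_id": 100, "person_id": 1, "visit_start_date": "2020-03-01"},
    {"visit_occurrence_id": 101, "person_id": 1, "visit_start_date": "2020-01-01"},
    {"visit_occurrence_id": 102, "person_id": 1, "visit_start_date": "2020-02-01"},
    {"visit_occurrence_id": 200, "person_id": 2, "visit_start_date": "2020-01-15"},
]
next_edges = build_next_edges(sample_visits)
```

A person with a single visit contributes no NEXT edge, and following the edges from a person's earliest visit reproduces the patient journey in order.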

Model Comparison

We selected Versions 1 and 4 for performance comparison. These two versions were selected because they represented implementations of the two major strategies for relational to graph conversion, specifically naïve recapitulation of the relational knowledge structure (Version 1) versus graph optimized (Version 4). Comparison metrics included database load time, query runtime, and database size. For test purposes, we did not include provider and vocabulary tables because (1) Version 1 did not support separate vocabulary tables and (2) as a result, the test queries could not utilize ontology-based queries. The test Neo4j database instance was set up on a server running an AMD EPYC 7452 32-core processor. We used Neo4j Community Edition v4.4.15 for our evaluation.

We created comma-separated value (CSV) extracts from our source OMOP relational database and used neo4j-admin import to populate the database. Following the import, we added indexes for the Person and instance nodes (Condition_Occurrence, Drug_Exposure, Measurement, Procedure_Occurrence, Observation, Visit_Occurrence) for both Versions 1 and 4 models. Note that the ATC4 node and DRUG_ATC4 edge were added to both Versions 1 and 4 at the testing stage, when we wanted to check the medication classes rather than individual types. We did not create any edge index considering the two versions have significantly different edges, in both types and numbers. Two testing queries were developed based on actual questions posed by clinicians on the study team. The queries were chosen to exercise the graph model relationships.
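For readers unfamiliar with the bulk importer, the following Python sketch shows what CSV extracts for neo4j-admin import can look like. The header fields `:ID(...)`, `:LABEL`, `:START_ID(...)`, `:END_ID(...)`, and `:TYPE` follow the importer's documented header conventions; the column names, sample rows, and helper function are illustrative, and the actual file layout used in this study is the one in the project repository, not this sketch.

```python
import csv
import io


def write_csv(header, rows):
    """Serialize a header and rows to CSV text (in memory, for illustration)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Node files: an ID column scoped to an ID space, optional properties, a label.
person_csv = write_csv(
    ["person_id:ID(Person)", ":LABEL"],
    [[1, "Person"], [2, "Person"]],
)
visit_csv = write_csv(
    ["visit_occurrence_id:ID(Visit_Occurrence)", "visit_start_date", ":LABEL"],
    [[100, "2020-01-01", "Visit_Occurrence"]],
)
# Relationship file: start and end IDs refer back to the ID spaces above.
has_event_csv = write_csv(
    [":START_ID(Person)", ":END_ID(Visit_Occurrence)", ":TYPE"],
    [[1, 100, "HAS_EVENT"]],
)
```

In practice such text would be written to files on disk, which are then passed to the importer as node and relationship inputs.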

We developed scripts to automate benchmarking multiple iterations of database loading. Database loading was run 30 times each for both models, with the database recreated for each iteration. Each of the two testing queries was executed five times manually on SCRIPT and CRITICAL databases with Versions 1 and 4.
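A benchmarking loop of the kind described above might look like the sketch below. It is not the authors' script: the `benchmark` helper and the placeholder task are hypothetical, and a real run would replace the placeholder with a database load or a query execution.

```python
import statistics
import time


def benchmark(task, iterations):
    """Run task repeatedly and return (mean, sample SD) of its runtime in ms."""
    durations_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        task()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations_ms), statistics.stdev(durations_ms)


def placeholder_query():
    # Stand-in workload; a real benchmark would run a Cypher query or a load here.
    sum(range(10_000))


mean_ms, sd_ms = benchmark(placeholder_query, iterations=5)
```

Reporting both the mean and the standard deviation, as the article does, makes it possible to judge whether a difference between two models is larger than the run-to-run variation.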



Results

Following the described iterative process, we refined a graph database model for the OMOP CDM, resulting in four separate versions of the schema. Diagrams of the node and edge relationships for each version of the schema are shown in [Figs. 2] [3] [4] [5], and a summary of each version along with strengths and weaknesses identified during development is presented in [Table 1]. We have made available a public repository on GitHub (https://github.com/NUSCRIPT/OMOP_to_Graph) with code and instructions on how to build a Neo4j graph database for the OMOP CDM, and benchmarking scripts for database loading and query running time.

After loading the same OMOP CDM dataset of 590 SCRIPT patients into the graph database, Version 1 resulted in 16,690 nodes and 8,091,100 edges, while Version 4 had a total of 8,088,034 nodes and 17,011,820 edges. The mean database loading time was 17.4 (standard deviation [SD]: 0.9) seconds for Version 1 and 28.8 (SD: 2.4) seconds for Version 4, which was statistically significant at p < 0.01. For the CRITICAL cohort with 134,145 patients, Version 1 resulted in 198,286 nodes and 1,498,570,382 edges, while Version 4 had a total of 749,011,396 nodes and 1,703,560,910 edges. The mean database loading time was 481.2 (SD: 74.9) seconds for Version 1 and 3,344.6 (SD: 468.0) seconds for Version 4 (also statistically significant at p < 0.01).
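The article does not state which statistical test produced these p-values. As one plausible check, the sketch below computes a Welch two-sample t statistic and its approximate degrees of freedom from the reported means and standard deviations, assuming 30 loading runs per model for both cohorts (the number of iterations given in the Methods). The choice of test and the function name are assumptions.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a   # squared standard error of mean A
    var_b = sd_b ** 2 / n_b   # squared standard error of mean B
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Loading times in seconds: Version 1 versus Version 4, n = 30 runs assumed.
t_script, df_script = welch_t(17.4, 0.9, 30, 28.8, 2.4, 30)
t_critical, df_critical = welch_t(481.2, 74.9, 30, 3344.6, 468.0, 30)
```

Under these assumptions the t statistics are above 20 for both cohorts, which is consistent with the reported significance at p < 0.01.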

As can be seen in [Table 2], the relative performance of the two versions depended on the nature of the query. For the test on the SCRIPT cohort, Question 1 (find patients with specific diagnosis, procedure, and drug prescription) was approximately nine times faster on Version 1 than on Version 4. In contrast, Question 2 (find the most co-prescribed drugs) was approximately 26-fold faster on Version 4 than on Version 1. As for the CRITICAL cohort, Question 1 was about 49 times faster on Version 1 than on Version 4, while Question 2 was about threefold faster on Version 4 than on Version 1. We will discuss possible reasons for this observation in the next section.

Table 2 Cypher query running time

| Model version | Test query number | SCRIPT database: mean (SD) execution time (ms) | CRITICAL database: mean (SD) execution time (ms) |
| --- | --- | --- | --- |
| 1 | 1 | 268.0 (75.3) | 967 (63.89) |
| 1 | 2 | 121,414 (538.4) | 5,839,586.40 (24,086.19) |
| 4 | 1 | 2,763.8 (275.0) | 48,639 (16,193.95) |
| 4 | 2 | 4,633.2 (1,116.6) | 2,118,551.25 (25,507.15) |

Abbreviations: CRITICAL, Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning; SCRIPT, Successful Clinical Response in Pneumonia Therapy; SD, standard deviation.

Note: 1. Test query 1 is to find the patients who had spontaneous pneumothorax diagnosis, had a medication prescribed of dexamethasone, and had a “chest”-related procedure.

2. Test query 2 is to find the top 10 most frequently co-prescribed drugs.
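The logic of test query 1 can be expressed outside the database as an intersection of three patient sets. The Python sketch below is an illustration only, not the Cypher query used in the study: the function name, the row layout, and the matching rules (exact match on the diagnosis name, substring match on the drug and procedure names) are assumptions.

```python
def patients_matching_all(conditions, drugs, procedures,
                          condition_name, drug_name, procedure_keyword):
    """Return person IDs that satisfy all three criteria."""
    with_condition = {r["person_id"] for r in conditions
                      if r["concept_name"].lower() == condition_name.lower()}
    with_drug = {r["person_id"] for r in drugs
                 if drug_name.lower() in r["concept_name"].lower()}
    with_procedure = {r["person_id"] for r in procedures
                      if procedure_keyword.lower() in r["concept_name"].lower()}
    # A patient qualifies only if present in all three sets.
    return with_condition & with_drug & with_procedure


# Illustrative rows only.
sample_conditions = [
    {"person_id": 1, "concept_name": "Spontaneous pneumothorax"},
    {"person_id": 2, "concept_name": "Spontaneous pneumothorax"},
]
sample_drugs = [
    {"person_id": 1, "concept_name": "dexamethasone 4 MG Oral Tablet"},
    {"person_id": 3, "concept_name": "dexamethasone 4 MG Oral Tablet"},
]
sample_procedures = [
    {"person_id": 1, "concept_name": "Insertion of chest tube"},
    {"person_id": 2, "concept_name": "Insertion of chest tube"},
]
matching_patients = patients_matching_all(
    sample_conditions, sample_drugs, sample_procedures,
    "spontaneous pneumothorax", "dexamethasone", "chest")
```

This is a filter-and-join pattern, the kind of question that depends on attribute matching rather than on traversing relationships between events.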



Discussion

In this work, we present an efficient mapping from the OMOP CDM to a graph database. In the course of developing the schema, we demonstrated that naïve transformations from a relational to a graph schema can result in impaired performance depending on the queries performed.

We found that both Versions 1 and 4 of our model have strengths and weaknesses. Using the neo4j-admin import method, database build time for both versions was less than a minute for the SCRIPT cohort and within an hour for the CRITICAL cohort. Version 1 was roughly 10 seconds faster on the SCRIPT cohort and about 6 times faster on the CRITICAL cohort; however, Version 4 included about 8 million more nodes and 8.9 million more edges than Version 1 on the SCRIPT cohort, and 749 million more nodes and 205 million more edges on the CRITICAL cohort. Not surprisingly, the simpler query (e.g., Condition X and Drug Y) was faster on the simpler and smaller Version 1. Also, since Version 1 is a naïve translation of the relational model, it is consistent that it would perform well on a query type at which relational databases excel. On the other hand, Version 4 performed better in answering the question of finding the top 10 co-prescribed drugs during the same visit, a query that relies on information stored in the graph topology.
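To make the co-prescription question concrete, the sketch below counts drug pairs that occur within the same visit and returns the most frequent ones. It is an in-memory illustration of what the query asks, not the Cypher query from the study; the function name, the row layout, and the sample drug names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_coprescribed(drug_exposures, top_n=10):
    """Count drug pairs prescribed during the same visit; return the top pairs."""
    drugs_by_visit = defaultdict(set)
    for row in drug_exposures:
        drugs_by_visit[row["visit_occurrence_id"]].add(row["drug_name"])
    pair_counts = Counter()
    for drugs in drugs_by_visit.values():
        # Sorting makes (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(drugs), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(top_n)


# Illustrative exposures only.
sample_exposures = [
    {"visit_occurrence_id": 1, "drug_name": "dexamethasone"},
    {"visit_occurrence_id": 1, "drug_name": "vancomycin"},
    {"visit_occurrence_id": 1, "drug_name": "cefepime"},
    {"visit_occurrence_id": 2, "drug_name": "vancomycin"},
    {"visit_occurrence_id": 2, "drug_name": "cefepime"},
    {"visit_occurrence_id": 3, "drug_name": "dexamethasone"},
]
top_pair = top_coprescribed(sample_exposures, top_n=1)
```

The grouping step is where the visit-to-drug relationship does the work: in a graph where drug exposures are linked to visits by edges, that grouping is a traversal rather than a scan over attribute values.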

Although we lack visibility into the inner workings of the graph database engine, graph databases are typically optimized for storing and querying information that is represented by the topology of the graph, rather than as attributes. In choosing which features of the data to model as edges, it is important to think ahead about what questions the team wants to ask of the graph database. The tradeoff is that storing information in edge attributes takes more disk space than storing the same information as node attributes. In addition, Version 4 is able to support more complicated queries regarding the patient journey (e.g., find and visualize the sequence of patient diagnoses, medication prescriptions, and procedures in a certain month) that Version 1 cannot. We note that in this work, we purposefully chose questions that both schemas can support to ensure a fair comparison between the two models.

Given the popularity and wide adoption of the OMOP CDM, there have been a variety of studies comparing the performance of various OMOP transforms. These have included transforming OMOP to Fast Healthcare Interoperability Resources (FHIR)[29] and querying OMOP using i2b2[30]; OHDSI itself supports multiple RDBMS, including high-performance and cloud systems such as Apache Spark, Google BigQuery, and AWS Redshift, and additional work has evaluated columnar stores for OMOP.[31] These approaches have similarly demonstrated high-throughput query capabilities, but still rely on RDBMS schemas. Given that certain questions may be optimally solved using a graph database, we believe this work offers a solution that supplements (as opposed to replacing) the RDBMS implementations that most institutions have in place. We believe our approach, which is openly available and can run on a free version of Neo4j, will facilitate broader adoption and evaluation of graph databases for the OMOP CDM. Previous works in the biomedical field that have leveraged graph databases have been primarily focused on knowledge representation with Gene Ontologies (GO) and Human Phenotype Ontology (HPO)[32] or other knowledge graphs[33] for drug discovery or health service claim data.[5] While Park et al suggested a modeling method to transform a relational database to a graph database, it requires a transition from the relational database to a third normal form (3NF) relational database and then to a 3NF Equivalent Graph Transform (3EG); they also focused only on claims data rather than clinical data.[5] While there has been work in the Scalable, Standard based Interoperability Framework for Sustainable Proactive Post Market Safety Studies (SALUS) project to generate OMOP schema data from a Resource Description Framework (RDF) representation,[34] we have not identified prior work in the health and biomedical space that has focused on the conversion of relational OMOP data into graph schemas.

We acknowledge several limitations within our work. First, there have been many comparisons between RDBMS and graph databases.[4] [35] Instead of adding one more comparison between relational databases and graph databases, we focused on designing a graph database from the OMOP relational schema and discussed the strengths and weaknesses of the various graph schemas. In this study, evaluation was performed on cohorts of 590 and 134,145 patients from a single institution, which are significantly smaller than a typical EDW. We intentionally used real-world, moderate-size datasets, the SCRIPT and CRITICAL databases, which can be easily implemented on the free downloadable version of Neo4j. We believe this strategy makes this work immediately applicable to many ongoing projects that utilize the OMOP CDM. Additionally, we focused on a subset of the OMOP standardized clinical tables and vocabulary; however, those tables represent the majority of clinical data categories, and ones that are included in most OMOP instances. Those wishing to extend this model to all OMOP tables can readily apply the same “edge-centric” strategy to the other tables. Furthermore, we selected only two of the four versions we developed for performance evaluation. Based on our experience querying the various versions, we have no reason to believe that performance of Versions 2 and 3 would lie on a continuum between Versions 1 and 4. Finally, although the graph database was only implemented in a single graph database platform, we have no reason to believe that an edge-centric mapping strategy would not be applicable to other graph database systems.

Building upon this work, we will continue working with the SCRIPT project researchers to apply clustering, classification, and prediction algorithms to the graph database to contribute novel insights into the biological processes of pneumonia patients, patient trajectories, and associations between drug and diseases.


Conclusion

We developed a method of transforming OMOP CDM databases to graph databases. Our experimental results show that our final model performed better than the initial naïve relational-to-graph version with respect to code simplicity and query time on complex queries. The use of graph databases in conjunction with RDBMS and other analytic approaches offers more tools to researchers to identify new biological insights.


Clinical Relevance Statement

This work illustrates the implementation method of a graph database for EHRs that excels in answering clinical questions and finding potential patterns behind highly connected datasets using graph algorithms. It sets an example for clinical researchers to transform the OMOP CDM to the graph database and the method can be immediately applied to any OMOP database.


Multiple-Choice Questions

  1. When implementing a graph database for EHR, which of the following is most recommended?

    a. Think ahead of the clinical questions to ask.

    b. Normalize the database to 3NF.

    c. Convert the entities to nodes and relationships to edges.

    d. Search previously published graph database design papers.

    Correct Answer: The correct answer is option a. A graph database performs best when the schema is optimized for the most frequently asked queries or graph algorithms. Option b is not necessary; option c is generally good practice but not always correct. Because a graph schema is flexible, the article recommends starting from a whiteboard and targeting the research questions rather than referring to previous schemas.

  2. Which of the following is the type of question that Version 4 of our graph schema can answer but Version 1 cannot?

    a. Basic pattern matching/filtering questions.

    b. Community detection questions.

    c. Ontology-related questions.

    d. Patient diagnosis classification.

    Correct Answer: The correct answer is option c. Explanation: The questions described in a, b, and d can be answered in both versions, but Version 1 does not have specific vocabulary nodes and properties, and therefore cannot answer ontology-related questions.



Conflict of Interest

J.A.A.-G. is an employee of Neo4j. J.A.A.-G. joined the project after Neo4j was selected as our graph database engine and did not play a role in that decision. All other authors have no relevant conflicts to disclose.

Acknowledgements

We thank Dong Fu and the Feinberg School of Medicine Information Technology department for technical assistance. We also wish to thank Dr. Leah Welty for statistical guidance. We thank Dr. Yuan Luo and his team for providing the CRITICAL database for performance test. We also thank Dr. Richard Wunderink for providing guidance on clinical questions to test the database performance. We thank Dr. Nicholas Soulakis for his help in provisioning the on-prem Neo4j database server. Finally, we thank Dr. David Stumpf for providing optimization suggestions on Cypher queries.

Protection of Human and Animal Subjects

This study was conducted in accordance with the ethical standards of the institutional review board (IRB). All procedures involving human participants were reviewed and approved by the IRB of Northwestern University (STU00204868 for SCRIPT study and STU00212016 for CRITICAL study).



Address for correspondence

Mengjia Kang, MS
Feinberg School of Medicine, Northwestern University
Chicago, Illinois
United States   

Publication History

Received: 23 August 2022

Accepted: 27 August 2024

Article published online:
04 December 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

