Transforming OpenMRS Data into OMOP CDM: A Journey Towards Interoperable Health Records

OpenMRS plays a vital role in healthcare delivery across diverse implementations, helping clinicians record patient information, manage visits, and track conditions over time. As healthcare organizations increasingly look to data for insights, a challenge has emerged: how can this rich, localized data be shared, analyzed, and understood across systems and countries?

This is where the OMOP Common Data Model (CDM) comes in. As a global standard for structuring observational health data, OMOP makes it possible to turn fragmented clinical records into powerful tools for research, population health, and evidence-based policy. For OpenMRS, aligning with OMOP opens up new frontiers in data interoperability and reuse, connecting local care delivery with global health discovery.

The OpenMRS-OMOP CDM mapping project was born out of this need. What started as a technical ambition to align an open-source medical records system with a global data standard evolved into a cross-community collaboration focused on interoperability, reusability, and research readiness.

In this article, we walk through the journey: the motivations that sparked the effort, the approach taken by the engineering team, the mapping logic behind key tables, and the outcomes that are already making OpenMRS data more interoperable than ever before.

The goal: A standardized, research-ready data model

OpenMRS is used in hospitals and clinics across different countries to manage patient health records. These systems store data in a relational format, but the structure and naming conventions are often tailored to each implementation, making it difficult to reuse the data for broader research or analytics.

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open-source data standard developed by the Observational Health Data Sciences and Informatics (OHDSI) community. OMOP defines a way to consistently represent clinical data so that it can be used for observational research and large-scale analytics without having to clean and reformat it each time.

OHDSI is a global collaborative network with strong participation in the US, Asia-Pacific, and Latin America. Interest is also growing in Africa, where at least two countries are already working to translate their health data into the OMOP format.

Mapping OpenMRS data to OMOP allows implementers to unlock access to tools like ATLAS, a web-based platform for cohort discovery, treatment pathway analysis, and outcome studies. More importantly, it means that OpenMRS users can contribute to and benefit from global research networks without needing to rework their data.

Early questions and practical constraints

Before writing any transformation logic, the team needed a deep understanding of the OMOP Common Data Model: what tables exist, how they’re structured, and how OpenMRS data could fit into them. The CDM includes dozens of tables, like person, condition_occurrence, visit_occurrence, and observation, each with required fields and standard vocabularies.

To stay focused, the team began with a minimal subset of tables that covered essential clinical data. These included person, visit_occurrence, condition_occurrence, observation, note, and location. Starting small allowed for experimentation and validation without becoming overwhelmed by the model’s full complexity.

Navigating OpenMRS-specific challenges

The OpenMRS data model is flexible and expressive, but this also introduced complexity in mapping it to OMOP. For instance, OpenMRS observations are highly customizable and can represent everything from numeric lab values to free-text notes. The team had to decide how to split these into OMOP’s separate measurement, observation, and note tables.
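As a rough illustration of that split, here is a minimal sketch in Python. The routing rule (numeric values to measurement, free text to note, everything else to observation) is an assumption made for the example, not the team's documented logic, and the Obs type is a simplified stand-in for an OpenMRS obs row.

```python
from typing import Optional, TypedDict


class Obs(TypedDict):
    """Simplified stand-in for an OpenMRS obs row (value columns only)."""
    obs_id: int
    value_numeric: Optional[float]
    value_coded: Optional[int]
    value_text: Optional[str]


def target_table(obs: Obs) -> str:
    """Pick an OMOP table for one obs row (illustrative rule, an assumption)."""
    if obs["value_numeric"] is not None:
        return "measurement"   # numeric results such as lab values
    if obs["value_text"]:
        return "note"          # non-empty free text
    return "observation"       # coded answers and everything else


rows = [
    Obs(obs_id=1, value_numeric=5.4, value_coded=None, value_text=None),
    Obs(obs_id=2, value_numeric=None, value_coded=1065, value_text=None),
    Obs(obs_id=3, value_numeric=None, value_coded=None, value_text="Patient stable."),
]
routed = {row["obs_id"]: target_table(row) for row in rows}
print(routed)
```

A real pipeline would more likely route on the domain of the mapped OMOP concept than on which value column is filled, but the shape of the decision is the same.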

In many cases, OMOP expected fields that OpenMRS does not capture, such as note_title or language_concept_id. When no direct equivalent existed, the team used reasonable defaults or left values blank. Throughout, the goal was to make the mappings consistent, interpretable, and future-proof.
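One way to keep such defaults consistent is to declare them once and apply them to every row. The sketch below assumes an empty string for note_title and concept ID 0 (OMOP's "no matching concept") for language_concept_id; both are placeholder choices for illustration, and with_defaults is a hypothetical helper rather than part of the project.

```python
from typing import Any, Dict

# Assumed defaults for fields OpenMRS does not capture.
NOTE_DEFAULTS: Dict[str, Any] = {
    "note_title": "",           # placeholder; no OpenMRS equivalent
    "language_concept_id": 0,   # 0 means "no matching concept" in OMOP
}


def with_defaults(row: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row where missing or None fields take the default."""
    filled = dict(defaults)
    filled.update({k: v for k, v in row.items() if v is not None})
    return filled


source_row = {"note_id": 1, "note_text": "Patient stable.", "note_title": None}
note_row = with_defaults(source_row, NOTE_DEFAULTS)
print(note_row)
```

Keeping the defaults in one dictionary makes them easy to document and to change later if a better source for a field turns up.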

Building the ETL: Iterating in public

To explore how OpenMRS data could be mapped into the OMOP Common Data Model, the team began with a set of SQL transformation scripts. These helped prototype how key tables like person, observation, condition_occurrence, and note could be populated using OpenMRS fields and concepts. Every transformation decision was documented on the OpenMRS Wiki, including source fields, filtering rules, concept mappings, and hardcoded values.

This open documentation helped others in the OpenMRS and OHDSI communities follow the logic and offer feedback. One of the earliest complexities the team encountered was how to handle concept mappings. Rather than using OpenMRS concept IDs directly, the team built a mapping table based on the CIEL vocabulary, using SAME-AS relationships to link OpenMRS concepts to standard OMOP IDs. This ensured that data remained reusable and interpretable outside the OpenMRS ecosystem.
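The idea can be sketched with an in-memory SQLite database. The table shapes below (ciel_map, omop_concept) are simplified, hypothetical stand-ins, and the codes and IDs are made up. The sketch also assumes that a SAME-AS mapping points to a code in a vocabulary that OMOP treats as standard, and that concepts with no such match fall back to concept ID 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened view of CIEL mappings for OpenMRS concepts.
CREATE TABLE ciel_map (openmrs_concept_id INTEGER, map_type TEXT,
                       vocabulary TEXT, code TEXT);
-- Simplified slice of the OMOP concept table.
CREATE TABLE omop_concept (concept_id INTEGER, vocabulary_id TEXT,
                           concept_code TEXT, standard_concept TEXT);
INSERT INTO ciel_map VALUES
  (101, 'SAME-AS',       'SNOMED', 'A1'),
  (102, 'NARROWER-THAN', 'SNOMED', 'B2'),
  (103, 'SAME-AS',       'SNOMED', 'C3');
INSERT INTO omop_concept VALUES
  (9001, 'SNOMED', 'A1', 'S'),
  (9002, 'SNOMED', 'B2', 'S');
""")

# Only SAME-AS mappings to standard concepts count; everything else maps to 0.
mapping = dict(conn.execute("""
    SELECT m.openmrs_concept_id,
           COALESCE(MAX(c.concept_id), 0) AS omop_concept_id
    FROM ciel_map AS m
    LEFT JOIN omop_concept AS c
      ON m.map_type = 'SAME-AS'
     AND c.vocabulary_id = m.vocabulary
     AND c.concept_code = m.code
     AND c.standard_concept = 'S'
    GROUP BY m.openmrs_concept_id
    ORDER BY m.openmrs_concept_id
""").fetchall())
print(mapping)
```

Concept 102 is skipped because its only mapping is not SAME-AS, and concept 103 is skipped because its code has no standard OMOP concept, so both end up as 0 and can be reviewed later.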

As confidence in the mappings grew, the team began experimenting with tools to automate and structure the transformation pipeline. Early attempts using dbt proved limiting: a dbt project works against a single database connection, while OpenMRS runs on MariaDB and OMOP typically lives in PostgreSQL.

Experimenting with SQLMesh and future integration approaches

To overcome these limitations, the team began using SQLMesh, a transformation framework that supports multiple database engines and allows flexible, version-controlled data modelling. Using SQLMesh, they developed transformation models that describe how OpenMRS entities map to OMOP tables. The framework manages these as views or tables and supports automated deployment of changes.

This approach allowed the team to start building an ETL pipeline without having to rewrite the entire OpenMRS data layer. The models are open-sourced at jayasanka-sack/openmrs-to-omop, and the engineering notes describe how each table, such as observation_period, note, and condition_occurrence, was structured, filtered, and mapped.

To bridge the gap between OpenMRS in MariaDB and OMOP in PostgreSQL, the team considered using SymmetricDS or similar replication tools. One alternative is to transform OpenMRS data into an OMOP-like format inside the source database (MariaDB), then replicate the result to PostgreSQL for integration with OHDSI tools.

Currently, the team is dockerizing the setup so that the entire pipeline works out of the box. This would allow new implementations to get started faster and with fewer environment-specific dependencies. They are also automating the data flow to keep the OMOP database in sync with the OpenMRS instance, reducing the manual effort required to maintain up-to-date exports.

Validating the OMOP CDM output

Once the transformations were in place, the team focused on validating the generated OMOP tables. First came structural validation: checking that required fields existed, column types matched OMOP specs, and values appeared as expected. Where data was missing or incorrectly typed, they updated the SQLMesh models accordingly.
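A structural check of this kind can be sketched in a few lines of Python. PERSON_SPEC below is a partial, illustrative subset of the person table, and structural_errors is a hypothetical helper; neither is taken from the project's code.

```python
from typing import Any, Dict, List, Tuple

# Partial, illustrative spec: column -> (expected Python type, required?)
PERSON_SPEC: Dict[str, Tuple[type, bool]] = {
    "person_id": (int, True),
    "gender_concept_id": (int, True),
    "year_of_birth": (int, True),
    "month_of_birth": (int, False),
}


def structural_errors(row: Dict[str, Any],
                      spec: Dict[str, Tuple[type, bool]]) -> List[str]:
    """List missing required values and type mismatches for one row."""
    errors: List[str] = []
    for column, (expected_type, required) in spec.items():
        value = row.get(column)
        if value is None:
            if required:
                errors.append(f"{column}: required value is missing")
        elif not isinstance(value, expected_type):
            errors.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


good = {"person_id": 1, "gender_concept_id": 8507, "year_of_birth": 1990}
bad = {"person_id": 2, "gender_concept_id": None, "year_of_birth": "1990"}
print(structural_errors(good, PERSON_SPEC))
print(structural_errors(bad, PERSON_SPEC))
```

Running a check like this over a sample of exported rows gives a quick list of columns whose models need attention before heavier profiling tools are brought in.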

Then came semantic validation. For example, in condition_occurrence, the team ensured that only non-voided data was included and that each condition was mapped using a SAME-AS relationship in the CIEL vocabulary. In note, they filtered for encounter type 8 and ensured the value_text field was not null.
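The note filter described above can be reproduced on toy data with SQLite. The obs and encounter tables here are cut down to the columns the filter needs, the sample rows are invented, and applying the non-voided rule to notes as well is an assumption carried over from the condition_occurrence rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down versions of the OpenMRS encounter and obs tables.
CREATE TABLE encounter (encounter_id INTEGER, encounter_type INTEGER);
CREATE TABLE obs (obs_id INTEGER, encounter_id INTEGER,
                  value_text TEXT, voided INTEGER);
INSERT INTO encounter VALUES (1, 8), (2, 3);
INSERT INTO obs VALUES
  (10, 1, 'Follow-up in two weeks.', 0),    -- kept
  (11, 1, NULL, 0),                         -- dropped: no text
  (12, 1, 'Entered in error.', 1),          -- dropped: voided
  (13, 2, 'Different encounter type.', 0);  -- dropped: type is not 8
""")

# Keep only non-voided text obs recorded in encounters of type 8.
note_obs_ids = [row[0] for row in conn.execute("""
    SELECT o.obs_id
    FROM obs AS o
    JOIN encounter AS e ON e.encounter_id = o.encounter_id
    WHERE e.encounter_type = 8
      AND o.value_text IS NOT NULL
      AND o.voided = 0
    ORDER BY o.obs_id
""")]
print(note_obs_ids)
```

Comparing the IDs that survive the filter against the rows that actually landed in the OMOP note table is one simple way to confirm the model applies the intended rules.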

To visualize the results, they set up Achilles, which profiles the data, and connected it to ATLAS, the OHDSI web interface. Initial connection issues were resolved by adjusting the connection string and schema configuration. Once corrected, they were able to view person-level profiles, cohort statistics, and more, confirming that the OMOP export was not only structurally sound but ready for research.

In addition to ATLAS and Achilles, OHDSI provides a full suite of open-source tools, including packages for data quality checks, vocabulary mapping, and large-scale analytics. These tools are actively used by researchers around the world to explore treatment pathways, conduct safety studies, and generate real-world evidence.

What does this mean for OpenMRS and the future?

The successful mapping of OpenMRS data to the OMOP Common Data Model opens up new possibilities for data interoperability and research. OpenMRS implementations can now experiment with OHDSI’s analytical stack, bringing their data into alignment with a global ecosystem of tools and researchers.

This effort lays the foundation for more reusable ETL pipelines, data quality validation, and federation with other OMOP-based datasets. It also means that implementers can explore these benefits without changing how data is collected in clinics.

As the community builds on this work, future improvements may include a more automated pipeline, better terminology mapping, and support for additional OMOP entities. But even at this stage, the OMOP integration project is already expanding what OpenMRS can do and where its data can go.

One thought on “Transforming OpenMRS Data into OMOP CDM: A Journey Towards Interoperable Health Records”

  1. Wow, @Prince Onyeanuna! There is a lot going on with OMRS and OHDSI! Thank you! Just got off a call with another team that had also done an ETL from the Bahmni version of OpenMRS to the OMOP CDM. The Africa OHDSI working group is actually very active with regular meetings every two weeks. There have already been multiple datasets transformed in Africa, including the LAISDAR project in Rwanda (16 hospitals) and the APHRC DHS (survey) database as well as a dataset in South Africa. The largest opportunity is probably the Malawi effort to convert their entire EPGAF data lake into OMOP.

    It is interesting to hear the specifics around the challenges with converting the EAV data model to Postgres and the OMOP CDM. I am also interested in the semantic mapping since CIEL is a non-standard OMOP terminology (currently) and there may be instances where there are SAME-AS CIEL codes that do not map to a SAME-AS SNOMED code required for the CDM. We are exploring how CIEL can not just map “uphill” to the CDM, but also allow for CIEL-level specificity to be included in the CDM (storing the CIEL code initially in the SOURCE_CONCEPT field).

    One request of the OHDSI Africa Working Group is for a good demo data set that can be used for testing and training. They have a link to the GIT repo with our 5000 patient demo data. Did you use this data for your development work? Medications will be a key element to add to the ETL next. Although medication ERAs are complex, it is a key area for research and we should be looking to close that gap sometime.

    Africa Chapter Meeting: Monday, June 9 at 10am ET (https://ddec1-0-en-ctp.trendmicro.com/wis/clicktime/v1/query?url=https%3a%2f%2fteams.microsoft.com%2fl%2fmeetup%2djoin%2f19%253ameeting%5fMzQ5NWRiMDUtZWNhNy00OTY1LWE3NWMtOGM4MmY0OWZmZmNm%2540thread.v2%2f0%3fcontext%3d%257b%2522Tid%2522%253a%2522a30f0094%2d9120%2d4aab%2dba4c%2de5509023b2d5%2522%252c%2522Oid%2522%253a%25229d4c783b%2df934%2d4ba7%2db54d%2d2dbeaa089094%2522%257d&umid=9c066dc4-fa77-4e4b-bae9-6c5c1f564bca&auth=3eee6d57317a38631073652579c10dc620ca2b41-ace37fd8db23d6433cb8237c4fb244574d822d17) Meetings are traditionally scheduled for every other Monday at 10am ET
