Patient Matching Project


Mentor: Shaun Grannis
Contributor: James Egg
Intern: Saketh Bhamidipati

Introduction

Record linkage is the task of identifying scattered pieces of information that refer to the same entity. Patient matching is a specific application: identifying records that belong to the same patient across different data sources. These sources range from patient data collected at different hospitals to external information from governmental institutions, such as the Death Master File.

One of the interesting and challenging aspects of this project is dealing with erroneous data, for instance a misspelled name or an incorrectly entered birth date. Such errors are common in practice, and we can account for them by using flexible distance metrics and statistical models.
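
As a rough illustration of a flexible distance metric, the sketch below scores two name strings with a normalized Levenshtein edit distance. This is only one possible comparator, chosen here for brevity; the function names `levenshtein` and `similarity` are assumptions made for this example and are not taken from the patient matching module.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to [0, 1]; 1.0 means identical after normalization."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("Jonathan", "Jonathon"))  # one substitution in eight characters
```

A threshold on such a score lets a misspelled name still count as an agreement, whereas exact string equality would treat it as a disagreement.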

Why, then, is record linkage important, and what are its benefits?

Well, we are living in an exciting period of globalization, where computers and the internet make worldwide collaboration both easy and necessary. Patient linkage and data aggregation techniques will allow medical institutions to store their own data, yet at the same time work together with others to offer better treatment to patients.

For instance, patients often forget their test results at home, and old tests eventually get lost. Imagine that all your medical records are stored in digital format, so that when you go to Hospital A, a doctor there can examine your tomogram taken 4 years ago at Hospital B, even though the clerk there misspelled your name.

I hope that record linkage functionality will be a step toward greater collaboration between OpenMRS implementers.

Project Plan

A. User interface enhancements (Improve OpenMRS UI component layout and labeling)

Deliverable(s): Functional modifications to the OpenMRS de-duplication user interface code and modified workflow code
Length of time: 4-6 weeks
Framework for this task:

      Select specific modifications to implement (2-7 days)
      Design modifications (1 week)
      Implement modifications (2-3 weeks)
      Test modifications (1 week)
      Deploy modifications on link.regenstrief.org (2-7 days)
      UI Modifications:
      1. Basic UI component modifications:
              a. Change "run report" link into a button
              b. Include descriptions for the radio buttons ("Ignore", "Must Match", and "Should Match") under patient matching configure --> matching strategies
              c. On the match configuration screen, use radio buttons: "Ignore", "Must Match", and "Should Match" ("Must Match" = blocking, "Should Match" = include)
              d. Change "Matching Configuration" labels to "Matching Strategy"
              e. Change "Configuration Name" labels to "Strategy Name"
      2. Process display modifications:
              a. While the process is running, display a sequential list of steps and mark each one as it is completed
              b. While the process is running, check status every 5 seconds (not 30) and don't show a countdown to the next step
              c. While the process is running, display a [% complete] indicator before the status message, based on the number of steps to be performed
              d. Create summary statistics after a report is complete (e.g., total number of patient records in the OpenMRS patient table, total number of distinct patient groups, etc.)
      3. Functional modifications:
              a. Implement the ability to schedule multiple matching tasks, select different matching strategies, choose who gets emailed, etc.
              b. Implement a drag-and-drop interface for creating matching strategies (complex, may defer)
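
To make the roles in item 1c concrete, here is a minimal sketch of how the three radio-button choices could drive a matching run: "Must Match" fields act as blocking keys, "Should Match" fields are compared within each block, and "Ignore" fields are skipped. The `STRATEGY` dictionary, the field names, and both functions are hypothetical and only illustrate the idea; they do not reflect the module's actual data structures.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical strategy: field name -> "ignore" | "must_match" | "should_match"
STRATEGY = {
    "family_name": "must_match",
    "birth_year": "must_match",
    "given_name": "should_match",
    "phone": "ignore",
}


def candidate_pairs(records, strategy):
    """Yield pairs of records that agree exactly on every Must Match field."""
    blocking_fields = sorted(f for f, role in strategy.items() if role == "must_match")
    blocks = defaultdict(list)
    for record in records:
        key = tuple(record.get(f) for f in blocking_fields)
        if None in key:
            continue  # a record missing a blocking value cannot be placed in a block
        blocks[key].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def agreement_vector(a, b, strategy):
    """Compare a pair on Should Match fields only; Ignore fields are skipped."""
    return {f: a.get(f) == b.get(f)
            for f, role in strategy.items() if role == "should_match"}
```

Blocking keeps the number of comparisons manageable, since only records that share every "Must Match" value are ever paired; the trade-off is that a true duplicate with an error in a blocking field is never compared under that strategy.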

B. Develop feature selection framework for duplicate identification process

Deliverable(s): Description of the approach to automatically selecting de-duplication features; functional feature selection code
Length of time: 4-6 weeks (longer?)
Framework for this task:

      Gather requirements for feature selection ("which features?" -- e.g., blocking variables, string comparators, # of u-value samples, etc.)
      Select matching features to be automatically selected (define scope that fits into GSoC timeline)
      Design process to characterize data, which will guide feature selection (identify data metrics)
      Create strategy for incorporating data metrics into a feature selection process
      Implement feature selection process for features in-scope for GSoC
      Test feature selection process
      Deploy feature selection process
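
As one possible reading of "data metrics" and "u-value samples" in the steps above, the sketch below computes two simple per-field measures: completeness (the share of records with a value) and an estimated u-value (how often two randomly sampled records agree on the field by chance, the standard meaning of u in Fellegi-Sunter linkage). The function names, the `n_samples` default, and the assumption that records are dictionaries are all illustrative choices, not the project's design.

```python
import random


def completeness(records, field):
    """Fraction of records that have a non-empty value for the field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def estimate_u_values(records, fields, n_samples=10000, seed=0):
    """Estimate, per field, how often two randomly drawn records agree.

    Random pairs are assumed to be overwhelmingly non-matches, so the
    agreement rate approximates the chance-agreement probability u.
    Requires at least two records.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    agree = {f: 0 for f in fields}
    for _ in range(n_samples):
        a, b = rng.sample(records, 2)  # two distinct records
        for f in fields:
            if a.get(f) is not None and a.get(f) == b.get(f):
                agree[f] += 1
    return {f: agree[f] / n_samples for f in fields}
```

Under these assumptions, a field with high completeness and a low u-value (such as a near-unique identifier) is more discriminating than one with a high u-value (such as sex), which is the kind of signal an automated feature selection process could use.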

C. Maintain link.regenstrief.org as a test environment

Deliverable(s): Up-to-date functional OpenMRS instantiation for development and testing
Length of time: Ongoing during GSoC
Framework for this task:

      Ad hoc: Work with Shaun and James to maintain link.regenstrief.org during GSoC

D. Ensure availability of appropriate data for testing record linkage

Deliverable(s): Appropriate data set(s) for testing record linkage
Length of time: First ~4 weeks of GSoC
Framework for this task:

      Work with Shaun and James to understand the requirements for data sets used to test linkage (1-2 weeks)
      Work with James to review the process for bulk loading test data into OpenMRS
      Work with Shaun and James to create or obtain new data sets as needed for development and testing (1-2 weeks)

Patient Matching Blog

http://emrin.blogspot.com

Resources

Browse module code