Patient Matching Analyzers Project
Mentors: Shaun Grannis, James Egg and Gary Teichrow
Assigned to: Nyoman Ribeka
Abstract
In order to create a comprehensive, aggregate view of an individual's clinical status, patient-specific data must be matched and aggregated across many scattered data sources that use many different patient identifiers.
Project Plan
Project A: Incorporate the "non-match rate estimator" (random sampler) into the Patient Matching GUI workflow
The current patient matching application uses the Expectation Maximization (EM) algorithm to estimate the weight of each field in a record. The initial non-match rate used by this algorithm can be supplied by the user. This part of the project aims to add a feature that lets the user generate this initial value with a random sampling technique instead.
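The random sampling idea can be sketched as follows. This is a minimal Python illustration, not code from the patient matching application. It assumes records are dictionaries of field values and that the quantity to estimate is, for each field, the rate at which two randomly chosen records agree. Because nearly all random pairs are non-matches, that rate approximates the non-match agreement probability (often called the u-value) that can seed EM. The function name `estimate_agreement_rates`, its parameters, and the normal-approximation confidence interval are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]
Estimate = Tuple[float, Tuple[float, float]]  # (estimate, (ci_low, ci_high))


def estimate_agreement_rates(
    records: Sequence[Record],
    fields: List[str],
    sample_size: int,
    seed: Optional[int] = None,
    z: float = 1.96,
) -> Dict[str, Estimate]:
    """Hypothetical random sampler: per field, how often do two random records agree?

    Since almost every random pair is a non-match, these rates approximate
    non-match agreement probabilities. The interval is the normal
    approximation p +/- z * sqrt(p(1-p)/n); z=1.96 gives roughly 95%.
    """
    if len(records) < 2:
        raise ValueError("need at least two records to form a pair")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rng = random.Random(seed)
    agree = {f: 0 for f in fields}
    for _ in range(sample_size):
        i, j = rng.sample(range(len(records)), 2)  # two distinct records
        for f in fields:
            a, b = records[i].get(f), records[j].get(f)
            # Assumption: a missing value never counts as agreement.
            if a and b and a == b:
                agree[f] += 1
    estimates: Dict[str, Estimate] = {}
    for f in fields:
        p = agree[f] / sample_size
        half_width = z * math.sqrt(p * (1 - p) / sample_size)
        estimates[f] = (p, (max(0.0, p - half_width), min(1.0, p + half_width)))
    return estimates
```

Re-running with a different `seed` corresponds to redoing the sampling; increasing `sample_size` narrows the interval, since its half-width shrinks with the square root of the number of sampled pairs.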
- Add a new parameter to the current blocking run that indicates when to use the random sampling process
- Gather requirements on which parameters are needed, where to include them, and how they will affect the current record matching application
- Add logic to accept or redo the random sampling process
- Each run of the random sampling process may produce a different result, with its own variance and confidence interval. This logic will enable the user to redo the random sampling process when the result is not satisfactory
- Add logic to let the user choose whether or not to use random sampling
- This is the main logic for incorporating the random sampling process into the patient matching application. With this logic, the user will be able to choose how the initial value for the EM process is generated: either a manually entered value or a value generated by random sampling.
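The accept/redo logic and the manual-versus-sampled choice described in the tasks above can be expressed as a small control loop. In this hypothetical Python sketch, `run_sampler` stands in for one run of the random sampler (returning an estimate and its confidence interval) and `accept` stands in for the user's accept/redo decision in the GUI; `choose_initial_value`, `narrow_enough`, and the 0.05 interval-width rule are illustrative assumptions, not names or settings from the existing application.

```python
from typing import Callable, Optional, Tuple

Estimate = Tuple[float, Tuple[float, float]]  # (value, (ci_low, ci_high))


def choose_initial_value(
    use_random_sampling: bool,
    manual_value: Optional[float],
    run_sampler: Callable[[], Estimate],
    accept: Callable[[Estimate], bool],
    max_attempts: int = 5,
) -> float:
    """Return the starting value for EM, either typed in or sampled.

    In sampling mode, the sampler is re-run ("redo") until `accept` approves
    a result or `max_attempts` runs have been rejected.
    """
    if not use_random_sampling:
        if manual_value is None:
            raise ValueError("manual mode requires a user-supplied value")
        return manual_value
    for _ in range(max_attempts):
        estimate = run_sampler()
        if accept(estimate):
            return estimate[0]
    raise RuntimeError("no sampled estimate was accepted; fall back to manual input")


def narrow_enough(estimate: Estimate, max_width: float = 0.05) -> bool:
    """Example acceptance rule: the confidence interval is at most max_width wide."""
    low, high = estimate[1]
    return high - low <= max_width
```

Keeping the acceptance rule as a parameter lets the same loop serve both an automatic policy like `narrow_enough` and an interactive one where the GUI asks the user to accept or redo.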
Project B: Develop and implement a framework for identifying duplicates in the OpenMRS patient table
- Evaluate the advantages and disadvantages of two potential approaches to identifying duplicates in the OpenMRS patient table. Those approaches are:
- Develop an OpenMRS de-duplication module that leverages the OpenMRS patient matching module with specific de-duplication business rules
- Develop a de-duplication process within WebReach's EIS framework that incorporates the OpenMRS patient matching module's algorithm
- *Decide* which of the two general approaches to take
- OpenMRS module
- EIS framework
** After deciding the pathway forward, the following processes should be implementable in either an OpenMRS module or the EIS framework. Once a decision is made, the specifics for each context can be fleshed out. **
- Define business requirements for de-duplication
- Batch mode vs. real-time
- For discussion: I would advocate for batch mode initially
- May eventually (but not initially) want to call a "duplicate check" function before creating new patients
- Determine UI requirements/Administrative Functions/Workflow within OpenMRS
- For discussion: Given time constraints, the end point for this GSoC project may be a simple report that lists potential duplicates, grouped by similar identity.
- Many more potential requirements
- Review OpenMRS data model as it relates to identifying patient data:
- Evaluate the structure of the OpenMRS patient table(s) to understand how all relevant identifying patient data is stored in OpenMRS
- Develop a process to access OpenMRS patient-identifying data
- Develop a process to adapt the matching algorithm for use against the OpenMRS patient table. (This may require extracting the algorithm from the OpenMRS patient matching module and placing it in the EIS framework.)
- Develop a process to evaluate each patient record for potential duplicates. At a high level, this process may look like:
- Read patient records from OpenMRS patient table(s)
- Match each patient record against other potentially matching patient records
- Identify potential duplicates (using probabilistic algorithm derived from OpenMRS patient matching module)
- Develop a process to order potential duplicates by identity group so that human review is easier.
- For example, if rec1 = rec3 and rec3 = rec21 and rec1 = rec17, then {rec1,rec3,rec17,rec21} are all the same entity and should be listed in the same “potential duplicate group” for easier review by the administrative user
- Develop a process that allows the user to review patient groups (e.g., a simple list report vs. a complex merge-purge interface)
- (Time permitting -- this could get complicated) Develop processes to merge ("join")/purge ("invalidate") patient records in OpenMRS. These processes must interface with recent API changes related to voiding/invalidating/retiring OpenMRS data.
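The evaluation and grouping steps in the plan above could look roughly like the following Python sketch. It is not the OpenMRS patient matching module's implementation. It assumes Fellegi-Sunter style scoring with hypothetical per-field m- and u-probabilities and a single hypothetical score threshold, and it compares every pair of records without blocking (a real run over the OpenMRS patient table would likely need blocking to stay tractable). Linked pairs are then grouped with a union-find structure, so transitive matches such as rec1 = rec3 = rec21 land in one potential duplicate group. The functions `pair_score`, `find_duplicate_pairs`, and `group_duplicates` are illustrative names only.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def pair_score(a: Record, b: Record, m: Dict[str, float], u: Dict[str, float]) -> float:
    """Sum of per-field log2 likelihood ratios (Fellegi-Sunter style).

    m[f]: assumed P(field f agrees | records match)
    u[f]: assumed P(field f agrees | records do not match)
    Both must lie strictly between 0 and 1 and share the same keys.
    """
    score = 0.0
    for f in m:
        va, vb = a.get(f), b.get(f)
        if va and vb and va == vb:  # agreement weight
            score += math.log2(m[f] / u[f])
        else:  # disagreement weight (missing values treated as disagreement here)
            score += math.log2((1 - m[f]) / (1 - u[f]))
    return score


def find_duplicate_pairs(
    records: Dict[str, Record], m: Dict[str, float], u: Dict[str, float], threshold: float
) -> List[Tuple[str, str]]:
    """Compare every pair of records (no blocking) and keep high-scoring pairs."""
    return [
        (x, y)
        for x, y in combinations(records, 2)
        if pair_score(records[x], records[y], m, u) >= threshold
    ]


def group_duplicates(pairs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Union-find: merge linked records so transitive matches share a group."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

Applied to the rec1/rec3/rec17/rec21 example in the plan, `group_duplicates([("rec1", "rec3"), ("rec3", "rec21"), ("rec1", "rec17")])` returns a single group containing all four records. Union-find computes these connected components in near-linear time, which keeps the grouping step cheap relative to the pairwise comparisons.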
Deliverables
Midterm
The midterm deliverables will be the completion of Project A and part of Project B (the "Review OpenMRS data model" item of Project B)
Final
A simple report (list) of the potentially duplicate patient records in the OpenMRS database
