Synchronization and Data Consistency


This section aims to catalogue the data conflicts that may be encountered during data synchronization in the proposed sync system.

Introduction

This section discusses the data consistency requirements for synchronization. In the theory of distributed systems, much has been written on the design of strongly and weakly consistent systems. What is less clear is how those concepts should apply to clinical practice.

Discussion

In an attempt to shed light on what 'weakly consistent' may mean in terms of an EMR, we hope to start a detailed discussion of the clinical importance of data consistency in specific scenarios. For example: what, if any, action should be taken if clinician A modifies an obs record on a previous encounter that clinician B entered into the system a couple of weeks earlier? Is it clinically significant? More generally, how is the 'clinical significance' of change events determined? Consequently, in the case of synchronization, what guarantees, if any, should be placed on a) keeping track of and b) presenting events that are deemed clinically significant but occurred at different locations while a given node was disconnected from the rest of the system?

  • Perhaps I'm not understanding the question posed here, but obs might not be the best example of the relevant point. We have a "baked-in" process for auditing altered obs. That is, obs are never "deleted" per se. They are simply voided (via a flag) and another obs row is written, the idea being that we have a record of all reported values for a given observation. The primary reason for this is that treatment decisions could, and likely would, be made between the time the original value was reported and the time it was ultimately corrected or changed. Bottom line: all changes to obs must make their way to child databases in the same way as their original values. -Paul
  • Maros: You are right that obs isn't a good example of simple data loss when history isn't tracked: clearly, if we didn't have any history, then the 'last one in' wins (*not* the last one modified) and we have no idea what the previous value was. However, I find obs interesting because it demonstrates that the question of 'consistency', at least theoretically, goes beyond record version tracking and history. In the example above, when clinician A changes the obs record, OpenMRS today would flag the obs value as voided, and clinician B would see that if he/she were to open the encounter where the obs change was made. However, given that the encounter happened two weeks ago, it may no longer be the latest encounter for that patient, so I wonder how likely it is that clinician B would realize the change has occurred the next time he/she sees the patient. In other words, is it a reasonable expectation that clinician B would always review the entire patient history? Or is it more likely that the clinician would only review data 'since the last time he/she saw the patient', and if so, how do we determine the changes 'since the last time the clinician saw the patient chart'? Let's note that this isn't unique to allowing offline data access; the scenario is equally valid in a fully connected system. To me, the relevance to synchronization comes from the fact that in disconnected systems people are forced to formalize the system behavior in these scenarios.
    In the obs case, I can see how we could support several ways of handling this scenario, assuming we keep the history in the 'right order': a) do nothing, simply collect data; b) collect data and present a summary of changes since the last sync; c) restrict some operations to 'online' mode only; d) implement some version of a 'logical lock' on parts of the data model (say obs and orders), so that any change made to the records without first acquiring the lock globally would be deemed 'tentative' only -- not sure how workable this would be if some child sites could be offline for days at a time.
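To make the void-and-rewrite behavior and option (b) above concrete, here is a minimal sketch. The `Obs` fields and the `correct_obs` and `changes_since` helpers are illustrative assumptions for this discussion, not the actual OpenMRS schema or API.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Obs:
    # Hypothetical, simplified row; real obs rows carry many more columns.
    obs_id: int
    concept: str
    value: float
    date_created: datetime
    voided: bool = False
    date_voided: Optional[datetime] = None


def correct_obs(rows: List[Obs], obs_id: int, new_value: float, when: datetime) -> List[Obs]:
    """Void the live row `obs_id` and append a replacement row; nothing is deleted."""
    out: List[Obs] = []
    original: Optional[Obs] = None
    for row in rows:
        if row.obs_id == obs_id and not row.voided:
            original = row
            row = replace(row, voided=True, date_voided=when)
        out.append(row)
    if original is None:
        raise KeyError(obs_id)
    next_id = max(r.obs_id for r in rows) + 1
    out.append(Obs(next_id, original.concept, new_value, when))
    return out


def changes_since(rows: List[Obs], last_seen: datetime) -> List[Obs]:
    """Option (b): rows created or voided after `last_seen`."""
    return [
        r for r in rows
        if r.date_created > last_seen
        or (r.date_voided is not None and r.date_voided > last_seen)
    ]
```

The open question from the discussion remains how `last_seen` would be determined per clinician; the sketch simply takes it as an input.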

It is worth noting that these use cases are not unique to synchronization: any of these 'concurrent updates' could happen while two clinicians edit the same record online at the same time. Of course, when online the likelihood of such conflicts is far smaller than in sync scenarios; the consequences, however, are the same. It is not the uniqueness of the scenarios and consequences but rather their probability that makes us think about them explicitly in the context of synchronization. That said, common mechanisms such as pessimistic (i.e. exclusively locking the record while an update is in progress) or optimistic (i.e. checking at write time that the record has not changed since it was read) concurrency control are readily available in today's database systems to deal with these problems in 'connected' cases. It is equally clear that none of these mechanisms are useful as-is in loosely connected systems, since they assume the nodes can reach a shared copy of the record at the time of the edit.
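For the connected case, the optimistic approach boils down to a version check at write time. The following is a minimal in-memory sketch; `VersionedStore` and `StaleWriteError` are hypothetical names, and a real database would perform the check inside a transaction.

```python
class StaleWriteError(Exception):
    """Raised when the record changed after the caller read it."""


class VersionedStore:
    # Hypothetical single-node store: key -> (version, value).
    def __init__(self):
        self._rows = {}

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Apply the write only if nobody else wrote since `expected_version`."""
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            raise StaleWriteError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

The check only works because both writers consult the same copy of the version number, which is exactly what a disconnected child node cannot do.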

Interestingly, in some parts of the data model the notion of a 'change event' is already a first-class citizen: it is accepted practice that meds are never changed in place; rather, a discontinue/change order sequence is followed and presented to clinicians. As such, the fact that the meds have changed (the change event) is modeled and presented explicitly. The same holds for observations, where an existing observation is changed by voiding the previous one. Clearly, this approach would not work universally: we cannot keep track of and display a change history of *every* field -- more than anything, the UI would be intolerably cluttered with data that people would find largely irrelevant.

At this point we are working on the hypothesis that being able to provide an ordered global 'history' of record changes made across the network of installations is the fundamental requirement for the implementation. One can then imagine various 'conflict' detection/alert mechanisms on top of that implementation, ranging from a simple summary of changes made on the server while the client was offline to support for custom conflict resolution rules/extensions that can be invoked when data sync is performed.

One of the interesting consequences of having a fine-grained change history is that actual database record conflicts will be non-existent: records are never deleted, every change is recorded, and the 'current' value of any given record is determined by the global ordered record history that is established during the sync session(s). Of course, we obviously have not 'solved' the problem; rather, we have made the conflicts 'logical' rather than physical.
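As a rough illustration of this hypothesis, the sketch below merges per-node change logs into one ordered history and derives the 'current' state by replaying it. The `Change` type and both functions are assumptions made for illustration; in particular, the sketch assumes edit timestamps are comparable across nodes and uses the node name only as a tie-breaker.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Change:
    record_id: str
    changed_at: datetime  # when the edit was made, not when it was synced
    node: str             # originating node; tie-breaker for equal timestamps
    state: str            # new state of the record after this change


def merge_histories(*histories: Iterable[Change]) -> List[Change]:
    """Union the change logs, drop duplicates, and order by time of edit."""
    merged = {(c.record_id, c.changed_at, c.node): c for h in histories for c in h}
    return sorted(merged.values(), key=lambda c: (c.changed_at, c.node))


def current_state(history: List[Change]) -> Dict[str, str]:
    """Replay the ordered history; the last change per record wins."""
    current: Dict[str, str] = {}
    for change in history:
        current[change.record_id] = change.state
    return current
```

No change is ever discarded: the losing edit stays in the merged history, which is what would let a later alert or review mechanism find it.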

Use cases to consider

Following are a handful of use cases to ground the above discussion.

Concurrent updates of the same record @ child and parent

Precondition:

  • Child CH1 is disconnected from parent node P

Steps (chronologically):

  1. CH1 updates an OpenMRS record R to R1 (the new state of the record) @ time T1 locally
  2. P updates the same record R to R2 at time T2, T2 > T1 (i.e. T2 is more recent)
  3. CH1 connects and initiates sync

Expected behavior:

  • The 'last' edit wins: R2 is the final state of the record at the Parent. CH1's changes (R1) are put into the record history. At the parent, the original state R has also been written to history. The current state of the record (i.e. R2) is 'pulled' over to CH1. Record histories are exchanged during the sync.

Other comments/notes:

  • Time of sync should not matter: the changes are to be committed according to when they occurred, not when the sync was initiated.
  • Depending on which data are affected, we may want to notify someone of a possible conflict. Perhaps we could derive some type of "conflict notification" scale that could be attributed to specific types of data, basing the response to conflicts on these metadata. For example, level 4 or higher conflicts mandate notification, level 8 or higher conflicts are refused until moderated, and level 10 conflicts are refused or ignored. Is that too complicated? I would think that on the "conflict notification" hierarchy, patient addresses would be minor, names a little higher, lab results even higher, and orders higher still.
  • Paul: I think a good general rule for most of this is: the time in which a datum is entered (along with the encounter time) is the most important determinant of priority. The most recent edit should become the last modification to the record, trumping edits that have happened previously. While this won't always work out beautifully, it's in general the safest bet.
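The "conflict notification" scale suggested above could be driven by a small lookup table. In this sketch the thresholds (4, 8, 10) come from the comment above, while the per-type levels, the type names, and the choice to treat unknown types as level 10 are purely illustrative assumptions.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                 # apply silently
    NOTIFY = "notify"                 # apply, then notify someone
    HOLD_FOR_MODERATION = "hold"      # refuse until a person reviews it
    REFUSE = "refuse"                 # refuse or ignore outright


# Hypothetical levels; only the relative ordering follows the discussion.
CONFLICT_LEVEL = {
    "person_address": 2,
    "person_name": 3,
    "lab_result": 6,
    "order": 9,
}


def action_for(data_type: str) -> Action:
    """Map a conflicting data type to the response its level calls for."""
    level = CONFLICT_LEVEL.get(data_type, 10)  # assumption: unknown types are strictest
    if level >= 10:
        return Action.REFUSE
    if level >= 8:
        return Action.HOLD_FOR_MODERATION
    if level >= 4:
        return Action.NOTIFY
    return Action.ACCEPT
```

Keeping the levels as metadata rather than code would let each installation tune them without touching the sync logic.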


Variation: Concurrent updates to the same record @ two satellites

Precondition:

  • Children nodes CH1 and CH2 are disconnected from the same parent node P

Steps (chronologically):

  1. CH1 updates an OpenMRS record R to R1 @ time T1 locally
  2. CH2 updates the same record R to R2 @ time T2 locally
  3. CH1 initiates sync session @ T3 and transfers R1 to Parent
  4. CH2 initiates sync session @ T4 and transfers R2 changes to Parent

Expected behavior:

  • The 'last' edit wins: R2 is the final state of the record at Parent and CH2. CH1's changes (R1) are available in the change history log.

Other comments/notes:

  • What, if any, action (i.e. notification) should occur @ CH1 on the next sync beyond propagating the R2 state to it? Should there be a 'notification' to the user pointing out that the R1 state has been overridden while offline?
  • If records are versioned, could there be an option to review the versions of the record to allow for merging of data? Is it unlikely that R1 contains any data 'lost' when the newer R2 state was accepted at the parent that should be added to create R3?
  • Burke: I think the most powerful solution would be to inject some domain knowledge in this process — i.e., merging of changes will be safe and helpful for some data, but may be dangerous for others. Conflicts in things like patient demographics, concept attributes, and form fields can probably be merged into an R3. Longitudinal data like observations and orders will not affect the same row, but introduce two (or more) rows that may be contradictory.
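For the kinds of data where merging is considered safe (e.g. demographics), an R3 could be built with a field-level three-way merge against the common ancestor R. The sketch below is one possible approach, not a description of existing OpenMRS behavior; `merge_fields` is a hypothetical helper, records are plain dictionaries, and R2 is assumed to be the later edit.

```python
from typing import Dict, List, Tuple


def merge_fields(base: Dict[str, str], r1: Dict[str, str], r2: Dict[str, str]
                 ) -> Tuple[Dict[str, str], List[str]]:
    """Three-way merge of R1 and R2 against their common ancestor `base`.

    Returns the merged record R3 and the list of fields changed on both
    sides to different values (true conflicts).
    """
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for field in sorted(set(base) | set(r1) | set(r2)):
        b, a, c = base.get(field), r1.get(field), r2.get(field)
        if a == c:            # both sides agree (or neither changed it)
            merged[field] = a
        elif a == b:          # only R2 changed the field
            merged[field] = c
        elif c == b:          # only R1 changed the field
            merged[field] = a
        else:                 # both changed it differently
            merged[field] = c  # assumption: R2 is the later edit and wins
            conflicts.append(field)
    return merged, conflicts
```

Fields listed in `conflicts` are the ones a notification or manual review step would need to look at; everything else merges without loss.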

Modify Orders/Meds

Precondition:

  • CH1 and CH2 are disconnected from the same parent node P

Steps (chronologically):

  1. CH1 updates existing regimen record R @ time T1 locally: stops existing order; adds new one O1
  2. CH2 updates the regimen for the same patient @ time T2 locally, thus also stopping the existing order and creating a new one, O2
  3. CH1 initiates sync session @ T3 and transfers data to Parent
  4. CH2 initiates sync session @ T4 and transfers data changes to Parent

Expected behavior:

  • Both orders O1 and O2 are active. Should there be some sort of notification to the clinician stating this?

Other comments/notes:

  • In this case, strictly speaking in ER terms, no record collision exists; however, there is at least a logical conflict: the patient's regimen has been changed twice, each change made independently of the other. At least theoretically, this is problematic, since order O2 has been entered (and accepted by the system) while O1 was already entered @ CH1 but is not visible @ CH2. This breaks basic read/write consistency: a clinician @ CH2 is making a regimen change based on 'out-of-date' data. How should this be handled by the system? Practically speaking, as long as CH2 is offline, there is of course nothing to do: we need to allow the entry of the change order. However, one can argue that until CH2's changes are committed and all nodes (i.e. CH1, CH2, and P in this case) have been updated, the change orders should have 'tentative' status -- i.e. until the full distributed commit is carried out, users should in some way be alerted to the fact that at least a potential for conflicting orders exists. This is very much at the heart of the 'write stability' guarantees in the distributed computing literature. In practical terms, how should OpenMRS behave under this scenario? What are the clinicians' expectations?
  • Burke: I would expect that the system notify someone to work out any possible conflicts. When dealing with any orders, but especially treatments for HIV and MDR-TB, potential conflicts like this could literally be a matter of life and death.
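A deliberately coarse way to raise the notification discussed above is to flag, after sync, any patient with active orders that originated at more than one node. The `Order` type and `concurrent_active_orders` are hypothetical; a real check would need to compare drugs or regimens, since a patient can legitimately hold unrelated active orders from different sites.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Order:
    # Hypothetical, simplified order row.
    order_id: str
    patient_id: str
    origin_node: str
    start: datetime
    stop: Optional[datetime] = None  # None means the order is still active


def concurrent_active_orders(orders: List[Order]) -> List[Tuple[str, List[str]]]:
    """Return (patient_id, active order ids) where the active orders
    were entered at more than one node."""
    active_by_patient = defaultdict(list)
    for order in orders:
        if order.stop is None:
            active_by_patient[order.patient_id].append(order)
    flagged = []
    for patient_id, active in active_by_patient.items():
        if len({o.origin_node for o in active}) > 1:
            flagged.append((patient_id, sorted(o.order_id for o in active)))
    return flagged
```

Running this on the parent after step 4 would flag O1 and O2 for review rather than silently leaving both active.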

Add new patient

Precondition:

  • Children CH1 and CH2 are disconnected from the same parent node P

Steps (chronologically):

  1. CH1 creates new patient record @ time T1 locally:
    1. new patient.patient_id is assigned @ CH1
    2. new patient_identifier.identifier is assigned @ CH1
  2. CH2, independently of CH1, admits the same patient into care
  3. CH1 initiates sync session @ T3 and transfers data to Parent
  4. CH2 initiates sync session @ T4 and transfers data changes to Parent

Expected behavior:

  • Patient is registered within parent node P at T3 and the new patient record at time T4 is recognized as a duplicate — i.e., all new person records should be checked against existing patient records for a match before being presumed to be "new" to parent node P.

Other comments/notes:

  • variations:
    • the same patient was added at CH1 and P; same expected behavior
    • patient records are the same person (based on patient matching criteria), but two different identifiers were used
      • Burke: in this case, the 2nd identifier would be added as an alias (both identifiers attached to the same patient)
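The duplicate check and the identifier-alias variation could look roughly like the following. The discussion does not define the "patient matching criteria", so the exact match on name, birthdate, and gender used here is only a placeholder assumption, as are the `Patient` type and `register_on_parent`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set, Tuple


@dataclass
class Patient:
    # Hypothetical, simplified patient record.
    given_name: str
    family_name: str
    birthdate: date
    gender: str
    identifiers: Set[str] = field(default_factory=set)


def match_key(p: Patient) -> Tuple[str, str, date, str]:
    """Placeholder matching criteria: normalized name, birthdate, gender."""
    return (p.given_name.strip().lower(), p.family_name.strip().lower(),
            p.birthdate, p.gender.strip().upper())


def register_on_parent(parent: List[Patient], incoming: Patient) -> Patient:
    """Check an incoming 'new' patient against existing records first.

    On a match, attach the incoming identifiers to the existing patient
    (as aliases) instead of creating a second record.
    """
    for existing in parent:
        if match_key(existing) == match_key(incoming):
            existing.identifiers |= incoming.identifiers
            return existing
    parent.append(incoming)
    return incoming
```

Real matching would likely be fuzzier and might route uncertain matches to a person for review instead of merging automatically.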

Patient Discontinue/exit from care

Precondition:

  • Children CH1 and CH2 are disconnected from the same parent node P

Steps (chronologically):

  1. CH1 exits patient PAT1 from care @ T1
  2. CH2, unknowingly, changes orders/meds/program -- essentially any clinical info @ T2
  3. CH2 syncs @ T3
  4. CH1 syncs @ T4

Expected behavior:

  • tbd.

Other comments/notes:

  • Again, strictly speaking, there is no DB record-level conflict to speak of. There is, however, at least a logical conflict: clinically significant changes were made while the patient record was discontinued. What should happen? One possibility: the patient record's status stays discontinued @ T4, but CH1 is notified after sync that there was a change to the PAT1 record while it was offline. Presumably CH1 can at that point review the change and take any action needed. Furthermore, the next time CH2 syncs, its sync change 'review' should also contain info showing the PAT1 status change.
  • Burke: One could argue that the patient is not, in fact, discharged if they're receiving care at CH2; however, the proper way to handle this particular scenario would probably be either a manual conflict resolution tool
  • Paul: Discontinued globally, or discontinued from this site? In theory, as Burke has said above, they're not discontinued if they're still receiving care. Ideally, keeping patients in a program is event-driven. If a patient is removed/discontinued from a program and a later event creates an order for that patient, then in theory that order should be an event which places the patient back into the program.
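Whichever resolution is chosen, the sync first has to detect the situation. One hedged sketch: scan the merged events by time of occurrence and collect clinical events dated after the patient's exit, so they can feed a review list or an event-driven re-enrollment. The `Event` type, the two event kinds, and `clinical_events_after_exit` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    patient_id: str
    kind: str       # hypothetical kinds: "exit" or "clinical"
    at: datetime    # when the event occurred, not when it was synced
    node: str


def clinical_events_after_exit(events: List[Event]) -> List[Event]:
    """Return clinical events that occurred after the patient's latest exit."""
    ordered = sorted(events, key=lambda e: e.at)
    latest_exit: Dict[str, datetime] = {}
    for event in ordered:
        if event.kind == "exit":
            latest_exit[event.patient_id] = event.at
    return [
        e for e in ordered
        if e.kind == "clinical"
        and e.patient_id in latest_exit
        and e.at > latest_exit[e.patient_id]
    ]
```

Because the events are ordered by when they occurred, the result is the same whether CH2 or CH1 syncs first.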