User talk:Maros


Contents

Data Synchronization and Replication: General Thoughts and Considerations

The following is a semi-loose collection of thoughts on the topic of 'data replication'. It is intended to, hopefully, provoke some discussion on the use cases, design, and consequences. As such, take it with a grain of salt.

Use Cases

a) Server to server replication
The classic definition of DB replication between several ‘static’ but more/less equivalent DB server installations. The servers are distributed (perhaps to facilitate local access throughout large geo region), but essentially all server roles, and usage scenarios are analogous. Considerations/Questions:

  • (Relatively) small number of static nodes
  • Change-sets potentially large; performance considerations
  • All replication semantics are a possibility (i.e. single master, bi-directional repl, custom conflict resolution)
  • contextual replication probably not an issue
  • Not intended as a solution to replicate to mobile clients with limited hardware capabilities or network throughput
  • How closely in sync the DBs are expected to be? Near real-time? Do we need event-driven change mechanism (i.e. if 'x' changes then sync is automatically initiated)? Can the repl. be user initiated? Can repl. be performed during DB maintenance in 'off-line' mode?

b) Disconnected Client
Roaming client; semi or fully disconnected. Upon the connection to the sever is established, local changes are replicated to the ‘master’ data store. Considerations/Questions:

  • ‘minimalist’ paradigm: only a few essential pieces of the client software with no data conflict possibilities are updatable in disconnected mode (i.e. entry of new patient demographics, new observations, etc.). Most features ‘read-only’. This allows for simple store-and-forward design on the client, no need for conflict resolution rules and replication.
  • ‘Fully’ featured off-line access: full replication of local data storage, contextual replication most likely required (meaning, replicate all 'static' data, but only subset of relevant patients and observations available in the master DB)
  • What is the frequency and size of the data sync? General expectations is "we sync often large amounts of data with lots of conflicts" or "occasionally with small changes with a few conflicts" or anything in between?

Common Design Patterns

  • Master-Master (and various variations on the ‘pub-sub’ theme):
    • Changes identified via timestamps or separate ‘change log’ (i.e. shadow change tables).
    • Data sync-ed between the publisher and subscriber by the replication manager based on the record of the changes since last sync-ed
    • conflict resolution mechanism necessary
    • Can use app-level abstractions to apply changes to the remote DB (i.e. Hibernate session.replicate() )
  • Single Master.
    • ETL
    • Change/transactional log replay
  • Weakly consistent Replication and peer-to-peer systems
    • This is quite interesting and relevant when building highly distributed systems over less-than reliable networks when 'weak' data consistency is acceptable (i.e. multiple copies of data exist in the system and are at different points of consistency; ultimately all updates are propagated however, not all at the same time)
    • This design is essential, IMHO, if the requirement is to have large network of loosely connected 'masters' -- strong (i.e. ACID-like) consistency is not realistic under those conditions)
    • a wonderful reference example for this type of system design: http://www2.parc.com/csl/projects/bayou/
  • ...and everything in between, literally

Considerations, Complications, & Consequences

  • Contextual replication design: how to define replication subset of the data relevant to a given subscriber? This can be quite elaborate in general-purpose systems with replication; meaning if the users are allows to extend data model and introduce new semantics how does define partitioning of the data in semantically meaningful way. Practically, keep it simple approach works well: define finite ways the data can by partitioned and limit replication to those boundaries (i.e. by clinic, by patient, by geographic location, by diagnosis, by clinician, etc.)
  • data model lifecycle. How much control do we have over the future of the schema and the rollout procedures? How 'open' is the data model? Can we assume that all future changes are backward compatible? If not, then what are acceptable procedures for the upgrades? Should the implementation sites be able to update/extend the schema on their own? If so, should repl. be 'extensible' to those additions/changes?
    • I think that this is actually a really complicated problem. But one way to cheat it is to say that data models must match (based on the global property) before synchronization can take place. If that's acceptable, that might be one way to cut down on this issue. User:Callen
    • I agree in that it adds not trivial amount of logic. In my previous life, we deal with this by having a set of concrete rules around schema life cycle: minor revs of schema are always backward compatible; these are handled as part of schema sync which is done as part of initial handshake between client and server during sync. If the schema versions are incompatible, sync is aborted. minor changes included mainly adding/removing columns and tables, changing data types. In case of major schema change we had to force mandatory site-wide client sync, updated the server, and then during the first connect of the client migrated the client schema. User:Maros


  • Data semantics consistency: detection and resolution of ‘business’ rules conflicts in the replicated data – how to detect and deal with business ‘rule’ violations beyond the simple E-R model
  • Sync across heterogeneous data stores

What are the requirements with respect to replicating data across heterogeneous data stores? RDBS-specific replication usually sufficient for ‘server’ to ‘server’ replication. But clearly does not work if two different data stores are to be synchronized.

  • Replication Semantics: Unidirectional vs. bidirectional, single master vs. federated vs. contextual?

In case of ‘mostly’ unidirectional, is ETL an attractive and simple alternative?

  • Conflict resolution: Is simple user-only acceptable given the expected usage, even if it is, how do we present the conflicts to the user?
  • Performance
    • Network performance and size of the dataset: What are expected ‘master’ DB data set and ‘replication’ change set sizes: client-app based solution without any explicit version tracking or change/transaction log can quickly become impossible as master data set increases.
    • Replication latency expectations: can replication occur on scheduled intervals or does it have to be ‘real time’? Locking and conflict resolution implications.
    • common complication wrt size of the changesets: it is fairly easy to initially 'scan' for changes in realtime (i.e. select * from foo where timestamp > last_sync-ed approach). This works great for small 'master' DBs and small number of clients. However, with larger changesets, this *may* get out of hand with respect to the realtime load on the system. If so, the shift to contextual replication and/or pre-building of changesets may be needed. Both of the latter approaches however have significant complications.
    • Practically speaking, are we aiming for close-to-real-time repl. or 'batch'/scheduled/user-initiated is OK? How fast is fast enough: 5 mins or 5 sec for 1000 patients with 100 observations each, over 256kbs?

Implementation options

To name a few:

  1. 'Direct DB-level' replication: look at this as purely E-R data store replication problem and rely on what RDBMS has to offer. This makes it no more/less than MySQL, Oracle, or MSSQL replication.
  1. Add change-set state mechanism, i.e. Timestamps or changelog.

Do we envision we need change log for auditing purposes in future? If so, this may be a good time to add that too. Add formal sync manager to handle replication. A few options to consider:

    1. Add timestamps, and/or trigger(s)
    2. Use hand-coded sync manager to replicate based on last-sync timestamps, either using hibernate’s session.replicate() or simply hand-code CRUD.
    3. http://www.funambol.com/opensource/ looks like a reasonable framework; also has MySQL DB connector
    4. http://opensource.replicator.daffodilsw.com/ looks OK on paper but I know nothing about it
    5. rDS feature of db4o: http://www.db4o.com/about/productinformation/features/drs.aspx
  1. ETL or a variation on ETL. Replicate subset of data tables in ETL fashion, perhaps allow for bi-directional replication of limited subset of ‘read mostly’ tables with no-conflict data based on the context rules.

Design Specifics [DRAFT]

This section speaks to the specifics of replication for openMRS.

Scenarios, Use Cases

The primary scenario we are trying to address:

  • Bidirectional synchronization of data between one parent and several satellite OpenMRS installations. Under this scenario, there is exactly one parent installation with several satellites. While satellite installations are fully functional OpenMRS server installations logically they are children of the parent node. Specifically, depending on the scope of implementation, we may choose to restrict some operations (such as create new clinic) to parent node only. Every satellite node is associated with exactly one and always the same parent. Furthermore, the parents can be connected to other parents forming tree topology where satellites are leaf nodes and parents are internal tree nodes. Other desired characteristics and limitations:
  1. parent and satellites can create, update, delete data
  2. there is a 'defined' set of conflict resolution rules and process (i.e. merge procs or policies) for resolving them (what that is is TBD.)
  3. in scope of initial release, we conflict resolution will support detection of logical duplicates: i.e. if patient with the same patient identifier (not the same DB PK) is entered at a parent and satellite; the sync process would detect such conflict
  4. All nodes are are assumed to have same DB schema
  5. Replication is triggered by an event (user intervention or scheduled event) and isn't expected to be real-time or near real-time
  6. While replication is in progress, system, or portions of the system should be operational
  7. Transactional consistency: in other words, atomic 'write' operations performed on satellite in scope of DB transaction should be applied at parent also in scope of the transaction (note this isn't the same as supporting DB isolation levels across patent and satellites)
  8. Performance considerations: we will set specific perf expectations with respect to network latency, and impact to parent/satellite OpenMRS installations. In general, at this point, we are targeting scenarios where parent may have 10s (not 100s) of satellites with total patient population of max 100,000s (not 1,000,000s)
  9. Additional desired functionality which however may not be initially included:
    1. Synchronization of modules data: We would like to expose sync facilities to module developers.

In addition to this primary scenario, the other urgent (alas less complex) use case is Remote Data Entry. Here disconnected (or loosely connected) client application synchronizes its local data store with a parent OpenMRS installation. Desired characteristics and limitations:

  1. Local client app is 'skeleton' OpenMRS install that allows entry of forms and their submittal to the parent server.
  2. Minimal subset of the data model is replicated to the client (i.e. clinic/patient list) to support form entry.

Roadmap

Subject to change at this point based on the GSoC and summer resource availability. That said:

  1. Milestone 1: Bi-directional parent-satellites sync per primary scenarios section
  2. Milestone 2: Release of Remote Data Entry tool using the sync framework from step 1
  3. Milestone 3: Sync for modules

Future considerations:

  • Sync across compatible but not equivalent DB schema(s); DB schema synchronization
  • Contextual replication (i.e. based on location hierarchy)
  • Advanced conflict detection and resolution (i.e. patient matching alg as part of conflict resolution during the sync)
  • data sync protocol in tree/graph network topology of loosely connected systems: bayou (http://www2.parc.com/csl/projects/bayou/)

Scope, Goals and Non-Goals

TODO, see the description of user needs and roadmap and you get an early idea

Implementation Plan [DRAFT]

High-level project plan and tasks bellow. The general idea is to begin with the basic framework for change detection and serialization that can be refined in future.

Phase 1: Sync 'Lite' implementation

Goal

Skeleton framework for detecting changesets, serialization, transfer, deserialization, and hooks for conflict resolution.

Primary use cases

  • Server to server data transfer (bi directional). In two installation of OpenMRS one is acting as 'parent' and the other one as 'satellite'; see design section for scenario description.
  • Remote patient data entry. Form data entry is performed at the remote site with occasional connectivity to the main OpenMRS 'server'. The remote site is running a 'skeleton' version of the OpenMRS software locally to allow entry and storage of the forms locally while the site is disconnected from its corresponding server. When online, the data entered locally is to be transfered to the parent server and in return local site receives minimal updates to common data (i.e. concepts, clinic and patient list).

Design

Key elements of solution:

  • Changeset tracking and detection at the satellite and parent
  • Changeset transfer and transfer protocol; call it a transfer packet
  • Parent endpoint logic: receive transfer packet and transfer packet decode to satellite changeset
  • Apply Changeset @ parent while maintaining transactional boundaries
  • Conflict detection and resolution, merge procedures, status/failure/rollback notifications
  • Apply parent changeset at satellite

Consequently, several key elements of design:

  • changeset detection mechanism
  • changeset data serialization (for example XML serialization of org.openmrs)
  • transactional consistency of the serialized entities (what is the proper sequence of 'write' operations)
  • data transfer protocol
  • parent endpoint: web service and parent endpoint implementation: mechanism for parsing the entities and re-applying them to the server via hibernate
  • conflict detection and resolution

Schedule

Tentative schedule, a simple task list with durations, really.

  • Task: use cases details, assumptions
    • Duration: 1-2 weeks
    • Description: take time to write down detailed use cases and design doc (mainly what is and what is not in scope and assumptions made; i.e. list what will need to be addressed as part of phase 2) for the sync lite
  • Task: XML Serialization of API, JUnit fx to make sure this works correctly
    • Duration: 2+ weeks
    • Description:
  • Task: Changeset detection
    • Duration: 1 week
    • Description:
  • Task: Transactional consistency, figure out exact sequencing of changeset, inserts, updates, deletes; begin to develop endcase scenarios and corresponding unit tests
    • Duration: 2+ weeks
    • Description:
  • Task: Changeset/transfer format spec (result of previous step)
    • Duration: 1-2 weeks
    • Description:
  • Task: transfer protocol
    • Duration: 1-2 days
    • Description: HTTP web service. Clearly, the protocol should be abstracted away from the changeset def but at this point simple HTTP will do.
  • Task: service endpoint skeleton
    • Duration: 1 week
    • Description: Leverage the dictionary import work.
  • Task: service endpoint implementation
    • Duration: 1 week
    • Description: Apply changeset to master.
  • Task: final validation, perf tuning
    • Duration: 1-2 weeks
    • Description: ensure comprehensive endcase coverage via unit tests, perf tuning of changeset generation, transfer and application of changeset (i.e. network load, DB locking, etc.)

Other misc thoughts & parking lot

High-Level Design for single master bi-directional repl

TODO: this is work in progress

Let us first begin with high-level description of the algorithm for a single master bi-directional replication:

  1. prepare: connect to server, get current_server_time, determine last_sync_time, establish new last_sync_time (note that this may not be simply a single timestamp from the server, however in any case, we need a mechanism to identify 'since last syc-ed' changes for a given table/object)
  2. build the local changeset
    1. select last_sync_time from local DB
    2. select all local changes since last_sync_time (i.e. updates, inserts)
    3. note: deletes must be tracked separately

If we never really delete anything, does this matter?

    1. transaction scope is important: we may need a map (similar to hibernate) that defines the transaction semantics.
  1. apply changes to server
    1. Begin commit sequence using the tx/commit map against the server data store
    2. resolve any conflicts, if needed, note if we need user intervention to resolve we must do this in two-phase-commit-like fashion (i.e. first see if conflicts can be encountered, if yes allow users resolve and then replay the commit sequence; this can be difficult and slow to orchestrate)
    3. abort, rollback if errors found
  2. <<schema upgrade detection would go here >>
  3. build server changeset
    1. select remote changes since last_sync_time and less than the current_server_time, note again deletes must be tracked and handled separately
  4. apply server changeset locally:
    1. apply deletes to local DB
    2. apply updates/inserts
    3. resolve conflicts, if needed
  5. cleanup: update last_sync_time structure(s) locally, enable contraints

Changesets

TODO

Conflict Resolution

TODO

Schema update

TODO

Open Issues

The big design issues to settle:

  • schema and app version synchronization: is this a goal? IMHO, for server site-to-site we can make compatibility of versions a prereq (we need a mechanism to track it, but that should be easy enough via comparing app/db versions). For the client apps, this maybe quite a bit more problematic. In that if there are roaming clients, it is rather unreasonable for everyone to keep all their laptops up to date at all times. Seems like a mechanism to minimally deal with the following is needed:
    • push compatible schema updates to clients
    • and the ability to 'build' a client DB app from scratch via 'sync' between empty client DB and the server.
  • client-server scenario & contextual replication: how do we define the subset of a data being replicated?
  • conflict resolution framework: how ambitious do we want to/need to be? IMHO, the big decision is: do we need 'user-level' mechanism? Meaning, do we need actual UI where non-technical folks can visualize the changesets and conflicts or is it enough to indicate that there is a conflict and let them decide how to resolve it via one of predefined ways (i.e. we say: 'update of obs for patient Y failed because patient record has been updated on the server, would you like to: a) ignore b) update c) abort?' ). While, obviously, there is a great number of variations in between, there is a fundamental difference in scope, effort, complexity between the two approaches. we need to decide which one we are aiming for.

Smaller things:

  • required schema changes:
    • use of globally unique PKs (i.e. GUIDs) for tables to be sync-ed
    • mandatory time-stamping of records
    • delete_log table
  • more still to come, a million of them...

...and everything else: