Generating Unique Key Values in Distributed Databases

A synchronization system contains distributed databases. How do you ensure that new records created on these databases have unique key values? In a single DB system, people usually use AUTO_INCREMENT columns (e.g. in MySQL) or sequence objects (e.g. in Oracle) that produce unique sequence numbers. In a multi-DB environment, special treatments have to be done to AUTO_INCREMENT columns and sequences otherwise different DBs might generate same primary key values that would conflict with each other when new records are synced from one DB to another.

Non-Conflicting AUTO_INCREMENT for MySQL Databases

NOTE: AUTO_INCREMENT does not apply to Oracle, which uses sequences. See next section for using sequences in a sync system.

AUTO_INCREMENT is the standard way for MySQL applications to generate unique key values. MySQL 5 introduces a couple of server variables, auto_increment_increment and auto_increment_offset, which make it possible to keep using AUTO_INCREMENT columns in a multi-DB system.

By default, auto_increment_increment and auto_increment_offset both have a value of 1. The inserted value is the least in the series [1, 2, 3…] that is greater than the maximum existing value of the column. When auto_increment_increment and auto_increment_offset take on non-default values, the series is no longer [1, 2, 3…], but calculated using the following formula:

auto_increment_offset + N × auto_increment_increment

where N is a non-negative integer value in the series [0, 1, 2, 3, …].

 

For example, with auto_increment_increment = 10 and auto_increment_offset = 5, the values are drawn from series [5, 15, 25…].

 

Following are the steps to take for using AUTO_INCREMENT in Pervasync.

 

  1. Choose an auto_increment_increment value. The value should be at least one greater than the number of local DBs you plan to deploy. For example, you may choose 1000 if your system is expected to have hundreds of local DBs.
  2. On central DB and all local DBs, edit your MySQL Server configuration file (usually called my.cnf or my.ini) to set auto_increment_increment and auto_increment_offset values. All MySQL servers should use a same auto_increment_increment value and a different auto_increment_offset value. For example, on central DB you may put

    auto_increment_increment=1000

    auto_increment_offset = 1

    while on the first local DB you may put

    auto_increment_increment=1000

    auto_increment_offset = 2

On central Db, keep the AUTO_INCREMENT column definitions intact. The AUTO_INCREMENT flag of the columns will be synced to local DB.

 

That’s it! From now on, do inserts as you normally do on a single DB without worrying about unique key violations. Nevertheless, if for some reason you prefer not to use the AUTO_INCREMENT feature, you can use the Pervasync “Sync Sequence” feature described in next section.

 

Sync Sequence

The idea of the Pervasync “sync sequence” feature is to divide a sequence into contiguous partitions/windows and distribute the partitions to central DB and local DBs. Database applications can draw globally unique values for unique keys from the sequence partitions. When a local DB is about to run out of the numbers, a new range is synced to the local DB.

Following are the steps to use Sync Sequences.

  1. Determine central DB sequence partition range. Normally you create a sequence for each unique key column. The central DB sequence has to be modified or re-created so that it will only occupy a partition. The START WITH value should be 1 greater than the maximum existing value of the column. The MAXVALUE of the sequence should be big enough so that central DB app would not easily run out of numbers, while at the same time it shouldn’t be too big so that local DBs have more room for their partitions. Manually create (or modify) the central DB sequence, for example, on Oracle server:

    CREATE SEQUENCE pvsdemo.task_id_seq START WITH 1000 MAXVALUE 1000000000;

     

    and on MySQL server

    CALL pvsadmin.create_sequence(‘pvsdemo’, ‘task_id_seq’, 1000, 1000000000);

  2. Publish the sync sequence on web admin console. Here you need to supply a sequence name, task_id_seq in our example and add it to a schema, pvsdemo in our example. In addition, you need to supply a “Start Value” and a “Window Size”. The start value should be one greater than the central DB sequence partition MAXVALUE. In our case, it should be 1000000001. The window size determines the number of values available in one local DB sequence partition. We use 1000000 for our example.
  3. After the first sync, the sequence partitions will be automatically created on local DBs. This is in contract to central DB, where you create the sequences manually. To use the sequences, call their NEXTVAL methods. For example, to insert into a table with column col1 as primary key, for Oracle database you do the following in your client application.

    INSERT INTO pvcdemo.table1(c1, c2) VALUES

    (pvcadmin.task_id_seq.nextval, ‘hello world’);

     

For MySQL database you do the following in your client application.

INSERT INTO pvcdemo.table1(c1, c2) VALUES

(pvcadmin.sequence_nextval(‘schema1’, ‘task_id_seq’), ‘hello world’);

For SQLite databases on Android or Blackberry, use the SyncClient method to retrieve the sequence next value and use it as key value for a new record. Note that the first argument is the logical sync schema name, not the client schema/database name. See API Javadoc for more details.

long idValue = pevasync.client.SyncClient.sequenceNextval(“schema1”, “task_id_seq”);

Other Options

We believe that AUTO_INCREMENT and sync sequence are the best choices for most situations. Still, there are other options that may fit your specific needs. We list them below.

  1. Only allow transactions to happen on central DB. Device local DBs are made read-only, i.e. for queries only. Believe it or not, there are systems that adopt this model.
  2. Use randomly generated numbers for key values. The length of the random numbers has to be long to reduce the possibility of collisions.
  3. Use a composite key. The sync client API has a method that returns the sync client ID. Apparently this is a globally unique number. You can use the client ID as the first column of a primary key. You then use a second column that takes locally unique values. Let’s call it LUID column. You define the client ID column and the LUID column together as your composite primary key. The values of the composite key are globally unique.
  4. Compute a GUID using client ID and LUID. For example, you could use this formula:

    GUID = <client ID> * 1000000000 + LUID

  5. Map local UID with GUID. This is the technique used by OMA DS (aka SyncML), and Activesync. Locally created records are assigned an LUID. During Activesync synchronization, a GUID for the same record is generated on server by mapping the LUID and synced back to client. For OMA DS, client sends an LUID back to server for every server sent record. A map table of LUID and GUID is maintained on server.

The mapping methods may be OK for simple PIM (Personal Information Management) sync. However, for enterprise applications that have large amount of data, a lot of tables and complex referential relationships between tables, the mapping would cause performance and maintenance problems. Hence, Pervasync does not support mapping method.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s