Replication versus Synchronization

Database synchronization is closely related to database replication, and people sometimes use the terms interchangeably. The two are quite different, however, and knowing how they differ makes it easier to see why replication problems and synchronization problems call for different approaches.

Database Synchronization Is Not Replication

Replication is mostly used where identical replicas of the complete data set are maintained for high availability and performance. Replicas can often work independently and serve as backups for each other. Synchronization, on the other hand, usually takes place between a more transient subset of the data and a more persistent full set of the data, both of which are integral parts of one system. For instance, an operating system may buffer parts of a file in memory and “synchronize” them with the file on the hard disk. Another example is the synchronization of data in a CPU cache with data in main memory. In both cases people use the term “synchronization”, not “replication”.

In Pervasive Computing, each device maintains a data cache that is a small subset of the data stored on centralized servers. Changes to the cache are temporary and should eventually be propagated to the servers, and the servers should in turn refresh the caches with up-to-date data from the central databases. Clearly, this is a synchronization process, not a replication process.

Replication Techniques Won’t Work for Synchronization

In traditional database replication schemes, the physical transactions on each node are recorded and played back on all the other nodes. This technique only works if each node holds a replica of the full data set.

Replication based on physical transactions also has a stability problem as the number of nodes goes up. Transactions on different replicas may conflict with each other, and handling this requires either cross-system locking or complicated conflict resolution schemes. These are the approaches taken by eager replication and lazy replication respectively [3,5].

Eager replication synchronously updates all replicas as part of one atomic transaction. It is also called synchronous replication or pessimistic replication, since it incurs global locking and waiting. This scheme is not suitable for Pervasive Computing, because such locking and waiting is simply not feasible in an environment with a large number of nodes or devices. The devices may not even all be connected to the network at the same time, let alone be locked at the same time so that a transaction can go through.
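To make the availability problem concrete, here is a minimal sketch of an all-or-nothing update. The `Replica` class, its `online` flag, and the `eager_update` function are hypothetical names invented for this illustration; a real system would use a distributed commit protocol rather than in-process locks.

```python
class Replica:
    """Hypothetical replica that may be offline or already locked."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.locked = False
        self.data = {}

    def try_lock(self):
        # A replica can be locked only if it is reachable and currently free.
        if not self.online or self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        self.locked = False


def eager_update(replicas, key, value):
    """Apply one update to every replica, or to none of them."""
    acquired = []
    try:
        for replica in replicas:
            if not replica.try_lock():
                return False  # one unreachable replica aborts the whole update
            acquired.append(replica)
        for replica in replicas:
            replica.data[key] = value
        return True
    finally:
        # Release every lock we managed to take, whether or not we committed.
        for replica in acquired:
            replica.unlock()


server = Replica("server")
phone = Replica("phone")
tablet = Replica("tablet", online=False)  # a device that is currently offline

print(eager_update([server, phone], "price", 10))          # True
print(eager_update([server, phone, tablet], "price", 12))  # False
print(server.data["price"])                                # 10
```

A single disconnected device is enough to block the second update everywhere, which is the behavior the paragraph above describes.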

In contrast to eager replication, lazy replication allows any replica to be updated without locking the others. The local transactions are then propagated to the other replicas in the background. This scheme is also called asynchronous replication or optimistic replication, since it assumes that conflicts will rarely occur. Each host must apply the transactions generated by all the other hosts, and conflicts must be detected and resolved. Since the transactions cannot simply be undone at their origin nodes, manual or complicated ad hoc conflict resolution is usually required at the destination nodes.
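One common way to detect such conflicts is to compare versions when a remote update arrives. The sketch below assumes a per-record version counter; the `Update` and `LazyReplica` names are hypothetical, and the sketch only detects conflicts, it does not resolve them.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str
    base_version: int  # version the origin node saw before it made the change
    new_value: str


class LazyReplica:
    """Hypothetical replica that accepts local updates without any locking."""

    def __init__(self):
        self.rows = {}       # key -> (version, value)
        self.conflicts = []  # remote updates that could not be applied

    def local_update(self, key, value):
        version, _ = self.rows.get(key, (0, None))
        self.rows[key] = (version + 1, value)
        return Update(key, version, value)  # propagated to other replicas later

    def apply_remote(self, update):
        version, _ = self.rows.get(update.key, (0, None))
        if version != update.base_version:
            # The record changed here after the origin read it: a conflict.
            self.conflicts.append(update)
            return False
        self.rows[update.key] = (version + 1, update.new_value)
        return True


a, b = LazyReplica(), LazyReplica()
u1 = a.local_update("x", "from A")
u2 = b.local_update("x", "from B")  # concurrent update to the same record
u3 = a.local_update("y", "only A")

print(b.apply_remote(u1))  # False
print(b.apply_remote(u3))  # True
print(len(b.conflicts))    # 1
```

By the time the conflict is detected at replica `b`, the transaction at replica `a` has already committed, so the conflict has to be settled at the destination.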

Gray et al. [3] showed that traditional transactional replication behaves unstably as the workload scales up: a ten-fold increase in the number of nodes or in data traffic gives a thousand-fold increase in deadlocks or reconciliations. A system that performs well on a few nodes may become unstable when it scales up to even a dozen nodes.
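A ten-fold increase producing a thousand-fold increase corresponds to cubic growth. The few lines below work through that arithmetic; the function name and the default exponent of 3 are assumptions read off the ten-to-thousand figure above, not a model taken from the paper.

```python
def relative_growth(scale_factor, exponent=3):
    """How much deadlocks or reconciliations grow when the system scales up."""
    return scale_factor ** exponent


for factor in (2, 5, 10):
    print(factor, relative_growth(factor))
# 2 8
# 5 125
# 10 1000
```

Under this reading, merely doubling the number of nodes already multiplies the conflict rate by eight.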

Traditional database replication schemes are clearly not suited to Pervasive Computing, which involves hundreds or even thousands of nodes in one system. On the other hand, we can exploit the unique characteristics of Pervasive Computing to construct a synchronization scheme that may not suit traditional replication scenarios but works well for Pervasive Computing.

Sync solutions usually employ a client-server model rather than the peer-to-peer model used in replication. Every client communicates directly with the server; clients do not sync with each other.

In synchronization, the client and the server usually exchange accumulated record-level changes rather than the physical transactions used in replication. Change tracking is therefore the first problem you have to face in synchronization.
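As a rough sketch of what “accumulated record-level changes” can mean, the tracker below keeps one net change per record instead of a log of every transaction. The `ChangeTracker` class and its merge rules are hypothetical and chosen for illustration; real sync products track changes in their own ways.

```python
class ChangeTracker:
    """Hypothetical client-side tracker that keeps one net change per record."""

    def __init__(self):
        self.pending = {}  # record id -> (operation, latest row or None)

    def record(self, op, record_id, row=None):
        previous = self.pending.get(record_id)
        if previous is None:
            self.pending[record_id] = (op, row)
        elif previous[0] == "insert" and op == "update":
            # The server has never seen this record, so it is still an insert.
            self.pending[record_id] = ("insert", row)
        elif previous[0] == "insert" and op == "delete":
            # Created and removed between two syncs: nothing to send.
            del self.pending[record_id]
        elif previous[0] == "delete" and op == "insert":
            # The server still holds the old row, so send the new one as an update.
            self.pending[record_id] = ("update", row)
        else:
            # update+update keeps the latest row; update+delete becomes a delete.
            self.pending[record_id] = (op, row)

    def drain(self):
        """Return the accumulated changes for the next sync and reset the tracker."""
        changes, self.pending = self.pending, {}
        return changes


tracker = ChangeTracker()
tracker.record("insert", 1, {"name": "Ann"})
tracker.record("update", 1, {"name": "Anne"})
tracker.record("update", 2, {"qty": 5})
tracker.record("update", 2, {"qty": 7})
tracker.record("insert", 3, {"tmp": True})
tracker.record("delete", 3)

print(tracker.drain())
# {1: ('insert', {'name': 'Anne'}), 2: ('update', {'qty': 7})}
```

Six local operations collapse into two record-level changes, and the client can send them whenever it next connects to the server.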
