Sync Common Data into Multiple MySQL Databases

All we need is an easy explanation of the problem, so here it is.

My Situation

I have MySQL databases, each with the same schema (table structure and relationships exactly the same).

Sync Common Data into Multiple MySQL Databases

The data in the Green database is shared by the other three. I have a nightly process that finds changes in the Green database and UPSERTs them into the other three databases.

Think of the Green database as reference data updated by a separate team, but there is a lot of it and it changes every day.

Sync Common Data into Multiple MySQL Databases

The end result is that each of the Blue, Red, and Purple databases are a mix of updatable records unique to it and the recods from the Green database that may be referenced but not modified.

Sync Common Data into Multiple MySQL Databases

The Problem

The custom syncing process is messy and unreliable in ways that are not important for the question.

My Question

Is there a better way to combine the shared (green) data with other databases? I’d like to have minimal impact on the application data access layer that is designed for the interleaved data that is made possible by the nightly syncing process.

Idea Considered

I thought about leaving the Green data in its own schema (on the same MySQL server) and creating a view, (for queries) for each table in my other databases to give the illusion of the original merged tabes:

CREATE VIEW blue.vw_table001 AS
SELECT * FROM blue.table001 UNION 
SELECT * FROM blue.table001;

This would allow me to simply replace my Hibername mapping to the original tables with mappings to the corresponding view. Then writes would reference the actual tables as they do today.

However, this creates two problems…

  1. I’d have to eliminate all foreign keys because records in the Blue database can reference records in the Green database.
  2. I’m told the performance of combining the data this way will not be good. It may be important to know that there are some complex queries that join across several tables that would now be joining across views.

Final Thoughts

While it would be nice to not replicate the Green data into the other three (and there could be many more than three someday), I’m OK with that as long as the mechanism is relatively simple and reliable (my java solution is pretty complicated and error-prone).

Also, you may be wondering about primary key collisions between records created in the Green database and each of the other three. Our nighly sync logic records the Green records’ original ID in a special column and then assigns a locally unique ID to the records.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

It seems to me that row-based replication (instead of statement-based and default mixed ones) can be a proper solution in that case.

Native mysql replication, in fact, do not require identical tables on both master and slave sides. The only requirement is that for each replicated INSERT slave should not have the row with the same PRIMARY/UNIQUE keys. And vice versa – for each DELETE/UPDATE the corresponding rows should exist.

Row-based replication propagates row changes directly, as is. If row containing ID=xxxxxx was deleted on master, then the same row will be deleted on slave. If a row with an autoincremented column was inserted on the master, then the same row with the same A/I value will be inserted on the slave, despite of the slave’s autoincrement counter.

You need a multicolumn primary keys:
hostID -- rowID -- col1 -- col2...
PRIMARY (hostID, rowID) with hostID=Green|Red|Blue|Purple… will guarantee that PK will be unique for any row across the system. As a result replication events from master will never interfere with the local data on the slave. You can safely manipulate the ‘local’ rows interleaved with replicated master’s data until you limited to the rows having local hostID. Also this approach is easily extendable for arbitrary number of slaves.

May be that is not a best solution (a lot of code modification required) but it definitely will work.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply