Sunday, January 19, 2014

Can MySQL Replication catch up

So replication was recently improved in MySQL 5.6. However, people are still using 5.1 and 5.5 so some of those improvements will have to wait to hit the real world.

I recently helped move in this direction with a geo-located replication solution. One part of the country had a MySQL 5.1 server and the other part of the country had a new MySQL 5.6 server installed.

After dealing with the issues of getting the initial data backup from the primary to the secondary server (took several hours to say the least), I had to decide could replication catch up and keep up. The primary server had some big queries and optimization is always a good place to start. I had to get the secondary server pulling and applying as fast as I could first though.

So here are a few things to check and keep in mind when it comes to replication. I have added some links below that help support my thoughts as I worked on this.

Replication can be very I/O heavy. Depending on your application. A blog site does not have that many writes so the replication I/O is light, but a heavily written and updated primary server is going to lead to a replication server writing a lot of relay_logs and binary_logs if they are enabled. Binary logs can be enabled on the secondary to allow you to run backups or you might want this server to be a primary to others.

I split the logs onto a different data partition from the data directory.
This is set in the my.cnf file - relay-log

The InnoDB buffer pool was already set to a value over 10GB. This was plenty for this server.
The server was over 90,000 seconds behind still.

So I started to make some adjustments to the server and ended up with these settings in the end. Granted every server is different.

mysql> select @@sync_relay_log_info\G
*************************** 1. row ***************************
@@sync_relay_log_info: 0
1 row in set (0.08 sec)

mysql>  select @@innodb_flush_log_at_trx_commit\G
*************************** 1. row ***************************
@@innodb_flush_log_at_trx_commit: 2
1 row in set (0.00 sec)

mysql> select @@log_slave_updates\G
*************************** 1. row ***************************
@@log_slave_updates: 0

mysql> select @@sync_binlog\G
*************************** 1. row ***************************
@@sync_binlog: 0
1 row in set (0.00 sec)

mysql> select @@max_relay_log_size\G
*************************** 1. row ***************************
@@max_relay_log_size: 268435456

 I turned binary logging off as I monitored different settings and options to help replication catch up. It took a while. Some of the settings you see above may or may not have been applied as I worked within this time frame. Yet it did catch up to 0 seconds back. Now you might notice that a lot of these settings above relate in and around the binary logging.  So I ran a little test. So, I restarted and enabled the bin logs. I checked in on the server later and found it 10,000+ seconds behind.  So I once again restarted and disabled the bin logs.  It caught up (0 seconds behind) with the primary server in under 15 minutes. I used Aurimas' tool as I watched it catch up as well. If you have not use it before, it is a very nice and handy tool.

What this all means is the primary server must be ACID compliant. With this set up you are also depending on the OS for cache and clean up. This is server will be used as a primarily read server to feed information to others. It also means that yes, geo-located replication can stay up to date with a primary server.

What if you need to stop the slave, will it still catch up quickly?

How and why do you stop the slave is my first response.  You should get into the habit of using  STOP SLAVE SQL_THREAD; instead of STOP SLAVE;  This allows the relay logs to continue to gather data and just not apply it to your primary server.  So if you can take advantage of that it will help reduce the time it takes for you to populate the relay logs later. 

Some additional reading for you:

No comments:

Post a Comment

@AnotherMySQLDBA