Slow replication with TokuDB (open table cache not used during replication)

sergiu.hlihor · September 3, 2018, 5:42am

We have a setup where we replicate from MySQL 5.6 master to a MySQL 5.7 slave. Recently we changed all tables from slave server to TokuDB. Before, the server was able to keep up, while now the server falls behind. When executing “show full processlist”, the slave node is almost always caught in state “Opening tables”. Executing periodically “show global status like ‘opened_tables’;” shows that a large number of tables are opened continuously and then executing “show open tables from databaseX” shows already all existing tables as being open.

Master database: 5.6.28-76.1, slave: 5.7.22-22
Database settings:
open_files_limit
table_definition_cache = 200 000
table_open_cache = 200 000
table_open_cache_instances = 20

Replication settings:
Master:
binlog-format = mixed
sync_binlog = 1
expire_logs_days = 30
max_binlog_size = 500M
binlog-ignore-db = information_schema
binlog-ignore-db = performance_schema
binlog-ignore-db = mysql

Slave:
log-slave-updates = 1
replicate-ignore-db = information_schema
replicate-ignore-db = performance_schema
replicate-ignore-db = mysql
slave-parallel-workers = 24

The MySQL process shows over 40 000 open file handlers and total amount of tables across all databases is less than 50 000. Based on behavior, it looks like during replication, MySQL instance is not using the open table cache or there is some kind of funky locking that prevents it to be used efficiently. To be noted that value of opened_tables increases at such a fast rate that in one week it almost caught up with the value from master database which has an uptime of 1.5 years.
The database still has tables running also with InnoDB.

Is there any special finetuning setting that I can apply or this is a bug?

sergiu.hlihor · September 3, 2018, 3:35pm

I’ve opened a bug: [url][PS-4809] Open table cache not used during replication when tables are using TokuDB engine - Percona JIRA and I found a workaround. It appears there is a bad implementation of table open cache

Leonardo_Erpi · September 4, 2018, 3:31am

Hi there, I checked in with a colleague. You have maybe discovered that increasing the table open cache is the only way around.

It’s one of the areas of difference between the InnoDB and TokuDB storage engines.

sergiu.hlihor · September 4, 2018, 4:06am

So far the workaround is to leave table_open_cache untouched and decrease table_open_cache_instances. Before, at a rate of 10000 entries per cache instance, I had a miss ratio of almost 10% and also a very high overflow ratio. Now with 50000 entries per cache instance I have so far 0 overflows and table open cache misses stopped incrementing. The replication is now working even slightly faster than with InnoDB, and catching up at a rate of 4.5-5 x real time.

It feels like a very bad implementation of the cache though. I would have assumed the table is allocated to only one cache instance.

Topic		Replies	Views
Slave replication when using both TokuDB and InnoDB engines - any caveats? Percona Server for MySQL 5.7	0	422	November 7, 2018
Data in the Toku DB table are lost Percona Server for MySQL 5.7	0	556	October 31, 2018
table_cache on db with many tables Other MySQL® Questions	2	642	December 7, 2007
Too many open tables and increasing indefinately Percona Server for MySQL 5.7 percona	19	3271	December 17, 2021
Advice for backing up slave servers. Percona XtraBackup	2	501	November 21, 2011

Slow replication with TokuDB (open table cache not used during replication)

Related topics