Slow replication with TokuDB (open table cache not used during replication)

We have a setup where we replicate from MySQL 5.6 master to a MySQL 5.7 slave. Recently we changed all tables from slave server to TokuDB. Before, the server was able to keep up, while now the server falls behind. When executing “show full processlist”, the slave node is almost always caught in state “Opening tables”. Executing periodically “show global status like ‘opened_tables’;” shows that a large number of tables are opened continuously and then executing “show open tables from databaseX” shows already all existing tables as being open.

Master database: 5.6.28-76.1, slave: 5.7.22-22
Database settings:
table_definition_cache = 200 000
table_open_cache = 200 000
table_open_cache_instances = 20

Replication settings:
binlog-format = mixed
sync_binlog = 1
expire_logs_days = 30
max_binlog_size = 500M
binlog-ignore-db = information_schema
binlog-ignore-db = performance_schema
binlog-ignore-db = mysql

log-slave-updates = 1
replicate-ignore-db = information_schema
replicate-ignore-db = performance_schema
replicate-ignore-db = mysql
slave-parallel-workers = 24

The MySQL process shows over 40 000 open file handlers and total amount of tables across all databases is less than 50 000. Based on behavior, it looks like during replication, MySQL instance is not using the open table cache or there is some kind of funky locking that prevents it to be used efficiently. To be noted that value of opened_tables increases at such a fast rate that in one week it almost caught up with the value from master database which has an uptime of 1.5 years.
The database still has tables running also with InnoDB.

Is there any special finetuning setting that I can apply or this is a bug?

I’ve opened a bug: and I found a workaround. It appears there is a bad implementation of table open cache

Hi there, I checked in with a colleague. You have maybe discovered that increasing the table open cache is the only way around.

It’s one of the areas of difference between the InnoDB and TokuDB storage engines.

So far the workaround is to leave table_open_cache untouched and decrease table_open_cache_instances. Before, at a rate of 10000 entries per cache instance, I had a miss ratio of almost 10% and also a very high overflow ratio. Now with 50000 entries per cache instance I have so far 0 overflows and table open cache misses stopped incrementing. The replication is now working even slightly faster than with InnoDB, and catching up at a rate of 4.5-5 x real time.

It feels like a very bad implementation of the cache though. I would have assumed the table is allocated to only one cache instance.