Suspected bug when multithreaded replication is enabled in MySQL 5.7

Sorry to bother you. Recently I have been performing a major version upgrade on several sets of databases, and I have encountered a suspected bug, but Oracle's bug handling for the Community edition looks very slow. Can you help me take a look? Thank you.

Environment:
OS: RHEL 6.5 x64
MySQL: 5.7.21-23
Malloc: jemalloc 5.1.0
Replication: master <=> master (MTR, 4 threads) + keepalived


Problem:
The first node (primary) is running normally.
On the secondary node (read only), memory usage keeps rising.
Memory attributed to the event memory/sql/Log_event (the slave_sql thread) keeps rising.

[HTML]
root@150203 [(none)]> select THREAD_ID, EVENT_NAME, COUNT_ALLOC, SUM_NUMBER_OF_BYTES_ALLOC/1024/1024, CURRENT_NUMBER_OF_BYTES_USED/1024/1024 from performance_schema.memory_summary_by_thread_by_event_name order by CURRENT_NUMBER_OF_BYTES_USED desc limit 10;
+-----------+----------------------------+-------------+-------------------------------------+----------------------------------------+
| THREAD_ID | EVENT_NAME | COUNT_ALLOC | SUM_NUMBER_OF_BYTES_ALLOC/1024/1024 | CURRENT_NUMBER_OF_BYTES_USED/1024/1024 |
+-----------+----------------------------+-------------+-------------------------------------+----------------------------------------+
| 1 | memory/innodb/buf_buf_pool | 240 | 31455.00000000 | 31455.00000000 |
| 507 | memory/sql/Log_event | 18229868 | 7494.16463280 | 7494.08160686 | <<< is it normal?
| 1 | memory/innodb/hash0hash | 92 | 2356.76754761 | 897.66287231 |
| 1 | memory/innodb/os0event | 5899209 | 765.12567902 | 765.12567902 |
| 506 | memory/innodb/std | 37745532 | 1584.49591064 | 191.84428406 |
| 1 | memory/innodb/log0log | 11 | 64.01335907 | 64.01335907 |
| 508 | memory/innodb/std | 8794028 | 393.10567474 | 38.30834961 |
| 507 | memory/innodb/std | 1762132 | 73.96929932 | 8.95660400 |
| 509 | memory/mysys/IO_CACHE | 6 | 8.03128242 | 8.03128242 |
| 508 | memory/mysys/IO_CACHE | 6 | 8.03128242 | 8.03128242 |
+-----------+----------------------------+-------------+-------------------------------------+----------------------------------------+
10 rows in set (0.03 sec)
[/HTML]

[HTML]
root@150203 [(none)]> select thread_id, user, current_allocated, total_allocated from sys.memory_by_thread_by_current_bytes;
+-----------+---------------------------------+-------------------+-----------------+
| thread_id | user | current_allocated | total_allocated |
+-----------+---------------------------------+-------------------+-----------------+
| 1 | sql/main | 32.42 GiB | 34.08 GiB |
| 507 | sql/slave_sql | 7.42 GiB | 12.10 GiB | <<< The slave_sql thread consumes memory constantly, is it normal?
| 506 | sql/slave_io | 198.30 MiB | 100.58 GiB |
| 38914 | root@localhost | 8.53 MiB | 21.64 MiB |
| 510 | sql/slave_worker | 8.06 MiB | 8.21 MiB |
| 511 | sql/slave_worker | 42.74 KiB | 42.90 KiB |
| 39514 | root@10.25.150.203 | 16.84 KiB | 28.59 MiB |
| 39515 | root@10.25.150.203 | 16.84 KiB | 40.66 MiB |
| 39516 | root@10.25.150.203 | 16.84 KiB | 24.54 MiB |
| 3225 | repl@10.25.150.202 | 13.05 KiB | 63.97 KiB |
| 31 | innodb/buf_dump_thread | 192 bytes | 1.74 MiB |
| 30 | innodb/srv_worker_thread | 0 bytes | 0 bytes |
| 39 | sql/signal_handler | 0 bytes | 0 bytes |
| 40 | sql/compress_gtid_table | 0 bytes | 0 bytes |
| 2 | sql/thread_timer_notifier | 0 bytes | 0 bytes |
| 3 | innodb/io_read_thread | 0 bytes | 33.24 KiB |
| 4 | innodb/io_ibuf_thread | 0 bytes | 0 bytes |
| 5 | innodb/io_read_thread | 0 bytes | 27.16 KiB |
| 6 | innodb/io_log_thread | 0 bytes | 0 bytes |
| 7 | innodb/io_read_thread | 0 bytes | 5.67 KiB |
| 8 | innodb/io_read_thread | 0 bytes | 50.03 KiB |
| 9 | innodb/io_read_thread | 0 bytes | 44.77 KiB |
| 10 | innodb/io_read_thread | 0 bytes | 11.35 KiB |
| 11 | innodb/io_read_thread | 0 bytes | 22.70 KiB |
| 12 | innodb/io_read_thread | 0 bytes | 22.70 KiB |
| 13 | innodb/io_write_thread | 0 bytes | 1.68 MiB |
| 14 | innodb/io_write_thread | 0 bytes | 384.44 KiB |
| 15 | innodb/io_write_thread | 0 bytes | 650.33 KiB |
| 16 | innodb/io_write_thread | 0 bytes | 223.01 KiB |
| 17 | innodb/io_write_thread | 0 bytes | 294.46 KiB |
| 18 | innodb/io_write_thread | 0 bytes | 218.54 KiB |
| 19 | innodb/io_write_thread | 0 bytes | 274.54 KiB |
| 20 | innodb/io_write_thread | 0 bytes | 242.21 KiB |
| 21 | innodb/page_cleaner_thread | 0 bytes | 0 bytes |
| 23 | innodb/srv_error_monitor_thread | 0 bytes | 0 bytes |
| 24 | innodb/srv_lock_timeout_thread | 0 bytes | 0 bytes |
| 25 | innodb/srv_monitor_thread | 0 bytes | 18.92 MiB |
| 26 | innodb/srv_master_thread | 0 bytes | 14.24 MiB |
| 27 | innodb/srv_worker_thread | 0 bytes | 0 bytes |
| 28 | innodb/srv_purge_thread | 0 bytes | 0 bytes |
| 29 | innodb/srv_worker_thread | 0 bytes | 0 bytes |
| 32 | innodb/dict_stats_thread | -160 bytes | 218.36 KiB |
| 509 | sql/slave_worker | -5122994 bytes | 58.06 MiB |
| 508 | sql/slave_worker | -7894888734 bytes | 28.29 GiB |
+-----------+---------------------------------+-------------------+-----------------+
44 rows in set (0.28 sec)
[/HTML]

my.cnf uploaded

my.cnf.txt (5.27 KB)

The memory used by the event memory/sql/Log_event rises indefinitely. Is this normal?
This issue also occurs in the most recent versions, 5.7.22 and 5.7.23.

Hello there, thanks for your question, I will see if I can get one of the team to take a look. Can I check though - did you upgrade to 5.7.22-23, or have you simply ascertained that the problem remains? Thanks.

I have upgraded 6 MySQL instances.
Two of them were upgraded to 5.7.22 (in-place method), and the other four were upgraded to 5.7.23 (mysqldump method).
Each pair of instances forms a replication group with master-master replication (MTR, 4 threads), and all instances used jemalloc 5.1.0.
The memory usage of every instance keeps growing, and the secondary nodes grow much faster than their primaries.
About a week after an instance is started, the secondary node begins using swap; after about two weeks it has eaten half of the swap.
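To quantify that kind of growth from the OS side, one option is to watch the resident and swapped memory of the server process in /proc. This is only a sketch: it assumes a Linux /proc filesystem and a process named mysqld, and it falls back to the current shell's PID purely so the commands can be demonstrated on a host where MySQL is not running.

```shell
# Report resident (VmRSS) and swapped (VmSwap) memory for mysqld.
# Falls back to the current shell's PID if mysqld is not running,
# just so the commands are demonstrable anywhere.
pid=$(pidof mysqld 2>/dev/null || echo $$)
grep -E '^Vm(RSS|Swap)' /proc/"$pid"/status
```

Running this periodically (e.g. from cron) makes it easy to see whether RSS growth on the secondary tracks the memory/sql/Log_event counter from performance_schema.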

After I removed the jemalloc 5.1.0 configuration, everything has been stable so far.
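For reference, jemalloc is commonly wired into MySQL through mysqld_safe's malloc-lib option, so "removing the configuration" means commenting out or deleting that line and restarting mysqld. The library path below is an assumption and varies by installation:

```ini
# Hypothetical my.cnf excerpt: preload jemalloc via mysqld_safe.
# Comment this line out and restart mysqld to fall back to glibc malloc.
[mysqld_safe]
malloc-lib = /usr/local/lib/libjemalloc.so.2
```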

Hello peng19832, does this mean that you are good now, or do you still need assistance please?

I’m not 100% sure from the previous answer if you mean that you have removed the jemalloc-5.1.0 config and that as a result the problems have gone.

Yes, after removing jemalloc 5.1.0 the service is stable.
However, I don’t know why there is a memory leak when jemalloc is used.
Does this mean that jemalloc 5.1.0 has an unknown bug?

One thought is Transparent Huge Pages. If you are using jemalloc, you MUST disable Transparent Huge Pages or you will experience extreme memory fragmentation over time which will eventually lead to an OOM situation. This is due to how jemalloc uses a hole-punching technique to release memory back to the kernel. See https://www.percona.com/blog/2014/07/23/why-tokudb-hates-transparent-hugepages/
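As a quick check of the advice above, the active THP setting can be read from sysfs. A minimal sketch, assuming a Linux host; note that RHEL 6 exposes it under redhat_transparent_hugepage while mainline kernels use transparent_hugepage, so both paths are probed. The bracketed value in the output is the setting currently in effect.

```shell
# Show the active Transparent Huge Pages setting. Probes both the
# mainline and RHEL 6 sysfs paths; the bracketed word (e.g. [never])
# is the value in effect.
for d in /sys/kernel/mm/transparent_hugepage \
         /sys/kernel/mm/redhat_transparent_hugepage; do
    if [ -r "$d/enabled" ]; then
        cat "$d/enabled"
        # To disable (as root; not persistent across reboots):
        # echo never > "$d/enabled"
    fi
done
```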

It is also possible that jemalloc 5.1.0 has some problem that we have not yet seen.

Ok, thank you GeorgeLorch.
I saw that post before the upgrade.
However, on the instances using jemalloc 5.1.0 I ran into the suspected memory leak regardless of whether THP was disabled or not, until I removed jemalloc.
I have now removed jemalloc from all instances.