Hello Everyone,
This post is a little investigation of some weird behavior I've run into with my ProxySQL installation. I'm not sure about the conclusion: is it a feature or a bug?
Anyway,
We are using a pretty simple MySQL + ProxySQL installation: 3 MySQL servers (a master and 2 replicas) and a couple of ProxySQL instances on the application servers. Simple as it is.
On the MySQL side we are using pt-heartbeat to measure replication lag.
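For readers unfamiliar with it: pt-heartbeat runs an `--update` daemon on the master that refreshes a timestamp row every second, and the lag on a replica is derived by comparing that replicated timestamp with the current time. Roughly, as a sketch using the column names from our test.heartbeat table (the real tool generates its own statements and filters by the master's server_id):

```sql
-- On the master: pt-heartbeat --update -D test refreshes this row every second
UPDATE test.heartbeat SET ts = NOW() WHERE server_id = @@server_id;

-- On a replica: the lag is the age of the replicated timestamp
SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) AS lag_seconds
FROM test.heartbeat;
```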
On ProxySQL's side everything is also simple:
mysql> select * from mysql_replication_hostgroups;
+------------------+------------------+------------+--------------------+
| writer_hostgroup | reader_hostgroup | check_type | comment            |
+------------------+------------------+------------+--------------------+
| 10               | 11               | read_only  | Common replication |
+------------------+------------------+------------+--------------------+
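For reference, this configuration was created through the ProxySQL admin interface with something like the following (the hostgroup IDs are the ones from our setup):

```sql
INSERT INTO mysql_replication_hostgroups
       (writer_hostgroup, reader_hostgroup, check_type, comment)
VALUES (10, 11, 'read_only', 'Common replication');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```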
This setup worked fine for years, until ProxySQL 2.3.2 was released. When I installed the recent 2.3.2 for testing, all servers became SHUNNED for some reason.
mysql> select hostgroup_id,hostname,status from runtime_mysql_servers where hostgroup_id=11 or hostgroup_id=10;
+--------------+------------+---------+
| hostgroup_id | hostname   | status  |
+--------------+------------+---------+
| 10           | 10.10.0.55 | SHUNNED |
| 11           | 10.10.0.53 | SHUNNED |
| 11           | 10.10.0.54 | SHUNNED |
| 11           | 10.10.0.55 | SHUNNED |
+--------------+------------+---------+
I've tried many ways to find the root of this issue. It's pretty clear it is all about replication, but what is the main cause?
After hours of digging I found some interesting things. It turns out to be a rare combination of circumstances.
- Our project set up replication and heartbeat about 10 years ago. Over those years many servers were replaced, leaving a lot of records about old servers in the test.heartbeat table (we use the test database for the heartbeat daemon):
+----------------------------+-----------+----------------+-----------+-----------------------+---------------------+
| ts                         | server_id | file           | position  | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+----------------+-----------+-----------------------+---------------------+
| 2012-05-24T00:50:24.001090 | 3         | log_bin.000041 | 377519349 | NULL                  | NULL                |
| 2017-07-06T09:59:45.000520 | 106       | log_bin.000223 | 432800495 | log_bin.000790        | 310894215           |
| 2017-08-29T03:17:29.000600 | 112       | log_bin.000418 | 53579985  | log_bin.000193        | 107686777           |
| 2018-07-29T10:51:46.000440 | 28        | log_bin.002854 | 31313746  | NULL                  | NULL                |
| 2018-08-08T08:09:52.000600 | 27        | log_bin.001472 | 206351451 |                       | 0                   |
| 2019-04-08T18:39:49.000800 | 36        | log_bin.001529 | 385530635 | log_bin.002187        | 74147128            |
| 2019-08-23T05:24:33.000590 | 37        | log_bin.005040 | 9683159   |                       | 0                   |
| 2020-08-05T06:29:29.000890 | 44        | log_bin.013246 | 241877    |                       | 0                   |
| 2021-08-18T06:19:41.005030 | 49        | log_bin.015948 | 209660214 |                       | 0                   |
| 2021-12-18 11:54:43        | 53        | NULL           | NULL      | NULL                  | NULL                |
| 2021-12-18 11:54:48        | 54        | NULL           | NULL      | NULL                  | NULL                |
| 2022-01-22T16:03:36.003960 | 55        | log_bin.006671 | 81197987  |                       | 0                   |
+----------------------------+-----------+----------------+-----------+-----------------------+---------------------+
12 rows in set (0.00 sec)
- Recently @renecannao made this commit, which broke our ProxySQL setup:
https://github.com/sysown/proxysql/commit/2a5121e52f98cee7b61302d26f46aa0ef8e10809
In previous versions ProxySQL asked for the minimal time difference, which worked great with our setup: the old records made no difference to that query's result.
Now it selects the maximal time difference, and as a result the reported replication lag is about 10 years.
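The effect is easy to reproduce outside of MySQL. Here is a small Python sketch (not ProxySQL's actual code) using a few ts values from our heartbeat table above, rounded to whole seconds, and a hypothetical "now" one second after the freshest row:

```python
from datetime import datetime

# ts values from test.heartbeat; only server_id 55 is still being updated
heartbeat_ts = {
    3: datetime(2012, 5, 24, 0, 50, 24),    # decommissioned in 2012
    49: datetime(2021, 8, 18, 6, 19, 41),   # decommissioned in 2021
    55: datetime(2022, 1, 22, 16, 3, 36),   # current master
}

now = datetime(2022, 1, 22, 16, 3, 37)  # one second after the freshest row

# Old behavior: smallest difference -> stale rows are ignored
min_lag = min((now - ts).total_seconds() for ts in heartbeat_ts.values())

# New behavior (after the commit): largest difference -> the 2012 row dominates
max_lag = max((now - ts).total_seconds() for ts in heartbeat_ts.values())

print(min_lag)  # 1.0
print(max_lag)  # roughly ten years, expressed in seconds
```

With the old minimal-difference query the lag is 1 second; with the maximal-difference query any forgotten row from a long-dead master becomes the reported lag, which blows past any sane max_replication_lag and shuns every server.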
This change is important for multi-master environments, as far as I understand. But what about single-master? Now I have to clean up the records about all of the old masters. Maybe a cleanup flag or something should be added to pt-heartbeat?
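In the meantime, the manual fix is simply deleting the stale rows. In our case only server_ids 53, 54 and 55 are still alive, so (adjust the list for your own topology):

```sql
-- Drop heartbeat rows left behind by decommissioned servers
DELETE FROM test.heartbeat
WHERE server_id NOT IN (53, 54, 55);
```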
I'm afraid that someday our project will replace the MySQL servers once again, or we will change the master due to maintenance or something, and ProxySQL will fail.
I've written a doc in our wiki about this issue. But…
How can we mitigate it without relying on docs? Could someone give me some advice?
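One idea for mitigating it without relying on docs (my own suggestion, not something pt-heartbeat or ProxySQL provides): a scheduled event on the master that purges heartbeat rows which have stopped updating, so a forgotten master can never poison the lag query again. The event name and the retention window below are arbitrary choices, and this assumes the ts column compares cleanly against NOW(); pt-heartbeat may store it as a string, in which case the WHERE clause needs a cast:

```sql
-- Requires event_scheduler = ON; run on the master
CREATE EVENT purge_stale_heartbeat
  ON SCHEDULE EVERY 1 DAY
  DO
    DELETE FROM test.heartbeat
    WHERE ts < NOW() - INTERVAL 1 HOUR;
```

Only the row that is actively being refreshed survives, which is exactly the one ProxySQL should be looking at.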
Thank you in advance
BTW, is the new query for measuring replication lag a bug or a feature? Let's discuss.