Pt-archiver prevent binlogs

Hi everyone,
I’m currently using an environment with an older Percona 5.7 cluster, and I’m facing the challenge of needing to drop a very large table—approximately 3.5 TB. To minimize impact, I’m planning to use pt-archiver to clean up the table in small batches.

However, there’s an additional complication: the developers have enabled row-based binary logging in this cluster. If I’m not wrong this might significantly increase the volume of binlogs generated.

I noticed that pt-archiver has a DSN option that, if set to true, disables binary logging with SQL_LOG_BIN. Would using this option effectively prevent the cleanup from generating excessive binlogs? Or is there a more efficient approach that I might have overlooked?

Thanks in advance for your insights!

Yes it would. SQL_LOG_BIN=0 is a session variable. pt-archiver will set this session variable at the start and it will only affect pt-archiver. Other sessions/connections will continue to get binlog’d as usual.

Keep in mind too that disabling binlogs means if there are any replicas, these rows will not get deleted from any replicas. If you are using the --purge option in pt-archiver, the binlog entry for the deletes should be small and not cause too much bloat. You can monitor how much binlog space you are using and PURGE BINARY LOGS TO ... to clean up excessive binlogs.

1 Like

Hi Matthew,

Thank you so much for your feedback.
I tested the option you suggested, but it seems it’s not working as expected, or I may have made a mistake in the process. Here’s what I did:

I started the archiver with a Kubernetes job using the following options:

pt-archiver --source h=db01-pxc-db-haproxy,u=$DB_USER,p=$DB_PASSWORD,P=3306,D=basedb,t=ftp_send_log,b=1 --progress 1000 --where "1=1" --statistics --charset=UTF8mb4 --limit=1000 --txn-size 1000 --primary-key-only --purge --sleep 1

Then, I checked the binlog with:

mysqlbinlog ./mysql-bin.000132 | grep -i -E 'Table_map|insert|update|delete'

However, I still see the deleted rows appearing:

#240823 10:00:04 server id 1  end_log_pos 47430607 CRC32 0x009a9f08  Table_map: `basedb`.`ftp_send_log` mapped to number 890
#240823 10:00:04 server id 1  end_log_pos 47431233 CRC32 0x192e2dbf  Delete_rows: table id 890 flags: STMT_END_F
#240823 10:00:04 server id 1  end_log_pos 47431351 CRC32 0xe722146f  Table_map: `basedb`.`ftp_send_log` mapped to number 890
#240823 10:00:04 server id 1  end_log_pos 47431977 CRC32 0xe0cea8d5  Delete_rows: table id 890 flags: STMT_END_F

Using b=1 to b=true doesn’t make a difference.
Do you have any suggestions on what might be going wrong?

prefix the pt-archiver command with PTDEBUG=1 and run it. See if you can find anything related to sql_log_bin in the output.