Pt-Archiver changes binary data when using --bulk-insert option

Pt-Arciver have issue when archiving binary data columns with --bulk-insert option and having UTF8 charset defined.
Binary data stored in destination table contain bad (changed) values.

When --bulk-insert is not used and data internally are not stored in file this issue can be avoided but it hurts performance and i would love ti use batching.
When --no-check-charset is used instead UTF8 it 's ok in some cases but risky cause disabling this check may cause text to be erroneously converted from one character set to another.

Destination table is created from source DB show create table command and contain identical structure:

CREATE TABLE ncr_event_store_archive_v1 (
event_id binary(16) NOT NULL,
event_name varchar(255) COLLATE utf8_unicode_ci NOT NULL,
payload longtext COLLATE utf8_unicode_ci NOT NULL COMMENT ‘(DC2Type:json_array)’,
occurred char(27) COLLATE utf8_unicode_ci NOT NULL,
dispatched char(27) COLLATE utf8_unicode_ci DEFAULT NULL,
created_at datetime DEFAULT NULL,
updated_at datetime DEFAULT NULL,
causation_id binary(16) DEFAULT NULL,
causation_name varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (event_id),
KEY dispatched_created_at (dispatched,created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Command used for archiving is:

“pt-archiver --source h=mariadbY,P=,D=seven,t=ncr_event_store,u=root,p=pass --dest h=mariadbY,P=,D=seven_archive,t=ncr_event_store_archive_v1,u=root,p=pass --where “ncr_event_store.created_at < ‘2018-10-16’” --limit=3000 --bulk-insert --commit-each --skip-foreign-key-checks --pid=”/var/run/pt-archiver-pid-ncr_event_store" --sentinel=“/var/tmp/pt-archiver-sentinel-ncr_event_store” --charset=“UTF8” --no-delete --why-quit --statistics --progress=100000 --analyze=ds"

Pt-Archiver version: pt-archiver 3.0.12

1 Like

Hello, thanks for this report, I will ask the team to take a look. I need to check in on the status of a bug fix as it’s possible that it’s relevant here.

Hello, do you have any news regarding this issue?