pt-archiver has an issue when archiving binary data columns with the --bulk-insert option while a UTF8 charset is defined.
The binary data stored in the destination table contains bad (changed) values.
When --bulk-insert is not used, so the rows are not staged in a file internally, the issue can be avoided, but that hurts performance and I would love to use batching.
Using --no-check-charset instead of UTF8 works in some cases, but it is risky, because disabling this check may cause text to be erroneously converted from one character set to another.
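To illustrate the likely mechanism (a hedged sketch, not pt-archiver's actual code): a binary(16) column holds arbitrary bytes, and if the staged bulk-insert file is read back as UTF8 text, any byte sequence that is not valid UTF-8 gets reinterpreted or replaced, which matches the "bad (changed) values" symptom. The sample bytes below are invented for demonstration:

```python
# Illustrative only: why forcing a character-set interpretation on raw
# binary data corrupts it. These bytes stand in for a binary(16) value;
# several of them (0x9C, 0xFE, the truncated 0xC3 0x28 pair, lone 0x80)
# are not valid UTF-8 sequences.
raw = bytes([0x00, 0x9C, 0xFE, 0x41, 0xC3, 0x28, 0x80, 0x7F])

# Round-trip the bytes through a UTF-8 "text" interpretation, the way a
# file staged for bulk insert would be treated if loaded as UTF8 text.
as_text = raw.decode("utf-8", errors="replace")
back = as_text.encode("utf-8")

print(raw.hex())   # original bytes
print(back.hex())  # bytes after the round-trip: invalid sequences mangled
print(raw == back)
```

The round-trip silently replaces every invalid byte with the U+FFFD replacement character, so the value that lands in the destination table no longer matches the source.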
The destination table is created from the source database's SHOW CREATE TABLE output and has an identical structure:
CREATE TABLE ncr_event_store_archive_v1 (
event_id binary(16) NOT NULL,
event_name varchar(255) COLLATE utf8_unicode_ci NOT NULL,
payload longtext COLLATE utf8_unicode_ci NOT NULL COMMENT '(DC2Type:json_array)',
occurred char(27) COLLATE utf8_unicode_ci NOT NULL,
dispatched char(27) COLLATE utf8_unicode_ci DEFAULT NULL,
created_at datetime DEFAULT NULL,
updated_at datetime DEFAULT NULL,
causation_id binary(16) DEFAULT NULL,
causation_name varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (event_id),
KEY dispatched_created_at (dispatched,created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The command used for archiving is:
pt-archiver --source h=mariadbY,P=,D=seven,t=ncr_event_store,u=root,p=pass --dest h=mariadbY,P=,D=seven_archive,t=ncr_event_store_archive_v1,u=root,p=pass --where "ncr_event_store.created_at < '2018-10-16'" --limit=3000 --bulk-insert --commit-each --skip-foreign-key-checks --pid="/var/run/pt-archiver-pid-ncr_event_store" --sentinel="/var/tmp/pt-archiver-sentinel-ncr_event_store" --charset="UTF8" --no-delete --why-quit --statistics --progress=100000 --analyze=ds
pt-archiver version: 3.0.12