pt-archiver has an issue when archiving binary data columns with the --bulk-insert option and a UTF8 charset defined.
Binary data stored in the destination table contain corrupted (changed) values.
When --bulk-insert is not used, so rows are not staged in a file internally, the issue can be avoided, but that hurts performance and I would love to use batching.
Using --no-check-charset instead of --charset=UTF8 works in some cases, but it is risky, because disabling this check may cause text to be erroneously converted from one character set to another.
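As a minimal sketch of the kind of damage observed (an assumption about the mechanism, not confirmed from pt-archiver's source): when raw bytes from a binary(16) column pass through a character-set conversion on their way through a staging file, any byte >= 0x80 is not a valid single-byte UTF-8 sequence and gets re-encoded, so the reloaded value no longer matches the original.

```python
# Simulate an unwanted charset round-trip on the raw bytes of a binary(16)
# column: interpret the bytes as latin1 text, then write them back as UTF-8.
uuid_bytes = bytes.fromhex("00112233445566778899aabbccddeeff")

corrupted = uuid_bytes.decode("latin1").encode("utf-8")

print(len(uuid_bytes))          # 16
print(len(corrupted))           # 24 -- each byte >= 0x80 became two bytes
print(corrupted != uuid_bytes)  # True: the stored value has changed
```

This matches the symptom reported above: the destination column holds different (and in this sketch, longer) byte values than the source.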
The destination table is created from the source DB's SHOW CREATE TABLE output and has an identical structure:
CREATE TABLE `ncr_event_store_archive_v1` (
  `event_id` binary(16) NOT NULL,
  `event_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `payload` longtext COLLATE utf8_unicode_ci NOT NULL COMMENT '(DC2Type:json_array)',
  `occurred` char(27) COLLATE utf8_unicode_ci NOT NULL,
  `dispatched` char(27) COLLATE utf8_unicode_ci DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `causation_id` binary(16) DEFAULT NULL,
  `causation_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`event_id`),
  KEY `dispatched_created_at` (`dispatched`,`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The command used for archiving is:
pt-archiver --source h=mariadbY,P=,D=seven,t=ncr_event_store,u=root,p=pass --dest h=mariadbY,P=,D=seven_archive,t=ncr_event_store_archive_v1,u=root,p=pass --where "ncr_event_store.created_at < '2018-10-16'" --limit=3000 --bulk-insert --commit-each --skip-foreign-key-checks --pid="/var/run/pt-archiver-pid-ncr_event_store" --sentinel="/var/tmp/pt-archiver-sentinel-ncr_event_store" --charset="UTF8" --no-delete --why-quit --statistics --progress=100000 --analyze=ds
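For reference, one way to confirm the corruption after a run (a hypothetical check I am adding for illustration, not part of the original procedure; it assumes `occurred` plus `event_name` identifies a row, since the corrupted `event_id` itself cannot be used to join) is to compare hex dumps of the binary column between the source and archive tables:

-- Any rows returned here had their event_id bytes altered during archiving.
SELECT HEX(s.event_id) AS src, HEX(a.event_id) AS dst
FROM seven.ncr_event_store AS s
JOIN seven_archive.ncr_event_store_archive_v1 AS a
  ON s.occurred = a.occurred AND s.event_name = a.event_name
WHERE s.event_id <> a.event_id;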
Pt-Archiver version: pt-archiver 3.0.12