Pt-archiver When archiving data, when encountering table data containing line breaks, an abnormal export occurs

I encountered a peculiar exception while archiving data using the pt-archiver tool. After investigation, I discovered that a varchar field in my source table contained a line break in its data. This altered the original data when it was written to a temporary file in /tmp/, resulting in a write error when inserted into the target table.
This is my table structure; the line break corresponds to the description field:

CREATE TABLE `o_bank_statement` (
  `id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `account_id` varchar(128) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL COMMENT '收款帐号ID',
  `pipeline_id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '通道ID',
  `bank_code` varchar(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '银行编码',
  `currency_id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '币种ID',
  `currency` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '货币',
  `description` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '流水说明',
  `amount` decimal(65,30) DEFAULT NULL COMMENT '流水金额',
  `balance` decimal(65,30) DEFAULT NULL COMMENT '餘額',
  `ref_no` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '流水号',
  `channel_ref_no` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '產品流水號',
  `pay_account` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '付款账户',
  `payer` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '交易人',
  `payer_handle` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '終端錢包(gcash/paytm/wechat/alipay)',
  `order_id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT '0' COMMENT '业务订单ID(關聯的業務訂單ID)',
  `use_time` datetime DEFAULT NULL COMMENT '上分时间',
  `grab_time` datetime DEFAULT NULL COMMENT '流水同步时间',
  `transaction_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '銀行流水ID',
  `transaction_time` datetime DEFAULT NULL COMMENT '流水交易时间',
  `transaction_date` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '流水交易日期',
  `transaction_type` tinyint NOT NULL COMMENT '流水类型(cr./dr)(平帳0/入帳1/出帳2)',
  `ip` varchar(25) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT '' COMMENT '同步IP',
  `status` tinyint DEFAULT '0' COMMENT '业务状态(是否被上分,0未,1已上分)',
  `ext_param` json DEFAULT NULL COMMENT '扩展参数',
  `remark` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `sign` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '唯一校验',
  `type` enum('1','2','3','4','5','6','7','8','9') CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '1',
  `trace_code` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '追踪码',
  `creator` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `updater` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '更新时间',
  `deleted` bit(1) DEFAULT b'0',
  `tenant_id` int DEFAULT '1',
  `rchrg_status` tinyint NOT NULL DEFAULT '1' COMMENT '上分状态',
  `c_from` tinyint DEFAULT NULL,
  `match_check` tinyint DEFAULT '0' COMMENT 'match check',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='卡商收款卡银行流水表';

The data that exports the exception is as follows:

1872326726156582914	8140573190	1866767202188722178	NG0307	1675073574825164802	NGN	FMW:M/FIP/238791419964906763/Jacob UYI\
/AKU MICROFINANCE BANK/PN7915983	3000.000000000000000000000000000000	0.000000000000000000000000000000	100014241226180041678735491707	\N	\N	Jacob UYI\	\N	1872325423778455553	2024-12-26 17:00:52	2024-12-26 17:00:51	1fda1cd1-bbbc-4c7f-82f6-4d2953ba3fcd	2024-12-26 17:00:42	2024-12-26	1	192.168.1.1561{"channelId": "", "productTypeId": "1681736822077063170"}	已上分	C474BD8916F0D41154F0D87B1A436E6F	1	\N	1001	2024-12-26 17:00:52	1001	2024-12-26 17:00:52		1	1	\N	0

1872326747534946305	7907055127	1869385989027221505	IN0015	1715745001241653249	INR	PAY/CR/651224350915/SAMUNURU SAI KRISHNA VARA PRASAD RAJU/Payment from PhonePe5000.000000000000000000000000000000	\N	651224350915	\N	\N	samanuru.krishna@ibl	\N	1872326543833784321	2024-12-26 17:01:05	2024-12-26 17:00:57	IBL4ae99909f5d945efa338284264d00647	2024-12-26 17:00:47	2024-12-26	1	192.168.1.158	1	{"channelId": null, "productTypeId": "1772257007167418370"}	已上分(审核)	943677DF4BB8549D6A8202DB1C241D8E	1	Payment from PhonePe	1001	2024-12-26 17:00:57	1001	2024-12-26 17:01:05		1	1	1	0

You can see that the data in the first row with ID 1872326726156582914 is misplaced:

FMW:M/FIP/238791419964906763/Jacob UYI /AKU MICROFINANCE BANK/PN7915983 should correspond to the description field,

192.168.1.156 corresponds to the ip field,

1 corresponds to the status field,

{"channelId": "", "productTypeId": "1681736822077063170"} corresponds to the ext_param field.

Now, the data in the ip/status/ext_param fields has been merged, resulting in an error when inserting into the new table:

DBD::mysql::st execute failed: Invalid JSON text: "Invalid value." at position 0 in value for column 'o_bank_statement_history.ext_param'. [for Statement "LOAD DATA LOCAL INFILE ? INTO TABLE paynow-trading-statement.o_bank_statement_history`CHARACTER SET utf8mb4(`id,account_id,pipeline_id,bank_code,currency_id,currency,description,amount,balance,ref_no,channel_ref_no,pay_account,payer,payer_handle,order_id,use_time,grab_time,transaction_id,transaction_time,transaction_date,transaction_type,ip,status,ext_param,remark,sign,type,trace_code,creator,create_time,updater,update_time,deleted,tenant_id,rchrg_status,c_from,match_check)" with ParamValues: 0='/tmp/INi5GMBObjpt-archiver'] at /usr/bin/pt-archiver line 7166.

My pt tool version: 3.7.0

mysql version: gcp sql 8.0.41

Hello @baker

Could you provide with the command that you are using for pt-archiver?

@Yunus ok

pt-archiver \
--source 'h=192.168.0.2,u=root,p=2HAcs?ziTBKe%[#6,D=trading-statement,t=o_bank_statement,L=1' \
--dest 'h=192.168.0.2,u=root,p=2HAcs?ziTBKe%[#6,D=trading-statement,t=o_bank_statement_history,L=1' \
--where "create_time < '2025-06-07 00:00:00'" \
--limit 5000 --txn-size 5000 --progress 5000 \
--statistics --bulk-insert --charset=utf8mb4 \
--no-check-charset --no-check-columns --no-delete

You should mask your IP, passwords from the comment.

Could you try without bulk-insert and see if that works fine?

@Yunus How is a single row insert implemented? Is it implemented using a statement like :

insert dst_table select * from src_table

I’ve written an archiving tool similar to pt-archiver, and I haven’t encountered any issues using the above statement.

When you use bulk insert it uses LOAD DATA

You can try without bulk insert, the insert queries should be shown in processlist how it executes.