Wide character in print at /usr/bin/pt-archiver line 6700.

Hi all,

I’m trying to archive parts of our production DB, but pt-archiver won’t run properly; it fails with “Wide character in print at /usr/bin/pt-archiver line 6700.”
Our DB has some data in encodings other than UTF-8.

I tried the “--[no]check-charset” flag, but then it only fetches the first 20 rows and stops.

Is there any way to overcome that?


Thank you in advance,
Denis

Hi Denis, thanks for your question.
Can I just check which version of Percona Toolkit you are running, please? I found a reference to a bug here https://jira.percona.com/browse/PT-940 which suggests the issue is fixed. It seems to be something that occurs intermittently.
There are a couple of suggestions in the bug report too. If you have a test environment, though, it would be best to try them out there first.
Anyway, let me know the version, the Percona Server or MySQL version, environment etc. and I’ll see if I can get someone to check that for you.
Thanks!

Hello,

Thank you for the prompt feedback!

I’m using pt-archiver 3.0.10 with MySQL server 5.6.34.

The environment is Ubuntu 16.04.3 LTS, 64-bit.

I’m running pt-archiver with the “--charset” option set to “utf8”, but in its short form (documented as “-A; type: string”), passed as A=utf8 in the DSN. Below is an example command (sensitive data has been removed):

nohup pt-archiver --source u=,p=’’,h=,D=, t=,A=utf8 --file <output_file> --where “’” --progress 10000 --statistics --limit 1000 --txn-size 500 > <log_file>.log 2>&1

I haven’t yet tried the suggestions from the link you shared above, but I will.

Thank you once again.

You’re welcome. If you possibly can, please download PT 3.0.11 (the latest version) and try that, just in case.
Meanwhile, I will bring this to the attention of the tech team here in case anyone has an alternative suggestion.
Thanks!

Hi Lorraine,

I installed the latest version of PT and it seems to be running now without any issue.

I’ll post back here if this issue occurs again.

Thank you for your help, it’s much appreciated.

You’re welcome, I’m glad it’s sorted. :)
Please do post back if it recurs, but from the bug database it appears that we’re satisfied we’ve fixed that one.

Hello Lorraine,

I’m still facing the same issue, even with the most recent PT version, 3.0.11. It fails when trying to fetch rows containing text like 讠m㸋> and ⱳ⬘ᦢ.

Is there any way to overcome that?

Thank you in advance.

Hello le_den, thanks for the update. OK it sounds as though the original bug is not quite fixed in all cases. Are you able to create a small reproducible case by any chance? I feel that we probably need to update our bug database with a test case. Let me know? As it appears to be intermittent that’s always one of the trickiest things to track, but a test case could help a great deal.

Hello Lorraine,

Here is a .csv file with 4 examples: the first two are rows on which PT fails, while the last two are processed without any issue.

Hope you will find this useful.
Thank you.

Thank you le_den, I will get this to the relevant techs and see if they can reproduce. Appreciated.

Hi,

Could you share the output of SHOW CREATE TABLE please?

Regards

Hi,

There you go:

CREATE TABLE log (
  UID varchar(255) COLLATE utf8_bin DEFAULT NULL,
  numb1 varchar(255) COLLATE utf8_bin DEFAULT NULL,
  numb2 varchar(255) COLLATE utf8_bin DEFAULT NULL,
  created_at datetime NOT NULL,
  updated_at datetime NOT NULL,
  state varchar(40) COLLATE utf8_bin DEFAULT NULL,
  field1 varchar(50) COLLATE utf8_bin DEFAULT NULL,
  content varchar(1000) COLLATE utf8_bin DEFAULT NULL,
  id bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id),
  UNIQUE KEY index_log_on_uid (uid),
  KEY index_log_on_created_at (created_at),
  KEY index_log_on_state (state),
  KEY field1_idx (field1)
) ENGINE=InnoDB AUTO_INCREMENT=1876946846 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

Best Regards,
Denis

Hello Percona team,

Do you have any update?

Thank you.

Hi,

Can you use mysqldump to send us the data examples?
I am not able to import your data.

Hi Carlos,

Attached is a mysqldump file. Please try to import it and let me know how that goes.

Thanks.

example.txt (1.67 KB)

I cannot reproduce the issue.
I imported the data. Here are some rows, including one of the conflicting ones.

mysql> select * from test.log\G
*************************** 1. row ***************************
UID: NULL
numb1: 3533
numb2: 3533
created_at: 2009-10-17 21:58:00
updated_at: 2009-10-17 21:58:00
state: received
field1: something13
content: ......p.....!..........m^..t...J........Z..*.{......{.....K5./#.I.%......~|q......#.k..w...x...&l..w..2.._..pH1.V0......h.Iz.i..I.B......h.
id: 28
*************************** 2. row ***************************
UID: NULL
numb1: 123456
numb2: 3533
created_at: 2009-10-17 21:58:00
updated_at: 2009-10-17 21:58:00
state: received
field1: something1
content: ....../.F<.;
id: 31
*************************** 3. row ***************************
UID: NULL
numb1: 789012
numb2: 3533
created_at: 2009-10-17 23:28:00
updated_at: 2009-10-17 22:41:00
state: received
field1: something2
content: ⱳ⬘ᦢ
id: 32
*************************** 4. row ***************************

Then I ran pt-archiver as follows:


bin/pt-archiver --source h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=log --charset=utf8 --file a.txt --where "1=1" --progress 10000 --statistics --limit 1000 --txn-size 500
TIME                ELAPSED   COUNT
2018-08-23T08:54:43       0       0
2018-08-23T08:54:43       0      13
Started at 2018-08-23T08:54:43, ended at 2018-08-23T08:54:43
Source: A=utf8,D=test,P=12345,h=127.0.0.1,p=...,t=log,u=msandbox
SELECT 13
INSERT 0
DELETE 13
Action         Count       Time        Pct
deleting          13     0.0020      53.79
select             2     0.0005      13.18
commit             1     0.0001       2.93
print_file        13     0.0001       2.73
other              0     0.0010      27.37

and the output file has:


cat a.txt
\N 3533 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something13 ......p.....!..........m^..t...J........Z..*.{......{.....K5./#.I.%......~|q......#.k..w...x...&l..w..2.._..pH1.V043....h.Iz.i..I.B......h.
\N 463456 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something1 ....../.F<.;
\N 479012 3533 2009-10-17 23:28:00 2009-10-17 22:41:00 received something2 ⱳ⬘ᦢ
\N 4833 3533 2009-10-17 21:58:00 2010-10-17 00:41:00 received something3 ???
\N 1111 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something4 ......p.....!.....v..$0......B.Q....p.....u>..7.l.A.l.sKIf.....-I.bQ...G.&....T.M..8j.=....c....*.....*..C..m.KL.J49{r.D'..J8L:.....L.*...F.
\N 22222 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something5 ......p.....!........).H..8....I....1.....t....[....l...4_.Bs3........-...I.B.U6.#_...k9...2..09.....B4)X?...ylw.H50....a..>.c(..:.)z.='5...
\N 5144 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something6 ...c.........g......
\N 52555 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something7 ...}..#.Q..0
\N 9990 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something8 ...c..p.....!..........N.P...v...>.....Cb...2M..w)....a...m...-.|D.M5.Y.H......{.nZN.|....k.0..u..R%3.)...q.5.W.y.53.+p...y....N.Vv+.7'..=F.
\N 5433 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something9 ...o....)..H
\N 5533 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something10 ........sH(|
\N 5633 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something11 ............
\N 3533 3533 2009-10-17 21:58:00 2009-10-17 21:58:00 received something12 "...}..p.....!.....?-......GP..H.+.""..v.a.Kc..t_.X.-~/H.....L.TU.}..#......Q....-..t...;..X+...../....I.....u.;f.57Z.mo.f.......Q.....>.V...."

Could you run a quick test for me?
Could you add this under the

use strict;

line? (~ line 51)

use utf8;

And run the program again?
Regards

Hi Carlos,

Where should I add this line?

Thanks.

In pt-archiver, under the first “use strict” (line 51).

Hello Carlos,

I’ve tried that, but I’m getting the same error. This time the line number has changed: it now occurs on the last line of the code below:


if ( $archive_fh ) {
trace('print_file', sub {
print $archive_fh $escaped_row, "\n"

This is how the code looks after I added the new line after line 50:


use strict;
use utf8;

Regards,
Denis

Hello,

Thanks for your patience and your willingness to help.
May I ask you to do one more test please?

Add 2 binmode lines as follows: (lines 6649 & 6650)


6646 my $need_hdr = $o->get('header') && !-f $archive_file;
6647 $archive_fh = IO::File->new($archive_file, ">>$charset")
6648 or die "Cannot open $charset $archive_file: $OS_ERROR\n";
6649 binmode STDOUT, ":utf8";
6650 binmode $archive_fh, ":utf8";
6651 $archive_fh->autoflush(1) unless $o->get('buffer');

Regards