Understanding the pt-query-digest errors report

Dear Colleagues,

Could you please help me understand the nature of the errors in the report
and how critical they are?

Here is what I have at the moment.
I collected a tcpdump capture:

sudo tcpdump -s 65535 -x -nn -q -tttt -i any -c 2000000 port 3306 >/tmp/mysql_dump_2704_0555.txt
and then analyzed it:
pt-query-digest --type tcpdump /tmp/mysql_dump_2704_0555.txt >/tmp/mysql_dump_digest_2704_0555.txt

The TCP session errors were reported to /tmp/pt-query-digest-errors.qvGoT9E

The errors file contains many entries with

reason_for_failure => 'no server OK to previous command',
state => 'awaiting_reply',

and

reason_for_failure => 'got server response before full buffer',

and sometimes a non-empty server_retransmissions list.

I haven't found a good explanation for them.

Should I worry about them? Are they critical, and do they affect my database?
Can you point me to a description of them and to further troubleshooting steps?

Thank you.

P.S. I am using pt-query-digest 2.2.14 and Percona-Server-server-56-5.6.32.

Could you please try a newer version?
The current Percona Toolkit version is 3.0.2, and there have been many changes since 2.2.14 was released (2015-04-14).
You can download the latest version from https://www.percona.com/software/database-tools/percona-toolkit

Regards

Oh, I have already removed the tcpdump output because it took up too much space.
But I briefly checked the code diff, and it seems there were no changes in this section (the error detection code mentioned above).

Also, as far as I understand, these are not errors in the code but errors detected while processing the queries.

So I am just trying to find an explanation of what was detected in my case.

Please note that I have no examples to work with for this answer, but as far as I can see, that message comes from MySQLProtocolParser: https://github.com/percona/percona-toolkit/blob/3.0/lib/MySQLProtocolParser.pm#L407
I don't remember all the specific updates we made to that library, but they might be the source of the problem.
This issue might be related: https://bugs.launchpad.net/percona-toolkit/+bug/1402776.
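Some background on what such a parser reads: every MySQL client/server protocol packet starts with a 4-byte header, a 3-byte little-endian payload length followed by a 1-byte sequence id, and the sequence id is reset to 0 whenever the client starts a new command. A reason like 'client cmd not packet 0' plausibly means the parser saw a client packet it treated as the start of a command but whose sequence id was not 0, for example because earlier packets were not captured. The sketch below only illustrates that header check; it is not MySQLProtocolParser's code, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def parse_header(data: bytes) -> Tuple[int, int]:
    """Return (payload_length, sequence_id) from a 4-byte MySQL packet header."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for a MySQL packet header")
    # The length is 3 bytes, little-endian: pad to 4 bytes and unpack as uint32.
    (length,) = struct.unpack("<I", data[:3] + b"\x00")
    seq_id = data[3]
    return length, seq_id


def looks_like_new_command(data: bytes) -> bool:
    """Hypothetical check: a client packet that starts a command has sequence id 0."""
    _, seq_id = parse_header(data)
    return seq_id == 0
```

For example, a COM_QUERY for "SELECT 1" is the payload b"\x03SELECT 1" (9 bytes), so its packet starts with the header b"\x09\x00\x00\x00" and parse_header returns (9, 0).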

Regards

I'll try to update, but it is a production server.

I will get back to you with the results.

Just an idea: could these errors be caused by TCP packet fragmentation?

It seems that some packets are missing (perhaps skipped for some reason).
If you could provide a tcpdump capture as an example, I can take a look.
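To make the "missing packets" explanation concrete, here is a deliberately simplified, hypothetical model of one session, showing how lost segments could lead to the two most frequent reasons in this thread. It is not pt-query-digest's implementation: the class, method and state names (apart from 'awaiting_reply' and the two reason strings quoted in the thread) are made up, and the real MySQLProtocolParser tracks much more state. The idea is that after a client command the session waits for a server reply; if the reply was never captured, the next client command arrives while the session is still awaiting_reply. Likewise, if a client packet spans several TCP segments and one of them was lost, the server's reply arrives before the parser has buffered the whole client packet.

```python
from typing import List, Optional


class SessionModel:
    """Hypothetical, simplified model of one MySQL session seen in a capture.

    Not pt-query-digest code: it only shows how lost packets can produce
    the two failure reasons discussed in this thread.
    """

    def __init__(self) -> None:
        self.state: Optional[str] = None  # None, "buffering" or "awaiting_reply"
        self.expected = 0                 # total size of the current client packet
        self.buffered = 0                 # bytes of that packet captured so far
        self.failures: List[str] = []

    def client_data(self, nbytes: int, packet_len: Optional[int] = None) -> None:
        """Record a client segment; packet_len is set when it starts a new packet."""
        if packet_len is not None:
            if self.state == "awaiting_reply":
                # New command while still waiting: the server reply was not captured.
                self.failures.append("no server OK to previous command")
            self.expected, self.buffered = packet_len, 0
        self.buffered += nbytes
        if self.buffered >= self.expected:
            self.state = "awaiting_reply"
        else:
            self.state = "buffering"

    def server_data(self) -> None:
        """Record a server segment (any reply)."""
        if self.state == "buffering":
            # The server answered before the whole client packet was seen,
            # so one of the client's segments is missing from the capture.
            self.failures.append("got server response before full buffer")
        self.state = None
```

For example, two complete client commands with no server segment in between leave failures == ['no server OK to previous command'], while a 3004-byte client packet of which only 1460 bytes were captured, followed by a server segment, records 'got server response before full buffer'.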
Thanks.

Hi Carlos,

I've updated the toolkit to percona-toolkit-3.0.2-1.el7.x86_64.rpm.

Here is a fresh grep of the failure reasons from the error report, and a screenshot of the tcpdump capture: http://joxi.ru/Dr8y9ERc4Y8qxm
(I can't provide the raw dump because it could contain customers' private data.)
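A tally like that grep can also be produced with a short script that counts how often each reason occurs. This sketch assumes the errors file contains lines of the form reason_for_failure => '...' with straight single quotes, as quoted in this thread; the script name count_reasons.py in the usage note is hypothetical, and the errors file name changes on every pt-query-digest run.

```python
import re
import sys
from collections import Counter

# Matches: reason_for_failure => 'some text'
REASON_RE = re.compile(r"reason_for_failure\s*=>\s*'([^']*)'")


def count_reasons(text: str) -> Counter:
    """Count each distinct reason_for_failure value in an errors report."""
    return Counter(REASON_RE.findall(text))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        # Print the most frequent reasons first.
        for reason, n in count_reasons(f.read()).most_common():
            print(f"{n:8d}  {reason}")
```

Usage: python3 count_reasons.py /tmp/pt-query-digest-errors.qvGoT9E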

The top reasons are still

reason_for_failure => 'no server OK to previous command',
state => 'awaiting_reply',

and

reason_for_failure => 'got server response before full buffer',

plus some others.

So, once again, my questions are:

1. What could cause

   reason_for_failure => 'no server OK to previous command',
   state => 'awaiting_reply',

2. What could cause

   reason_for_failure => 'got server response before full buffer',

3. What could cause

   reason_for_failure => 'client cmd not packet 0',

4. What could cause a non-empty

   server_retransmissions => [
     1999078553,
     1999078553
   ],
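On question 4: in TCP terms, a retransmission is a data segment covering bytes that were already seen in the same direction of the connection, and the list here contains the same sequence number twice. Below is a minimal sketch of that idea, assuming you already have the segments of one direction of one connection as (sequence number, payload length) pairs in capture order; it is an illustration, not how pt-query-digest actually builds its server_retransmissions list.

```python
from typing import Iterable, List, Tuple


def find_retransmissions(segments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return sequence numbers of data segments that start inside data already seen.

    segments: (seq, payload_len) pairs for ONE direction of ONE TCP connection,
    in capture order. Sequence-number wraparound is ignored for simplicity,
    and out-of-order arrivals are flagged the same way as retransmissions.
    """
    seen_upto = None  # highest seq + payload_len observed so far
    retrans: List[int] = []
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        if seen_upto is not None and seq < seen_upto:
            retrans.append(seq)
        end = seq + length
        if seen_upto is None or end > seen_upto:
            seen_upto = end
    return retrans
```

For example, the same 1448-byte segment captured three times, [(1999078553, 1448)] * 3, yields [1999078553, 1999078553], the same shape as the list in question 4 (the 1448-byte length is an assumed value).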

I think it would be very useful if such common errors were described somewhere in the guides.

Finally, could these issues be related to network problems, and how can I investigate and fix them?

Your help would be much appreciated.

Thank you.

(Attached screenshot: mysql_dump.09.05_new.pcap.jpg)

Those messages come from MySQLProtocolParser.
I can see errors in the screenshot: some packets were lost ("Previous segment not captured"), and that could be the source of the problem.
If packets are missing, MySQLProtocolParser cannot analyze the sessions correctly.
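Wireshark's "Previous segment not captured" note means that a segment starts at a higher sequence number than the next one expected from the data already seen in that direction, so the bytes in between never made it into the capture. A minimal sketch of that check follows, assuming the segments of one direction of one connection are available as (sequence number, payload length) pairs in capture order; the function name is hypothetical. If the gaps come from the capture host rather than the network, tcpdump typically reports it in its exit summary as a non-zero "packets dropped by kernel" count.

```python
from typing import Iterable, List, Tuple


def find_gaps(segments: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (expected_seq, seen_seq) pairs where data is missing from a capture.

    segments: (seq, payload_len) pairs for ONE direction of ONE TCP connection,
    in capture order. Sequence-number wraparound is ignored for simplicity.
    """
    next_seq = None  # where the next data segment is expected to start
    gaps: List[Tuple[int, int]] = []
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        if next_seq is not None and seq > next_seq:
            gaps.append((next_seq, seq))  # bytes [next_seq, seq) were not captured
        end = seq + length
        if next_seq is None or end > next_seq:
            next_seq = end
    return gaps
```

For example, segments [(1000, 100), (1200, 100), (1300, 50)] give [(1100, 1200)]: the 100 bytes starting at sequence number 1100 are missing, which is exactly the situation in which a protocol parser can no longer follow the session.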

Regards