script with pt-heartbeat becomes zombie

e-ferrari (Entrant; current user role: Beginner)
Hi,

I'm currently testing replication of a MySQL database. MySQL is version 5.5.49 and the server runs SLES 11 SP4. I have installed percona-toolkit 2.2.16-1.
As part of this I'm testing pt-heartbeat. I have a script which starts pt-heartbeat:
sunhb65278:~ # cat /root/skripte/heartbeat.sh
#!/bin/bash

pidof perl /usr/bin/pt-heartbeat > /dev/null
rueck=$?

if [ $rueck -ne 0 ]; then
 pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
 sleep 5
fi

temp=$(pt-heartbeat -h sunhb58820-2 --user checksum --password checksum --check --database percona --master-server-id 10352397)

diff1=${temp#*.}
# digits after the decimal point

diff2=${temp%.*}
# digits before the decimal point

if [ "$diff1" -gt 0 ]; then
  mail -s "pt-heartbeat on $HOSTNAME failed" bernd.lentes@helmholtz-muenchen.de << EOT
    Warning! Slave is $temp seconds behind the master!
EOT
exit
fi

if [ "$diff2" -gt 0 ]; then
  mail -s "pt-heartbeat on $HOSTNAME failed" bernd.lentes@helmholtz-muenchen.de << EOT
    Warning! Slave is $temp seconds behind the master!
EOT
exit
fi
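
The two parameter expansions used in the script can be checked in isolation. This is only a minimal sketch, assuming that pt-heartbeat --check prints a value such as 1.05; the function name split_lag is hypothetical and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: split a lag value such as "1.05" into its two parts.
split_lag() {
    local lag=$1
    local whole=${lag%.*}   # drop the shortest suffix matching ".*": digits before the dot
    local frac=${lag#*.}    # drop the shortest prefix matching "*.": digits after the dot
    printf '%s %s\n' "$whole" "$frac"
}

# Prints the part before the dot, then the part after it.
split_lag "1.05"
```

So ${temp%.*} yields the whole seconds and ${temp#*.} yields the fractional digits.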

The script is called by cron every minute.

As you can see, the script first checks whether pt-heartbeat is already running and starts it if it is not.
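
As far as I understand pidof, it treats every argument as a separate program name, so "pidof perl /usr/bin/pt-heartbeat" would also succeed when any other perl process is running. A stricter test could match on the full command line instead. The following is a sketch only, assuming pgrep from procps is installed; the helper name is_running and the pattern are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: succeed if any process has a command line matching the regex.
is_running() {
    pgrep -f "$1" > /dev/null
}

# The bracket around the first letter keeps the pattern from matching a
# command line that merely contains the pattern text itself.
if is_running '[p]t-heartbeat .*--update'; then
    echo "update instance already running"
else
    echo "update instance not running"
fi
```

In the script this would replace the pidof call and the check of $rueck.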
When I watch the processes, this happens:
TIME:15:51:01
root     31532  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:02
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15008 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31546  0.0  0.0   4552   544 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:03
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31553  0.0  0.0   4552   544 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:04
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31560  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:05
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31567  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:06
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31574  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:07
root     31535  0.0  0.0  11320  1400 ?        Ss   15:51   0:00 /bin/bash /root/skripte/heartbeat.sh
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31576 33.3  0.0  83044 17924 ?        S    15:51   0:00 perl /usr/bin/pt-heartbeat -h sunhb58820-2 --user checksum --password checksum --check --database percona --master-server-id 10352397
root     31582  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:08
root     31535  0.0  0.0      0     0 ?        Zs   15:51   0:00 [heartbeat.sh] <defunct>
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31589  0.0  0.0   4552   548 pts/1    S+   15:51   0:00 grep heartbeat

TIME:15:51:09
root     31535  0.0  0.0      0     0 ?        Zs   15:51   0:00 [heartbeat.sh] <defunct>
root     31539  0.0  0.0  76732 15312 ?        Ss   15:51   0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root     31596  0.0  0.0   4552   544 pts/1    S+   15:51   0:00 grep heartbeat

At first no process is running. Then cron starts the script. Why does the script /root/skripte/heartbeat.sh (PID 31535) become a zombie at 15:51:08?

Do you have any idea?
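
A process listed as Z or <defunct> has already exited, but its parent has not yet collected its exit status, so it may help to look at the parent as well. A small sketch for that, assuming the procps version of ps; the helper name proc_info is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print PID, parent PID, state and command line of one process.
proc_info() {
    ps -o pid= -o ppid= -o stat= -o args= -p "$1"
}

# Example: inspect the current shell.
proc_info "$$"
```

While the entry is defunct, calling proc_info with its PID (31535 in the listing above) shows the parent PID in the second column, and calling proc_info again with that parent PID shows which process has not yet reaped it.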

Thanks.

Bernd
