script with pt-heartbeat becomes zombie

Hi,

I'm currently testing replication of a MySQL database. MySQL is version 5.5.49 and the server runs SLES 11 SP4. I have installed percona-toolkit 2.2.16-1.
As part of that I'm testing pt-heartbeat. I have a script that starts pt-heartbeat:

sunhb65278:~ # cat /root/skripte/heartbeat.sh
#!/bin/bash

pidof perl /usr/bin/pt-heartbeat > /dev/null
rueck=$?

if [ $rueck -ne 0 ]; then
pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
sleep 5
fi

temp=$(pt-heartbeat -h sunhb58820-2 --user checksum --password checksum --check --database percona --master-server-id 10352397)

diff1=${temp#*.}
# digits after the decimal point

diff2=${temp%.*}
# digits before the decimal point

if [ $diff1 -gt 0 ]; then
mail -s "pt-heartbeat on $HOSTNAME failed" bernd.lentes@helmholtz-muenchen.de << EOT
Warning! Slave is $temp seconds behind the master!
EOT
exit
fi

if [ $diff2 -gt 0 ]; then
mail -s "pt-heartbeat on $HOSTNAME failed" bernd.lentes@helmholtz-muenchen.de << EOT
Warning! Slave is $temp seconds behind the master!
EOT
exit
fi

The script is called by cron every minute.
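
For reference, the two parameter expansions in the script split the value reported by pt-heartbeat --check at the decimal point. Here is a minimal sketch, assuming the output has a form like 1.02. The function name split_lag and the variables whole and frac are made up for illustration and are not part of my script:

```bash
#!/bin/bash
# Hypothetical helper: split a lag value such as "1.02" into its two parts.
split_lag() {
    local lag=$1
    whole=${lag%.*}   # strip the last dot and what follows: keeps the part before it
    frac=${lag#*.}    # strip everything up to the first dot: keeps the part after it
}

split_lag "1.02"
echo "whole=$whole frac=$frac"   # prints: whole=1 frac=02
```

In the script, diff2 plays the role of whole and diff1 the role of frac.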

As you can see, the script first checks whether pt-heartbeat is already running, and starts it if it is not.
When I watch the processes, this happens:

TIME:15:51:01
root 31532 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:02
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15008 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31546 0.0 0.0 4552 544 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:03
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31553 0.0 0.0 4552 544 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:04
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31560 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:05
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31567 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:06
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31574 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:07
root 31535 0.0 0.0 11320 1400 ? Ss 15:51 0:00 /bin/bash /root/skripte/heartbeat.sh
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31576 33.3 0.0 83044 17924 ? S 15:51 0:00 perl /usr/bin/pt-heartbeat -h sunhb58820-2 --user checksum --password checksum --check --database percona --master-server-id 10352397
root 31582 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:08
root 31535 0.0 0.0 0 0 ? Zs 15:51 0:00 [heartbeat.sh] <defunct>
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31589 0.0 0.0 4552 548 pts/1 S+ 15:51 0:00 grep heartbeat

TIME:15:51:09
root 31535 0.0 0.0 0 0 ? Zs 15:51 0:00 [heartbeat.sh] <defunct>
root 31539 0.0 0.0 76732 15312 ? Ss 15:51 0:00 perl /usr/bin/pt-heartbeat -h 127.0.0.1 --user checksum --password checksum --update --database percona --daemonize
root 31596 0.0 0.0 4552 544 pts/1 S+ 15:51 0:00 grep heartbeat

At first no process is running. Then cron starts the script. Why does the script /root/skripte/heartbeat.sh (pid 31535) become a zombie at 15:51:08?

Do you have any idea?
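
In case it helps: a zombie stays in the process table until its parent collects its exit status, so the parent PID of 31535 may be relevant. Here is a small sketch for looking it up, assuming the procps version of ps. parent_of is a made-up helper name:

```bash
#!/bin/bash
# Hypothetical helper: print the parent PID of the given PID.
parent_of() {
    # "ppid=" selects the PPID column and suppresses the header line
    ps -o ppid= -p "$1" | tr -d ' '
}

# Example: the parent of the current shell
parent_of $$
```

I could run it as parent_of 31535 while the zombie is still listed.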

Thanks.

Bernd