Joining cluster fails because of SST timeout

Dree
I'm running into the same problem as this topic:
https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster/44077-xtradb-cluster-keeps-failing-to-join-cluster

SST is aborted after 9000 seconds. My dataset is now too big to transfer within that window, so nodes can no longer join my cluster.
Is there a fix for this yet?

Comments

  • bdelmedico
    Man, change the systemd timeout
  • Dree
    It's not the systemd timeout, it's a timeout in SST.
  • bdelmedico
    Haha, let me tell you a bit about my own replication problems, hope it helps.

    I had a lot of timeout problems:

    1 - systemd timeout on mysql start
    2 - With Rene's help I discovered that the next bottleneck was my firewall: when its processor hit 100% it started timing out and dropped all the connections.
    3 - I closed a VPC with aws
    4 - timeout settings within my.cnf (wsrep_provider_options = " gcs.max_packet_size=1048576; evs.send_window=512; evs.user_send_window=512; evs.inactive_timeout = PT90S; evs.suspect_timeout = PT30S; evs.install_timeout = PT60S; evs.keepalive_period = PT6S; evs.max_install_timeouts = 8 ")
    5 - memory configuration problems in the joiner server's my.cnf
    6 - to run without crashes I upgraded the internet link from 10Mb to 50Mb.

    I think that was everything, haha, but it solved my problems. Today my 80G database takes 240 minutes to replicate completely, and it doesn't generate a single warning line in the logs.
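
To put the numbers from this thread side by side, here is a rough back-of-the-envelope sketch in Python. It assumes that "80G" means 80 decimal gigabytes, that "50Mb" means a 50 Mbit/s link, and that the transfer is uncompressed with no protocol overhead; the function names are made up for illustration and none of this is taken from Percona's SST scripts.

```python
def required_mbit_per_s(size_gb: float, minutes: float) -> float:
    """Average link rate (Mbit/s) needed to move size_gb within the given minutes."""
    bits = size_gb * 1e9 * 8          # assumption: decimal gigabytes
    return bits / (minutes * 60) / 1e6


def max_transfer_gb(link_mbit_per_s: float, window_s: float) -> float:
    """Largest dataset (GB) that fits through the link within window_s seconds."""
    bytes_per_s = link_mbit_per_s * 1e6 / 8
    return bytes_per_s * window_s / 1e9


if __name__ == "__main__":
    # 80G in 240 minutes, as reported above
    print(round(required_mbit_per_s(80, 240), 1))   # 44.4
    # what a 50 Mbit/s link could move inside a 9000 second window
    print(round(max_transfer_gb(50, 9000), 2))      # 56.25
```

Under those assumptions, 80G in 240 minutes works out to roughly 44 Mbit/s, which is close to a fully used 50Mb link, and a 9000 second window on the same link would cover only about 56 GB.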

    Besides that, I tuned the operating system:
    net.core.somaxconn = 1024
    net.core.netdev_max_backlog = 5000
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_wmem = 4096 12582912 16777216
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    net.ipv4.tcp_max_syn_backlog = 8096
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 10240 65535

    fs.file-max=200000
    kernel.sem=250 32000 100 1024
    kernel.shmmax=4294967295
    ######
    net.ipv4.tcp_retries2 = 2

    #net.ipv4.tcp_syn_retries = 0
    net.ipv4.tcp_synack_retries = 0

    net.ipv4.tcp_keepalive_time = 30
    net.ipv4.tcp_keepalive_intvl = 1
    net.ipv4.tcp_keepalive_probes = 2

    vm.swappiness = 0
    vm.dirty_ratio = 80
    vm.dirty_background_ratio = 5
    vm.dirty_expire_centisecs = 12000
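
If you want to check which of the kernel settings above are actually active on a node, a small Python sketch like the following could help. It assumes a Linux host where the values are exposed under /proc/sys; the helper names (`parse_sysctl`, `read_live`, `diff_against_kernel`) are invented for this example and are not part of any existing tool.

```python
from pathlib import Path


def parse_sysctl(text: str) -> dict:
    """Parse sysctl.conf-style text into {key: value}, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # collapse runs of whitespace so "4096  12582912" compares equal to "4096 12582912"
        settings[key.strip()] = " ".join(value.split())
    return settings


def read_live(key: str, root: str = "/proc/sys"):
    """Return the running kernel's value for key, or None if it cannot be read."""
    path = Path(root) / key.replace(".", "/")
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def diff_against_kernel(wanted: dict, read=read_live) -> dict:
    """Return {key: (wanted, live)} for every setting whose live value differs."""
    result = {}
    for key, value in wanted.items():
        live = read(key)
        if live != value:
            result[key] = (value, live)
    return result


if __name__ == "__main__":
    sample = """
    net.core.somaxconn = 1024
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    vm.swappiness = 0
    """
    for key, (want, live) in diff_against_kernel(parse_sysctl(sample)).items():
        print(f"{key}: wanted {want!r}, kernel has {live!r}")
```

The `read` argument is only there so the comparison can be tried against a plain dict instead of a real /proc/sys tree.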
  • Dree
    I just switched to MariaDB Galera Cluster, which doesn't seem to have this timeout. It's working fine on that.

    Thanks for your very detailed answer though!

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.