Percona cluster - one server crashed - memory bug?

francoisstark
I have three servers in a cluster, each running Percona XtraDB Cluster 5.6.22-72.0-56 on Ubuntu 12.04. They are Xeon systems with ECC RAM, RAID 5 arrays and 32GB RAM each, so a hardware issue is unlikely. This is what the error log said:

18:19:44 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=37
max_threads=153
thread_count=19
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69252 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f2d18000990
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...

stack_bottom = 7f30005e0a70 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8e811e]
/usr/sbin/mysqld(handle_fatal_signal+0x392)[0x65ffa2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f303d3e5cb0]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0xa0)[0x7f302225a0f0]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x158)[0x7f302225b8c8]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x85)[0x7f3022288215]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x405)[0x7f3022269d75]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5e)[0x7f302226a8ee]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x78)[0x7f302228f958]
/usr/lib/libgalera_smm.so(galera_recv+0x1e)[0x7f30222a4c8e]
/usr/sbin/mysqld[0x5a491c]
/usr/sbin/mysqld(start_wsrep_THD+0x287)[0x58d247]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f303d3dde9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f303c8f88bd]

Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 12
Status: NOT_KILLED
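
The libgalera_smm.so frames in the backtrace are mangled C++ names, which makes it hard to see at a glance where the crash happened. As a minimal sketch (not a full demangler), the Python helper below pulls the namespace-qualified function name out of the simple `_ZN<len><name>...E` form used by these symbols. The function name `qualified_name` is made up for this illustration. It ignores parameter types, does not handle templates or substitutions inside the name, and strips any spaces a forum page may have inserted into the long tokens.

```python
import re


def qualified_name(mangled):
    """Return e.g. 'ns::Class::method' from a simple Itanium-ABI nested symbol.

    Only the `_ZN <len><name> ... E` form is handled; whatever follows the
    closing `E` (the parameter types) is ignored.
    """
    # Remove whitespace that may have been inserted into long tokens.
    symbol = re.sub(r"\s+", "", mangled)
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-ABI symbol: " + symbol)

    pos = 3  # skip the "_ZN" prefix
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported construct at offset %d" % pos)
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return "::".join(parts)


if __name__ == "__main__":
    # Top three Galera frames from the backtrace, innermost first.
    frames = [
        "_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE",
        "_ZN6galera13Certification16purge_trxs_upto_Elb",
        "_ZN6galera13ReplicatorSMM18process_commit_cutEll",
    ]
    for frame in frames:
        print(qualified_name(frame))
    # galera::Certification::purge_for_trx_v3
    # galera::Certification::purge_trxs_upto_
    # galera::ReplicatorSMM::process_commit_cut
```

Read this way, the segfault occurred inside Galera's certification purge code, reached from the replicator's commit-cut handling on a wsrep receiver thread. Where binutils is installed, piping the symbols through a real demangler such as `c++filt` gives the full signatures as well.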

Comments

  • francoisstark
    When I try to restart MySQL on this server, the SST fails.
    I accidentally started it in bootstrap mode, and it ran on its own for about a minute, which caused some writes to the databases. I then edited my.cnf and restarted the mysql service, expecting a full SST, but got this error:


    2015-04-28 12:53:54 45962 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.X.X.X' --auth 'sstuser:sdfgdfghdry56' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '45962' '' '
    WSREP_SST: [INFO] Streaming with tar (20150428 12:53:55.080)
    WSREP_SST: [INFO] Using socat as streamer (20150428 12:53:55.081)
    WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | tar xfi - --recursive-unlink -h; RC=( ${PIPESTATUS[@]} ) (20150428 12:53:55.110)
    2015-04-28 12:53:55 45962 [Note] WSREP: Prepared SST request: xtrabackup|10.x.X.X:4444/xtrabackup_sst
    2015-04-28 12:53:55 45962 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2015-04-28 12:53:55 45962 [Note] WSREP: REPL Protocols: 5 (3, 1)
    2015-04-28 12:53:55 45962 [Note] WSREP: Assign initial position for certification: 14184918, protocol version: 3
    2015-04-28 12:53:55 45962 [Note] WSREP: Service thread queue flushed.
    2015-04-28 12:53:55 45962 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.x.x.x:4568
    2015-04-28 12:53:55 45962 [Note] WSREP: Node 1.0 (perc1) requested state transfer from '*any*'. Selected 0.0 (dxss3)(SYNCED) as donor.
    2015-04-28 12:53:55 45962 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 14184918)
    2015-04-28 12:53:55 45962 [Note] WSREP: Requesting state transfer: success, donor: 0
    2015-04-28 12:53:55 45962 [Note] WSREP: 0.0 (dxss3): State transfer to 1.0 (dxss1) complete.
    2015-04-28 12:53:55 45962 [Note] WSREP: Member 0 (dxss3) synced with group.
    WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20150428 12:53:55.425)
    WSREP_SST: [INFO] Total time on joiner: 0 seconds (20150428 12:53:55.427)
    WSREP_SST: [INFO] Removing the sst_in_progress file (20150428 12:53:55.428)
    2015-04-28 12:53:55 45962 [Note] WSREP: SST complete, seqno: 14119148
    2015-04-28 12:53:55 45962 [Note] Plugin 'FEDERATED' is disabled.
    2015-04-28 12:53:55 45962 [Note] InnoDB: Using atomics to ref count buffer pool pages
    2015-04-28 12:53:55 45962 [Note] InnoDB: The InnoDB memory heap is disabled
    2015-04-28 12:53:55 45962 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2015-04-28 12:53:55 45962 [Note] InnoDB: Memory barrier is not used
    2015-04-28 12:53:55 45962 [Note] InnoDB: Compressed tables use zlib 1.2.3.4
    2015-04-28 12:53:55 45962 [Note] InnoDB: Using Linux native AIO
    2015-04-28 12:53:55 45962 [Note] InnoDB: Using CPU crc32 instructions
    2015-04-28 12:53:55 45962 [Note] InnoDB: Initializing buffer pool, size = 10.0G
    2015-04-28 12:53:55 45962 [Note] InnoDB: Completed initialization of buffer pool
    2015-04-28 12:53:55 45962 [Note] InnoDB: Highest supported file format is Barracuda.
    2015-04-28 12:53:56 45962 [Note] InnoDB: 128 rollback segment(s) are active.
    2015-04-28 12:53:56 45962 [Note] InnoDB: Waiting for purge to start
    2015-04-28 12:53:56 45962 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.22-72.0 started; log sequence number 30463102736
    2015-04-28 12:53:56 45962 [Note] RSA private key file not found: /var/lib/mysql//private_key.pem. Some authentication plugins will not work.
    2015-04-28 12:53:56 45962 [Note] RSA public key file not found: /var/lib/mysql//public_key.pem. Some authentication plugins will not work.
    2015-04-28 12:53:56 45962 [Note] Server hostname (bind-address): '*'; port: 3306
    2015-04-28 12:53:56 45962 [Note] IPv6 is available.
    2015-04-28 12:53:56 45962 [Note] - '::' resolves to '::';
    2015-04-28 12:53:56 45962 [Note] Server socket created on IP: '::'.
    2015-04-28 12:53:56 45962 [Note] Event Scheduler: Loaded 0 events
    2015-04-28 12:53:56 45962 [Note] WSREP: Signalling provider to continue.
    2015-04-28 12:53:56 45962 [Note] WSREP: inited wsrep sidno 1
    2015-04-28 12:53:56 45962 [Note] WSREP: SST received: 5b18cbf7-sdfgsdfgsdfg8379-11e3-92395df271:14119148
    2015-04-28 12:53:56 45962 [Note] WSREP: Receiving IST: 65770 writesets, seqnos 14119148-14184918
    2015-04-28 12:53:56 45962 [Note] /usr/sbin/mysqld: ready for connections.
    Version: '5.6.22-72.0-56' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel72.0, Revision 978, WSREP version 25.8, wsrep_25.8.r4150
    2015-04-28 12:53:56 45962 [ERROR] Slave SQL: Could not execute Delete_rows event on table dxss.codesecrets; Can't find record in 'codesecrets', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 213, Error_code: 1032
    2015-04-28 12:53:56 45962 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 14119181
    2015-04-28 12:53:56 45962 [Warning] WSREP: Failed to apply app buffer: seqno: 14119181, status: 1
    at galera/src/trx_handle.cpp:apply():340
    Retrying 2th time
    2015-04-28 12:53:56 45962 [ERROR] Slave SQL: Could not execute Delete_rows event on table dxss.codesecrets; Can't find record in 'codesecrets', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 213, Error_code: 1032
    2015-04-28 12:53:56 45962 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 14119181
    2015-04-28 12:53:56 45962 [Warning] WSREP: Failed to apply app buffer: seqno: 14119181, status: 1
    at galera/src/trx_handle.cpp:apply():340
    Retrying 3th time
    2015-04-28 12:53:56 45962 [ERROR] Slave SQL: Could not execute Delete_rows event on table dxss.codesecrets; Can't find record in 'codesecrets', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 213, Error_code: 1032
    2015-04-28 12:53:56 45962 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 14119181
    2015-04-28 12:53:56 45962 [Warning] WSREP: Failed to apply app buffer: seqno: 14119181, status: 1
    at galera/src/trx_handle.cpp:apply():340
    Retrying 4th time
    2015-04-28 12:53:56 45962 [ERROR] Slave SQL: Could not execute Delete_rows event on table dxss.codesecrets; Can't find record in 'codesecrets', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 213, Error_code: 1032
    2015-04-28 12:53:56 45962 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 14119181
    2015-04-28 12:53:56 45962 [ERROR] WSREP: receiving IST failed, node restart required: Failed to apply trx 14119181 4 times
    2015-04-28 12:53:56 45962 [Note] WSREP: Closing send monitor...

    It seems I must delete the local InnoDB data files and somehow force a complete SST. The other two systems in the cluster are running fine.

    Thanks
    Francois
  • francoisstark
    Still trying to join the cluster. I deleted all GRA and grastate files from /var/lib/mysql, which forces an SST instead of an IST. I also deleted the related database directories and tried to join again. The joiner gives me this error:

    WSREP_SST: [ERROR] xtrabackup process ended without creating '/var/lib/mysql//xtrabackup_galera_info'

    It created three other files in /var/lib/mysql:
    xtrabackup_checkpoints
    xtrabackup_info
    xtrabackup_logfile

    aaaaaaand problem found: The failed system had this line in my.cnf:
    wsrep_sst_method=xtrabackup

    On the other two running systems, this line was commented out. So I commented out the line, deleted the grastate file again and restarted the mysql service. This forced a full SST, and from what I can see in the logfile, the default SST method is now xtrabackup-v2.

    Problem solved. For now. The original crash still baffles.
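
The fix above came down to one node having a different `wsrep_sst_method` than its peers. As a rough sketch of how that could be checked before a rejoin, the Python below reads the my.cnf text of each node and reports the effective method per node. The node names, the sample config contents and both function names are hypothetical. The fallback value `xtrabackup-v2` is simply what the poster saw in the logfile when the line was commented out, and may differ on other versions. The parser is deliberately naive: it ignores sections, `!include` directives and inline comments.

```python
def effective_sst_method(my_cnf_text, default="xtrabackup-v2"):
    """Return the active wsrep_sst_method in my.cnf-style text, or `default` if unset.

    `default` is an assumption taken from the thread, not from documentation.
    """
    method = default
    for raw in my_cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank or commented-out line
        key, sep, value = line.partition("=")
        # my.cnf accepts both wsrep_sst_method and wsrep-sst-method spellings.
        if sep and key.strip().replace("-", "_") == "wsrep_sst_method":
            method = value.strip().strip("'\"")
    return method


def sst_methods(configs, default="xtrabackup-v2"):
    """Map each node name to its effective SST method."""
    return {node: effective_sst_method(text, default) for node, text in configs.items()}


if __name__ == "__main__":
    # Hypothetical config excerpts mirroring the situation in the thread.
    configs = {
        "failed-node": "[mysqld]\nwsrep_sst_method=xtrabackup\n",
        "running-node-a": "[mysqld]\n#wsrep_sst_method=xtrabackup\n",
        "running-node-b": "[mysqld]\n#wsrep_sst_method=xtrabackup\n",
    }
    methods = sst_methods(configs)
    for node, method in sorted(methods.items()):
        print(node, method)
    if len(set(methods.values())) > 1:
        print("SST method differs between nodes")
```

On a live cluster, the value each server is actually using can also be read with `SHOW VARIABLES LIKE 'wsrep_sst_method';`, which avoids guessing the compiled-in default.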
