Cluster crashing quite often

In preparation for migrating our TokuMX prod cluster to a Mongo 3 cluster, and ahead of our Percona visit in a couple of weeks, we decided to stand up dev and QA clusters. This was done this week, and both are quite unstable, crashing several times a day.

Both environments have 3 machines. Generally the same two crash (the primary and one secondary), and the third remains in a secondary state.

Machines are Red Hat Enterprise Linux VMs with 8 CPUs and 16 GB RAM. The data directory is on a SAN.

This is the log entry from the crash. I looked on both machines, and the trace dump appears to be exactly the same.

2016-01-27T18:36:10.526-0600 F - Got signal: 6 (Aborted).

0x10b6cd2 0x10b6583 0x10b694a 0x7f22377286a0 0x7f2237728625 0x7f2237729e05 0x15c0a23 0x161a714 0x15ddf3f 0x15df70d 0x161fe60 0x1644f36 0x7f2238c8ca51 0x7f22377de93d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"CB6CD2"},{"b":"400000","o":"CB6583"},{"b":"400000","o":"CB694A"},{"b":"7F22376F6000","o":"326A0"},{"b":"7F22376F6000","o":"32625"},{"b":"7F22376F6000","o":"33E05"},{"b":"400000","o":"11C0A23"},{"b":"400000","o":"121A714"},{"b":"400000","o":"11DDF3F"},{"b":"400000","o":"11DF70D"},{"b":"400000","o":"121FE60"},{"b":"400000","o":"1244F36"},{"b":"7F2238C85000","o":"7A51"},{"b":"7F22376F6000","o":"E893D"}],"processInfo":{ "mongodbVersion" : "3.0.8", "gitVersion" : "nogitversion", "uname" : { "sysname" : "Linux", "release" : "2.6.32-573.7.1.el6.x86_64", "version" : "#1 SMP Thu Sep 10 13:42:16 EDT 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7E06EF067281BA0E4AB5A7FDD89C759DFE5CEB71" }, { "b" : "7FFD648EF000", "elfType" : 3, "buildId" : "2426D85978796C7ED259CDC601A7C310C339A21C" }, { "b" : "7F22392C9000", "path" : "/usr/lib64/libsasl2.so.2", "elfType" : 3, "buildId" : "E0AEE889D5BF1373F2F9EE0D448DBF3F5B5113F0" }, { "b" : "7F22390B3000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "D053BB4FF0C2FC983842F81598813B9B931AD0D1" }, { "b" : "7F2238EA2000", "path" : "/lib64/libbz2.so.1", "elfType" : 3, "buildId" : "1250B1D041DD7552F0C870BB188DC3A34DF2651D" }, { "b" : "7F2238C85000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D467973C46E563CDCF64B5F12B2D6A50C7A25BA1" }, { "b" : "7F2238A19000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "93610457BCF424BEBBF1F3FB44E51B51B50F2B55" }, { "b" : "7F2238636000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "06DDBB192AF74F99DB58F2150BFB83F42F5EBAD3" }, { "b" : "7F223842E000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "58C5A5FF5C82D7BE3113BE36DD87C7004E3C4DB1" }, { "b" : "7F223822A000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "B5AE05CEDC0CE917F50A3A468CFA2ACD8592E8F6" }, { "b" : "7F2237F24000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "28AF9321EBEA9D172CA43E11A60E02D0F7014870" }, { "b" : "7F2237CA0000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "989FE3A42CA8CEBDCC185A743896F23A0CF537ED" }, { "b" : "7F2237A8A000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "2AC15B051D1B8B53937E3341EA931D0E96F745D9" }, { "b" : "7F22376F6000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "A6D15926E61580E250ED91F84FF7517F3970CD83" }, { "b" : "7F22394E3000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "04202A4A8BE624D2193E812A25589E2DD02D5B5C" }, { "b" : "7F22374DC000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F704FA7D21D05EF31E90FB4890FCA7F3D91DA138" }, { "b" : "7F22372A5000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "128802B73016BE233837EA9F2DCBC2153ACC2D6A" }, { "b" : "7F2237061000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "0C72521270790A1BD52C8F6B989EEA5A575085BF" }, { "b" : "7F2236D7A000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "DC11D5D89BDC77FF242481122D51E5A08DB60DA8" }, { "b" : "7F2236B76000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "13FFCD68952B7715DDF34C9321D82E3041EA9006" }, { "b" : "7F223694A000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "15782495E3AF093E67DDAE9A86436FFC6B3CC4D3" }, { "b" : "7F2236747000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "58BAC04A1DB3964A8F594EFFBE4838AD01214EDC" }, { "b" : "7F223653C000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "44A3A1C1891B4C8170C3DB80E7117A022E5EECD0" }, { "b" : "7F2236339000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "3BCCABE75DC61BBA81AAE45D164E26EF4F9F55DB" }, { "b" : "7F223611A000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "2D0F26E648D9661ABD83ED8B4BBE8F2CFA50393B" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x10b6cd2]
mongod(+0xCB6583) [0x10b6583]
mongod(+0xCB694A) [0x10b694a]
libc.so.6(+0x326A0) [0x7f22377286a0]
libc.so.6(gsignal+0x35) [0x7f2237728625]
libc.so.6(abort+0x175) [0x7f2237729e05]
mongod(_Z23toku_ftnode_pf_callbackPvS_S_iP11pair_attr_s+0xAC3) [0x15c0a23]
mongod(_Z30toku_cachetable_pf_pinned_pairPvPFiS_S_S_iP11pair_attr_sES_P9cachefile10blocknum_sj+0x104) [0x161a714]
mongod(_Z24toku_ft_flush_some_childP2ftP6ftnodeP14flusher_advice+0x23F) [0x15ddf3f]
mongod(_Z28toku_ftnode_cleaner_callbackPv10blocknum_sjS_+0x1DD) [0x15df70d]
mongod(_ZN7cleaner11run_cleanerEv+0x270) [0x161fe60]
mongod(+0x1244F36) [0x1644f36]
libpthread.so.0(+0x7A51) [0x7f2238c8ca51]
libc.so.6(clone+0x6D) [0x7f22377de93d]
----- END BACKTRACE -----
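
For anyone reading the trace: the raw addresses on the line after "Got signal" appear to be each frame's module base ("b") plus its offset ("o") from the JSON blob, both in hex. Below is a minimal Python sketch of that arithmetic for three of the frames, assuming that reading of the fields is right. The names `frames` and `absolute_address` are made up for illustration and are not part of any mongod tooling.

```python
# Three frames copied from the "backtrace" array in the log above.
# Assumption: "b" is the module load base and "o" is the offset into it (hex).
frames = [
    {"b": "400000", "o": "CB6CD2"},        # mongod, printStackTrace
    {"b": "7F22376F6000", "o": "326A0"},   # libc.so.6
    {"b": "400000", "o": "11C0A23"},       # mongod, toku_ftnode_pf_callback
]


def absolute_address(frame):
    """Return base + offset as a lowercase hex string, like the raw address line."""
    return hex(int(frame["b"], 16) + int(frame["o"], 16))


addresses = [absolute_address(f) for f in frames]
for frame, addr in zip(frames, addresses):
    print(frame["b"], "+", frame["o"], "=", addr)
```

The three results match the bracketed addresses on the corresponding symbolized lines (0x10b6cd2, 0x7f22377286a0 and 0x15c0a23), which is a quick way to check that the JSON and the symbolized frames come from the same crash.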

One update: our dev primary crashed and I brought it back up. I started a small Java app, and as soon as it connected, mongod crashed. I repeated this 3 times. Then I got it started again and didn’t start the Java app…

Hi pocket,

Can you tell me the exact RHEL version, architecture, and exact package you are using (e.g. the output of rpm -qf /usr/bin/mongod, or the tarball name)?

–Dave

::::::::::::::
/etc/lsb-release
::::::::::::::
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
::::::::::::::
/etc/redhat-release
::::::::::::::
Red Hat Enterprise Linux Server release 6.7 (Santiago)
::::::::::::::
/etc/system-release
::::::::::::::
Red Hat Enterprise Linux Server release 6.7 (Santiago)

Percona-Server-MongoDB-server-3.0.8-1.2.el6.x86_64

I think it might be due to the Java driver, which is 2.x, via gmongo (a Groovy wrapper around the Java driver). However, our QA environment seems to be crashing more often than not. The dev machine crashes immediately upon connecting with this script. Our primary application does not crash it, so it may just be a coincidence.

I’m working now to rewrite the Groovy app to use the newest driver, but the API has changed enough to be a pain.

Apparently not a driver problem. I rewrote the script to use the Mongo Java driver 3.2.1 (latest) and it still crashes immediately. To clarify, it crashes upon a bulk write: it reads data first, which succeeds, then does a bulk delete, and that call causes the crash.

Thanks, the specific package was helpful. We have traced it to an issue within the PerconaFT node cleaner, so it’s definitely not a driver issue. If you could provide a core dump, that would be helpful. Also, could you describe the operation load of your application? That may help as well.