MongoDB crashing due to unspecified error

Hi Team,

We faced an issue similar to the one below. We are using v4.4.23.

Can you please help with it?

Hello Sai,

Well, it is hard to tell from just the piece of information below. By the way, did you notice in your mongo logs the exact exception you have referenced?

{"t":{"$date":"2021-11-22T21:59:53.801+00:00"},"s":"I", "c":"CONTROL", "id":31430, "ctx":"conn772041","msg":"Error collecting stack trace: {err}","attr":{"err":"unw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\nunw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\n"}}

It may be better if you can provide us the complete database error log file (“mongod.log”) along with the OS/kernel logs (/var/log/messages or sudo dmesg -T), if any.

Have you verified that the health of the OS (CPU/memory) and other components such as disk and IOPS was fine around the time of the issue?

Did you face a similar issue on any other ReplicaSet node as well? Have there been any recent configuration changes, new deployments, or OS patching?

Regards,
Anil

Hi @anil.joshi

This is the exact one I found. The messages have been auto-cleaned.

We use Amazon Linux 2 with instance type m5.large.

CPU usage and memory were at 20% and 60% respectively.

{"t":{"$date":"2023-10-29T04:54:52.282+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"AuthorizationManager-41","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2023-10-29T04:54:52.364+00:00"},"s":"E",  "c":"CONTROL",  "id":31430,   "ctx":"AuthorizationManager-41","msg":"Error collecting stack trace","attr":{"error":"unw_get_proc_name(7F8FD9CC38E0): unspecified (general) error\nunw_get_proc_name(7F8FD9938CA0): unspecified (general) error\nunw_get_proc_name(7F8FD993A148): unspecified (general) error\nunw_get_proc_name(7F8FD9931A4A): unspecified (general) error\nunw_get_proc_name(7F8FD9931AC2): unspecified (general) error\nunw_get_proc_name(7F8FD9CB944B): unspecified (general) error\nunw_get_proc_name(7F8FD99F452F): unspecified (general) error\nunw_get_proc_name(7F8FD9CC38E0): unspecified (general) error\nunw_get_proc_name(7F8FD9938CA0): unspecified (general) error\nunw_get_proc_name(7F8FD993A148): unspecified (general) error\nunw_get_proc_name(7F8FD9931A4A): unspecified (general) error\nunw_get_proc_name(7F8FD9931AC2): unspecified (general) error\nunw_get_proc_name(7F8FD9CB944B): unspecified (general) error\nunw_get_proc_name(7F8FD99F452F): unspecified (general) error\n"}}
{"t":{"$date":"2023-10-29T04:54:52.364+00:00"},"s":"I",  "c":"CONTROL",  "id":31431,   "ctx":"AuthorizationManager-41","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"559139A226C6","b":"5591369FE000","o":"30246C6","s":"_ZN5mongo34StackTraceAddressMetadataGenerator4loadEPv","s+":"646"},{"a":"559139A241F9","b":"5591369FE000","o":"30261F9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"559139A21456","b":"5591369FE000","o":"3023456","s":"_ZN5mongo16stackTraceSignalEv","s+":"406"},{"a":"7F8FD9CC38E0","b":"7F8FD9CB2000","o":"118E0"},{"a":"7F8FD9938CA0","b":"7F8FD9905000","o":"33CA0","s":"gsignal","s+":"110"},{"a":"7F8FD993A148","b":"7F8FD9905000","o":"35148","s":"abort","s+":"148"},{"a":"7F8FD9931A4A","b":"7F8FD9905000","o":"2CA4A"},{"a":"7F8FD9931AC2","b":"7F8FD9905000","o":"2CAC2"},{"a":"7F8FDB4A10A5","b":"7F8FDB497000","o":"A0A5","s":"ber_sockbuf_ctrl","s+":"155"},{"a":"7F8FDB6CB4CB","b":"7F8FDB6A6000","o":"254CB","s":"ldap_int_connect_cbs","s+":"2B"},{"a":"7F8FDB6CB7D0","b":"7F8FDB6A6000","o":"257D0","s":"ldap_connect_to_host","s+":"1E0"},{"a":"7F8FDB6B693B","b":"7F8FDB6A6000","o":"1093B","s":"ldap_int_open_connection","s+":"6B"},{"a":"7F8FDB6C9048","b":"7F8FDB6A6000","o":"23048","s":"ldap_new_connection","s+":"1A8"},{"a":"7F8FDB6B610A","b":"7F8FDB6A6000","o":"1010A","s":"ldap_open_defconn","s+":"2A"},{"a":"7F8FDB6CA263","b":"7F8FDB6A6000","o":"24263","s":"ldap_send_initial_request","s+":"143"},{"a":"7F8FDB6BA2AE","b":"7F8FDB6A6000","o":"142AE","s":"ldap_pvt_search","s+":"10E"},{"a":"7F8FDB6BA3DE","b":"7F8FDB6A6000","o":"143DE","s":"ldap_pvt_search_s","s+":"4E"},{"a":"7F8FDB6BA480","b":"7F8FDB6A6000","o":"14480","s":"ldap_search_ext_s","s+":"20"},{"a":"559138988865","b":"5591369FE000","o":"1F8A865","s":"_ZN5mongo15LDAPManagerImpl9execQueryERKSsbRSt6vectorISsSaISsEE","s+":"1D5"},{"a":"559138989395","b":"5591369FE000","o":"1F8B395","s":"_ZN5mongo15LDAPManagerImpl14queryUserRolesERKNS_8UserNameERN4absl13node_hash_setINS_8RoleNameENS4_13hash_internal4HashIS6
_EESt8equal_toIS6_ESaIS6_EEE","s+":"3E5"},{"a":"5591387B4466","b":"5591369FE000","o":"1DB6466","s":"_ZN5mongo30AuthzManagerExternalStateLocal18getUserDescriptionEPNS_16OperationContextERKNS_11UserRequestEPNS_7BSONObjE","s+":"BD6"},{"a":"5591387BE363","b":"5591369FE000","o":"1DC0363","s":"_ZN5mongo24AuthorizationManagerImpl13UserCacheImpl7_lookupEPNS_16OperationContextERKNS_11UserRequestE","s+":"203"},{"a":"5591387D5BCE","b":"5591369FE000","o":"1DD7BCE","s":"_ZN5mongo14future_details10statusCallIRZZNS_16ReadThroughCacheINS_11UserRequestENS_4UserEE28_asyncLookupWhileInvalidatedESt11unique_lockINS_12latch_detail5MutexEERNS5_16InProgressLookupEENUlPNS_16OperationContextERKNS_6StatusEE0_clESD_SG_EUlvE_JNS0_8FakeVoidEEEEDaOT_DpOT0_","s+":"7E"},{"a":"5591387D5D3C","b":"5591369FE000","o":"1DD7D3C","s":"_ZN5mongo7PromiseINS_4UserEE7setWithIZZNS_16ReadThroughCacheINS_11UserRequestES1_E28_asyncLookupWhileInvalidatedESt11unique_lockINS_12latch_detail5MutexEERNS6_16InProgressLookupEENUlPNS_16OperationContextERKNS_6StatusEE0_clESE_SH_EUlvE_Li0EEEvOT_","s+":"4C"},{"a":"5591387D5F6A","b":"5591369FE000","o":"1DD7F6A","s":"_ZZN5mongo15unique_functionIFvPNS_16OperationContextERKNS_6StatusEEE8makeImplIZNS_16ReadThroughCacheINS_11UserRequestENS_4UserEE28_asyncLookupWhileInvalidatedESt11unique_lockINS_12latch_detail5MutexEERNSC_16InProgressLookupEEUlS2_S5_E0_EEDaOT_EN12SpecificImpl4callEOS2_S5_","s+":"4A"},{"a":"5591395274EA","b":"5591369FE000","o":"2B294EA","s":"_ZN5mongo20ReadThroughCacheBase11CancelToken9tryCancelEv","s+":"34A"},{"a":"559139528822","b":"5591369FE000","o":"2B2A822","s":"_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockINS_12latch_detail5LatchEE","s+":"132"},{"a":"55913952B903","b":"5591369FE000","o":"2B2D903","s":"_ZN5mongo10ThreadPool13_consumeTasksEv","s+":"83"},{"a":"55913952C6B6","b":"5591369FE000","o":"2B2E6B6","s":"_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKSs","s+":"E6"},{"a":"55913952CAC0","b":"5591369FE000","o":"2B2EAC0","s":"_ZN5mongo10ThreadPool17_worke
rThreadBodyEPS0_RKSs","s+":"4F0"},{"a":"559139E62DCF","b":"5591369FE000","o":"3464DCF","s":"__cxa_init_primary_exception","s+":"33F"},{"a":"7F8FD9CB944B","b":"7F8FD9CB2000","o":"744B"},{"a":"7F8FD99F452F","b":"7F8FD9905000","o":"EF52F","s":"clone","s+":"3F"}],"processInfo":{"mongodbVersion":"4.4.23-22","gitVersion":"6dbe14d25e2a4ba0515610749b1afe4119b06c42","compiledModules":[],"uname":{"sysname":"Linux","release":"4.14.106-97.85.amzn2.x86_64","version":"#1 SMP Fri Mar 15 17:07:54 UTC 2019","machine":"x86_64"},"somap":[{"b":"5591369FE000","elfType":3,"buildId":"018257FAD89D040158C984792AF949E594765390"},{"b":"7F8FDB6A6000","path":"/lib64/libldap-2.4.so.2","elfType":3,"buildId":"5F7B9963CCDE3E404EB03758E5EDD11F0F14F77D"},{"b":"7F8FDB497000","path":"/lib64/liblber-2.4.so.2","elfType":3,"buildId":"184B7AEC24616F47793C34BF0A9FE70219552678"},{"b":"7F8FD9CB2000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"BC2E8D5CDFB0A3CC6DB42A136DD1BB61AF8EED99"},{"b":"7F8FD9905000","path":"/lib64/libc.so.6","elfType":3,"buildId":"140E425DB38E5E4C2BFA7E56F3609E707B850AC5"}]}}}}
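
A BACKTRACE entry like the one above is easier to read once each frame is matched to the library it came from. The following is a minimal sketch, not an official tool: it assumes the entry has been rejoined into a single JSON line (it is wrapped across several lines in this post), and the function name `frames_by_library` and the "(main binary)" label are made up for illustration. It pairs each frame's base address ("b") with the matching "somap" entry and prints the symbol ("s") where one was resolved.

```python
import json
import os


def frames_by_library(log_line):
    """Return (library, symbol) pairs for each frame of a BACKTRACE log entry."""
    entry = json.loads(log_line)
    bt = entry["attr"]["bt"]
    # Map each load base address to its shared-object path. The somap entry
    # without a "path" key is assumed here to be the main server binary.
    paths = {so["b"]: so.get("path", "(main binary)")
             for so in bt["processInfo"]["somap"]}
    frames = []
    for frame in bt["backtrace"]:
        library = os.path.basename(paths.get(frame["b"], "unknown"))
        # Frames that libunwind could not name carry no "s" key.
        symbol = frame.get("s", "<no symbol>")
        frames.append((library, symbol))
    return frames


# Example (backtrace_line is a hypothetical variable holding the rejoined entry):
# for library, symbol in frames_by_library(backtrace_line):
#     print(f"{library:24} {symbol}")
```

The mangled C++ names in the "s" field can then be passed through a demangler such as `c++filt` if readable names are needed.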

Hi Sai,

Thanks for confirming the details below.

We use Amazon Linux 2 with instance type m5.large.
CPU usage and memory were at 20% and 60% respectively.

It would be helpful if you could provide some preceding lines to better analyse the triggering patterns, such as any warnings or suspicious messages. Alternatively, you may consider attaching the complete “mongod.log” file for a broader review.
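
If the full file is too large to attach, one way to pull out the preceding lines is to filter the structured log by severity and time. This is only a rough sketch under a few assumptions: the log is in the JSON format shown in this thread (one entry per line with "t", "s", "c" and "msg" fields), the 10-minute window is an arbitrary default, and the function name `entries_before_crash` is made up for illustration.

```python
import json
from datetime import datetime, timedelta


def entries_before_crash(log_lines, crash_time, window_minutes=10,
                         severities=("W", "E", "F")):
    """Return parsed log entries of the given severities that were logged
    within window_minutes before (and including) crash_time."""
    crash = datetime.fromisoformat(crash_time)
    start = crash - timedelta(minutes=window_minutes)
    selected = []
    for raw in log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
            stamp = datetime.fromisoformat(entry["t"]["$date"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip wrapped, truncated, or non-JSON lines
        if start <= stamp <= crash and entry.get("s") in severities:
            selected.append(entry)
    return selected


# Example (the log path is an assumption; use your configured systemLog.path):
# with open("/var/log/mongodb/mongod.log") as fh:
#     for e in entries_before_crash(fh, "2023-10-29T04:54:52.282+00:00"):
#         print(e["t"]["$date"], e["s"], e["c"], e["msg"])
```

Widening `severities` to include "I" would also show informational messages, which may matter if the trigger was logged at that level.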

{"t":{"$date":"2023-10-29T04:54:52.282+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"AuthorizationManager-41","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2023-10-29T04:54:52.364+00:00"},"s":"E",  "c":"CONTROL",  "id":31430,   "ctx":"AuthorizationManager-41","msg":"**Error collecting stack trace**","attr":{"error":"unw_get_proc_name(7F8FD9CC38E0): unspecified (general) error\nunw_get_proc_name(7F8FD9938CA0): unspecified (general) error\nunw_get_proc_name(7F8FD993A148): unspecified (general) error\nunw_get_proc_name(7F8FD9931A4A): unspecified (general) error\nunw_get_proc_name(7F8FD9931AC2): unspecified (general) error\nunw_get_proc_name(7F8FD9CB944B): unspecified (general) error\nunw_get_proc_name(7F8FD99F452F): unspecified (general) error\nunw_get_proc_name(7F8FD9CC38E0): unspecified (general) error\nunw_get_proc_name(7F8FD9938CA0): unspecified (general) error\nunw_get_proc_name(7F8FD993A148): unspecified (general) error\nunw_get_proc_name(7F8FD9931A4A): unspecified (general)

Is the node still crashing, or was it just a one-time observation?

We see a bug report, https://jira.mongodb.org/browse/SERVER-50971, concerning heavy writes performed on a collection with a TTL index in MongoDB 4.4. You may consider upgrading to the latest MongoDB (4.4.25), which includes some more fixes.

Did you perform any DDL (index add) or observe any particular query/workload running before the issue?

Could you please share the kernel logs (/var/log/messages or sudo dmesg -T) as well, so we can have a quick look?

Regards,
Anil