Large number of long-lived open connections (Percona 8)

Hi everyone,
The current use case for the app I’m working on includes a single machine (32 GB ram) supporting a large number of long-lived connections (could last days) from client applications.

The problem is, after after fewer than 2000 connections, the server is unable to start new connections. I don’t think this is a machine issue, as the same machine was able to keep open up to 10K connections prior to upgrading to Percona 8.

htop shows no more than 10GB of memory used (out of 32), and load is 0.96, so nothing special

Current OS & Percona version:


Ubuntu 18.04.3 LTS
mysql Ver 8.0.16-7 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release '7', Revision '613e312')

my.cnf:
(notice I set the limits to really high values, even though these are not expected to be seen in production – it shouldn’t affect anything as long as they’re not reached, no?)

[mysqld]
max_connections = 16384
open_files_limit = 16384
table_open_cache = 16384
innodb_open_files = 16384
thread_cache_size = 2048


innodb_buffer_pool_size = 4G
innodb_buffer_pool_instances = 1
innodb_log_buffer_size = 64M
innodb_use_native_aio = 1
innodb_buffer_pool_instances = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 15000
innodb_io_capacity_max = 30000

error.log
(notice that a SELECT COUNT(*) also killed it)


2019-10-02T17:22:17.557873Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 737 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:22:17.557895Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:23:29.044965Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 532 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:23:29.044989Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:24:29.729184Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 579 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:24:29.729213Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:25:57.374747Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 427 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:25:57.374785Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:27:13.303634Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 213 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:27:13.303667Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:28:13.604014Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 687 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:28:13.604038Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:29:14.182797Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 364 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:29:14.182818Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:30:25.705593Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 338 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:30:25.705618Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
2019-10-02T17:31:39.472443Z 0 [ERROR] [MY-000000] [connection_h] Error log throttle: 446 'Can't create thread to handle new connection' error(s) suppressed
2019-10-02T17:31:39.472465Z 0 [ERROR] [MY-010249] [Server] Can't create thread to handle new connection(errno= 11)
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
17:32:17 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f5e74a20a40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f5d7c536d50 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x56469607c27d]
/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x564695339433]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f602f760890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7f602d8cce97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7f602d8ce801]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957) [0x7f602e2c1957]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ab6) [0x7f602e2c7ab6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92af1) [0x7f602e2c7af1]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x49) [0x7f602e2c7d79]
/usr/sbin/mysqld(void std::vector<std::thread, std::allocator<std::thread> >::_M_realloc_insert<dberr_t (Reader<Key_reader, Key_reader_row>::*&)(unsigned long, mpmc_bq<Reader<Key_reader, Key_reader_row>::Ctx*>&, std::function<dberr_t (unsigned long, buf_block_t const*, unsigned char const*, dict_index_t*, row_prebuilt_t*)>&), Reader<Key_reader, Key_reader_row>*, unsigned long&, std::reference_wrapper<mpmc_bq<Reader<Key_reader, Key_reader_row>::Ctx*> >, std::reference_wrapper<std::function<dberr_t (unsigned/usr/sbin/mysqld(Reader<Key_reader, Key_reader_row>::start_parallel_load(std::function<dberr_t (unsigned long, buf_block_t const*, unsigned char const*, dict_index_t*, row_prebuilt_t*)>)+0x230) [0x564696326d40]
/usr/sbin/mysqld(Reader<Key_reader, Key_reader_row>::read(std::function<dberr_t (unsigned long, buf_block_t const*, unsigned char const*, dict_index_t*, row_prebuilt_t*)>&&)+0x382) [0x56469632e2f2]
/usr/sbin/mysqld(row_scan_index_for_mysql(row_prebuilt_t*, dict_index_t*, unsigned long, bool, unsigned long*)+0x364) [0x564696317f74]
/usr/sbin/mysqld(ha_innobase::records(unsigned long long*)+0x1cd) [0x5646961e9f3d]
/usr/sbin/mysqld(get_exact_record_count(QEP_TAB*, unsigned int, int*)+0xed) [0x5646951bc67d]
/usr/sbin/mysqld(end_send_count(JOIN*, QEP_TAB*)+0x7e) [0x5646951bef4e]
/usr/sbin/mysqld(JOIN::exec()+0x6e8) [0x5646951c2f08]
/usr/sbin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x2ee) [0x56469524fede]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x458) [0x564695257ea8]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x2445) [0x564695201eb5]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*, bool)+0x41b) [0x56469520492b]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x12c6) [0x5646952060f6]
/usr/sbin/mysqld(do_command(THD*)+0x1f9) [0x564695208709]
/usr/sbin/mysqld(+0x10a1320) [0x56469532a320]
/usr/sbin/mysqld(+0x1e9efff) [0x564696127fff]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f602f7556db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f602d9af88f]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f5e74f5a018): SELECT COUNT(*) FROM TABLE_NAME
Connection ID (thread ID): 256167
Status: NOT_KILLED

Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

I checked the usual places that the system’s max open files limits are set really high.

/etc/security/limits.conf


* soft nofile 1048576
* hard nofile 1048576
* soft nproc 1048576
* hard nproc 1048576

/lib/systemd/system/mysql.service
(notice the LimitNOFILE)


[Service]
User=mysql
Group=mysql
Type=notify
PermissionsStartOnly=true
ExecStartPre=/usr/share/mysql/mysql-systemd-start pre
EnvironmentFile=-/etc/default/mysql
ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS
TimeoutSec=600
LimitNOFILE=1048576