Well, I have been trying to find the source of a major bottleneck for the past few days.
What happens is apache connections spike and cause the server to grind to a halt as each apache connection tends to run a script which also opens a mysql connection. So things spiral into chaos as apache connections build up. These spikes occur intermittently and occur much more during peak time. The question is what is causing the spikes and the spiral into chaos as the server runs fine under normal heavy load between each spike.
In the past days I have switched from the Apache 2 prefork MPM to the worker MPM - what happens now is instead of the server grinding to a halt but still working, apache actually crashes and the error_log message is usually that MaxClients was reached. The crashing doesn’t matter though as Apache still needed to be restarted when I was running the prefork MPM anyway. This indicates to me that the underlying problem is still there and not an Apache problem.
The next thing I did was disable mysql query cache caching by default because I had a theory that mass invalidation of frequently modified tables might be causing serious problems and was often wasting memory because I have an APC caching system on the application level which supersedes the mysql query cache in many cases.
Now, I have looked extensively at my mysql slow queries log, mysql error_log and unix messages log and at the apache error_log around each crash. My slow query time is set to 1 second and I have caught most of the major slow queries I could find. I’ve even enabled and inspected the general query log. The two things I can say are that:
-
the apache error_log usually registers a “MaxClients reached. Consider raising MaxClients” error before the crash. MaxClients is set to 400 which is well above normal load. I’ve set it to 1000 and the crashes still occur, they just take longer and are more painful as the system starts thrashing.
-
the mysql error_log is registering hundreds of aborted connections around each crash. In some cases over 400 aborted connections in a second.
My server is a dual harpertown (8 cpus) with 12GB RAM and a 100mbit port. The database is on a dedicated SCSI hard drive and all static content is served by separate servers.
mysql> SHOW STATUS;±----------------------------------±-----------+| Variable_name | Value |±----------------------------------±-----------+| Aborted_clients | 44521 || Aborted_connects | 1118 || Binlog_cache_disk_use | 0 || Binlog_cache_use | 0 || Bytes_received | 115 || Bytes_sent | 181 || Com_admin_commands | 0 || Com_alter_db | 0 || Com_alter_table | 0 || Com_analyze | 0 || Com_backup_table | 0 || Com_begin | 0 || Com_call_procedure | 0 || Com_change_db | 0 || Com_change_master | 0 || Com_check | 0 || Com_checksum | 0 || Com_commit | 0 || Com_create_db | 0 || Com_create_function | 0 || Com_create_index | 0 || Com_create_table | 0 || Com_create_user | 0 || Com_dealloc_sql | 0 || Com_delete | 0 || Com_delete_multi | 0 || Com_do | 0 || Com_drop_db | 0 || Com_drop_function | 0 || Com_drop_index | 0 || Com_drop_table | 0 || Com_drop_user | 0 || Com_execute_sql | 0 || Com_flush | 0 || Com_grant | 0 || Com_ha_close | 0 || Com_ha_open | 0 || Com_ha_read | 0 || Com_help | 0 || Com_insert | 0 || Com_insert_select | 0 || Com_kill | 0 || Com_load | 0 || Com_load_master_data | 0 || Com_load_master_table | 0 || Com_lock_tables | 0 || Com_optimize | 0 || Com_preload_keys | 0 || Com_prepare_sql | 0 || Com_purge | 0 || Com_purge_before_date | 0 || Com_rename_table | 0 || Com_repair | 0 || Com_replace | 0 || Com_replace_select | 0 || Com_reset | 0 || Com_restore_table | 0 || Com_revoke | 0 || Com_revoke_all | 0 || Com_rollback | 0 || Com_savepoint | 0 || Com_select | 1 || Com_set_option | 0 || Com_show_binlog_events | 0 || Com_show_binlogs | 0 || Com_show_charsets | 0 || Com_show_collations | 0 || Com_show_column_types | 0 || Com_show_create_db | 0 || Com_show_create_table | 0 || Com_show_databases | 0 || Com_show_errors | 0 || Com_show_fields | 0 || Com_show_grants | 0 || Com_show_innodb_status | 0 || Com_show_keys | 0 || Com_show_logs | 0 || Com_show_master_status | 0 || Com_show_ndb_status | 0 || Com_show_new_master | 0 || Com_show_open_tables | 0 || Com_show_privileges | 0 || Com_show_processlist | 0 || Com_show_slave_hosts | 0 || Com_show_slave_status | 0 || Com_show_status | 1 || Com_show_storage_engines | 0 || Com_show_tables | 0 || Com_show_triggers | 0 || Com_show_variables | 0 || Com_show_warnings | 0 || Com_slave_start | 0 || Com_slave_stop | 0 || Com_stmt_close | 0 || Com_stmt_execute | 0 || Com_stmt_fetch | 0 || Com_stmt_prepare | 0 || Com_stmt_reset | 0 || Com_stmt_send_long_data | 0 || Com_truncate | 0 || Com_unlock_tables | 0 || Com_update | 0 || Com_update_multi | 0 || Com_xa_commit | 0 || Com_xa_end | 0 || Com_xa_prepare | 0 || Com_xa_recover | 0 || Com_xa_rollback | 0 || Com_xa_start | 0 || Compression | OFF || Connections | 3729736 || Created_tmp_disk_tables | 0 || Created_tmp_files | 543 || Created_tmp_tables | 1 || Delayed_errors | 2132890 || Delayed_insert_threads | 11 || Delayed_writes | 5390758 || Flush_commands | 1 || Handler_commit | 0 || Handler_delete | 0 || Handler_discover | 0 || Handler_prepare | 0 || Handler_read_first | 0 || Handler_read_key | 0 || Handler_read_next | 0 || Handler_read_prev | 0 || Handler_read_rnd | 0 || Handler_read_rnd_next | 0 || Handler_rollback | 0 || Handler_savepoint | 0 || Handler_savepoint_rollback | 0 || Handler_update | 0 || Handler_write | 132 || Innodb_buffer_pool_pages_data | 124849 || Innodb_buffer_pool_pages_dirty | 51 || Innodb_buffer_pool_pages_flushed | 560603 || Innodb_buffer_pool_pages_free | 2 || Innodb_buffer_pool_pages_latched | 1 || Innodb_buffer_pool_pages_misc | 6221 || Innodb_buffer_pool_pages_total | 131072 || Innodb_buffer_pool_read_ahead_rnd | 249 || Innodb_buffer_pool_read_ahead_seq | 1146 || Innodb_buffer_pool_read_requests | 2613650366 || Innodb_buffer_pool_reads | 101110 || Innodb_buffer_pool_wait_free | 0 || Innodb_buffer_pool_write_requests | 5355760 || Innodb_data_fsyncs | 99895 || Innodb_data_pending_fsyncs | 0 || Innodb_data_pending_reads | 0 || Innodb_data_pending_writes | 0 || Innodb_data_read | 2121207808 || Innodb_data_reads | 110176 || Innodb_data_writes | 1271653 || Innodb_data_written | 2431035904 || Innodb_dblwr_pages_written | 560603 || Innodb_dblwr_writes | 11367 || Innodb_log_waits | 0 || Innodb_log_write_requests | 1914050 || Innodb_log_writes | 786942 || Innodb_os_log_fsyncs | 76904 || Innodb_os_log_pending_fsyncs | 0 || Innodb_os_log_pending_writes | 0 || Innodb_os_log_written | 1238141440 || Innodb_page_size | 16384 || Innodb_pages_created | 4899 || Innodb_pages_read | 129335 || Innodb_pages_written | 560603 || Innodb_row_lock_current_waits | 0 || Innodb_row_lock_time | 1016 || Innodb_row_lock_time_avg | 7 || Innodb_row_lock_time_max | 72 || Innodb_row_lock_waits | 138 || Innodb_rows_deleted | 40039 || Innodb_rows_inserted | 54930 || Innodb_rows_read | 1092053930 || Innodb_rows_updated | 658682 || Key_blocks_not_flushed | 0 || Key_blocks_unused | 84660 || Key_blocks_used | 135516 || Key_read_requests | 958821347 || Key_reads | 140233 || Key_write_requests | 3120784 || Key_writes | 2058025 || Last_query_cost | 0.000000 || Max_used_connections | 335 || Ndb_cluster_node_id | 0 || Ndb_config_from_host | || Ndb_config_from_port | 0 || Ndb_number_of_data_nodes | 0 || Not_flushed_delayed_rows | 0 || Open_files | 1158 || Open_streams | 0 || Open_tables | 1798 || Opened_tables | 0 || Prepared_stmt_count | 0 || Qcache_free_blocks | 10088 || Qcache_free_memory | 69483352 || Qcache_hits | 5381877 || Qcache_inserts | 1202186 || Qcache_lowmem_prunes | 7319 || Qcache_not_cached | 27700147 || Qcache_queries_in_cache | 21196 || Qcache_total_blocks | 53751 || Questions | 199318311 || Rpl_status | NULL || Select_full_join | 0 || Select_full_range_join | 0 || Select_range | 0 || Select_range_check | 0 || Select_scan | 1 || Slave_open_temp_tables | 0 || Slave_retried_transactions | 0 || Slave_running | OFF || Slow_launch_threads | 0 || Slow_queries | 0 || Sort_merge_passes | 0 || Sort_range | 0 || Sort_rows | 0 || Sort_scan | 0 || Table_locks_immediate | 49034075 || Table_locks_waited | 730 || Tc_log_max_pages_used | 0 || Tc_log_page_size | 0 || Tc_log_page_waits | 0 || Threads_cached | 33 || Threads_connected | 18 || Threads_created | 17038 || Threads_running | 4 || Uptime | 67595 || Uptime_since_flush_status | 67595 |±----------------------------------±-----------+230 rows in set (0.04 sec)mysql> SHOW VARIABLES;±--------------------------------±------------------------------+| Variable_name | Value |±--------------------------------±------------------------------+| auto_increment_increment | 1 || auto_increment_offset | 1 || automatic_sp_privileges | ON || back_log | 50 || basedir | / || binlog_cache_size | 32768 || bulk_insert_buffer_size | 8388608 || character_set_client | latin1 || character_set_connection | latin1 || character_set_database | latin1 || character_set_filesystem | binary || character_set_results | latin1 || character_set_server | latin1 || character_set_system | utf8 || character_sets_dir | /usr/share/mysql/charsets/ || collation_connection | latin1_swedish_ci || collation_database | latin1_swedish_ci || collation_server | latin1_swedish_ci || completion_type | 0 || concurrent_insert | 1 || connect_timeout | 10 || datadir | /database/mysql/ || date_format | %Y-%m-%d || datetime_format | %Y-%m-%d %H:%i:%s || default_week_format | 0 || delay_key_write | ON || delayed_insert_limit | 100 || delayed_insert_timeout | 300 || delayed_queue_size | 1000 || div_precision_increment | 4 || keep_files_on_create | OFF || engine_condition_pushdown | OFF || expire_logs_days | 0 || flush | OFF || flush_time | 0 || ft_boolean_syntax | + -><()~*:“”&| || ft_max_word_len | 84 || ft_min_word_len | 4 || ft_query_expansion_limit | 20 || ft_stopword_file | (built-in) || group_concat_max_len | 1024 || have_archive | YES || have_bdb | NO || have_blackhole_engine | YES || have_compress | YES || have_crypt | YES || have_csv | YES || have_dynamic_loading | YES || have_example_engine | YES || have_federated_engine | YES || have_geometry | YES || have_innodb | YES || have_isam | NO || have_merge_engine | YES || have_ndbcluster | DISABLED || have_openssl | NO || have_ssl | NO || have_query_cache | YES || have_raid | NO || have_rtree_keys | YES || have_symlink | YES || hostname | fpsbanana.arkerion.com || init_connect | || init_file | || init_slave | || innodb_additional_mem_pool_size | 1048576 || innodb_autoextend_increment | 8 || innodb_buffer_pool_awe_mem_mb | 0 || innodb_buffer_pool_size | 2147483648 || innodb_checksums | ON || innodb_commit_concurrency | 0 || innodb_concurrency_tickets | 500 || innodb_data_file_path | ibdata1:10M:autoextend || innodb_data_home_dir | /database/mysql || innodb_adaptive_hash_index | ON || innodb_doublewrite | ON || innodb_fast_shutdown | 1 || innodb_file_io_threads | 4 || innodb_file_per_table | OFF || innodb_flush_log_at_trx_commit | 2 || innodb_flush_method | || innodb_force_recovery | 0 || innodb_lock_wait_timeout | 50 || innodb_locks_unsafe_for_binlog | OFF || innodb_log_arch_dir | || innodb_log_archive | OFF || innodb_log_buffer_size | 1048576 || innodb_log_file_size | 5242880 || innodb_log_files_in_group | 2 || innodb_log_group_home_dir | ./ || innodb_max_dirty_pages_pct | 90 || innodb_max_purge_lag | 0 || innodb_mirrored_log_groups | 1 || innodb_open_files | 300 || innodb_rollback_on_timeout | OFF || innodb_support_xa | ON || innodb_sync_spin_loops | 20 || innodb_table_locks | ON || innodb_thread_concurrency | 8 || innodb_thread_sleep_delay | 10000 || interactive_timeout | 30 || join_buffer_size | 1048576 || key_buffer_size | 1073741824 || key_cache_age_threshold | 300 || key_cache_block_size | 1024 || key_cache_division_limit | 100 || language | /usr/share/mysql/english/ || large_files_support | ON || large_page_size | 0 || large_pages | OFF || lc_time_names | en_US || license | GPL || local_infile | ON || locked_in_memory | OFF || log | OFF || log_bin | OFF || log_bin_trust_function_creators | OFF || log_error | ./mysql_errors.log || log_queries_not_using_indexes | ON || log_slave_updates | OFF || log_slow_queries | ON || log_warnings | 2 || long_query_time | 1 || low_priority_updates | ON || lower_case_file_system | OFF || lower_case_table_names | 0 || max_allowed_packet | 33554432 || max_binlog_cache_size | 4294963200 || max_binlog_size | 1073741824 || max_connect_errors | 10 || max_connections | 400 || max_delayed_threads | 20 || max_error_count | 64 || max_heap_table_size | 16777216 || max_insert_delayed_threads | 20 || max_join_size | 18446744073709551615 || max_length_for_sort_data | 1024 || max_prepared_stmt_count | 16382 || max_relay_log_size | 0 || max_seeks_for_key | 4294967295 || max_sort_length | 1024 || max_sp_recursion_depth | 0 || max_tmp_tables | 32 || max_user_connections | 400 || max_write_lock_count | 4294967295 || multi_range_count | 256 || myisam_data_pointer_size | 6 || myisam_max_sort_file_size | 2146435072 || myisam_recover_options | OFF || myisam_repair_threads | 1 || myisam_sort_buffer_size | 67108864 || myisam_stats_method | nulls_unequal || ndb_autoincrement_prefetch_sz | 1 || ndb_force_send | ON || ndb_use_exact_count | ON || ndb_use_transactions | ON || ndb_cache_check_time | 0 || ndb_connectstring | || net_buffer_length | 16384 || net_read_timeout | 30 || net_retry_count | 10 || net_write_timeout | 60 || new | OFF || old_passwords | ON || open_files_limit | 4506 || optimizer_prune_level | 1 || optimizer_search_depth | 62 || pid_file | /database/mysql/mysqld.pid || plugin_dir | || port | 3306 || preload_buffer_size | 32768 || profiling | OFF || profiling_history_size | 15 || protocol_version | 10 || query_alloc_block_size | 8192 || query_cache_limit | 8388608 || query_cache_min_res_unit | 4096 || query_cache_size | 134217728 || query_cache_type | DEMAND || query_cache_wlock_invalidate | OFF || query_prealloc_size | 8192 || range_alloc_block_size | 4096 || read_buffer_size | 4194304 || read_only | OFF || read_rnd_buffer_size | 4194304 || relay_log | || relay_log_index | || relay_log_info_file | relay-log.info || relay_log_purge | ON || relay_log_space_limit | 0 || rpl_recovery_rank | 0 || secure_auth | OFF || secure_file_priv | || server_id | 1 || skip_external_locking | ON || skip_networking | OFF || skip_show_database | OFF || slave_compressed_protocol | OFF || slave_load_tmpdir | /tmp/ || slave_net_timeout | 3600 || slave_skip_errors | OFF || slave_transaction_retries | 10 || slow_launch_time | 2 || socket | /var/lib/mysql/mysql.sock || sort_buffer_size | 8388608 || sql_big_selects | ON || sql_mode | || sql_notes | ON || sql_warnings | OFF || ssl_ca | || ssl_capath | || ssl_cert | || ssl_cipher | || ssl_key | || storage_engine | MyISAM || sync_binlog | 0 || sync_frm | ON || system_time_zone | UTC || table_cache | 2048 || table_lock_wait_timeout | 50 || table_type | MyISAM || thread_cache_size | 40 || thread_stack | 196608 || time_format | %H:%i:%s || time_zone | SYSTEM || timed_mutexes | OFF || tmp_table_size | 134217728 || tmpdir | /tmp || transaction_alloc_block_size | 8192 || transaction_prealloc_size | 4096 || tx_isolation | REPEATABLE-READ || updatable_views_with_limit | YES || version | 5.0.67-community-log || version_comment | MySQL Community Edition (GPL) || version_compile_machine | i686 || version_compile_os | redhat-linux-gnu || wait_timeout | 30 |±--------------------------------±------------------------------+237 rows in set (0.01 sec)
I have also attached an example of the aborted connections. You can see from the time that they start to alleviate at the end of the log. What I noticed is that the error changes from “Got an error reading communication packets” to “Got timeout reading communication packets” some way through the log.
Can anyone help me? (
Thanks