Big table, 340+ million rows, 42 GB data file, SELECT has been running for days

Hi,

Any idea how I can improve the query performance in this situation?

I have a big, denormalized MyISAM table with 340 million rows:


+---------------------+--------------------------------------------+------+-----+---------+-------+
| Field               | Type                                       | Null | Key | Default | Extra |
+---------------------+--------------------------------------------+------+-----+---------+-------+
| message_id          | mediumint(8)                               | NO   | PRI | -1      |       |
| message_id_binary   | binary(16)                                 | NO   | PRI | 0       |       |
| UDH                 | varbinary(16)                              | NO   | PRI |         |       |
| mo_mt               | enum('MT','MO','MSF','MCB','PSF','NONSUB') | NO   |     | MT      |       |
| shortcode           | varchar(16)                                | NO   |     |         |       |
| msisdn              | varchar(32)                                | NO   |     |         |       |
| operator_code       | varchar(32)                                | NO   |     |         |       |
| operator_name       | varchar(32)                                | NO   |     |         |       |
| gateway             | enum('CM','Globway','MIF','Jet')           | NO   |     | CM      |       |
| content_type        | varchar(20)                                | NO   |     |         |       |
| tariff              | mediumint(8)                               | NO   |     | 0       |       |
| country             | varchar(2)                                 | NO   | MUL |         |       |
| send_or_received_dt | datetime                                   | NO   |     | NULL    |       |
| scenario_id         | varchar(32)                                | NO   |     |         |       |
| status_code         | mediumint(8)                               | NO   |     | 0       |       |
| final_status_dt     | datetime                                   | NO   |     | NULL    |       |
+---------------------+--------------------------------------------+------+-----+---------+-------+

It has indexes on the following fields:


+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t     | 0          | UNIQ      | 1            | message_id        | A         | 18476       | NULL     | NULL   |      | BTREE      |         |               |
| t     | 0          | UNIQ      | 2            | message_id_binary | A         | 68476650    | NULL     | NULL   |      | BTREE      |         |               |
| t     | 0          | UNIQ      | 3            | UDH               | A         | 342383253   | NULL     | NULL   |      | BTREE      |         |               |
| t     | 1          | extractor | 1            | country           | A         | NULL        | NULL     | NULL   |      | BTREE      |         |               |
| t     | 1          | extractor | 2            | final_status_dt   | A         | NULL        | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
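
Pieced together from the DESCRIBE and SHOW INDEX output, the table corresponds roughly to the DDL below. The column types, defaults, and the two key definitions are read straight off the listings above; the key names UNIQ and extractor come from SHOW INDEX, and everything else (charset, engine options) is an assumption, not the original DDL.


CREATE TABLE cmlog.t (
    message_id          MEDIUMINT(8)  NOT NULL DEFAULT -1,
    message_id_binary   BINARY(16)    NOT NULL DEFAULT '0',
    UDH                 VARBINARY(16) NOT NULL,
    mo_mt               ENUM('MT','MO','MSF','MCB','PSF','NONSUB') NOT NULL DEFAULT 'MT',
    shortcode           VARCHAR(16)   NOT NULL,
    msisdn              VARCHAR(32)   NOT NULL,
    operator_code       VARCHAR(32)   NOT NULL,
    operator_name       VARCHAR(32)   NOT NULL,
    gateway             ENUM('CM','Globway','MIF','Jet') NOT NULL DEFAULT 'CM',
    content_type        VARCHAR(20)   NOT NULL,
    tariff              MEDIUMINT(8)  NOT NULL DEFAULT 0,
    country             VARCHAR(2)    NOT NULL,
    send_or_received_dt DATETIME      NOT NULL,
    scenario_id         VARCHAR(32)   NOT NULL,
    status_code         MEDIUMINT(8)  NOT NULL DEFAULT 0,
    final_status_dt     DATETIME      NOT NULL,
    UNIQUE KEY UNIQ (message_id, message_id_binary, UDH),  -- shows up as PRI in DESCRIBE
    KEY extractor (country, final_status_dt)               -- the index the query should use
) ENGINE = MyISAM;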

The following query has been running for 697,554 seconds and is still in the “Sending data” stage:


SELECT
    t.shortcode AS shortcode,
    '' AS keyword,
    '' AS service_name,
    t.operator_code AS operator_code,
    t.operator_name AS operator_name,
    t.gateway AS gateway_name,

    'Undefined' AS frequency_type,
    'Undefined' AS content_type,

    IF(t.tariff = 0, 'Free', 'Premium') AS billing_status,
    t.tariff AS tariff_code,
    UPPER(t.mo_mt) AS transaction_type,
    'Content MT' AS message_type,

    t.send_or_received_dt AS send_dt,
    t.final_status_dt AS update_dt,
    CASE
        WHEN (t.status_code = 20) THEN 'Delivered'
        WHEN (t.status_code IN (19, 37, 22)) THEN 'Pending'
        ELSE 'Failed'
    END AS dnstatus_type,

    t.msisdn AS customer_code,
    990000 AS etl_code,
    0 AS subscription_id,
    CONCAT(t.message_id, '_', HEX(t.message_id_binary), '_', HEX(t.UDH)) AS `hash`
FROM
    cmlog.t
WHERE
    t.country = 'ES' AND
    t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY
    t.final_status_dt ASC;
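
Before committing to another multi-day run, a free sanity check is to ask the optimizer what it plans to do: EXPLAIN returns immediately without reading the data and shows which key it picks and how many rows it estimates. A sketch, with the SELECT list trimmed since only the WHERE and ORDER BY clauses matter for index choice here (the extractor index does not cover the full column list anyway):


-- Which key does the optimizer choose, and how many rows does it estimate?
EXPLAIN
SELECT t.msisdn, t.final_status_dt   -- column list trimmed for brevity
FROM   cmlog.t
WHERE  t.country = 'ES'
  AND  t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY t.final_status_dt ASC;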

This is MySQL 5.6, running on CentOS, with the following memory stats:


free -m
             total       used       free     shared    buffers     cached
Mem:         22137      21950        187          0        151      10533
-/+ buffers/cache:      11265      10872
Swap:         8191       1091       7100

and this is /etc/my.cnf:


# General
port = 3306
socket = /var/lib/mysql/mysql.sock
datadir = /var/lib/mysql
character-set-server = utf8
user = mysql
symbolic-links = 0
back_log = 50
max_connections = 100
max_connect_errors = 999999999
table_open_cache = 4096
max_allowed_packet = 32M
thread_cache_size = 128
open_files_limit = 32768
sql_mode = NO_ENGINE_SUBSTITUTION
default-storage-engine = MyISAM
skip-name-resolve
slave-net-timeout = 300

# Threading
thread_handling = one-thread-per-connection
thread_stack = 256K

# Query
query_cache_type = 0
query_cache_size = 0M
query_cache_limit = 1M
query_cache_min_res_unit = 4K

# MYISAM
key_buffer_size = 10240M
read_buffer_size = 1M
read_rnd_buffer_size = 2M
bulk_insert_buffer_size = 32M
join_buffer_size = 1M
sort_buffer_size = 4M
myisam_sort_buffer_size = 8192M
tmp_table_size = 1024M
max_heap_table_size = 1024M
myisam_repair_threads = 8
myisam-recover = DEFAULT

The data on disk:


-rw-rw----. 1 mysql mysql 42G May 4 15:30 t.MYD
-rw-rw----. 1 mysql mysql 22G May 4 15:30 t.MYI
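
One thing worth noting from these numbers: the .MYI file is 22 GB but key_buffer_size is only 10 GB, so the whole index cannot fit in the key cache. The standard MyISAM counters show how often index reads actually have to go to disk; the variable names below are stock MySQL, and the "ratio should stay very small" guideline is only a rule of thumb.


-- Key_reads (physical reads) divided by Key_read_requests (cache lookups)
-- should stay very small if the key cache is effective.
SHOW GLOBAL STATUS LIKE 'Key_read%';
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';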

I googled around for ways to improve this. The first idea was to preload the index into memory with:


load index into cache t ignore leaves;

Or to simply use IGNORE INDEX, since 90% of the data in this table belongs to country = 'ES'. But I can't try these experiments one by one, because each attempt takes days, so I'm looking for more advice before going on.
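
For reference, this is roughly what the second experiment would look like (same query with the column list trimmed). The idea is that with ~90% of rows matching, a sequential table scan plus filesort might beat hundreds of millions of random row lookups through the index; whether it actually wins here is exactly the open question.


SELECT t.msisdn, t.final_status_dt   -- rest of the column list as in the original query
FROM   cmlog.t IGNORE INDEX (extractor)
WHERE  t.country = 'ES'
  AND  t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY t.final_status_dt ASC;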

Any idea how to improve this?

Try recreating the extractor index with the country and final_status_dt columns swapped.
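
A minimal sketch of that suggestion, using the table and index names from the question; note that on a 42 GB MyISAM table this ALTER will itself run for a long time and lock the table while the index is rebuilt.


-- Drop and recreate the index in a single ALTER so the table is rebuilt only once.
ALTER TABLE cmlog.t
    DROP INDEX extractor,
    ADD  INDEX extractor (final_status_dt, country);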