Hi,
Any idea how can I improve the query performance for this situation?
I have a big table with 340 million rows, MyISAM, denormalized, as below:
+---------------------+--------------------------------------------+------+-----+------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------------------------------------+------+-----+------------------+-------+
| message_id | mediumint(8) | NO | PRI | -1 | |
| message_id_binary | binary(16) | NO | PRI | 0 | |
| UDH | varbinary(16) | NO | PRI | | |
| mo_mt | enum('MT','MO','MSF','MCB','PSF','NONSUB') | NO | | MT | |
| shortcode | varchar(16) | NO | | | |
| msisdn | varchar(32) | NO | | | |
| operator_code | varchar(32) | NO | | | |
| operator_name | varchar(32) | NO | | | |
| gateway | enum('CM','Globway','MIF','Jet') | NO | | CM | |
| content_type | varchar(20) | NO | | | |
| tariff | mediumint(8) | NO | | 0 | |
| country | varchar(2) | NO | MUL | | |
| send_or_received_dt | datetime | NO | | NULL | |
| scenario_id | varchar(32) | NO | | | |
| status_code | mediumint(8) | NO | | 0 | |
| final_status_dt | datetime | NO | | NULL | |
+---------------------+--------------------------------------------+------+-----+------------------+-------+
With indexes on below fields:
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t | 0 | UNIQ | 1 | message_id | A | 18476 | NULL | NULL | | BTREE | | |
| t | 0 | UNIQ | 2 | message_id_binary | A | 68476650 | NULL | NULL | | BTREE | | |
| t | 0 | UNIQ | 3 | UDH | A | 342383253 | NULL | NULL | | BTREE | | |
| t | 1 | extractor | 1 | country | A | NULL | NULL | NULL | | BTREE | | |
| t | 1 | extractor | 2 | final_status_dt | A | NULL | NULL | NULL | | BTREE | | |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The following query is running for 697,554 seconds, and it’s still in “Sending date” stage:
SELECT
t.shortcode AS shortcode,
'' AS keyword,
'' AS service_name,
t.operator_code AS operator_code,
t.operator_name AS operator_name,
t.gateway AS gateway_name,
'Undefined' AS frequency_type,
'Undefined' AS content_type,
IF(t.tariff = 0, 'Free', 'Premium') AS billing_status,
t.tariff AS tariff_code,
UPPER(t.mo_mt) AS transaction_type,
'Content MT' AS message_type,
t.send_or_received_dt AS send_dt,
t.final_status_dt AS update_dt,
CASE
WHEN (t.status_code = 20) THEN 'Delivered'
WHEN (t.status_code IN (19, 37, 22)) THEN 'Pending'
ELSE 'Failed'
END AS dnstatus_type,
t.msisdn AS customer_code,
990000 AS etl_code,
0 AS subscription_id,
CONCAT(t.message_id, '_', HEX(t.message_id_binary), '_', HEX(t.UDH)) AS `hash`
FROM
cmlog.t
WHERE
t.country = 'ES' AND
t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY
t.final_status_dt ASC
This is MySQL 5.6, running on CentOS, with below memory stat:
free -m
total used free shared buffers cached
Mem: 22137 21950 187 0 151 10533
-/+ buffers/cache: 11265 10872
Swap: 8191 1091 7100
and this is /etc/my.cnf:
# General
port = 3306
socket = /var/lib/mysql/mysql.sock
datadir = /var/lib/mysql
character-set-server = utf8
user = mysql
symbolic-links = 0
back_log = 50
max_connections = 100
max_connect_errors = 999999999
table_open_cache = 4096
max_allowed_packet = 32M
thread_cache_size = 128
open_files_limit = 32768
sql_mode = NO_ENGINE_SUBSTITUTION
default-storage-engine = MyISAM
skip-name-resolve
slave-net-timeout = 300
# Threading
thread_handling = one-thread-per-connection
thread_stack = 256K
# Query
query_cache_type = 0
query_cache_size = 0M
query_cache_limit = 1M
query_cache_min_res_unit = 4K
# MYISAM
key_buffer_size = 10240M
read_buffer_size = 1M
read_rnd_buffer_size = 2M
bulk_insert_buffer_size = 32M
join_buffer_size = 1M
sort_buffer_size = 4M
myisam_sort_buffer_size = 8192M
tmp_table_size = 1024M
max_heap_table_size = 1024M
myisam_repair_threads = 8
myisam-recover = DEFAULT
The data on disk:
-rw-rw----. 1 mysql mysql 42G May 4 15:30 t.MYD
-rw-rw----. 1 mysql mysql 22G May 4 15:30 t.MYI
I googled around to find some ways to improve this, first, by preloading the index to memory using:
load index into cache t ignore leaves;
OR by simply IGNORE INDEX, as 90% of the data in this table, belong to country=“ES”. But I cannot try these experiments one by one, because each will take days, so I’m looking for more advices before going on.
Any idea how to improve this?