
Big table: 340+ million rows, 42 GB on disk, SELECT has taken days

amirbehzad
Hi,

Any idea how I can improve query performance in this situation?

I have a big, denormalized MyISAM table with 340+ million rows, defined as below:
+---------------------+--------------------------------------------+------+-----+------------------+-------+
| Field               | Type                                       | Null | Key | Default          | Extra |
+---------------------+--------------------------------------------+------+-----+------------------+-------+
| message_id          | mediumint(8)                               | NO   | PRI | -1               |       |
| message_id_binary   | binary(16)                                 | NO   | PRI | 0                |       |
| UDH                 | varbinary(16)                              | NO   | PRI |                  |       |
| mo_mt               | enum('MT','MO','MSF','MCB','PSF','NONSUB') | NO   |     | MT               |       |
| shortcode           | varchar(16)                                | NO   |     |                  |       |
| msisdn              | varchar(32)                                | NO   |     |                  |       |
| operator_code       | varchar(32)                                | NO   |     |                  |       |
| operator_name       | varchar(32)                                | NO   |     |                  |       |
| gateway             | enum('CM','Globway','MIF','Jet')           | NO   |     | CM               |       |
| content_type        | varchar(20)                                | NO   |     |                  |       |
| tariff              | mediumint(8)                               | NO   |     | 0                |       |
| country             | varchar(2)                                 | NO   | MUL |                  |       |
| send_or_received_dt | datetime                                   | NO   |     | NULL             |       |
| scenario_id         | varchar(32)                                | NO   |     |                  |       |
| status_code         | mediumint(8)                               | NO   |     | 0                |       |
| final_status_dt     | datetime                                   | NO   |     | NULL             |       |
+---------------------+--------------------------------------------+------+-----+------------------+-------+

With indexes on the following fields:
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t     |          0 | UNIQ      |            1 | message_id        | A         |       18476 |     NULL | NULL   |      | BTREE      |         |               |
| t     |          0 | UNIQ      |            2 | message_id_binary | A         |    68476650 |     NULL | NULL   |      | BTREE      |         |               |
| t     |          0 | UNIQ      |            3 | UDH               | A         |   342383253 |     NULL | NULL   |      | BTREE      |         |               |
| t     |          1 | extractor |            1 | country           | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
| t     |          1 | extractor |            2 | final_status_dt   | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The following query has been running for 697,554 seconds (about eight days), and it's still in the "Sending data" stage:
SELECT
    t.shortcode                             AS shortcode,
    ''                                      AS keyword,
    ''                                      AS service_name,
    t.operator_code                         AS operator_code,
    t.operator_name                         AS operator_name,
    t.gateway                               AS gateway_name,

    'Undefined'                             AS frequency_type,
    'Undefined'                             AS content_type,

    IF(t.tariff = 0, 'Free', 'Premium')     AS billing_status,
    t.tariff                                AS tariff_code,
    UPPER(t.mo_mt)                          AS transaction_type,
    'Content MT'                            AS message_type,

    t.send_or_received_dt                   AS send_dt,
    t.final_status_dt                       AS update_dt,
    CASE
        WHEN (t.status_code = 20)            THEN 'Delivered'
        WHEN (t.status_code IN (19, 37, 22)) THEN 'Pending'
        ELSE 'Failed'
    END                                     AS dnstatus_type,

    t.msisdn                                AS customer_code,
    990000                                  AS etl_code,
    0                                       AS subscription_id,
    CONCAT(t.message_id, '_', HEX(t.message_id_binary), '_', HEX(t.UDH)) AS `hash`
FROM
    cmlog.t
WHERE
    t.country = 'ES' AND
    t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY
    t.final_status_dt ASC

This is MySQL 5.6, running on CentOS, with the following memory stats:
free -m
             total       used       free     shared    buffers     cached
Mem:         22137      21950        187          0        151      10533
-/+ buffers/cache:      11265      10872
Swap:         8191       1091       7100

and this is /etc/my.cnf:
# General
port                            = 3306
socket                          = /var/lib/mysql/mysql.sock
datadir                         = /var/lib/mysql
character-set-server            = utf8
user                            = mysql
symbolic-links                  = 0
back_log                        = 50
max_connections                 = 100
max_connect_errors              = 999999999
table_open_cache                = 4096
max_allowed_packet              = 32M
thread_cache_size               = 128
open_files_limit                = 32768
sql_mode            = NO_ENGINE_SUBSTITUTION
default-storage-engine        = MyISAM
skip-name-resolve
slave-net-timeout        = 300

# Threading
thread_handling                 = one-thread-per-connection
thread_stack                    = 256K

# Query
query_cache_type                = 0
query_cache_size                = 0M
query_cache_limit               = 1M
query_cache_min_res_unit        = 4K

# MYISAM
key_buffer_size                 = 10240M
read_buffer_size                = 1M
read_rnd_buffer_size            = 2M
bulk_insert_buffer_size         = 32M
join_buffer_size                = 1M
sort_buffer_size                = 4M
myisam_sort_buffer_size         = 8192M
tmp_table_size                  = 1024M
max_heap_table_size        = 1024M
myisam_repair_threads           = 8
myisam-recover                  = DEFAULT

The data on disk:
-rw-rw----. 1 mysql mysql  42G May  4 15:30 t.MYD
-rw-rw----. 1 mysql mysql  22G May  4 15:30 t.MYI

I googled around for ways to improve this: first, by preloading the index into memory using:
load index into cache t ignore leaves;

Or by simply using IGNORE INDEX, since about 90% of the rows in this table belong to country = 'ES'. But I can't try these experiments one by one, because each will take days, so I'm looking for more advice before going on.
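The IGNORE INDEX idea would look something like this (a sketch; index name taken from the SHOW INDEX output above, select list elided):

```sql
-- Sketch: skip the `extractor` index and let the query run as a
-- sequential table scan, on the theory that ~90% of rows match
-- country = 'ES' anyway, so the index buys little selectivity.
SELECT ...
FROM cmlog.t IGNORE INDEX (extractor)
WHERE t.country = 'ES'
  AND t.final_status_dt > '2012-01-01 00:00:00'
ORDER BY t.final_status_dt ASC;
```

One caveat: without the index, the ORDER BY can no longer be satisfied by index order, so the server will need a filesort over the matching rows, which makes sort_buffer_size and tmpdir space relevant.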

Any idea how to improve this?

Comments

  • DenisM
    Try recreating the extractor index with the country and final_status_dt columns swapped.
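A sketch of that suggestion, with names taken from the post above (note that rebuilding an index on a 42 GB MyISAM table will itself take a long time):

```sql
-- Sketch: rebuild `extractor` with the columns reversed,
-- so final_status_dt leads the index.
ALTER TABLE cmlog.t
    DROP INDEX extractor,
    ADD INDEX extractor (final_status_dt, country);
```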

Copyright ©2005 - 2020 Percona LLC. All rights reserved.