TokuDB load performance issue

Hi All,

We are experimenting TokuDB to help us manage our data in MySQL with a few billion records and close to 8TB data.

One of the biggest table has about 2bln records alone. So we decided to use that table to test TokuDB engine. We were all expecting much quicker load time and comparable if not faster query time.

The table has structure like below:
entity_id bigint
data_date date
value float

PK entity_id, date_date
PARTITION BY RANGE (to_days(data_date))

basically 15 calendar days of data in each partition.

Everyday, there are 6MM records, so each partition has about 60MM (6MM * 10 business days).

The first test we did was to load 300GB data into the table (about 1.2 billion records) using LOAD DATA INFILE command. We did the exact test against TokuDB and InnoDB. Strange enough, it took TokuDB 32 hours to load, when InnoDB and only took 10.5 hours. We have repeated the same test several times, the result is very consistent.

The compression obviously worked. TokuDB only took 1/3 of storage required by InnoDB. But the 3X load time is totally unexpected.

The test machine we used is a server with 8 cores, 32G RAM and SAN storage. We allocated 16G RAM to Toku and InnoDB during their respective tests. CPU and disk IO was low during both test. Only one of the CPU core reached to 60% usage. All other cores were around 10 - 15%.

In a shrink-down version of test (6 hours for Toku), noticed it spent lots of time ( 7,966,477 ms) on locktree: time waiting on lock escalation. Is this something normal?

Any other place we should check?



Have you tried experimenting with different values for “tokudb_loader_memory_size”, TokuDB defaults it to 100MB so you might see an increase in performance if you increase that variable.
[url]TokuDB Variables — Percona Server 5.7 Documentation