Percona 5.6.14 a hundred times slower than Percona 5.5.27

I have a system that’s completely unusable after an upgrade to Percona server 5.6.

The table in question looks like this:

CREATE TABLE edges (
target bigint(20) unsigned NOT NULL,
version bigint(20) unsigned NOT NULL,
type bigint(20) unsigned NOT NULL,
src bigint(20) unsigned NOT NULL,
pos int(10) unsigned NOT NULL,
PRIMARY KEY (target,version,pos),
KEY version (version)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And the query:

select * from edges where (target=t1 and version=v1) or (target=t2 and version=v2) … order by target, version, pos

The table contains 50,000 rows. The WHERE clause consists of 500 (target, version) pairs per query.

Percona Server 5.5: 0.0166 seconds
Percona Server 5.6: 2.2346 seconds

Good plan (5.5):
id: 1
select_type: SIMPLE
table: edges
type: range
possible_keys: PRIMARY,version
key: PRIMARY
key_len: 16
ref: NULL
rows: 468
Extra: Using where
1 row in set (0.00 sec)

Bad plan (5.6):
id: 1
select_type: SIMPLE
table: edges
type: ref
possible_keys: PRIMARY,version
key: version
key_len: 8
ref: const
rows: 546
Extra: Using where
1 row in set (0.00 sec)

Unfortunately, this effect is not easily reproducible with random data.

Is there anything I should know about the new version or some specific setting I should pay attention to that influences index selection? I guess I could “use index (primary)”, but I find this distasteful. The queries have been in use as they are for many years without problems, starting with MySQL 4.0 or 4.1.
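
For reference, the hint I’d rather avoid would look roughly like this (t1/v1 etc. are placeholders, as above):

select * from edges use index (primary)
where (target=t1 and version=v1) or (target=t2 and version=v2) …
order by target, version, pos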

my.cnf: 0bin - encrypted pastebin

Hi,

As you can see, the good query uses the primary key (the clustered index) while the bad one uses a secondary index, and a primary-key lookup is always better than a secondary-index lookup in InnoDB tables. Still, it is possible that the “version” index is being chosen because of bad statistics. Can you run ANALYZE TABLE on the “edges” table a couple of times and then check the explain plan again?

What is the value of use_index_extensions in optimizer_switch? Can you set it to ON/OFF and check the explain plan again? For example:
set session optimizer_switch='use_index_extensions=off';
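
Put together, the check could look something like this (table name taken from your post; substitute real values in the WHERE clause):

analyze table edges;
select @@optimizer_switch;  -- confirm whether use_index_extensions is on
set session optimizer_switch='use_index_extensions=off';
explain select * from edges
where (target=1 and version=1) or (target=2 and version=2)
order by target, version, pos;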

Thanks a lot for your suggestions! I’ve already run analyze table a few times before my original post, and I have just done so again now, but it doesn’t make a difference.

use_index_extensions=off does make a huge difference, though. It was on (seems to be the default), and now that I’ve switched it to “off”, the queries run smoothly again.

Thanks again!

Well, unfortunately, this only fixed my abstracted test case, not the real system. The latter is not much different, except that there are three very similar queries against three separate, similar tables, and these queries are run from 8 threads simultaneously.

I’ll see if I can adapt my test case for posting here.

I seem to have messed up my testing scenario, and all is well indeed when using set session optimizer_switch='use_index_extensions=off'. Expect some sample data and a Python script for reproduction a bit later.

A further unfortunate circumstance is that use_index_extensions=off only takes effect after ANALYZE TABLE has been run following the initial fill of the table. Alternatively, running a few of these slow queries has the same effect, but only after a disconnect/reconnect; a session that has executed slow queries stays slow perpetually.
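
So the sequence that reliably helps with the test case looks roughly like this (a sketch, not a real fix):

-- after filling the table initially:
analyze table edges;
-- then disconnect, reconnect, and in the fresh session:
set session optimizer_switch='use_index_extensions=off';
-- only now do the queries run fast again; a session that has already
-- executed the slow queries stays slow until it reconnects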

I’m desperately trying to create a test case for this, but so far to no avail… I hope I’ll get there eventually. At least I can still easily reproduce it with our real product.

I have a test case now. It turns out that some queries take 30-50 seconds instead of the usual 0.02.

It doesn’t always get stuck when I run it, but it does most of the time. I experimented with the optimizer_switch line a bit; regardless of the use_index_extensions setting, it breaks down sooner or later.

I have set it up so that I can instantly switch between multiple versions of Percona Server; they even operate on the same InnoDB files. This is not necessary for reproduction, though.

Excerpt from output with Percona Server 5.5.27:

0.0422 seconds 515
0.0289 seconds 602
0.0466 seconds 760
0.0264 seconds 663
0.0412 seconds 652
0.0455 seconds 551
0.0298 seconds 644
0.0437 seconds 623
0.0447 seconds 704
0.0446 seconds 567
0.0253 seconds 546
0.0376 seconds 609
0.0431 seconds 534
0.0541 seconds 760
0.0324 seconds 561
0.0273 seconds 644
0.0413 seconds 508
0.0445 seconds 628
0.0444 seconds 658
0.0296 seconds 578

And with Percona Server 5.6.14:

0.0259 seconds 500
0.0318 seconds 500
0.0317 seconds 500
0.0226 seconds 500
0.0216 seconds 500
0.0317 seconds 500
0.0320 seconds 500
0.0319 seconds 500
0.0336 seconds 500
0.0337 seconds 500
18.6618 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
0.0428 seconds 500
0.0570 seconds 500
0.0570 seconds 500
27.5641 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
30.4068 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
35.4683 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
0.0230 seconds 500
44.6238 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
44.7074 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
44.8300 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
0.0296 seconds 500
45.2974 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
0.0311 seconds 500
0.0321 seconds 500
37.1574 seconds 500 ← !!
((1L, ‘SIMPLE’, ‘EI’, ‘ref’, ‘PRIMARY,version’, ‘version’, ‘8’, ‘const’, 500L, ‘Using index condition; Using where’),)
0.0266 seconds 500

During a typical run, after a minute or so, “show processlist\G” shows something like this:

*************************** 5. row ***************************
Id: 44
User: sr
Host: my_host:43457
db: sr_test
Command: Query
Time: 26
State: Sending data
Info: select * from EI where 0=1 or (target=52208912 and version=1905371148) or (target=52208913 and versi
Rows_sent: 496
Rows_examined: 0
*************************** 6. row ***************************
Id: 45
User: sr
Host: my_host:43460
db: sr_test
Command: Query
Time: 26
State: Sending data
Info: select * from EI where 0=1 or (target=52207637 and version=1905371148) or (target=52207638 and versi
Rows_sent: 496
Rows_examined: 0
*************************** 7. row ***************************
Id: 46
User: sr
Host: my_host:43463
db: sr_test
Command: Query
Time: 27
State: Sending data
Info: select * from EI where 0=1 or (target=52206994 and version=1905371148) or (target=52206995 and versi
Rows_sent: 496
Rows_examined: 0
*************************** 8. row ***************************
Id: 47
User: sr
Host: my_host:43468
db: sr_test
Command: Query
Time: 21
State: Sending data
Info: select * from EI where 0=1 or (target=52211510 and version=1905371148) or (target=52211511 and versi
Rows_sent: 496
Rows_examined: 0
*************************** 9. row ***************************
Id: 48
User: sr
Host: my_host:43472
db: sr_test
Command: Query
Time: 22
State: Sending data
Info: select * from EI where 0=1 or (target=52210867 and version=1905371148) or (target=52210868 and versi
Rows_sent: 496
Rows_examined: 0

The buffer pool size does not seem to be relevant; I have been able to reproduce the problem with 512M, 1G, and 3G. The my.cnf configuration file and the Python script can be found at https://github.com/Ringdingcoder/dumpingground/tree/master/percona, and a dump of the test database here: Home | Trayport

Steps to reproduce:

  1. Load dump into a database of your choosing (mine is called sr_test).
  2. Adjust connection parameters in the get_cursor function in test.py.
  3. Run python2 test.py
  4. Let it run for a few minutes.

I have now built a pristine MySQL server 5.6.15 (community edition), and it behaves exactly the same, so at least we know that the problem is not specific to Percona Server.

We are getting ready to migrate from MySQL Server Community 5.5 to either MySQL Community 5.6 or Percona Server 5.6, but would like more info about where this issue stands before making a decision.

To summarize this thread: MySQL 5.6 chooses not to use the primary key under unknown circumstances with simple queries like “select … from tbl where equals(pk-cols, pk-tuple1) or equals(pk-cols, pk-tuple2) or …”
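
To make that shape concrete (using the edges table from the top of the thread; the values are made up, and the test-case queries additionally start with a constant 0=1 term, as visible in the processlist above):

select * from edges
where (target=100 and version=1)
   or (target=101 and version=2)
   -- … up to ~500 such (target, version) pairs per query …
order by target, version, pos;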

The thread contains a test case which should enable everyone with enough motivation to reproduce this behavior. The data for the test case requires a few gigabytes of storage (I believe around 3-4 GB); the compressed dump is about 30 MB.

Current MariaDB (10.0.12) has exactly the same problem (I just ran my test case against it).

This sounds very much like MySQL bug #72690 (“A 5.6’s optimizer bug which is releated to ‘or’ operator’s number”), and hence tweaking eq_range_index_dive_limit should solve it, but setting this variable to 1000, 5000, or 0 makes absolutely no difference :(.

As a MySQL developer found out in the bug discussion linked from my previous post, this requires a combination of two optimizer tweaks (each of which I had tried individually before, to no avail): eq_range_index_dive_limit together with optimizer_switch’s use_index_extensions flag.
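
In other words, the combination that finally works is roughly:

-- 0 means no limit on index dives; a value above the ~500 pairs
-- per query (e.g. 1000) should work as well
set session eq_range_index_dive_limit=0;
set session optimizer_switch='use_index_extensions=off';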