NDB and Character Indexes

Hey,

I’m fairly new to MySQL’s Clustering capabilities, though I am familiar with non-clustered installs. I have been doing performance comparisons between a few database systems, and with MySQL, and also its clustered counterpart. I have found in my tests that when dealing with character indexed queries and millions of rows, clustered queries run orders of magnitude slower than single MyISAM servers with the same information.

Is there something I need to know about the configuration that would increase this performance?

Tyler

It depends on what query you’re running.

In reality it is quite often the case as NDB needs to perform many lookups over network which are slow.

For example…

TableEmails with 30 million records, an auto-inc( primary key ), an email address(indexed), and an integer status identifier(indexed), with an extra index on status,email.

Now, the query : select * from TableEmails where status=0 and email = ‘bleh@bleh.com’ takes 20 seconds to return on my ndb setup whereas it takes .2 seconds on MyISAM. Is there some configuration setting that I am missing? Is there some performance tweak I could do to speed this up?

Tyler

P.S. emails will tend to be close to unique… so, the return value on that query is likely to be only 1 or 2 records.

Ok, feeling somewhat like an idiot… I wish to issue the problem as I have further unraveled it…

The issue doesn’t actually lie in the character index. It appears to be in the AND… Let me explain :

When I run the query : select * from TableEmails where status=0 or the query : select * from TableEmails where email = ‘bleh@bleh.com’

the result returns almost instantly. However, when I and them together : select * from TableEmails where status=0 and email = ‘bleh@bleh.com’, the result takes the 20 seconds.

I have tried creating a new index with both status and email, but that didn’t help at all. I have also found this to be an issue on another table that has 5 fields, all indexed integers… when I select, update, delete, etc… with an AND in the where clause, the query runs INCREDIBLY slower than if I were to run each of the where conditions individually.

Please tell me I am missing something small, a config setting perhaps.

Tyler

Ok…

More information regarding my findings. I have SOMEWHAT solved my problem. I found that NDB isn’t figuring out which index to use . When I have the two columns separately indexed and run the query, it runs SLOW. If I have a single index with both columns in it, AND and index for each column separately, it runs slow.

When I use a FORCE INDEX to use the single index spanning both columns, it runs fast!

So, MySQL NDB isn’t detecting the best method to go about the select statement. Now, this only partially solves my problem in that you can only use FORCE INDEX on select statements. What am I to do with DELETES and UPDATES?

Once again, this is a work in progress but any suggestions would be very welcome! Please help!

Tyler

PROBLEM SOLVED!!! Ok… I’ve seen this posted as a bug on the mysql website and been watching for a response. Here is the temporary fix :

Make sure you order your indexes from the most complex to the least. MySQL appears to search thru the indexes( in the order that you create them ) and identifies all the possible indexes for it to use. Then, rather than select the most efficient, it selects the FIRST one( or the oldest created index ). You can easily see the order of indexes if you use the MySQL Admin application.

Now, This all seems to be fine since the complex index appears to return a result just as fast as the individual. Therefore, there is no need for the individual index. ( until they fix this issue )

Tyler

Please post any other findings related to this. BTW, I am using 5.1.14 beta for testing.

Thanks Tyler for your response.

NDB is often slower than local storage engine but it should not be that much slower, good you’ve found the problem.

I expect NDB cost estimation is not as smart as with other storage engines yet, this is why MyISAM picks up index while it does not.

needs to perform many lookups over network which are slow.