MySQL: poor MyISAM UTF-8 performance

We recently upgraded MySQL on our Fedora Core server from 4.0.27 to 5.0.37 Community Server. In the process we also decided to change the character encoding on some of our tables from Latin1 to UTF-8, which is needed for a future release of our application. The problem that arose after the conversion was very slow LIKE searches on large TEXT fields: a factor of 8-10 times slower than our previous version of MySQL on the same hardware.

After hours of debugging and testing we found that changing the encoding of the TEXT fields back to Latin1 (which is actually not what we want, but it needed to be done) brings performance back to what it was under 4.0.27. I understand that UTF-8 takes more bytes per character than Latin1, so that is probably where this comes from, but we can't seem to find a way to improve performance under UTF-8. Which server variables can be tweaked to gain some speed here? We upped the temp table sizes and switched the MySQL tmp dir to an in-memory (RAM disk) location; nothing seems to help. So if anybody has a clue, please let me know what to tweak.
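For reference, these are the kinds of server variables usually inspected for MyISAM scan performance. This is only a sketch; the values are illustrative, not recommendations, and the right numbers depend on available RAM and workload:

```sql
-- Variables commonly looked at for MyISAM scan-heavy workloads.
SHOW VARIABLES LIKE 'key_buffer_size';    -- MyISAM index cache (data blocks rely on the OS cache)
SHOW VARIABLES LIKE 'read_buffer_size';   -- per-thread buffer used for sequential table scans

-- Illustrative values only; both are settable at runtime.
SET GLOBAL key_buffer_size  = 256 * 1024 * 1024;
SET GLOBAL read_buffer_size = 2 * 1024 * 1024;
```

Note that key_buffer_size caches only MyISAM indexes, not row data, so a LIKE '%word%' scan over large TEXT columns mostly depends on the operating system's file cache rather than on MySQL's buffers.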

If the table type is InnoDB, try upping the InnoDB buffers, as both indexes and data are cached there.

Is the search LIKE '%word%' or LIKE 'word%'? An index can satisfy the query in the latter case.
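To illustrate the difference, here is a sketch with a hypothetical `articles` table and `body` TEXT column. A pattern with a leading wildcard forces a full table scan, while a prefix pattern can use an index (TEXT columns require a prefix length on the index):

```sql
-- Prefix index on the first 100 characters of a TEXT column (hypothetical names).
CREATE INDEX idx_body_prefix ON articles (body(100));

EXPLAIN SELECT id FROM articles WHERE body LIKE 'word%';   -- can use idx_body_prefix as a range scan
EXPLAIN SELECT id FROM articles WHERE body LIKE '%word%';  -- cannot use the index: full table scan
```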

Furthermore, if what you really need is to search text content for words, why not use a more suitable technology such as MySQL full-text indexes, or get even better performance by compiling MySQL with Sphinx full-text engine support?
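A full-text index on a MyISAM table would look something like this (again with hypothetical table and column names); in MySQL 5.0, FULLTEXT indexes are available only on MyISAM, which fits this setup:

```sql
-- Add a full-text index as an alternative to LIKE '%word%'.
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

-- Natural-language search.
SELECT id FROM articles WHERE MATCH(body) AGAINST ('word');

-- Boolean mode, with prefix matching.
SELECT id FROM articles WHERE MATCH(body) AGAINST ('+word*' IN BOOLEAN MODE);
```

One caveat: by default words shorter than ft_min_word_len (4 characters) are not indexed, so short search terms behave differently than with LIKE.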

The table is MyISAM, not InnoDB. And performance is actually pretty good when using Latin1; it just drops (8-10 fold) when the TEXT fields are changed to UTF-8 and the data is inserted again. I know that searching with LIKE '%word%', as we do in this case, is not very efficient, but does it have to be that slow with UTF-8? Otherwise we would have to rewrite an extensive part of our application.

Does seem strange to me. I'm not sure whether LIKE searches are affected by word length; I would assume not, which makes the performance here bizarre. It must have something to do with how MySQL handles Unicode internally.

I don't know whether it has anything to do with buffer lengths or anything like that, but I can understand that searching through something that is potentially three times as many bytes takes longer or needs more memory. I just don't know what to adjust. Thanks.

Could you maybe benchmark the query using an InnoDB copy of the table, increasing the InnoDB-related buffers so that a large portion of the data is in RAM?

Or is that scenario out of the question?
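A benchmark along those lines could be set up as follows; this is only a sketch, and the table name is hypothetical:

```sql
-- Make an InnoDB copy of the table for comparison.
CREATE TABLE articles_innodb LIKE articles;
ALTER TABLE articles_innodb ENGINE = InnoDB;
INSERT INTO articles_innodb SELECT * FROM articles;

-- innodb_buffer_pool_size caches both data and indexes, but it cannot be
-- changed at runtime in 5.0; set it in my.cnf and restart the server:
--   [mysqld]
--   innodb_buffer_pool_size = 512M

-- Time the same search on the copy, bypassing the query cache.
SELECT SQL_NO_CACHE COUNT(*) FROM articles_innodb WHERE body LIKE '%word%';
```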

That is not easy to do, as we use MyISAM only and have disabled all other storage engines as much as possible.