TokuDB migration while RocksDB feels... beta?

Hi guys, what’s the status of RocksDB development? Are any considerable resources dedicated to progressing and improving it?

Basically we’re doing a huge TokuDB migration (several TBs of compressed data and billions of rows). While testing RocksDB, the engine seems lacking: compression is worse and query speed dropped (around 2x, even for small data sets). That’s not even the real issue, though. What I think we’re missing is something like “single file per table”, as I’m not really sure that keeping TBs of data and hundreds of tables in a single set of files is a good idea for production, since any error in one of those files could break the whole server.

Maybe an option to keep everything in a separate folder per table would make the engine faster and more error-resilient?

Is that even being considered as a possible feature? Or, if we want the server to be reliable, is InnoDB currently the only way to go for the migration? Is RocksDB another short-term solution that turns out to be a dead end long term?

Thanks for all thoughts…

Extremely actively developed.

“considerable resources” => Yes, Facebook :slight_smile: They created RocksDB and have deployed it in over 90% of their X0,000 MySQL servers.

A quick google search also reveals many other companies using Percona + MyRocks in their fleets of MySQL servers handling petabytes of production-critical data.

Can you give more details here? Lacking in what way?

You’re expecting MyRocks, a database engine based on Log-Structured Merge-Trees (LSM), to store its data on disk the same way InnoDB (based on B+Trees) does?

MyRocks is very reliable, otherwise the companies in the results above would not be using it.

Hi Matthew, thanks for the response… so why does it feel “lacking”… sorry, that’s a longer answer; maybe it’s just a lack of experience on my side…

  1. Basically I feel many simple, straightforward, and “always supported” features are not working as expected. That’s actually not on Percona but on Maria, so maybe your plugin is different.

It feels like InnoDB, Aria, MyISAM, MEMORY, and TokuDB are interchangeable: I can switch a table to whichever engine has the right performance characteristics and it’ll work the same way, more or less, and I can’t break anything with config…

With MyRocks it seems it’s problematic even to get started. For example:
ALTER TABLE trading._raw_order_hist ENGINE = ROCKSDB;

Operation aborted: Failed to acquire lock due to rocksdb_max_row_locks limit

Same thing when a user wants to “copy” data with INSERT INTO.

I know there’s bulk load, with some constraints. After reading the docs I wasn’t even sure data loaded in bulk mode would be correct. Okay, bulk load, unsorted… loaded the data, it worked.
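For reference, the bulk-load session switch is what gets around the row-lock limit during an engine conversion. A minimal sketch of how I understand it works (the variable names are the real MyRocks ones; whether you need the unsorted flag depends on your data):

```sql
-- Bulk mode skips per-row locking/transactional overhead during the copy.
SET SESSION rocksdb_bulk_load_allow_unsorted = 1;  -- only if rows aren't in PK order
SET SESSION rocksdb_bulk_load = 1;

ALTER TABLE trading._raw_order_hist ENGINE = ROCKSDB;

-- Data is finalized when bulk mode is switched off (or the session ends).
SET SESSION rocksdb_bulk_load = 0;
```

As I understand it, errors (e.g. duplicate keys) may only surface at the end when the load is finalized, which is part of why I wasn’t sure the data would be correct until I verified it.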

Then there’s another issue… when doing simple UPDATEs (like setting a column to zero for debugging) I’m getting the same errors even on 800k-row tables while rocksdb_max_row_locks is set to over a million. Even on a full-table update, where it should be easy to just lock the whole table when nothing else is going on…

So the point is that if you have to “micromanage” simple operations this way, it should at least be predictable: if max locks > total rows, the operation should succeed every time on a dev box where you’re not holding any other locks. Ideally it would just escalate to a full table lock when you run “out of” row locks (I know that sounds super simple and is probably horribly complicated to do… however, other engines manage it).
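The two workarounds I know of for the UPDATE case, sketched here (rocksdb_max_row_locks is a real session variable; the table and column names are just placeholders):

```sql
-- Option 1: raise the per-transaction row-lock budget for this session only.
SET SESSION rocksdb_max_row_locks = 2000000;
UPDATE my_table SET debug_flag = 0;

-- Option 2: keep each transaction under the limit by batching.
-- Repeat until it reports 0 rows affected.
UPDATE my_table SET debug_flag = 0 WHERE debug_flag <> 0 LIMIT 100000;
```

Neither is the “automatic lock escalation” I’d want, but both avoid the aborted operation.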

That’s just one aspect. I was testing Rocks for a long time, and I’m not sure if the following is still true, but:

  1. I read that compaction of the LSM tree is limited on default settings to some rather small size (like 100 GB). The other thing is that tree compaction (I tested this) required twice as much storage as the full database, so if I have e.g. 6 TB of data I’d need to keep another 6 TB of SSD space idling so the server can “balance” the levels and reclaim free space after deleting rows.

  2. The compaction process is resource-intensive and affects the whole server, so it’s hard to plan around because everything is affected. I can’t compact e.g. a single, rarely used table where slowness wouldn’t matter. I’ve also read claims that a large data size (>1 TB) slows down the whole server.
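There is some manual control over compaction, though at the column-family level rather than per table. A sketch using the real MyRocks variables (by default every table lives in the 'default' column family, which is exactly the granularity problem described above):

```sql
-- Trigger a manual compaction of one column family, e.g. during a quiet window.
SET GLOBAL rocksdb_compact_cf = 'default';

-- Temporarily pause background compaction/flushes if they are hurting the server,
-- then resume later. Writes still land in the memtables/L0 meanwhile.
SET GLOBAL rocksdb_pause_background_work = ON;
-- ... maintenance window ...
SET GLOBAL rocksdb_pause_background_work = OFF;
```

This only becomes per-table control if tables are split into their own column families.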

  3. A couple of times when I had to kill the server, or the VM was shut down uncleanly by mistake, all Rocks tables ended up “broken” and inaccessible (the plugin wasn’t able to load). Even MyISAM handled such cases better: the damage was limited to a single table and was repaired automatically.

So unlike all the other engines, a single (large) table can affect the performance and validity of other tables. I have some data that is large, often updated, and not very important (it can be regenerated or simply restored); it will affect small, critical tables.

As for the other comments: I don’t expect MyRocks to store data the same way as InnoDB or MyISAM, but maybe each table (or chosen tables) could have its own LSM tree stored in a separate folder. I’m not sure what the overhead of managing e.g. 500 LSM trees per server (around 100k–200k files) would be.
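As far as I know, MyRocks already has a partial answer to this: indexes can be assigned to their own column family via an index COMMENT, which isolates them into a separate set of SST files (though still in the same data directory, not a separate folder). A sketch, with a hypothetical table name:

```sql
-- Hypothetical table; the COMMENT on the primary key places it in its own
-- column family 'cf_hot_orders', separate from the shared 'default' CF.
CREATE TABLE hot_orders (
  id  BIGINT NOT NULL,
  qty INT,
  PRIMARY KEY (id) COMMENT 'cf_hot_orders'
) ENGINE = ROCKSDB;
```

Each column family gets its own memtables, SST files, and compaction, so a big, churny table in its own CF shouldn’t drag compaction of the small critical tables along with it.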

As for asking whether it’s being developed: maybe I understand it incorrectly, but I believed RocksDB is a key-value database maintained by Facebook, and on top of it sits this probably very complicated middleware (the MySQL plugin) that lets MySQL use it as a backend for a transactional, row-based database. Does Facebook also develop the MySQL plugin for Maria, Oracle, or Percona Server? What I’m afraid of is that Toku was obsoleted because there was a lack of manpower to maintain it and it got very complicated (as far as I understand the issue with keeping Toku alive)… now if I use Rocks and the Rocks MySQL plugin is abandoned, there’s no other migration path, as nothing comparable is available…

I believe Rocks was picked over Toku because it was “simpler”. However, I also believe Rocks could become even more complicated than Toku later, as features are added… so I’m currently thinking about a proper migration path, which would be InnoDB + ClickHouse, with the possible addition of Rocks… though I’m not sure I shouldn’t focus on InnoDB only, as it seems to have the best support, and even MariaDB is slowly becoming “InnoSQL”, really :wink:

MyRocks is very reliable, otherwise the companies in the results above would not be using it.

So it’s just a steep learning curve, and all the issues can be mitigated by the proper level of config/tuning (so that, properly configured, it can be on par with Toku, for example)?

E.g. from my experience, Toku has its issues too: something’s broken with statistics, and sometimes the table row count is totally off, so the optimizer creates a suboptimal execution plan in some cases… but that can be fixed by manually issuing a recount. I never lost any data with it (maybe I started using Rocks too early, though I was testing it for a long time)…
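For context, the manual recount I mean is (if I recall the Percona Server TokuDB variables correctly; the table name is just an example):

```sql
-- TokuDB: force a full row recount so the optimizer sees real cardinality
-- instead of the drifted logical row count.
SET SESSION tokudb_analyze_mode = TOKUDB_ANALYZE_RECOUNT_ROWS;
ANALYZE TABLE trading._raw_order_hist;
```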



Facebook contacted Percona to develop the MyRocks engine in partnership. We work very closely with FB on this. FB uses Percona MySQL, and the MyRocks engine we created with them.

Toku was obsoleted due to its extremely low usage. Percona purchased TokuTek many years ago to gain this engine, and many engineers from TT remained. As budgets were planned, an evaluation of Toku’s usage in the community was made, and it was deemed too costly to continue maintaining for such a small base, especially when the colossal FB said “hey, help us create this mysql engine so we can deploy it across X0,000 servers”.

Yes. MyRocks has a very specific use-case. It is NOT a general purpose engine like InnoDB. Our customers that use MyRocks use it for very specific workloads. It is not a “replace all innodb with myrocks” engine. It is “this specific write-intensive application should be evaluated with myrocks to improve database performance”.

All defaults suck :smiley: Take a look at this my.cnf file for MyRocks. It’s the baseline config for one of our larger MyRocks customers. Maybe it’ll help with your testing.

rocksdb_cnf.txt (1.8 KB)
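For readers who can’t open the attachment: a minimal illustrative sketch of the kinds of knobs such a baseline tunes. These are real MyRocks variable names, but the values below are placeholders, not the contents of rocksdb_cnf.txt — size them to your own hardware.

```ini
# Illustrative MyRocks settings only -- NOT the attached rocksdb_cnf.txt.
[mysqld]
default-storage-engine   = rocksdb
rocksdb_block_cache_size = 16G         # main read cache; often ~1/3 of RAM
rocksdb_max_background_jobs = 8        # parallel compaction/flush threads
rocksdb_max_row_locks    = 10485760    # avoid "Failed to acquire lock" on large DML
rocksdb_max_total_wal_size = 4G        # cap WAL growth between flushes
```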