Sysbench Benchmark for MongoDB – Performance Update

plasma · on June 5, 2013

Received an email from TokoDB about this new tech, available to play now.

Perf Blog: http://www.tokutek.com/2013/03/sysbench-benchmark-for-mongod...

Download Source: https://github.com/Tokutek/mongo

TokuTek say to contact them for using it.

TokuTek made TokuDB, a storage engine for MySQL that uses Fractal Indexes (as opposed to B-Tree), read more at http://www.tokutek.com/products/tokudb-for-mysql/

danielbarla · on June 5, 2013

I'm curious about the fractal tree indexing referred to, but so far all the information I'm running across seems to be fairly one-sided. Performance is rarely so one-dimensional that I can accept statements like "When used in a database, they can be used in any setting where a B-tree is used, with improved performance" (from http://en.wikipedia.org/wiki/TokuDB) on face value. The algorithm claims to have strong insert performance, as well as higher compressability when compared to B-trees, and all the performance results play heavily on these.

What about read-heavy workloads, etc? Can someone with a bit more insight provide an objective overview?

leif · on June 5, 2013

I am an engineer at Tokutek. Just talking about TokuMX, I have only seen MongoDB beat us on single-threaded, read-only workloads on extremely small data sets (like under 8K, or one MongoDB B-tree bucket).

The statement is really pretty much true. The best B-tree indexes (InnoDB) beat Fractal Tree indexes a little bit on in-memory reads, but we're not done tuning our implementation. On out-of-memory reads though, we usually beat B-trees because we compress so much better that we simply need to read less off disk.

Benchmarking is a tricky business though. Workloads can be varied, and you can probably find some corner cases where InnoDB beats TokuDB. I think we have some outstanding issues where the MySQL optimizer doesn't plan our queries properly, and we're still working on that. I'm a little out of touch with the MySQL side these days though so that could be wrong. But algorithmically, there are no cases where a B-tree index has a significant advantage over a Fractal Tree index.

Generally though, the read performance advantage we see is that if your indexes are Fractal Tree indexes, you can afford to maintain a richer set of indexes than you could have on InnoDB or MongoDB, and these extra indexes make your queries orders of magnitude faster. I think this is the most important (non-obvious) point to understand. I gave a talk about it here: http://www.youtube.com/watch?v=q6BnG74FZMQ

StefanKarpinski · on June 5, 2013

This data structure sounds as mythical as a unicorn. Are there any peer-reviewed publications about this data structure or is it a trade secret? The name seems pretty vacuous/hype-laden since any well-balanced tree is fractal.

leif · on June 5, 2013

The name is a marketing term, you're right. The structure is somewhere in the realm of buffer trees, LSM-trees, $B^\epsilon$ trees, and COLAs. It's hard to give it an academic name precisely because the implementation takes hints from many places, but we hope to publish more soon. B-trees are on the optimal tradeoff curve but there are many other points on that curve, and B-trees heavily favor reads so there's plenty of room to win on the write side. I have a couple of blogs about this actually: http://www.tokutek.com/2011/09/write-optimization-myths-comp... http://www.tokutek.com/2011/10/write-optimization-myths-comp...

esmet · on June 5, 2013

"Fractal Tree" is indeed a marketing trademark. But the algorithms used are not necessarily secret. The implementation is similar to the cache oblivious streaming btree: http://supertech.csail.mit.edu/papers/sbtree.pdf‎

Of course what's not described in the paper include: transactions, mvcc, logging/recovery, parallelism, etc. Those are really interesting topics as well.

danielbarla · on June 6, 2013

Thanks for the information, with your link and the ones others have posted, I've been able to get a better idea of how it works.

Regarding benchmarking, my recommendation would be to present other workloads as well; if what you say is true, you have nothing to fear, and rough parity with B-trees in your worst case scenarios would only make your argument stronger (it would definitely impress me more if it was included).

My point is that developers who are worried about index algorithm performance don't normally believe in free lunches, and your current benchmarks seem to be slightly cherry picked. Some of the wording is also a bit of a hard sell (and it sounds like your algorithm is good enough to stand on its own legs, so it's unnecessary). This will put off a lot of the more skeptical developers immediately.

leif · on June 7, 2013

Glad to help. I hear you, we'd love to put out more benchmarks, but honestly we just don't have enough peopletime to do as much benchmarking as the software deserves, and if you can only put out k benchmarks/month, you'd better be putting out your best k benchmarks.
Tim, our VP of Engineering, does most of our official benchmarking, in between project management, support, and testing (and a million other things). Just today (after we posted the TokuMX vs. MongoDB iibench blog), he said something about how he can't do many more of them, it just takes too long for MongoDB to finish the whole benchmark. That's time our servers can't spend testing the new software we're trying to ship.

My hope is that now that the code is open, more users will start pounding on it and publishing benchmarks. Justin Swanhart recently posted some great benchmark results where he showed us in a less-than-flattering light: http://shardquery.com/2013/05/25/tokudb-vs-percona-xtradb-us.... I haven't studied the workload he ran, but it looked like he may be pointing out some bad query planning behavior and slow cache warmup properties that we should probably address at some point. And that's good for the product and good for the ecosystem, so I am thrilled* that he posted it. I just hope we get 50 more just like it.

Anyway, I'm glad you've got a better understanding. Hope you can find a use case for it!

mjs · on June 5, 2013

The best explanation I've found of how it works (slides, PDF):

http://cdn.oreillystatic.com/en/assets/1/event/36/How%20Toku...

nemothekid · on June 5, 2013

What version of MongoDB was TokuDB forked from? Could I use this as a drop in replacement for a 2.4 deployment, or mix a replica set with 2.4 and Toku?

zardosht · on June 5, 2013

TokuMX was forked from 2.2.4, but we have and will continue to add features from 2.4. For instance, hash based sharding is in TokuMX.

One cannot mix a replica set with TokuMX and MongoDB.

nasalgoat · on June 5, 2013

Could I start a TokuMX replica to get the data across then shut off the master and switch to using it?

We have a very large install and migrating the data on production systems would be very difficult otherwise.

leif · on June 5, 2013

Unfortunately not, the replication format is completely different. A tool to do this would not be difficult to write, though.

malkia · on June 5, 2013

I'm working with PostgreSQL right now, and it supports various indices (I'm sticking with btree for everything I have for now). Just made me wonder, whether the folks behind TokuDB cannot pursue PGSQL variant of their solution?

never mind - found some good answers here - http://postgresql.1045698.n5.nabble.com/Fractal-tree-indexin...

programminggeek · on June 5, 2013

So, TokuMX is web scale?

In all seriousness, this looks super cool, but I'm easily fooled by benchmarks and I don't have any projects right now that really push MongoDB that hard, so it doesn't impact me other than giving us a potentially faster Mongo, which is cool.

zardosht · on June 5, 2013

I work for Tokutek. In addition to performance improvements and compression, another cool thing about TokuMX: it's fully transactional. We hope this makes development of applications simpler.

programminggeek · on June 5, 2013

That's awesome! I hope you didn't mind the joke too much. I really do use/love Mongo and I appreciate anyone who contributes to it.

kapilvt · on June 5, 2013

could you clarify this statement. mongodb native is consistent and atomic wrt to single documents. Are you saying that tokumx does something additional wrt to transactions (which also implies client api changes)?

leif · on June 5, 2013

TokuMX offers multi-document transactional semantics without application changes (snapshot reads), as well as protocol support for multi-statement (read-modify-write style) transactions, within a single shard. We are still designing how we want to present transactions in a sharded cluster.

toolslive · on June 5, 2013

is to be expected: you have a data structure where half of the inserts can be done in O(1).

nasalgoat · on June 5, 2013

I've signed up for this beta release - even without ACID transactions on a cluster-wide basis, the other features alone are worth it, especially the compression.

philsnow · on June 5, 2013

From what I understand, what mongodb needs is not more speed, but better acid guarantees by default.

zardosht · on June 5, 2013

From Leif earlier, "TokuMX offers multi-document transactional semantics without application changes (snapshot reads), as well as protocol support for multi-statement (read-modify-write style) transactions, within a single shard. We are still designing how we want to present transactions in a sharded cluster."

philsnow · on June 5, 2013

All built on the rock-solid webscale foundation of Mongo?