Possible Directions for RethinkDB

The company RethinkDB went out of business in late 2016. Since then, ownership of the rights has been transferred to the Linux Foundation. From the end user's perspective, the biggest bit of activity was the release of RethinkDB 2.3.6, its first open source release, thanks to a bunch of work put in by Etienne Laurin.

That was an incremental release with bug fixes and compilation fixes on new platforms. At some point, the branch v2.4.x was made (off of RethinkDB's master branch, called next) in preparation for releasing RethinkDB 2.4. The purpose of this release was to put out existing work implemented by the company before it went out of business, the most exciting of which are write hooks and integration of the Windows build branch. A few new features and fixes made it into the next branch, including hard durability latency improvements and soft durability IOPS reduction (which, unfortunately, under some workloads comes with or exposes a slow-rolling memory leak). Other fixes also made it into v2.4.x: various bits of build system cleanup, fixes for r.iso8601, a millisecond rounding bug, an LRU cache bug, and more. These fixes are just sitting around waiting to get released.

Areas For Improvement

Right now, RethinkDB has the following areas for improvement:

  1. Clustering Reliability

    Many RethinkDB users have experienced mysterious problems related to clustering logic. Reported behaviors include endless backfilling, nodes being unable to connect to the entire cluster, repeated crashing of certain nodes in the cluster, and certain nodes having different opinions about where tables are.

    This probably depends on how complicated their clustering usage is – make lots of proxy nodes, reconfigure tables, etc., and you're more likely to hit an edge case.

  2. Scalability?

    Possibly there are problems here -- the scaling benchmarks on RethinkDB's website are limited to two-digit numbers of machines.

    Some people say they can't have data replication because backfilling is too slow.

  3. Storage Performance

    The storage engine uses a lot of disk space, and it can also suffer a lot of write amplification. Stored data tends to get scattered around disk, and read performance (e.g. iterating a large table, or working with large documents) can be abysmal. Write workloads perform an excessive number of IOPS (mitigated in the 2.3.6-srh-extra release, but one user then encountered a series of slow-running memory leaks).

  4. Storage Reliability

    Some users have seen the storage files get corrupted such that the server node crashes when restarting from an existing database file.

  5. Query Evaluation Performance

    The query language implementation uses C++ exceptions for propagating errors, which causes the r.default command implementation to be very slow. Also, there is, generally speaking, interpreter overhead, of the sort that you get from a highly dynamic interpreter.

  6. Query Optimization / Language Enhancements for Performance

    It would help some users if queries were not evaluated so naively. For example, some users write queries that involve subqueries instead of using a join -- the naive implementation evaluates the subquery from scratch each time. A possible performance enhancement would be to evaluate the query as a join, or to cache the table used in a subquery. Another possibility would be a ReQL command that creates a cached table object, such as r.table('foo').do(function(foo) { /* use foo in subqueries */ }). I don't mean this as a specific query language proposal, but an example of the general possibility of expanding control over query evaluation.

  7. Driver API Enhancements for Performance

    For some users (or some benchmarks), the driver spends a lot of CPU time simply constructing query objects. It would be useful if prepared statements could be constructed client-side, with parameters serialized out-of-band, to save CPU time.

  8. Meta-Problem: Complicated Implementation

    The biggest problem RethinkDB faces overall is that the implementation is complicated and bespoke. It would be nice if you could take a random C++ developer, throw them at the codebase, and tell them to figure it out. The problem is, RethinkDB has its own Raft implementation, its own storage engine, and its own green threads implementation with its own concurrency utilities. The query language implementation also has its own details that you need to be aware of. The result is, if there's a bug in the storage engine, or Raft implementation, or even the query language, it takes some costly immersion time to load that into your head.
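
To make item 6 concrete, here is a minimal sketch in plain JavaScript (no RethinkDB driver or server involved) of the difference between re-evaluating a subquery for every row and building a lookup once, join-style. The table contents and the names naiveSubquery and hashJoin are made up for illustration; they are not part of ReQL, and this is not how the server's evaluator is actually structured.

```javascript
// Hypothetical in-memory "tables"; real ReQL tables live on the server.
const users = [
  { id: 1, name: "ann" },
  { id: 2, name: "bob" },
];
const posts = [
  { id: 10, userId: 1, title: "a" },
  { id: 11, userId: 2, title: "b" },
  { id: 12, userId: 1, title: "c" },
];

// Naive evaluation: the subquery filters the whole users table again
// for every single post, like an unindexed filter inside a map.
function naiveSubquery(posts, users, stats) {
  return posts.map((post) => {
    const matches = users.filter((u) => {
      stats.rowsVisited++;
      return u.id === post.userId;
    });
    return { title: post.title, author: matches.length ? matches[0].name : null };
  });
}

// Join-style evaluation: read the users table once, cache it by id,
// then answer each post with a constant-time lookup.
function hashJoin(posts, users, stats) {
  const usersById = new Map();
  for (const u of users) {
    stats.rowsVisited++;
    usersById.set(u.id, u);
  }
  return posts.map((post) => {
    const author = usersById.get(post.userId);
    return { title: post.title, author: author ? author.name : null };
  });
}

const naiveStats = { rowsVisited: 0 };
const joinStats = { rowsVisited: 0 };
naiveSubquery(posts, users, naiveStats);
hashJoin(posts, users, joinStats);
console.log(naiveStats.rowsVisited, joinStats.rowsVisited); // 6 2
```

Both functions return the same rows; only the number of users rows touched differs (posts × users versus users). Whether the cache is built implicitly by an optimizer or explicitly by a user-facing command is exactly the design question raised in item 6.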

Possible Future Directions

Let's suppose RethinkDB has a bit of development put into it. There are different choices about what could be done to RethinkDB in the near future. They are not incompatible with one another.

Release 2.4 and iterate

One choice is to release RethinkDB 2.4. That takes approximately no development work and some release work, and it needs somebody with access to the package repositories. What this accomplishes is releasing write hooks, a useful tool for people. It offers the simplest upgrade path: shut down your cluster, replace the RethinkDB binaries with the new version, and start your cluster back up.

The query evaluator could then be optimized or the language could be enhanced, as listed above. A few commands that are useful with write hooks, like a command for specifying schemas and validating documents against them, would be appropriate to add. Efforts can be made to attack slow query evaluation in the storage layer too, including the strategy for evaluating range scans.
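
As a rough idea of what schema validation in a write hook could look like, here is a plain-JavaScript stand-in. It assumes a hook is called with (context, oldVal, newVal) and returns the document to store, which is how I understand the 2.4 write hook feature to be shaped; a real hook would be a ReQL function evaluated on the server, not client-side JavaScript. The schema format, validate, makeValidatingHook, and the context.primaryKey field name are all hypothetical.

```javascript
// Hypothetical schema format: field name -> expected `typeof` result.
const userSchema = { id: "number", name: "string", email: "string" };

// Returns a list of human-readable problems; an empty list means valid.
function validate(schema, doc) {
  const problems = [];
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in doc)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== type) {
      problems.push(`${field} should be ${type}`);
    }
  }
  return problems;
}

// Builds a hook-shaped function that rejects documents failing the schema.
function makeValidatingHook(schema) {
  return function hook(context, oldVal, newVal) {
    if (newVal === null) return null; // deletions pass through untouched
    const problems = validate(schema, newVal);
    if (problems.length > 0) {
      throw new Error(
        `rejected write to ${context.primaryKey}: ${problems.join("; ")}`
      );
    }
    return newVal;
  };
}

const hook = makeValidatingHook(userSchema);
const stored = hook({ primaryKey: 1 }, null, { id: 1, name: "ann", email: "a@example.com" });
console.log(stored.name); // ann
```

A built-in schema command would presumably do the same check inside the server, so the document never has to round-trip through a client to be rejected.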

The storage engine could also be replaced with one built on top of RocksDB. For users whose data does not fit in RAM, or who perform a lot of writes, this would make for better performance. For everybody, this would avoid some bugs RethinkDB's storage engine has. Some of the work for encoding tables into a key/value store could be reused with a distributed key/value store.
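
To illustrate what "encoding tables into a key/value store" means, here is a small sketch that maps (table, primary key) pairs onto flat, ordered keys, so that a whole-table scan becomes one range read. TinyKV is a toy in-memory stand-in, not the RocksDB or FoundationDB API, and the encoding is an assumption made for brevity: it only handles string primary keys and table names that contain no NUL character.

```javascript
const SEP = "\u0000"; // separator between table name and primary key

// Flat key for one document: all keys of a table share the same prefix.
function encodeKey(table, primaryKey) {
  return table + SEP + primaryKey;
}

// Toy ordered key/value store (hypothetical; real stores keep keys sorted on disk).
class TinyKV {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // Returns [key, value] pairs with begin <= key < end, in key order.
  getRange(begin, end) {
    return [...this.data.entries()]
      .filter(([k]) => k >= begin && k < end)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  }
}

function insertDoc(kv, table, doc) {
  kv.set(encodeKey(table, doc.id), JSON.stringify(doc));
}

function getDoc(kv, table, id) {
  const raw = kv.get(encodeKey(table, id));
  return raw === undefined ? null : JSON.parse(raw);
}

// A table scan is a range read over the table's key prefix.
function scanTable(kv, table) {
  const begin = table + SEP;
  const end = table + "\u0001"; // first possible key past the prefix
  return kv.getRange(begin, end).map(([, value]) => JSON.parse(value));
}

const kv = new TinyKV();
insertDoc(kv, "users", { id: "b", name: "bob" });
insertDoc(kv, "users", { id: "a", name: "ann" });
insertDoc(kv, "users2", { id: "z", name: "zed" });
console.log(scanTable(kv, "users").map((d) => d.id)); // [ 'a', 'b' ]
```

The encoding itself does not care which store sits underneath, which is why this part of the work could carry over from a single-node store to a distributed one. A real encoding would also need an order-preserving representation for numbers, arrays, and secondary indexes.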

Rebuild On Top of a Distributed Key/Value Store and iterate

Another possible direction is to throw out the entire clustering layer, the entire storage engine, and point the query language at a distributed key/value store such as FoundationDB. FoundationDB is pretty compatible – it even lets you watch a key for changes.

One big benefit from this is it gets rid of the meta-problem of a complicated implementation. All the clustering and disk storage concerns are pushed off to another project. Then the surface area of RethinkDB development would consist of client drivers, the server, and query language evaluation.

It has some other consequences:

  1. Change feeds would be dropped in the first version. When implemented, they would impose more overhead on reads and writes, and the r.changes(...) part of the API might not be drop-in compatible. On the plus side, a plausible implementation would make them resumable.
  2. Users are subject to FoundationDB's write behavior, which means (for FoundationDB) there is no longer a choice between “soft” and “hard” durability. Also, certain kinds of query implementation, such as atomic updates to a row, could have higher latency and tighter limitations on how many writes per second can be performed on a given row. But some of RethinkDB's performance problems, involving scanning through tables, and backfilling, would be mitigated. I'd say performance is a net win, unless you're trying to pound on an individual key.
  3. Transactions are a possible end-user feature. The underlying key/value storage engine's transaction mechanism can be made available to the user.
  4. Maximum primary key size can be increased to a more comfortable limit.
  5. There is enough flexibility for implementing new storage features, like columnar storage.
  6. The new version would not be a drop-in replacement for previous versions -- you have to migrate your clusters.
  7. Administration is different: it's all about running a FoundationDB cluster. This might have an impact on services like Compose, which might not want to upgrade RethinkDB if it takes substantial development effort.
  8. Overall, the administration effort is lower. There is no fine-grained reconfiguration of tables, sharding, and replication settings. That means there are fewer choices, if you do want options relating to table configuration.
  9. Some users would be negatively impacted by factors nobody has thought of.
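
For consequence 1, one way change feeds could end up both resumable and more expensive is if every write also appends an entry to a versioned change log. The sketch below is only that guess, in plain JavaScript; FeedStore, write, and changesSince are hypothetical names, and nothing here reflects an actual design or the FoundationDB API. The old_val and new_val fields mirror the shape of existing change feed results.

```javascript
// Toy store where each write also appends to an ordered change log.
class FeedStore {
  constructor() {
    this.rows = new Map();
    this.log = [];
    this.version = 0;
  }

  // Writes a row (or deletes it when newVal is null) and logs the change.
  // The extra log append is the per-write overhead mentioned above.
  write(key, newVal) {
    const oldVal = this.rows.has(key) ? this.rows.get(key) : null;
    if (newVal === null) {
      this.rows.delete(key);
    } else {
      this.rows.set(key, newVal);
    }
    this.version += 1;
    this.log.push({ version: this.version, key, old_val: oldVal, new_val: newVal });
    return this.version;
  }

  // Returns every change after the given version, oldest first.
  // A client that remembers its last seen version can reconnect and resume.
  changesSince(version) {
    return this.log.filter((entry) => entry.version > version);
  }
}

const store = new FeedStore();
store.write("a", { n: 1 });
const lastSeen = store.write("a", { n: 2 }); // client disconnects here
store.write("b", { n: 3 });
console.log(store.changesSince(lastSeen).map((c) => c.key)); // [ 'b' ]
```

A real implementation would also have to trim the log and decide how long a disconnected client may stay away, which is one reason the API might not be drop-in compatible.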

I think this would address a lot of pain points RethinkDB has. On the other hand, it's a big, dramatic jump. Many users' main problem is performance, which would be addressed by RocksDB or query language implementation improvements. There would be some benefit in releasing 2.4 as-is, first, and then having a FoundationDB follow-up release afterwards.

----

If you are using RethinkDB and have opinions about this, please let me know what you think by email at sam@samuelhughes.com, or post it online and email a link to me.

- Sam

(posted December 19 '18)