Distributed SQL vs. NewSQL

dkhenry · on Jan 28, 2020

I am one of the Maintainers of Vitess and I just wanted to pipe in on a few points.

The comparison chart lists vitess as not having "High Performance" due to what they are saying is a single coordinator node ( vtgate ). You can actually have as many of those as you want to scale Vitess far beyond what any other system ( including yugabyte ) offers in terms of performance. We recently did a benchmark in partnership with AWS showing how far you could push AWS Aurora using Vitess ( https://planetscale.com/news/planetscale-aws-benchmark )

Secondly is the claim that there are no distributed transactions or cross shard joins. There are in fact distributed transactions, and cross shard joins. You can in fact do full cross shard ACID transactions. We do recommend that you are aware of the sharding mechanism as we will not trigger a cross shard transaction unless we need to.

Finally they do say there is no Native Failover/Repair which is only technically accurate. We use a third party tool called orchistrator to do the fail overs, and we recommend you run that along side the cluster so yes its not "built in", but it will launch as part of our Helm Chart fully configured to automatically do native fail overs.

espadrine · on Jan 28, 2020

Vitess is great. Although I am wondering whether the Planetscale achievement is partially attributable to the fantastic work put in the closed-source Aurora.

In particular, it is an order of magnitude better than the displayed CockroachDB benchmark[0], with 81 c5d.9xlarge nodes instead of Vitess’ 64 r4.16xlarge.

Can you still beat that with MySQL or MariaDB?

[0]: https://www.cockroachlabs.com/docs/v19.2/performance.html

PeterZaitsev · on Jan 28, 2020

I would imagine this is some form of AWS partnership to showcase Aurora here. I do not think results with MySQL would be substantially different. If anything I would expect better price performance.

dkhenry · on Jan 28, 2020

That is correct, this was done in partnership with AWS to show off Aurora, however we have achieved similar results with stock MySQL. We are pretty confident that with standard MySQL using MyRocks and some high end storage devices we will be able to beat those numbers with fewer resources.

pm90 · on Jan 30, 2020

Don't mean to be crass... but why not do what you just describe and publish it to promote Vitess? Would lay the argument to rest :)

dkhenry · on Jan 31, 2020

The cost to run those benchmarks with AWS was close to $50,000 just in infrastructure cost ( that was the main reason we jumped from 16 shards to 64 shards at the larger instance size, we wanted to show the top end, but we didn't have the resources to do all the sizes in between ), that doesn't even account for the engineering time to put together the solution and run the tests. We would love to have more funding to run those kinds of tests, but Vitess doesn't have a big sponsoring company to bankroll it the way some other projects do.

evanweaver · on Jan 29, 2020

> There are in fact distributed transactions, and cross shard joins. You can in fact do full cross shard ACID transactions.

The Vitess docs explicitly say that 2PC transactions are not isolated and are not ACID.

Did something change?

dkhenry · on Jan 29, 2020

We will normally list it as ACI*D since there is a situation where you can break isolation.

evanweaver · on Jan 29, 2020

What situation? Are cross-shard transactions ever isolated?

dkhenry · on Jan 29, 2020

Yes they are normally isolated, however its only at a READ_COMMITTED level, and a nuance of the implementation is that you may see committed data rolled back by the protocol in the event of a failure. Its still technically READ_COMMITTED, but its an unexpected behavior from stock MySQL so we make sure to qualify our documentation.

evanweaver · on Jan 29, 2020

What you describe is read uncommitted. From your own docs:

> A third party that performs cross-database reads can observe partial commits while a 2PC transaction is in progress.

Regardless of rollback status, the in-flight transaction will not be observed correctly across shards, for example, with a cross-shard join. It doesn't matter what the consistency guarantee within a shard is; this is not ACID nor is it serializable.

dkhenry · on Jan 29, 2020

Its not, its still read committed. Its the same thing as if you had to read from two rows in a transaction, and your isolation was read committed, and both those rows would be modified by a different transaction. Its possible you read the first row, a transaction is committed and then you read the second row with the updates from the transaction. You have just read a partial commit. It is ACID and it is not serializable, the thing is MySQL lets you use higher levels of isolation like repeatable read or serializable which are very useful and today Vitess can't guarantee that in a 2PC.

aphyr · on Jan 30, 2020

Y'all really should hire me at some point. I've been building out transactional analysis tooling for Jepsen that can help distinguish between exactly these cases. :-)

dkhenry · on Jan 30, 2020

Yes! We have it on our todo list to have you test out Vitess.

kd5bjo · on Jan 29, 2020

I may be just tripping over semantics here, but how can you consider data that may still be automatically rolled back to be “committed”? I thought that’s supposed to mean the data has been fully stored such that it can only be changed/removed by a subsequent transaction.

dkhenry · on Jan 29, 2020

The data will be committed, but you may get a subsequent read that has the previous value, before it is overwritten by the final value. This would only occur if the shard where the commit is occurring fails, while the promoted shard replays the transaction before the final commit is propagated to the user. Since the shard as failed is impossible for a single transaction to experience this behavior, but an outside observer would be able to see this behavior. Transactionally everything would still be isolated, but outside the transaction you would see behavior you won't expect.

gigatexal · on Jan 28, 2020

Vitess runs YouTube. Mic drop. Nothing else to say.

gigatexal · on Jan 28, 2020

To be fair. There has been a lot of work to make it easier for mere mortals to run it. So take the complexity of it into consideration when evaluating it for use.

irfansharif · on Jan 28, 2020

Those numbers are surprising to me. Have latency charts to share? And are you using think time? http://www.tpc.org/tpcc/detail.asp

espadrine · on Jan 28, 2020

Hmm, it does seem like they are not using think time.

From the official TPC-C specification[0]:

> The maximum throughput is achieved with infinitely fast transactions resulting in a null response time and minimum required wait times. The intent of this clause is to prevent reporting a throughput that exceeds this maximum, which is computed to be 12.86 tpmC per warehouse.

So the maximum result they should be able to get at 100K warehouses is 1,286,000 tpmC.

To reach higher, they should do what Alibaba did[1] and use 4,794,240 warehouses. They got officially accepted with a result which dwarfs even Planetary’s incorrect benchmark.

[0]: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_...

[1]: https://www.alibabacloud.com/blog/oceanbase-did-better-than-...

jhugg · on Jan 29, 2020

Re VoltDB:

>In VoltDB, all replicas for a given shard are updated synchronously by the client application. This is where VoltDB pays a significant performance penalty for write operations when compared with Raft/Paxos-powered distributed SQL databases. Distributed consensus protocols like Raft and Paxos require writes to be sent to all replicas but commit as soon as a majority of replicas have acknowledged the request. Waiting for all replicas to respond is not necessary since the consensus can be established with a majority. Additionally, VoltDB does not detect network partitions by default and requires a special network-fault-protection mode to be configured. When a node of the cluster gets partitioned, the network-fault-protection mode comes into play. It negatively impacts cluster performance by increasing cluster recovery time for not only accepting writes on the shards whose replica was lost in the node partition but also for repopulating the data on the partitioned node when it joins back the cluster.

Wow. Basically none of that is true.

colanderman · on Jan 29, 2020

Ya. The stuff on NuoDB is basically all untrue or misleading as well (source: I work there). I wouldn't put too much stock in this self-promotion article.

bithavoc · on Jan 28, 2020

I tried Yugabyte a few weeks back, the TServer kept crashing while my backend was creating some very simple tables. I'm stuck with CockroachDB for the time being.

vvern · on Jan 28, 2020

> I'm stuck with CockroachDB for the time being.

Would you care to share any negative experiences you've had with CockroachDB? What is motivating your desire to look elsewhere?

bithavoc · on Jan 28, 2020

A few weeks back after my attempt to switch to Yugabyte I decided to commit to CockroachDB in a dedicated cluster with TLS, my backend runs migrations including creating the database but only admins and root are able to create databases but because TLS was enabled my root user didn't work anymore. I was not able to create more admin users because that's part of the RBAC Enterprise offering. The database was there but the authorization system triggers before IF NOT EXISTS condition, so I had to make changes to make it work in production. I honestly thought that if I stayed away from Enterprise features then I would be fine, but no: it seems like if I wanted to deploy my backend on-premise, the deployment process would have to include an step to run the CREATE DATABASE script out of band either before enabling TLS or via localhost(?) in a database node, otherwise I'm fucked unless the client buys CRDB Enterprise to smooth out the deployment process. That's a very extreme case of Open-Core money-squeezing shit I would like to stay away from.

CockroachDB Core is fine as a product, the docs are up to date and it's not hard to operate, all that is appreciated it. I just don't think I'm the target user, unless you make a shit ton of money as you scale it will be hard to operate without:

- Distributed Backups(Enterprise)

- Follower Reads(Enterprise) (no, follow-the-workload doesn't cut it)

- RBAC(Enterprise)

Yugabyte offers all that in their open-source license, which is why I'm rooting for them, that in addition to Change Data capture(Enterprise offering in CockroachDB) is also included.

knz42 · on Jan 28, 2020

Hello bithavoc, I happen to be currently working on that area of CockroachDB and I would like to help.

To start with, any scenario that you find where a secure deployment without an Enterprise license makes you unable to operate your apps using the core features, would be a serious bug on our side and we'd give it high priority. There is no intent to do bait-and-switch here.

If I read between your lines I can see the following:

- you need an admin user to create databases;

- the only admin user is root;

- it appears that you cannot log in with root on a secure cluster (TLS enabled). Is that understanding correct?

I think I understand the following: your SQL client is attempting to connect using TLS but without presenting the root client cert to the server. The root client cert is a cert that you can generate using the cluster's CA key (using the `cockroach cert` command). Without a valid client cert, the connection is refused.

You may think this seems to work for other users than root. This is because for other users, when cert validation fails, the server allows the client to use a password instead. This choice to use a password is not available for the root user up to and including 19.2.

Here is your way forward:

- either you configure your SQL client that creates database to present the root client cert to the server. This will enable you to use the root account on a secure server and create databases (and other users, and then assign privileges to these users over the new database).

- or, you wait until crdb 20.1, where it will be possible to configure a password for the root user and let root clients use password authentication instead of presenting client TLS certs.

Does this help?

If it doesn't please create an issue on github and describe your situation in more details and cc me (@knz). Thank you

bithavoc · on Jan 28, 2020

Correct, I’m using TLS with password authentication, I don’t want to generate certs for my backend user, so yes, root password authorization solves the issue. I can wait for 20.1, should be coming up this month right?

irfansharif · on Jan 28, 2020

v20.1 is slated for release in April.

contrahax · on Jan 28, 2020

Lack of geospatial/postgis compatibility is a sore spot for many use cases.

grizzles · on Jan 28, 2020

Have you tried comdb2? I've heard good things. Idk why it isn't more popular.

ddorian43 · on Jan 28, 2020

It doesn't have sharding.

grizzles · on Jan 28, 2020

Can't you just do that manually?

ddorian43 · on Jan 28, 2020

I don't want to build a db. The whole point is not to do sharding manually.

akhilcacharya · on Jan 28, 2020

I didn't know it was used outside of Bloomberg...

themoop · on Jan 28, 2020

Yugabyte has great articles in general and seems almost to be good to be true what I have read.

It is just missing a bit of independent third party reviews but it does look interesting.

Anyone with hands on experience?

dilloc · on Jan 28, 2020

Yugabyte v. Cockroach analysis: https://www.cockroachlabs.com/blog/unpacking-competitive-ben...

disclaimer: I work at cockroach

pm90 · on Jan 28, 2020

Nice. Seems a bit disingenuous on yugabyte to claim they’re better performant in light of this.

Just want to say that I love your docs. Haven’t used cockroachDB just yet but the docs are clean, easy to follow, devoid of marketing and focused on facts.

bithavoc · on Jan 28, 2020

Second that, the CockroachDB documentation is excellent. All the DDL and DML statements are clearly documented and with examples, that's really cool.

themoop · on Jan 28, 2020

Thanks, that is exactly the kind of third party analysis I was looking for to add some color to all of Yugabyte's claims

Nican · on Jan 28, 2020

Oh good. I had not seen this blog post from CRDB, and I had asked this exact question 4 months ago: https://news.ycombinator.com/item?id=21007562

fizwhiz · on Jan 28, 2020

https://jepsen.io/analyses/yugabyte-db-1.3.1

rdtsc · on Jan 28, 2020

From your link:

> 2019-09-05: YugaByte’s blog post states YugaByte DB “passes Jepsen tests”. We feel obligated to state that YugaByte DB’s Jepsen test suite does not pass, though it may in the future. Race conditions in YugaByte DB’s schema system can cause correctness errors. For example, inserting rows into a freshly-created table with DEFAULT values may result in the values for those columns initialized to NULL instead. We can also now confirm that this issue affects all default values, not just DEFAULT NOW(). It also appears that DDL race conditions might, under certain conditions, render tables completely unusable.

Yikes. I understand the need for marketing and making their product look good but it's going a bit too far saying they pass the tests when they didn't. I can easily see potential customers pausing when encountering that kind of attitude from YugabyteDB's developers. "What else are they dishonest about..." kind of idea.

willvarfar · on Jan 28, 2020

I don’t know the case at yugabyte but, in general, the outright lies by sales has nothing to do with the integrity of the developers.

gauravahuja · on Jan 28, 2020

Developers are responsible for fair benchmarks.

willvarfar · on Jan 28, 2020

I know this is basically an ad for yugabyte, but it’s a nice summary nonetheless.

vearwhershuh · on Jan 28, 2020

Does anyone have good experiences with seamless sharding at the database layer? I'm old school and I'd be strongly inclined to look for a simple cut that allowed me to shard at the app layer, but I'd love to hear an alternative view.

derekperkins · on Jan 28, 2020

Vitess is awesome if you're using MySQL.