Amazon Dynamo used a concept they termed “sloppy quorums”. They describe it as follows:
If Dynamo used a traditional quorum approach it would be unavailable during server failures and network partitions, and would have reduced durability even under the simplest of failure conditions. To remedy this it does not enforce strict quorum membership and instead it uses a “sloppy quorum”; all read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.
I am no expert on Cassandra, but my reading of the Cassandra api page is that its quorums are non-sloppy. (Please let me know in the comments if that is not the case.) Specifically it says:
All discussion of “nodes” here refers to nodes responsible for holding data for the given key; “surrogate” nodes involved in HintedHandoff do not count towards achieving the requested ConsistencyLevel.
This is quite interesting, as Werner Vogel’s quote at the top suggests that quorum sloppiness is really important! So let’s look a bit deeper.
Note: Cassandra does support a ConsistencyLevel of ANY which I read as allowing writes to what they call “Surrogate nodes”. So Cassandra can, as I read it, do sloppy writes when one desires dynamo-style W=1 factor, by specifying ANY.
Let’s break implementations into one of three classes:
- Sloppy quorums - as outlined in the Dynamo paper.
- Strict quorums - writes do not persist if we cannot get acks back from W quorum members.
- Strict-warn quorums - writes return an error if we cannot get acks back from W quorum members; however, the data might persist despite the acknowledgment.
I make the distinction between Strict and Strict-warn as I am not sure to which Cassandra belongs. (What’s the answer?)
Sloppy Quorums
As this is the Dynamo behavior, this case presumably is well understood.
Strict Quorums
At first glance one would assume availability is poor for Strict quorums. However, perhaps if we have rack-aware and data-center aware assignment of the N nodes, it’s not so bad?
Consider R=3, W=3, N=5 with 5 [responsible] nodes [of a 100 node system] in 5 data centers. In this situation, availability should be quite good as long as 3 of the 5 data centers is up. However, write availability will be lower than a Sloppy system: in Dynamo, if the network partitioned into data centers {A,B,C} as one partition, and {D,E} as another, we could still read and write in the {D,E} partition. Here we cannot. We aren’t getting the availability that Dynamo delivers. Instead, we are more in a strongly consistent camp of solutions. I actually think this is a perfectly reasonable level of availability — it just isn’t Dynamo level availability.
Consider W=N. For example, perhaps we use R=1,W=3,N=3 to achieve R+W>N. It’s pretty clear here that write availability will be poor. If any one of the three nodes goes down, we cannot write. It seems to me that for a Strict system, use of W=N or R=N should be unusual. Is that correct?
It’s interesting to note that even with R+W>N, we do not get strong consistency from a Strict dynamo design — at least not as strong consistency is defined in the CAP theorem (which refers to “read/write data object atomic consistency”). Perhaps we instead we should say a strict quorum Dynamo-inspired solution provides immediate consistency:
after a write is acknowledged by W servers in a R+W>N strict quorum system, all reads will immediately thereafter see the result of that write, or of successor writes
It is possible in a strict quorum architecture to implement strong consistency, of course: it is just not automatic.
Strict-warn Quorums
I do not know if String-warn applies in the real world, but it changes the write analysis so if it does apply we need to go through it.
First, reads behave the same as Strict quorums. If N=5 and R=3, we have the same situation as above: the {D,E} partition cannot do reads.
What about writes? Here, we get an error back if we cannot write to W servers, yet, our write may still be durable. In a way, we effectively have here W*=1, and a warning returned when the number of acks is less than W. Thus the “-warn” moniker.
Consider again W=3, N=5 with 5 nodes in 5 data centers. In the {D,E} partition, we write to two servers and get two acks. That is less than W, so we return an error/warning. However, after the network heals, presumably the write propogates.
In this case, there was no quorum — not even a sloppy one. We could perhaps describe Strict-warn systems as: supports writes without quorums, as long as at least one node is available for writing, but warns the client if a quorum is not achieved.
All three classes above could be useful for certain use cases, but it seems important that one knows in which class a particular product configuration belongs before building anything.
-
dmerr posted this