Consistency Levels in Cosmos DB

2021-12-01 | Azure · CosmosDB · Database ·

TLDR; Cosmos DB offers five different levels of consistency guarantees so that the best performance of an application can be achieved without sacrificing data integrity.

What is Consistency?

Within distributed systems, consistency assumes there is a single correct way to describe the order in which the events happen. Consistency can also be seen as recency guarantee; there appears to be a single copy of the data. As soon as one client successfully completes a write, all observers reading from the database must be able to see the value just written.

On one hand, strong consistency guarantees every user will see the latest updated data, but may need to wait for all instances to be up to date. On the other hand, an eventual consistency guarantee means users will get a fast response but may see inconsistent data.

With globally distributed applications, the characteristics of the chosen consistency level is emphasised. A greater distance for data to travel and a greater number of replicas to update increases the likelihood of performance issues under a strongly consistent model and inconsistent data under an eventually consistent model.

Different consistency models offer different trade off between consistency, write latencies, throughput and availability. Generally, stronger consistency results in higher write latencies, higher throughput, and reduced availability. The purpose of different consistency levels is to give the client the best performance whilst achieving the minimum accepted level of consistency to an application.

So what are the consistency level Cosmos offers? What are their guarantees, and when should I use each?

As a nod to the white paper1 that made consistency levels click for me, I'll use the example of a global sporting event watched by billions of people, the 2018 FIFA World Cup final.

The scores and time of updates are as follows:

Time France Croatia Database Operation
00 0 0 read("France"); read("Croatia")
18 1 0 write(team: "France", score: 1)
29 1 1 write(team: "Croatia", score: 1)
38 2 1 write(team: "France", score: 2)
52 3 1 write(team: "France", score: 3)
59 4 1 write(team: "France", score: 4)
69 4 2 write(team: "Croatia", score: 2)
90 4 2 read("France"); read("Croatia")

In this scenario we have a primary database (located in West Europe region) providing writes to all replicas, and three follower replicas within the data centre. We also geo-replicate our data across each continent, each having four replicas in each data centre.

We will consider who would most benefit from what consistency level. Cosmos DB allows clients to override the default consistency level for a specific read request, meaning we can relax read consistency per request for increased performance.

Guarantees

Cosmos DB offers five levels of consistency guarantees. In order of consistency, they are:

  • Strong
    • You will always see the latest version of the data.
    • At any point in the game, you will see the latest score.
  • Bounded Staleness
    • Reads lag behind writes by at most K updates or time interval T.
    • The score you see might be old by timespan T.
  • Session
    • Within a "session", reads are functionally strongly consistent. Outside the session, consistent prefix.
    • Reads in the "EU west session" will receive the latest score. Everyone outside may see an old score.
  • Consistent Prefix
    • For all regions, writes are ordered.
    • At any point in the game, you will never see a score that has never occurred.
  • Eventual
    • No guarantees, reads may be stale and inconsistent. Eventually, the data will be consistent until a new write.
    • Any combination of scores may be seen, including scores that never occur.

Strong consistency

Also known as linearizability, strong consistency guarantees all your users see the latest writes; every user at the 70 minute mark will see the score {France: 4, Croatia: 2}. As a consequence, there is increased latency.

For a write to be committed to the system and acknowledged as successful, all replicas have to agree on the most recent value. This means high latency and reduced availability. If a region becomes unavailable, since the application cannot accept reads that are stale, the application will not be able to respond until the region has recovered and updated to the most recent write value.

In practice anyone wanting updates from the World Cup final would be well served by a strong consistency guarantee. However, most people would be OK in seeing a score that is outdated by a few minutes. When would this not be acceptable? As scores are calculated by incrementing the previous score, it would be a disaster if the scorekeeper read the wrong score. In this case, the score keeper could not tolerate stale reads so strong consistency would be appropriate.

Bounded Staleness consistency

This is the second strongest consistency guarantee Cosmos DB offers. As the name suggests, there is a window that you can tolerate stale data. Reads can be behind at most K updates to an item or behind by at most T time interval, whichever is reached first. For data that falls out of the specified window, the guarantees are identical to strong consistency.

What is interesting about bounded staleness is that if you know the maximum allowable lag for your system then you can get the same guarantee of strong consistency without the penalty in latency.

A football pundit only cares about the score during half time and at the end of the game. The pundit could use the length of the game to guarantee they read the most recent score by setting the window to 45 minutes. A read in the 46th minute would mean the most recent score is read and again in the 91st minute for post-game discussion.

Session consistency

This is the default configured consistency level. Within a single client session, operations are guaranteed to follow:

  • Monotonic Reads:
    • While the client can read arbitrarily stale data, it is guaranteed to read data that is increasingly up to date over time.
    • (E.g., one it has read the score of {France: 4, Croatia: 2}, it won't then later see {France: 0, Croatia: 0} in the same session.)
  • Monotonic writes:
    • Ensures that if the client performs writes w and then v, all processes within the session observe w before v.
  • Read your writes:
    • All writes performed by the client are visible to the session's subsequent reads. In other words, strong consistency holds for the writer and clients sharing the same session id.
  • Write follows reads:
    • Ensures that write updates are propagated after performing the previous read operation.
  • Consistent prefix:
    • Covered in the next section

A session is defined through a unique session key that all writers include in their requests. Cosmos DB uses the session key to ensure read-my-writes (for the writer) and one of the above guarantees for readers using the session token. Within the same session, reads are consistent. Everyone outside of the session will fall back to the consistent prefix model.

Session consistency provides good performance: write latencies, availability, and read throughput comparable to that of eventual consistency.

Revisiting the score keeper we notice that they can achieve consistent reads without sacrificing performance by using session consistency. As the score keeper will be reading their own writes, session consistency guarantees they will see their most recent writes with the added benefit of not having to wait for all replicas to be up to date.

Consistent Prefix consistency

Also referred to as snapshot isolation; the reader sees a "snapshot of the data". Consistent prefix guarantees whilst they may not read the most current score, the score would reflect a previous score. And unlike bounded staleness, there is no guarantee on how long the delay is.

With consistent prefix consistency, reads are some prefix of the updates. Possible scores for a consistent prefix read could be one of the following:

  • {France: 1, Croatia: 0}
  • {France: 2, Croatia: 1}
  • {France: 4, Croatia: 1}
  • {France: 4, Croatia: 2}

Consistent prefix is great for those who wish to have write latencies, availability, and read throughput similar to eventual consistency whilst guaranteeing ordered writes. This means you'll never see a state which could have emerged if the order of writes was changed. As we will see below, a read value of {France: 0, Croatia: 1} could never happen under consistent prefix as France's first goal (write to the database) was before Croatia's.

Eventual consistency

Eventual consistency does not guarantee ordering for reads. Writes are also propagated in an arbitrary fashion. This means that you can see data that never occurs. For our football example, under eventual consistency a reader might see the score {France: 0, Croatia: 1}. In reality, this event never happened as France scored before Croatia. If no more writes occur, the replicas will eventually catch up and become consistent with the leader.

A casual fan might opt for eventual consistency and read the final score hours or a day later. As there are no writes after the 90th minute, it is highly likely every replica would be updated with the final score when the fan decided to check the score. There is no need to choose a more consistent level than this.

Conclusion

Giving strong consistency gives up performance and availability. The question is then what is the minimum level of consistency that is tolerable to the program that gives you the best performance? Choosing the best consistency for your application requires application semantics and usage scenarios to be understood.

In reality, Cosmos DBs SLAs guarantee that read latency for all consistency levels is always guaranteed to be less than 10 milliseconds at the 99th percentile. In addition, Cosmos provides a Probabilistic Bounded Staleness (PBS) metric to quantify when eventual consistency is "good enough".