Consistency, the untold story

You know, some things are easier said than done! Consistency in distributed systems is one such unicorn everyone is after. Most claim to have seen it, but in reality their version of the unicorn doesn't look like what the majority believes a unicorn should look like.

Less than a year ago I started working on Sidewinder, my take on a fast and scalable time series database. There is a lot to say about what I learned while building Sidewinder, but let's keep that for another day.


Preface

So, back to our talk on consistency. What I want to shine a light on in this blog are the practical implementation considerations for distributed consistency, so we won't be discussing linearizability or serializability in detail. There's a great book that goes into the details of the various aspects of this: http://book.mixu.net/distsys/single-page.html

In this blog I am going to use Sidewinder as the running example and compare it with the consistency challenges and solutions of other distributed systems like Kafka, Cassandra, and Elasticsearch. More specifically, I will be focusing on AP systems.

A Few Concepts First

Let me briefly describe some terms I will be referring to frequently in this blog.
Leader/Master: the primary server handling a partition or shard
Coordinator: the cluster-wide master for metadata
Replica: one of the slaves for a shard; it holds a read-only copy of the data
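
To make these roles concrete, here is a minimal sketch (illustrative names only, not taken from Sidewinder or any of the other systems mentioned) of the per-shard metadata a coordinator might keep:

    import java.util.List;
    import java.util.Map;

    /** Illustrative only: the routing entry a coordinator might keep per shard. */
    record ShardRouting(
            String shardId,
            String leaderNode,          // leader/master: primary server handling this shard
            List<String> replicaNodes   // replicas: nodes holding read-only copies of the data
    ) {}

    /** Illustrative only: the coordinator's cluster-wide view of shard placement. */
    record ClusterMetadata(Map<String, ShardRouting> shardsById) {}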


Consistency Zoomed In

The CAP theorem has long been treated as the gospel for building distributed systems. What the CAP theorem doesn't make obvious is that consistency is not boolean, and its constraints can be varied based on the use case, even at the lowest level.
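
To see what "not boolean" looks like in practice, consider Kafka's producer: its acks setting is a dial for how many acknowledgements a write must collect before it counts as successful. The sketch below is minimal, and the broker address, topic, and values are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TunableAcksExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // "0"   -> fire and forget (weakest guarantee)
            // "1"   -> the partition leader has written the record (middle ground)
            // "all" -> every in-sync replica has the record (strongest guarantee)
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("metrics", "cpu.user", "42.0"));
            }
        }
    }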

Before diving deeper, let's understand the concept of divergence. In a distributed storage system, we usually replicate data to avoid data loss due to failures; this corresponds to the 'A' in the CAP theorem. The amount of replication can be set at different levels depending on the type of system in question, via a replication-factor setting. For example, Elasticsearch sets it at the index level, though what actually has replicas are shards; Kafka sets it at the topic level, though what actually has replicas are Kafka partitions. The problem of divergence arises when replicas go out of sync; simply put, the copies of the data start to contain different data points.
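
For context, here is roughly what configuring that replication factor looks like with Kafka's AdminClient (a minimal sketch; the broker address, topic name, and counts are placeholders). Elasticsearch does the analogous thing per index through its number_of_replicas setting.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class ReplicationFactorExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 6 partitions, replication factor 3: each partition gets 1 leader and 2 replicas
                NewTopic topic = new NewTopic("metrics", 6, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }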

How can replicas diverge, you ask? Because irrespective of the type of AP system in question, the following scenarios can occur:

  1. The shard master receives a data point but is unable to write it to a replica (push replication) due to a temporary network issue
  2. The replica is unable to read a data point from the shard master (pull replication) due to a network issue
  3. The replica suffers a power failure or process crash
  4. The replica's process stalls (garbage collection, etc.)


Any of these situations will lead to a divergent replica if and only if the master accepts and acknowledges the write. If the write is rejected and rolled back on all the nodes, no harm done. However, most systems don't exactly have a rollback design, because a system that truly guarantees such a thing would require the dreaded two-phase commit (2PC), which would likely ruin the fundamental reason for using distributed systems, i.e. linear performance increase. By the way, 2PC basically means our AP system has morphed into a CA system and we have lost the partitioning/sharding ability.
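
To make the window for divergence concrete, here is a heavily simplified, purely hypothetical push-replication write path (illustrative names only, not Sidewinder's actual code) in which the leader acknowledges the client before the replicas have confirmed anything. Any of the four failures above during the asynchronous push leaves that replica divergent.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    class ShardLeader {
        private final LocalStore store;
        private final List<ReplicaClient> replicas;

        ShardLeader(LocalStore store, List<ReplicaClient> replicas) {
            this.store = store;
            this.replicas = replicas;
        }

        /** Accepts a write, persists it locally, then replicates asynchronously. */
        boolean write(DataPoint point) {
            store.append(point);                    // local write succeeds
            for (ReplicaClient replica : replicas) {
                // Fire-and-forget push: a network blip, crash, or GC pause on the
                // replica silently drops this point and the replica diverges.
                CompletableFuture.runAsync(() -> replica.push(point));
            }
            return true;                            // client is acknowledged regardless
        }
    }

    interface LocalStore { void append(DataPoint point); }
    interface ReplicaClient { void push(DataPoint point); }
    record DataPoint(String series, long timestamp, double value) {}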

There is one more option: introduce an intermediate system called a WAL (write-ahead log). Let's review the solutions to these problems in the next few blog posts.
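
As a teaser, the core of a WAL is small. The sketch below is illustrative only (not Sidewinder's actual implementation): every record is appended and fsynced to a local log before the write is acknowledged, so an out-of-sync replica can later replay the log from a known offset.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class WriteAheadLog implements AutoCloseable {
        private final FileChannel channel;

        WriteAheadLog(Path file) throws IOException {
            this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        /** Appends one record and forces it to disk; returns the offset it starts at. */
        synchronized long append(String record) throws IOException {
            long offset = channel.size();           // append-only, so the record lands at EOF
            ByteBuffer payload = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (payload.hasRemaining()) {
                channel.write(payload);
            }
            channel.force(false);                   // fsync before acknowledging upstream
            return offset;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }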

