Building distributed systems: The basics
Distributed systems are difficult to build, largely because of challenges around consistency and sharding. This post discusses a few approaches to solving the challenges around consistency.
Background: Creating a distributed database is always challenging, but the challenge can be broken down into two parts:
- Sharding
- Fault Tolerance
Note: The order of these two parts can be swapped depending on whether scaling or fault tolerance is the primary need for distributing a database.
Sharding: Sharding is segmenting a single dataset into two or more independent units, each of which can be operated on in isolation without impacting the others. Sharding strategies can be as simple as mod-hash or key-range based distribution, or as complex as advanced implementations of consistent hashing.

Let's say you have a MySQL table storing users, and due to the number of users and the transactional load on the table, you have concluded that horizontal scaling is a viable option. To shard this table, you decide to store usernames A-M on Server 1 and N-Z on Server 2. Now you have two database instances instead of one, giving you the ability (in theory) to handle 2x the load. Due to this design change, any application reading from or writing to the original database now needs to be aware of the sharding scheme. You will also likely need to schedule an outage window to perform this split-and-migrate maneuver. Depending on the growth of your application, you might need to split a few more times, essentially following a geometric progression in the number of instances. Such a sharding mechanism is usually referred to as static sharding. Databases like HBase and Cassandra handle this dynamically, performing splits when a load increase is detected.
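The static key-range scheme above can be sketched in a few lines. This is a hypothetical routing function, not a real driver API; the shard map and server names are illustrative:

```python
# Minimal sketch of static key-range sharding. The shard map is fixed at
# deploy time; a resplit means editing this map and migrating data.
SHARD_MAP = {
    ("A", "M"): "server-1",  # usernames starting with A-M live on server 1
    ("N", "Z"): "server-2",  # usernames starting with N-Z live on server 2
}

def route(username: str) -> str:
    """Return the server that owns this username's key range."""
    first = username[0].upper()
    for (lo, hi), server in SHARD_MAP.items():
        if lo <= first <= hi:
            return server
    raise KeyError(f"no shard owns key {username!r}")
```

Every application that touches the users table now has to call something like `route()` before every query, which is exactly the awareness burden described above.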
Replication: Replication is the fault tolerance topic. It tends to be the more important one for business-critical systems, since zero downtime is expected and needed to keep businesses running 24x7. Replication essentially means cloning data from Server 'A' to Server 'B', so that if 'A' goes down, 'B' takes over, keeping the lights on. Things get challenging here, because quantum entanglement is not how most computer systems run: it's not possible for data to be completely in sync between two separate servers at every given point in time. E.g., if you created an entry for a user X on Server 'A' at time t1, that entry won't appear on Server 'B' right at time t1. This means that at time t1, Server 'B' is in an inconsistent state, so if there was a power failure on Server 'A' one nanosecond after t1, Server 'B' still doesn't know about user X. This is the classic consistency problem; more complex versions of it can be studied under the Byzantine Generals Problem (https://en.wikipedia.org/wiki/Byzantine_fault_tolerance#Byzantine_Generals.27_Problem)
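The inconsistency window can be made concrete with a toy asynchronous replica. All the class and key names here are hypothetical; the point is only that a read from 'B' between the write and the replication step observes stale state:

```python
# Toy illustration of replication lag: the replica only sees a write
# after the asynchronous replication step ships it over, so a read from
# the replica in between misses the new entry.
class Primary:
    def __init__(self):
        self.data = {}
        self.outbox = []          # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.outbox.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, changes):
        for key, value in changes:
            self.data[key] = value

a, b = Primary(), Replica()
a.write("user:X", {"name": "X"})   # t1: committed on A only
stale = b.data.get("user:X")       # read from B at t1 -> None (inconsistent)
b.apply(a.outbox)                  # replication catches up some time later
fresh = b.data.get("user:X")       # now B knows about user X
```

If 'A' dies before `b.apply()` runs, the write to user X is simply gone from 'B''s point of view, which is the power-failure scenario described above.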
Solution: Now that we have reviewed how replication and sharding can be challenging and understood at a conceptual level how they work, let's see how we can address some of these problems and make it easier for engineers to implement the above and make their systems distributed.
The WAL: The Write-Ahead Log, commonly known as a WAL, is a powerful construct leveraged by most database systems: a transaction log that records every operation before it is committed to the final storage layer. The WAL provides durability and atomicity for transactions. In addition to these two capabilities, the WAL plays a critical role in building distributed systems by providing a mechanism to replicate an instance's transaction log instead of replicating snapshots of the database state. This is essentially state machine replication using logs, and it can provide serializable consistency. A WAL can also be extended to track read and write offsets, which tells us exactly what data is and isn't committed to the end storage layer.
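A minimal sketch of the append-before-apply discipline, assuming an in-memory log and key-value store for brevity (a real WAL is an fsync'd file on disk):

```python
import json

# Minimal write-ahead log sketch: every mutation is appended to the log
# *before* it touches the store, so the store can always be rebuilt by
# replaying the log from offset zero.
class WALStore:
    def __init__(self):
        self.wal = []        # in a real system, an fsync'd append-only file
        self.store = {}
        self.committed = 0   # offset of the last record applied to store

    def put(self, key, value):
        # Log first (durability), then apply (atomicity via replay).
        self.wal.append(json.dumps({"op": "put", "key": key, "value": value}))
        self._apply_pending()

    def _apply_pending(self):
        for record in self.wal[self.committed:]:
            entry = json.loads(record)
            self.store[entry["key"]] = entry["value"]
            self.committed += 1

    def recover(self):
        """Rebuild the store purely from the log, as after a crash."""
        self.store, self.committed = {}, 0
        self._apply_pending()
```

The `committed` offset is the extension mentioned above: comparing it with the log's length tells us which records are logged but not yet applied to the end storage layer.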
Conclusion: So if we can replicate the WAL between two database servers, we can tame the complexity of replicating data, thereby achieving fault tolerance. We can build a simple subsystem that reads the WAL on Server 'A' and replays it on Server 'B'. Problem solved? Wait, we still have to worry about a few other things. What about consistency? How do we make sure that all replicas have identical WALs? Let's borrow a few pages from Kafka and wait for my next post.
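That read-and-replay subsystem can be sketched as log shipping with an offset. Everything here is illustrative (tuples for log records, a function standing in for the network hop), not a real replication protocol:

```python
# Sketch of WAL shipping: Server A logs every write before applying it;
# a replication loop reads A's log from B's last-applied offset and
# replays each record on B, so B converges to A's state.
def apply(store, record):
    op, key, value = record
    if op == "put":
        store[key] = value

wal_a = []                  # Server A's write-ahead log
store_a, store_b = {}, {}
replicated_offset = 0       # how far B has replayed A's log

def write_to_a(key, value):
    wal_a.append(("put", key, value))   # log first...
    apply(store_a, wal_a[-1])           # ...then apply locally

def replicate_to_b():
    # In reality this runs continuously over a network connection.
    global replicated_offset
    for record in wal_a[replicated_offset:]:
        apply(store_b, record)
        replicated_offset += 1

write_to_a("user:X", 1)
write_to_a("user:Y", 2)
replicate_to_b()            # B replays A's log and catches up
```

The open questions above live exactly in this gap: between `write_to_a` and `replicate_to_b`, the two servers disagree, and nothing here yet guarantees that every replica ends up with an identical log.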