Minuteman: Design Choices

Let's use this post to expand on how Minuteman works to solve the consistency and sharding problems.

What's Minuteman? Minuteman is an append-only, clustered WAL replication library that can be used to create a distributed AP or CP database. It has leader election, replica assignment, and built-in checkpointing, so all you need to do is plug in your own single-node database and make a sharding choice; Minuteman will take care of the rest for you.

Push vs. Pull

A push-based replication design is better if there are N keys where N ranges in the millions, whereas a pull-based replication design is better if there are N keys where N ranges in the hundreds. Pull systems are simpler for consistency management.

Explanation: In a pull design, every entity that is being pulled needs to be tracked. If there are millions of them, there is the overhead of tracking these entities as well as keeping their pull cycles independent so they don't impact each other. A
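To make the tracking overhead concrete, here is a minimal sketch of a pull-based follower. It is not Minuteman's API: the `Leader` and `PullReplica` classes, the one-log-per-key layout, and the method names are all hypothetical, chosen only to show that a puller has to keep one offset per key it pulls.

```python
from collections import defaultdict


class Leader:
    """Hypothetical leader holding one append-only log per key."""

    def __init__(self):
        self.logs = defaultdict(list)

    def append(self, key, record):
        self.logs[key].append(record)

    def read_from(self, key, offset):
        # Return every record for `key` at or after `offset`.
        return self.logs[key][offset:]


class PullReplica:
    """Hypothetical follower that pulls each key on its own cycle."""

    def __init__(self, leader):
        self.leader = leader
        self.logs = defaultdict(list)
        self.offsets = {}  # one tracked offset per key being pulled

    def pull(self, key):
        # Fetch only what this replica has not seen yet for `key`.
        offset = self.offsets.get(key, 0)
        records = self.leader.read_from(key, offset)
        self.logs[key].extend(records)
        self.offsets[key] = offset + len(records)
        return len(records)


leader = Leader()
replica = PullReplica(leader)
for i in range(1000):
    leader.append(f"key-{i}", f"value-{i}")
for key in list(leader.logs):
    replica.pull(key)

# The tracking state grows with the number of keys, not with the
# number of replicas.
print(len(replica.offsets))
```

With a thousand keys the replica already carries a thousand offsets, and each key needs its own pull cycle; that is the bookkeeping the explanation above refers to.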

Building distributed systems: The basics

Distributed systems are difficult to build due to challenges around consistency and sharding. This post discusses a few more approaches to solving the challenges around consistency.

Background: Creating a distributed database is always challenging, but the challenge can be broken down into two parts: sharding and fault tolerance.

Note: The order of these two parts can change depending on whether scaling or fault tolerance is the primary reason for distributing the database.

Sharding: Sharding is segmenting a single dataset into two or more independent units, each of which can be operated on in isolation without impacting the others. Sharding strategies can be as simple as modhash or key-range-based distribution, or as complex as advanced implementations of consistent hashing.

Let's say you have a MySQL database table storing users, and due to the number of users and the transactional load on the table, you have concluded that horizontal scaling is a viable option for you. To
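The two simple strategies named above, modhash and key-range distribution, can be sketched in a few lines. This is an illustration under assumptions, not code from the post: the function names, the use of MD5 as a stable hash, and the example range boundaries are all hypothetical.

```python
import bisect
import hashlib


def mod_hash_shard(key, shard_count):
    """Modhash: hash the key, then take it modulo the shard count."""
    # A stable digest is used because Python's built-in hash() for
    # strings changes between interpreter runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def key_range_shard(key, upper_bounds):
    """Key range: `upper_bounds` is a sorted list of exclusive upper bounds.

    Keys below upper_bounds[0] go to shard 0, and keys at or above the
    last bound go to the final shard (index len(upper_bounds)).
    """
    return bisect.bisect_right(upper_bounds, key)


# Hypothetical user names spread over 4 modhash shards.
for user in ["alice", "mallory", "zed"]:
    print(user, mod_hash_shard(user, 4))

# Hypothetical ranges: [.., "h") -> 0, ["h", "p") -> 1, ["p", ..) -> 2
bounds = ["h", "p"]
print([key_range_shard(u, bounds) for u in ["alice", "mallory", "zed"]])
```

Modhash spreads keys evenly but remaps most keys when the shard count changes; key ranges keep neighbouring keys together at the cost of possible hot ranges.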

Consistency, the untold story

You know some things are easier said than done! Consistency in distributed systems is one such unicorn everyone's after. Most claim to have seen it, but in reality their version of the unicorn doesn't look like what the majority believes a unicorn must look like.

Less than a year ago I started working on Sidewinder, my version of a fast and scalable time series database. While there is a lot to say about what I learned from my work on Sidewinder, let's keep that for another day.

Preface

So, back to our talk on consistency. What I want to shine a light on in this blog are the practical implementation considerations for distributed consistency, so we won't be discussing linearizability or serializability in detail. There's a great blog / book that goes into the details of various aspects of this. In this blog I am going to use the example of Sidewinder and compare that to consistency challenges and solutions used