Distributed Systems: Architecture, Coordination, and Consensus
At its core, a distributed system consists of multiple autonomous machines that communicate over a network to coordinate actions and share resources. Unlike centralized configurations, where a single machine is the sole compute node, distributed systems use parallelism to achieve scale, fault tolerance, and low latency.
However, moving from a single-node design to a distributed one introduces fundamental engineering challenges. The system must coordinate state changes across nodes without shared physical memory or a precise global clock. Instead, nodes rely on asynchronous message passing, which is prone to arbitrary delays, packet loss, and dropped connections.
Core Characteristics of Distributed Systems
- Concurrency of Components: System components execute code simultaneously on separate hardware instances.
- Lack of a Global Clock: Physical clocks drift over time, so wall-clock timestamps alone cannot order events across nodes without a margin of error.
- Independent Failures: Individual nodes can fail (crash, reboot, or run out of memory) while the rest of the system remains operational.
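One common way to order events without a global clock is a logical clock. The following is a minimal sketch of a Lamport clock, assuming each node keeps a single integer counter; the class and method names are illustrative and not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: orders events by causality rather than wall-clock time."""
    time: int = 0

    def tick(self) -> int:
        # A local event (including sending a message) advances the counter.
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        # On receipt, move past the larger of our time and the sender's stamp.
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
stamp = a.tick()              # node A sends a message stamped with its time
b.tick()
b.tick()                      # node B performs two unrelated local events
received = b.receive(stamp)   # B's receive event is ordered after A's send
```

The receive event always gets a larger timestamp than the send event that caused it, regardless of how far the two machines' physical clocks have drifted.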
The Fallacies of Distributed Computing
Many engineering bugs stem from the "Fallacies of Distributed Computing", a set of common but false assumptions that developers make:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology does not change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
When a network partition occurs, these fallacies are laid bare, forcing architects to build robust fault-tolerant schemes.
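Code that drops the first fallacy usually treats every remote call as something that can fail. The sketch below shows one possible approach, retrying with exponential backoff and random jitter. The helper name `call_with_retries` and its parameters are hypothetical, and retrying like this is only safe when the remote operation is idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(op: Callable[[], T], attempts: int = 4,
                      base_delay: float = 0.01, max_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op on ConnectionError or TimeoutError with backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Double the delay cap on each attempt, then pick a random wait
            # below it so that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    # Simulated remote call: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
```

The `sleep` parameter is injectable only so the example runs instantly; in real use the default `time.sleep` would apply.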
Footnotes
- Designing Data-Intensive Applications - Martin Kleppmann's authoritative book on distributed data systems design.
Theoretical Foundations: CAP & PACELC Theorems
To build predictable systems, engineers must navigate formal theoretical constraints. The most prominent of these is the CAP theorem. Formally stated: a distributed data store cannot simultaneously guarantee all three of consistency ($C$), availability ($A$), and partition tolerance ($P$).
Because physical networks are inherently prone to partitions ($P$), a distributed database must choose one of two options when a partition occurs:
- Consistency ($C$): The system rejects incoming read or write operations to prevent stale or conflicting data states across disconnected partitions. This prioritizes linearizability.
- Availability ($A$): The system accepts read or write operations on any accessible node, prioritizing responsiveness at the cost of temporary data divergence. The system later reconciles states via eventual consistency mechanisms.
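The choice between the two options can be illustrated with a toy replica that decides whether to accept a write while partitioned. This is a sketch under simplifying assumptions: the `Replica` class, its `mode` labels, and the `Unavailable` exception are hypothetical, and only the accept-or-reject decision is modelled, not replication or later reconciliation.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a consistency-first replica refuses a request."""


@dataclass
class Replica:
    mode: str              # "CP" (consistency first) or "AP" (availability first)
    cluster_size: int
    reachable_peers: int   # peers this replica can currently contact
    data: dict = field(default_factory=dict)

    def has_majority(self) -> bool:
        # Count this replica plus its reachable peers against a strict majority.
        return self.reachable_peers + 1 > self.cluster_size // 2

    def write(self, key: str, value: str) -> str:
        if self.mode == "CP" and not self.has_majority():
            raise Unavailable("no quorum: rejecting write to stay consistent")
        # An AP replica accepts the write locally and may diverge from peers
        # on the other side of the partition until states are reconciled.
        self.data[key] = value
        return "accepted"


# A 5-node cluster split so that this replica can reach only one peer.
cp = Replica(mode="CP", cluster_size=5, reachable_peers=1)
ap = Replica(mode="AP", cluster_size=5, reachable_peers=1)

ap_result = ap.write("cart", "book")
try:
    cp.write("cart", "book")
    cp_result = "accepted"
except Unavailable:
    cp_result = "rejected"
```

On the minority side of the partition the availability-first replica accepts the write while the consistency-first replica rejects it.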
Expanding CAP: The PACELC Theorem
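The CAP theorem only describes system behavior during an active network partition. The PACELC theorem extends this to normal operation: if a partition ($P$) occurs, the system chooses between availability ($A$) and consistency ($C$); else ($E$), it chooses between latency ($L$) and consistency ($C$):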
| System | Partition Behavior | Normal Operation Behavior | Archetype |
|---|---|---|---|
| MongoDB | Consistent ($C$) | Consistent ($C$) | PC/EC |
| Cassandra | Available ($A$) | Latency ($L$) | PA/EL |
| Spanner | Consistent ($C$) | Consistent ($C$) | PC/EC |
Footnotes
- CAP Theorem Introduction - Overview of Brewer's CAP Theorem constraints in database design.
The 'CA' System Myth
Many designers claim their database is a 'CA' system because it operates on highly reliable enterprise hardware. However, a CA system is mathematically impossible across a real network because physical connections can always fail. A network partition ($P$) is a physical reality, not an architectural choice.
Figure: Replication Protocol Latency Trade-offs. Average write latency (ms) vs. replication consensus depth (N=3, N=5, N=9).
Raft Consensus Leader Election Workflow
- Step 1
All nodes start in the Follower state. Each node maintains a randomized election timeout (typically 150ms - 300ms). If a follower fails to receive heartbeats from an active leader before the timeout expires, it assumes the leader has failed and transitions to the Candidate state.
Footnotes
- Raft Consensus Algorithm - The original paper and visualization resource for the Raft consensus protocol.
- Step 2
The node, now a candidate, increments its current term, votes for itself, and sends RequestVote remote procedure calls (RPCs) to all other nodes in the cluster. This starts a new election.
- Step 3
Each node that receives a RequestVote RPC grants its vote if the candidate's log is at least as up-to-date as its own and it has not already voted in this term. The candidate waits for a quorum of $\lfloor N/2 \rfloor + 1$ votes, where $N$ is the number of nodes in the cluster.
- Step 4
Once a candidate secures a quorum of votes, it transitions to the Leader state. It immediately begins broadcasting AppendEntries RPCs (heartbeats) to all peer nodes to establish authority and prevent new elections.
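The election steps above can be sketched in a single process. This is a simplified illustration, not a full Raft implementation: there is no networking, no log replication, and no heartbeat loop, and the `Node` class and `run_election` helper are hypothetical names. Log freshness is compared by last log term first and then by last log index.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    term: int = 0
    state: str = "follower"
    voted_for: Optional[int] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def election_timeout_ms(self, rng: random.Random) -> float:
        # Step 1: a randomized timeout makes simultaneous candidacies unlikely.
        return rng.uniform(150, 300)

    def start_election(self) -> None:
        # Step 2: become a candidate, bump the term, and vote for ourselves.
        self.state = "candidate"
        self.term += 1
        self.voted_for = self.node_id

    def handle_request_vote(self, term: int, candidate_id: int,
                            last_log_term: int, last_log_index: int) -> bool:
        # Step 3: decide whether to grant a vote.
        if term < self.term:
            return False  # stale candidate
        if term > self.term:
            # A newer term resets our vote and returns us to follower.
            self.term, self.voted_for, self.state = term, None, "follower"
        log_ok = (last_log_term, last_log_index) >= (self.last_log_term,
                                                     self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    candidate.start_election()
    votes = 1  # the candidate's own vote
    for peer in peers:
        if peer.handle_request_vote(candidate.term, candidate.node_id,
                                    candidate.last_log_term,
                                    candidate.last_log_index):
            votes += 1
    quorum = (len(peers) + 1) // 2 + 1
    if votes >= quorum:
        candidate.state = "leader"  # Step 4: heartbeats would start here
    return candidate.state == "leader"


rng = random.Random(42)
nodes = [Node(node_id=i) for i in range(5)]
timeouts = {n.node_id: n.election_timeout_ms(rng) for n in nodes}
first = min(nodes, key=lambda n: timeouts[n.node_id])  # earliest timeout fires
peers = [n for n in nodes if n is not first]
won = run_election(first, peers)
```

In this run every peer has an empty log and has not yet voted, so the node whose timeout fires first collects all five votes and becomes leader for term 1.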
Preventing Split-Brain Scenarios
By enforcing a strict quorum size of $\lfloor N/2 \rfloor + 1$, the system prevents split-brain scenarios in which two disconnected partitions each elect their own leader in the same term. Two disjoint majorities cannot exist in a cluster of size $N$: any two quorums share at least one node, and that node votes only once per term.
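The overlap property follows from $2(\lfloor N/2 \rfloor + 1) > N$, and it can also be checked by brute force for small clusters. The sketch below uses illustrative function names and enumerates every possible quorum; the optional `size` argument exists only to show what goes wrong when the quorum is not a strict majority.

```python
from itertools import combinations
from typing import Optional


def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-node cluster."""
    return n // 2 + 1


def all_quorums_intersect(n: int, size: Optional[int] = None) -> bool:
    """Return True if every pair of node subsets of the given size overlaps."""
    q = quorum_size(n) if size is None else size
    quorums = [set(c) for c in combinations(range(n), q)]
    # An empty intersection is falsy, so all() fails on any disjoint pair.
    return all(a & b for a, b in combinations(quorums, 2))


majority_safe = all_quorums_intersect(5)        # quorums of 3 out of 5
half_unsafe = all_quorums_intersect(4, size=2)  # "quorums" of 2 out of 4
```

With four nodes and a quorum of only two, the groups {0, 1} and {2, 3} are disjoint, which is exactly the split-brain case a strict majority rules out.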
Consistent hashing maps both data keys and cluster nodes to a continuous circular coordinate space called a hash ring.
Key Benefits:
- Minimizes data movement when nodes are added or removed (only about $K/N$ keys are relocated, where $K$ is the total number of keys and $N$ is the node count).
- Utilizes virtual nodes (vnodes) to prevent hotspotting and achieve uniform load distribution across heterogeneous hardware.
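A hash ring with virtual nodes can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `HashRing` class, the `node#i` naming of virtual nodes, and the choice of SHA-256 truncated to 64 bits are all arbitrary, and every physical node gets the same number of virtual nodes.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Map a string to a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep existing owner
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # The first virtual node clockwise from the key owns it; wrap at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(vnodes=100)
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; keys never shuffle between the three original nodes, which is what keeps relocation close to $K/N$ rather than rehashing everything.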
