core concepts

every systems design decision is a tradeoff. you want the system to handle more load, but adding nodes introduces consistency problems. you want strong guarantees, but strong guarantees cost latency. you want it always available, but availability and consistency pull in opposite directions when the network splits.

this chapter covers the vocabulary for those tradeoffs. not as definitions to memorize, but as forces you actually reason about when deciding what to give up.

what this chapter covers

scalability is about how systems handle growth, and why "just add more servers" breaks down as soon as any state is involved.
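a minimal sketch of why state gets in the way, assuming a hypothetical per-user visit counter kept in each server's own memory and a simple round-robin balancer. the names `Server` and `run` are illustrative only and are not taken from any real framework.

```python
from itertools import cycle


class Server:
    """A hypothetical server that keeps its state in local memory."""

    def __init__(self):
        self.visits = {}

    def handle(self, user):
        # Increment and return this server's own count for the user.
        self.visits[user] = self.visits.get(user, 0) + 1
        return self.visits[user]


def run(num_servers, requests):
    """Send `requests` requests from one user through a round-robin balancer."""
    servers = [Server() for _ in range(num_servers)]
    balancer = cycle(servers)  # round-robin: each request goes to the next server
    return [next(balancer).handle("alice") for _ in range(requests)]


one_server = run(1, 4)   # a single server sees every request
two_servers = run(2, 4)  # each server sees only half of them
```

with one server the counts rise steadily. with two, each server holds only part of the picture, so the same user gets answers that disagree with the true total. fixing that means sharing or coordinating the state, which is where the tradeoffs start.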

reliability is about continuing to work correctly when things go wrong. hardware fails, software has bugs, operators make mistakes. a reliable system expects all of this.

availability is about the system being reachable when you need it. it is related to reliability but not the same thing: a system can be reliable and unavailable, or available and unreliable.

the CAP theorem is the formal statement of a constraint every distributed system lives under: when the network partitions, you cannot have both strong consistency and availability at the same time. understanding what you are actually choosing when you "pick two" changes how you think about distributed databases.
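a toy sketch of that choice, assuming two replicas of a single value and a boolean flag standing in for a network partition. the `Replica` class, the `"CP"` and `"AP"` mode labels, and the `Unavailable` exception are hypothetical names for illustration, not the API of any real database.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to protect consistency."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = None
        self.partitioned = False  # True means this replica cannot reach its peer

    def write(self, value, peer):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot reach peer; refusing write")
            # Availability first: accept locally; the peer is now out of date.
            self.value = value
            return
        # No partition: update both replicas together.
        self.value = value
        peer.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # The local copy might be stale, so a CP replica refuses to answer.
            raise Unavailable("cannot confirm latest value; refusing read")
        return self.value


def demo(mode):
    """Write once normally, then write again during a partition."""
    a, b = Replica(mode), Replica(mode)
    a.write(1, b)
    a.partitioned = b.partitioned = True
    try:
        a.write(2, b)
        return ("answered", a.read(), b.read())
    except Unavailable:
        return ("refused", a.value, b.value)
```

in this sketch `demo("AP")` keeps answering but the two replicas disagree, while `demo("CP")` keeps them in agreement by refusing the request. neither mode gets both properties while the partition lasts, and that is the whole constraint.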

performance vs correctness is the tradeoff that sits under almost every other architecture decision. faster systems usually accept weaker guarantees. stronger guarantees usually cost throughput or latency. knowing which one you actually need determines what you can safely trade away.