performance vs correctness
the tradeoff that sits under almost every architecture decision.
almost every architecture decision is a negotiation between performance and correctness. stronger guarantees cost something: latency, throughput, or both. faster systems usually accept weaker guarantees. knowing which guarantee you actually need changes what you can safely trade away.
correctness requires coordination. if two nodes need to agree on a value before returning it to a caller, both nodes must exchange messages, wait for acknowledgment, and only then respond. that coordination has latency. it limits throughput because nodes spend time waiting on each other rather than processing new requests.
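a toy sketch of that cost (the numbers and function names are illustrative, not measurements from a real system): a coordinated write cannot return until every replica has acknowledged, so its latency is the slowest round trip, while an uncoordinated write only pays the local cost.

```python
import random

def rtt_ms():
    # simulated network round trip to one replica, 1-50ms
    return random.uniform(1, 50)

def coordinated_write(replicas=3):
    # must wait for all replicas: latency is the max of the round trips
    return max(rtt_ms() for _ in range(replicas))

def uncoordinated_write():
    # acknowledge immediately after the local write (~0.1ms)
    return 0.1

trials = 10_000
coord = sum(coordinated_write() for _ in range(trials)) / trials
local = sum(uncoordinated_write() for _ in range(trials)) / trials
print(f"coordinated avg: {coord:.1f}ms, local avg: {local:.1f}ms")
```

the gap only widens as replicas are added, because the max of more round trips keeps growing while the local write stays constant.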
performance comes from avoiding coordination. serve reads from a local cache without confirming it is current. accept writes without waiting for all replicas to confirm. respond immediately with what you know rather than what you can prove. these decisions are fast precisely because they skip the expensive coordination steps.
the tradeoff is fundamental. it is not an implementation detail you can engineer away. it is a consequence of the laws of physics (speed of light, network latency) and mathematics (the impossibility results in distributed systems theory, such as CAP and FLP).
not all correctness is equal. there is a spectrum:
linearizability: the strongest guarantee. every operation appears to take effect instantaneously at some point between its start and completion. the system behaves as if it were a single machine. this is expensive: achieving it requires coordination on every operation.
sequential consistency: operations happen in some global order consistent with each process's local order, but there is no real-time constraint. weaker than linearizability; still requires significant coordination.
causal consistency: operations that are causally related appear in the same order to all nodes. concurrent operations may appear in different orders at different nodes. much cheaper than linearizability.
eventual consistency: all nodes converge to the same value eventually, given no new updates. no guarantees about when, or about what you see in the meantime. very cheap, handles partitions well, requires applications to tolerate stale reads.
choosing a consistency model is choosing how much coordination your system does on every operation. stronger models mean more coordination; weaker models mean less, but put more burden on the application to handle inconsistency.
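the two ends of the spectrum can be contrasted in a toy model (illustrative, not a real client library): a leader plus one replica that applies writes after a delay. a linearizable-style read coordinates with the leader and is always fresh; an eventual read hits the local replica and may be stale.

```python
class Register:
    def __init__(self):
        self.leader = None
        self.replica = None
        self.pending = []           # writes not yet applied to the replica

    def write(self, value):
        self.leader = value
        self.pending.append(value)  # replication is asynchronous

    def replicate(self):
        # apply pending writes to the replica (happens "later")
        while self.pending:
            self.replica = self.pending.pop(0)

    def read_linearizable(self):
        return self.leader          # coordinate with the leader: always fresh

    def read_eventual(self):
        return self.replica         # local read: fast, possibly stale

r = Register()
r.write("v1")
print(r.read_linearizable())  # "v1"
print(r.read_eventual())      # None: the replica has not caught up
r.replicate()
print(r.read_eventual())      # "v1": converged
```

the window between write and replicate is exactly the inconsistency an application built on eventual consistency has to tolerate.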
performance is not a single number. optimizing for one dimension often hurts another:
latency: time to process a single request. caching helps. coordination hurts.
throughput: requests processed per unit time. horizontal scaling helps. serialization hurts (queues, locks, ordered operations).
tail latency: p99, p999 latency. often dominated by coordination, garbage collection, retries. the mean and the tail diverge more as systems get more complex.
a decision that improves average latency might increase tail latency. a decision that improves throughput might degrade individual request latency under load. be specific about what you are measuring.
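the mean-vs-tail divergence is easy to see with simulated numbers (the workload shape here is made up for illustration): most requests take a few milliseconds, but a small fraction hit a slow path such as a retry or a gc pause.

```python
import random

random.seed(0)
# 98% fast path (1-5ms), 2% slow path (100-500ms)
latencies = [random.uniform(1, 5) if random.random() < 0.98
             else random.uniform(100, 500)
             for _ in range(100_000)]

latencies.sort()
mean = sum(latencies) / len(latencies)
p50 = latencies[len(latencies) // 2]
p99 = latencies[int(len(latencies) * 0.99)]
print(f"mean={mean:.1f}ms p50={p50:.1f}ms p99={p99:.1f}ms")
```

the median barely notices the slow path, the mean is mildly inflated, and the p99 lands squarely inside it. a report that only shows the average hides exactly the behavior users complain about.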
database reads: reading from the primary gives you the most recent data (correctness). reading from a replica might return data that is 50ms behind (performance). for a shopping cart, you might accept stale reads. for a bank balance you are about to debit, you might not.
write acknowledgment: a database can acknowledge a write after durably storing it on one node (fast, but if that node dies before replication, data is lost) or after replication to multiple nodes (correct, slower). this is the fsync vs O_DSYNC vs in-memory-ack tradeoff in storage systems.
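the acknowledgment choice can be framed as an order statistic: the coordinator sends the write to all replicas at once, and the latency the client sees is when the quorum-th fastest replica acknowledges. a minimal sketch (toy helper, not a real driver api):

```python
def ack_latency(ack_times_ms, quorum):
    # the client is unblocked when the quorum-th fastest ack arrives
    return sorted(ack_times_ms)[quorum - 1]

acks = [12.0, 3.5, 28.0]                    # simulated per-replica acks
print(f"ack after 1 copy: {ack_latency(acks, 1)}ms")   # fast, fragile
print(f"ack after quorum: {ack_latency(acks, 2)}ms")   # survives 1 failure
print(f"ack after all:    {ack_latency(acks, 3)}ms")   # slowest replica rules
```

waiting for all replicas ties your latency to the slowest node; waiting for a majority buys durability while ignoring a single straggler.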
caches: a cache always trades correctness for performance. the question is only how much. short TTLs reduce staleness but increase cache misses. write-through caches maintain consistency but pay write latency. cache-aside patterns put the invalidation burden on the application. every caching decision is a bet about how much stale data you can tolerate.
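the ttl bet is visible in a minimal cache-aside sketch (illustrative; the `load` callback stands in for whatever backing store you have). any read that hits within the ttl may return a value the database no longer holds.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}                  # key -> (value, stored_at)

    def get(self, key, load):
        hit = self.entries.get(key)
        if hit is not None:
            value, stored_at = hit
            if time.monotonic() - stored_at < self.ttl:
                return value               # possibly stale, but fast
        value = load(key)                  # cache miss: pay the backing read
        self.entries[key] = (value, time.monotonic())
        return value

db = {"user:1": "alice"}
cache = TTLCache(ttl_seconds=30)
print(cache.get("user:1", db.__getitem__))  # miss: reads the db, "alice"
db["user:1"] = "bob"                        # db changes underneath the cache
print(cache.get("user:1", db.__getitem__))  # hit: still "alice" (stale)
```

shrinking the ttl narrows that staleness window at the cost of more misses; a ttl of zero is just the uncached read.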
event-driven systems: asynchronous processing is fast because it returns immediately; the work happens later. the tradeoff is that you cannot guarantee the work has completed when you return success to the caller. if the worker fails, you have told the client it succeeded. compensating for this requires idempotency, dead letter queues, and retry logic.
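the idempotency piece can be sketched in a few lines (names like `processed` and `handle_payment` are illustrative). when a retry arrives for a key that was already handled, the consumer returns the stored result instead of doing the work again; note the key store itself must be strongly consistent, or duplicates can slip through the window.

```python
# idempotency_key -> result; a durable, strongly consistent store in practice
processed = {}

def handle_payment(idempotency_key, amount, charge):
    if idempotency_key in processed:
        return processed[idempotency_key]  # duplicate: return the prior result
    result = charge(amount)
    processed[idempotency_key] = result
    return result

charges = []
def charge(amount):
    charges.append(amount)                 # side effect we must not repeat
    return f"charged {amount}"

handle_payment("key-123", 50, charge)
handle_payment("key-123", 50, charge)      # client retry of the same request
print(len(charges))                        # 1: charged only once
```

the retry is safe precisely because the check-and-record happens against a single authoritative store, which is the coordination the fast path tried to skip.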
the common mistake is optimizing for performance when correctness is required, or over-engineering correctness when performance matters and the data is not critical.
before choosing a consistency model or caching strategy, answer one question: what does it cost to be wrong here?
a user who sees a like count that is 2 seconds out of date will not notice. a payment that is processed twice because the idempotency check used an eventually consistent store is a real problem. the cost of being wrong varies enormously across different parts of the same system.
design each component for the consistency it needs, not the strongest consistency you can achieve everywhere.