status
this chapter is in active development
expect live edits and rapid iteration (except for when i am really busy with other stuff) while this material is written.
tcp gives you reliability, ordering, flow control, and congestion control. udp gives you ports and a checksum. that is it.
RFC 768 fits on a single page. the entire header is 8 bytes:
 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|     source      |   destination   |
|      port       |      port       |
+--------+--------+--------+--------+
|     length      |    checksum     |
+--------+--------+--------+--------+
|          data octets ...          |
+-----------------------------------+
no sequence numbers. no acknowledgments. no connection state. no handshake. no teardown. you shove a datagram into sendto() and the kernel ships it. whether it arrives at all, arrives only once, or arrives in order is not udp's problem.
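to make the layout concrete, here is a minimal sketch of pulling those four fields out of a raw buffer. `struct udp_header`, `read_be16` and `parse_udp_header` are hypothetical names for illustration, not part of any system api. it assumes the buffer starts at the first byte of the udp header and that all fields are in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration; fields are stored in host byte order. */
struct udp_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;    /* header + data, in bytes */
    uint16_t checksum;
};

/* Read a big-endian (network order) 16-bit value from two bytes. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the 8-byte UDP header at the start of buf.
 * Returns 0 on success, -1 if the buffer is shorter than a header or the
 * length field does not fit the buffer. */
static int parse_udp_header(const uint8_t *buf, size_t len, struct udp_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->src_port = read_be16(buf);
    out->dst_port = read_be16(buf + 2);
    out->length   = read_be16(buf + 4);
    out->checksum = read_be16(buf + 6);
    if (out->length < 8 || out->length > len)
        return -1;
    return 0;
}
```

reading byte by byte avoids alignment and endianness surprises that come with casting the buffer to a struct pointer.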
ports. same 16-bit source and destination space as tcp. the kernel uses them to demux incoming datagrams to the right socket.
checksum. covers a pseudo-header (source and destination ip addresses, protocol, udp length), the udp header, and the data. mandatory in ipv6 (RFC 8200), optional in ipv4 (but everyone enables it). detects corruption. does not detect loss, duplication, or reordering.
nothing else. no connection state, no buffer management, no retransmission, no flow control. a thin wrapper around raw ip with port-based multiplexing.
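the checksum itself is the 16-bit ones' complement sum described in RFC 1071. a minimal sketch follows; `inet_checksum` is a hypothetical helper name, and it assumes the caller has already placed the bytes to be covered (for udp: pseudo-header, udp header with the checksum field zeroed, then data) in one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement checksum over len bytes, treating the data as a sequence
 * of big-endian 16-bit words. An odd trailing byte is padded with zero.
 * The 32-bit accumulator cannot overflow for buffers up to 64 KiB, which
 * covers any ordinary UDP datagram. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

a receiver can run the same function over the data with the transmitted checksum included; a result of 0 means the sum checked out. one udp-specific wrinkle from RFC 768: a computed checksum of 0 is sent as 0xffff, because 0 on the wire means "no checksum" in ipv4.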
voice and video calls do not benefit from retransmission. by the time a lost audio frame is retransmitted, the conversation has moved on. playing a 200ms-stale frame sounds worse than skipping it. voip (rtp over udp), gaming netcode, and live video streaming all prefer dropping the occasional packet.
a dns query and response each fit in a single datagram. the entire transaction is two packets. tcp's three-way handshake would add a full round trip before the query is even sent, roughly doubling the latency of a 100-byte exchange. dns uses udp by default and only falls back to tcp when a response is truncated because it exceeds the udp payload limit (traditionally 512 bytes; with edns the client advertises a larger buffer, commonly 1232 to 4096 bytes).
ntp and ptp send timestamped packets where the arrival time is the whole point. tcp's buffering and retransmission would destroy the timing signal. a delayed ntp packet is worse than a lost one.
tcp is point-to-point. there is no tcp multicast. if you need to send the same data to thousands of receivers simultaneously (stock feeds, live video, service discovery), udp is the only option at the transport layer.
most serious udp applications bolt on some form of reliability:
ack and retransmit. send a message, start a timer, retransmit if no response. dns resolvers do exactly this.
sequence numbers. track ordering and detect loss. rtp includes a sequence number in every packet.
forward error correction. send redundant data so the receiver can reconstruct lost packets without retransmission. used in video streaming where latency matters more than bandwidth.
selective retransmission. only retransmit what was lost, not everything after the gap. quic does this over udp; so do game engines.
at some point you have rebuilt half of tcp in userspace. that is the quic story: rather than building reliability from scratch, start with a well-designed userspace transport that does not have tcp's head-of-line blocking problem.
int fd = socket(AF_INET6, SOCK_DGRAM, 0);
SOCK_DGRAM instead of SOCK_STREAM. no connect() required; you can sendto() different destinations from the same socket.
struct sockaddr_in6 dest = { ... };
sendto(fd, buf, len, 0, (struct sockaddr *)&dest, sizeof(dest));
struct sockaddr_in6 src;
socklen_t srclen = sizeof(src);
ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, (struct sockaddr *)&src, &srclen);
each recvfrom() returns one complete datagram and the sender's address. no stream reassembly. message boundaries are preserved.
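a small loopback experiment shows the boundary preservation. this is a sketch under assumptions: it uses AF_INET on 127.0.0.1 rather than AF_INET6 so it runs on hosts without ipv6 loopback, and `bind_loopback` and `boundary_demo` are hypothetical helper names. two datagrams go out back to back; each recvfrom() hands back exactly one of them even though the buffer could hold both.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 on a kernel-chosen port and report
 * the resulting address in *addr. Returns the fd, or -1 on error. */
static int bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send a 5-byte and a 3-byte datagram, then read twice into a 64-byte
 * buffer. sizes[] receives what each recvfrom() returned.
 * Returns 0 on success, -1 on any error. */
static int boundary_demo(ssize_t sizes[2])
{
    struct sockaddr_in rx_addr, tx_addr;
    struct sockaddr *dst = (struct sockaddr *)&rx_addr;
    struct timeval tv = { 2, 0 };   /* do not block forever if a read fails */
    char buf[64];
    int rc = -1;
    int rx = bind_loopback(&rx_addr);
    int tx = bind_loopback(&tx_addr);

    assert(sizes != NULL);
    if (rx >= 0 && tx >= 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        sendto(tx, "hello", 5, 0, dst, sizeof(rx_addr)) == 5 &&
        sendto(tx, "udp", 3, 0, dst, sizeof(rx_addr)) == 3) {
        sizes[0] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
        sizes[1] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
        rc = (sizes[0] >= 0 && sizes[1] >= 0) ? 0 : -1;
    }
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

with a tcp stream the same two writes could legally come back as one 8-byte read. here the reads return 5 and then 3.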
calling connect() on a udp socket does not create a connection. it sets a default destination. after connecting, use send()/recv() instead of sendto()/recvfrom(). the kernel filters incoming datagrams to only deliver packets from the connected address.
connected udp sockets also cache the routing lookup instead of redoing it on every sendto(). for a dns resolver sending many queries to the same upstream, this matters.
connect(fd, (struct sockaddr *)&dest, sizeof(dest));
send(fd, buf, len, 0);
recv(fd, buf, sizeof(buf), 0);
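the filtering behaviour can be observed on loopback. the sketch below is an experiment, not a reference implementation: it assumes AF_INET on 127.0.0.1 so it runs without ipv6, and `bind_loopback` and `connected_filter_demo` are hypothetical names. socket a connects to peer; a third socket sends to a first, then peer sends. only peer's byte should be delivered.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 on a kernel-chosen port and report
 * the resulting address in *addr. Returns the fd, or -1 on error. */
static int bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the first byte delivered to the connected socket, or -1 on error. */
static int connected_filter_demo(void)
{
    struct sockaddr_in a_addr, peer_addr, other_addr;
    struct sockaddr *to_a = (struct sockaddr *)&a_addr;
    struct timeval tv = { 2, 0 };   /* give up after 2s instead of hanging */
    char byte = 0;
    int result = -1;
    int a = bind_loopback(&a_addr);
    int peer = bind_loopback(&peer_addr);
    int other = bind_loopback(&other_addr);

    if (a >= 0 && peer >= 0 && other >= 0 &&
        setsockopt(a, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        /* Set the default destination; from now on a only accepts peer. */
        connect(a, (struct sockaddr *)&peer_addr, sizeof(peer_addr)) == 0 &&
        /* Datagram from an unrelated source: should be filtered out. */
        sendto(other, "X", 1, 0, to_a, sizeof(a_addr)) == 1 &&
        /* Datagram from the connected peer: should be delivered. */
        sendto(peer, "P", 1, 0, to_a, sizeof(a_addr)) == 1 &&
        recv(a, &byte, 1, 0) == 1)
        result = byte;

    if (a >= 0)
        close(a);
    if (peer >= 0)
        close(peer);
    if (other >= 0)
        close(other);
    return result;
}
```

the sendto() from the unrelated socket still reports success. udp never tells the sender that the receiver discarded the datagram.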
nat devices track udp "connections" by source ip:port and destination ip:port. the binding is created on the first outbound packet and times out after inactivity. timeouts as short as 30 seconds are common, compared to minutes or hours for tcp.
this matters for voip (gaps in audio longer than the nat timeout kill the binding), gaming (clients send periodic pings to keep it alive), and quic (connection ids exist partly because of this).
some carrier-grade nats are particularly aggressive. if your udp application works on the lan but fails on mobile networks, nat behavior is the first suspect.
reliable transfer. if you need every byte in order, use tcp or quic. reimplementing reliability on top of udp is a project, not a feature.
large payloads. udp datagrams over the path mtu get fragmented at the ip layer. ip fragmentation is brittle: one lost fragment means the entire datagram is lost, and some middleboxes drop fragments entirely. for ipv6, keep the whole packet within the 1280-byte minimum mtu, which leaves 1232 bytes of udp payload after the 40-byte ipv6 header and the 8-byte udp header.
congestion control. udp has none. an application blasting udp at line rate will congest the network and impact every other flow. if your udp application sends significant traffic, implement congestion control or risk being rate-limited by isps. RFC 8085 documents the requirements.
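for the large-payloads item, the arithmetic behind a safe datagram size is simple enough to write down. a sketch, assuming no ipv4 options and no ipv6 extension headers; `max_udp_payload` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    IPV4_HEADER_MIN = 20,  /* IPv4 header without options */
    IPV6_HEADER     = 40,  /* fixed IPv6 header, no extension headers */
    UDP_HEADER      = 8
};

/* Largest UDP payload that fits in a single IP packet of mtu bytes without
 * fragmentation. Returns 0 if the MTU cannot even hold the headers. */
static size_t max_udp_payload(size_t mtu, int is_ipv6)
{
    size_t overhead;

    assert(is_ipv6 == 0 || is_ipv6 == 1);
    overhead = (size_t)(is_ipv6 ? IPV6_HEADER : IPV4_HEADER_MIN) + UDP_HEADER;
    return mtu > overhead ? mtu - overhead : 0;
}
```

`max_udp_payload(1280, 1)` gives 1232, the budget for the ipv6 minimum mtu. on a plain 1500-byte ethernet path the figures are 1472 for ipv4 and 1452 for ipv6, but tunnels and vpns routinely shave that down, which is why sizing to the minimum is the conservative choice.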
tcp handles the common case well. udp is for the cases where tcp's guarantees are either unnecessary or actively harmful. "choosing a transport" puts the decision framework in writing.