status
this chapter is in active development
expect live edits and rapid iteration (except for when i am really busy with other stuff) while this material is written.
dns is the "first liar wins" protocol from RFC 1034/RFC 1035.
every request waits here until somebody hands back an ip, so stop pretending it is instant.
this is the first phase of the request. nothing else happens until dns answers.
your code yells getaddrinfo() and blocks.
libc checks /etc/nsswitch.conf to decide if /etc/hosts, mdns, ldap, or dns goes first.
then it reads /etc/resolv.conf (or /run/systemd/resolve/stub-resolv.conf) to find a recursive resolver.
a udp packet leaves on port 53.
no, your app does not retry; the stub does, following the timeout and attempts options in resolv.conf.
when that resolver ghosts you, the process just sits there looking foolish.
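a minimal sketch of that blocking call, in python rather than c: `socket.getaddrinfo` is a thin wrapper over the same libc routine, so it walks the same nsswitch.conf / hosts / resolv.conf path. "localhost" resolves from /etc/hosts, so this runs even with no network.

```python
import socket

# the same blocking call libc makes on your behalf: it does not
# return until an answer (or a failure) comes back.
results = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)

for family, type_, proto, canonname, sockaddr in results:
    # sockaddr is (ip, port) for ipv4, (ip, port, flow, scope) for ipv6
    print(family, sockaddr[0])
```

swap "localhost" for a real hostname and the call stalls for exactly as long as the resolver chain below takes to answer.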
step one: the recursive resolver hits its cache.
ttl enforcement lives here, so careless ttls become everyone's pain.
step two: a cache miss means walking from the root servers, to the tld servers, to the zone's authoritative servers, the iterative algorithm RFC 1034 section 5.3.3 spells out.
step three: the resolver stores the rrset with that ttl and replies.
if dnssec is on, RFC 4033-4035 force it to validate signatures before handing anything back.
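what actually leaves on port 53 in step zero is a tiny binary payload. a sketch of building one by hand, per the RFC 1035 wire format; the transaction id is fixed here for readability, a real stub randomizes it:

```python
import struct

def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """build the udp payload a stub sends to a resolver on port 53."""
    # header: id, flags (rd=1 asks for recursion), 1 question,
    # 0 answer / authority / additional records
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # qname: each label is length-prefixed; a zero byte terminates it
    qname = b"".join(bytes([len(l)]) + l.encode() for l in name.split(".")) + b"\x00"
    # qtype 1 = A, qclass 1 = IN
    return header + qname + struct.pack(">HH", qtype, 1)

pkt = build_query("example.org")
# to actually send it (needs a network, so left commented out):
# import socket
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(pkt, ("1.1.1.1", 53))
# reply = s.recv(512)  # classic udp dns answers fit in 512 bytes
```

29 bytes for the whole question. the answer coming back reuses the same format, plus the rrsets.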
zones dish out soa, ns, a/aaaa, cname, txt, and friends.
for this chapter we only need a/aaaa, but cdns still chain cnames because steering people around the planet is fun.
every cname means another lookup and more latency before tcp even wakes up.
short ttl values are not random; they are the steering wheel cdns use when they feel like moving your traffic tonight.
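the chain-chasing cost is easy to model. a toy sketch with made-up zone data (the names and ip are illustrative, from the documentation ranges): each cname hop is, in the worst case, another round trip before tcp can start.

```python
# toy zone data: name -> ("CNAME", target) or ("A", ip)
ZONE = {
    "www.example.org": ("CNAME", "cdn.example.net"),
    "cdn.example.net": ("CNAME", "edge.example-cdn.com"),
    "edge.example-cdn.com": ("A", "203.0.113.7"),
}

def resolve(name, max_chain=8):
    """follow cnames until an address record appears."""
    hops = 0
    while hops < max_chain:
        rtype, value = ZONE[name]
        if rtype == "A":
            return value, hops
        name = value  # cname: chase the target
        hops += 1
    raise RuntimeError("cname chain too long")

# resolve("www.example.org") -> ("203.0.113.7", 2)
```

real resolvers usually return the whole chain in one response when they can, but a cold cache against a lazy cdn setup can still cost you every one of those hops.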
operating systems cache answers (resolvectl statistics, formerly systemd-resolve --statistics, on linux; scutil --dns on macos).
browsers cache too, usually with their own ttl caps.
recursive resolvers may even serve stale data per RFC 8767.
great when an authoritative server dies, maddening when you are waiting for a new record to propagate.
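the cache logic itself is small. a toy sketch (the class name and shape are mine, not any resolver's actual internals) of ttl expiry plus the RFC 8767 serve-stale behavior:

```python
import time

class RRCache:
    """toy resolver cache: honors ttl, optionally serves stale
    answers (RFC 8767) when a refresh would otherwise fail."""

    def __init__(self):
        self._store = {}  # name -> (rrset, expires_at)

    def put(self, name, rrset, ttl):
        self._store[name] = (rrset, time.monotonic() + ttl)

    def get(self, name, serve_stale=False):
        entry = self._store.get(name)
        if entry is None:
            return None
        rrset, expires_at = entry
        if time.monotonic() < expires_at or serve_stale:
            return rrset
        return None  # expired: a real resolver re-queries upstream here

cache = RRCache()
cache.put("example.org", ["23.220.75.238"], ttl=125)
```

the serve_stale flag is why your "fixed" record sometimes keeps answering with the old ip long after the ttl said it should be gone.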
dig example.org
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 854
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;example.org. IN A
;; ANSWER SECTION:
example.org. 125 IN A 23.220.75.238
example.org. 125 IN A 23.215.0.132
example.org. 125 IN A 23.215.0.133
example.org. 125 IN A 23.220.75.235
;; Query time: 120 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
the header tells you status: NOERROR (it worked) and flags: qr rd ra (query response, recursion desired, recursion available).
the question section echoes what you asked for: an A record for example.org.
the answer section has four A records, each with a ttl of 125 seconds. that ttl is counting down from whatever the authoritative server originally set. when it hits zero, the resolver has to ask again.
Query time: 120 msec is how long your resolver took to answer. if this is high, either the cache was cold or your resolver is slow. SERVER: 127.0.0.53 means you are talking to systemd-resolved, not a real recursive resolver directly.
try dig +norecurse example.org to see what your resolver already cached. try dig @1.1.1.1 example.org to bypass your local resolver entirely and compare answers.
once you understand this output, the pause before tcp starts stops being mysterious.
it is just dns doing its slow, distributed phone book routine.