Out-of-order packets

Measuring an actual transfer of The Birth & Death of JavaScript with the packet capture tool Wireshark, I see a total of 61,807 packets received, each 1,432 bytes. Multiplying those two, we get 88.5 megabytes, which is the size of the video. (This doesn't include the overhead added by various protocols; if it did, we'd see a slightly higher number.)

The transfer was done over HTTP, a protocol layered over TCP, the Transmission Control Protocol. It only took 14 seconds, so the packets arrived at an average rate of about 4,400 per second, or about 250 microseconds per packet. In 14 seconds, my machine received all 61,807 of those packets, possibly out-of-order, reassembling them into the full file as they came in.

TCP packet reassembly is done using the simplest imaginable mechanism: a counter. Each packet is assigned a sequence number when it's sent. On the receiving side, the packets are put in order by sequence number. Once they're all in order, with no gaps, we know the whole file is present.

(Actual TCP sequence numbers tend not to be integers simply increasing by 1 each time, but that detail isn't important here.)

How do we know when the file is finished, though? TCP doesn't say anything about that; it's the job of higher-level protocols. For example, HTTP responses contain a "Content-Length" header specifying the response length in bytes. The client reads the Content-Length, then keeps reading TCP packets, assembling them back into their original order, until it has all of the bytes specified by Content-Length. This is one reason that HTTP headers (and most other protocols' headers) come before the response payload: otherwise, we wouldn't know the payload's size.

When we say "the client" here, we're really talking about the entire receiving computer. TCP reassembly happens inside the kernel, so applications like web browsers and curl and wget don't have to manually reassemble TCP packets. But the kernel doesn't handle HTTP, so applications do have to understand the Content-Length header and know how many bytes to read.

With sequence numbers and packet reordering, we can transmit large sequences of bytes even if the packets arrive out-of-order. But what if a packet is lost in transit, leaving a hole in the HTTP response?

This is one section of The Programmer's Compendium's article on Network Protocols, which contains more details and context.