Networking meets the real world

Digital systems don't exist; everything is analog.

Suppose we have a 5-volt CMOS system. (CMOS is a type of digital system; don't worry about it if you're not familiar.) This means that a fully-on signal will be 5 volts, and a fully-off signal will be 0 volts. But nothing is ever fully on or fully off; the physical world doesn't work like that. In reality, our 5-volt CMOS system will consider anything above 1.67 volts to be a 1, and anything below that to be a 0.

(1.67 is 1/3 of 5. Let's not worry about why the threshold is 1/3. If you want to dig, there's a Wikipedia article, of course! Also, Ethernet isn't CMOS or even related to CMOS, but CMOS and its 1/3 cutoff make for a simple illustration.)
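
To make that concrete, here's a minimal sketch of the receiver's decision, assuming the simplified single 1/3 threshold described above (`read_bit` is a made-up name for illustration, not a real API):

```python
def read_bit(voltage, vdd=5.0):
    # Simplified model from above: anything over 1/3 of the supply
    # voltage counts as a 1; anything under it counts as a 0.
    return 1 if voltage > vdd / 3 else 0

assert read_bit(4.9) == 1  # "fully on" is never exactly 5 volts
assert read_bit(1.7) == 1  # just over the 1.67-volt threshold
assert read_bit(1.6) == 0  # just under it
```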

Our Ethernet packets have to go over a physical wire, which means changing the voltage across the wire. Ethernet is a 5-volt system, so we naively expect each 1 bit in the Ethernet protocol to be 5 volts and each 0 bit to be 0 volts. But there are two wrinkles: first, the voltage range is -2.5 V to +2.5 V. Second, and more strangely, each set of 8 bits gets expanded into 10 bits before hitting the wire.
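
A quick sketch of that first wrinkle: the 5-volt swing is centered on zero, so a transmitter maps bits to voltages like this (again, a made-up function for illustration):

```python
def line_voltage(bit):
    # The 5-volt range runs from -2.5 V to +2.5 V, not from 0 V to
    # 5 V as we naively expected.
    return +2.5 if bit == 1 else -2.5
```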

There are 256 possible 8-bit values and 1024 possible 10-bit values, so imagine this as a table mapping them. Each 8-bit byte can be mapped to any of four different 10-bit patterns, each of which will be turned back into the same 8-bit byte on the receiving end. For example, the 10-bit value 00.0000.0000 might map to the 8-bit value 0000.0000. But maybe the 10-bit value 10.1010.1010 also maps to 0000.0000. When an Ethernet device sees either 00.0000.0000 or 10.1010.1010, it decodes both to the same byte, 0 (binary 0000.0000).
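
Here's a toy sketch of that table using the example values above. The mapping is invented for illustration; the real 8b/10b table is far larger and carefully designed:

```python
# Hypothetical fragment of the encode table: each byte maps to several
# 10-bit codewords (the real table differs, but the shape is the same).
ENCODINGS = {
    0b0000_0000: [0b00_0000_0000, 0b10_1010_1010],
}

# Invert it for the receiving side: every codeword decodes to its byte.
DECODE = {code: byte for byte, codes in ENCODINGS.items() for code in codes}

# Both codewords come back as the same byte 0.
assert DECODE[0b00_0000_0000] == DECODE[0b10_1010_1010] == 0b0000_0000
```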

(Warning: there are going to be some electronics words now.)

This exists to serve an extremely analog need: balancing the voltage in the devices. Suppose this 8-bit-to-10-bit encoding didn't exist, and we sent some data that happened to be all 1s. Ethernet's voltage range is -2.5 to +2.5 volts, so we'd be holding the Ethernet cable's voltage at +2.5 V, continually pulling electrons from the other side.

Why do we care about one side pulling more electrons than the other? Because the analog world is a mess, and it will cause all kinds of undesirable effects. To take one example: it can charge the capacitors used in low-pass filters, creating an offset in the signal level itself and eventually causing bit errors. That offset would take time to build up, but we don't want our network devices to suddenly start corrupting data after two years of uptime simply because we happened to send more binary 1s than 0s.

(Electronics words end here.)

By using an 8b/10b encoding, Ethernet can balance the number of 0s and 1s sent over the wire, even if we send data that's mostly 1s or mostly 0s. The hardware tracks the ratio of 0s to 1s, mapping outgoing 8-bit bytes to different options from the 10-bit table to achieve electrical balance. (Newer Ethernet standards, like 10-gigabit Ethernet, use different and more complex encoding systems.)
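
Here's a sketch of that balancing act, with an invented two-option table, since the real 8b/10b codewords and selection rules are more involved:

```python
def ones(code):
    # How many of the 10 bits are 1s.
    return bin(code).count("1")

# Toy table: each byte gets a "heavy" codeword (six 1s) and a "light"
# one (four 1s). The codewords here are made up for illustration.
ENCODINGS = {
    0x00: [0b11_0101_0011, 0b00_1010_1100],
    0xFF: [0b10_1101_1010, 0b01_0010_0101],
}

class BalancingEncoder:
    def __init__(self):
        self.disparity = 0  # (1s sent) minus (0s sent) so far

    def encode(self, byte):
        # Pick whichever codeword pulls the running count of 1s and 0s
        # back toward balance.
        best = min(ENCODINGS[byte],
                   key=lambda c: abs(self.disparity + 2 * ones(c) - 10))
        self.disparity += 2 * ones(best) - 10
        return best

enc = BalancingEncoder()
codes = [enc.encode(0x00) for _ in range(4)]  # alternates heavy, light, ...
assert enc.disparity == 0  # the wire stays balanced overall
```

Real hardware tracks this "running disparity" with stricter per-codeword rules, but the effect is the same: even a long run of identical bytes comes out electrically balanced.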

We'll stop here, because we're already beyond the scope of what can be considered programming, but there are many more ways that protocols accommodate the physical layer. In many cases, the solutions to hardware problems lie in the software itself, as with the 8b/10b encoding used to avoid DC offset. This is perhaps a bit disconcerting to us as programmers: we like to pretend that our software lives in a perfect Platonic world, devoid of the vulgar imperfections of physicality. In reality, everything is analog, and accommodating that complexity is everyone's job, including the software's.

This is one section of The Programmer's Compendium's article on Network Protocols, which contains more details and context.