<p>Destroy All Software Blog, by Gary Bernhardt (<a href="https://www.destroyallsoftware.com/blog">destroyallsoftware.com/blog</a>)</p>
<h1>A Case Study in Not Being A Jerk in Open Source</h1>
<p>2018-06-21</p>
<p><a href="http://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/">Here's a mailing list message</a> written by Linus Torvalds, original author and maintainer of the Linux kernel.
It's unnecessarily mean.
It also contains strong language, so probably don't put this on text-to-speech unless you want people around you to hear profanity.</p>
<p>The full email is reproduced below.
We're also going to go through it piece by piece afterward, so you can skip past this block quote if you don't mind seeing the email revealed gradually.</p>
<blockquote>
<p>Honestly, this looks questionable to me.</p>
<p>I'm not talking about the changes themselves - I can live with them.
But the _rationale_ is pure and utter garbage, and dangerously so.</p>
<p>The fact is, using a union to do type punning is the traditional AND STANDARD way to do type punning in gcc.
In fact, it is the *documented* way to do it for gcc, when you are a f*cking moron and use "-fstrict-aliasing" and need to undo the braindamage that that piece of garbage C standard imposes.</p>
<p>So the commit message that talks about how horrible union aliasing is is pushing a story that is simply wrong.
Using the standard to push it - the same standard that came up with the completely mis-guided aliasing rules - is not a valid argument.</p>
<p>Andy, what is the background for trying to push this idiocy?
Don't tell me "the C standard is unclear".
The C standard is _clearly_ bogus shit (see above on strict aliasing rules), and when it is bogus garbage, it needs to be explicitly ignored, and it needs per-compiler workarounds for braindamage.
The exact same situation is true when there is some lack of clarity.</p>
<p>This is why we use -fwrapv, -fno-strict-aliasing etc.
The standard simply is not *important*, when it is in direct conflict with reality and reliable code generation.</p>
<p>The *fact* is that gcc documents type punning through unions as the "right way".
You may disagree with that, but putting some theoretical standards language over the *explicit* and long-time documentation of the main compiler we use is pure and utter bullshit.</p>
<p>I've said this before, and I'll say it again: a standards paper is just so much toilet paper when it conflicts with reality.
It has absolutely _zero_ relevance.
In fact, I'll take real toilet paper over standards any day, because at least that way I won't have splinters and ink up my arse.</p>
<p>So I want to see actual real arguments, not "the standard is unclear".
When documented gcc behavior says one thing, and the standard might be unclear, we really don't care one whit about the lack of clarity in some standard.</p>
<p>So what's the _real_ reason for avoiding union aliasing?</p>
<p>There are competent people on standards bodies.
But they aren't _always_ competent, and the translation of intent to English isn't always perfect either.
So standards are not some kind of holy book that has to be revered.
Standards too need to be questioned.</p>
</blockquote>
<p>There's a lot of anger and frustration and profanity here: "bullshit", "f*cking moron", "piece of garbage", "splinters and ink up my arse", etc.
When programmers read emails like this, there are three main types of response:</p>
<ol>
<li>"This is bad. Nothing would be lost if Linus stopped doing this."
If you're in this group, this post won't contain anything new for you.</li>
<li>"This seems bad, but I'm afraid that something will be lost if he stopped doing this."
This blog post is written primarily for you.</li>
<li>"This isn't bad.
People should learn to tolerate this or get off the Internet."</li>
</ol>
<p>If you're in that third group, you might have noticed that your position is becoming less popular over time.
Perhaps you aren't bothered at all by people calling your work "utter bullshit", calling you or others "f*cking morons", etc.</p>
<p>As you've noticed, other people are bothered by this stuff.
If you insult people in professional interactions, you'll find yourself increasingly alienated and excluded simply because people don't like being insulted!
(Though in an ideal world you'd avoid insulting people because it makes them feel bad!)</p>
<p>Let's rewrite this email to be less mean without removing any of the content.
The full email will be reprinted here piece by piece.
We won't do much editing for grammar or meaning; we won't remove any points that are made in the original; and we won't add any exposition.
We're not trying to make it perfect.</p>
<p><strong>First</strong>:</p>
<blockquote>
<p>Honestly, this looks questionable to me.</p>
<p>I'm not talking about the changes themselves - I can live with them.
But the _rationale_ is pure and utter garbage, and dangerously so.</p>
</blockquote>
<p>The word "honestly" here doesn't mean "I am about to tell you the truth."
It's the other meaning of "honestly": an expression of exasperation and disapproval, as in "honestly, did you think you were going to get away with that?"
There's no reason to include that, or the "pure and utter garbage", or even the "I can live with them".
All of those just mean "I dislike this."</p>
<p>The point of the email is that the rationale for the code change in question is wrong, and even dangerous.
We can just say:</p>
<blockquote>
<p>These changes look OK, but I'm not sure about the rationale.</p>
</blockquote>
<p><strong>Next</strong>:</p>
<blockquote>
<p>The fact is, using a union to do type punning is the traditional AND STANDARD way to do type punning in gcc.
In fact, it is the *documented* way to do it for gcc, when you are a f*cking moron and use "-fstrict-aliasing" and need to undo the braindamage that that piece of garbage C standard imposes.</p>
<p>So the commit message that talks about how horrible union aliasing is is pushing a story that is simply wrong.
Using the standard to push it - the same standard that came up with the completely mis-guided aliasing rules - is not a valid argument.</p>
</blockquote>
<p>We probably don't need to talk about "f*cking moron".
The caps in "AND STANDARD" is another way to indicate frustration, like the "honestly" above.
Likewise for "the fact is", the emphasis on "*documented*", "braindamage", "piece of garbage", and "pushing a story that is simply wrong".
None of these carry any meaning about the technical problem; they're just expressions of anger.
"Horrible" is a reference to what someone else said, so we'll leave it in.</p>
<p>Including all of this anger doesn't help the project or the reader.
It takes up space and it makes the reader feel bad.
We can argue that it "helps" people on Reddit who read the thread and think it's funny.
But if you want to read people yelling about how bad code is, there are blogs for that; it doesn't need to be done on the mailing list for one of the most important software projects in the world.</p>
<p>Most critically, it's possible to describe the danger of Andy's rationale without the yelling:</p>
<blockquote>
<p>Unions are the traditional and standard way to do type punning in gcc.
They're also the documented way to do it for gcc when you use "-fstrict-aliasing" and need to undo the behavior imposed by the C standard.</p>
<p>I think that the commit message that talks about how horrible union aliasing is is wrong.
Justifying this with the standard doesn't work because the standard specifies undesirable behavior to begin with.</p>
</blockquote>
<p>I left the "is wrong" in.
A lot of people would leave this out, but sometimes things really are wrong, and sometimes it's important.
Given Linus' tone, he clearly thinks that it's important here, and I have tremendous faith in Linus as a C programmer.</p>
<p>I added "I think that..." because Linus acknowledges the possible existence of justifications that don't appeal to the C standards.
He even asks Andy to provide those justifications, which we'll see soon.
An unqualified "this is wrong" is inconsistent with that, so there's no need to use that harsher form.</p>
<p><strong>Next</strong>, Linus yells at Andy directly:</p>
<blockquote>
<p>Andy, what is the background for trying to push this idiocy?
Don't tell me "the C standard is unclear".
The C standard is _clearly_ bogus shit (see above on strict aliasing rules), and when it is bogus garbage, it needs to be explicitly ignored, and it needs per-compiler workarounds for braindamage.
The exact same situation is true when there is some lack of clarity.</p>
</blockquote>
<p>None of that serves any purpose; we already said that the standard specifies dangerous behavior that we don't want.
We can drop all of this.</p>
<p><strong>Next</strong>:</p>
<blockquote>
<p>This is why we use -fwrapv, -fno-strict-aliasing etc.
The standard simply is not *important*, when it is in direct conflict with reality and reliable code generation.</p>
<p>The *fact* is that gcc documents type punning through unions as the "right way".
You may disagree with that, but putting some theoretical standards language over the *explicit* and long-time documentation of the main compiler we use is pure and utter bullshit.</p>
<p>I've said this before, and I'll say it again: a standards paper is just so much toilet paper when it conflicts with reality.
It has absolutely _zero_ relevance.
In fact, I'll take real toilet paper over standards any day, because at least that way I won't have splinters and ink up my arse.</p>
</blockquote>
<p>The first paragraph here is a very strong argument, and it manages not to berate anyone!
After that, Linus is back on his bullshit, both literally and figuratively.
We can cut most of it, collapsing the second paragraph into one sentence and removing the third paragraph completely:</p>
<blockquote>
<p>This is why we use -fwrapv, -fno-strict-aliasing etc.: if the standard conflicts with reliable code generation, the standard loses.
GCC documents type punning through unions as the "right way", which I agree with.
You may disagree, but if so you need to provide a justification beyond theoretical standards language.</p>
</blockquote>
<p>That call to action at the end is important: if Andy disagrees, he should make his argument, but without appealing blindly to the standard.
In the original, that call to action gets lost in all of the "bullshit"s and *emphasis* and _other emphasis_ and OTHER EMPHASIS and toilet paper.</p>
<p><strong>Next</strong>:</p>
<blockquote>
<p>So I want to see actual real arguments, not "the standard is unclear".
When documented gcc behavior says one thing, and the standard might be unclear, we really don't care one whit about the lack of clarity in some standard.</p>
</blockquote>
<p>This is just repetition; we already said that reliable code generation is more important than the standard.
We can drop that paragraph.</p>
<p><strong>Next</strong>:</p>
<blockquote>
<p>So what's the _real_ reason for avoiding union aliasing?</p>
<p>There are competent people on standards bodies.
But they aren't _always_ competent, and the translation of intent to English isn't always perfect either.
So standards are not some kind of holy book that has to be revered.
Standards too need to be questioned.</p>
</blockquote>
<p>Suddenly Linus flips back from ranting to sensible advice.
I like the "what's the real reason?" question, because it reiterates Linus' request for justification.
If Andy doesn't want to follow GCC's recommendations on aliasing, he needs to provide an actual argument, not a simple appeal to authority.
We'll leave the repetition in because this is both the core of the email and a genuine question that Linus is asking.
I took a bit more editorial liberty here than in the sections above because this part is important:</p>
<blockquote>
<p>There are competent people on standards bodies.
But they make mistakes, and the translation of intent to English isn't always perfect.
In this case, the standard conflicts with reality, so we ignore the standard.
With that in mind, is there another, practical reason for avoiding union aliasing?</p>
</blockquote>
<p><strong>Now</strong>, here's the fully rewritten email:</p>
<blockquote>
<p>These changes look OK, but I'm not sure about the rationale.</p>
<p>Unions are the traditional and standard way to do type punning in gcc.
They're also the documented way to do it for gcc when you use "-fstrict-aliasing" and need to undo the behavior imposed by the C standard.</p>
<p>I think that the commit message that talks about how horrible union aliasing is is wrong.
Justifying this with the standard doesn't work because the standard specifies undesirable behavior to begin with.</p>
<p>This is why we use -fwrapv, -fno-strict-aliasing etc.: if the standard conflicts with reliable code generation, the standard loses.
GCC documents type punning through unions as the "right way", which I agree with.
You may disagree, but if so you need to provide a justification beyond theoretical standards language.</p>
<p>There are competent people on standards bodies.
But they make mistakes, and the translation of intent to English isn't always perfect.
In this case, the standard conflicts with reality, so we ignore the standard.
With that in mind, is there another, practical reason for avoiding union aliasing?</p>
</blockquote>
<p>This is a much better email.
It has 43% as many words, but loses none of the meaning.
It's still forceful and unambiguous.
With fewer words, it's easier for someone to absorb the core message about unthinking deference to standards.</p>
<p>It also doesn't berate anyone, and so it doesn't build a needlessly antagonistic culture around the project.
Writing this email instead of the original email doesn't require any extra work, and will save mileage on Linus' (or your) fingers besides.</p>
<p>If you were in the "I'm afraid that being nicer would hurt Linux" group, do you think that this email is worse?
Is there any risk of a reader not understanding that the author disapproves of their reasoning and thinks that it's dangerous?</p>
<h1>The Biggest and Weirdest Commits in Linux Kernel Git History</h1>
<p>2017-02-12</p>
<p>We normally think of git merges as having two parent commits.
For example, the most recent Linux kernel merge as I write this is commit 2c5d955, which is part of the run-up to release 4.10-rc6.
It has two parents:</p>
<pre><code class="text">2c5d955 Merge branch 'parisc-4.10-3' of ...
|
*- 2ad5d52 parisc: Don't use BITS_PER_LONG in use ...
*- 53cd1ad Merge branch 'i2c/for-current' of ...
</code></pre>
<p>Git also supports octopus merges, which have more than two parents.
This seems strange for those of us who work on smaller projects: wouldn't a merge with three or four parents be confusing?
Well, it depends.
Sometimes, a kernel maintainer needs to merge dozens of separate histories together at once.
Having 30 merge commits, one after another, would be more confusing than a single 30-way merge, especially if that 30-way merge was conflict-free.</p>
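<p>For the curious, <code>git merge</code> accepts more than one branch at a time, and merging several at once is what produces an octopus. A minimal sketch, assuming a repo with non-conflicting branches named a, b, and c (git refuses to build an octopus that needs manual conflict resolution):</p>
<pre><code># Merge three branches at once; git picks the octopus strategy automatically.
system("git", "merge", "a", "b", "c", "-m", "Merge branches a, b, and c") or
  raise "merge failed"

# The new commit has four parents: the old HEAD plus the three branches.
puts `git log -1 --pretty=%P`.split.count  # => 4
</code></pre>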
<p>Octopuses are more common than you might expect.
There are 649,306 commits in the kernel's history.
46,930 (7.2%) are merges.
Of the merges, 1,549 (3.3%) are octopus merges.
(This is as of commit 566cf87, which is my current HEAD.)</p>
<pre><code class="text">$ git log --oneline | wc -l
649306
$ git log --oneline --merges | wc -l
46930
$ git log --oneline --min-parents=3 | wc -l
1549
</code></pre>
<p>As a comparison point, 20% of all Rails commits are merges (12,401 out of 63,111), but it has <em>zero</em> octopus merges.
Rails is probably more representative of the average project; I expect that most git users don't know that octopus merges are even possible.</p>
<p>Now, the obvious question: how big do these octopus merges get?
The ">" lines here are continuations; the command is written in five lines total.
All of the commands in this post are as I typed them into the terminal while experimenting, so they're not necessarily easy to read.
I'm more interested in the conclusions and include code only for the curious.</p>
<pre><code class="text">$ (git log --min-parents=2 --pretty='format:%h %P' |
> ruby -ne '/^(\w+) (.*)$/ =~ $_; puts "#{$2.split.count} #{$1}"' |
> sort -n |
> tail -1)
66 2cde51f
</code></pre>
<p>66 parents!
That's a lot of parents.
What happened?</p>
<pre><code class="text">$ git log -1 2cde51f
commit 2cde51fbd0f310c8a2c5f977e665c0ac3945b46d
Merge: 7471c5c c097d5f 74c375c 04c3a85 5095f55 4f53477
2f54d2a 56d37d8 192043c f467a0f bbe5803 3990c51 d754fa9
516ea4b 69ae848 25c1a63 f52c919 111bd7b aafa85e dd407a3
71467e4 0f7f3d1 8778ac6 0406a40 308a0f3 2650bc4 8cb7a36
323702b ef74940 3cec159 72aa62b 328089a 11db0da e1771bc
f60e547 a010ff6 5e81543 58381da 626bcac 38136bd 06b2bd2
8c5178f 8e6ad35 008ef94 f58c4fc4 2309d67 5c15371 b65ab73
26090a8 9ea6fbc 2c48643 1769267 f3f9a60 f25cf34 3f30026
fbbf7fe c3e8494 e40e0b5 50c9697 6358711 0112b62 a0a0591
b888edb d44008b 9a199b8 784cbf8
Author: Mark Brown <[email redacted for privacy]>
Date: Thu Jan 2 13:01:55 2014 +0000
Merge remote-tracking branches [65 remote branch names]
</code></pre>
<p>This broke some history visualization tools, provoking a <a href="http://marc.info/?l=linux-kernel&m=139033182525831">reaction</a> from Linus Torvalds:</p>
<blockquote>
<p>I just pulled the sound updates from Takashi, and as a result got your merge commit 2cde51fbd0f3. That one has 66 parents.</p>
<p>[...]</p>
<p>It's pulled, and it's fine, but there's clearly a balance between "octopus merges are fine" and "Christ, that's not an octopus, that's a Cthulhu merge".</p>
</blockquote>
<p>From what I can see, this unusual 66-parent commit was an otherwise mundane merge of various changes to the ASoC code.
ASoC stands for ALSA System on Chip.
ALSA is the sound subsystem; "system on a chip" is a term for a computer packed into a single piece of silicon.
Putting those together, ASoC is sound support for embedded devices.</p>
<p>Now, how often do merges like this happen?
Never!
The second-place merge is fa623d1 with "only" 30 parents.
However, the large distance from 30 to 66 parents isn't surprising with sufficient context.</p>
<p>The number of parents for a git commit is probably distributed according to a fat one-sided distribution (often informally called a power law distribution, but that's usually not strictly correct for reasons that aren't interesting here).
Many properties of software systems fall into fat one-sided distributions.
Hold on; I'll generate a plot to be sure... (much nitpicking of chart layout ensues).
Yes, it's fat and one-sided:</p>
<p><img class="full_width" src="/images/blog/parents-per-commit-in-linux-kernel-log-log.svg" /></p>
<p>To be terse and coarse about it, "fat one-sided" means that there are <em>far</em> more small things than large things, but also that the maximum size of the things is unbounded.
The kernel contains 45,381 two-parent merges, but only one 66-parent merge.
Given enough additional development history, we can expect to see a merge with more than 66 parents.</p>
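<p>If you're curious about the shape of the raw data, tabulating the whole distribution takes only a few lines of Ruby (a sketch; like the one-liner above, it shells out to git and assumes you're inside a kernel checkout):</p>
<pre><code>counts = Hash.new(0)

# Bucket every commit by parent count: 0 is a root commit, 1 is a normal
# commit, 2 is an ordinary merge, and 3+ is an octopus.
`git log --pretty=%P`.each_line do |parents|
  counts[parents.split.count] += 1
end

counts.sort.each { |n, c| puts "#{n} parents: #{c} commits" }
</code></pre>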
<p>Lines of code per function or per module are also fat and one-sided (most functions and modules will be small, but some will be large; think of a "User" class in a web app).
Likewise for the rate of change for modules (most modules will change infrequently, but some will change constantly; think of "User" again).
These distributions pop up everywhere in software, appearing as straight lines on log-log plots like this one.</p>
<p>OK, so much for the biggest merge in terms of parent count.
What about the biggest merge in terms of divergence?
By divergence, I mean the difference between the two branches being merged.
We can measure that by simply diffing the merge's parents against each other and counting the lines in the diff.</p>
<p>For example, if a branch diverged from master a year ago, changed one line, and then was merged back into master, all of the changes to master during that time would be counted, as would the changes on our branch.
We can come up with more intuitive notions of divergence, but they're difficult or impossible to calculate because git doesn't retain branch metadata.</p>
<p>In any case, as a starting point for calculating divergence, here's the divergence for the most recent kernel merge:</p>
<pre><code class="text">$ git diff $(git log --merges -1 --pretty='format:%P') | wc -l
173
</code></pre>
<p>In English, this command reads: "diff the two parents of the most recent merge against each other, then count the lines."
To find the most-diverged merges, we can loop through every merge commit, counting the number of diff lines in a similar way.
Then, as a test, we'll search the results for all merges with exactly 2,000 lines of divergence.</p>
<pre><code class="text">$ (git log --merges --pretty="%h" |
while read x; do
echo "$(git diff $(git log --pretty=%P $x -1) | wc -l)" $x
done > merges.txt)
$ sort -n merges.txt | grep '\b2000\b'
2000 3d6ce33
2000 7fedd7e
2000 f33f6f0
</code></pre>
<p>(This command takes a long time to run: around twelve hours, I think, though I was away for much of it.)</p>
<p>I expect merge size to follow a fat one-sided distribution, just like the parent counts did.
It should show up as a straight line on a log-log plot.
Let me check... yep:</p>
<p><img class="full_width" src="/images/blog/merges-per-diff-length-log-log.svg" /></p>
<p>(I've binned the diff sizes by rounding them into 1,000-line buckets; otherwise there aren't enough samples to form a useful curve.)</p>
<p>The bottom right is ugly partly due to quantization and partly due to small sample sizes caused by a lack of huge commits, as with the previous plot.</p>
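<p>For the curious, the binning itself is just a few more lines of Ruby (a sketch, reading the merges.txt file generated above):</p>
<pre><code># Round each merge's diff length into a 1,000-line bucket, then count.
bins = Hash.new(0)
File.foreach("merges.txt") do |line|
  diff_lines = line.split.first.to_i
  bins[diff_lines / 1000 * 1000] += 1
end
bins.sort.each { |bucket, count| puts "#{bucket} #{count}" }
</code></pre>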
<p>Now, the obvious question: what's the most-diverged merge in history?</p>
<pre><code class="text">$ sort -n merges.txt | tail -1
22445760 f44dd18
</code></pre>
<p>22,445,760 lines of diff!
This seems impossibly large – the diff is longer than the entire source code of the kernel.</p>
<p>Greg Kroah-Hartman made this commit on September 19, 2016, during development of 4.8-rc6.
Greg is one of Linus Torvalds' "lieutenants" – his close, trusted developers.
Roughly speaking, lieutenants form the first level of the kernel's pull request tree.
Greg maintains the stable branch of the kernel, the driver core, the USB subsystem, and several other subsystems.</p>
<p>We need a bit of background before examining this merge more closely.
Normally, we think of merges as part of a diamond branch-then-merge pattern:</p>
<pre><code class="text"> A
/ \
B C
\ /
D
</code></pre>
<p>Back in 2014, Greg started development on Greybus (<a href="https://lwn.net/Articles/648400/">a bus for mobile devices</a>) in a fresh repo, as if he were starting a totally new project.
Eventually, development on Greybus was finished, and it was merged into the kernel.
But, because it was started in a fresh repo, it shared <em>no history</em> with the rest of the kernel source.
That merge added another "initial commit" to the kernel, in addition to the commit back in 2005 that we normally think of as the initial commit.
Instead of the usual diamond branch-and-merge pattern, the repo now had two separate initial commits:</p>
<pre><code class="text"> A
/ \
B C
</code></pre>
<p>We can see some evidence of this by looking at how many files exist in each of the merge commit's parents:</p>
<pre><code class="text">$ git log -1 f44dd18 | grep 'Merge:'
Merge: 9395452 7398a66
$ git ls-tree -r 9395452 | wc -l
55499
$ git ls-tree -r 7398a66 | wc -l
148
</code></pre>
<p>One side has a lot of files because it contains the entire kernel source.
The other contains few because it's a separate history containing only Greybus.</p>
<p>Like octopus merges, this will strike some git users as strange.
But the kernel developers are expert git users and tend to use its features with abandon, though certainly not <em>reckless</em> abandon.</p>
<p>One final question: how many times has this happened?
How many separate "initial" commits does the kernel have?
Four, as it turns out:</p>
<pre><code class="text">$ git log --max-parents=0 --pretty="format:%h %cd %s" --date=short
a101ad9 2016-02-23 Share upstreaming patches
cd26f1b 2014-08-11 greybus: Initial commit
be0e5c0 2007-01-26 Btrfs: Initial checkin, basic working tree code
1da177e 2005-04-16 Linux-2.6.12-rc2
</code></pre>
<p>Just to be clear, if we drew these commits, ignoring all other history, it would look like the graph below.</p>
<pre><code class="text">566cf87 (the current HEAD)
| | | |
| | | *- a101ad9 Share upstreaming patches
| | |
| | *- cd26f1b greybus: Initial commit
| |
| *- be0e5c0 Btrfs: Initial checkin, basic working tree code
|
*- 1da177e Linux-2.6.12-rc2
</code></pre>
<p>Each of these four is a distant ancestor of the current kernel HEAD, and <em>none of them has a parent commit</em>.
From git's perspective, the kernel history "begins" four different times, with all of those eventually being merged together.</p>
<p>The first of these four (at the bottom of our output) is what we usually think of as the initial commit, made when kernel development moved to git back in 2005.
The second is the btrfs filesystem, which was developed in isolation.
The third is Greybus, also done in isolation, which we already saw.</p>
<p>The fourth initial commit, a101ad9, is weird.
Here it is:</p>
<pre><code class="text">$ git show --oneline --stat a101ad9
a101ad9 Share upstreaming patches
README.md | 2 ++
1 file changed, 2 insertions(+)
</code></pre>
<p>It just creates a file README.md.
But then, it's <em>immediately</em> merged into the normal kernel history in commit e5451c8!</p>
<pre><code class="text">$ git show e5451c8
commit e5451c8f8330e03ad3cfa16048b4daf961af434f
Merge: a101ad9 3cf42ef
Author: Laxman Dewangan <ldewangan@nvidia.com>
Date: Tue Feb 23 19:37:08 2016 +0530
</code></pre>
<p>Why would someone create a new initial commit containing a two-line README file, then immediately merge it into the mainline history?
I can't come up with any reason, so I suspect that this was an accident!
It doesn't do any harm, though; it's just very strange.
(Update: it was <a href="http://lkml.iu.edu/hypermail/linux/kernel/1603.2/01926.html">an accident</a>, which Linus responded to in his usual fashion.)</p>
<p>Incidentally, this is also the second-most-diverged merge in the history, simply because it's a merge of an unrelated commit, just like the Greybus merge that we looked at more closely.</p>
<p>There you have it: some of the weirdest things in the Linux kernel's git history.
There are 1,549 octopus merges, one of which has 66 parents.
The most heavily diverged merge has 22,445,760 lines of diff, though it's a bit of a technicality because it shares no history with the rest of the repo.
The kernel has four separate "initial" commits, one of which was a mistake.
None of this will show up in the vast majority of git repos, but all of it is well within git's design parameters.</p>
<p>(If you liked this post, you might like the <a href="/screencasts">Destroy All Software</a> screencasts, several of which build up the kinds of complex shell commands seen in this post, methodically and piece by piece.
"<a href="https://www.destroyallsoftware.com/screencasts/catalog/history-spelunking-with-unix">History Spelunking With Unix</a>" is especially relevant.)</p>
<h1>Announcing The Programmer's Compendium</h1>
<p>2017-02-08</p>
<p>As part of my <a href="/blog/2016/state-of-das-december-2016">plan</a> to branch into media beyond screencasts, <a href="/compendium">The Programmer's Compendium</a> launches today.
There's one article right now, <a href="/compendium/types?share_key=baf6b67369843fa2">Types</a>, which serves as an overview of static and dynamic type system issues for those who aren't already experts in that topic.</p>
<p>Six months ago, I did an informal <a href="https://twitter.com/garybernhardt/status/766101643175665666">poll</a> about what people find confusing and got a few hundred replies.
"Strong/weak" came in second, which is why I began with this Types article.
My goal for the compendium is to go down the list, more or less, trying to explain these things as frankly as possible.
(Though that's been the goal of everything that I've done with Destroy All Software.)</p>
<p>The compendium is a reference work, in that the articles are written to stand alone.
Each article is made of sections that can also stand alone and are individually linkable.
However, the articles are also written to be read beginning-to-end, with the sections forming a progression from basic issues to more complex and subtle ones.</p>
<p>Destroy All Software is a for-profit company and has made up the majority of my income for the last five years, so the compendium won't be free.
However, it also won't be simply buried behind a paywall.
I've implemented a scheme that should strike a middle ground.</p>
<p>When DAS subscribers view the articles, the URLs will contain keys specific to their account.
They can give those links to anyone, including posting them in public forums.
Anyone clicking the link will be able to view the article, whether or not they're subscribed or even logged in.
(The keys are randomly generated when subscribers visit articles, so no one outside of DAS can take a key and determine the user who generated it.)</p>
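<p>The scheme itself is simple. Here's a sketch of the idea (hypothetical code, not DAS' actual implementation): mint a random value the first time a subscriber views an article, and store it against that subscriber/article pair.</p>
<pre><code>require "securerandom"

# Hypothetical stand-in for a database table of (user, article) => key rows.
SHARE_KEYS = {}

def share_key_for(user_id, article_slug)
  # Purely random, so a key can't be traced back to the user who generated it.
  SHARE_KEYS[[user_id, article_slug]] ||= SecureRandom.hex(8)
end

share_key_for(42, "types")  # => a key like "baf6b67369843fa2"
</code></pre>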
<p>My first goal here is to provide a reference work to demystify confusing topics for my subscribers.
However, I have a second goal that's a bit more personal.
I hope that compendium articles will be dropped into Internet arguments, ending back-and-forths where people are simply talking past each other.</p>
<p>For example, programmers often argue about "strong" and "weak" types, which are extremely ambiguous and have several different meanings.
Usually, these arguments happen because neither side realizes that they've each chosen one of many possible (but conflicting) definitions.
Any DAS subscriber will be able to load up the Types section on <a href="/compendium/strong-and-weak-typing?share_key=6b0dd1ec18ab6102">Strong and weak typing</a>, then drop the link into that discussion.
Hopefully, that stops a fruitless argument.</p>
<h1>State of DAS: December 2016</h1>
<p>2016-12-15</p>
<p>Destroy All Software's original series of 90 screencasts was published from March 2011 until March 2013.
As far as I know, it was the first subscription-only programming screencast product, and it worked!
This was good.</p>
<p>Eventually, I got tired: both from the fixed screencast release schedule and from conference speaking.
I was publishing once per week, then later once every other week. I averaged one conference talk per month at my peak, which was far too much for me.
DAS was suspended in March 2013.</p>
<p>I then spent three years writing 1.5 books, but published none of it.
This was less good, but I learned some things.</p>
<p>DAS relaunched in August 2016.
I converted a section of one unpublished book into a <a href="https://www.destroyallsoftware.com/screencasts/catalog/introduction-to-computation">screencast series on Computation</a>, explained using code rather than the traditional mathematical terminology and notation.
This was good.</p>
<p>The computation series is now almost done.
There's one more screencast left, which I expect to release around the new year.
I need to decide what comes next.
Unlike the original DAS run, I want to plan the future a bit; I want to decide where I'm going, rather than simply finding myself somewhere.</p>
<h2>Live Streaming</h2>
<p>In September, I did four live streams to test the waters.
My favorites were writing a text editor and a compiler from scratch in about an hour each, which got great responses from viewers.
The videos were watched around 10,000 times, if I remember correctly, but they're now down because Twitch videos expire eventually.
I expect to reprise these topics eventually.</p>
<p>My next experiment is going to be live streaming in earnest, probably once per week to start.
I've never written about my DAS recording process in detail, but it's basically a live-streaming process already.
I record the full screencast, end to end, over and over again, until it meets my standards.
What you watch on DAS is effectively a live recording, with very light editing to remove the parts where I swallow water and so on.</p>
<p>True live streaming means that little mistakes will sneak in, something viewers say they miss from the earlier DAS screencasts.
Live streams are also more "organic", for lack of a better term; you and I both know that there's no safety net, and we both seem to like it.</p>
<p>I expect to initially focus on building tools from scratch to understand how they work, taking about an hour each.
My topic list is already 57 topics long, but here are a few that I'm especially excited by: "let's build a Zip-like file compressor"; "let's implement the IP and UDP protocols"; "let's write a software-only 3D renderer"; etc., all from scratch.</p>
<p>Unfortunately, I can't actually start doing this yet.
My new recording studio is still being built in Spain, with delivery in late February assuming no further setbacks.
This, plus time needed to set up the studio and my personal travel plans, mean that I probably can't start streaming in earnest until April.</p>
<h2>The Programmer's Compendium</h2>
<p>What do I publish from early January until mid April?
I could do another screencast series; four months is the right length of time.
However, I burned out on screencasting the first time and I don't want to do that again.
Fortunately, I have another idea.</p>
<p>In August, I was frustrated by finding yet another simplistic set of statements like "types are a waste of time", "types ensure program correctness", etc.
(It was probably on Hacker News, but I've since forgotten.)
I wrote <a href="https://gist.github.com/garybernhardt/122909856b570c5c457a6cd674795a9c">3,400 words on the topic</a>, both as an introduction and as justification for people who don't see why types matter.</p>
<p>That document is meant to be read through end-to-end, which is what most people did, but it's also meant as a reference.
You can return to it when someone uses a term that you've forgotten, or makes a claim about types (pro or con) that seems like an oversimplification.
It was very popular (~1000 gist stars, ~200 Hacker News comments, etc.), but I never formally published it anywhere.</p>
<p>I'd like to introduce a collection of documents like this as a new section of DAS.
Its working title is The Programmer's Compendium.
(I have the domain as well; sorry, squatters!)
It tackles ideas one at a time, covers them broadly but not deeply, and focuses on the parts that confuse people who aren't experts in the topic.</p>
<p>Each article is broken into subsections corresponding to common misunderstandings.
For example, if you see someone overgeneralize about static type systems, you can just link them to the <a href="https://gist.github.com/garybernhardt/122909856b570c5c457a6cd674795a9c#user-content-diversity-of-static-type-systems">Diversity of Static Type Systems</a> section rather than arguing eight replies deep on an orange website.</p>
<p>My top candidate topics right now are TDD, functional programming, and state management, each of which gets jumbled up in public discussion, just like types do.</p>
<h2>Should I do it?</h2>
<p>The timeline of these changes is roughly:</p>
<ul>
<li>Final computation screencast publishes around January 1, 2017.</li>
<li>Compendium articles publish from January until mid-April 2017 (as my recording studio is built, shipped, assembled, and tweaked).</li>
<li>Live streams begin happening (and are archived in 4k here at DAS) starting in mid-to-late April 2017.</li>
</ul>
<p>This is a big change, but I think that variety is necessary to stop me from burning out again.
I also want to explore and advance these media for their own sake.</p>
<p>I'll email subscribers whenever major changes like this happen.
And, even for subscribers who are caught unawares, DAS' unconditional refund policy for the most recent month's charge will remain in effect, as always.
However, I'd also like to hear what you think in advance.</p>
<p>My question for current (or past, or aspiring) subscribers is: will you follow DAS through these format changes?
Are you vexed, flummoxed, or disconcerted by the notion of DAS changing media on this approximately-quarterly schedule?
What if all of these media were interleaved into a single mixed publishing schedule, so that DAS alternated between compendium articles, screencasts, and live streams on a biweekly or monthly basis?</p>
<p>You can email <a href="mailto:support@destroyallsoftware.com">DAS' support address</a> or just <a href="https://twitter.com/garybernhardt">tweet at my personal Twitter account</a> with your thoughts; either works.
(I read every email seriously, though I may not have time to reply to every email when I solicit broad feedback like this.)</p>
<h1>Test Isolation Is About Avoiding Mocks</h1>
<p>2014-05-15</p>
<p>Isolated testing has an easily identified villain: the deeply-nested mock object.
Around 21:35 in <a href="https://www.youtube.com/watch?v=z9quxZsLcfo">this discussion</a>, Kent Beck mentions code with "mocks returning mocks returning mocks" and their stifling effect on refactoring.
Kent is right: nested mocks make refactoring and maintenance difficult.
They're usually a bad idea.</p>
<p>The clarification that's missing from most discussions of mocking, including the one in that video, is that experienced users of mocks rarely nest them deeply.
Avoiding numerous or deeply nested mocks is the principal design activity of isolated TDD.
Since none of the people in the video isolate or use mocks heavily, it's unsurprising that no one brought this up.</p>
<p>Years ago, when I was new to all of this, I did some truly terrible mocking.
There's even a Destroy All Software <a href="https://www.destroyallsoftware.com/screencasts/catalog/the-mock-obsession-problem">screencast</a> where I show some of that code as a cautionary tale.
The mocks were numerous, deep, and coupled deeply into objects' internals.
Refactoring always meant rewriting many deep mocks in tests all over the system.
I began my test isolation journey by doing it very, very badly.</p>
<p>I'll be merciful and leave that code out of this post; here's a much less painful method to think about.
It computes the average value of all of a user's past transactions, which are related to the user via purchase records.
(Some transactions, like refunds, don't correspond to a purchase, which is why there are two tables instead of one.)</p>
<pre><code>def average_transaction_amount(user)
purchases = user.purchases
total = purchases.map(&:transaction).map(&:amount).inject(&:+)
total / purchases.count
end
</code></pre>
<p>This method has deep knowledge about the database's structure.
It knows that users have many purchases; that purchases have one transaction; and that transactions have amounts.
In the absence of significant mitigating factors, this is not good design.
Other parts of the system will inevitably need to access, for example, a purchase's transaction.
This deep structural knowledge will be duplicated.</p>
<p>When the schema changes, there will be a significant maintenance cost.
That cost will be paid in the form of effort spent updating the duplicated points, and in risk of missing one of the updates.
(Database schemas are only slightly special; this situation wouldn't change significantly if we were coupled to plain old object relationships.)</p>
<p>To tackle one small piece of the problem, we can remove knowledge of the purchase-to-transaction relationship.
We might add an <code>amount</code> method to the purchase, with that method delegating to the associated transaction:</p>
<pre><code>class Purchase < ActiveRecord::Base
delegate :amount, to: :transaction
end
</code></pre>
<p>With this change, code that needs a purchase's amount (like the <code>average_transaction_amount</code> method above) doesn't have to know of, and couple to, the transaction's existence.
The <code>map(&:transaction)</code> call disappears from the call chain.
Coupling is reduced.</p>
<p>This isn't motivated by some abstract desire to reduce the number of dots in a method.
It's motivated by reduced maintenance costs: fewer updates to the code, and fewer potential mistakes, when the purchase-to-transaction relationship changes.</p>
<p>Similar extractions can be done to remove any of the steps in the original method chain while leaving the rest of the chain intact.
The obvious refactoring to <code>has_many :through</code> is missing, but it's just another example of the same idea: it reduces coupling and improves the design.
After hundreds of small adjustments like this, a system will be much easier to change.</p>
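<p>For reference, the <code>has_many :through</code> version mentioned above would look something like this (a sketch, using the models from the example):</p>
<pre><code>class User < ActiveRecord::Base
  has_many :purchases
  # Reach transactions through purchases, so callers never see the path.
  has_many :transactions, :through => :purchases
end
</code></pre>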
<p>When testing the <code>average_transaction_amount</code> method in isolation, we have very little flexibility in test structure; that's how isolation works.
If we want to isolate strictly, removing all dependencies at test time, then we have to write something like:</p>
<pre><code>it "computes average transaction amounts" do
user = stub(:purchases => [
stub(:transaction => stub(:amount => 40.0)),
stub(:transaction => stub(:amount => 80.0)),
])
average_transaction_amount(user).should == 60.0
end
</code></pre>
<p>My hope is that this is at least somewhat hard to follow.
Despite being hard to follow, it's characteristic of tests that many people write.
But is it <em>good</em>?</p>
<p>No.
It's bad test design.
Those nested stubs are telling us something about the method under test: it reaches deep into its <code>user</code> argument.
The code under test can only traverse data that the test creates for it, so deep traversal of objects in the production code leads to deeply nested mocks in the tests.
This is true <em>even if the deep traversal isn't otherwise obvious</em>.</p>
<p>In a hundred-line class with a dozen methods, the object traversal is often spread across many methods, each of which traverses only one level.
That class is deeply, but invisibly, coupled to its collaborators.
A glance at the isolated test tells us this, but getting that information from the code would require a slow, careful reading.</p>
<p>This is a large part of the claim that isolated tests drive better design.
Because an isolated test must set up all collaborators as mocks, the only way to reduce the mock complexity is to reduce the depth of collaboration.
If we mock three levels deep, and do nothing to reduce that mock depth, then we're spending the effort to isolate without getting the benefit.
If, however, we change the design to reduce coupling, then the mock depth will also be reduced, and isolation will earn its keep.</p>
<p>Deeply nested mocks tell us little about mocks, just as deeply nested conditionals tell us little about structured programming.
Proficient users of structured programming rarely write deeply nested conditionals; proficient users of mocks rarely write deeply nested mocks.</p>
<p>In both mocking and structured programming, highly nested structure is a visual indication of a potential design problem.
In both cases, the problem is hard to see otherwise because it's only expressed in implicit relationships between non-sequential lines of code.
Control flow complexity is hard to gauge when we have to trace GOTOs; coupling is hard to gauge when we have to trace calls.
Structured programming clarifies control flow; isolated tests clarify coupling.</p>
<p>If the primary design benefit of isolated unit testing is the mocks' visualization of interactions, can isolated tests be obsoleted by sufficiently advanced analysis and visualization tools?
This is equivalent to asking: Is merely seeing the design problem sufficient?</p>
<p>Consider function size.
It's always visible at a glance.
Most of us think that small functions are better, yet hundred-plus line functions are common.
Even those of us who like small functions write large ones and regret it.
Why don't we react to the clearly-visible size of these functions?
Why don't we decompose them?</p>
<p>The reason is that there's no pressure exerted on us.
Writing large functions or classes requires less typing than writing many small functions or classes.
It's easy to be lazy and give in to the false economy of adding just one more line to a function.
With a suite of integration tests, adding one more line to an existing function rarely matters to the test because the test sees few of the system's internal boundaries.</p>
<p>An isolated test for a large, complicated component requires <em>much</em> more effort than a test for a small one.
I've been doing this for a long time, relatively speaking, but isolating a nontrivial fifty-line function with ten collaborators would be so annoying that I probably wouldn't even attempt it outside of extreme situations.
The cognitive burden of isolation grows very quickly when a function has more than a few collaborators or a couple levels of chained method calls.</p>
<p>The cost of isolating complex code counteracts the desire to be lazy and avoid extracting a new method or a new class.
Choosing to isolate is a conscious choice to augment our programming with disciplined structure and visualization, just as choosing structured programming is.
It reins in the size of functions and classes: it will frequently force us to either decompose our large components or give up on isolated testing.
<p>To make that concrete, the code below repeats the original method and its test, this time assuming that we've extracted an <code>amount</code> method on the purchase, as was shown above:</p>
<pre><code>def average_transaction_amount(user)
purchases = user.purchases
total = purchases.map(&:amount).inject(&:+)
total / purchases.count
end
it "computes average transaction amounts" do
user = stub(:purchases => [stub(:amount => 40.0),
                           stub(:amount => 80.0)])
average_transaction_amount(user).should == 60.0
end
</code></pre>
<p>That small change makes a big difference in test readability.
The production code has slightly less coupling to the database schema, but the test is <em>much</em> more intelligible as a result.
Design isn't just reflected in isolated test setup; it's magnified.
Isolated tests are a microscope for object interaction.</p>
<p>When I wrote mock-obsessed tests in that Python system years ago, I was easily averaging two mocks per test, and three sounds very likely.
Fast forward to the 2010s: Destroy All Software's Rails app has 197 tests, with 99 total mocks used.
(Those are 83 stub objects and 16 mock expectations.)
79% of DAS' tests use no mocks or stubs whatsoever, but a small number of tests use several.</p>
<p>Many of those mock-free tests are on models, which I test in integration with the database.
(I use skinny controllers and skinny models, which is far too subtle to explain here.
My model design practices are explained in a pair of screencasts: <a href="https://www.destroyallsoftware.com/screencasts/catalog/what-goes-in-active-records">1</a>, <a href="https://www.destroyallsoftware.com/screencasts/catalog/what-goes-in-active-records-part-2">2</a>.)</p>
<p>As an example of a more mock-heavy test, DAS' ChargePurchase class coordinates the payment processor, the user model, the credit card failure model (they're logged for auditing), the credit card, the mailer, and some constants, ultimately producing an order object.
It's about fifty lines long with only three branches, all concerned with handling downstream errors.
Most of its work is gluing pieces together to express the process of charging.</p>
<p>Despite doing little work itself, ChargePurchase has nine collaborators: six classes referenced by name and three arguments passed in.
Most of those collaborators get stubbed (<a href="http://martinfowler.com/articles/mocksArentStubs.html">mocks aren't stubs</a>, but I'm being sloppy with my terminology in this post to match common use).
None of the stubs are nested, but that's still quite a bit of setup.
I tolerate the unusual level of test setup complexity here because I like having the whole purchase-charging process centralized, allowing me to read through it linearly.</p>
<p>ChargePurchase shows that test setup pain is only part of my design heuristic.
Sometimes, as in this case, listening to it too closely will lead to code that's less understandable, so I don't listen.
Mock setup exposes coupling, remember; not cohesion or other design properties.
We're still on our own for those.</p>
<p>My use of mocks has changed significantly over time: even DAS' mock counts would be much lower if I built it today.
First, most of its controllers are unit tested, sometimes with partial mocking.
Today, I'd write thin controllers that are only tested at the system level.
(Like so many of these points, I discuss that in a screencast with <a href="https://www.destroyallsoftware.com/screencasts/catalog/web-apps-when-to-test-in-isolation">Web Apps: When to Test in Isolation</a>.)</p>
<p>Second, DAS' objects call each other whenever it's immediately convenient for them.
Today, I'd extract a functional core wherever it was natural, testing the core in isolation with no test doubles at all.
(The <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> screencast is about that.)</p>
<p>I doubt that anything is stubbed three levels deep, although I haven't done a rigorous audit.
Maybe there are a couple of system boundaries where I did a bad job in the early days and stubbed deeply.
In any case, it's rare for something to be stubbed even two levels deep.
When I spend the effort to isolate with mocks, I'm doing it because I want to listen to the design feedback, not fight it.</p>
<p>In addition to avoiding nested mocks, I've been using fewer over time, even when I'm writing isolated tests.
The old Python system I mentioned had multiple mocks per test on average.
Early DAS code written under time pressure averages around one mock per test.
Later DAS code is under half a mock per test.
Moving into late 2013, all of <a href="https://github.com/garybernhardt/selecta">Selecta</a>'s logic is tested in isolation with <em>no test doubles at all</em>.
(That's <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> again.)</p>
<p>This post was triggered by Kent's comment about triply-nested mocks.
I doubt that he intended to claim that mocking three levels deep is inherent to, or even common in, isolated testing.
However, many others have proposed exactly that straw man argument.
That argument misrepresents isolated testing in order to discredit it: it presents deep mocks, which isolated testing done well avoids, as fundamental to the practice.
This fallacy is at the root of the claim that mocking inherently makes tests fragile and refactoring difficult.
That's very true of deep mocks, but not very true of mock-based isolation done well, and certainly isn't true of isolation done without mocks.</p>
<p>In a very real sense, isolated testing done test-first exposes design mistakes before they're made.
It translates coupling distributed throughout the module into mock setup centralized in the test, and it does that before the coupling is even written down.
With practice, that preemptive design feedback becomes internalized in the programmer, granting some of the benefit even when not writing tests.
There may be other paths to that skill, but I'm still learning from my tests after seven years of isolating around 50% of the time.
This path also happens to produce a trail of <a href="https://www.destroyallsoftware.com/blog/2014/tdd-straw-men-and-rhetoric">sub-millisecond tests</a> fully covering every component designed using it, which is alright with me.</p>
<h1>One Base Class to Rule Them All</h1>
<p>2011-09-05</p>
<p>Last week, I released <a href="https://github.com/garybernhardt/base">base</a>, a universal Base class for Ruby. Inheriting from it gives your class every instance method, class method, and constant from every module and class in the system. In a Rails app, it results in:</p>
<pre><code class="text">>> Base.new.methods.count
=> 6947
</code></pre>
<p>Let's see just how powerful this is. We'll start with a subclass of Base:</p>
<pre><code class="text">class Cantaloupe < Base
end
</code></pre>
<p>This class has the methods—all of them. Like, for example, <code>size</code>:</p>
<pre><code class="text">>> p Cantaloupe.new.size
=> 0
</code></pre>
<p>The size of the cantaloupe is zero. Fine. But you might be wondering, <em>where</em> did that <code>size</code> method come from? There are so many in Ruby. Is it <code>Hash#size</code>? <code>Array#size</code>? <code>Enumerable#size</code>? <code>String#size</code>?</p>
<p>The answer is: maybe. But the real answer is: this is Ruby! We just call methods and things happen! We don't have to worry about it! Isn't this great?!</p>
<p>You can call any method you want! Like, say, <code>days</code> from ActiveSupport:</p>
<pre><code class="text">>> Cantaloupe.new.days
TypeError: can't convert nil into String
from base.rb:78:in `new'
from base.rb:78:in `instantiate_regardless_of_argument_count'
from base.rb:76:in `each'
from base.rb:76:in `instantiate_regardless_of_argument_count'
from base.rb:71:in `call_instance_method'
from base.rb:47:in `call_method'
from base.rb:43:in `each'
from base.rb:43:in `call_method'
from base.rb:37:in `method_missing'
from (irb):3
</code></pre>
<p>Wait... why didn't that work? Whatever, it's not important. We're Ruby programmers; we have ways around errors! Like this:</p>
<pre><code class="text">>> Cantaloupe.new.days rescue 5
=> 5
</code></pre>
<p>See? A cantaloupe has five days. Don't worry about it. It's fine.</p>
<p>You could even ask the <code>Cantaloupe</code> to <code>find_by_email</code>. Why worry about which model it will find? You'll find <em>something</em> by email; that's for sure!</p>
<p>In addition to all of this method-calling convenience, Base also leads to superior object oriented design. Remember the Law of Demeter? The one that says "talk to your friends, but not your friends' friends"? With Base, all other classes are your friends! You can call any of those 6,947 methods while complying with the Law of Demeter!</p>
<p>Since I'm sure you're itching to <code>gem install</code> it, base is available <a href="https://github.com/garybernhardt/base">on GitHub</a>. Its README contains more details, examples, and praise from the community. Enjoy!</p>
<p>(Disclaimer: believe it or not, I quite like Ruby. In fact, most of Destroy All Software's screencasts use it. If you're interested in <em>actually</em> building good software—doing the opposite of everything in this post—you might <a href="/screencasts">enjoy them</a>.)</p>
<h1>Announcing do_not_want</h1>
<p>2011-08-19</p>
<p>It's August 19th, <a href="http://whyday.org/">Why Day</a>, where we take some crazy idea too far. My contribution is <code>do_not_want</code>, a gem that disables several core ActiveRecord methods.</p>
<p>ActiveRecord ships with several methods that skip validations, callbacks, or both. For example, <code>decrement</code>, <code>decrement!</code>, and <code>decrement_counter</code> all skip one or both of these, but no two skip the same set.</p>
<p>I've found that even experienced Rails developers often don't know this or can't remember which methods skip what. That scares the crap out of me. Ideally, I'd like to see the syntax for calling these methods indicate their danger. But, there's a much more immediate solution: just disable the methods!</p>
<p><code>do_not_want</code> redefines the dangerous methods in ActiveRecord to raise errors when you call them:</p>
<pre><code>>> User.delete 5
DoNotWant::NotSafe: User.delete isn't safe because it skips callbacks
</code></pre>
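<p>The mechanism is ordinary Ruby method redefinition. A minimal sketch of the idea (hypothetical; not the gem's actual source):</p>
<pre><code>module DoNotWant
  NotSafe = Class.new(StandardError)

  # A few of ActiveRecord's validation- and callback-skipping class methods.
  UNSAFE_CLASS_METHODS = {
    :delete     => "it skips callbacks",
    :delete_all => "it skips callbacks",
    :update_all => "it skips validations and callbacks",
  }

  def self.included(model)
    UNSAFE_CLASS_METHODS.each do |name, reason|
      model.define_singleton_method(name) do |*args|
        raise NotSafe, "#{model.name}.#{name} isn't safe because #{reason}"
      end
    end
  end
end
</code></pre>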
<p>But it's also smart, and will let gems (including Rails itself) call the original methods. This prevents it from breaking third party code, while still preventing your app from doing anything dangerous.</p>
<p>You can find it <a href="https://github.com/garybernhardt/do_not_want">on GitHub</a> or <code>gem install</code> it. As of today's release (0.0.1), <code>require 'do_not_want'</code> will monkey patch the changes onto ActiveRecord::Base. Simply adding it to your Gemfile and running your test suite will probably result in many new errors, as <code>do_not_want</code> rejects your attempts to call unsafe methods.</p>
<p>Whether you actually want that or not is another question, of course!</p>
Continuous Automated Performance Testingtag:destroyallsoftware.com,2011-06-27:continuous-automated-performance-testing2011-06-27T00:00:00Z
<p>In <a href="http://blog.extracheese.org/2010/05/talk-video-a-brief-history-of-bitbacker-a-startup.html">BitBacker</a>, I had serious performance requirements: if backups started taking 10% longer, I needed to know. I certainly wasn't going to notice from my day to day development because I wasn't doing large, multi-hour backup runs manually.</p>
<p>My solution was an automated system that iterated over the repository history, running a set of performance scripts for each revision. It then plotted how long each benchmark took at each revision, producing fancy plots like this (an actual screenshot taken in 2008):</p>
<p><img src="/images/blog/inquisitor.png" alt="Graph of various BitBacker performance metrics over time"></p>
<p>Each colored line is a particular performance metric, with runtime on the vertical axis and commits on the horizontal. The higher a data point, the slower the metric was for that revision. If you look closely, you'll also see error bars generated by running each metric multiple times. This ensured that the plots weren't lying to me.</p>
<p>The red line is how long a snapshot with 1,000 files takes. You can see that it got far slower at one point. Fortunately, seeing this graph made it obvious that I'd made a mistake, and I was able to fix it quickly. By the next sampled revision (ten actual revisions later), it was fixed.</p>
<p>This graph is also useful for verifying improvements. Before the red line gets slow, you can see a few big dips. These were periods of careful performance optimization, and this graph allowed me to remain objective about every performance improvement I introduced. It also kept me from becoming too focused on one metric to the detriment of another. If I optimized large backups, but accidentally made empty incremental backups slow, the graph would let me know.</p>
<p>BitBacker also had automated metrics for peak memory usage. Memory usage can be even trickier than runtime: you don't generally watch it while using the app, and the really nasty memory situations come from large, quick bursts of usage. These are even harder to gauge than runtime problems, but the performance graphs made them clear.</p>
<p>I've done development consulting on many projects since then, none of which had this type of analysis in place. Every time, I've wished that we were doing it, especially on the larger projects. I highly recommend the practice: it will keep performance regressions out of the hands of your users, and it will remove yet another source of stress from your development process.</p>
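<p>The harness itself doesn't have to be elaborate. Here's a minimal sketch of the revision-walking loop in Ruby, assuming a git repository and a <code>bench.rb</code> script to time; the stride, run count, and file names are illustrative, and BitBacker's version was more involved:</p>
<pre><code>SAMPLE_EVERY = 10  # benchmark every tenth revision
RUNS_PER_REV = 3   # multiple runs per revision give us error bars

revisions = `git rev-list --reverse HEAD`.split

revisions.each_slice(SAMPLE_EVERY) do |slice|
  rev = slice.first
  system("git checkout --quiet #{rev}") or abort "checkout failed"

  times = (1..RUNS_PER_REV).map do
    start = Time.now
    system("ruby bench.rb")
    Time.now - start
  end

  mean = times.reduce(:+) / times.size
  File.open("timings.csv", "a") do |f|
    f.puts "#{rev},#{mean},#{times.min},#{times.max}"
  end
end

# (Remember to check your working branch back out afterward.)
</code></pre>
<p>Feed the resulting CSV to gnuplot or your plotting tool of choice and you get a graph like the one above.</p>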
<p>(If you'd like to try this yourself, Destroy All Software has a 13-minute <a href="/screencasts/catalog/quick-and-easy-perf-tests">screencast</a> on the topic, in which we build a basic version from scratch with a few lines of RSpec and shell scripts.)</p>
Burn Your Controllerstag:destroyallsoftware.com,2011-06-20:burn-your-controllers2011-06-20T00:00:00Z
<p>In my experience, well-designed Rails controllers tend to do these things:</p>
<ol>
<li>Delegate to SomeClass in lib.</li>
<li>Choose which SomeClass to delegate to based on higher-level state: is the user logged in? Is he an admin?</li>
<li>Choose where to send the user (to a template or to a redirect) based on what SomeClass did when we delegated to it.</li>
</ol>
<p>Here's an example of (2): a conditional delegation to SomeClass based on higher-level state:</p>
<pre><code>class PostsController
  def edit
    if current_user.admin?
      @post = Post.find_by_id(params[:id])
    else
      render :nothing => true, :status => 403
    end
  end
end
</code></pre>
<p>This action decides whether the user is an admin. If he is, it delegates to SomeClass (the Post model) to find the post and implicitly renders the "edit" view. Otherwise, it responds with HTTP 403 Forbidden.</p>
<p>Here's an example of (3): routing out to different responses based on the result of the delegation:</p>
<pre><code>class SubscriptionsController
  def create
    begin
      SubscribesUsers.subscribe!(params)
    rescue CardAuthorizationFailure
      render :new
    else
      redirect_to catalog_index_path
    end
  end
end
</code></pre>
<p>I've used a plain-old Ruby object here instead of a model to illustrate that delegation isn't about models. I've also used an exception to indicate an expected failure. This isn't standard Rails practice, but we'll see why I did it in a moment.</p>
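<p>For concreteness, here's a sketch of what <code>SubscribesUsers</code> might look like: a plain class that raises on the expected failure. The <code>PaymentGateway</code> and its API are invented for this sketch.</p>
<pre><code>CardAuthorizationFailure = Class.new(StandardError)

class SubscribesUsers
  def self.subscribe!(params)
    user = User.create!(params[:user])
    # PaymentGateway is hypothetical; substitute your payment library.
    charge = PaymentGateway.charge(user, params[:card])
    raise CardAuthorizationFailure unless charge.authorized?
    user
  end
end
</code></pre>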
<p>Imagine that all controllers are structured in these ways: they can choose where to delegate, and they can choose where to go after delegating, but all other domain-specific work is left to the delegate itself.</p>
<p>My original point (2), delegating based on state like logged in/out, sounds like routing. Perhaps I should be able to route based on these states, as well as which URL was requested.</p>
<p>My original point (3), choosing where to send the user, also sounds like routing. Perhaps I should be able to declare success/failure renders directly in the routes.</p>
<p>With those two changes, controllers only handle point (1), delegation. Every controller action becomes:</p>
<pre><code>class MyController
  def my_action
    SomeClass.some_method
  end
end
</code></pre>
<p>In other words, it doesn't need to exist. The question is, how do we declaratively specify this richer routing so that it's actually more readable than the controllers we write today?</p>
<p>Cramming it into Rails-style routes certainly seems reasonable at first glance:</p>
<pre><code>resource :posts do
  # Admins can update posts.
  put :update => "UpdatesPosts#update!", :require => :admin
  # Other people fall through and can't update posts.
  put :update, :render => :forbidden
end
</code></pre>
<p>This is equivalent to the following much-more-verbose Rails code, which uses a standard route directing to a standard controller. The controller makes a decision about the user, then either renders or delegates.</p>
<pre><code># routes.rb
resource :posts, :only => [:update]

# posts_controller.rb
class PostsController
  before_filter :authenticate_user!

  def update
    if current_user.admin?
      UpdatesPosts.update!(current_user)
    else
      render "forbidden", :status => 403
    end
  end
end
</code></pre>
<p>I like the route version a lot better. It condenses that entire controller class into one extra declarative line that says exactly what it means.</p>
<p>But what about routing on the result of the method we delegated to? Here's a possible syntax for error cases: we delegate and, if it raises a particular error, we redirect to another place.</p>
<pre><code>get :show => "ShowsPosts#show",
    NotPublishedError => redirect_to(:index)
</code></pre>
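<p>As before, that single route would stand in for a conventional controller along these lines (the delegate's interface and the redirect target are assumed from the route above):</p>
<pre><code>class PostsController
  def show
    ShowsPosts.show(params)
  rescue NotPublishedError
    redirect_to :action => :index
  end
end
</code></pre>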
<p>Newbies would probably find this routing syntax even more confusing than standard Rails routing because it's conceptually dense. But I'm not a newbie (if I may be so bold), and I'm tired of writing the same controller code over and over again. I want my interaction with the web framework to be more declarative, freeing me to focus on views and plain-old Ruby objects that implement the application's logic.</p>
<p>I don't know whether this exact scheme is a good idea, but it smells pretty good to me so far. Geoffrey Grosenbach has <a href="http://blog.peepcode.com/tutorials/2010/rethinking-rails-3-routes">written about this topic</a> as well, but coming from the other side. His approach would be more compatible with larger controllers; mine is more compatible with heavy delegation. I think that either would be an improvement: the route-controller distinction seems to be losing steam.</p>
<p>(If you're interested in the idea of moving application logic into plain old classes, you might like Destroy All Software's <a href="/screencasts/catalog/extracting-domain-objects">Extracting Domain Objects</a> screencast.)</p>