Structure of ubermodules

Ubermodules in application designs tend to draw one of two reactions. The first is that they're fine: the application is about users uploading versions of Ruby gems, so of course User, Rubygem, and Version are referenced everywhere. This is a common-sense argument, and it's apt in many cases.
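One rough way to see how "everywhere" these models are is to count the source files that mention each name. The sketch below assumes grep's `-r`, `-l`, `-w`, `-F`, and `--include` options, which GNU and BSD grep support but POSIX does not require. `reference_counts` is a hypothetical helper, and a whole-word text match is only a crude stand-in for a real module graph.

```shell
# Hypothetical helper: for each name given, count the .rb files under a
# directory that mention it as a whole word, then sort by that count.
# A textual match is only an approximation of real references.
reference_counts() {
  dir=$1
  shift
  for name in "$@"; do
    # -r recurse, -l list matching files, -w whole word, -F fixed string
    n=$(grep -rlwF --include='*.rb' -e "$name" "$dir" | wc -l | tr -d ' ')
    echo "$n $name"
  done | sort -n
}
```

Run from an application's root, `reference_counts app User Rubygem Version` would print one "count name" line per model, least-referenced first.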

Small applications and some medium-sized ones (5,000 lines of Ruby, say) do fine with this kind of design. Many 2,000-line applications could even be written as a single long source file without suffering much. In a simple system, there's just not enough complexity to cause big problems.

More complex systems are a different story: the argument against ubermodules gets stronger as the system grows. Seeing why requires us to dig into other structural properties of the system.

From experience, I can confidently predict two facts about rubygems.org. First, all three of User, Version, and Rubygem will be among the top five most-changed source files. Second, the same will be true for file length: these three files will be among the five longest source files.

We'll check my predictions with some shell commands. The first lists each source file along with the number of commits that changed it, then prints the five most-changed files. (It's OK to skip the command itself if it doesn't interest you. If you do read it, note that lines starting with > are continuations of the command, split to fit on the screen.)

$ find app -name '*.rb' |
> while IFS= read -r f; do
>   echo "$(git log --oneline -- "$f" | wc -l) $f"
> done |
> sort -n |
> tail -5
      87 app/models/pusher.rb
      92 app/helpers/rubygems_helper.rb
     111 app/models/user.rb
     235 app/models/version.rb
     300 app/models/rubygem.rb
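The loop above runs `git log` once per file, which can be slow on a long history. A minimal sketch of a single-pass alternative follows, assuming a POSIX shell run inside the repository's work tree; `top_changed` is a hypothetical helper name, not a git command. Its counts may differ slightly from the loop's (for example, it also lists files that have since been deleted), and like the loop it does not follow renames.

```shell
# Hypothetical helper: count commits touching each .rb file under a
# directory in one pass over history, then show the N most-changed files.
# Assumes the current directory is inside a git work tree.
top_changed() {
  dir=${1:-app}   # directory to scan
  n=${2:-5}       # number of files to show
  # --format= suppresses commit headers; --name-only lists changed paths
  git log --format= --name-only -- "$dir" |
    grep '\.rb$' |
    sort |
    uniq -c |
    sort -n |
    tail -n "$n"
}
```

Running `top_changed app 5` from the repository root should print the same shape of output as the loop: a commit count followed by a path.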

My "in the top five" guess was too conservative! The three most heavily referenced modules in the module graph (User, Version, and Rubygem) are also the three most frequently changed source files. Now for the second prediction: the ubermodules will be among the longest source files. This command lists each source file with its length in lines, limited to the five longest.

$ find app -name '*.rb' |
> while IFS= read -r f; do
>   wc -l "$f"
> done |
> sort -n |
> tail -5
137 app/models/pusher.rb
142 app/models/concerns/rubygem_searchable.rb
159 app/models/user.rb
318 app/models/rubygem.rb
372 app/models/version.rb

Again, I was too conservative: the three ubermodules are also the three longest source files in the entire application.

These two facts will be true in most applications; it seems like a natural law of software development. In software systems, the most highly connected modules are also the most frequently changed and the longest in terms of lines.
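To look at both measurements together, a sketch along these lines prints the commit count and line count for every source file, sorted by commit count. It assumes a POSIX shell inside a git work tree; `churn_and_size` is a hypothetical name, and commit count is only one way to measure how often a file changes.

```shell
# Hypothetical helper: print "commits lines path" for each .rb file under a
# directory, sorted by commit count, so churn and size can be compared.
# Assumes the current directory is inside a git work tree.
churn_and_size() {
  dir=${1:-app}
  find "$dir" -name '*.rb' |
    while IFS= read -r f; do
      # tr strips the padding some wc implementations add
      commits=$(git log --oneline -- "$f" | wc -l | tr -d ' ')
      lines=$(wc -l < "$f" | tr -d ' ')
      echo "$commits $lines $f"
    done |
    sort -n
}
```

Piping the result through `tail -5`, as in `churn_and_size app | tail -5`, would show whether the most-changed files are also among the longest.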

This is one section of The Programmer's Compendium's article on Software Structure, which contains more details and context.