Package granularity

During their rise to popularity, the Ruby and Rails ecosystems were thought to take package decomposition too far. Both Rails users and Rails detractors would lament projects depending on 100 or 200 or, for large applications, perhaps 500 packages. Ten years later, react-starter-kit, a JavaScript application starter kit, has 1,090 dependencies but no actual user-facing behavior. By comparison, a newly generated Rails 4.2.8 project depends on 55 packages: the 38 shown in the graph above, plus 17 more included as default, but optional, dependencies.
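For concreteness, here's one way such a count can be taken on the Node side. This is a minimal sketch, not the methodology behind the numbers above; it assumes an npm v7+ `package-lock.json`, where every installed package, direct or transitive, appears under the top-level `packages` key.

```typescript
// count-deps.ts: a sketch that counts installed packages recorded in an
// npm v7+ lockfile (an assumption; older lockfiles use a different layout).
import { readFileSync } from "fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));

// The "" entry is the root project itself, so it's excluded from the count.
const installed = Object.keys(lock.packages ?? {}).filter((key) => key !== "");

console.log(`${installed.length} installed packages`);
```

Counts like this include every transitive package, which is why they dwarf the list of direct dependencies in a project's package.json.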

Comparing react-starter-kit to Rails isn't exactly apples to apples; no comparison between large projects can be. Still, it's illustrative of the trend: no project in the Ruby world has 1,090 dependencies, and even simple Node modules have more dependencies than all of Rails combined. Package granularity is unambiguously different between these two ecosystems.

It's difficult to draw any conclusion about package granularity from where we stand now. Maybe it will keep increasing. A Rails project's 100 or 500 packages were once ridiculed but now look conservative. Will a React project's 1,000 or even 5,000 packages eventually be uncontentious or even conservative? Maybe, but there are reasons to be skeptical.

If a dependency becomes unmaintained or introduces breaking changes, then Rails and its many users are impacted. As a result, Rails' maintainers have to keep most or all of the dependency graph shown above in their heads. Likewise, the maintainers of Rails' dependencies know at least one small corner of the Rails dependency graph: they know that Rails depends on them, so they usually avoid sudden changes that will break Rails.

There have been hints of long-term dependency problems even in the Rails of today. For example, Rails' asset pipeline is implemented by the Sprockets Ruby gem. Sprockets' main developer was responsible for more than 70% of its commits but left abruptly in 2016. Rails also depends on erubis, an implementation of the ERB templating language; as of 2017, erubis had not seen a commit in six years.

If these projects were internal to Rails, they would presumably have more contributors and less susceptibility to abandonment. Rails' authors saved time up front by reusing outside packages, but they also exposed themselves to package abandonment risk. Increasing package decomposition, as in Node, increases this abandonment risk; there are simply more maintainers involved. A larger dependency graph also means more dependencies of dependencies – transitive dependencies – so there may be a third-party package "between" us and an unmaintained package.
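To make "between" concrete, here's a minimal sketch with an invented three-package chain: `app` never names `templating` directly, yet still depends on it through `framework`.

```typescript
// A sketch with invented package names. Each key lists a package's
// direct dependencies; "templating" plays the unmaintained package.
const deps: Record<string, string[]> = {
  app: ["framework"],
  framework: ["templating"], // "framework" sits between us and the problem
  templating: [],            // unmaintained, two hops from "app"
};

// Everything reachable from a package: its transitive dependency set.
function transitiveDeps(pkg: string, seen = new Set<string>()): Set<string> {
  for (const dep of deps[pkg] ?? []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      transitiveDeps(dep, seen);
    }
  }
  return seen;
}

console.log([...transitiveDeps("app")]); // [ "framework", "templating" ]
```

If `templating` is abandoned, `app`'s authors can't simply swap it out: `framework`, not `app`, decides which templating package gets used.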

This raises a question: can't we just fork the unmaintained packages? Yes, strictly speaking, but the devil is in the details. What if a dependency of a dependency of a dependency has a severe security bug, but multiple packages between us and that package are unmaintained? We have to fork them all, learning several packages' guts, each written in its author's own style, and modifying each fork to use our fork of the next package in the chain.

If we do fork and publish our own security-fixed chain of these packages, other people will begin using them as soon as we publish them. We just became maintainers of code that we don't actually know. If we eventually abandon those forks, we make the community's abandonment problem even worse. Now everyone who used our forked versions is relying on an unmaintained set of packages modified by someone who didn't understand them well.

In short, "just fork it" is usually wishful thinking; it ignores the real complexities of the situation.

On the bright side, smaller packages certainly mean less impact from any given direct dependency going unmaintained. We may be able to use that fact to mitigate the downsides of Node-style fine-grained packaging.

Suppose that we avoid deep dependency chains, preferring shallow but wide dependency graphs. A wide and shallow graph means fewer packages between us and any other package, leaving fewer opportunities for an unmaintained package to sit several hops away from us. There may be other pitfalls in that approach, and we may not even try it, but there are at least potential mitigations to the downsides of small packages.
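As a sketch of the difference, consider the same invented packages arranged two ways. Depth here is the longest chain of dependencies below a package.

```typescript
// A sketch contrasting invented graph shapes.
type Graph = Record<string, string[]>;

// Longest chain of dependencies below a package (assumes no cycles).
function depth(graph: Graph, pkg: string): number {
  const children = graph[pkg] ?? [];
  if (children.length === 0) return 0;
  return 1 + Math.max(...children.map((child) => depth(graph, child)));
}

// Deep and narrow: a fix to "d" must flow through "c" and then "b".
const deep: Graph = { app: ["b"], b: ["c"], c: ["d"], d: [] };

// Wide and shallow: the same packages as direct dependencies, so no
// third-party package ever sits between "app" and a fix.
const wide: Graph = { app: ["b", "c", "d"], b: [], c: [], d: [] };

console.log(depth(deep, "app")); // 3
console.log(depth(wide, "app")); // 1
```

In the shallow graph, replacing an unmaintained package is a one-line change to our own manifest rather than a chain of forks.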

Like everything else in software, package ecosystem design is primarily the act of balancing graph complexity at different scales. The big question right now is: how large should nodes be? Packaging is a social process that plays out globally, which complicates the issue, but some of the fundamental questions are the same as for functions or modules. How connected are the nodes? Are there ubernodes with inordinate numbers of incoming connections? Are there cycles, and how many nodes are involved? Etc.
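Two of those questions are mechanical enough to sketch against an invented graph: finding ubernodes by in-degree, and detecting cycles with a depth-first search.

```typescript
// A sketch over an invented dependency graph.
type Graph = Record<string, string[]>;

const graph: Graph = {
  a: ["util"],
  b: ["util"],
  c: ["util", "a"],
  util: [],
};

// In-degree: how many packages depend directly on each node. A node with
// an inordinate in-degree is an ubernode.
function inDegrees(g: Graph): Map<string, number> {
  const counts = new Map<string, number>();
  for (const node of Object.keys(g)) counts.set(node, 0);
  for (const deps of Object.values(g)) {
    for (const dep of deps) counts.set(dep, (counts.get(dep) ?? 0) + 1);
  }
  return counts;
}

// Cycle detection: a depth-first search that tracks the current path; a
// dependency already on the path means a cycle.
function hasCycle(g: Graph): boolean {
  const onPath = new Set<string>();
  const done = new Set<string>();
  const visit = (node: string): boolean => {
    if (onPath.has(node)) return true;
    if (done.has(node)) return false;
    onPath.add(node);
    const cyclic = (g[node] ?? []).some((dep) => visit(dep));
    onPath.delete(node);
    done.add(node);
    return cyclic;
  };
  return Object.keys(g).some((node) => visit(node));
}

console.log(inDegrees(graph)); // "util" has in-degree 3: an ubernode
console.log(hasCycle(graph));  // false; nothing depends on itself, even indirectly
```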

This is one section of The Programmer's Compendium's article on Software Structure, which contains more details and context.