Code is mostly graphs

The design of a system lies in (1) its units and (2) the connections between those units. For example, imagine an OrderHistory module summarizing users' order histories. It will relate Users to Transactions, so it must reference both of them in some way:

Nodes (ovals) here represent classes or modules, and edges (arrows) represent function or method call relationships. Some method in OrderHistory calls a method in User; some other method, or maybe even the same method, calls a method in Transaction. At this level of granularity, we don't care which methods they are; we're concerned only with the relationships between the modules.

The diagram above is a graph because it's made of nodes and edges. It's also a tree because each node has only one parent – one incoming arrow. Now we introduce a complication that prevents the graph from being a tree.

Suppose that User has a latest_transaction_date method calling into Transaction directly; it doesn't use OrderHistory. Now, there are multiple paths from OrderHistory to Transaction: we can go directly, or we can go via User.

We now abandon this example, though we'll return to module graphs later with a real-world example. To get a sense of the generality of graphs, we'll briefly analyze two more examples with exactly the same graph shape: a function call graph and an inheritance hierarchy.

Call graphs indicate which functions call which other functions; or, in object-oriented terminology, which methods call which other methods. Perhaps we have a method is_allowed_to_access calling is_admin. But both is_allowed_to_access and is_admin check the user's access rights, so they both call find_user. This gives us the call graph:

The shape is the same as what we saw for module relationships between User, OrderHistory, and Transaction, but the meaning here is different: these are calls instead of module dependencies. Design of modules and functions are different skills, but they do reuse our graph reasoning ability, whether consciously (as we've done here) or unconsciously (as in most everyday programming).

We'll examine one more example: inheritance hierarchies in object-oriented languages. Suppose that we're modeling hardware devices in Python. We have a USBMouse class inheriting from USBDevice.

class USBMouse(USBDevice):
    ...

This seems like a trivial inheritance graph with two nodes, where USBMouse depends on USBDevice. There's a complication, though: all Python classes inherit from the base object class. With object considered, the full inheritance graph has the same structure as the other examples above:

The visual orientation here is different to match convention: base classes are usually higher up in the diagram. Still, the structure is the same as our previous examples. We could move these nodes and edges around to look just like the previous examples without changing any of the connections.

Graph relationships exist at all levels of scale, from the data dependencies between expressions in a function up to the dependencies between publicly published packages. This is good news for us: learning about one aspect of software structure reinforces our ability to think about others. It also means that all programmers eventually learn to think about graph relationships, even if that skill remains entirely automatic and subconscious. Programming requires understanding which functions call which other functions, which modules import which other modules, and which classes inherit from which other classes. Getting better at programming means, in part, getting better at thinking through more of those relationships without jumping to the functions and modules, examining them one by one.

Before we move into more detailed examples, here's a list of several places where graphs show up in programming. Some of these will get their own sections; others are only mentioned here to show the breadth of graphs' impact on software structure.

  • Module imports form a module dependency graph.
  • Function calls form a function call graph. (Methods are special cases of functions. The two are equivalent at this level of analysis.)
  • Class inheritance forms an inheritance graph. (It's usually called an inheritance hierarchy, but the hierarchy is a graph in any language with either multiple inheritance or mixins.)
  • Package dependencies in a package manager like NPM, RubyGems, PyPI, etc. form a package dependency graph.
  • The execution order of statements in a function (i.e., which lines can execute after which other lines) form a control flow graph.
  • The flow of data through a program forms a data-flow graph.

This is one section of The Programmer's Compendium's article on Software Structure, which contains more details and context.