Unique Lines of Code

This post is a mirror of https://text.causal.agency/004-uloc.txt.

There are many tools available which measure SLOC: source lines of code. These tools are strangely complex for what they intend to do, which is to estimate the relative sizes of projects. They perform some amount of parsing in order to discount comments in various languages, and for reasons unknown each format their ouput in some oddly encumbered way.

I propose a much simpler method of estimating relative sizes of projects: unique lines of code. ULOC can be calculated with standard tools as follows:

sort -u *.h *.c | wc -l

In my opinion, the number this produces should be a better estimate of the complexity of a project. Compared to SLOC, not only are blank lines discounted, but so are close-brace lines and other repetitive code such as common includes. On the other hand, ULOC counts comments, which require just as much maintenance as the code around them does, while avoiding inflating the result with license headers which appear in every file, for example.

It can also be amusing to read all of your code sorted alphabetically.

Caveats

Estimates such as these should not be used for decision making as if they were data.