Why calculate comment density only by line counts?

Quinn Taylor

3 months ago

I have 2 projects registered on Ohloh. Since I strive for well-documented code, I was happy to see that my older Java project (unfortunately on the back burner, for now) fares pretty well in that regard (44% vs. 33% average). However, when I added my more recent Objective-C project, I was quite surprised to see it rated as "Few source code comments", with a density of only 8%.

This seemed completely out of line with the amount of comments I write — my header files alone are 75% comments on average, since the project is a library with lots of public APIs. For samples, see any of the header files at this WebSVN link.

Then I realized that the page explaining the comment density metric says the figure is measured in lines of comments. To avoid the frequent reformatting I used to have to do just to stay within 75/80 columns, I converted my comments to be soft-wrapped wherever possible, rather than having hard line breaks. This improves the presentation of the comments, but it produces long comment lines that each count as only one line, even though they occupy several lines at normal document width. (Mine often span 5 or more.) Hmmmm.
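To make the effect concrete, here is a minimal Python sketch of a line-based density, assuming density = comment lines / (comment lines + code lines) with blank lines ignored. It is an illustration, not Ohloh's actual counting code. The function `line_comment_density` and the sample strings are hypothetical, and the sketch only recognizes `//` and `/* ... */` comments that begin a line.

```python
def line_comment_density(source: str) -> float:
    """Fraction of non-blank lines that are comment-only lines.

    Assumption: only comments that start a line ('//' or '/*') are
    recognized; a line with code followed by a trailing comment counts
    as code. This is a simplified illustration, not Ohloh's algorithm.
    """
    comment_lines = 0
    code_lines = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines count toward neither total
        if in_block:
            comment_lines += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            comment_lines += 1
        elif line.startswith("/*"):
            comment_lines += 1
            # Stay in block mode unless the comment closes on this line.
            in_block = "*/" not in line[2:]
        else:
            code_lines += 1
    total = comment_lines + code_lines
    return comment_lines / total if total else 0.0


# Hypothetical samples: the same comment text, hard- vs soft-wrapped.
HARD_WRAPPED = (
    "// Returns the number of objects currently stored in the\n"
    "// receiver. The count is updated whenever objects are\n"
    "// added or removed.\n"
    "- (NSUInteger) count;\n"
)
SOFT_WRAPPED = (
    "// Returns the number of objects currently stored in the receiver. "
    "The count is updated whenever objects are added or removed.\n"
    "- (NSUInteger) count;\n"
)

print(line_comment_density(HARD_WRAPPED))  # 0.75
print(line_comment_density(SOFT_WRAPPED))  # 0.5
```

Both samples document the method with identical words, yet the soft-wrapped version scores lower purely because its comment occupies one physical line instead of three.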

This makes me wonder whether very sparse comments covering many lines would similarly count as "good commenting". Of course, simple automated parsing can't decipher anything about the quality of the comments, but I think it's apparent that just measuring line count may not be so helpful for getting a sense of the true nature of in-code comments.

It would seem more accurate to measure something like the percentage of characters in a file that are comments. Given the common use of varying code styles (such as always/never putting curly braces on new lines), it would also seem more prudent to discount extraneous whitespace and calculate based on true proportions, not just a line-based heuristic.
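A character-based variant along these lines might look like the following minimal sketch. It assumes "density" means non-whitespace comment characters divided by all non-whitespace characters, with the comment delimiters themselves excluded. The function name is hypothetical, and the scanner deliberately does not parse string literals, so a `//` inside a string would be miscounted.

```python
def char_comment_density(source: str) -> float:
    """Fraction of non-whitespace characters that are comment text.

    Comment delimiters ('//', '/*', '*/') and all whitespace are ignored,
    so re-wrapping a comment or moving a brace onto its own line does not
    change the result. Assumption: string literals are not parsed, so a
    '//' inside a string such as @"http://..." would be treated as the
    start of a comment.
    """
    comment_chars = 0
    total_chars = 0
    i, n = 0, len(source)

    def nonspace(text: str) -> int:
        return sum(1 for c in text if not c.isspace())

    while i < n:
        if source.startswith("//", i):
            # Line comment: runs to the end of the line.
            end = source.find("\n", i)
            end = n if end == -1 else end
            count = nonspace(source[i + 2:end])
            comment_chars += count
            total_chars += count
            i = end
        elif source.startswith("/*", i):
            # Block comment: runs to the next '*/' (or the end of input).
            close = source.find("*/", i + 2)
            stop = n if close == -1 else close
            count = nonspace(source[i + 2:stop])
            comment_chars += count
            total_chars += count
            i = n if close == -1 else close + 2
        else:
            # Ordinary code character; whitespace is skipped.
            if not source[i].isspace():
                total_chars += 1
            i += 1

    return comment_chars / total_chars if total_chars else 0.0


# Hypothetical samples: the same comment text, hard- vs soft-wrapped.
HARD_WRAPPED = (
    "// Returns the number of objects currently stored in the\n"
    "// receiver. The count is updated whenever objects are\n"
    "// added or removed.\n"
    "- (NSUInteger) count;\n"
)
SOFT_WRAPPED = (
    "// Returns the number of objects currently stored in the receiver. "
    "The count is updated whenever objects are added or removed.\n"
    "- (NSUInteger) count;\n"
)

print(char_comment_density(HARD_WRAPPED) == char_comment_density(SOFT_WRAPPED))  # True
```

Because only non-whitespace characters are counted and delimiters are dropped, the wrapping style no longer affects the score; the trade-off is a slightly more complex scanner than a per-line check.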

Thoughts?


alesplin

3 months ago

I think the major issue (one that is not necessarily automagically measurable) is the accessibility and usefulness of comments in code. If the comments aren't near the code they're describing, or can't be read easily, it doesn't matter how many lines of comments there are, or how accurate/useful they are.

For this reason, I tend to hard-line-break my comments at 80 columns, with lines after the first one indented one tab stop. First, this makes every line of a comment instantly readable in full; second, in the editors I use (Xcode and Vim), it lets me fold the continuation lines out of the way when I don't need them. It's just my personal preference for implementing the idea that comments should be accessible when and where they are needed.


Quinn Taylor

3 months ago

Thanks for your response, Alex. I agree with you that the utility of comments depends on both (1) their proximity to the code in question and (2) the quality of the comments themselves. However, although I used to hard-break my comments, several factors swayed me when writing Objective-C in Xcode...

1) Sure, 80-column width is the standard, but in wider windows, hard-wrapped comments (and even code) are arguably a waste of space, and at shorter widths, they wrap in extremely ugly ways, doubling the lines required to view the comment and making them hard to read. On my multi-monitor desktop, I often go wide, and when on my laptop, I often go narrower than 80 columns.

2) When developers "grep" files, it's quite handy to see the entire comment, not just the line fragment that contained the search string, and a search for a phrase won't miss a match that a hard line break would have split. (For the same reason, it is often nice to put a // comment on the same line as a declaration to call attention to warnings for a function/method, enum values, etc.)

3) I joke that I have a touch of OCD, and I hate ragged comment blocks and feel compelled to "fix" the formatting whenever I modify the comments. In code where the comments actually represent what the code does, the comments are likely to change frequently. Fixing hard-wrapping is a pain to do, and makes it more difficult to see what has actually changed in a tool like FileMerge. I prefer small, localized changes that don't involve newlines.

4) The Cocoa framework header files use soft-wrapped comments (when they have them) and it's a suggested best practice for Objective-C code.

5) Unwrapped comments can still be folded out of the way, and any editor worth its salt can soft-wrap. Cough, cough... Eclipse, I'm looking at you. :-)

6) I use WebSVN to provide a view into my repository (see example), and since web windows are much more likely to be wider than 80 columns, it makes better use of the space.


Robin Luckey

3 months ago

Hi Quinn,

I'm with you in spirit, and agree that Ohloh's strategy of simply counting newlines is fairly crude, and easily "broken" by certain coding styles. There are surely more interesting and more accurate ways to measure comments, and Ohloh should try to accommodate variations in style between developers.

From a usability standpoint, however, we want to keep our metrics as familiar as possible. It's hard enough to explain how Ohloh works without also having to define unfamiliar metrics. Whenever Ohloh starts inventing novel metrics, confusion seems to follow, and it puts us in a tough spot to have to explain and defend them.

"Lines of code" might be an arbitrary and flawed way to measure development work, but it's something that nearly everyone understands. Most developers also understand the inherent limitations of this metric.

Given that we don't (yet) have the resources to calculate all of the interesting metrics that we'd like to, we started with the most basic, most familiar metric -- warts and all.

That's why we currently do what we do, but there's no reason that we can't start offering alternatives in the future, as we find the resources to support them.

Ultimately, I would like to see a wide variety of metrics on Ohloh. For instance, it would be great if users could submit code to generate custom metrics, and Ohloh could run the computations and publish the results. Improved strategies to measure the quantity (and perhaps quality) of comments are certainly possible, and I look forward to the day when we can start to offer those kinds of things.


Quinn Taylor

about 1 month ago

Sorry for the delay. I did read your response soon after you posted it, and totally understand — resources are finite, and some metrics (both good and bad) are already deeply ingrained. I'm glad to hear there is at least interest in improving the metrics over time.

That said, I realized that the metric wasn't as deeply flawed as I thought. I hadn't realized that Ohloh would count everything in my enlistment as source code (including large XML files created by an OS X desktop app) and that was skewing the metrics. I removed all the enlistments and added only the SVN path to the source directory, and my comment density jumped to 33%, much closer to what I expected. I'm sure it would be higher if I used hard line breaks, but at least this is in the right ballpark. Thanks!
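The skew described above can be illustrated with a small sketch that combines per-file line counts, optionally restricted to one source directory (roughly what narrowing the enlistment to the source path achieves). The function name, directory names, and per-file counts are all hypothetical toy values, not Ohloh data, and the ratio assumes comment lines / (comment lines + code lines).

```python
from pathlib import PurePosixPath


def aggregate_density(counts, source_dir=None):
    """Combine per-file (comment_lines, code_lines) pairs into one ratio.

    If source_dir is given, only files located somewhere under that
    directory are included; everything else (e.g. generated XML) is
    skipped.
    """
    comment_total = 0
    code_total = 0
    for path, (comments, code) in counts.items():
        if source_dir is not None and PurePosixPath(source_dir) not in PurePosixPath(path).parents:
            continue  # outside the source directory: leave it out
        comment_total += comments
        code_total += code
    total = comment_total + code_total
    return comment_total / total if total else 0.0


# Hypothetical per-file counts, for illustration only.
COUNTS = {
    "source/Stack.h": (30, 10),
    "source/Stack.m": (10, 50),
    "resources/Settings.xml": (0, 900),
}

print(aggregate_density(COUNTS))                       # 0.04
print(aggregate_density(COUNTS, source_dir="source"))  # 0.4
```

A single large, uncommented non-source file dominates the totals, which is why dropping it can move the overall ratio so dramatically.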