Hi there.
I'm a CPAN admin, and I'm trying to reconcile the lines-of-code counts in Ohloh with some numbers we have.
My question concerns Perl in particular, but applies equally to many other languages.
The CPAN is as close to a complete source of all Perl as we know of. From our analysis, we know that it contains somewhere in the vicinity of 20,000,000 source lines of code (SLOC).
There are a couple of major Perl projects run outside the scope of the CPAN, but between them they probably only add around 5,000,000 lines of code.
Ohloh can't see into the CPAN, but does scan many of the separate scattered SVN/git repositories that the authors hold the code in.
So, assuming Ohloh had perfect coverage of all CPAN feeder repositories, we would expect the amount of Perl counted by Ohloh to be in the vicinity of 25,000,000 lines (plus perhaps 50% extra to cover Perl scattered across various smaller projects).
The actual Ohloh Perl SLOC count is in the vicinity of 70,000,000 SLOC.
That surplus of roughly 50,000,000 lines is about two and a half times as much code again as the existing 18,000 Perl packages that we would expect Ohloh to find if it had theoretically perfect project coverage.
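To make the arithmetic explicit, here is a small Python sketch using only the approximate figures quoted above. The constant names are my own; the "50% extra" allowance is applied to the 25,000,000 baseline, as described earlier.

```python
# Approximate figures from the post (not precise measurements)
CPAN_SLOC = 20_000_000            # Perl on the CPAN
NON_CPAN_MAJOR_SLOC = 5_000_000   # major Perl projects outside the CPAN
SCATTERED_ALLOWANCE = 0.5         # "plus maybe 50% extra" for smaller projects
OHLOH_PERL_SLOC = 70_000_000      # Ohloh's reported Perl total


def expected_range():
    """Return (low, high) bounds for the Perl SLOC Ohloh should see."""
    low = CPAN_SLOC + NON_CPAN_MAJOR_SLOC
    high = int(low * (1 + SCATTERED_ALLOWANCE))
    return low, high


low, high = expected_range()
surplus_over_cpan = OHLOH_PERL_SLOC - CPAN_SLOC
unexplained_at_upper_bound = OHLOH_PERL_SLOC - high

print(f"expected: {low:,} to {high:,}")
print(f"surplus over CPAN: {surplus_over_cpan:,} "
      f"({surplus_over_cpan / CPAN_SLOC:.1f}x the CPAN total)")
print(f"unexplained even at the upper bound: {unexplained_at_upper_bound:,}")
```

Even with the generous allowance for scattered code, roughly 32,500,000 lines remain unaccounted for.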
Suffice it to say that we would be extremely interested in finding out where this code is.
Ohloh makes this somewhat difficult, because the number of lines of code in each project is buried in that project's detail page.
It would be extremely beneficial if there were a way to get a list of the projects for each language, sorted by the number of lines of code in that language.
That way we can make a start on finding out where the missing 50,000,000 lines of code are.
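The listing I'm asking for is essentially a filter-and-sort over per-project language totals. As a rough Python sketch, assuming the counts were available as (project, language, code_lines) records: the record layout, the `top_projects` helper and the project names below are all hypothetical, and are not Ohloh's actual data format or API.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LanguageCount(NamedTuple):
    """Hypothetical per-repository line count for one language."""
    project: str
    language: str
    code_lines: int


def top_projects(rows: Iterable[LanguageCount], language: str,
                 limit: Optional[int] = None) -> List[Tuple[str, int]]:
    """Sum lines per project for one language, largest projects first."""
    totals = defaultdict(int)
    for row in rows:
        if row.language == language:
            # A project may draw code from several repositories
            totals[row.project] += row.code_lines
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked if limit is None else ranked[:limit]


# Made-up sample records, for illustration only
sample = [
    LanguageCount("project-a", "Perl", 120_000),
    LanguageCount("project-b", "Perl", 900_000),
    LanguageCount("project-b", "C", 50_000),
    LanguageCount("project-c", "Perl", 40_000),
    LanguageCount("project-a", "Perl", 30_000),  # second repository of project-a
]

for name, lines in top_projects(sample, "Perl"):
    print(f"{name}: {lines:,}")
```

With a list like this for Perl, the handful of projects at the top would most likely account for a large share of the unexplained lines.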