Statistics for Drupal are flawed

Frando

over 2 years ago

Hey,

The LOC statistics for Drupal are definitely flawed somehow. Ohloh reports 21,103 LOC for Drupal core, but there are definitely more.

The reason might be that many PHP files in Drupal have a different extension - namely .module, .engine and .theme. Without counting these, the statistics for Drupal are worthless, unfortunately, as more than half of Drupal's code lives in .module files.

As of today's Drupal CVS HEAD:

find drupal -type f \( -name '*.php' -o -name '*.inc' \) -exec egrep -vh '^$' {} \; | wc -l

27220

(= number of non-blank lines in .php and .inc files)

find drupal -type f \( -name '*.module' -o -name '*.theme' -o -name '*.engine' \) -exec egrep -vh '^$' {} \; | wc -l

27351

(= number of non-blank lines in .module, .theme and .engine files)

So, Ohloh is basically ignoring half of Drupal's code.

One solution would be to make either the file types that are used to calculate the LOC or the filetype->language mapping a project-specific setting.


Dietrich Moerman

over 2 years ago

I'm afraid making file extensions project-specific would allow many people to "cheat". Since PHP files always contain "<?php", could a solution be to search for that begin tag in non-.php files? Of course, there are also <? and <%, but these are disabled by default and are no guarantee that the file contains PHP (it could also be XML or ASP).


Jason Allen

over 2 years ago

Greetings all,

Our detector uses file extensions and file contents to try to determine the language of each file. As Frando suspects, we do NOT currently recognize .module, .theme and .engine files as PHP.

Dietrich - we have some disambiguation logic to try and tell if a file should be treated as X or Y. So, the rule COULD be something like:

if extension =~ /\.(module|theme|engine)$/ AND file.contents =~ /<\?php/

I'm willing to try it out. These changes are always tricky because we run this stuff against millions of files, and there are always outliers that make life difficult. Frando, Dietrich - what do you think?
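
To make the rule concrete, here is a rough shell sketch of the same check. The function name looks_like_drupal_php is made up for illustration, and this is an assumption about how such a rule could behave, not Ohloh's actual detector code:

```shell
# Hypothetical helper, not part of Ohloh: exit status 0 means
# "treat this file as PHP", non-zero means "do not".
looks_like_drupal_php() {
    case "$1" in
        *.module|*.theme|*.engine)
            # -q: no output, only an exit status
            # -F: match "<?php" as a fixed string, so "?" is not a regex operator
            grep -qF '<?php' -- "$1"
            ;;
        *)
            # Any other extension is left to the existing detection rules
            return 1
            ;;
    esac
}
```

Something like `looks_like_drupal_php modules/system/system.module && echo php` would then print "php" only when both the extension and the content match.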


Dietrich Moerman

over 2 years ago

I think this would be a nice solution. :)


Frando

over 2 years ago

Yup, that should work. All PHP files must contain "<?php", so checking against that sounds like the best thing to do.

Here's a complete list of file endings that Drupal uses at the moment for PHP files:

.php .inc .module .theme .engine .schema .install .profile

This applies to both Drupal (core) and Drupal (contributions).
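
For comparison with the numbers from my first post, the same non-blank line count could be run over the whole list of extensions. This is only a sketch: count_php_loc is a hypothetical helper name, and it assumes a find that supports "-exec ... {} +" and a grep that supports -h:

```shell
# Hypothetical helper: count non-blank lines in all files under directory $1
# whose extension is one of those Drupal uses for PHP code.
count_php_loc() {
    find "$1" -type f \( \
        -name '*.php' -o -name '*.inc' -o -name '*.module' -o \
        -name '*.theme' -o -name '*.engine' -o -name '*.schema' -o \
        -name '*.install' -o -name '*.profile' \
    \) -exec grep -vh '^$' {} + | wc -l
}
```

With a checkout in ./drupal this would be called as `count_php_loc drupal`.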

Maybe just checking all text files against "<?php" would be the easiest and most future-proof?
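
As a quick way to see which files such a content-only check would pick up, something like the following could be used. It is a sketch, not a proposal for the real detector: list_php_files is a made-up name, and the -r and -I options are GNU/BSD grep extensions rather than POSIX:

```shell
# Hypothetical helper: print every text file under directory $1
# that contains the literal string "<?php".
list_php_files() {
    # -r: recurse into directories
    # -l: print only the names of matching files
    # -I: skip files that look binary
    # -F: treat the pattern as a fixed string, not a regex
    grep -rlIF '<?php' -- "$1"
}
```

Running `list_php_files drupal` and comparing the result against the extension list would show whether anything is missed either way.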

Thanks for your efforts in fixing this!


elmuerte

over 2 years ago

Wouldn't it be easier to use some MIME magic on the non-binary files to figure out what they are? The Unix 'file' utility does a good job of figuring out the file type:

file index.php includes/common.inc modules/system/system.module

Gives

index.php: PHP script text

includes/common.inc: PHP script text

modules/system/system.module: PHP script text


Dietrich Moerman

over 2 years ago

The file utility does exactly what the fix proposed for Ohloh does: it reads the file looking for a PHP open tag. I tried this out myself.

$ file tagadelic.module

tagadelic.module: PHP script text

After removing the PHP open tag:

$ file tagadelic.module

tagadelic.module: ASCII C++ program text, with very long lines

So, I think the original solution is the best (no need to use third-party, *NIX-only binaries).


Frando

over 2 years ago

Any news here?


Frando

over 2 years ago

bump