Mostly Written In...

written by Jason Allen
feb 23 2007
 

Ohloh analyzes projects' source code and determines what programming languages were used to create them. While most projects are written using many languages, a "mostly written" fact simply highlights the top language used (as counted by lines of code).

Comments (79)

Cato

over 2 years ago

TWiki is mostly written in Perl (all server side code) with only a small amount of JavaScript for AJAX frontends, WYSIWYG editing plugins and skins.

Please set TWiki to be 'mostly Perl', and also update the license 'analysis' to reflect the fact that it is entirely GPLed. I have already set the license to GPL.


Cato

over 2 years ago

Another TWiki note: it has been under revision control since at least Feb 2001 (first CVS, now SVN with all CVS checkins imported), based on revision log for one key module. So I think that qualifies as long history of revision control.


Doug Napoleone

over 2 years ago

Where are you getting this Javascript and TWiki stuff from? This is a python Django Project!


Doug Napoleone

over 2 years ago

Oops, I think I misunderstood where this link came from. Why is this in my PyCon-Teck project summary?

I don't know why it claims that 60% of the code is JavaScript when there are only 2 JS files (500 lines) and 10K+ lines of Python.

There is an svn:externals of the dojo JavaScript framework, but Ohloh should exclude all svn:externals, as that code is not developed locally!


Robin Luckey

over 2 years ago

Cato,

Our reports are generated automatically by source analysis. There's nothing that we set or change manually other than the URL to the source control server.

It may be surprising, but this source tree does in fact contain much more JavaScript than Perl. I did a manual review of a local checkout, and including blanks and comments I found roughly 280K lines of JavaScript and 235K lines of Perl. This matches the Ohloh report very closely.

The licenses are likewise determined entirely by parsing the source code, and all of the licenses listed in the report do appear in the code. For instance, the Academic Free License appears in many of the .js files in the DojoToolkitContrib subdirectory.

We do allow you to edit the "overall license" of the project, which has already been set to GPL for this project.

We are working on a feature right now that will allow you to see a list of the exact files which we consider to be covered by each license and language, and hopefully that will help clear up any confusion. In the meantime I'm happy to help with any specific questions.

derivin,

Ohloh does not include any externals in the report.

I think you are mistaken on your line counts. It's true that there are only a few .js files in PyCon-Teck, but with a quick examination of the source it was easy to verify that they contain over 22K lines of javascript -- not 500.


Nlaw

over 2 years ago

Re: MLDONKEY

The mostly written in C/C++ for mldonkey is incorrect. It's actually written in a language called OCAML

Regards NL


madth3

over 2 years ago

Ohloh clearly needs to work on this feature, judging by the comments.

Struts2 is Java, but since it includes dojo it is marked as mostly JavaScript.


Stephan.Schmidt

over 2 years ago

Obviously the problem with JavaScript may be that JS is distributed as source code in a project. So if you add several JS libraries (prototype etc.) to the project, then Ohloh probably (wrongly) adds those files to your project and raises your JS count. It's probably not clever enough to detect JS libraries you only distribute. Worse: It may add the JS licenses to your project and claim your project uses X and Y licenses.


Jason Allen

over 2 years ago

Nlaw: Ohloh doesn't recognize OCAML yet. It's on our todo list.

madth3 & stephan: Your assumptions are correct. Ohloh doesn't know why (or how) files were added - so it treats them all the same.

Worse: It may add the JS licenses to your...

This reflects the feature as it was designed. Our license file sniffer is meant to be a starting point for people who care what additional licenses a project might include. Even if the governing license is X, you sometimes have to pay attention to additional embedded licenses.


cbbrowne

over 2 years ago

Prolog is also obviously not a language covered yet; the system thinks all my Prolog code is Perl...


hpages

over 2 years ago

Same for R; the system thinks all the R code that Bioconductor is primarily written in is XML... It would be better, maybe, to report something like "unknown language" instead. In fact, not all files in a project can be tagged with a programming language: documentation, config files, data files, etc... Bioconductor packages contain a lot of data!


Andres Almiray

about 1 year ago

It's funny that JideBuilder is written 100% in Groovy but the report says "mostly in JavaScript".


Jason Allen

about 1 year ago

Andres: I downloaded JideBuilder's source code and did a very primitive scan:

> find . -name '*.js' -exec cat {} \; | wc -l
> 5979
> find . -name '*.groovy' -exec cat {} \; | wc -l
> 2533

JavaScript has more than twice as many raw lines of code as Groovy.
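For anyone who wants to repeat this kind of raw count outside a shell, here is a rough Python equivalent. It is only a sketch: `raw_line_counts` is a hypothetical helper (not part of Ohloh or Ohcount), and like the `find` commands above it counts every line, blanks and comments included, so it says nothing about whether a file is documentation or production code.

```python
import os
from collections import Counter


def raw_line_counts(root, extensions):
    """Count raw lines per file extension under root (hypothetical helper).

    Unlike `wc -l`, a final line without a trailing newline is counted too.
    """
    totals = Counter({ext: 0 for ext in extensions})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in totals:
                # Read as bytes so odd encodings cannot break the count.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    totals[ext] += sum(1 for _ in handle)
    return totals


if __name__ == "__main__":
    for ext, lines in raw_line_counts(".", [".js", ".groovy"]).items():
        print(ext, lines)
```

Run from the top of a checkout, this prints one total per extension, which can then be compared with the figures in the report.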


Jason Allen

about 1 year ago

hpages: You bring up some interesting points. I'll need to give the 'unknown suggestion' more thought.

Regarding the documentation, config, data, etc.: I think the key here is to try to understand what is hand-authored vs machine-generated, as well as what is documentation vs what is code. I think measuring all types of contributions is useful.

Finally, regarding R: We've been taking a break from adding more languages, but hopefully we'll have a chance to catch up soon - and add R.

Thanks for the heads up.


Jason Allen

about 1 year ago

cbbrowne: Doh! We have a file extension conflict. ".pl" is currently mapped to being a perl file. I thought that Prolog files were supposed to end with '.P'.

I'll have to write some detection code to disambiguate them. Any suggestions on what to look for?

Off the top of my head:

Not counting lines that begin with '#include', if there are more lines starting with '%' than '#' - then assume prolog.

If there are any lines ending in ':-' - assume prolog.

I'm pretty sure these 2 rules alone would solve your specific case (canada2003). Any feedback would be appreciated...
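The two rules above could be sketched roughly as follows. This is an illustration only, not Ohloh's detection code: `looks_like_prolog` is a made-up name, and the rules are known to be crude (a Perl file that opens many lines with hash variables such as `%h` would fool the first one).

```python
def looks_like_prolog(source):
    """Guess whether a '.pl' file is Prolog rather than Perl (hypothetical)."""
    percent_lines = 0
    hash_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        # Rule 2: any line ending in ':-' is taken as a Prolog rule head.
        if stripped.endswith(":-"):
            return True
        # Rule 1: compare '%' lines with '#' lines, skipping '#include'.
        if stripped.startswith("#include"):
            continue
        if stripped.startswith("%"):
            percent_lines += 1
        elif stripped.startswith("#"):
            hash_lines += 1
    return percent_lines > hash_lines


prolog_sample = "% family facts\nparent(tom, bob).\ngrandparent(X, Z) :-\n    parent(X, Y), parent(Y, Z).\n"
perl_sample = "#!/usr/bin/perl\n# say hello\nprint \"hello\\n\";\n"
print(looks_like_prolog(prolog_sample))  # True
print(looks_like_prolog(perl_sample))    # False
```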


Brian Downing

about 1 year ago

SBCL (http://www.ohloh.net/projects/5299) is written in Common Lisp, but Ohloh says it's primarily written in C/C++.

We have C components:

:; find . -name '*.[ch]' | xargs wc -l | tail -1
  34115 total

but they are dwarfed by the amount of code in Lisp:

:; find . -name '*.lisp' | xargs wc -l | tail -1
  395570 total

Commits to .lisp files don't seem to be counted as Lisp either.

In fact, those 400,000 lines of code don't even show up in the code report:

http://www.ohloh.net/projects/5299/analyses/latest


Brian Downing

about 1 year ago

By the way, if you want to try and differentiate Common Lisp from random generic Lisp, Common Lisp sources usually have a line that looks like one of these:

(in-package :cl-user)
(CL:IN-PACKAGE "MY-PACKAGE")
(common-lisp:in-package #:this-is-rather-long)

so maybe looking for a line like:

/^\(([^\)]*:)?in-package\s/i

would be a decent tack? It won't get everything, but it'll get a lot.

There's no standard for file extensions, but *.lisp is quite common, *.cl is also used, and I believe *.lsp is used by 8.3 holdovers.
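As a quick check, the pattern above translates directly into Python's `re` syntax. This is just a sketch for trying it against sample lines; `looks_like_common_lisp` is a hypothetical name, and a real detector would presumably also look at the file extension.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of every line.
IN_PACKAGE = re.compile(r"^\(([^\)]*:)?in-package\s", re.IGNORECASE | re.MULTILINE)


def looks_like_common_lisp(source):
    """Return True if any line starts with an in-package form."""
    return IN_PACKAGE.search(source) is not None


samples = [
    "(in-package :cl-user)",
    '(CL:IN-PACKAGE "MY-PACKAGE")',
    "(common-lisp:in-package #:this-is-rather-long)",
    "(defun foo () 42)",
]
for text in samples:
    print(looks_like_common_lisp(text), text)
```

The first three samples match and the plain `defun` does not.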


indeyets

about 1 year ago

There is a problem with OCaml projects. For some reason they are treated as Objective-C

Objective-C extensions: m, mm

OCaml extension: ml


Duncan Grisby

about 1 year ago

omniORB is mostly written in C++, not Perl! In fact, there are only about 300 lines of Perl in the whole thing, compared to over 300,000 lines of C++.


luciash

about 1 year ago

Mostly written in JavaScript ?

Code says it correctly (mostly written in HTML+CSS) but on the Report tab Ohloh Summary states "Mostly written in JavaScript" which is obviously wrong...


Robin Luckey

about 1 year ago

@luciash,

This is by design, but in your case our design is not appropriate.

Our system categorizes XML, HTML and CSS as "markup" languages, and does not consider them when determining the "Mostly written in" summary.

We did this because many projects have a great deal of HTML documentation or XML configuration, which can obscure the fact that the project is actually written in something like Java. In the vast majority of cases, it is wrong to decide that the main language of a project is HTML or XML.

In your case, however, it looks like your project is specifically about CSS, so it would be appropriate to list this project as "Mostly written in HTML". This is a rare case, and our system isn't smart enough to figure this out.

Any thoughts about how to work around this in your case?
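To make the behaviour described above concrete, here is a minimal sketch of "pick the top language, but skip markup". It assumes per-language line counts are already available as a dictionary; `main_language` and `MARKUP` are illustrative names, not Ohloh's actual code.

```python
# Languages treated as markup, per the description above.
MARKUP = {"XML", "HTML", "CSS"}


def main_language(line_counts, markup=MARKUP):
    """Return the non-markup language with the most lines, or None."""
    candidates = {lang: n for lang, n in line_counts.items() if lang not in markup}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# A Java project with heavy XML configuration is still reported as Java...
print(main_language({"Java": 500, "XML": 2000}))
# ...but a CSS-centric project is reported by its small JavaScript share.
print(main_language({"HTML": 900, "CSS": 400, "JavaScript": 50}))
```

The second call shows the rare case discussed here: the rule that helps most projects gives a misleading answer for a project that really is about markup.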


luciash

about 1 year ago

@Robin Luckey

Thanks a lot for your nice reply. Now I get it !

Here's my suggestion which imho shouldn't be hard to implement:

What about having some switch (maybe a radio button or dropdown) in the project info settings to determine whether it is mostly a markup/template/documentation project or a coding/programming project? The default state would be as it is now. Then it would either count the markup files (CSS/HTML/TXT) or exclude them, as you currently do in "Project Cost", and generate the Ohloh Summary based on that.

Would be great new enhancement for some projects like mine :)


Andres Almiray

about 1 year ago

Jason (On JideBuilder's code), you're right, if the whole project is scanned that way, JavaScript will win over Groovy, but the fact is that the javascript code is part of the documentation, not production code. So I guess that without a project profile (how files are distributed) the measures will not be 100% accurate, but as each project may follow its own convention, this task will be huge and almost impossible to accomplish automatically.

Would it be possible to add more information (optional step of course) when registering a project, like source dir, test dir, doc dir ? that might help the tools make more accurate measurements =)


Michał Słaby

about 1 year ago

Jason, I like the language chart, but sometimes it is inaccurate in terms of language importance to the project. In my project it is PHP which is the most important language, but I use tons of Javascript for minor things the project can live without. It would be lovely to override preferred language in project settings.


boran

about 1 year ago

Hi, First off: your tool is interesting! FreeNAC is mostly (90%?) PHP, with some embedded SQL statements. I do not know why you say it's mostly SQL; there must be something about our code. It also says 0% comments, which is not true either.

Also I submitted the trunk for analysis, as per your instructions. But, most of the (incremental) work takes place in the branches, not trunk, which is used for very new features.

First SVN submits were in June'06, not this winter, so there seems to be an ageing issue too.

Regards, Sean


Robin Luckey

about 1 year ago

Hi boran,

I was initially a bit puzzled, but the answer seems to be this commit, which includes 27,000 lines of SQL. That dwarfs the rest of the project, and results in Ohloh's conclusion that this is a SQL project. You can see this checkin as the huge spike at the end of the codebase graph.

I think there is a bug somewhere in our code regarding the 0% comments presentation. Our analysis found that the PHP code is 15.6% comments, and when you add in the SQL it comes down to 9.2% overall. I'll have to investigate why we show 0% in the factoid. This is the first time I've seen this particular problem.

If most of the ongoing development work is happening on a branch, then by all means, go ahead and change the enlistment to the branch and remove the trunk. (There seem to be two kinds of projects in the world: those that develop on the trunk and drop releases into branches, and those that develop on branches and drop releases into the trunk.)

Where did you find "this winter"? In the codebase graph and in the commit tables our first activity shown is in August of 2006, with the addition of the README and the first bin/port_scan files.


Robin Luckey

about 1 year ago

Hi boran,

So the problem with the comments appears to be that when we generate the factoid, we only look at the 'main language', and we compare your comment ratio in the main language with the comment ratio for all other projects that used that same main language.

In your case, we decided the main language was SQL, and tragically, this project contains only 11 lines of comments in its 27000+ lines of SQL. That rounds down to zero. :-)

The only fix I can see is for us to change our heuristic for determining main language -- perhaps changing from total lines of code to total commits or something along those lines. I'll post a bug ticket about this.
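The arithmetic behind the "0% comments" factoid is easy to reproduce. The sketch below assumes the ratio is comment lines divided by comment plus code lines; that formula and the name `comment_percent` are assumptions, since the thread does not spell out the exact calculation.

```python
def comment_percent(comment_lines, code_lines):
    """Comments as a percentage of all counted lines (assumed formula)."""
    total = comment_lines + code_lines
    if total == 0:
        return 0.0
    return 100.0 * comment_lines / total


# Figures from the comment above: 11 comment lines in roughly 27000 lines of SQL.
sql_percent = comment_percent(11, 27_000)
print(f"{sql_percent:.2f}%")  # 0.04%
print(round(sql_percent))     # 0
```

Because only the main language is considered, the PHP figure of 15.6% never enters the factoid at all.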


Victor

about 1 year ago

Hi,

About the Prolog files: I think that if you can find lines that begin with :- and/or lines that contain one word followed by a :-, it will work.

By one word I mean one predicate, something like: bla(blo, [hop], lol(bing))

I don't think that rules can have multiple predicates in the head ...

I know that Vim looks at the % to recognize them.


Tobu

about 1 year ago

Unison is mistakenly reported as "objective C" instead of ocaml (or objective caml, which may explain the confusion).


timeless

about 1 year ago

Some mozilla related projects are complaining about being called shell script based projects because of a single file (configure).

Configure (typically generated by configure.in) is interesting in that it's likely to get thousands of small changes. As such total commit count is not going to solve the problem of "total lines of code misdetermines the language"

For reference: http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/configure 1.1916 cltbld 2007-10-13 14:21 Automated update from host egg.build.mozilla.org

Configure of course tends to be massively long.

Unfortunately, a java project which commits its javadoc (or a perl project that commits its html generated pod) will be dwarfed in the total number of files (documentation > sources).

I think the trick is to try to generate an algorithm that determines "copying". If configure seems to be a derivative copy of configure.in (good luck), or foopy.html seems to be a derivative copy of foopy.pl / foopy.java, then it should be discounted.

I'm not sure how well that'll work, a project with heavy manual documentation will probably still be penalized, but overall it might enable you to count by numbers of files.

Another thing you could do is drop languages used in only one file.

I was wondering if this would penalize the mozilla despot project (which I believe is not yet imported into ohloh, I might do that just to see what happens)

http://mxr.mozilla.org/webtools/source/despot/

From memory, despot basically has one file (despot.cgi) and a help file (help.html), but it does actually have a couple of other .pl files, so today it'd probably be counted as perl. When enough of despot is converted to use .templ's, despot would probably be called html.

And I don't know that I'd mind.

Probably the simplest solution is to list the major language, and the second language and a note about which calculation would make the second language the major language.

It of course only works for 2, and would fail for perl+html+js, but it should probably help for configure :).


syaskin

about 1 year ago

To add to these comments, Queplix is written 100% in Java2EE, but it includes GWT (Google Web Toolkit). However, the project was marked as "written mostly in Java".


Krzysztof Foltman

about 1 year ago

What about ignoring configure scripts (and perhaps autogenerated makefiles) in language detection?


arnoschn

about 1 year ago

Hey guys,

what about adding some control files to the repositories root:

ohloh.ignore:

*.js

ext/*

ohloh.languages:

*.phtml=PHP

*.pl=Perl

etc..

Regards, Arno
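A minimal sketch of how such control files might be read and applied is below. Nothing here exists in Ohloh: the file formats are taken from the proposal above, and `parse_ignore`, `parse_languages`, `is_ignored` and `language_override` are hypothetical helpers. Patterns are treated as shell-style globs in which `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def parse_ignore(text):
    """Return the non-empty glob patterns from an ohloh.ignore-style file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_languages(text):
    """Return (pattern, language) pairs from lines such as '*.phtml=PHP'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            pattern, language = line.split("=", 1)
            pairs.append((pattern.strip(), language.strip()))
    return pairs


def is_ignored(path, patterns):
    """True if the repository-relative path matches any ignore pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def language_override(path, pairs):
    """Return the first language whose pattern matches path, else None."""
    for pattern, language in pairs:
        if fnmatchcase(path, pattern):
            return language
    return None


ignore = parse_ignore("*.js\n\next/*\n")
languages = parse_languages("*.phtml=PHP\n\n*.pl=Perl\n")
print(is_ignored("ext/lib/widget.py", ignore))          # True
print(is_ignored("src/app.py", ignore))                 # False
print(language_override("views/index.phtml", languages))  # PHP
```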


arnoschn

about 1 year ago

ohloh.externals:

could reference the usage of other opensource projects, so that this is not counted to belong to this project but maybe as a boost for the kudo rank of the used project?

For example:

path = Name [ProjectId]

ext/.* = ExtJs Framework [123123]


bluesmoon

about 1 year ago

I have a bunch of sample files that are used as input to my program and do not constitute source code of the program. Ohloh however looks at these files as well, and counts them as part of the source code.

The result is that a project that is 100% perl is reported as Mostly Javascript, since its job is to parse HTML files.

Project name: RSSyn


Lester L. Martin II

about 1 year ago

Can you change the project's "Mostly written in" part from C# to D, as most of the C# stuff is old (I am switching it totally over to D)? Please do so quickly.


Lester L. Martin II

about 1 year ago

that's for Dinstaller


dons

about 1 year ago

Haskell projects appear to be listed as C/C++ .

So while the commits from the git repo (I had to convert from darcs to upload) appear, instead, it appears *.cabal files are treated as C/C++. This is the Haskell make-like system, so should probably be treated as Haskell source too.

See, e.g., project xmonad

Haskell extensions: .hs .hsc .lhs .cabal Comments introduced with: -- Nested comments : {- and -}


Imortis

about 1 year ago

FreeBASIC Compiler Is marked as being written mostly in VisualBASIC. This is wrong. FreeBASIC is a self-hosting compiler. This means that FreeBASIC is written mostly in FreeBASIC.


NeoStrider

about 1 year ago

Angstron is not written in shell script! I have a few scripts, but I have a gazillion other .h files, with lots of C++.

btw, great site!


Hagen Möbius

about 1 year ago

NeoStrider, the language your project is written in is determined by what language has the most lines of code. Your entire project has around 32k lines, of which 21k belong to your shell script "configure".

Remove it from your repository. You don't need it there anyway because that is what you have the autogen.sh for.


dons

about 1 year ago

The gtk2hs project source is identified as mostly Pascal, not Haskell. :)

This library uses the c2hs preprocessor, meaning it has files with extension *.hsc, *.chs and *.chs.pp -- these are Haskell files.


Robin Luckey

about 1 year ago

Hi dons,

I'll add this to the bug list over at labs.ohloh.net. If it really is just more file extensions, then it's an easy change, but fixing it also requires us to recount all of the projects with these extensions. The recount queue is very long right now, so we might have to delay the fix.


Arc "warthog"...

about 1 year ago

"Mostly written in Python"

Your scanner has either mistaken Pyrex code (.pyx .pxd .pxi), a hybrid of C and Python, for Python, or it has picked up setup.py and two small .py files as the only source code.

This applies to SymPy and PySoy at the very least, I know there's at least a dozen other major projects that use Pyrex.


Adrian Pop

about 1 year ago

Hi,

I know that you are dealing with a gzillion languages out there, but you could also add Modelica on your list. The file extension is ".mo".

Cheers adrpo/


Dag-Erling Smørgrav

about 1 year ago

"Mostly written in shell script" for Munin is not entirely correct. The core and many of the plugins are written in Perl; only some of the plugins are written in shell script.


Robin Luckey

about 1 year ago

Hi Dag-Erling,

You can download our source code line counter Ohcount from labs.ohloh.net and it will show you the detailed results for our counts.

I ran the tool against a local checkout of Munin and found this:

Language        Files       Code
--------------  -----  ---------
shell              58       2916
perl                7        427
css                 1        169
html                1         64
dmd                 7         43

I did some hand inspection of the files that Ohloh believes are shell script, and it looks correct to me. There are a lot of *.in files written for bash, and not very many perl scripts.

If you find some particular mistakes in Ohcount please let us know.


Robin Luckey

about 1 year ago

Hi masterfreek,

This is a good idea; good enough that it comes up every now and then in our forums. I'm in favor of something like this, but it will probably be a while before we have the time to implement it.


Tushar Joshi

about 1 year ago

NetBeans project categorized in mostly written in JavaScript

http://www.ohloh.net/projects/netbeans#

When I know that the project is mostly written in Java. There must be some way to tell Ohloh about the mostly written language.


mray

about 1 year ago

Not sure about the Python count either. Zenoss is listed as a C++/C project, but a quick count of my source tree shows 2500 C files, 1 CPP file and nearly 9000 Python files.


Jack Repenning

about 1 year ago

SCPlugin is marked mostly shell, actually shell: 34% Objective C: 32% HTML: 13% C: 13%

but David A. Wheeler's SLOCCount calls it Objective-C: 6954 (47%) C: 6894 (46.4%) C++: 379 (2.6%) Ruby: 285 (1.9%) Shell Script: 261 (1.8%)

which I find much more credible


Jack Repenning

about 1 year ago

OK, really strange: ohcount 1.0.1 says sensible things about scplugin (objective_c: 85 files, shell: 10 files, c 12 files), so why does ohloh say wrong things?


Jack Repenning

about 1 year ago

Another interesting discovery: my run of ohcount showed 8497 lines of shell, which rather startled me (though still not constituting "mostly," as claimed in the project summary). I looked, and it's true, sort of ... but 8127 of those lines are in a copy of "libtool" that we heisted from MacPorts and don't maintain ourselves. I wonder if there shouldn't be a way to exclude something like that? It's binary, read-only to us....


Stuart Yeates

about 1 year ago

The unhelpful answer: what are you doing with a binary, read-only file in your version control then? The easy way to fix this is to keep only versioned stuff in your version control system and pull the rest in at compile/run time.

The helpful answer: this is a known problem. It also applies in spades to ./configure files in lots of C/POSIX projects and ant/xml files in lots of java projects.


Sam Steingold

about 1 year ago

clisp is claimed to be written mostly in shell script due to a bunch of configure scripts. there should be a way to eliminate generated files from the count; at least the generated configure files have "Generated by GNU Autoconf" on the 4th line.
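A check along those lines could be sketched as follows. It is only an illustration: `is_autoconf_generated` is a hypothetical name, and scanning the first ten lines (rather than exactly the fourth) is an assumption made to tolerate small differences between generated files.

```python
def is_autoconf_generated(text, max_lines=10):
    """True if the marker string appears near the top of the file."""
    head = text.splitlines()[:max_lines]
    return any("Generated by GNU Autoconf" in line for line in head)


# Made-up file contents for illustration only.
generated = "#! /bin/sh\n# Guess values.\n#\n# Generated by GNU Autoconf.\n"
hand_written = "#!/bin/sh\n# build helper\nmake all\n"
print(is_autoconf_generated(generated))     # True
print(is_autoconf_generated(hand_written))  # False
```

Files flagged this way could then be left out of the language totals, or counted under a separate category.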


Robin Luckey

about 1 year ago

Hi Jack,

The main cause is that ohcount considers configure to be a shell script, while sloccount ignores configure files completely.

There are two possible changes we could make: first, ohcount does have a special language category for autoconf, and we could modify ohcount to put configure files in this category. Alternatively, and this may be the more popular idea, we could ignore configure files in the same way that sloccount does.

It seems to be fairly common practice for a project to include some amount of third-party code or tools in their source trees, and it seems like we will need to accommodate that somehow in the long run.

We've had a lot of requests to flag certain files and directories to be ignored by Ohloh, perhaps using a robots.txt-style file. I'd like to get this implemented, but it will probably be a while before we have the time to do this.

To be fair to Ohloh, 'libtool' is not a binary, it really is a shell script.

The line counts available online at Ohloh are occasionally obsolete, because ohcount is continually receiving new patches, and it takes a while to recount all of the projects on Ohloh using the latest ohcount. It looks like the SCPlugin counts are up-to-date, however.

If there are still any mysteries here, let me know and I'll try to answer them.

Thanks, Robin


qu1j0t3

about 1 year ago

Your language sniffer doesn't work well for me. This project: http://www.ohloh.net/projects/10160 is claimed to be "Assembly" but it does not contain 1 line of assembler. It's a 100% C project.


Robin Luckey

about 1 year ago

@qu1j0t3:

Are we talking about http://www.telegraphics.com.au/svn/picide/trunk?

Looking through a local checkout of this project, I see a lot of *.asm files and no C at all.

We must be talking about different repositories.


costaju

12 months ago

Another one: dtach is marked as being mostly shell-script.

That is a funny one :)

Probably the automated process doesn't rule-out the ./configure script (which is obviously wrong!). Being a relatively small C program, the configure script wins by KO :)

This should be corrected...


Robin Luckey

12 months ago

Hi costaju,

I think you're probably right -- this is the ./configure script taking over.

We changed our line counter recently to recognize ./configure scripts as "autoconf" instead of "bash script", but for the change to take effect we need to recount all of the code on Ohloh, which takes a while :-).

I'll go ahead and schedule a clean recount of dtach. That should clear up the issue and report it as a C program.

Thanks, Robin


Mike Laughton

11 months ago

Hi Robin,

Are there any plans for Ohloh to consider the "libtool" shell script to be its own separate grouping, much like what happens with configure/autoconf? For my small C project (libdmtx) the actual C code might never grow bigger than the scripts generated by Autotools.

Ordinarily this wouldn't be a big deal, but I'm a little worried that people learning about my project for the first time might be scared off by the prominent "Mostly written in shell script" on the summary page. Those who are seeking a high performance library might immediately move on to the next search result without reading any further.

Thanks, Mike


Robin Luckey

11 months ago

Hi Mike,

I'm not familiar with the details of libtool, but this sounds like something that wouldn't be terribly difficult to do.

We don't have any time scheduled out now for working on our language parsing, but we are happy to take patches if you or someone you know can find the time. The source is over at http://labs.ohloh.net -- following the example of autoconf might make libtool support pretty easy to figure out.

Let me know if you have questions, Robin


mcpierce

6 months ago

I guess no fix or update's been done for this? It's reporting my project, which is written in Ruby and for which we have written *no* Javascript, beyond small pieces of boilerplate from the Rails engine, as being "most Javascript".


Robin Luckey

6 months ago

Hi mcpierce,

I assume you're talking about project ProjXP.

This project does have a lot of Javascript in it, in the /public/javascripts directory. It's about as much as the entire Ruby content of the rest of the project.

However, I think there might be something going wrong with our analysis -- it looks like the line count totals in our report do not match the totals I get when I do a manual count. I'll investigate.

If you want to try out our line counter on your own, you can get the sources here.

Thanks,

Robin


qu1j0t3

5 months ago

Robin If you view the URL in my post, you'll see that it refers to a 100% C project (psdparse) yet your sniffer incorrectly identifies it as Assembly. https://www.ohloh.net/p/10160/analyses/latest


Jason Allen

5 months ago

Hi qu1j0t3,

I checked out psdparse's enlistment and tried it out on my box here:

svn checkout http://www.telegraphics.com.au/svn/picide/trunk

By looking at the files it included, I mostly see .asm and .inc files. Is it enlisted in the right repository?


qu1j0t3

5 months ago

Jason, Robin: My apologies. 100% user error on my part. The enlistment pointed to the wrong project. In this case your sniffer is innocent. I'll review my projects sometime to make sure the license and language sniffing is accurate, it seemed to have problems with license detection in the past.

Thanks and apologies for the false report.


Briel

5 months ago

The code analysis for Djime shows about 3k lines of JavaScript. About a month ago we had some jQuery files within the project, but we moved them out, since they weren't part of the project. Ohloh is still showing that the project is mostly written in JavaScript.

with your tool, I got a count of:

177 lines of javascript 2539 lines of python

My local dev version is a bit ahead of the master branch, but the ratio should still be the same.

It would be nice if you could change the most written in from javascript to python, since this clearly is the case.

Thanks.


Robin Luckey

5 months ago

Hi Briel,

This looks like a case of a known bug in our code counter.

Our online code counter is cumulative, in that it incrementally adds the changes from each new commit to a running total.

This gets into trouble when identical edits appear on one or more merged branches. This leads to doubly-counted edits, and incorrect totals.

I know what needs to happen to fix the issue, but it's going to take a while to get all the necessary code changes made.
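The double counting described above can be shown with a toy running total. Everything in this sketch is hypothetical: the edit identifiers, the line deltas, and the de-duplication step are illustrations of the problem, not Ohloh's counter or its planned fix.

```python
def running_total(edits, dedupe=False):
    """Sum line deltas; optionally skip edits whose identifier was seen before."""
    seen = set()
    total = 0
    for edit_id, delta in edits:
        if dedupe:
            if edit_id in seen:
                continue
            seen.add(edit_id)
        total += delta
    return total


# The same library import appears on two merged branches, then is removed once.
edits = [("add-library", 2600), ("add-library", 2600), ("remove-library", -2600)]
print(running_total(edits))               # 2600: phantom lines remain
print(running_total(edits, dedupe=True))  # 0
```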


jedirock

4 months ago

My project, BOINC Manager (https://www.ohloh.net/p/boincmanager), has mostly written in C++, which is correct, as most of the code at this point is a C++ library we're using. However, we have negative lines of XML, so C++ shows as 147% of our project, and XML is <1% and -5600 lines of code. Is this related to the cumulative counting bug above?


Robin Luckey

4 months ago

Hi jedirock,

The negative lines of XML code result from the fact that our reports are updated incrementally, but our source code line counter is always improving, and this can lead to inconsistencies.

Here's the technical reasoning, if you are interested. Previously, our line counter did not recognize many types of XML. Our counter recently began recognizing more types of XML, at the same time that this project began removing some XML.

As a result, our system correctly subtracted some XML, but we'd never recognized any before, so we had a negative total.

To fix the problem, I ran a complete historical recount using the latest version of our counter. The results are now correct.

Let me know if there are any more problems,

Robin
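The sequence of events can be simulated in a few lines. This is a simplified sketch with hypothetical names and illustrative numbers: each commit carries an XML line delta plus a flag saying whether the counter version in use at that time recognized the XML.

```python
def incremental_total(history):
    """Add up XML deltas, but only for commits where XML was recognized."""
    total = 0
    for xml_delta, counter_recognizes_xml in history:
        if counter_recognizes_xml:
            total += xml_delta
    return total


# XML added while the old counter ignored it, removed after the counter improved.
history = [(+5600, False), (-5600, True)]
# A full historical recount applies the latest counter to every commit.
recount = [(delta, True) for delta, _ in history]
print(incremental_total(history))  # -5600
print(incremental_total(recount))  # 0
```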


jedirock

4 months ago

Thanks Robin. I'll keep an eye on it and let you know if there are any more problems. Not sure what XML could've been removed, but I've been in the process of cleaning up the trunk with a new contributor, so possibly something in there. Thanks again.


Thorsten Glaser

4 months ago

The MirPorts Framework is actually mostly written in BSD make. It is understandable that make is skipped for regular projects which may have huge build systems, but in this case, the build system is the actual project.


Matt Behrens

3 months ago

I'm guessing the same problem Thorsten has identified is also what's seen in the Solaris Package System.


Briel

3 months ago

I posted some time ago about the code counter not working properly, apparently a known bug that would take a while to fix. Would like to know if there is any progress on that end?


Robin Luckey

3 months ago

Hi Briel,

This bug has not yet been fixed. If an identical code edit appears on two separate git branches, it will still be counted twice by Ohloh.

There has been some progress towards a fix, and we have been experimenting with some new code for computing line counts. However, I can't predict when a new counter will be deployed. It will probably be several more months at least.

Thanks, Robin


Briel

3 months ago

Hi Robin, thanks for your speedy reply.

It doesn't seem like the problem is what you describe, the reason being that there is only one branch on the tracked git repository. Also, the number of actual JavaScript lines is close to 200, but Ohloh says that we have around 2800 lines. I think the problem is that somewhere down the line, when the jQuery library file(s) were removed from the project itself, Ohloh's counter didn't get that and is still counting those files as being part of the project. At least that's the only thing that makes sense to me, as the extra lines of code are roughly the lines of code we had for jQuery. I could be completely off here, but I think it would be worth checking out, to see if there is a bug in the Ohloh tracker regarding deleted files. We have previously removed other jQuery files that Ohloh did track, so I'm not sure what is causing this problem. But I highly doubt that it could be a problem with Ohloh counting branches twice. The difference is just too big (15 times), and with the main repository having 1 branch and my own having 2, that doesn't account for the big difference.

Another thing is that the amount of python code is pretty precise, so it wouldn't make sense that the javascript files would get counted extra.

Well enough of me talking about lines of code in the project. Hope this helps you finding out the cause of the problem.

~Jakob


jpike

3 days ago

Please add the .zsh extension for the zsh scripting language. Zsh is an excellent and commonly used shell, very similar to bash with some great extensions. What is currently parsing bash will cope with it.