Mostly Written In...

written by Jason Allen
feb 23 2007
 

Ohloh analyzes projects' source code and determines what programming languages were used to create them. While most projects are written using many languages, a "mostly written" fact simply highlights the top language used (as counted by lines of code).

Comments (51)

Cato

about 1 year ago

TWiki is mostly written in Perl (all server side code) with only a small amount of JavaScript for AJAX frontends, WYSIWYG editing plugins and skins.

Please set TWiki to be 'mostly Perl', and also update the license 'analysis' to reflect the fact that it is entirely GPLed. I have already set the license to GPL.


Cato

about 1 year ago

Another TWiki note: it has been under revision control since at least Feb 2001 (first CVS, now SVN with all CVS checkins imported), based on the revision log for one key module. So I think that qualifies as a long history of revision control.


Doug Napoleone

about 1 year ago

Where are you getting this Javascript and TWiki stuff from? This is a python Django Project!


Doug Napoleone

about 1 year ago

Oops, I think I misunderstood where this link came from. Why is this in my PyCon-Teck project summary?

I don't know why it claims that 60% of the code is JavaScript when there are only 2 JS files (500 lines) and 10K+ lines of Python.

There is an svn:externals of the Dojo JavaScript framework, but Ohloh should exclude all svn:externals, as that is not local development!


Robin Luckey

about 1 year ago

Cato,

Our reports are generated automatically by source analysis. There's nothing that we set or change manually other than the URL to the source control server.

It may be surprising, but this source tree does in fact contain much more JavaScript than Perl. I did a manual review of a local checkout, and including blanks and comments I found roughly 280K lines of JavaScript and 235K lines of Perl. This matches the Ohloh report very closely.

The licenses are likewise determined entirely by parsing the source code, and all of the licenses listed in the report do appear in the code. For instance, the Academic Free License appears in many of the .js files in the DojoToolkitContrib subdirectory.

We do allow you to edit the "overall license" of the project, which has already been set to GPL for this project.

We are working on a feature right now that will allow you to see a list of the exact files which we consider to be covered by each license and language, and hopefully that will help clear up any confusion. In the meantime I'm happy to help with any specific questions.

derivin,

Ohloh does not include any externals in the report.

I think you are mistaken on your line counts. It's true that there are only a few .js files in PyCon-Teck, but with a quick examination of the source it was easy to verify that they contain over 22K lines of javascript -- not 500.


Nlaw

about 1 year ago

Re: MLDONKEY

The "mostly written in C/C++" for mldonkey is incorrect. It's actually written in a language called OCaml.

Regards NL


madth3

about 1 year ago

Ohloh clearly needs to work on this feature, judging by the comments.

Struts2 is Java, but since it includes Dojo it is marked as mostly JavaScript.


Stephan.Schmidt

12 months ago

Obviously the problem with JavaScript may be that JS is distributed as source code in a project. So if you add several JS libraries (prototype etc.) to the project, then Ohloh probably (wrongly) adds those files to your project and raises your JS count. It's probably not clever enough to detect JS libraries you only distribute. Worse: It may add the JS licenses to your project and claim your project uses X and Y licenses.


Jason Allen

12 months ago

Nlaw: Ohloh doesn't recognize OCAML yet. It's on our todo list.

madth3 & stephan: Your assumptions are correct. Ohloh doesn't know why (or how) files were added - so it treats them all the same.

Worse: It may add the JS licenses to your...

This reflects the feature as it was designed. Our license file sniffer is meant to be a starting point for people who care what additional licenses a project might include. Even if the governing license is X, you sometimes have to pay attention to additional embedded licenses.


cbbrowne

11 months ago

Prolog is also obviously not a language covered yet; the system thinks all my Prolog code is Perl...


hpages

11 months ago

Same for R; the system thinks all the R code that Bioconductor is primarily written in is XML... It would be better, maybe, to report something like "unknown language" instead. In fact, not all files in a project can be tagged with a programming language: documentation, config files, data files, etc... Bioconductor packages contain a lot of data!


Andres Almiray

10 months ago

It's funny that JideBuilder is written 100% in Groovy but the report says "mostly in JavaScript".


Jason Allen

10 months ago

Andres: I downloaded JideBuilder's source code and did a very primitive scan:

> find . -name '*.js' -exec cat {} \; | wc -l
> 5979
> find . -name '*.groovy' -exec cat {} \; | wc -l
> 2533

JavaScript has more than twice as many raw lines of code as Groovy.


Jason Allen

10 months ago

hpages: You bring up some interesting points. I'll need to give the 'unknown' suggestion more thought.

Regarding the documentation, config, data, etc.: I think the key here is to try to understand what is hand-authored vs machine-generated, as well as what is documentation vs what is code. I think measuring all types of contributions is useful.

Finally, regarding R: We've been taking a break from adding more languages, but hopefully we'll have a chance to catch up soon - and add R.

Thanks for the heads up.


Jason Allen

10 months ago

cbbrowne: Doh! We have a file extension conflict. ".pl" is currently mapped to being a perl file. I thought that Prolog files were supposed to end with '.P'.

I'll have to write some detection code to disambiguate them. Any suggestions on what to look for?

Off the top of my head:

Not counting lines that begin with '#include', if there are more lines starting with '%' than '#' - then assume prolog.

If there are any lines ending in ':-' - assume prolog.

I'm pretty sure these 2 rules alone would solve your specific case (canada2003). Any feedback would be appreciated...
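
A minimal sketch of the two rules above in Python, for illustration only. The function name looks_like_prolog is hypothetical and this is not Ohloh's actual detection code; it simply applies the "ends in ':-'" rule and the '%' versus '#' count as described.

```python
def looks_like_prolog(source):
    """Guess whether the text of a '.pl' file is Prolog rather than Perl."""
    percent_lines = 0
    hash_lines = 0
    for raw in source.splitlines():
        line = raw.strip()
        # Rule 2: any line ending in ':-' is taken as a Prolog rule head.
        if line.endswith(":-"):
            return True
        # Rule 1: ignore '#include' lines, then compare '%' and '#' line counts.
        if line.startswith("#include"):
            continue
        if line.startswith("%"):
            percent_lines += 1
        elif line.startswith("#"):
            hash_lines += 1
    return percent_lines > hash_lines
```

Note that a Perl file with many lines starting with a hash variable such as %config would fool the first rule, so this is only a starting point.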


Brian Downing

10 months ago

SBCL (http://www.ohloh.net/projects/5299) is written in Common Lisp, but Ohloh says it's primarily written in C/C++.

We have C components:

:; find . -name '*.[ch]' | xargs wc -l | tail -1
  34115 total

but they are dwarfed by the amount of code in Lisp:

:; find . -name '*.lisp' | xargs wc -l | tail -1
  395570 total

Commits to .lisp files don't seem to be counted as Lisp either.

In fact, those 400,000 lines of code don't even show up in the code report:

http://www.ohloh.net/projects/5299/analyses/latest


Brian Downing

10 months ago

By the way, if you want to try and differentiate Common Lisp from random generic Lisp, Common Lisp sources usually have a line that looks like one of these:

(in-package :cl-user)
(CL:IN-PACKAGE "MY-PACKAGE")
(common-lisp:in-package #:this-is-rather-long)

so maybe looking for a line like:

/^\(([^\)]*:)?in-package\s/i

would be a decent tack? It won't get everything, but it'll get a lot.

There's no standard for file extensions, but *.lisp is quite common, *.cl is also used, and I believe *.lsp is used by 8.3 holdovers.
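
As a rough illustration, the pattern suggested above can be tried in Python like this. The function name looks_like_common_lisp is hypothetical, and the regex is a direct transcription of the one in this comment, with multi-line mode added so it is checked against every line.

```python
import re

# Same pattern as above: an opening paren, an optional "package:" prefix,
# then "in-package" followed by whitespace, matched case-insensitively.
IN_PACKAGE = re.compile(r"^\(([^)]*:)?in-package\s", re.IGNORECASE | re.MULTILINE)


def looks_like_common_lisp(source):
    """Return True if any line of the source starts with an in-package form."""
    return IN_PACKAGE.search(source) is not None
```

All three example lines above match, while a file with no in-package form does not.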


indeyets

10 months ago

There is a problem with OCaml projects. For some reason they are treated as Objective-C

Objective-C extensions: m, mm

OCaml extension: ml
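
A small sketch of what an extension table that keeps the two languages apart might look like. The names EXTENSION_LANGUAGE and language_for are hypothetical, and only the extensions listed above are included; this is not Ohloh's real mapping.

```python
import os

# Only the extensions mentioned in this comment; a real table would be longer.
EXTENSION_LANGUAGE = {
    ".m": "Objective-C",
    ".mm": "Objective-C",
    ".ml": "OCaml",
}


def language_for(filename):
    """Look up a language by file extension, or return None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_LANGUAGE.get(extension)
```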


Duncan Grisby

10 months ago

omniORB is mostly written in C++, not Perl! In fact, there are only about 300 lines of Perl in the whole thing, compared to over 300,000 lines of C++.


luciash

10 months ago

Mostly written in JavaScript ?

Code says it correctly (mostly written in HTML+CSS) but on the Report tab Ohloh Summary states "Mostly written in JavaScript" which is obviously wrong...


Robin Luckey

10 months ago

@luciash,

This is by design, but in your case our design is not appropriate.

Our system categorizes XML, HTML and CSS as "markup" languages, and does not consider them when determining the "Mostly written in" summary.

We did this because many projects have a great deal of HTML documentation or XML configuration, which can obscure the fact that the project is actually written in something like Java. In the vast majority of cases, it is wrong to decide that the main language of a project is HTML or XML.
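
A minimal sketch of that rule, assuming the input is a table of line counts per language. The names MARKUP and mostly_written_in are hypothetical, and the example numbers below are made up for illustration rather than taken from any real report.

```python
# Languages treated as "markup" and skipped when picking the main language.
MARKUP = {"xml", "html", "css"}


def mostly_written_in(lines_by_language):
    """Return the non-markup language with the most lines, or None."""
    code = {
        language: count
        for language, count in lines_by_language.items()
        if language.lower() not in MARKUP
    }
    if not code:
        return None
    return max(code, key=code.get)
```

With made-up counts such as {"html": 5000, "css": 3000, "javascript": 400}, the function picks "javascript", which is how a project that is really about CSS ends up with an unexpected summary.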

In your case, however, it looks like your project is specifically about CSS, so it would be appropriate to list this project as "Mostly written in HTML". This is a rare case, and our system isn't smart enough to figure this out.

Any thoughts about how to work around this in your case?


luciash

10 months ago

@Robin Luckey

Thanks a lot for your nice reply. Now I get it !

Here's my suggestion which imho shouldn't be hard to implement:

What about having a switch (maybe a radio button or dropdown) in the project info settings to say whether it is mostly a markup/template/documentation project or a coding/programming project? The default state would stay as it is now. Ohloh would then either count the markup files (CSS/HTML/TXT) or exclude them, as you currently do for "Project Cost", and generate the Ohloh Summary based on that.

Would be great new enhancement for some projects like mine :)


Andres Almiray

9 months ago

Jason (On JideBuilder's code), you're right, if the whole project is scanned that way, JavaScript will win over Groovy, but the fact is that the javascript code is part of the documentation, not production code. So I guess that without a project profile (how files are distributed) the measures will not be 100% accurate, but as each project may follow its own convention, this task will be huge and almost impossible to accomplish automatically.

Would it be possible to add more information (as an optional step, of course) when registering a project, like source dir, test dir, doc dir? That might help the tools make more accurate measurements =)


Michał Słaby

9 months ago

Jason, I like the language chart, but sometimes it is inaccurate in terms of language importance to the project. In my project it is PHP which is the most important language, but I use tons of Javascript for minor things the project can live without. It would be lovely to override preferred language in project settings.


boran

8 months ago

Hi, First off: your tool is interesting! FreeNAC is mostly (90%?) PHP, with some embedded SQL statements. I do not know why you say it's mostly SQL; there must be something about our code. It also says 0% comments, which is not true either.

Also I submitted the trunk for analysis, as per your instructions. But, most of the (incremental) work takes place in the branches, not trunk, which is used for very new features.

First SVN submits were in June'06, not this winter, so there seems to be an ageing issue too.

Regards, Sean


Robin Luckey

8 months ago

Hi boran,

I was initially a bit puzzled, but the answer seems to be this commit, which includes 27,000 lines of SQL. That dwarfs the rest of the project, and results in Ohloh's conclusion that this is a SQL project. You can see this checkin as the huge spike at the end of the codebase graph.

I think there is a bug somewhere in our code regarding the 0% comments presentation. Our analysis found that the PHP code is 15.6% comments, and when you add in the SQL it comes down to 9.2% overall. I'll have to investigate why we show 0% in the factoid. This is the first time I've seen this particular problem.

If most of the ongoing development work is happening on a branch, then by all means, go ahead and change the enlistment to the branch and remove the trunk. (There seem to be two kinds of projects in the world: those that develop on the trunk and drop releases into branches, and those that develop on branches and drop releases into the trunk.)

Where did you find "this winter"? In the codebase graph and in the commit tables our first activity shown is in August of 2006, with the addition of the README and the first bin/port_scan files.


Robin Luckey

8 months ago

Hi boran,

So the problem with the comments appears to be that when we generate the factoid, we only look at the 'main language', and we compare your comment ratio in the main language with the comment ratio for all other projects that used that same main language.

In your case, we decided the main language was SQL, and tragically, this project contains only 11 lines of comments in its 27000+ lines of SQL. That rounds down to zero. :-)

The only fix I can see is for us to change our heuristic for determining main language -- perhaps changing from total lines of code to total commits or something along those lines. I'll post a bug ticket about this.
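
To see why the factoid rounds to zero, the arithmetic can be sketched as follows. This assumes the ratio is comment lines divided by comment plus code lines, which is only a guess at the formula Ohloh uses, and it takes 27000 as a stand-in for the "27000+" figure above. The function name comment_percent is hypothetical.

```python
def comment_percent(comment_lines, code_lines):
    """Percentage of comment lines among all counted lines (assumed formula)."""
    total = comment_lines + code_lines
    if total == 0:
        return 0.0
    return 100.0 * comment_lines / total


sql_percent = comment_percent(11, 27000)
# Well under half a percent, so a whole-number display shows 0.
displayed = round(sql_percent)
```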


Victor

7 months ago

Hi,

About the Prolog files, I think that if you can find lines that begin with :- and/or lines that contain one word followed by a :- it will work.

By one word I mean one predicate, something like: bla(blo, [hop], lol(bing))

I don't think that rules can have multiple predicates in the head ...

I know that Vim looks at the % to recognize them.


Tobu

6 months ago

Unison is mistakenly reported as "objective C" instead of ocaml (or objective caml, which may explain the confusion).


timeless

6 months ago

Some mozilla related projects are complaining about being called shell script based projects because of a single file (configure).

Configure (typically generated from configure.in) is interesting in that it's likely to get thousands of small changes. As such, total commit count is not going to solve the problem of "total lines of code misdetermines the language".

For reference: http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/configure 1.1916 cltbld 2007-10-13 14:21 Automated update from host egg.build.mozilla.org

Configure of course tends to be massively long.

Unfortunately, a java project which commits its javadoc (or a perl project that commits its html generated pod) will be dwarfed in the total number of files (documentation > sources).

I think the trick is to try to generate an algorithm that determines "copying". If configure seems to be a derivative copy of configure.in (good luck), or foopy.html seems to be a derivative copy of foopy.pl / foopy.java, then it should be discounted.

I'm not sure how well that'll work, a project with heavy manual documentation will probably still be penalized, but overall it might enable you to count by numbers of files.

Another thing you could do is drop languages used in only one file.

I was wondering if this would penalize the mozilla despot project (which I believe is not yet imported into ohloh, I might do that just to see what happens)

http://mxr.mozilla.org/webtools/source/despot/

From memory, despot basically has one file (despot.cgi) and a help file (help.html), but it does actually have a couple of other .pl files, so today it'd probably be counted as perl. When enough of despot is converted to use .templ's, despot would probably be called html.

And I don't know that I'd mind.

Probably the simplest solution is to list the major language, and the second language and a note about which calculation would make the second language the major language.

It of course only works for 2, and would fail for perl+html+js, but it should probably help for configure :).


syaskin

6 months ago

To add to these comments, Queplix is written 100% in Java2EE, but it includes GWT (Google Web Toolkit). However, the project was marked as "written mostly in Java".


Krzysztof Foltman

5 months ago

What about ignoring configure scripts (and perhaps autogenerated makefiles) in language detection?
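
One possible way to do this, sketched in Python under stated assumptions: a file named "configure" is dropped only when an autoconf input sits next to it. The function name drop_generated_configure is hypothetical, and "configure.ac" is included alongside "configure.in" because it is the other common name for the autoconf input; nothing here reflects how Ohloh actually works.

```python
import posixpath

# Files that a "configure" script is normally generated from.
AUTOCONF_INPUTS = ("configure.in", "configure.ac")


def drop_generated_configure(paths):
    """Return paths without any 'configure' that has an autoconf input beside it."""
    present = set(paths)
    kept = []
    for path in paths:
        directory, name = posixpath.split(path)
        has_input = any(
            posixpath.join(directory, source) in present
            for source in AUTOCONF_INPUTS
        )
        if name == "configure" and has_input:
            continue  # looks generated, so leave it out of the language count
        kept.append(path)
    return kept
```

A hand-written script that merely happens to be called "configure" is kept, since it has no configure.in or configure.ac beside it.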


arnoschn

5 months ago

Hey guys,

what about adding some control files to the repositories root:

ohloh.ignore:

*.js

ext/*

ohloh.languages:

*.phtml=PHP

*.pl=Perl

etc..
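
A rough sketch of how a counter might apply such control files, assuming shell-style glob patterns. The ohloh.ignore and ohloh.languages files are only a proposal in this thread, so the pattern lists below just mirror the example above, and the function name classify and the "auto" marker are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the proposed ohloh.ignore and ohloh.languages examples.
IGNORE_PATTERNS = ["*.js", "ext/*"]
LANGUAGE_OVERRIDES = [("*.phtml", "PHP"), ("*.pl", "Perl")]


def classify(path):
    """Return None if ignored, an overridden language, or 'auto' otherwise."""
    if any(fnmatchcase(path, pattern) for pattern in IGNORE_PATTERNS):
        return None
    for pattern, language in LANGUAGE_OVERRIDES:
        if fnmatchcase(path, pattern):
            return language
    return "auto"  # fall back to the normal automatic detection
```

Note that in fnmatch-style globs '*' also matches '/', so "*.js" covers JavaScript files in any subdirectory.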

Regards, Arno


arnoschn

5 months ago

ohloh.externals:

could reference the usage of other opensource projects, so that this is not counted to belong to this project but maybe as a boost for the kudo rank of the used project?

For example:

path = Name [ProjectId]

ext/.* = ExtJs Framework [123123]
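
For illustration, a line in that proposed format could be parsed roughly like this. The function name parse_external is hypothetical, the path is assumed to be a regular expression (as "ext/.*" suggests), and the project id is assumed to be numeric as in the example.

```python
import re

# "path = Name [ProjectId]", with a numeric id in square brackets.
EXTERNAL_LINE = re.compile(
    r"^\s*(?P<path>\S+)\s*=\s*(?P<name>.+?)\s*\[(?P<project_id>\d+)\]\s*$"
)


def parse_external(line):
    """Return (compiled path pattern, project name, project id) or None."""
    match = EXTERNAL_LINE.match(line)
    if match is None:
        return None
    return (
        re.compile(match.group("path")),
        match.group("name"),
        int(match.group("project_id")),
    )
```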


bluesmoon

5 months ago

I have a bunch of sample files that are used as input to my program and do not constitute source code of the program. Ohloh however looks at these files as well, and counts them as part of the source code.

The result is that a project that is 100% perl is reported as Mostly Javascript, since its job is to parse HTML files.

Project name: RSSyn


Lester L. Martin II

5 months ago

Can you change the project's "Mostly written in" part from C# to D, as most of the C# stuff is old (I am switching it totally over to D)? Please do so quickly.


Lester L. Martin II

5 months ago

that's for Dinstaller


dons

5 months ago

Haskell projects appear to be listed as C/C++.

So while the commits from the git repo (I had to convert from darcs to upload) appear, instead, it appears *.cabal files are treated as C/C++. This is the Haskell make-like system, so should probably be treated as Haskell source too.

See, e.g., project xmonad

Haskell extensions: .hs .hsc .lhs .cabal Comments introduced with: -- Nested comments : {- and -}


Imortis

4 months ago

FreeBASIC Compiler is marked as being written mostly in VisualBASIC. This is wrong. FreeBASIC is a self-hosting compiler. This means that FreeBASIC is written mostly in FreeBASIC.


NeoStrider

3 months ago

Angstron is not written in shell script! I have a few scripts, but I have a gazillion other .h files, with lots of C++.

btw, great site!


Hagen Möbius

3 months ago

NeoStrider, the language your project is written in is determined by which language has the most lines of code. Your entire project has around 32k lines, of which 21k belong to your shell script "configure".

Remove it from your repository. You don't need it there anyway because that is what you have the autogen.sh for.


dons

3 months ago

The gtk2hs project source is identified as mostly Pascal, not Haskell. :)

This library uses the c2hs preprocessor, meaning it has files with extension *.hsc, *.chs and *.chs.pp -- these are Haskell files.


Robin Luckey

3 months ago

Hi dons,

I'll add this to the bug list over at labs.ohloh.net. If it really is just more file extensions, then it's an easy change, but fixing it also requires us to recount all of the projects with these extensions. The recount queue is very long right now, so we might have to delay the fix.


ArcRiley

3 months ago

"Mostly written in Python"

Your scanner has mis-matched Pyrex code (.pyx .pxd .pxi), a hybrid of C and Python, and Python, or it's picked up setup.py and two small .py files as the only source code.

This applies to SymPy and PySoy at the very least, I know there's at least a dozen other major projects that use Pyrex.


Adrian Pop

3 months ago

Hi,

I know that you are dealing with a gzillion languages out there, but you could also add Modelica on your list. The file extension is ".mo".

Cheers adrpo/


Dag-Erling Smørgrav

2 months ago

"Mostly written in shell script" for Munin is not entirely correct. The core and many of the plugins are written in Perl; only some the plugins are written in shell script.


Robin Luckey

2 months ago

Hi Dag-Erling,

You can download our source code line counter Ohcount from labs.ohloh.net and it will show you the detailed results for our counts.

I ran the tool against a local checkout of Munin and found this:

Language        Files       Code
--------------  -----  ---------
shell              58       2916
perl                7        427
css                 1        169
html                1         64
dmd                 7         43

I did some hand inspection of the files that Ohloh believes are shell script, and it looks correct to me. There are a lot of *.in files written for bash, and not very many perl scripts.

If you find some particular mistakes in Ohcount please let us know.


masterfreek64

28 days ago

I got an idea...

Why don't you make a robots.txt-like system to mark "external" folders (JavaScript libraries, autogenerated code, etc.) that do not belong to the project? For example, my project, OblivionOnline, contains a LOT of external libraries, because all dependencies are in the repository to make stuff easier...


Robin Luckey

28 days ago

Hi masterfreek,

This is a good idea; good enough that it comes up every now and then in our forums. I'm in favor of something like this, but it will probably be a while before we have the time to implement it.


Tushar Joshi

26 days ago

The NetBeans project is categorized as mostly written in JavaScript

http://www.ohloh.net/projects/netbeans#

When I know that the project is mostly written in Java. There must be some way to tell Ohloh about the mostly written language.


mray

7 days ago

Not sure about the Python count either. Zenoss is listed as a C++/C project, but a quick count of my source tree shows 2500 C files, 1 CPP file and nearly 9000 Python files.