Excluding Code/Markup

Michel Jung

5 months ago

Hi there

Is it possible to exclude code from the statistics? There's a lot of generated code, which leads to a "4 person year" project even though it's only a few months old and uses a third-party library.

Greez


Nicolas A. Barriga

5 months ago

I have a similar problem. I have a project with a 3rd party library in the same repository, so it lists "3 person year" instead of the correct 1-4 months.


Robin Luckey

5 months ago

Sorry, this is not currently possible.

For a long time, there has been a good idea floating around: Ohloh should support some kind of robots.txt-like file that would allow you to instruct Ohloh to ignore or give special treatment to certain directories.

I think that's a great idea, but we've simply never had the development resources required to get it done.

If you are using Subversion, there may be a workaround, but it's a lot of work: rather than enlisting your entire trunk in Ohloh, you can individually add every directory except the directory containing the 3rd party library. If you have a lot of directories, I can appreciate that this may not be a realistic option.
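As a rough illustration of that workaround, the sketch below builds the list of per-directory URLs to enlist from a trunk's top-level directories, skipping the third-party one. The repository URL, the directory names, and the `enlistment_urls` helper are made up for this example; it only generates the list, and each URL would still have to be added to Ohloh by hand.

```python
def enlistment_urls(trunk_url, top_level_dirs, excluded):
    """Return one URL per top-level directory, skipping excluded ones."""
    base = trunk_url.rstrip("/")
    skip = set(excluded)
    return [f"{base}/{name}" for name in top_level_dirs if name not in skip]


# Hypothetical repository layout, for illustration only.
urls = enlistment_urls(
    "http://svn.example.org/myproject/trunk",
    ["src", "tests", "docs", "3rdparty"],
    excluded=["3rdparty"],
)
for url in urls:
    print(url)
```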

Thanks, Robin


jfuerth

5 months ago

A few of the projects I manage show as "mostly written in XSLT" because we have the DocBook stylesheets in our SVN repositories. We'd also appreciate a way to exclude a particular directory.


tpokorra

5 months ago

I have added a ticket: http://labs.ohloh.net/ohcount/ticket/317; unfortunately I am not a Ruby coder (yet); anyone else up for it?

(see also thread https://www.ohloh.net/topics/3356?page=1#post_10651)


IBBoard

5 months ago

I'm hitting this with my project as well. I've just added an app I'm writing called "WarFoundry". It has a System.Windows.Forms (Microsoft .Net Windows native) front end. All of the ".resx" (resource) files are bumping our XML line count way out of proportion. The Glade files for the GTK# front-end are probably doing the same thing. Both are all auto-generated files.

Unfortunately the "multiple directories" idea won't work because resx files are in the same place as .cs and Glade files are in a folder below the main code. What would be great for that situation would be an "ignore file name pattern" option :)


Peter Bex

5 months ago

A better workaround for Subversion is to make the 3rd party code an svn external. Last time I checked, Ohloh doesn't traverse externals.

You can do this by adding the 3rd party code outside the regular code tree, for example at /3rdparty instead of /trunk/3rdparty, and making /trunk/3rdparty an external pointing to /3rdparty.


Michel Jung

5 months ago

OK, you all have some ideas, but I can't agree with most of them. I'd say none of us wants to bend our project structure just to fit the Ohloh statistics.

The only good solution I see is to set paths/patterns to exclude in the Ohloh control panel.


IBBoard

5 months ago

Looks like there's a minor false-alarm with my project :) While digging around I found that I'd included the Log4Net documentation as well as the DLL (I hadn't paid attention to what was in the .xml file). It appears that although .resx files are XML, Ohloh doesn't pick them up as such.

Still, the general idea of "filterable paths" for when a project does include code or files that are being picked up but aren't wanted in the count is a good one :)


okinsey

3 months ago

This is really necessary, as it is currently what keeps me from adding my primary Git repositories to Ohloh. As it is now, I'm manually syncing the changes into a separate Subversion repo where I have only enlisted the src directory.

But I have one issue with the include/exclude idea: why don't we just classify paths into categories like sourcecode, data, docs, external, etc.? For example:

/lib/ > external

/docs/ > documentation

data/ > data

* > sourcecode

It shouldn't be too hard to match each path against this during counting and assign the score to the appropriate category.
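A minimal sketch of that matching step, assuming the rules are glob patterns checked in order with the first match winning. The rule format, the function names, and the sample line counts are assumptions made for illustration; nothing here reflects how Ohloh or Ohcount actually work.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Ordered rules: the first matching pattern wins, so the catch-all goes last.
RULES = [
    ("lib/*", "external"),
    ("docs/*", "documentation"),
    ("data/*", "data"),
    ("*", "sourcecode"),
]


def categorize(path, rules=RULES):
    """Return the category of the first rule whose pattern matches path."""
    relative = path.lstrip("/")
    for pattern, category in rules:
        if fnmatchcase(relative, pattern):
            return category
    return "uncategorized"


def lines_by_category(line_counts, rules=RULES):
    """Sum per-file line counts into per-category totals."""
    totals = Counter()
    for path, lines in line_counts.items():
        totals[categorize(path, rules)] += lines
    return totals


# Made-up file list and line counts.
sample = {
    "/lib/log4net/log4net.xml": 30000,
    "/docs/manual.xml": 1200,
    "/src/main.cs": 450,
    "/src/ui/form.cs": 300,
}
print(dict(lines_by_category(sample)))
```

With rules like these, a vendored library still shows up in the totals, but under "external" rather than inflating the project's own source count.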


Robin Luckey

3 months ago

Hi okinsey,

I agree with your idea -- I'd always visualized this as more of a tagging system than an exclude/include system.

Initially, we might only honor the "ignore" tag, but as time goes on we might allow code to be tagged in all kinds of interesting ways.


okinsey

3 months ago

@robin, good to hear that, but the main question still remains - is this feature ever going to be implemented?

The topic has existed for quite some time, and most of the solutions presented have been quite easy to implement.


Robin Luckey

3 months ago

Sorry, I can't make any estimate when we will get to this.

We are currently focused on performance and reliability issues. We're physically moving to a new data center, and we are redesigning our source control processing for better scalability. I can't guess how long it will be before we have free cycles to add new features.

And while I agree that the solutions on this thread are good ones in principle, when you think about actually implementing them, they turn out to be surprisingly complicated.

Does a robots.txt-style file apply to all revisions of a repository, or just particular revisions? If I rename or move code, I'll need to change my robots.txt. How does the time axis of robots.txt work? How do I know which revisions are covered by which robots.txt?

How do I confirm that Ohloh processed my robots.txt correctly? How will Ohloh explain that some code is ignored intentionally? Currently, Ohloh doesn't even let me browse the code at all. How can I debug the reason for missing/extra code?

Finally, what happens after someone makes a change to the robots.txt? One small change might require Ohloh to do a full recalculation across the full history of the project, which might take a week of server time on a large project. How will we avoid that?

Those are some reasons why this feature still does not exist. I'd really like to get it done, but it's messier than it seems at first. Maybe after we've hired some more help... :-)


okinsey

3 months ago

The easy answer would be not to use a file present IN the repository, but rather to require the person enlisting the repository to supply the patterns for categorizing before any processing takes place. These patterns could be immutable and hence would not lead to any extra processing. Rather, by having an ignore tag (one that actually caused the parser to ignore the files), you would free CPU cycles for more important work.
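To make the idea concrete, here is a small sketch in which the ignore patterns are fixed when the enlistment is created and ignored files are dropped before any parsing happens. The `Enlistment` class, its fields, and the repository URL are hypothetical names chosen for this example, not part of any real Ohloh interface.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Enlistment:
    """A repository enlistment whose ignore patterns are fixed at creation."""
    repository_url: str
    ignore_patterns: tuple = ()

    def is_ignored(self, path):
        """True if any ignore pattern matches the path."""
        return any(fnmatchcase(path, p) for p in self.ignore_patterns)


def files_to_parse(enlistment, paths):
    """Drop ignored paths before any counting work is done."""
    return [p for p in paths if not enlistment.is_ignored(p)]


# Hypothetical enlistment: patterns are supplied once, up front.
enlistment = Enlistment(
    "git://example.org/project.git",
    ignore_patterns=("lib/*", "*.resx"),
)
kept = files_to_parse(
    enlistment,
    ["lib/vendor.c", "src/Main.cs", "src/Main.resx"],
)
print(kept)
```

Because the dataclass is frozen, attempting to reassign `ignore_patterns` later raises an error, which mirrors the "immutable, so no recalculation" argument.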


IBBoard

3 months ago

Just to go back and correct one of my previous statements, it appears that Ohcount does count .resx (.Net's "resources wrapped in XML") files as XML - https://www.ohloh.net/p/WarFoundry/commits/48803516?page=4 - as well as .manifest files (which are XML, but are also auto-generated) - https://www.ohloh.net/p/WarFoundry/commits/48803516?page=5 - and a few others.

So, from a C# project point of view, filtering based on extensions to remove auto-generated code from the list would be useful :) Personally, I'd prefer an Ohloh-based solution that checks file paths against a pattern rather than some extra file to put in the repo (which "contaminates" it with unnecessary junk).

The recalculation problem could be an issue, unless it just never gets applied retrospectively (much like a commit - once you make it then it is always there, which is why my line count spiked like crazy because of some XML docs!)
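One way to picture the "never applied retrospectively" option: each ignore rule remembers the revision at which it was added, and a commit is filtered only by rules that already existed at that point. The sketch below assumes integer revision numbers and glob patterns; the rule format and the `counted_paths` helper are invented for illustration.

```python
from fnmatch import fnmatchcase


def counted_paths(commit_revision, changed_paths, rules):
    """Keep paths not matched by any ignore rule active at this revision.

    rules is a list of (added_at_revision, pattern) pairs; a rule only
    affects commits at or after the revision where it was added.
    """
    active = [pattern for added_at, pattern in rules if added_at <= commit_revision]
    return [
        path for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in active)
    ]


rules = [(120, "*.resx")]  # hypothetical: rule added at revision 120
paths = ["src/Main.cs", "src/Main.resx"]
print(counted_paths(100, paths, rules))  # rule not yet active
print(counted_paths(150, paths, rules))  # rule active
```

Under this scheme, adding a rule never triggers a recount of old commits, at the cost of leaving any earlier spike in the history.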


hero6

3 months ago

Hello