"Ignoring files/folders, e.g. javadocs?"



Is it possible to ignore parts of a repository? For instance, we (http://www.ohloh.net/projects/3946) have our javadoc tree checked into SVN above the root, and this gives the project an unhealthy weighting towards HTML.

Is there a way to ignore the /docs/ folder, or does this need to go into the wishlist forum?

Another question: should I even want to ignore the javadocs folder? It's mostly meaningless, as it's all generated, but it does add significant value to the project (though maybe not as much as writing proper docs of that length by hand).

Also, there's some code generated by JavaCC/JJTree (http://javacc.dev.java.net/) and friends (e.g. from http://trac.uwcs.co.uk/choob/cgi-bin/trac.cgi/browser/trunk/src/uk/co/uwcs/choob/support/ObjectDBClauseParser.jj). These files all have `/* Generated ... */` on the first line and don't contribute to the project at all. Is there a way to ignore them?

(For anyone who's curious, both of those are in SVN such that a user checking stuff out won't have to do random code/docs generation themselves.)


Faux

over 6 years ago
 

Hi Faux,

The ability to ignore folders is a common request, and it's one we've been thinking seriously about implementing. It's pretty common for a project to include a lot of third-party libraries or build tools in its source control, and it's not correct to attribute these things to the project. It's really a question of developer resources at this point.

Personally, I feel that you shouldn't be so eager to ignore your docs folder. A lot of Ohloh users seem to be concerned about having a lot of XML or HTML in their projects, but I'm not sure where this concern comes from. Enlighten me?

Another feature we've tossed around is the ability to label directories as containing documentation or test code, although our ideas for this are a little more vague. This would help identify developers who don't write documentation or tests, and we could generate independent reports for the separate sections of code.

Ignoring source code that was generated by a tool is another feature we've been thinking about. It's not as high on our priority list right now, but we would like to filter out this type of code. I think it's doable by simply looking for some common phrases in the first comment block of a file.
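Robin's idea of scanning the first comment block for common phrases could look something like the following Python sketch. The phrase list and function names are hypothetical illustrations, not anything Ohloh is known to use, and the comment parsing is deliberately naive: it only recognizes a leading `/* ... */` block or a leading run of `//` lines.

```python
# Phrases that code generators commonly write into a file header.
# Hypothetical, deliberately short list; a real filter would need tuning.
GENERATED_MARKERS = ("generated by", "do not edit", "auto-generated", "autogenerated")


def first_comment_block(source: str) -> str:
    """Return the leading /* ... */ block, or the leading run of // lines."""
    text = source.lstrip()
    if text.startswith("/*"):
        end = text.find("*/", 2)
        return text if end == -1 else text[: end + 2]
    header = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("//"):
            break  # first non-comment line ends the header
        header.append(stripped)
    return "\n".join(header)


def looks_generated(source: str) -> bool:
    """Heuristic: does the first comment block contain a generator phrase?"""
    header = first_comment_block(source).lower()
    return any(marker in header for marker in GENERATED_MARKERS)
```

With this, a file opening with a `/* Generated ... */` header like the one described in the question is flagged, while an ordinary copyright header is not. A list this short will misfire (a hand-written file that says "do not edit" is not necessarily generated), so a real filter would need more care.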

You're not alone with these requests, and as we have resources available we'll be addressing them.

Thanks, Robin


Robin Luckey

over 6 years ago
 

Enlighten me?

The documentation generated by javadoc is just a "transform" of the source code (and associated comments) into another form.

To pick a silly example (I don't feel this strongly about ignoring the javadocs, but...): say we had a Subversion branch where we replaced all the tabs with eight spaces to keep some people who dislike tabs happy. Should that branch be included?

This would help identify developers who don't write documentation...

Comments are probably more of an indication of documentation than HTML is, especially if the HTML is measured by line count. I'd ignore comments that are just removed code, and give extra credit to appropriately formatted doc comments (e.g. javadoc `/**` and Doxygen `/*!`).
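Counting doc-comment lines separately from ordinary comments could be sketched like this. It is a simplified scanner written for illustration, not how Ohloh counts comments: it only looks at `/* ... */` blocks, treats those opening with `/**` (javadoc) or `/*!` (Doxygen) as documentation, and makes no attempt to detect commented-out code or comment markers inside string literals.

```python
def comment_line_counts(source: str) -> dict:
    """Count lines in doc comments (/** or /*!) versus other /* ... */ comments.

    Simplified: // comments are ignored, and comment markers inside string
    literals are not handled.
    """
    counts = {"doc": 0, "plain": 0}
    pos = 0
    while True:
        start = source.find("/*", pos)
        if start == -1:
            break
        end = source.find("*/", start + 2)
        end = len(source) if end == -1 else end + 2  # unterminated comment runs to EOF
        block = source[start:end]
        # "/**/" is an empty plain comment, not a javadoc opener
        is_doc = block.startswith(("/**", "/*!")) and block != "/**/"
        counts["doc" if is_doc else "plain"] += block.count("\n") + 1
        pos = end
    return counts
```

Detecting "removed code" in comments is the harder half of the suggestion and would need its own heuristic on top of this.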


Faux

over 6 years ago
 

Ignoring folders/files of code simply included from other projects would be a very welcome addition, especially for applications written in scripting languages, where it is customary to pack all the components/libraries used into the application bundle. (In the forum thread about "PHP eats Ruby etc...", some people complained about the amazing LOC count of PHP applications. I think removing 'included' code would help a lot in normalizing those cases, i.e. including lots of existing code is what makes PHP development so fast.)

As a side note, a very useful metric (mostly for libraries/components/frameworks, I guess) would be the number of projects that bundle a given application in their distribution. I have no idea how this could be gathered, though. Maybe by checksumming every file, and especially the directory listings (file names + sizes), and comparing them across projects?
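The directory-listing checksum could be prototyped by hashing the sorted list of relative file names and sizes under a directory, so that two projects bundling the same copy of a library produce the same fingerprint. This is a hypothetical sketch (the function name is illustrative); a real system would probably also hash file contents, since names and sizes alone can collide.

```python
import hashlib
import os


def directory_fingerprint(root: str) -> str:
    """SHA-256 over the sorted listing of (relative path, size) for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use "/" separators so fingerprints match across platforms
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append(f"{rel}\t{os.path.getsize(full)}")
    digest = hashlib.sha256()
    for entry in sorted(entries):  # order-independent of the filesystem walk
        digest.update(entry.encode("utf-8") + b"\n")
    return digest.hexdigest()
```

Computing this for each library directory of every project and grouping equal fingerprints would give a rough count of how many projects ship the same component.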


Gaetano Giunta

over 6 years ago
 

I would say that excluding directories manually from the normal statistics is significantly more important than trying to "classify" directories (e.g. as docs, generated, etc.). It would still be nice to see their stats in individual commits and the like, but not in the overall project or user stats: one project I work on has attributed 93k lines of JS and 19k lines of CSS to one person because they checked in a JavaScript and HTML toolkit we use (the project is now also claiming to be 77% JS). If you classify things, you then get into the situation of trying to decide what is counted and what isn't, which is likely a minefield.
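Excluding directories from the overall totals, while leaving the raw per-file data available for per-commit views, amounts to filtering before aggregation. A minimal sketch, assuming per-file line counts are available as `(path, language, lines)` tuples (a hypothetical input format):

```python
from collections import Counter


def language_shares(files, excluded_dirs=()):
    """Percentage of lines per language, skipping files under excluded directories.

    files: iterable of (path, language, line_count), paths relative to the project root.
    """
    prefixes = tuple(d.rstrip("/") + "/" for d in excluded_dirs)
    totals = Counter()
    for path, language, lines in files:
        if prefixes and path.startswith(prefixes):
            continue  # skipped for the totals; the caller's per-file data is untouched
        totals[language] += lines
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100.0 * n / grand_total for lang, n in totals.items()}
```

For a hypothetical project with 600 lines of C++ in `src/` plus a 400-line vendored toolkit (300 JS, 100 CSS) in `vendor/toolkit/`, the breakdown is 60% C++ by default and 100% C++ with `excluded_dirs=["vendor/toolkit"]`.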


James Ross

over 6 years ago
 

I would say ignoring folders would be the easiest first step and should reduce the whole problem to a minimum.

My first thought was an ignore flag for individual folders in the Ohloh project settings here. But I guess this is problematic, as anyone can edit projects here.

So what do you think about putting an "ohloh.ignore" (or similarly named) file into those folders via SVN or CVS, which tells Ohloh to simply ignore the folder where the file resides? This ensures that only the developers can do this, as you normally have to log in to CVS or SVN to commit, and the one file should not bother anyone.
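A marker-file scheme like this is easy for a counting tool to honour: while walking the checkout, stop descending into any directory that contains the marker. A sketch in Python using the `ohloh.ignore` name proposed above (the name and the function are illustrative only):

```python
import os

IGNORE_MARKER = "ohloh.ignore"  # name proposed in the thread; purely illustrative


def counted_files(root: str):
    """Yield root-relative paths of files to count, skipping every directory
    that contains the marker file along with everything beneath it."""
    for dirpath, dirnames, filenames in os.walk(root):  # top-down, so pruning works
        if IGNORE_MARKER in filenames:
            dirnames[:] = []  # prune: os.walk will not descend into subdirectories
            continue
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
```

Pruning `dirnames` in place is the documented way to stop `os.walk` from recursing, so an ignored tree costs nothing to skip.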


anse

over 6 years ago
 

An alternative would be to put one single "ohloh.txt" file into the project's root folder and fill it with lines similar to those you find in a robots.txt on a web server. This would accommodate developers who don't want to put countless ignore files into many folders. For example:

ignore docs/

ignore libs/3rdpartylibrary/

[etc.]

This would be extremely useful in my eyes and should not be too complicated to implement. What do you think?
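A parser for the proposed `ohloh.txt` format might look like the following. The `ignore <path>` directive comes from the proposal; treating `#` as a comment, silently skipping unknown lines, and reading paths relative to the project root are assumptions made for this sketch.

```python
def parse_ignore_file(text: str) -> list[str]:
    """Collect the paths named on 'ignore <path>' lines (hypothetical format)."""
    prefixes = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        keyword, _, argument = line.partition(" ")
        if keyword == "ignore" and argument.strip():
            prefixes.append(argument.strip().lstrip("/"))
    return prefixes


def is_ignored(path: str, prefixes: list[str]) -> bool:
    """True if path (relative to the project root) falls under any ignored entry."""
    path = path.lstrip("/")
    for prefix in prefixes:
        if prefix.endswith("/"):
            if path.startswith(prefix):  # directory entry: everything beneath it
                return True
        elif path == prefix or path.startswith(prefix + "/"):  # file or bare dir name
            return True
    return False
```

Requiring a `/` boundary keeps `ignore docs/` from accidentally matching a sibling such as `docsite/`.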


anse

over 6 years ago
 

I like the idea of it being tied to the RCS in some way, but (at least for Subversion) how about using properties?

These could be set on either the root or specific directories; I'd think that individual directories (or files) would be better.

I personally think that, for legacy RCSes (where properties aren't available), the robots.txt approach would be better, but you'll have to be careful about defining where the "root" is (i.e. it would need to be the root of the import, as opposed to the root of the repository).
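The root question matters because an enlistment often points at a subdirectory such as `/trunk` rather than at the repository root. One way to handle it, sketched here under that assumption, is to translate repository paths into paths relative to the imported directory before applying any ignore rules; the function name and example paths are hypothetical.

```python
import posixpath


def path_for_rules(repo_path: str, import_root: str):
    """Translate a repository path into a path relative to the imported directory.

    Returns None when the file lies outside the import, so no rule should apply.
    """
    repo_path = posixpath.normpath("/" + repo_path.lstrip("/"))
    import_root = posixpath.normpath("/" + import_root.strip("/"))
    if import_root == "/":
        return repo_path[1:]  # importing the whole repository
    prefix = import_root + "/"
    return repo_path[len(prefix):] if repo_path.startswith(prefix) else None
```

So with an import rooted at `/trunk`, a rule `ignore docs/` applies to `/trunk/docs/index.html` (seen as `docs/index.html`) but never to anything under `/branches`.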


Faux

over 6 years ago
 

If you want to implement something to ignore certain folders, I wouldn't use a file called "ohloh.txt" or anything like that. Keep it generic: there are other sites out there that provide statistical data for projects.

I think the best way to go would be to define something publicly and then let others use those specs too if needed.

For example, call the file "statrobot.txt" and use the same specs as the robots.txt file used by web search engines.
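Reusing the robots.txt format has a practical advantage: parsers already exist. Python's standard library, for instance, ships `urllib.robotparser`, so a statistics tool could treat `Disallow` lines in a `statrobot.txt` as "do not count" rules. The file contents and the `statbot` user-agent below are hypothetical. Note that Python's parser does simple path-prefix matching, which is enough for ignoring directories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical statrobot.txt contents, written in plain robots.txt syntax.
STATROBOT_TXT = """\
# statrobot.txt: paths that statistics crawlers should not count
User-agent: *
Disallow: /docs/
Disallow: /libs/3rdpartylibrary/
"""


def load_rules(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() accepts an iterable of lines
    return parser


def should_count(parser: RobotFileParser, path: str, agent: str = "statbot") -> bool:
    """Count a file only if the rules would 'allow fetching' its root-relative path."""
    return parser.can_fetch(agent, "/" + path.lstrip("/"))
```

Because the rules live under `User-agent: *`, every statistics site reading the same file would honour them, which is the point of keeping the format generic.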


Stefan Küng

over 6 years ago
 

As Robin mentioned, this has to be a highly requested item (for many different reasons) and would be a great feature to have. Every project I work on that has an Ohloh project page could actually use this feature (mostly to ignore third-party dependency sources). That said, I'm sure there'd be some dissension on how to go about specifying which paths to ignore or how to classify them.

Personally, I wouldn't want to have a file in my project's SCM system (whether it be CVS, SVN or otherwise) that was specific to Ohloh if I didn't have to. A .ohlohignore or something similar to a .cvsignore might be fine, but it would seem better to keep the metadata with the context that needs it, i.e. as part of the Ohloh project page through the web interface. Especially given that project enlistment updates seem to be becoming more automatic now, the stats would eventually sort themselves out according to any ignore/classification settings.


sean

over 6 years ago
 

I hope this hasn't been ignored, as I can't find it anywhere on the project admin page. Our project has about twice as much 3D model data stored in XML files as code, and it grossly distorts all the otherwise useful statistics Ohloh gathers.


Calder

almost 4 years ago
 

I just added a project that is basically only one file. But there are some "upstream" files in there, which totally skew the measurements. The project is a rewrite of some PHP code, with the PHP code still included for reference... but that makes it a "mostly PHP" project. Dang. ;-P


Jürgen A. Erhard

over 3 years ago
 

I would also like this option to ignore folders. I'm developing a CMS, and just adding FCKeditor makes the project look like JS when it's actually PHP.


dogmatic69

over 3 years ago
 

Agreed. My project is full of VS project files, CBP project files and CodeWarrior project files, and is being classified as an XML project and not C++! I would like to have a way to ignore certain files by masks and certain paths too.
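Combining file-name masks with path prefixes could look like this sketch. The masks target the kinds of project files mentioned above (Visual Studio `.vcproj`/`.vcxproj`/`.sln`, Code::Blocks `.cbp`, CodeWarrior `.mcp`), but both rule lists are illustrative assumptions. `fnmatchcase` is used so matching is case-sensitive on every platform; `fnmatch` would follow the OS's case rules instead.

```python
import posixpath
from fnmatch import fnmatchcase

# Illustrative rules only: glob masks for IDE project files plus whole directories.
IGNORE_MASKS = ("*.vcproj", "*.vcxproj", "*.sln", "*.cbp", "*.mcp")
IGNORE_PATHS = ("build/", "third_party/")


def is_excluded(path: str, masks=IGNORE_MASKS, paths=IGNORE_PATHS) -> bool:
    """True if the file's base name matches a mask or it lies under an ignored path."""
    name = posixpath.basename(path)
    if any(fnmatchcase(name, mask) for mask in masks):
        return True
    return path.startswith(tuple(paths))
```

Matching masks against the base name means `*.cbp` catches project files in any directory, while path prefixes handle whole trees.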


Danny Angelo Ca...

over 3 years ago
 

Not much to add other than "+1"


Graeme

over 3 years ago
 

I also would love to see this feature. Especially for the original poster's Javadoc. It would be nice if project administrators could deselect certain languages from appearing as a part of their statistics.

anse's suggestion of an ohloh.txt would also make a lot of sense for our project.


david_jurgens

about 3 years ago
 

This would be great for third party libraries.


Christoffer Niska

over 1 year ago
 

Christoffer and all,

See: https://www.ohloh.net/blog/LatestUpdatesToIgnoringFilesandDirectories

Many projects now use this to good effect. Just remember that it takes a while for the request to be processed; it usually takes effect on the next update.

Thanks!


ssnow-blackduck

over 1 year ago
 

@ssnow-blackduck: I actually found this seconds after I posted to this thread. Great feature.


Christoffer Niska

over 1 year ago
 



 

Creative Commons License Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License . Ohloh ® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.