Hi,
I see you've had a few more minor hiccups with the update queues, and I had a few thoughts on how they might be addressed.
In the shorter term, the problem you're running into is that you keep adding new projects while also trying to update existing ones as often as possible. Each update takes server resources, so updates probably cannot happen in real time.
What I'd suggest is a few small things now, plus one more for the future.
- Allow projects to specify update intervals: For example, I have a number of projects enlisted in Ohloh (InspIRCd, ircc, obsidian, svnbot, etc.) - most of these receive very infrequent updates, or I simply don't care about them that much, so I'd be willing to set an update interval of a month on all except the active ones. This might be a sensible default that people could then change, too.
Building on this, it should then be possible to check a project less often when its activity drops, and more often again when it picks up - so each project's interval would be adjusted by a fuzzy modifier to balance things out a bit.
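As a rough sketch of that adaptive interval (the bounds and the halve/double rule are made up for illustration, nothing here comes from Ohloh itself), the check interval could shrink whenever a check finds new commits and grow whenever it finds none:

```python
from datetime import timedelta

# Hypothetical bounds; the only figure suggested above is "a month" for quiet projects.
MIN_INTERVAL = timedelta(hours=6)
MAX_INTERVAL = timedelta(days=30)


def next_interval(current: timedelta, new_commits: int) -> timedelta:
    """Return the interval to wait before the next check of a project."""
    if new_commits > 0:
        proposed = current / 2  # project is active: check sooner
    else:
        proposed = current * 2  # project is quiet: back off
    # Clamp so no project is checked constantly or forgotten entirely.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))
```

A project owner's own setting could simply replace MAX_INTERVAL for that project.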
Diffs: Modify counting to only count the changes in diffs, e.g. read each changeset individually via svn diff (or equivalent), count that, and apply the delta to the overall project statistics. You may already do this; I confess I have not yet read ohcount's source.
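To make the delta idea concrete, here is a minimal sketch that takes unified diff text (as produced by svn diff or similar) and returns how many lines were added and removed. It counts raw lines only; splitting them into code, comments and blanks the way ohcount does is not attempted, and the function name is just an illustration.

```python
import re
from typing import Tuple

# Hunk header, e.g. "@@ -1,3 +1,4 @@"; a missing length means 1 line.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_delta(diff_text: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: only hunk headers matter, file headers are skipped.
            m = HUNK.match(line)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            continue
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both old and new versions.
            old_left -= 1
            new_left -= 1
    return added, removed
```

The project total then becomes the previous total plus added minus removed, without re-reading the whole tree.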
Notification: Instead of (or possibly in addition to) this, you could run updates on a notification basis: projects that notify you of updates (via a post-commit hook in SVN or whatever) get bumped further up the queue - or perhaps the queue could consist entirely of these, but I am not sure how well other VCSs support hooks like this. The advantage is that it would update the projects that most "need" updating; a possible disadvantage is that a really huge project gets bumped continually, which means frequent recounts of something large.
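One way this could look, purely as a sketch: a priority queue in which notified projects sort ahead of routine rechecks, with a minimum gap between counts so that a busy, huge project cannot keep jumping the queue. The class, its method names and the one-hour gap are all assumptions for illustration.

```python
import heapq
import itertools


class UpdateQueue:
    """Notified projects jump ahead of routine rechecks, within limits."""

    NOTIFIED, ROUTINE = 0, 1  # lower value is served first

    def __init__(self, min_gap_seconds: float = 3600.0):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order
        self._pending = {}               # project -> best queued priority
        self._last_counted = {}          # project -> time of last count
        self._min_gap = min_gap_seconds

    def _push(self, priority: int, project: str) -> None:
        if self._pending.get(project, priority + 1) <= priority:
            return  # already queued at this priority or better
        self._pending[project] = priority
        heapq.heappush(self._heap, (priority, next(self._order), project))

    def add_routine(self, project: str) -> None:
        self._push(self.ROUTINE, project)

    def notify(self, project: str, now: float) -> bool:
        """Handle a commit notification; True if the project was bumped."""
        last = self._last_counted.get(project)
        if last is not None and now - last < self._min_gap:
            # Counted too recently: keep the update, but at routine priority.
            self._push(self.ROUTINE, project)
            return False
        self._push(self.NOTIFIED, project)
        return True

    def pop(self, now: float):
        """Return the next project to count, or None if the queue is empty."""
        while self._heap:
            priority, _, project = heapq.heappop(self._heap)
            if self._pending.get(project) == priority:
                del self._pending[project]
                self._last_counted[project] = now
                return project
            # Otherwise the entry is stale (superseded by a bump): skip it.
        return None
```

The post-commit hook itself would only need to make a small request that ends up calling notify for the right project.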
Time based: Schedule delays between updates depending on how long it takes to line-count the source - sure, you'd be talking small amounts, but they all add up, so being able to update more projects more frequently is going to be beneficial. The downside is that this is a rather superficial stopgap.
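Under one reading of this, the delay before a project's next update grows with how long its last count took, so cheap projects come round more often. A tiny sketch, with every constant invented for illustration:

```python
def delay_after_count(count_seconds: float,
                      base_delay: float = 3600.0,
                      cost_factor: float = 50.0,
                      max_delay: float = 30 * 86400.0) -> float:
    """Seconds to wait before re-counting a project.

    count_seconds is how long the previous line count took; the base delay,
    cost factor and 30-day cap are hypothetical tuning knobs.
    """
    return min(max_delay, base_delay + cost_factor * count_seconds)
```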
Future: Either allow client-based distributed counting (clients being the people who use Ohloh) - probably not the most practical of ideas, and perhaps prone to over/under-reporting - or allow projects to supply their own count data (via a callback HTTP page or something?). The latter is more suitable, but again prone to people doing stupid things for a laugh, so perhaps correct it with a periodic Ohloh recount (once every month or two).
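For the self-reported option, the periodic recount could double as a sanity check. The sketch below assumes a 5% tolerance and a hypothetical function name; a project whose reported figure drifts too far from the recount would stop being trusted.

```python
def is_plausible(reported: int, recounted: int, tolerance: float = 0.05) -> bool:
    """True if a project's self-reported line count is close to the recount.

    tolerance is a fraction of the recounted value (an assumed default);
    max(..., 1) avoids a zero allowance for empty projects.
    """
    allowed = tolerance * max(recounted, 1)
    return abs(reported - recounted) <= allowed
```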
--
Just some random thoughts on helping scalability. I do a lot of scalability based stuff at work, though mine is in a slightly different arena of the web, so it was interesting to put my brain to use like this :)