CIA

CIA is a system for tracking open-source projects in real-time over the Web or IRC. Developers can see the latest changes to their code immediately, and users can subscribe to see the latest bugfixes in their favorite programs.

This project is managed by toysoldier5.

Project Tags: rss, statistics, scm, reporting, irc, version_control, web

News

Google revisited

Hi folks!

Just a quick post for all the folks on google code: We have sane hooks now.

How does it work? The folks at google came up with quite an elegant system
for allowing pretty much any commit hook,
without them needing to run any custom code on their end:
the HTTP POST hook.

How does it work?
Basically, whenever you make a commit,
their system generates a hunk of easily-parsed text
(in a format called JSON) representing commit information,
and submits it to some webserver you specify via an HTTP POST.
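
To make the receiving side of this more concrete, here is a minimal sketch of parsing such a payload. The field names (project_name, revisions, revision, author, message) and the example values are assumptions made up for illustration; the post does not spell out the actual format.

```python
import json

# Hypothetical payload: the field names are assumptions for illustration,
# not a documented format.
EXAMPLE_PAYLOAD = """
{
  "project_name": "example-project",
  "revisions": [
    {"revision": 42, "author": "alice", "message": "Fix the frobnicator"}
  ]
}
"""


def summarize_commits(payload_text):
    """Parse a JSON commit payload and return one summary line per revision."""
    payload = json.loads(payload_text)
    project = payload["project_name"]
    lines = []
    for rev in payload.get("revisions", []):
        lines.append("%s: r%s by %s: %s" % (
            project, rev["revision"], rev["author"], rev["message"]))
    return lines


if __name__ == "__main__":
    for line in summarize_commits(EXAMPLE_PAYLOAD):
        print(line)
```

A real receiver would read the text from the body of the incoming POST request instead of a constant.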

How do I use it?
You need to be the administrator of your google code project.
Then, click on the Administer tab where you configure your project.
This time, you need the Source page.
There, on the bottom, you'll have a text field called Post-Commit URL.
Enter the following URL:

http://cia.vc/deliver/simplejson/

That's it. All commits you make from now on should be reported to CIA.vc
(with project name being what the project is called by google code).

From what I've seen, latency is pretty good, some 5-10 seconds from commit to IRC.

Once you've set this up, remember to turn off polling (or the email to our googlecode mail parser); otherwise every subsystem that notices the commit will send its own notification.

Is this only for google?
Unfortunately, so far it is.
However, their idea is quite a good one:
This kind of hook does not need a lot of work on the server side,
the JSON is easy to generate (aka "hack up") in pretty much any language,
and it should extend nicely should some other VCS need different entries.

So if someone feels like writing an HTTP POST hook for some popular version control system, it might come in useful.
Maybe we can convince some other hosting providers to supply a similar system.
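
For anyone tempted to hack up such a hook, the sending side might look roughly like the sketch below. The URL, the function names, and the JSON field names are all hypothetical placeholders; a real hook would have to match whatever format the receiving server expects.

```python
import json
import urllib.request


def build_commit_request(url, project, revision, author, message):
    """Build (but do not send) an HTTP POST request carrying commit data as JSON."""
    # Field names are assumptions for illustration only.
    body = json.dumps({
        "project_name": project,
        "revisions": [
            {"revision": revision, "author": author, "message": message},
        ],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_commit(request, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL; nothing is sent here, the body is only printed.
    req = build_commit_request(
        "http://hook.example.invalid/commits", "demo", 7, "bob", "Add tests")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the hook easy to test without touching the network.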

So Far,
Karsten "BearPerson" Behrmann


Filtering Google

Oh no, another post within the same month!
Horsemen spotted in the sky, flying pigs imminent!

Anyway, where was I...

Right, there's now a way to get commits from Google Code into CIA.vc
that doesn't go through the hackish SVN poller. Read on for details.

SVN and hooks
As we all know, there's a new player on the block of public open-source hosting:
Google Code. Apparently, as a whole bunch of people are using it,
they do their stuff fair enough:
SVN, wiki, bugtracking, downloads, the works.

Wait, no CIA.vc hooks? Nope. Back in the day with sourceforge and CVS,
you could install your own hooks, so things worked. When sourceforge added SVN,
people couldn't add their own hooks, but I gather the web config interface
has a "CIA.vc hook" checkbox. But without hosting provider support,
SVN users are out of luck when it comes to custom hooks.

The SVN poller
That's why we added the SVN repository poller. It wakes up a few times an hour
(or whenever any mail arrives on a special address) and scans the configured
repository, checking if there were any new revisions since it last looked.
If there are, it enters a new commit with the right data into the system.
So usually you'll set it up with the default poll delay of 15 minutes (anything shorter than what the poller can actually manage, roughly one poll every 20-30 minutes with our load,
gets silently upped to that interval) or, even better,
subscribe the ping email address to your commit mailing list.
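
The polling idea described above can be sketched in a few lines. This is only an illustration, not the actual poller: the class and function names are made up, and the repository lookup is passed in as a plain function so the sketch runs without an SVN server.

```python
def effective_interval(requested_minutes, achievable_minutes):
    """Delays shorter than what the poller can manage are raised to that floor."""
    return max(requested_minutes, achievable_minutes)


class RepositoryPoller:
    """Remembers the last revision seen and reports anything newer."""

    def __init__(self, fetch_head_revision, last_seen=0):
        # fetch_head_revision is a hypothetical callable that returns the
        # newest revision number in the repository.
        self.fetch_head_revision = fetch_head_revision
        self.last_seen = last_seen

    def poll(self):
        """Return the list of revision numbers added since the last poll."""
        head = self.fetch_head_revision()
        new_revisions = list(range(self.last_seen + 1, head + 1))
        if new_revisions:
            self.last_seen = head
        return new_revisions


if __name__ == "__main__":
    poller = RepositoryPoller(lambda: 5, last_seen=3)
    print(poller.poll())  # revisions 4 and 5 are new
    print(poller.poll())  # nothing new the second time
    print(effective_interval(15, 25))
```

Each wake-up, whether from the timer or from a ping mail, would just call poll() and enter one commit per returned revision.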

That works well enough, and produces excellent XML commit data,
but it's a bit hackish - the polling is horridly inefficient,
and I need to run yet another service on our poor machine.
Also, it's sometimes a tad slow.

Parsing E-Mail to XML
So I figured "If we already get the commit data from the mailing list,
can't we use that somehow?" and resurrected an old set of scripts.
What we can do now is pick up mail sent out by Google Code
(via the "Activity Notifications/all subversion commits" field on the
"project summary" pane of the "administration" tab)
and try to parse the commit.

The results are not as fancy as the repository polling;
those mails were meant for humans to read, not machines, after all.
So I can't always figure out filenames,
and since there's no unambiguous "end of Log message" tag
I currently cut log messages at the first empty line.
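
As a rough sketch of that cut-at-the-first-empty-line rule: the mail layout and the "Log:" marker below are assumptions for illustration, since the real notification format is not shown here, and the function name is made up.

```python
def extract_log_message(mail_body, marker="Log:"):
    """Return the text after the marker line, cut at the first empty line."""
    lines = mail_body.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == marker:
            break
    else:
        # No marker line found, so there is nothing we can safely extract.
        return ""
    message = []
    for line in lines[index + 1:]:
        if not line.strip():
            # No unambiguous end tag exists, so stop at the first empty line.
            break
        message.append(line.strip())
    return "\n".join(message)


if __name__ == "__main__":
    body = (
        "Author: alice\n"
        "Log:\n"
        "Fix crash on startup\n"
        "Second line of the message\n"
        "\n"
        "Modified:\n"
        " /trunk/main.c\n"
    )
    print(extract_log_message(body))
```

The obvious cost of this rule is that a log message containing its own blank line gets truncated at that point.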

But I think it's slightly faster than the polling method,
and it's certainly more elegant. It's still not instantaneous,
mostly because it seems Google's email machinery takes a minute or two.
But if you want, use it! Simply change the mail settings (or mailing list)
to send to "cia googlecode@" instead of "ping whatever@".
(And turn off periodic polling, if you have it enabled,
or you'll end up getting your commits twice)

Feedback is always welcome!
(the nick I sign my posts with, at this domain; or just comment on the blog)
We'll probably have a few corner cases I didn't catch,
but with some work we should be able to turn this into
yet another Good Way to get commits into CIA.vc!

You're with google?
Note to any google employee (especially the google code folks) reading this:
I'm sure that when we work together, we can do better.
Drop me an E-mail and I'll work something out.
I'm a geek, so pretty much any method you come up with to pipe commits,
I can handle.
It's always nice to see your commit show up on IRC
while your finger is still hanging over the enter key,
and we should be able to make that happen.

So Far,
Karsten "BearPerson" Behrmann


Reporting in

Umm, yeah, that's right, I'm still here.

Instabili... segmentation fault (core dumped)
Sorry if we've seemed a bit unstable lately - we've had a couple of unexpected (and partly unexplainable) problems crop up, and while I take care of things whenever I see them come up, that seeing part could be improved somewhat 8)

Things should be looking better soon, though.
I've worked a bit on infrastructure that'll allow me to notice problems earlier and better,
as well as made sure I get (and read) information from all the pieces of the system when stuff goes boom.

On the hosting side, we have a very interesting lead
that may see significant improvements to our system.
I don't want to give out any details as long as I don't have anything solid,
so I don't make anyone look bad by accident,
but stay tuned to this channel for more information when we have it.

Recent Changes
What else have I been up to?

I've gone and scrubbed our blog comment spam.
I already disabled links in comments a while ago,
to cut down on all those people who saw "cool, a blog with open comments"
but didn't see the rel="nofollow" we put on all comment links.
We still got a lot of idiotic comments, though,
which I can only guess must be some kind of "Hey guys, here's a blog with open comments"
magic strings in the spam community. Or something. Well, gone now.

So I'll just have to periodically scrub the comments.
I don't want to take direct steps against automated comments just yet -
if you've got an RSS reader that allows you to directly post comments,
I applaud your ingenuity.

On the subject of spam, it seems someone got the idea to use the project pages for spamming.
Let's hope that trend doesn't continue.
I'd hate to have to set up a wikipedia-like army of "recent changes" monitors.

In what's probably our most important component, the IRC bots,
I've tuned the freenode settings - they should connect much faster now when stuff is restarted,
and fixed a bug that prevented them from properly connecting to EFnet.
I hope I've also increased the general connect speed to any network,
but we'll have to see how that goes next time it needs a restart.

Open issues
What will I be working on?

Obviously, the hosting change I've hinted at above
is going to take some working-out.

Also, we've had some interesting bugs with unicode / UTF-8 in commits.
I muchly hope we're at the point where commits get through,
no matter the charset,
but I'm afraid we currently replace all 8bit characters with '?'.
Pieces of the core don't work happily with unicode;
we'll need to fix that.
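
To illustrate the difference, here is a small sketch contrasting a lossy ASCII conversion (which produces the '?' effect) with a more tolerant decode. This is one common approach, not necessarily what the CIA core does or will do; the function names are made up, and the Latin-1 fallback is an assumption about which legacy charset is most likely.

```python
def lossy_ascii(text):
    """Replace every non-ASCII character with '?', losing information."""
    return text.encode("ascii", "replace").decode("ascii")


def decode_commit_text(raw_bytes):
    """Decode commit bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        # It may still pick the wrong characters for other legacy charsets.
        return raw_bytes.decode("latin-1")


if __name__ == "__main__":
    greeting = "Gr\u00fc\u00dfe"
    print(lossy_ascii(greeting))
    print(decode_commit_text(greeting.encode("utf-8")) == greeting)
    print(decode_commit_text(greeting.encode("latin-1")) == greeting)
```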

The advent of distributed SCMs like git has brought an (almost) entirely new problem:
Currently, most hooks take each commit pushed into the central repository
and send out a notification about it.
Normally, that's exactly what you want.

Things get a bit interesting if someone checked in from his vacation
to deliver the 100 commits he wrote while away, and they take a machine-gun-march through the system.
Or occasionally I see someone merging branches, and the hook script sending a commit for every merged commit.
We'll have to change the hook scripts to detect this kind of thing and just say "push of 100 commits"
or something. Stay posted, when I get such a script I'll put it up and tell everyone to use it ;)
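
Until such a script exists, the idea can be sketched like this. It is only an illustration of collapsing a large push into one notification; the function name, the threshold of 5, and the commit dictionary layout are all made up.

```python
def notifications_for_push(commits, max_individual=5):
    """Return one line per commit, or a single summary line for large pushes."""
    if len(commits) <= max_individual:
        return ["%s: %s" % (c["author"], c["message"]) for c in commits]
    # Too many commits at once: collapse them into a single notification.
    authors = sorted({c["author"] for c in commits})
    return ["push of %d commits by %s" % (len(commits), ", ".join(authors))]


if __name__ == "__main__":
    small_push = [
        {"author": "alice", "message": "Fix typo"},
        {"author": "bob", "message": "Add test"},
    ]
    vacation_push = [
        {"author": "carol", "message": "Change %d" % n} for n in range(100)
    ]
    for line in notifications_for_push(small_push):
        print(line)
    for line in notifications_for_push(vacation_push):
        print(line)
```

A hook would call this once per push, after collecting all the commits the push introduced, rather than once per commit.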

I guess an alternative that might work for some people would be this:
Instead of putting an on-push hook on the central repository,
put an on-commit on each developer's repository -
that way, you'll get instant notification of what everyone is working on,
and can ask him/her to push it up when it looks interesting.

Oh whee, that got much longer than I intended.
I'd better stop and get some actual work done again now ;-)

So Far,
Karsten "BearPerson" Behrmann


Still alive, honest!

Hi folks, this is your local programmer speaking.
As you may have noticed, cia.vc hasn't exactly set records in reaction times and reliability lately (error 500, anyone?). I'm one of the folks currently working behind the scenes to keep things up and running.

You may also be aware that CIA's original author and maintainer, Micah Dowty, has been fairly quiet lately and isn't working on it as much as he used to.

To squash the rumors, I'd like to state that these two things are not related ;-)

Introducing myself
I'm one of the people who jumped in to do maintenance and code work. Some may know me as "BearPerson" from the freenode network or from the Source Mage GNU/Linux distribution.

Anyway, I'm afraid I haven't had much time to spend on code and mostly did maintenance lately ("Is CIA down?" - "Unlikely, let me see... Ah, it's knee-deep in swapspace, let me clean that up...")

However, I plan to spend some time on the code, especially optimizing and removing a few "walls" we're hitting currently.

So while the situation right now isn't exactly ideal, it's being worked on, and I hope that Sometime Soon we'll have a CIA.vc that's nicely responsive again even when there's a bunch of load on the system.

For the curious
What is the problem right now?

Well, in a nutshell, it's one of scale. We've grown. I count 3582 projects in the stats, or 6639 counting sub-projects. Whenever I see the web part causing high load, I can count on seeing several search engines' web crawlers in the http logs.

While we should be able to cope with the load (only a couple of commits on average per minute), there's a bit of trouble when a "backlog" of requests temporarily builds up in memory and we end up escaping to swapspace, slowing down request processing and building the backlog even further.

So, while throwing more hardware at the problem would push out the wall further, I'm going to focus my time on making things work on what we have right now. Because they should. I'm a programmer, not a sysadmin.

I guess that's it for now. Please bear with us as we live through the current bumps in the road :-) And in the meantime, feel free to come into #cia on freenode with suggestions and problem notices, I'll be there.

So Far,
Karsten "BearPerson" Behrmann


New Google Code project

To encourage collaboration, I set up a new mailing list and Google Code project for CIA. Development has been moved from the old subversion repository (at svn.navi.cx) to Google Code.

If you want to help with CIA development or administration, please use the new mailing list. Also, please use the new bug tracker :)

