CIA

Google revisited

Hi folks!

Just a quick post for all the folks on google code: We have sane hooks now.

How does it work? The folks at google came up with quite an elegant system
for allowing pretty much any commit hook,
without them needing to run any custom code on their end:
the HTTP POST hook.

How does it work?
Basically, whenever you make a commit,
their system generates a hunk of easily-parsed text
(in a format called JSON) representing commit information,
and submits it to some webserver you specify via an HTTP POST.
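
To make that concrete, here is a minimal Python sketch of what the receiving end might do with such a POST body. The field names (`project_name`, `revisions`, `revision`, `author`, `message`) are assumptions for illustration, not a documented format; check what the hook really sends before relying on them.

```python
import json


def summarize_commits(body: bytes) -> list[str]:
    """Turn a JSON POST body into one summary line per commit.

    All field names used here are assumed for illustration.
    """
    payload = json.loads(body.decode("utf-8"))
    project = payload.get("project_name", "unknown")
    summaries = []
    for rev in payload.get("revisions", []):
        # Keep only the first line of the log message for the summary.
        log_lines = rev.get("message", "").strip().splitlines()
        first_line = log_lines[0] if log_lines else ""
        summaries.append(
            f"{project} r{rev.get('revision', '?')} "
            f"{rev.get('author', '?')}: {first_line}"
        )
    return summaries
```

A real receiver would wrap this in whatever web framework it already uses; the parsing itself is the easy part.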

How do I use it?
You need to be the administrator of your google code project.
Then, click on the Administer tab where you configure your project.
This time, you need the Source page.
There, on the bottom, you'll have a text field called Post-Commit URL.
Enter the following URL:

http://cia.vc/deliver/simplejson/

That's it. All commits you make from now on should be reported to CIA.vc
(with project name being what the project is called by google code).

From what I've seen, latency is pretty good, some 5-10 seconds from commit to IRC.

Once you've set this up, remember to turn off polling (or the email to our googlecode mail parser). Otherwise every subsystem that notices the commit sends its own notification, and you'll see each commit more than once.

Is this only for google?
Unfortunately, so far it is.
However, their idea is quite a good one:
This kind of hook does not need a lot of work on the server side,
the JSON is easy to generate (aka "hack up") in pretty much any language,
and it should extend nicely should some other VCS need different entries.

So if someone feels like writing an HTTP POST hook for some popular version control system, it might come in useful.
Maybe we can convince some other hosting providers to supply a similar system.
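
For anyone tempted to write such a hook, the sending side could look roughly like this in Python. It is only a sketch: the URL is a placeholder and the JSON field names are made up for illustration, so adapt both to whatever the receiving server expects.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute whatever URL should receive the hook.
HOOK_URL = "http://example.org/commit-hook"


def build_hook_request(project, revision, author, message, url=HOOK_URL):
    """Build (but do not send) an HTTP POST carrying one commit as JSON."""
    payload = {
        "project_name": project,  # field names are assumed throughout
        "revisions": [
            {"revision": revision, "author": author, "message": message}
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Actually deliver the request; call this from the VCS hook script."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it keeps the hook easy to test without touching the network.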

So Far,
Karsten "BearPerson" Behrmann

Filtering Google

Oh no, another post within the same month!
Horsemen spotted in the sky, flying pigs imminent!

Anyway, where was I...

Right, there's now a way to get commits from Google Code into CIA.vc
that doesn't go through the hackish SVN poller. Read on for details.

SVN and hooks
As we all know, there's a new player on the block of public open-source hosting:
Google Code. Judging by the whole bunch of people using it,
they do their stuff well enough:
SVN, wiki, bugtracking, downloads, the works.

Wait, no CIA.vc hooks? Nope. Back in the day with sourceforge and CVS,
you could install your own hooks, so things worked. When sourceforge added SVN,
people couldn't add their own hooks, but I gather the web config interface
has a "CIA.vc hook" checkbox. But without hosting provider support,
SVN users are out of luck when it comes to custom hooks.

The SVN poller
That's why we added the SVN repository poller. It wakes up a few times an hour
(or whenever any mail arrives on a special address) and scans the configured
repository, checking if there were any new revisions since it last looked.
If there are, it enters a new commit with the right data into the system.
So usually you'll set it up with the default poll delay of 15 minutes (any delay shorter than what the poller can actually manage, roughly one poll every 20-30 minutes with our load,
gets silently raised to that interval) or, even better,
subscribe the ping email address to your commit mailing list.
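
The two rules described above (look for revisions newer than the last one seen, and quietly raise poll delays the poller can't keep up with) can be sketched like this. This is an illustration, not the poller's actual code; the function names are made up.

```python
def effective_poll_interval(requested_minutes: float,
                            achievable_minutes: float) -> float:
    """Requests faster than the poller can manage are raised to what it can do."""
    return max(requested_minutes, achievable_minutes)


def new_revisions(last_seen: int, head: int) -> list[int]:
    """Revision numbers committed since the last poll, oldest first."""
    return list(range(last_seen + 1, head + 1))
```

So a project asking for 15 minutes while the poller manages one round every 25 effectively gets 25, and a repository that moved from r10 to r13 yields r11, r12 and r13.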

That works well enough, and produces excellent XML commit data,
but it's a bit hackish - the polling is horridly inefficient,
and I need to run yet another service on our poor machine.
Also, it's sometimes a tad slow.

Parsing E-Mail to XML
So I figured "If we already get the commit data from the mailing list,
can't we use that somehow?" and resurrected an old set of scripts.
What we can do now is pick up mail sent out by Google Code
(via the "Activity Notifications/all subversion commits" field on the
"project summary" pane of the "administration" tab)
and try to parse the commit.

The results are not as fancy as the repository polling;
those mails were meant for humans to read, not machines, after all.
So I can't always figure out filenames,
and since there's no unambiguous "end of Log message" tag
I currently cut log messages at the first empty line.
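
Here is a rough Python sketch of that "cut at the first empty line" rule. The `Log:` marker and the mail layout in the example are assumptions made for illustration; the real notification mails may look different.

```python
def extract_log_message(mail_body: str, start_marker: str = "Log:") -> str:
    """Return the log text after `start_marker`, up to the first empty line.

    `start_marker` is a hypothetical header line; adjust it to the real mails.
    """
    lines = mail_body.splitlines()
    try:
        start = next(
            i for i, line in enumerate(lines) if line.strip() == start_marker
        ) + 1
    except StopIteration:
        return ""  # no log section found
    collected = []
    for line in lines[start:]:
        if not line.strip():
            break  # first empty line: assume the log message ends here
        collected.append(line.strip())
    return "\n".join(collected)
```

The drawback is visible right away: a log message that itself contains a blank line loses everything after it.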

But I think it's slightly faster than the polling method,
and it's certainly more elegant. It's still not instantaneous,
mostly because it seems Google's email machinery takes a minute or two.
But if you want, use it! Simply change the mail settings (or mailing list)
to send to "cia googlecode@" instead of "ping whatever@".
(And turn off periodic polling, if you have it enabled,
or you'll end up getting your commits twice)

Feedback is always welcome!
(the nick I sign my posts with, at this domain; or just comment on the blog)
We'll probably have a few corner cases I didn't catch,
but with some work we should be able to turn this into
yet another Good Way to get commits into CIA.vc!

You're with google?
Note to any google employee (especially the google code folks) reading this:
I'm sure that when we work together, we can do better.
Drop me an E-mail and I'll work something out.
I'm a geek, so pretty much any method you come up with to pipe commits,
I can handle.
It's always nice to see your commit show up on IRC
while your finger is still hanging over the enter key,
and we should be able to make that happen.

So Far,
Karsten "BearPerson" Behrmann

Reporting in

Umm, yeah, that's right, I'm still here.

Instabili... segmentation fault (core dumped)
Sorry if we've seemed a bit unstable lately - we've had a couple of unexpected (and partly unexplainable) problems crop up, and while I take care of things whenever I see them come up, that seeing part could be improved somewhat 8)

Things should be looking better soon, though.
I've worked a bit on infrastructure that'll allow me to notice problems earlier and better,
as well as made sure I get (and read) information from all the pieces of the system when stuff goes boom.

On the hosting side, we have a very interesting lead
that may see significant improvements to our system.
I don't want to give out any details as long as I don't have anything solid,
so I don't make anyone look bad by accident,
but stay tuned to this channel for more information when we have it.

Recent Changes
What else have I been up to?

I've gone and scrubbed our blog comment spam.
I already disabled links in comments a while ago,
to cut down on all those people who saw "cool, a blog with open comments"
but didn't see the rel="nofollow" we put on all comment links.
We still got a lot of idiotic comments, though,
which I can only guess must be some kind of "Hey guys, here's a blog with open comments"
magic strings in the spam community. Or something. Well, gone now.

So I'll just have to periodically scrub the comments.
I don't want to take direct steps against automated comments just yet -
if you've got an RSS reader that allows you to directly post comments,
I applaud your ingenuity.

On the subject of spam, it seems someone got the idea to use the project pages for spamming.
Let's hope that trend doesn't continue.
I'd hate to have to set up a wikipedia-like army of "recent changes" monitors.

In what's probably our most important component, the IRC bots,
I've tuned the freenode settings - they should connect much faster now when stuff is restarted,
and fixed a bug that prevented them from properly connecting to EFnet.
I hope I've also increased the general connect speed to any network,
but we'll have to see how that goes next time it needs a restart.

Open issues
What will I be working on?

Obviously, the hosting change I've hinted at above
is going to take some working-out.

Also, we've had some interesting bugs with unicode / UTF-8 in commits.
I muchly hope we're at the point where commits get through,
no matter the charset,
but I'm afraid we currently replace all 8bit characters with '?'.
Pieces of the core don't work happily with unicode,
we'll need to fix that.

The advent of distributed SCMs like git has brought an (almost) entirely new problem:
Currently, most hooks take each commit pushed into the central repository
and send out a notification about it.
Normally, that's exactly what you want.

Things get a bit interesting if someone comes back from vacation
and pushes the 100 commits they wrote while away, and those take a machine-gun march through the system.
Or occasionally I see someone merging branches, and the hook script sending a commit for every merged commit.
We'll have to change the hook scripts to detect this kind of thing and just say "push of 100 commits"
or something. Stay posted, when I get such a script I'll put it up and tell everyone to use it ;)
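
Until such a script exists, here is a sketch of the idea in Python: report small pushes commit by commit, and collapse big ones into a single line. The threshold of 5 and the `author`/`log` keys are arbitrary choices for the example.

```python
def notifications_for_push(commits: list[dict],
                           max_individual: int = 5) -> list[str]:
    """Return the messages to announce for one push.

    Small pushes are announced commit by commit; anything larger than
    `max_individual` (an arbitrary example threshold) becomes one summary.
    """
    if len(commits) <= max_individual:
        return [f"{c['author']}: {c['log']}" for c in commits]
    authors = sorted({c["author"] for c in commits})
    return [f"push of {len(commits)} commits by {', '.join(authors)}"]
```

A real hook would get the commit list from the VCS (for git, the range of revisions the push introduced) and hand it to something like this before notifying.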

I guess an alternative that might work for some people would be this:
Instead of putting an on-push hook on the central repository,
put an on-commit on each developer's repository -
that way, you'll get instant notification of what everyone is working on,
and can ask him/her to push it up when it looks interesting.

Oh whee, that got much longer than I intended.
I'd better stop and get some actual work done again now ;-)

So Far,
Karsten "BearPerson" Behrmann

Still alive, honest!

Hi folks, this is your local programmer speaking.
As you may have noticed, cia.vc hasn't exactly set records in reaction times and reliability lately (error 500, anyone?). I'm one of the folks currently working behind the scenes to keep things up and running.

You may also be aware that CIA's original author and maintainer, Micah Dowty, has been fairly quiet lately and isn't working on it as much as he used to.

To squash the rumors, I'd like to state that these two things are not related ;-)

Introducing myself
I'm one of the people who jumped in to do maintenance and code work. Some may know me as "BearPerson" from the freenode network or from the Source Mage GNU/Linux distribution.

Anyway, I'm afraid I haven't had much time to spend on code and mostly did maintenance lately ("Is CIA down?" - "Unlikely, let me see... Ah, it's knee-deep in swapspace, let me clean that up...")

However, I plan to spend some time on the code, especially optimizing and removing a few "walls" we're hitting currently.

So while the situation right now isn't exactly ideal, it's being worked on, and I hope that Sometime Soon we'll have a CIA.vc that's nicely responsive again even when there's a bunch of load on the system.

For the curious
What is the problem right now?

Well, in a nutshell, it's one of scale. We've grown. I count 3582 projects in the stats, or 6639 counting sub-projects. Whenever I see the web part causing high load, I can count on seeing several search engines' web crawlers in the http logs.

While we should be able to cope with the load (only a couple of commits on average per minute), there's a bit of trouble when a "backlog" of requests temporarily builds up in memory and we end up escaping to swapspace, slowing down request processing and building the backlog even further.

So, while throwing more hardware at the problem would push out the wall further, I'm going to focus my time on making things work on what we have right now. Because they should. I'm a programmer, not a sysadmin.

I guess that's it for now. Please bear with us as we live through the current bumps in the road :-) And in the meantime, feel free to come into #cia on freenode with suggestions and problem notices, I'll be there.

So Far,
Karsten "BearPerson" Behrmann

New Google Code project

To encourage collaboration, I set up a new mailing list and Google Code project for CIA. Development has been moved from the old subversion repository (at svn.navi.cx) to Google Code.

If you want to help with CIA development or administration, please use the new mailing list. Also, please use the new bug tracker :)

CIA in moderation

Overzealous bots
Christel of Freenode fame just posted a blog entry on the annoying retry-on-ban behaviour of CIA bots. It has some hints for Freenode users who find a rogue CIA bot in their channel. Banning the bot will cause it to periodically attempt re-joining. Quieting the bot (+q) and/or removing it using CIA's web interface are the recommended immediate solutions.

This is a long-standing bug in CIA's bot daemon. It has a strong self-preservation instinct. If an action it attempts times out, it will retry. If it detects that it isn't in the channels it expects to be in, it will try to correct that. These were important features in making the bot network as robust as it is now, but they are problematic when the bot encounters some server-side feature (like a ban, a channel key, or a redirection) that it doesn't understand.

Ideally, the bots need to understand these features and report them back to the CIA messaging daemon and the web interface in a useful way. The web interface for your bot should inform you that the bot has been banned, and give you as a user the opportunity to ask it to retry.
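
As a rough illustration of the behaviour described here (this is a sketch, not code from CIA's bot daemon), the decision after a failed JOIN could key off the server's numeric reply. 474 and 475 are the standard IRC numerics for "banned" and "bad channel key"; 470 is a non-standard channel-forward reply used by some networks.

```python
from typing import Optional

# Replies that blind retrying will not fix; report these to the user instead.
JOIN_FAILURES_TO_REPORT = {
    "470": "redirected to another channel",
    "474": "banned from channel",
    "475": "channel key required",
}


def next_action(numeric: Optional[str]) -> tuple[str, str]:
    """Decide what to do after a failed JOIN.

    `numeric` is the server's error reply, or None if the attempt timed out.
    """
    if numeric is None:
        return ("retry", "timed out")
    reason = JOIN_FAILURES_TO_REPORT.get(numeric)
    if reason is not None:
        return ("report", reason)
    return ("retry", f"unrecognised reply {numeric}")
```

The "report" branch is where the web interface would learn about the ban and offer the bot's owner a manual retry.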

I need your help
So, why hasn't this already been fixed? CIA is still a one-man show, and I don't have a lot of time to devote to it any more. In the four years or so since this service was first launched I graduated from college. I got a real job, a boyfriend, new hobbies. What little time I do feel like donating to CIA is often consumed by ongoing maintenance tasks: backups, cleaning up after crashes, acting as moderator, answering e-mail. There's a lot of work to do, and the backlog is growing.

I've been running CIA on my own and paying for it mostly out of my own pocket for quite a while now. I'm just concerned that this isn't sustainable. Unless more people start contributing to the project, I may be forced to discontinue it.

Part of the problem is that, while the CIA codebase is open source, CIA isn't a software project. It's a service. Open source services are much harder to manage than open source software projects. They require consistent and trustworthy dedication, and they require all developers/maintainers to share server resources.

That said, there are various roles you can take on in CIA right now that would be helpful. These range very roughly from easiest to hardest:

Casual supervisor.

Keep an eye on the server, via public monitoring methods (the Bot Cloud page, for example). Make sure it's smooth and responsive, and that the IRC bots aren't being abused. Report problems to a moderator.

Moderator.

Watch for abuse, especially abuse of the IRC bots. Take actions such as disabling accounts or forcibly removing bots. You will need a privileged CIA account.

Support Representative.

Answer e-mail about CIA, answer people's questions on IRC. Currently all the CIA email is going directly to me, so your first task would be to set up a mailing list for CIA inquiries.

Planner.

Where should CIA be going? How are people using it, and how should the project be evolving? How can we best structure the project to allow more people to contribute to its code and administration? You will need to do some research, both on the web and in CIA's codebase. You will probably want a privileged CIA account, for posting blog entries.

Server Admin.

Keep the server running smoothly. You'll need to understand how the various daemons in CIA interact, and you'll be responsible for keeping the machine as a whole running smoothly. Start out by experimenting on the development VM. When you're ready, you'll need SSH access to the primary server (cia-vm1).

Webmaster.

You'll be responsible for fielding feature requests and bug reports that deal with the web-based frontend for CIA. In your spare time, work on larger projects like transitioning the rest of the "old" web frontend (The front page, RSS feeds, stats pages) to run on the new system. You will be dealing mostly with the new web site codebase, which uses Django. Start this job by researching the existing CIA codebase and writing/testing patches using the development VM. When you're ready to deploy to the production server, contact a Server Admin.

IRC developer.

You will need to be skilled at producing robust and scalable networked applications, and you'll need a good understanding of the IRC protocol and CIA's requirements. CIA's bot server will need to be retrofitted or replaced in order to prevent abuse and improve scalability.

Core developer.

You'll need to understand the big picture of CIA's codebase, including the RPC daemon, the database, and the stats system. You should be experienced with SQL, but unafraid to get your hands dirty with lower-level data storage mechanisms when necessary. Core developers will be responsible for continued development and improvement of CIA's central message delivery and storage/retrieval mechanisms. Large projects in this arena include XMPP integration for publish/subscribe, improving scalability of the stats system, and implementing advanced message searching capabilities.

I hope this list inspires at least a few people to contribute. If you're interested, the best place to go is the #cia channel on irc.freenode.net. There is no development mailing list for CIA yet, but feel free to create one.

The best ways to contact me currently are IRC and e-mail, though I know I'm pretty far behind on dealing with CIA e-mail at the moment. (This is why the Support Representative job above is so important ;) I'll try to check my CIA mail more frequently, but the first few volunteers to join this project will need to be capable of working relatively independently. I don't know how much time I'll be able to devote to mentoring and bootstrapping. For CIA to be successful, we will need volunteers that are really motivated and passionate about the project.

Thank you all in advance for your continued interest in CIA. It is my hope that this service can continue operating far into the future with a true community of developers and admins backing it.

New IP address

Due to re-routering and such at ye olde datacenter, CIA.vc is now on a new IP address: 208.69.182.149.

The old IP will continue to work for about a week, but if you have any old /etc/hosts entries or DNS caches, it's time to flush them now. Also, don't panic when the IRC bots start logging on from a new address now.
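
If you want to check a hosts file for leftovers, a small Python sketch like the following would do. The function name is made up, and the old address in the usage note is invented for the example.

```python
NEW_IP = "208.69.182.149"  # the new address announced above


def stale_hosts_entries(hosts_text: str,
                        hostname: str = "cia.vc",
                        expected_ip: str = NEW_IP) -> list[str]:
    """Return hosts-file lines that map `hostname` to some other address."""
    stale = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into address and names.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and hostname in entry[1:] and entry[0] != expected_ip:
            stale.append(line.strip())
    return stale
```

Feeding it the contents of /etc/hosts returns any line such as `10.0.0.1 cia.vc` (a made-up old address) that still needs updating or deleting.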

That is all.

Run your own CIA virtual machine

I just posted a new release of the CIA Development and Small Deployment (DSD) virtual machine.

This is a self-contained disk image which includes a pre-configured CIA instance running on Ubuntu Linux. You can run it using the free-as-in-beer VMware Server. Use it to:

Test-drive CIA in the privacy of your own server
Set up an internal CIA server for your company
Fix bugs and develop new features

Changes from the previous release:

Updated to the latest CIA code from Subversion.
There is now a Django admin user by default. (username admin, password changeme)
The blogging module in CIA is set up.
The VM is now a much smaller download.
Compatibility with VMware Workstation 5.5, Server, and Player. I botched the virtual disks in the first release such that it would only run on Workstation 6.0, which is still in beta.
Networking should now start on boot even if the VM's MAC address changes.
Added a bare-bones set of stats:// rulesets, so commits will appear on the web site. You can use the command-line ruleset and metadata editor tools to customize the server's stats collection behaviour.

Download

Download the CIA DSD VM, 2007-04-07 release (152 MB)

Important:
This VM is not secure out-of-the-box. Several steps must be taken manually to secure it. See the development documentation or the notes embedded in the VM. This VM is not recommended for use on public sites unless you know what you're doing.

Tour
After downloading the virtual machine and extracting it, open it in VMware Workstation, Server, or Player. This example will use VMware Server for Linux. Here you can see the VM's release notes, and you may want to edit its networking settings.

The default is to use bridged networking, which gives the VM an IP address visible to the LAN your physical machine is attached to. I switched this to NAT on my laptop, since it isn't always connected to a network.

Power on the VM and let it boot. You should see a typical Linux console login prompt. Log in as cia with the default password changeme. If the VM was able to acquire an IP address, you should be able to see it with ifconfig. In this example, ifconfig tells me that my VM has the IP 192.168.42.128. Sure enough, I visit http://192.168.42.128 in a web browser and I'm greeted with a CIA front page that has no projects or authors listed.

So, what works?

You can send commits to this server over XML-RPC, and they'll show up on the web. (You may want to customize the stats:// rulesets and metadata, however, using CIA's command line tools.)
You can log in as the default admin user, or you can create a new account.
Users can manage their metadata just like they do on the main CIA site, and they can create IRC bots.
Repository polling should work; however, all features that require e-mail (including the repository pinger) will require additional setup.
The documentation browser and blog work. You can customize the documentation content by editing the files in ~/cia/doc, and you can make blog posts over the web as any registered admin user.
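
As a starting point for the XML-RPC item above, here is a hedged Python sketch of building and delivering a commit message. The element names, the `/RPC2` path, and the `hub.deliver` method name are assumptions; check the documentation shipped in the VM for the real message schema and endpoint. The IP address is the one from the example above.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Assumed address of the VM's XML-RPC endpoint; adjust host and path.
RPC_URL = "http://192.168.42.128/RPC2"


def build_commit_message(project, author, revision, log):
    """Build a commit message as an XML string (element names are assumed)."""
    message = ET.Element("message")
    source = ET.SubElement(message, "source")
    ET.SubElement(source, "project").text = project
    commit = ET.SubElement(ET.SubElement(message, "body"), "commit")
    ET.SubElement(commit, "author").text = author
    ET.SubElement(commit, "revision").text = str(revision)
    ET.SubElement(commit, "log").text = log
    return ET.tostring(message, encoding="unicode")


def deliver(xml_message, url=RPC_URL):
    """Send the message; `hub.deliver` is an assumed method name."""
    server = xmlrpc.client.ServerProxy(url)
    return server.hub.deliver(xml_message)
```

Letting ElementTree build the XML means characters like `<` and `&` in log messages are escaped for you.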

An exercise for the reader
Other features may require additional setup, and there are several additional steps you'll want to take before deploying this server. This virtual machine image is still experimental. I can't guarantee any support for it, but I'll try to help if you email me or drop by the #cia channel on Freenode.

This VM is really just a starting point. Patches welcome :)

Yahoo beats Google on 301 redirects?

Last Friday, March 23rd, this site changed domains from cia.navi.cx to cia.vc. This was achieved by sending an HTTP 301 Moved Permanently response to most types of requests made on the cia.navi.cx domain.
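
In code terms, the redirect amounts to something like the following Python sketch. It illustrates the mechanism only, not the actual server configuration used for the move, and the path in the usage note is just an example.

```python
OLD_HOST = "cia.navi.cx"
NEW_HOST = "cia.vc"


def redirect_for(host: str, path: str):
    """Return (status, headers) for a request, or None if no redirect applies."""
    if host.lower() != OLD_HOST:
        return None
    # 301 tells clients and crawlers that the move is permanent.
    return (301, {"Location": f"http://{NEW_HOST}{path}"})
```

A request for `/stats/project/demo` on the old domain is answered with a 301 pointing at the same path on cia.vc, which is what a crawler has to follow and then reflect in its index.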

The resulting data surprised me:

Hits from Google began declining the moment the site moved, and they've been declining since:

Hits from Yahoo are increasing:

There seems to be a noticeable difference between the ability of Google and Yahoo to properly deal with these redirects. I'm not just talking about their ability to follow the redirects themselves; I'm referring to their ability to quickly and accurately update their index and page rankings accordingly.

It's possible that the hits from Google will be back to normal after Google has finished fully re-indexing the site, and it's also possible that this is a statistical fluke. However, I would still assert that Yahoo is handling the move much more gracefully. First let's try a Google search for "cia.vc":

Now let's try Yahoo:

CIA Search (beta!)

As of today, all pages on the CIA site should have an interactive search box at the top, just left of the navigation links.

The quality of the search results will be somewhat mediocre. Currently it's doing a simple substring search on stats paths and titles. Sorry, it can't search the actual commit messages yet.
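
For the curious, a substring search of this kind boils down to something like the Python sketch below. It is not the site's actual implementation; in particular, matching case-insensitively is an assumption.

```python
def search(entries: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Return the (path, title) pairs whose path or title contains `query`.

    Case-insensitive matching is an assumption made for this example.
    """
    needle = query.lower()
    return [
        (path, title)
        for path, title in entries
        if needle in path.lower() or needle in title.lower()
    ]
```

This also shows why the results are mediocre: there is no ranking, and a query only matches if it appears verbatim somewhere in the path or title.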

Browser compatibility is also limited at the moment. It should work on IE6, Firefox, and Safari, but there are sure to be bugs.

The result quality and the accessibility will be improved over time. If you have any bug reports or suggestions, feel free to leave a comment.