Reprocessed, by Matt Patterson

Something approaching a weblog

Introducing dpkg-tools: Ubuntu and Debian package building for Rubygems and more

Something I found myself wrestling with a lot towards the end of my time at the BBC, and something I've been thinking a lot about since, is the deployment of entire servers, not just the deployment of software onto servers. I've spent nearly ten years working with Linux distributions and their OS package management systems, primarily Red Hat and Debian-based systems (Ubuntu, lately). I've also spent a lot of time working with Python and Ruby's OS-independent packaging systems (Distutils, a tiny bit of Setuptools, and Rubygems).

What I've been wanting to do is to tie down configuration management and deployment processes so that Rails apps, and the servers they run on, can be easily deployed as a Virtual Machine on a machine running the Xen virtualisation system. Basically, I want to tie the server configuration management problem much more tightly to the application deployment problem.

Systems such as Puppet tackle one aspect of the problem: the question of what to install, and the related question of how it should be configured. I'm interested in that, but mainly I've been interested in the question of how do you get things -- apps, supporting libraries, supporting software services -- onto a server in a sensible fashion, and how do you ensure that they're going to work with the base server's software and OS. Debian and Red Hat's answer to these questions has been OS packaging: rpm and yum for Red Hat and dpkg and apt for Debian. With this you get pre-compiled binaries which are guaranteed to work because they were compiled against all the same libraries that you'll have installed, and you get dependency tracking and resolution. Dependency tracking means you can guarantee (assuming a quality package) that you've got everything installed that needs to be installed, and that everything's the right version to work with everything else. This is great, and is one of the things that's made Linux server distros so compelling - if you need an X (mail server, web server, whatever), there's a good chance you can get it installed and minimally working with a single command.

OS-independent packaging systems for languages, like Rubygems, solve a very similar problem. The difference is that they assume that you're able to sort out the underlying OS stuff yourself, making sure you've got the right C compiler, or supporting libraries, or whatever installed. What they give you is a cross-platform way of getting their language-specific stuff installed, often with dependency tracking, but just for other packages of the same kind, so Rubygems gives you dependency tracking which covers other Gems, but not the OS.
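To make that gap concrete, here is a minimal sketch of a gem specification. The gem name, version, and dependency are made up for illustration. The point is that a dependency on another Gem is machine-readable, while the OS-level need can only be recorded in the free-text requirements field, which Rubygems shows to the user but does not resolve.

```ruby
require "rubygems"

# A hypothetical gem, defined in memory purely for illustration.
spec = Gem::Specification.new do |s|
  s.name    = "example_app_support"   # made-up name
  s.version = "0.1.0"
  s.summary = "Illustrates gem-level versus OS-level dependencies"
  s.authors = ["Example Author"]
  s.files   = []

  # Tracked and resolved by Rubygems: another Gem.
  s.add_runtime_dependency "activerecord", ">= 2.0"

  # Not resolved by anything: a note for whoever installs the gem.
  s.requirements << "MySQL client development headers (libmysqlclient)"
end

spec.runtime_dependencies.each do |dep|
  puts "gem dependency: #{dep.name} (#{dep.requirement})"
end
spec.requirements.each do |note|
  puts "OS requirement (informational only): #{note}"
end
```

Nothing in Rubygems will install or even check for the MySQL headers; that half of the dependency graph lives in the OS package manager.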

Which brings me back to Rails apps. The big problem with getting Rails apps installed and working is making sure that all the Ruby bits, and any supporting C stuff (like the MySQL bindings, or libxml2), are installed, and installed correctly. So, we have a mechanism for getting a Rails app onto a box -- application deployment -- in Capistrano, a mechanism for installing and managing Ruby libraries in Rubygems, and a mechanism for installing and managing the underlying OS in dpkg/apt and rpm/yum. What we don't have is an easy mechanism for tying the OS layer to the Ruby layer in a meaningful way. With a tool like Puppet we can get the result we want, but we have to jump through more hoops: we need to figure out what OS stuff is required for the Ruby stuff we want to deploy, and then get Puppet to download (and often compile) the things we need. It's a bit brittle though: OS packages have their dependencies listed and baked in to the package, but with a Puppet recipe that compiles up some arbitrary piece of software (the MySQL Gem, for argument's sake), you need to list any dependencies in your recipe yourself.

The problem is pace layering. The OS packages are a different layer and move at a different speed to your Puppet recipe. So, a security update might force a change to the OS MySQL client libraries which means that your MySQL Gem no longer works. If you catch this beforehand and update your Puppet recipe, once the security update has been deployed you can recompile the MySQL Gem, for minimal downtime. If you don't catch it you have a machine where your App is suddenly no longer able to talk to the database, even though all the MySQL command line tools say that everything's fine.
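One way to picture the drift between layers is as a comparison between what a native Gem was compiled against and what the OS currently provides. The sketch below assumes you keep such a record somewhere; the BuiltGem struct, the gems_needing_rebuild helper, and the library versions are all hypothetical, and nothing here is part of Rubygems or Puppet.

```ruby
# Hypothetical record of which OS library version a native gem was built against.
BuiltGem = Struct.new(:name, :library, :built_against)

# Returns the gems whose underlying OS library is missing or has changed
# since the gem was compiled.
def gems_needing_rebuild(built_gems, installed_libraries)
  built_gems.select do |built|
    installed = installed_libraries[built.library]
    installed.nil? || installed != built.built_against
  end
end

# Illustrative data only: the state recorded when the gems were compiled...
built = [
  BuiltGem.new("mysql",      "libmysqlclient", "15"),
  BuiltGem.new("libxml-ruby", "libxml2",        "2")
]

# ...and the state after a security update replaced the MySQL client library.
installed = { "libmysqlclient" => "16", "libxml2" => "2" }

stale = gems_needing_rebuild(built, installed)
stale.each { |g| puts "#{g.name} needs recompiling against #{g.library}" }
```

With OS packages this bookkeeping comes for free, because the package's declared dependencies are checked by dpkg and apt whenever anything underneath it changes.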

My approach to this problem has been to bake the Ruby layers into OS packages, so that I'm using one dependency declaration and resolution system. This allows you to do several really cool things, not least cutting the number of commands you need to get your app deployed. This blog is running on Ubuntu, and to install the entire system from bare OS (caveat: it's a VM on a Xen host and my image creation script added my private Apt repository to Apt's configuration) required one command:

aptitude install reprocessed

Then I did a cap deploy, and hey presto, everything worked.
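To give a feel for what using one dependency system involves, here is a rough sketch of translating a Gem dependency into Debian-style package relations. This is not taken from dpkg-tools: the deb_package_name naming scheme and the deb_relations helper are assumptions made up for this example. The Debian relation operators themselves (<<, <=, =, >=, >>) are the standard ones.

```ruby
require "rubygems"

# Hypothetical naming scheme, for this sketch only.
def deb_package_name(gem_name)
  "#{gem_name.downcase.tr('_', '-')}-rubygem"
end

# Rubygems operators that map directly onto Debian relation operators.
DEB_OPERATORS = {
  "="  => "=",
  ">=" => ">=",
  "<=" => "<=",
  ">"  => ">>",
  "<"  => "<<"
}.freeze

# Turns one Gem::Dependency into a list of Debian relations.
def deb_relations(dependency)
  name = deb_package_name(dependency.name)
  dependency.requirement.requirements.flat_map do |op, version|
    if op == ">=" && version == Gem::Version.new("0")
      # ">= 0" is Rubygems' way of saying "any version".
      [name]
    elsif op == "~>"
      # "~> 2.1" means ">= 2.1 and < 3"; Gem::Version#bump gives the upper bound.
      ["#{name} (>= #{version})", "#{name} (<< #{version.bump})"]
    elsif DEB_OPERATORS.key?(op)
      ["#{name} (#{DEB_OPERATORS[op]} #{version})"]
    else
      raise ArgumentError, "no Debian equivalent for '#{op}'"
    end
  end
end

dep = Gem::Dependency.new("rails", "~> 2.1")
puts "Depends: #{deb_relations(dep).join(', ')}"
```

Once the relations are in the package's Depends field, apt resolves Ruby libraries and OS libraries in a single pass, which is what makes the one-command install possible.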

I don't see this as a replacement, or even as a competitor, for systems like Puppet. Instead, I think it allows you to be more declarative in your Puppet recipes. Instead of saying 'This machine should be an app server running X, so install Y package, download and compile Rubygem Z...' you can simply say 'This machine should be an app server running X'.

I'm a big fan of the Ubuntu Linux distribution, which is Debian-based and so uses .deb packages and the dpkg and apt infrastructure. The set of tools I've built to work with these packages is called dpkg-tools. You can go and look at dpkg-tools on Github, where you can get the source, and where most of the non-RDoc documentation will live. There's also the Lighthouse page and the Rubyforge project, which has all the RDoc. dpkg-tools is split into three main parts: dpkg-gem, which facilitates building .debs from Gems; dpkg-etc, which makes it really easy to package up system configuration information; and dpkg-rails, which makes packages from Rails apps and handles declaring configuration for an Apache 2/Mongrel/mod_proxy_balancer setup.

A brief digression into motivations and potential political pitfalls

It's worth noting that there's been a lot of controversy over Rubygems and Debian -- see Debian's pkg-ruby activity's policy statement, and Pelle Braendgaard's RubyGem is from Mars, AptGet is from Venus blog post. My personal opinion is that, while Rubygems has its fair share of issues, it's faced with a very different set of constraints from Debian's, and has to operate across wildly different operating systems. Because of this, I think that Debian's position is a little harsh, and perhaps misses the point. I also think that the major bone of contention, the FHS (Filesystem Hierarchy Standard), has some serious blind spots, particularly around dynamic languages and what happens when the line between library and directly-executable begins to get very blurry. Debian's Ruby maintainers group has seen fit to respond to this by making some very strange changes to the Rubygems library itself, and shipping this patched version as the official Debian Rubygems package.

My solution, therefore, is not something I'm proposing as an official solution to replace all the existing Debian Ruby packages. Rather, it's a more pragmatic solution that is firmly targeted at my needs. I provide what amounts to a 'normal' version of Rubygems-the-library, and then make .deb packages of Rubygems that put all the bits where the normal Rubygems expects to find them. While I'd love to see Debian-based distros and Rubygems play nicer together as standard, when I started this I didn't have the time to wait, and I didn't think I understood the issues well enough to make a considered contribution. Hopefully it'll prove useful to some people, and will at least provide a talking point for others...

Back to the subject at hand

I'll be covering each of the bits of dpkg-tools in future blog posts, starting with dpkg-gem, the Rubygem packaging part, early next week. In the meantime, please direct comments to the usual address (matt at this domain).
