Reprocessed, by Matt Patterson

Something approaching a weblog

dpkg-gem: Ubuntu and Debian package building for Rubygems

Well, it's been a couple more weeks than I promised (that Christmas, it gets everywhere), but here's the first follow-up to my dpkg-tools post from December.

My big problem with anything in deployment is the ease with which you can get it onto your application server, and the degree of confidence you have that it will work right, and work right first time. I've always liked Debian packages because setting up a system was made very easy: you could guarantee that a given piece of packaged software would install trivially (the apt-get install package-name dance), and would work -- since Debian packages know about their dependencies, attempting to install Apache's mod_php (for example) would make sure that Apache and PHP were also installed, and install them if they weren't.

After several years spent seeing the evidence of the difficulties of manually installing and configuring software on a large (at least, larger-than-one) array of servers, I started to think that we should just package everything as OS packages, because at least then you could properly test a from-scratch deployment before the fact, and get that all-important guarantee that not only would things install easily, but they'd actually work. My pet irritant here is that if you try to install the MySQL rubygem on an Ubuntu box without the MySQL client installed, it won't work, since it needs to be compiled against the libmysqlclient shared library. You also need a C compiler (and basically all the packages covered by Debian/Ubuntu's build-essential metapackage). If you can get that rubygem into a package, then you can just install it, and all the dependencies will come right along...

So, that's what dpkg-gem is for. It takes a rubygem and makes it into something which can be built as a .deb package using the standard .deb package building toolchain.

The 10,000 foot overview

The .deb package building process for non-native (by which is meant not-already-Debianed) software looks a little like this:

  1. Add the .deb package metadata to the software, which basically amounts to adding the debian/ directory and its various files.

  2. Make whatever changes are needed to bring the software into compliance with the relevant Debian policy points and to actually make it build. Add package metadata to debian/, like build and install dependencies. Making it build may well involve changing the debian/rules file, which is a Makefile with specific targets (defined in the Debian Policy Manual) for the various stages of package building (triggers for make and make install or their equivalents). There's some overlap with #3 here, since you'll use package builders to test that building works.

  3. Actually build .deb and .dsc packages (reliably and repeatably). You do this using a tool like dpkg-buildpackage (part of the dpkg-dev package), which performs all the build steps necessary and spits out a .dsc source package, a .deb binary package, or both at once. You can split the process in two: just generate source packages and get those built using an automated builder, which is what Debian and Ubuntu do to generate all the packages in their distributions -- thousands of packages are too many to do by hand.

  4. Make the resulting binary packages available from an APT repository, which basically means putting things on a web server, along with some specially generated indexes for apt-get. There are a variety of tools for doing it.

How dpkg-gem fits into this

The hard bits about turning a .gem into a .deb are numbers 1 and 2 above, and that's where dpkg-gem comes in. Given that gems are themselves packages, there's an awful lot of redundancy if you create all the .deb metadata and build scripts by hand, not to mention plenty of scope for handwork introducing discrepancies. The solution I chose was to use Rubygems' own classes to introspect a gem's own metadata to create the .deb metadata, and to use Rubygems itself during the build process. The resulting package would then be, essentially, the things that would have been added to your filesystem after you'd run gem install. The process looks something like:

  1. dpkg-gem downloads the gem into a directory it creates named gemname-rubygem-X.Y.Z
  2. Next, we introspect the gem and create the debian/ directory and the various metadata files: debian/control, debian/changelog, debian/copyright, and debian/files.
  3. Now we copy the default dpkg-gem debian/rules script, which is actually a cheat. The script is really a Rakefile, so it gets copied into the root of gemname-rubygem-X.Y.Z (as Rakefile). debian/rules itself is a one-line shell script that invokes rake and passes in all its arguments. This is to get around the fact that a 'normal' debian/rules file is actually a Makefile, with the shebang line #!/usr/bin/make -f. make is happy to be used as a script interpreter this way, sucking in the rest of the file as a Makefile. rake's not (yet) able to do that, so we need this little bit of indirection.
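To make steps 2 and 3 a bit more concrete, here is a minimal sketch of the idea in Ruby. It is not dpkg-gem's actual code: the method names, the underscore-to-hyphen naming rule, the choice of control fields and the exact wording of the wrapper script are all assumptions made for illustration. Only the -rubygem suffix, the gemname-rubygem-X.Y.Z directory pattern and the build dependency on dpkg-tools come from this post.

```ruby
require "rubygems"

# Assumed naming rule (not taken from dpkg-gem's source): lowercase the gem
# name, swap underscores for hyphens, and append the -rubygem suffix.
def deb_name(gem_name)
  "#{gem_name.downcase.tr('_', '-')}-rubygem"
end

# Source directory name, following the gemname-rubygem-X.Y.Z pattern.
def source_dir(spec)
  "#{deb_name(spec.name)}-#{spec.version}"
end

# Render a bare-bones debian/control from the gem's own metadata.
# The set of fields is a simplification chosen for this sketch.
def control_for(spec, maintainer:)
  depends = spec.runtime_dependencies.map { |dep| deb_name(dep.name) }
  source = [
    "Source: #{deb_name(spec.name)}",
    "Maintainer: #{maintainer}",
    "Build-Depends: dpkg-tools-rubygem",
  ]
  binary = [
    "Package: #{deb_name(spec.name)}",
    # Gems with native extensions are architecture-specific.
    "Architecture: #{spec.extensions.empty? ? 'all' : 'any'}",
  ]
  binary << "Depends: #{depends.join(', ')}" unless depends.empty?
  binary << "Description: #{spec.summary}"
  (source + [""] + binary).join("\n") + "\n"
end

# Hypothetical text for the debian/rules wrapper: hand every argument
# (clean, build, binary, ...) straight to rake, which reads ./Rakefile.
RULES_WRAPPER = <<~SH
  #!/bin/sh
  exec rake "$@"
SH

# A made-up gem, built in memory so the sketch needs no network access.
example = Gem::Specification.new do |s|
  s.name       = "example_gem"
  s.version    = "1.2.3"
  s.summary    = "A made-up gem with a C extension"
  s.extensions = ["ext/extconf.rb"]
  s.add_runtime_dependency "rake"
end

puts source_dir(example)
puts control_for(example, maintainer: "Jane Packager <jane@example.com>")
puts RULES_WRAPPER
```

The point of deriving everything from the Gem::Specification is that nothing in debian/ has to be typed by hand, so the two sets of metadata cannot drift apart.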

Once we're there, the standard .deb toolchain will work just fine: all the metadata is there, and we have a Rakefile which behaves like a debian/rules file should. There is a caveat: currently, you need to have Rubygems 0.9.4 to make all this work. Because of certain limitations that were present in Rubygems (problems with choosing which version of a gem to grab, and problems putting binary scripts in the right place, primarily), I had to delve fairly deep into the internals in a couple of places, and that code breaks with 0.9.5 and above. You also need my Rubygems package for Debian / Ubuntu, because the as-standard Rubygems package has some really quite nasty hacks in it, which would break system-installed (i.e. installed from packages) gems. You can get the two .debs from Rubyforge.

On the assumption that you're on an Ubuntu box, here's an example which installs dpkg-tools and packages a simple gem with no dependencies (but with some C code in it, for a little spice).

# first let's install Rubygems and dpkg-tools
# assuming we've downloaded the Rubygems .debs
$ sudo aptitude update
$ sudo aptitude install build-essential
$ sudo dpkg -i librubygems-BLAH
$ sudo dpkg -i rubygems-BLAH

# nasty little first-time bootstrap coming up
$ sudo gem install rake dpkg-tools

# now for the first packaged gem, Rake. Then, dpkg-tools itself
$ dpkg-gem rake
$ dpkg-gem dpkg-tools

# note the -d option to dpkg-buildpackage. It turns off dependency checking
# You really don't want that usually, but dpkg-tools isn't installed
# as a package yet, so the build dependency on it can't be satisfied
$ cd rake-rubygem-0.8.3; dpkg-buildpackage -d -rfakeroot; cd ../
$ sudo dpkg -i rake-rubygem_0.8.3-1_all.deb
$ cd dpkg-tools-rubygem-0.3.5; dpkg-buildpackage -d -rfakeroot; cd ../
$ sudo dpkg -i dpkg-tools-rubygem_0.3.5-1_all.deb

# now we'll be able to satisfy gems' build dependency on dpkg-tools itself
# and build the RDiscount gem as a package
$ dpkg-gem rdiscount
$ cd rdiscount-rubygem-0.BLAH; dpkg-buildpackage -rfakeroot; cd ../
$ sudo dpkg -i rdiscount-rubygem.BLAH.deb

Setting up the build machine is the biggest bit (but even that only requires 8 commands).

If you were to package a gem with dependencies (like Rails), then dpkg-gem pulls all the dependencies down too, so you don't have to worry about missing anything.
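Pulling the dependencies down is only half of it: they also have to be built and installed before the gem that needs them, just as rake and dpkg-tools were in the example. The following is a rough sketch of working out that order, not what dpkg-gem does internally. The build_order method and the dependency table are hypothetical, and the table is deliberately not the real Rails dependency list.

```ruby
# Return gem names in an order where every gem appears after all of its
# dependencies, starting from `root`. `deps` maps a gem name to the names
# it depends on. Raises if the table contains a dependency cycle.
def build_order(root, deps, done = [], visiting = [])
  return done if done.include?(root)
  raise ArgumentError, "dependency cycle at #{root}" if visiting.include?(root)

  visiting.push(root)
  deps.fetch(root, []).each { |dep| build_order(dep, deps, done, visiting) }
  visiting.pop
  done << root
end

# Illustrative table only.
EXAMPLE_DEPS = {
  "rails"         => ["rake", "activesupport", "actionpack"],
  "actionpack"    => ["activesupport"],
  "activesupport" => [],
  "rake"          => [],
}.freeze

puts build_order("rails", EXAMPLE_DEPS).join(" -> ")
```

Each name in the resulting list would then go through the dpkg-buildpackage and dpkg -i steps in turn.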

What about packages with native dependencies?

This is the hardest part of any automated package generation. My approach so far has been to maintain a lookup table in dpkg-gem itself which matches gems with native dependencies to the package providing that dependency, so the MySQL gem gets its package control file generated with a build dependency on the libmysqlclient-dev package. (Embarrassingly, I've just noticed a bug: we don't generate install dependencies. Oh well, a ticket has been filed.) I'm quite unhappy with the lookup approach and am looking into other methods. Currently apt-file looks like the most promising option, but properly automating it would still need some kind of lookup at the moment, though probably not such a brain-dead one. For now, dpkg-gem only knows about the gems with native extension dependencies which I've been using and have told it about, which means the MySQL gem. If you want more added, please file tickets at the Lighthouse project and I'll add entries to the lookup.
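In spirit, the lookup is no more sophisticated than the sketch below. This is an illustration rather than dpkg-gem's real table: the constant and method names are made up, and the only entry taken from this post is the MySQL gem's build dependency on libmysqlclient-dev. The gem name "mysql" as the table key is an assumption.

```ruby
# Hypothetical lookup: gem name => Debian packages needed to compile its
# native extension.
NATIVE_BUILD_DEPS = {
  "mysql" => ["libmysqlclient-dev"],
}.freeze

# Every generated package build-depends on dpkg-tools itself; gems listed
# in the lookup get their native build dependencies added on top.
def build_depends(gem_name)
  ["dpkg-tools-rubygem"] + NATIVE_BUILD_DEPS.fetch(gem_name, [])
end

# True when the gem has a native extension the lookup knows nothing about,
# which is the case that currently needs a new entry (or a ticket).
def unknown_native_deps?(gem_name, has_extension)
  has_extension && !NATIVE_BUILD_DEPS.key?(gem_name)
end

puts "Build-Depends: #{build_depends('mysql').join(', ')}"
```

The weakness is plain from the shape of the data: every gem with a native dependency needs someone to add a line by hand, and install-time dependencies would need a second table alongside it.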

Package building and distribution

It's one thing to be able to turn a gem into a package and install it locally, but unfortunately it's quite another to be able to install it using aptitude or apt-get on many machines. Steps three and four from the 10,000 foot overview are the industrial application of the packaging techniques, the bit which makes them most useful, and the bit which is the hardest to effectively manage.

Debian and Ubuntu have automated package build farms. There are packages like pbuilder and sbuild, which automate satisfying build-dependencies and provide guaranteed-clean build environments through chrooted minimal OS installs (and other techniques).

Distribution, at least as far as APT is concerned, is a matter of making a filesystem hierarchy which APT understands available, either locally or via a web server. There are a variety of ways to do this, from the 90% hand-rolled approach through to more automated solutions. The end result is that you have a network-accessible package repository so you can easily deploy your packages to one or a number of machines. I use this, and have a dedicated repository with all my gem-related packages, as well as application deployment stuff, which makes setting up new machines (I use VMs, so tend to just spin up a new one if I want to test something out) a breeze.

If you want to hand-roll an APT repository, you can see this Debian repository HOWTO. If you want a more automated solution try reprepro. When I set up a reprepro-based APT repository, I found this Reprepro tutorial really handy.

Next steps

There are a couple of obvious next steps around dpkg-gem itself: handling native extensions with native dependencies, and updating the code to work with current Rubygems, which hopefully now means less mucking with internals, and definitely more engagement with the Rubygems folks (around API issues and to test out some ideas I have around native dependencies).

The other pressing issues (for me at least) are automated package building, which means writing a dependency checker and wiring it into pbuilder or sbuild, and making a public APT repository for gems available. This is something I've been talking about with a few people since early last year, so I guess I should get a shift on.

Further into the future, the next obvious thing to look at is to make dpkg-gem (and the whole dpkg-tools suite) package-format agnostic, able to produce .rpm and other package formats with the same toolchain.

Phusion's DebGem

In the time between starting and finishing this post, Phusion have released their DebGem product. My first impressions are that it's great that someone else thought this was a problem worth solving, but I need to take a closer look. I'm interested that they haven't adopted a package suffix (like dpkg-gem's -rubygem suffix) to prevent package name clashes (there are thousands of Debian packages, so clashes are inevitable). But it looks pretty good.
