Categorisation (and its discontents)
I have a few problems with the way that categorisation works in blogs. The current approach is typified by the filing cabinet: a thing lives in one category, and can also live secondarily in others. This is all well and good, but it seems to me that this kind of noun-based classification misses vast tracts of relationships, both between the things we write and between what we write and the wider world (which is essentially what categorisation is about, after all: we map our mental model into terms that other people can use to search our stuff).
There are other mechanisms in use, faceted classification being the other big one (there's a good introduction at Boxes and Arrows, and notes on implementing such a scheme at Pixelcharmer's blog). The basic idea is that you record information about several aspects, or facets, of your post, and then find things by querying those facets: 'Show me things with topic="gubbins", media="video", and time="last Tuesday"'. This is all well and good too, but it does seem, um, sledgehammerish.
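To make the facet query concrete, here is a minimal Python sketch. It assumes each post carries a simple dictionary of facet names to single values; the `Post` class, the `query` helper and the sample posts are all hypothetical, not taken from Boxes and Arrows or Pixelcharmer's scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    # Hypothetical representation: each facet name maps to one value,
    # e.g. {"topic": "gubbins", "media": "video"}.
    facets: dict[str, str] = field(default_factory=dict)


def query(posts, **wanted):
    """Return the posts whose facets match every requested facet value."""
    return [
        p for p in posts
        if all(p.facets.get(name) == value for name, value in wanted.items())
    ]


posts = [
    Post("Gubbins on film", {"topic": "gubbins", "media": "video", "time": "last Tuesday"}),
    Post("Gubbins in print", {"topic": "gubbins", "media": "text", "time": "last Tuesday"}),
    Post("Other things", {"topic": "widgets", "media": "video", "time": "last week"}),
]

# 'Show me things with topic="gubbins", media="video", and time="last Tuesday"'
for post in query(posts, topic="gubbins", media="video", time="last Tuesday"):
    print(post.title)
```

Every extra facet narrows the result set, which is where the sledgehammer feeling comes from: you have to know, and record, every aspect up front.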
I like some aspects of the faceted system, particularly the ability to specify what you're saying about something (this post, for example, could be 'theorising about categories'). I like the idea of being able to add a verb to qualify the category noun, which would let me distinguish between theorising about categories and code examples about categories.
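One way to picture a verb-qualified category is as a (verb, noun) pair rather than a bare noun. The sketch below assumes exactly that; the `file_post` and `find` helpers and the example titles are hypothetical. Searching by noun alone still finds everything about the noun, while adding the verb narrows it.

```python
from collections import defaultdict

# Hypothetical index from (verb, noun) labels to post titles.
index: dict[tuple[str, str], list[str]] = defaultdict(list)


def file_post(title, *labels):
    """File a post under one or more (verb, noun) labels."""
    for verb, noun in labels:
        index[(verb, noun)].append(title)


def find(noun, verb=None):
    """Find posts about a noun, optionally narrowed by what the post does with it."""
    return sorted({
        title
        for (v, n), titles in index.items()
        if n == noun and (verb is None or v == verb)
        for title in titles
    })


file_post("Categorisation (and its discontents)", ("theorising about", "categories"))
file_post("A category plugin", ("code examples about", "categories"))

print(find("categories"))                           # both posts
print(find("categories", verb="theorising about"))  # just the theory post
```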
I also like Mark Pilgrim's 'eccentric category names', which at first glance suggest a way of using ambiguous category names ('Those that tremble as if they were mad', 'Those that resemble flies from a distance') so that things that kind of go together can happily be put in the same file. On closer inspection, these seem to be, more or less, just ordinary categories with extraordinary names. This is a bit of a shame (and I'm ready to be corrected), but not too surprising, because ambiguous categories don't help search any.
What I want, then, is something that combines ambiguity with useful search. I don't really want to categorise posts. I'm happy to say this post is like that post, but I'd prefer my system to do that for me. Corpora Software's Jump does some or all of this, which is quite an exciting prospect. Clare and I met up for coffee with their senior linguist at the weekend (he's a friend of hers from Uni), and it all sounds pretty good, and beats Bayesian filtering hands down.
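As a rough idea of what "this post is like that post" might mean to a machine, here is a bag-of-words cosine similarity sketch in Python. This is a plain, well-known stand-in, not how Jump works (its method isn't described here); the `related` helper, the 0.2 threshold and the sample archive are all assumptions. A real system would at least discount common words (for example with tf-idf weighting), which this sketch does not.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Turn text into a bag-of-words vector of lower-cased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 means nothing shared)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def related(new_text, archive, threshold=0.2):
    """Titles of archived posts at least `threshold` similar to new_text, best first."""
    new_vec = vectorise(new_text)
    scored = [(cosine(new_vec, vectorise(body)), title) for title, body in archive.items()]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]


archive = {
    "Faceted classification": "faceted classification lets you query posts by topic and media",
    "Coffee": "met a friend for coffee at the weekend",
}
print(related("querying posts by facets such as topic and media", archive))
```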
So: I write a post, send it to my server, and wait. The server analyses the post and notes that it's like some other posts I've written. It makes a note of this, and adds links to the new post in whichever of its top-level file boxes it reckons are appropriate. It decides how to make these file boxes through some kind of analysis of the whole corpus of my work, picking out major themes, recurring topics and sensible divisions.
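The "file boxes" step could be approximated with a very simple clustering pass over the archive. The Python sketch below assumes single-pass "leader" clustering over bag-of-words vectors (each post joins its most similar box, or starts a new one), names each box after its most frequent words, and then files a new post into every box it is similar enough to. The stopword list, the 0.3 threshold and all helper names are hypothetical choices for illustration, not a description of Jump or any other product.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list so box labels aren't just "the" and "and".
STOPWORDS = {"a", "about", "and", "at", "by", "for", "in", "is", "it", "of", "on", "the", "to", "with"}


def vectorise(text):
    """Bag-of-words counts, ignoring a few common words."""
    return Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def build_boxes(posts, threshold=0.3):
    """Leader clustering over a dict of title -> body; returns boxes with a centroid and titles."""
    boxes = []
    for title, body in posts.items():
        vec = vectorise(body)
        best = max(boxes, key=lambda box: cosine(vec, box["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["centroid"].update(vec)  # summed counts; cosine ignores overall scale
            best["titles"].append(title)
        else:
            boxes.append({"centroid": Counter(vec), "titles": [title]})
    return boxes


def label(box, n=2):
    """Name a box after its n most frequent words (a crude stand-in for 'major themes')."""
    return " / ".join(word for word, _ in box["centroid"].most_common(n))


def assign(text, boxes, threshold=0.3):
    """Labels of every box a new post is similar enough to be filed in."""
    vec = vectorise(text)
    return [label(box) for box in boxes if cosine(vec, box["centroid"]) >= threshold]


posts = {
    "Facets": "faceted classification of blog posts by facets",
    "Facets again": "more faceted classification facets for posts",
    "Coffee": "coffee with a linguist at the weekend",
}
boxes = build_boxes(posts)
for box in boxes:
    print(label(box), box["titles"])
print(assign("classification of posts using facets", boxes))
```

Leader clustering depends on the order posts arrive in, which is one reason a real system would more likely re-analyse the whole corpus periodically, as the paragraph above suggests.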
However much I like the idea, I'm going to be stuck with Movable Type's categorisation for a while yet, so I may start trying to refine the way you can work with it. Pixelcharmer's egregious use of regexes to get faceted classification out of Movable Type (which Tim Appnell has made into a plugin) suggests that you can do a lot with what's there if you're willing to work at it.
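For a flavour of the regex trick, here is a Python sketch that recovers facets from ordinary category names. It assumes a hypothetical naming convention where a category called `facet:value` records a facet and anything else is a plain category; Pixelcharmer's actual scheme and the plugin may well differ, and in Movable Type itself this would live in Perl or template tags rather than Python.

```python
import re
from collections import defaultdict

# Hypothetical convention: a category named "facet:value" encodes a facet.
FACETED = re.compile(r"^(?P<facet>[a-z]+):(?P<value>.+)$")


def facets_from_categories(categories):
    """Split a post's category names into facet values and ordinary categories."""
    facets = defaultdict(list)
    plain = []
    for name in categories:
        match = FACETED.match(name.strip())
        if match:
            facets[match["facet"]].append(match["value"].strip())
        else:
            plain.append(name.strip())
    return dict(facets), plain


facets, plain = facets_from_categories(
    ["topic:categorisation", "media:text", "blogs-as-artefacts", "topic:taxonomy"]
)
print(facets)  # topic -> categorisation, taxonomy; media -> text
print(plain)   # the category without a facet prefix
```

The appeal is that nothing in the underlying category system has to change: the facets are just a naming discipline that a regex can read back out.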
- 15.12.2003, 16.49
- File under: information architecture, categorisation, taxonomy, blogs-as-artefacts