Handling files - a prelude to building a Desktop content graph

Following my earlier series of articles on building a content graph (based on content in web content management systems), I am now looking at whether I can do something similar for managing desktop content (that is, content in files).

In this article I am exploring various strategies for managing files in order to be able to find them again later. If you have previously read my articles you will know that I am no fan of full-text search engines. It may seem perverse given that these tools are ubiquitous, and indeed I use them just like everyone else.

But search engines are blunt instruments, and all suffer from the dual problems of poor precision and little capacity for handling information structure. Operating systems don't do much to help here; for example MacOS allows you to tag files in the Finder. but you can't search for tags using Spotlight. There are some commercial tools to help you to find files (Find Any File and HoudahSpot on the Mac, for example). There are other applications that allow you to build content databases with internal indexing, and I've tried many of these. I'm just going to talk in detail about one of them, which is the one I use most of the time now.

Devonthink 3

This is one of the leading information management tools available for the Mac. It is a kind of file-based content database with, in addition to full-text indexing, support for tagging using Finder tags and a lot of (too complex for me) other support tools. When you import (or index) documents into Devonthink it will look for Finder tags and apply them within the program. One of the navigation mechanisms is by tag, so it's easy to find which content items have been tagged with a Finder tag.

Devonthink is also useful in terms of the breadth of content types that it will handle. As well as text, it works well with rtf(d), pdf, MS Office and LibreOffice files, various graphics formats and many others.

Devonthink has one other design feature that works particularly well for me. You can use the program in two different modes, with the content held either internally or externally. With an internal database, you import documents into the program's database. From this point on, the original file is not used by Devonthink; instead, the indexing and general management of the content happens within the Devonthink database, which is basically a single large file.

If you prefer, you can manage your Devonthink content externally. In this case, you simply point the indexer at the content in the file system and Devonthink indexes it. The content itself stays untouched in the file system. This approach works well for me, because it's the lightest touch approach; the file content is still usable exactly as it was before Devonthink indexed it. So this is how I use Devonthink. I don't use more than a tiny proportion of its features; to be honest, it feels rather over-engineered. Take a look at the screenshot below to see just how complex the user interface gets. I will, in passing, note one of its irritating features. Yes, it's the search tools. There are two different search fields available, without any hint as to which you should use in which circumstances.

Powerful but overwhelming complex - the Devonthink UI

Returning to tagging, in the final analysis, we are still just talking about text-based tagging; simple keywords (here is an article on why taxonomies are better than keywords). I would like to see a classification taxonomy as an information structure in its own right. This is a core design principle of the Desktop content graph, as I will show in a later article. For now, let's look at how we might make tagging a bit more straightforward and useful.

Down the rabbithole; Finder tags

Here is a brief diversion down the rabbithole on tags in the MacOS Finder. Take a look, if you're interested, then come back to continue from here.