Desktop content graph part 2. Tags and taxonomy concepts
In this article I am starting to look at turning Finder tags into taxonomy Concepts. If you haven't read parts 0 and 1 you can find links to them here.
- Part 0 Setting the scene
- Part 1 The model for the Desktop content graph
- Part 2 Tags and taxonomy concepts (this article)
- Part 3 Building the initial Desktop content graph system
I wrote about Finder tags in part 0 of this series. Finder tags give you a simple way to attach keywords to files. macOS doesn't make extensive use of these tags, though tools like DEVONthink capture Finder tags and allow you to navigate through a content database by tag, which is useful.
In graph terms, what I really want is to have my content objects linked to taxonomy concepts rather than tagging them with keywords. And when I use the word concept I'm really talking about SKOS Concepts.
A SKOS Concept extends a simple keyword considerably. It provides a URI, object and data properties and, most importantly here, the ability to create graph-based linked data by classifying content objects with a predicate, all derived from a content model. In technical terms we would have a triple that looks like this:
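The exact predicate comes from the content model. As a rough sketch in Python, with a hypothetical file URI and a hypothetical `isClassifiedBy` predicate standing in for the real terms, the subject-predicate-object shape might be written like this:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement in the graph."""
    subject: str    # URI of the content object being classified
    predicate: str  # URI of the property that links content to a Concept
    obj: str        # URI of the SKOS Concept


# The file URI and the predicate name are hypothetical. The Concept URI is
# the one quoted later in this article for the Olympus tag.
example = Triple(
    subject="file:///Users/example/Pictures/olympus-manual.pdf",
    predicate="http://schema.tellurasemantics.com/ContentGraph/isClassifiedBy",
    obj="http://schema.tellurasemantics.com/ContentGraph/9dcdbb68-a8a6-3caa-2937-9592f4d2017f",
)

# Print the statement in a simple <s> <p> <o> form.
print(" ".join(f"<{term}>" for term in example))
```

The point is only that the object of the triple is a Concept with its own URI, rather than a bare keyword string.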

So we have Finder tags, and we need to have SKOS Concepts. I'm going to pre-empt the next article slightly, where I will be building a Desktop content graph application, by showing how it is possible to take keywords and make them into SKOS Concepts. Here is a screenshot from the application. It shows the configuration window in which I'm building SKOS Concepts. The left-hand list is a de-duplicated list of the Finder tags that I've collected when adding files to the graph workspace. The right-hand list shows how these look when converted.

I will describe this application in much more detail in the coming articles. For now, I'm just going to talk about how this component of the model comes together. In the last article, I mentioned that one of the classes in the model is Concept. This models a SKOS Concept. You can read all about SKOS in detail in my article on this website. But for the purpose of the Desktop content graph, all that you really need to know is that a SKOS Concept is a piece of structured information that can be used to classify another piece of information, such as a content object.
SKOS Concepts only need to have two properties: a URI and a preferred label. The preferred label is the structured replacement for the text content of a tag, and the URI is the immutable identifier for the Concept.
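As a minimal sketch of that two-property shape in Python (the class and field names are illustrative, not taken from the model), a Concept could be held as a small immutable record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str         # immutable identifier for the Concept
    pref_label: str  # preferred label, replacing the text of the tag


# The URI here is the one quoted later in this article for the Olympus tag.
olympus = Concept(
    uri="http://schema.tellurasemantics.com/ContentGraph/9dcdbb68-a8a6-3caa-2937-9592f4d2017f",
    pref_label="Olympus",
)
```

Freezing the record is one way to reflect the idea that the URI should not change once the Concept exists.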
We can therefore make new SKOS Concepts from a Finder tag (provided that the tag itself is unique - I'm dealing with that in the programme development). Here is the process:
First, convert the string for the Finder tag (stringToHash) into a hash:
myHash = lowercase(doEnHex(md5(stringToHash)))
The hash produced here is a 32-character hexadecimal string made up of the digits 0-9 and the letters a-f. Next, convert the hash into a UUID by splitting it into groups of 8, 4, 4, 4 and 12 characters, following the pattern aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.
myUUID = Left(myHash, 8) + "-" + Mid(myHash, 9, 4) + "-" + Mid(myHash, 13, 4) + "-" + Mid(myHash, 17, 4) + "-" + Mid(myHash, 21)
Finally, prepend the UUID with the Base URI for the content model:
myURI = kBaseURI + myUUID
In the example in the screenshot above, the Finder tag Olympus is converted into the URI http://schema.tellurasemantics.com/ContentGraph/9dcdbb68-a8a6-3caa-2937-9592f4d2017f.
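The three steps above can be sketched in Python roughly as follows. This is an illustration rather than the application's code: the function names are my own, the base URI is taken from the example URI above, and I am assuming the tag is encoded as UTF-8 before hashing, so a different text encoding would give a different result.

```python
import hashlib

# Base URI as it appears in the example URI above.
BASE_URI = "http://schema.tellurasemantics.com/ContentGraph/"


def tag_to_uuid(tag: str) -> str:
    """Hash a tag with MD5 and format the 32 hex digits as 8-4-4-4-12."""
    # Assumption: the tag is encoded as UTF-8 before hashing.
    digest = hashlib.md5(tag.encode("utf-8")).hexdigest()  # lowercase hex
    return "-".join(
        [digest[0:8], digest[8:12], digest[12:16], digest[16:20], digest[20:32]]
    )


def tag_to_uri(tag: str) -> str:
    """Prepend the base URI to the UUID-shaped identifier."""
    return BASE_URI + tag_to_uuid(tag)


print(tag_to_uri("hello"))
# http://schema.tellurasemantics.com/ContentGraph/5d41402a-bc4b-2a76-b971-9d911017c592
```

Because the hash is deterministic, running this twice on the same tag always gives the same URI, which is what lets a tag be matched to an existing Concept.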
I have to put in a few provisos here.
- Generating UUIDs using this method really requires the incoming string to be distinctive. The hash is deterministic, so the same tag text always yields the same UUID, and short keywords are exactly the kind of string that gets reused in different contexts, leading to UUID clashes. So ideally I should include some kind of salt in the incoming string, to make clashes far less likely.
- The Base URI above (https://schema.tellurasemantics.com) is fine, but it misses an opportunity. Ideally, since the URI is for a taxonomy, the Base URI should reflect that; something like https://taxonomy.tellurasemantics.com.
- I really should also allow for the user to create new concepts.
I'm going to deal with all of these in the application as I develop it.
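One possible way to address the first two provisos, sketched here in Python, is to swap the hand-rolled hash for the standard name-based UUID (version 3, which is also MD5-based) and use a namespace as the salt. This is a different technique from the one described above, not what the application currently does, and the taxonomy base URI and function name are assumptions for illustration.

```python
import uuid

# Hypothetical base URI, following the taxonomy-specific host suggested above.
TAXONOMY_BASE_URI = "https://taxonomy.tellurasemantics.com/"

# Derive a namespace UUID from the base URI; it plays the role of the salt.
TAXONOMY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, TAXONOMY_BASE_URI)


def salted_concept_uri(tag: str) -> str:
    """Build a Concept URI from a version 3 (MD5, name-based) UUID."""
    concept_id = uuid.uuid3(TAXONOMY_NAMESPACE, tag)
    return TAXONOMY_BASE_URI + str(concept_id)


print(salted_concept_uri("Olympus"))
```

The same tag hashed under a different namespace gives a different UUID, so identical keywords in two separate taxonomies would no longer collide, while the result stays repeatable within one taxonomy.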
That's it for this article. Thanks for reading; I hope you found it useful and that you will be back for the next part.
