Friday, January 08, 2010

Metadata, Taxonomies, Vocabularies - Do What?

Imagine standing in front of a wall of cubbies filled with parchment scrolls, and not being able to find the Tractatus you know exists in there. Or no, make that trying to find the right health insurance claim form on your corporate intranet. Your predicament is as old as civilization itself. And content classification structures, including tools like metadata and taxonomy, were invented thousands of years ago to deal with it.



But let’s say you want to build one of these taxonomy or metadata things yourself. You might first want to figure out what those terms mean – though ironically enough, there are creative differences over how they’re defined and used. Nonetheless, the following offers a high-level description of these key terms.

Taxonomies arrange content objects into relationships, most often parent-child hierarchies. The folder structure on your computer is a taxonomy of files and groupings. A good office supplies website uses a taxonomy of products so you can find and purchase supplies. The government uses taxonomies as well; for instance, the DoD Core Taxonomy is a set of categories for people, processes and technologies to support the fulfillment of Department of Defense missions. Pretty interesting stuff!

Taxonomies are also knowledge maps that enable you to see the shape of the knowledge domain. The tree structure of a site map tells you at a glance how the sections and subsections of a website relate to one another. And if you’ve ever held a copy of Roget’s thesaurus, you’ve had a taxonomy of the English language in your hand.
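To make the tree idea concrete, here is a minimal Python sketch of a taxonomy stored as nested dictionaries, with a helper that returns the path from the root to a given category. The office-supplies categories and the name `find_path` are invented for illustration; a real taxonomy tool would likely store much more than names.

```python
# Hypothetical office-supplies taxonomy: each key is a category,
# each value is a dict of its subcategories (empty dict = leaf).
TAXONOMY = {
    "Office Supplies": {
        "Paper": {"Copy Paper": {}, "Notebooks": {}},
        "Writing": {"Pens": {}, "Pencils": {}},
    }
}


def find_path(tree, target, trail=()):
    """Return the list of categories from the root down to target, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return list(here)
        found = find_path(children, target, here)
        if found is not None:
            return found
    return None


print(" > ".join(find_path(TAXONOMY, "Pens")))
# prints: Office Supplies > Writing > Pens
```

The path is the same "at a glance" view a site map gives you: it shows where an item sits relative to everything above it.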

Metadata are detailed elements that give us a way to categorize and standardize how things are described within a taxonomy. The best-known scheme for metadata is Dublin Core, a set of 15 elements including “title,” “creator,” and “subject.” A metadata specification defines the attributes of each element (e.g. name, definition, comments, and references), but doesn’t dictate its vocabularies or the format in which the elements appear. By separating the metadata scheme from the vocabularies, you give yourself a system that is flexible and technology-independent.
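One way to picture that separation is the rough Python sketch below. The element names are the 15 Dublin Core elements; everything else (the `VOCABULARIES` table, the sample record, and the `validate` helper) is a made-up example, not part of any Dublin Core specification.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical vocabularies, kept separate from the scheme so they can
# change without touching the element definitions.
VOCABULARIES = {
    "subject": {"Health Insurance", "Payroll", "Travel"},
}


def validate(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    for element, value in record.items():
        if element not in DUBLIN_CORE_ELEMENTS:
            problems.append(f"unknown element: {element}")
        elif element in VOCABULARIES and value not in VOCABULARIES[element]:
            problems.append(f"{element}: '{value}' is not in the vocabulary")
    return problems


record = {
    "title": "Health Insurance Claim Form",
    "creator": "Human Resources",
    "subject": "Health Insurance",
}
print(validate(record))  # an empty list means the record passed
```

Because the scheme and the vocabularies live in different tables, you could swap in a new subject list without redefining what "subject" means.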

Vocabularies are lists of words used to categorize content objects. A list of U.S. federal agencies, a list of attributes of wines, and a list of music genres are all examples. If a controlled vocabulary is being used, then every website or database refers to each concept by the same term (or a taxonomy specifies a clear and explicit relationship among the terms being used). So “Department of Veterans Affairs” or “VA” could be terms in a controlled vocabulary, while “Vets Affairs” would (probably) not be. In fact, lists of terms that don’t adhere to an explicit, agreed-upon, and managed classification structure are known as “uncontrolled vocabularies.”
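As a small illustration, a controlled vocabulary can be sketched in Python as a lookup from accepted terms to a preferred label. The mapping and the `resolve` helper are hypothetical, and treating “VA” as a synonym of the full agency name is an assumption made for the example.

```python
# Hypothetical controlled vocabulary: each accepted term maps to the
# preferred label that systems should store.
CONTROLLED_VOCABULARY = {
    "Department of Veterans Affairs": "Department of Veterans Affairs",
    "VA": "Department of Veterans Affairs",
}


def resolve(term):
    """Return the preferred label for an accepted term, or None if uncontrolled."""
    return CONTROLLED_VOCABULARY.get(term.strip())


for term in ["VA", "Department of Veterans Affairs", "Vets Affairs"]:
    print(term, "->", resolve(term))
```

Here “Vets Affairs” resolves to nothing, which is the signal that someone needs to either add it to the managed list or correct the content that used it.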


Taken together, metadata, controlled vocabularies, taxonomies, thesauri, ontologies, and other content classification structures provide a system for making statements about information objects – statements that allow things to be findable, manageable, and even interoperable. Ultimately, it’s all about getting your hands on that damned form. Or finally being able to curl up with a nice, warm Tractatus.
