Creating SKOS Taxonomies of Museum Subject Hierarchies Using CASPAR

Museums typically rely on terminology control, i.e., sets of agreed terms, to consistently describe the artefacts in their collections, specifying objects and concepts depicted or showcased, subjects, materials used, etc. This way, information does not ‘get lost’, and can be efficiently retrieved whenever users wish to have access to it. Such sets of controlled terms are frequently deployed as taxonomies.

Museum Taxonomies as Open Data

It is very common for museums to publicly release their taxonomies, often accompanied by metadata describing (parts of) their artefact collections. This makes it easier for scholars and other interested stakeholders to carry out further research and to develop third-party applications “on top” of the data. However, a frequent drawback is that these resources are released in plain formats such as JSON or CSV, which substantially limits content interoperability. To address this bottleneck, Semantic Web technologies, with ontologies being the most prominent tool, open up new possibilities for (a) establishing semantic interoperability between heterogeneous datasets, and (b) developing intelligent applications powered by the underlying semantics within the datasets.

The Simple Knowledge Organization System

For the representation of taxonomies and subject hierarchies in a well-established Semantic Web-compliant format, the World Wide Web Consortium (W3C), the core international standards organization for the World Wide Web, has developed the Simple Knowledge Organization System (SKOS). Since August 2009, SKOS has been the W3C Recommendation for representing taxonomies and subject hierarchies. It is based on the Resource Description Framework (RDF) standard for data exchange on the Web, which allows metadata to be shared and retrieved across different applications.
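To make the idea concrete, here is a minimal sketch of how a small hierarchy can be expressed as RDF triples using SKOS terms. The namespace http://example.org/vocab# and the labels 'materials', 'metal' and 'bronze' are placeholders chosen for illustration only. The SKOS terms skos:Concept, skos:prefLabel and skos:broader are part of the W3C vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab#"  # placeholder namespace, for illustration only


def concept_triples(local_name, label, broader=None):
    """Return the RDF triples describing one SKOS concept."""
    subject = EX + local_name
    triples = [
        (subject, RDF_TYPE, SKOS + "Concept"),
        (subject, SKOS + "prefLabel", f'"{label}"@en'),
    ]
    if broader is not None:
        # skos:broader points from the more specific concept to its parent
        triples.append((subject, SKOS + "broader", EX + broader))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"  # literals keep their quotes
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Made-up three-level hierarchy: materials > metal > bronze
triples = (
    concept_triples("materials", "materials")
    + concept_triples("metal", "metal", broader="materials")
    + concept_triples("bronze", "bronze", broader="metal")
)

print(to_ntriples(triples))
```

Each concept gets a type and a preferred label, and every non-root concept carries one skos:broader link to its parent. Those links are all that is needed to encode the hierarchy.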

Converting a Taxonomy into a SKOS Vocabulary

For organizations such as museums, converting their existing taxonomies into a SKOS-compliant format is a non-trivial and rather costly task, demanding experience in Semantic Web knowledge representation formats and relevant technologies. To address this challenge, this blog post shows how our in-house CASPAR semantic data integration platform substantially speeds up and simplifies the process.

Use Case: Tate’s Subject Hierarchy

As an illustrative use case, we will generate a SKOS taxonomy from Tate’s subject hierarchy using CASPAR. Tate UK has publicly released its subject hierarchy as a set of JSON files organized in three levels. For example, in this artwork by Robert Blake, the relevant subject headings are listed in the ‘Explore’ section of the artwork’s page. An excerpt from the list of subjects for the specific artwork is ‘religion and belief’ > ‘universal religious imagery’ > ‘blessing’. Another heading relates people with action types and adults. Currently, Tate’s subject hierarchy features 16 Level-0 (i.e., top-level) entries, 142 Level-1 entries, and 2,251 Level-2 entries, providing a rich indexing toolkit in the traditional sense.
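As a rough illustration of what a three-level hierarchy looks like to a program, the sketch below walks a nested JSON structure and records each entry with its level and parent. The field names "id", "name" and "children", the id values and the nesting itself are assumptions made for this example and may not match the layout of Tate's published files. Only the three labels come from the excerpt above.

```python
import json
from collections import Counter

# Hypothetical excerpt: field names and ids are assumptions, not Tate's data.
SAMPLE = """
[
  {"id": 1, "name": "religion and belief", "children": [
    {"id": 2, "name": "universal religious imagery", "children": [
      {"id": 3, "name": "blessing"}
    ]}
  ]}
]
"""


def walk(nodes, level=0, parent=None):
    """Yield (level, name, parent_name) for every entry, depth first."""
    for node in nodes:
        yield level, node["name"], parent
        # Leaf entries may simply omit the "children" key
        yield from walk(node.get("children", []), level + 1, node["name"])


entries = list(walk(json.loads(SAMPLE)))
per_level = Counter(level for level, _, _ in entries)

for level, name, parent in entries:
    print(f"Level-{level}: {name} (parent: {parent})")
print(dict(per_level))
```

Run over the full hierarchy, a per-level count of this kind would be one way to check figures such as the 16, 142 and 2,251 entries quoted above.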

Semantic Data Integration with CASPAR

CASPAR converts the input files into a semantic Knowledge Graph (KG), in this case an RDF SKOS taxonomy, through mappings defined between input data fields and the corresponding ontology concepts. The mappings are specified in a proprietary Domain-Specific Language (DSL) based on JSON syntax. This public GitHub repository contains all the relevant resources, including the generated SKOS taxonomy and the mappings for converting Tate’s JSON files into a taxonomy via CASPAR. The taxonomy’s namespace (permanent URI) is http://w3id.org/tate-skos#.

[Figure: snapshot of the taxonomy generated through SKOS Play]
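The general idea of mapping input fields to ontology concepts can be sketched in a few lines of plain Python. This is not CASPAR's DSL, whose syntax is not shown in this post. The MAPPING dictionary, the record fields "id", "name" and "parent_id", and the c<id> naming scheme for concept URIs are all hypothetical. Only the namespace http://w3id.org/tate-skos# and the three labels are taken from the text above.

```python
# Hypothetical mapping: input field -> (SKOS property, kind of value).
# This stands in for the idea of a mapping; it is NOT CASPAR's DSL.
MAPPING = {
    "name": ("skos:prefLabel", "literal"),
    "parent_id": ("skos:broader", "concept"),
}

HEADER = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix tate: <http://w3id.org/tate-skos#> .\n"
)


def record_to_turtle(record, mapping=MAPPING):
    """Render one flat input record as a Turtle description of a skos:Concept."""
    statements = ["a skos:Concept"]
    for field, (prop, kind) in mapping.items():
        value = record.get(field)
        if value is None:
            continue  # e.g. top-level entries have no parent
        if kind == "literal":
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            statements.append(f'{prop} "{escaped}"@en')
        else:
            # Assumed URI scheme: tate:c<id>; the real taxonomy may differ
            statements.append(f"{prop} tate:c{value}")
    return f"tate:c{record['id']} " + " ;\n    ".join(statements) + " ."


# Made-up flat records carrying the labels from the Tate excerpt
records = [
    {"id": 1, "name": "religion and belief", "parent_id": None},
    {"id": 2, "name": "universal religious imagery", "parent_id": 1},
    {"id": 3, "name": "blessing", "parent_id": 2},
]

print(HEADER)
print("\n\n".join(record_to_turtle(r) for r in records))
```

Keeping the mapping as data rather than code means a different input format only needs a different mapping table, which is the property a declarative mapping language aims for.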

Sample SPARQL Queries

Below is a list of indicative SPARQL queries for retrieving information from the generated taxonomy:
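As one hedged illustration of such a query, the sketch below holds a SPARQL 1.1 query as a string and pairs it with a plain-Python equivalent over a tiny in-memory triple list, to show what the query is expected to return. The query asks for the labels of all concepts below 'religion and belief'. The identifiers tate:c1 to tate:c3 and the sample triples are hypothetical, and the real taxonomy may model labels differently.

```python
# SPARQL 1.1 query: labels of every concept below a given top-level subject.
# "skos:broader+" is a property path following one or more broader links.
NARROWER_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  ?parent skos:prefLabel ?parentLabel .
  FILTER(STR(?parentLabel) = "religion and belief")
  ?child skos:broader+ ?parent ;
         skos:prefLabel ?label .
}
"""

BROADER = "skos:broader"
PREF_LABEL = "skos:prefLabel"

# Hypothetical sample data; identifiers are made up for this example.
TRIPLES = [
    ("tate:c1", PREF_LABEL, "religion and belief"),
    ("tate:c2", PREF_LABEL, "universal religious imagery"),
    ("tate:c3", PREF_LABEL, "blessing"),
    ("tate:c2", BROADER, "tate:c1"),
    ("tate:c3", BROADER, "tate:c2"),
]


def descendants_of(label, triples):
    """Labels of concepts that reach `label` via one or more skos:broader links."""
    labels = {s: o for s, p, o in triples if p == PREF_LABEL}
    frontier = {s for s, text in labels.items() if text == label}
    found = set()
    while frontier:
        # Step one level down: concepts whose broader concept is in the frontier
        frontier = {s for s, p, o in triples if p == BROADER and o in frontier} - found
        found |= frontier
    return sorted(labels[s] for s in found if s in labels)


print(descendants_of("religion and belief", TRIPLES))
```

The query string can be sent unchanged to any SPARQL 1.1 endpoint or library that has loaded the taxonomy. The Python function is only there to make the expected result visible without such a setup.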

Deploying CASPAR in Other Use Cases

CASPAR is domain-agnostic and can be deployed in virtually any domain. If you would like to know more, or if you wish to deploy CASPAR for your use case, please feel free to contact us. We will be more than happy to discuss your specific needs in depth!

Stratos Kontopoulos