The KNOT Data Model - Controlled Vocabularies
KNOT-DM makes use of various authority records and controlled vocabularies for descriptions of key concepts and technical details.
KNOT-DM makes use of external authority records and controlled vocabularies drawn from the requirements of DCAT as well as those of the project as a whole. These are listed below. In addition, two controlled vocabularies have been created specifically for the project, detailed in the next section. See the Relationships section of the Module page for details on how concepts from these vocabularies and authorities can be attached to the classes representing activities, entities, agents, and locations.
- The Licenze controlled vocabulary from AgID is used to specify licenses (note, however, that it lacks some key license types, such as the GNU licenses).
- The Vocabolario delle Aree CUN, dei Macrosettori, dei Settori Concorsuali e dei Settori Scientifico-Disciplinari delle Università Italiane from the OntoPia network is used to specify academic disciplines. It's recommended to use concepts at the Macro Sector level, especially for interdisciplinary projects, unless enough information is available to identify the relevant Scientific Disciplinary Sector.
- Geonames is used to specify geographical locations.
- Wikidata is used to specify geographical locations, temporal periods, and people (lowest level of priority if another vocabulary or authority record in the list can specify the necessary information).
- VIAF is used to specify people.
- TaDiRAH, the Taxonomy of Digital Research Activities in the Humanities, is used to specify the types of research activities. It's recommended to use concepts from the first two levels of the taxonomy unless a more specific concept is more accurate (for example encoding), and to limit the selection to three to avoid overuse. TaDiRAH can be used to specify both the activities involved in a research project, for example the fact that contributors programmed something, and the activities available to end users via data services, for example the ability to manipulate data.
- The Data Theme, File Type, Access Right, Language, and Frequency authority tables from the EU are used to specify various technical aspects of DCAT classes.
- The ADMS Status vocabulary is used to specify the state of a dataset’s distribution.
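The priority rule for Wikidata in the list above can be read as an ordered lookup. The following is a minimal Python sketch of that reading, not part of KNOT-DM itself: the `AUTHORITIES` table, the category keys, and the `preferred_authority` helper are hypothetical names introduced for illustration.

```python
# Hypothetical lookup table (assumption): authorities per kind of entity,
# listed in priority order. Wikidata comes last because the list above
# gives it the lowest priority.
AUTHORITIES = {
    "location": ["Geonames", "Wikidata"],
    "person": ["VIAF", "Wikidata"],
    "period": ["Wikidata"],
}


def preferred_authority(kind, available):
    """Return the highest-priority authority that holds a record.

    `available` is the set of authorities in which a matching record
    was found. Returns None when no listed authority covers the entity.
    """
    for authority in AUTHORITIES.get(kind, []):
        if authority in available:
            return authority
    return None


print(preferred_authority("person", {"Wikidata", "VIAF"}))  # VIAF
print(preferred_authority("location", {"Wikidata"}))        # Wikidata
```

Keeping the order in one table means the fallback to Wikidata is applied the same way for every kind of entity.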
The KNOT Controlled Vocabularies
The KNOT Controlled Vocabularies were developed to answer specific needs of the data model which were not appropriately covered by existing vocabularies, namely the need to classify the types of digital scholarly objects KNOT documents and the need to have an authority control for technologies used by these objects. This resulted in the creation of the KNOT Taxonomy and KNOT Technology Thesaurus, two SKOS-based controlled vocabularies that are part of the KNOT-DM.
Both vocabularies were developed in an iterative process to refine their hierarchical structures and scope. The Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works published by the Getty Research Institute was used as a reference to help define which types of vocabularies to use while the SKOS standard from the W3C was chosen to represent these vocabularies in RDF as it "provides a standard, low-cost migration path for porting existing knowledge organization systems to the Semantic Web" [1]. Existing vocabularies specific to the field of Digital Humanities were also used as reference including TaDiRAH, PARTHENOS Vocabularies and the DHA Taxonomy, with the latter two originally considered as solutions to the needs of the project but found to not be precise or extensive enough.
Taxonomy was chosen to answer the needs for an “orderly classification” of the digital scholarly objects investigated by the project with a simple hierarchical structure through which to arrange the different types and detail their relationships [2]. The thesaurus meanwhile was chosen to document the technologies used in research projects and objects as it is a more complex type of controlled vocabulary that reflects a “semantic network of unique concepts” with a multiplicity of relationships and the necessary controls “recommended for use as authorities in databases relating to art and cultural heritage.” [2]
Figure 2 details the SKOS structures used for both vocabularies. The taxonomy only employs skos:ConceptScheme and skos:Concept, in keeping with its simpler hierarchical structure, while the Thesaurus also makes use of skos:Collection to enable a deeper structure. skos:related is used to create non-hierarchical connections between concepts within the same vocabulary, while skos:closeMatch and skos:relatedMatch are used to connect the concepts to others in external vocabularies, primarily the Getty Art & Architecture Thesaurus, the Library of Congress Subject Headings, the DHA Taxonomy, and the PARTHENOS Vocabularies.
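The difference between the two structures can be illustrated with a small Python sketch that models each vocabulary as a list of subject-predicate-object triples. This is an illustration only, under stated assumptions: the `https://example.org/` URIs are placeholders rather than the real KNOT identifiers, the placement of JavaScript in the Formal Languages collection is assumed, and the `concept` and `skos_classes` helpers are hypothetical.

```python
EX = "https://example.org/knot/"  # placeholder namespace (assumption)


def concept(name, scheme):
    """Triples declaring a skos:Concept that belongs to a concept scheme."""
    return [
        (EX + name, "rdf:type", "skos:Concept"),
        (EX + name, "skos:inScheme", EX + scheme),
    ]


# Taxonomy: only skos:ConceptScheme and skos:Concept are used.
taxonomy = (
    [(EX + "taxonomy", "rdf:type", "skos:ConceptScheme")]
    + concept("digital-edition", "taxonomy")
    + concept("semantic-digital-edition", "taxonomy")
    + concept("database", "taxonomy")
    + concept("knowledge-base", "taxonomy")
    + [
        # Hierarchical link inside the vocabulary.
        (EX + "semantic-digital-edition", "skos:broader", EX + "digital-edition"),
        # Non-hierarchical link inside the vocabulary.
        (EX + "knowledge-base", "skos:related", EX + "database"),
    ]
)

# Thesaurus: skos:Collection adds a grouping layer on top of the concepts.
thesaurus = (
    [
        (EX + "thesaurus", "rdf:type", "skos:ConceptScheme"),
        (EX + "formal-languages", "rdf:type", "skos:Collection"),
    ]
    + concept("javascript", "thesaurus")
    + [
        (EX + "formal-languages", "skos:member", EX + "javascript"),
        # Link to an external vocabulary (placeholder target URI).
        (EX + "javascript", "skos:closeMatch", "https://example.org/external/javascript"),
    ]
)


def skos_classes(triples):
    """Return the set of classes that appear as rdf:type objects."""
    return {obj for (_, pred, obj) in triples if pred == "rdf:type"}


print(sorted(skos_classes(taxonomy)))
# ['skos:Concept', 'skos:ConceptScheme']
print(sorted(skos_classes(thesaurus)))
# ['skos:Collection', 'skos:Concept', 'skos:ConceptScheme']
```

In practice the vocabularies are published as SKOS in RDF; plain tuples are used here only to keep the sketch free of external dependencies.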
Both the KNOT Technology Thesaurus and Taxonomy are intended to be “living, growing tools” [2] that will be updated throughout the duration of the KNOT project. During their development a list of editorial guidelines was created (see below), which is intended as the first point of reference for any update to either vocabulary.
The KNOT vocabularies are available directly from our GitHub in both SKOS format and with additional DCAT information, similarly to the SKOS vocabularies available from Schema, the Italian national catalogue of semantic data.
The KNOT Taxonomy
The list of concepts included in the KNOT Taxonomy reflects the most common types of digital scholarly objects encountered during the design phase of the KNOT-DM, based on a census of academic research projects from Italian universities focused on the Humanities.
The taxonomy is not intended as a complete or accurate picture of the landscape of digital scholarly objects produced by academic research in Italy but rather as a starting point from which to begin addressing this complex topic with a view to incorporate the learnings into the guidelines that the KNOT project will produce. While there have been efforts within the international Digital Humanities community to create useful taxonomies of research methods, tools, and activities, notably the DiRT Directory of digital research tools [3] and TaDiRAH, less focus appears to have been given to the objects such methods, tools, and activities might produce. This makes classification of digital scholarly objects a particularly interesting and useful line of investigation for the project.
The concepts for the taxonomy were chosen and defined based on existing literature, institutional definitions, and the classifications used by the research projects themselves, in order to account for both practical and ideal usage. A concept like Database therefore reflects both its technical definition and its use by many research projects to describe a collection of resources and their metadata. A concept like Digital Archive, by contrast, is in practice almost always used for a collection of items from different entities unified thematically, which differs from the traditional definition of a collection of records originating from one person or organization and created in the course of their activity.
As the KNOT project progresses the intention is to further refine the Taxonomy to include relevant concepts as well as find ways to reconcile practical and ideal uses.
Figure 3 details the KNOT Taxonomy structure and concepts.
The KNOT Technology Thesaurus
Concepts for the KNOT Technology Thesaurus were drawn from the technologies found in the same census data used for the Taxonomy, with particular attention to concepts related to the technology stack of many web-based applications. This was due to existing thesauri often lacking in concepts relating to this particular point of interest in our project. Concepts were then further expanded to also include research tasks, services, and tools and to therefore have the thesaurus be a more encompassing reference point for various relevant technological aspects of digital scholarly objects.
From this the decision was taken to use skos:Collection to help create a deeper hierarchy as well as groupings for concepts that share something in common, beginning with five collections: Computer & Information Sciences (representing research tasks and techniques), Data Storage, Formal Languages (representing programming and other technological languages), Information Technology Architecture (representing the IT stack), and Services & Tools (representing things that research projects either create or make available). As with the Taxonomy, these collections are not intended to be perfect or exact but rather reflect the needs of the project in its first year and act as a starting point from which to think about the specific issues of detailing the technologies that make digital scholarly objects examples of digital cultural heritage.
Unlike the taxonomy, definitions for concepts in the thesaurus were kept as simple and factual as possible and wherever possible taken from trusted sources such as the Springer Handbook series or existing vocabularies such as the Getty AAT and LoC Subject Headings.
KNOT Controlled Vocabularies Editorial Guidelines
The editorial guidelines are intended as the first point of reference when updating the KNOT vocabularies. Updates to the vocabularies can be suggested by anyone but only implemented by a member of the KNOT project team.
- Required fields for a new concept are: skos:prefLabel and skos:altLabel, skos:definition, dcterms:source, and skos:closeMatch or skos:relatedMatch. Structural elements, such as skos:narrower, skos:broader, and skos:related should be included based on where the concept will be placed in the hierarchy.
- The placement of a new concept should first be considered within the existing hierarchy, using skos:narrower or skos:topConceptOf. Only if no existing skos:Collection is sufficient to accommodate this new concept should an update to the existing hierarchy be considered, starting with the creation of a new collection.
- skos:prefLabel should always favor the most common term in use for the concept that is not an abbreviation. skos:altLabel should always include abbreviations as well as other terms that may be more common within the fields the term is used in (jargon).
- Do not repeat shared elements of concepts in their definition but rather include them in the skos:ConceptScheme definition. This applies primarily to elements that concepts in the Taxonomy all have in common such as the use of digital-born and/or digitized objects or the fact that the research project’s question is a common theme for the selection of data.
- For something to qualify as a service it should be reusable and accessible.
- skos:definition can differ from accepted definitions of the concept to reflect the point of view of the project but should do so based on evidence. Definitions should also where possible make clear what affordances the concept offers users.
- Types of affordances include knowledge production and/or extraction, searching/findability, collaboration/participation, interactivity, analysis, multimodality, intertextuality.
- In the thesaurus concepts are described as simply as possible referencing the type of technology (e.g. software), relevant adjectives (e.g. open source), relevant used technology (e.g. programming languages) and well-known use cases (e.g. a LAMP stack). skos:related can be used to create relevant internal links such as between a piece of software and the programming language used to create it.
- In the taxonomy skos:related is used to create internal links between concepts that could also be understood as skos:narrower in the real world. This is because it is preferred for concepts in the taxonomy to be skos:topConceptOf unless it is absolutely clear that it is a more specific version of an existing concept. For example, a Semantic Digital Edition is a narrower type of Digital Edition but a Knowledge Base is related to a Database, rather than being in a hierarchical relationship with it.
- skos:closeMatch is used when an external vocabulary includes a near-identical concept (based on both the title of the concept and its definition, for example an existing concept for Javascript), while skos:relatedMatch is used for related concepts (for example the concept of Semantic Web is related to the concept of Linked Data).
- Terms are singular.
- English and Italian are the primary languages for both vocabularies.
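The required-fields guideline lends itself to an automated check before a new concept is accepted. The following Python sketch shows one way this might look; it is not an existing KNOT tool. It assumes a concept is drafted as a plain dictionary keyed by property name, reads "skos:closeMatch or skos:relatedMatch" as "at least one of the two", and uses a hypothetical helper name (`missing_fields`) and placeholder values.

```python
# Fields every new concept must carry, per the first guideline.
REQUIRED = ("skos:prefLabel", "skos:altLabel", "skos:definition", "dcterms:source")
# At least one of these external links is required (assumed reading).
MATCHES = ("skos:closeMatch", "skos:relatedMatch")


def missing_fields(concept):
    """Return the required fields that are absent or empty in a concept dict."""
    missing = [field for field in REQUIRED if not concept.get(field)]
    if not any(concept.get(match) for match in MATCHES):
        missing.append("skos:closeMatch or skos:relatedMatch")
    return missing


# Hypothetical draft concept; all values are placeholders.
draft = {
    "skos:prefLabel": {"en": "JavaScript"},
    "skos:altLabel": {"en": ["JS"]},
    "skos:definition": {"en": "Placeholder definition."},
    "dcterms:source": "https://example.org/source",
}

print(missing_fields(draft))
# ['skos:closeMatch or skos:relatedMatch']

draft["skos:closeMatch"] = ["https://example.org/external/javascript"]
print(missing_fields(draft))
# []
```

Structural properties such as skos:broader are deliberately left out of the check, since the guidelines make them depend on where the concept sits in the hierarchy.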
References
[1] “SKOS Simple Knowledge Organization System Reference.” n.d. www.w3.org. Accessed July 11, 2023. https://www.w3.org/TR/skos-reference/.
[2] Harpring, Patricia. 2010. Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works. Santa Monica, CA: Getty Research Institute.
[3] Perkins, Jody, Quinn Dombrowski, Luise Borek, and Christof Schöch. “Project Report: Building Bridges to the Future of a Distributed Network: From DiRT Categories to TaDiRAH, a Methods Taxonomy for Digital Humanities.” In Proceedings of the 2014 International Conference on Dublin Core and Metadata Applications, 181–83. DCMI’14. Austin, Texas: Dublin Core Metadata Initiative, 2014.