The KNOT Pilot
KNOT is a three-year pilot (2023-2025) tasked with investigating ways to integrate the digital cultural heritage of Italian universities into the national infrastructure being built by the Central Institute for the Digitization of Cultural Heritage (ICDP) - Digital Library.
Given the lack of consensus within and across universities as to what constitutes their digital cultural heritage (DCH), the KNOT project focuses its investigation on the digital scholarly objects created by academic research in the Humanities, and in particular the Digital Humanities, as an interesting yet unexplored example of existing DCH held by Italian universities. Our approach draws on the definitions of DCH provided by both UNESCO [1] and the ICDP [2], as well as on insights from the heritage sector that treat digital objects as a form that goes beyond mere data and contains multiple agencies and spatio-temporal distributions [3]. As such, the objects we are interested in comprise collections of information (such as datasets) as well as digital forms that enable interaction with information, from software and data services such as search interfaces or APIs to visualization and annotation tools. This allows us to move past the scientific context in which these objects are most commonly understood and to look at what heritage value might be found in their “reservoir of meaning” [3], such as the activity that produced them, their relationships to the information they encode, the new contexts they create for this information, and the ways in which they can foster the acquisition of new knowledge from this combination of context and information.
The primary expected outputs of the pilot are a web application showcasing early adoption of the new national infrastructure (including data integration and services) and a set of guidelines for the collection, management, enrichment, and reuse of these digital scholarly objects.
The KNOT project is divided into three annual phases, each with its own focus:
- The identification and collection of heterogeneous data, used to create a transversal data model and a knowledge graph (year 1).
- The development of a web application to allow the exploration and use of the knowledge graph and collected data (year 2).
- The development of a suite of services to allow the acquisition of new knowledge from the hybridization of data (year 3).
Below is a more detailed overview of the activities and outputs expected within each phase of the project.
Year 1 (2023)
The first year was focused on data, with the following goals:
- Collection, classification, and categorization of heterogeneous and representative data for inclusion and use in the pilot.
- Elaboration of a conceptual and metadata model to describe the selected data and reflect the central argument of the project. This model is to be informed by relevant existing approaches and project requirements and should enable a network of semantic links, including through reconciliation with authority records and controlled vocabularies.
- Definition of competency questions against which the conceptual model can be tested to ensure its ability to facilitate the discovery of latent knowledge.
- Creation of a knowledge graph, available via a SPARQL endpoint, representing digital scholarly objects as examples of the DCH of Italian universities, together with a workflow for the collection, normalization, cleaning, and transformation of data into RDF.
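The normalization, cleaning, and RDF transformation steps listed above, and the idea of testing a model against a competency question, can be illustrated with a minimal sketch. It is not the KNOT workflow or ontology: the `https://example.org/knot/` namespace, the `heldBy` property, the field names, and the two sample records are all hypothetical and invented for illustration. The sketch uses only the Python standard library and serializes the result as N-Triples.

```python
import re
import unicodedata

# Hypothetical namespace and property, for illustration only;
# these are NOT the KNOT ontology.
BASE = "https://example.org/knot/"
HELD_BY = BASE + "heldBy"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def slugify(text):
    """Normalize a label into a lowercase ASCII slug usable in an IRI."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def clean(record):
    """Collapse runs of whitespace in every field and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        value = " ".join(value.split())
        if value:
            cleaned[key] = value
    return cleaned


def to_triples(record):
    """Turn one cleaned record into (subject, predicate, object, is_literal) tuples."""
    subject = BASE + "object/" + slugify(record["title"])
    triples = [
        (subject, RDF_TYPE, BASE + "type/" + slugify(record["type"]), False),
        (subject, DCT_TITLE, record["title"], True),
    ]
    if "university" in record:
        holder = BASE + "org/" + slugify(record["university"])
        triples.append((subject, HELD_BY, holder, False))
    return triples


def to_ntriples(triples):
    """Serialize tuples as N-Triples lines (literals here contain no newlines)."""
    lines = []
    for s, p, o, is_literal in triples:
        if is_literal:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def objects_of_type(triples, type_iri):
    """Toy competency question: which objects have the given type?"""
    return sorted({s for s, p, o, _ in triples if p == RDF_TYPE and o == type_iri})


# Invented sample records standing in for collected source data.
records = [
    {"title": "  Example   Digital Edition ", "type": "Dataset",
     "university": "Università di Esempio"},
    {"title": "Example Annotation Tool", "type": "Software", "university": ""},
]

graph = [triple for record in records for triple in to_triples(clean(record))]
print(to_ntriples(graph))
print(objects_of_type(graph, BASE + "type/dataset"))
```

In a real setting the competency question would more plausibly be written as a SPARQL query and run against the endpoint; the in-memory filter here only shows the principle that each question should be answerable from the triples the model produces.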
The first year also included the following objectives, which will continue in subsequent phases:
- Research into techniques for the extraction of knowledge from heterogeneous data. This research is conducted alongside the DISI department and as part of a working group with CINECA, aligned with the goals and needs of the Digital Library infrastructure.
- Interviews with Italian academics to explore the challenges they face in the creation and management of digital scholarly objects born of Humanities research.
The outputs for the first year were:
- A small census of sources and heterogeneous data (digital scholarly objects and their related projects), used to develop the data model and knowledge graph.
- A data model to describe digital scholarly objects as DCH, including an ontology and controlled vocabularies in RDF.
- A small-scale catalogue, using the census, powered by a knowledge graph based on the data model that can act as a first version of the final web application.
Year 2 (2024)
The second year is focused on the application, with the following goals:
- Integration of data and metadata into the I.PaC platform.
- Evaluation of potential services to extract new knowledge from the data as well as reconcile information with existing institutional repositories.
- Further development of the catalogue.
- Creation of a working document for the guidelines expected at the end of the project, drawing on early insights acquired during the data model development phase, such as issues around visibility, categorization, and classification.
Year 3 (2025)
The third year is focused on services, with the following goals:
- Definition of a suite of user services available through the application, diverse in their offerings and potential and focused on the acquisition of new knowledge.
- Integration of services and application into the national infrastructure.
References
[1] “Records of the General Conference, 32nd Session, Paris, 29 September to 17 October 2003, v. 1: Resolutions.” UNESCO, 2004. https://unesdoc.unesco.org/ark:/48223/pf0000133171.
[2] Docs Italia. “01_Piano nazionale di digitalizzazione del patrimonio culturale | Piano nazionale di digitalizzazione del patrimonio culturale.” Accessed January 28, 2024. https://docs.italia.it/italia/icdp/icdp-pnd-docs/it/v1.1-febbraio-2023/index.html.
[3] Cameron, Fiona. The Future of Digital Data, Heritage and Curation in a More-than-Human World. London; New York: Routledge/Taylor & Francis Group, 2021.