Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions

Lucia Giagnolini; Andrea Schimmenti; Paolo Bonora; Francesca Tomasi

doi:10.6092/issn.2532-8816/21229

Umanistica Digitale (Jul 2025)

Affiliations

Lucia Giagnolini: Università di Bologna
Andrea Schimmenti: Università di Bologna
Paolo Bonora: Università di Bologna
Francesca Tomasi: Università di Bologna

DOI: https://doi.org/10.6092/issn.2532-8816/21229
Journal volume & issue: no. 20
pp. 115 – 144

Abstract

Archival finding aids often express only part of the informational potential of their data, because descriptions of documentary collections contain numerous unstructured fields. The prevalence of extensive literal sections, or full-text fields, limits both semantic querying and the ability to uncover the latent contexts embedded in such text. This study proposes a methodology for automatic Knowledge Extraction (KE) from archival descriptions, aiming to improve their structuring and semantic interoperability. In a case study based on the Italian National Archival System (SAN), we used ready-to-use tools such as TINT, FRED, and GPT-4o to conduct a preliminary evaluation of various morphosyntactic, lexical, and semantic analysis techniques. The most promising results came from Large Language Models (LLMs), which led to the development of a KE pipeline based on the open-source model Llama 3.3. The findings show a high capacity for extracting biographical events and relationships, with a good balance between precision and recall, supporting the validity of the approach. However, they also reveal the need for a more robust software architecture: LLM-based pipelines must become truly scalable before they can be integrated effectively into archival systems.

Published in Umanistica Digitale

ISSN: 2532-8816 (Online)
Publisher: University of Bologna
Country of publisher: Italy
LCC subjects: General Works: History of scholarship and learning. The humanities
Website: http://umanisticadigitale.unibo.it/
