Umanistica Digitale (Jul 2025)

Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions

  • Lucia Giagnolini,
  • Andrea Schimmenti,
  • Paolo Bonora,
  • Francesca Tomasi

DOI
https://doi.org/10.6092/issn.2532-8816/21229
Journal volume & issue
no. 20
pp. 115 – 144

Abstract

Archival finding aids often express only part of the informational potential of their data, because the descriptions of documentary collections contain numerous unstructured fields. The prevalence of extensive literal sections, or full-text fields, limits both the possibility of semantic queries and the ability to uncover the latent contexts embedded in such unstructured text. This study proposes a methodology for automatic Knowledge Extraction (KE) from archival descriptions, aiming to improve their structuring and semantic interoperability. Through a case study based on the Italian National Archival System (SAN), and using ready-to-use tools such as TINT, FRED, and GPT-4o, we conducted a preliminary evaluation of various morphosyntactic, lexical, and semantic analysis techniques. The most promising results highlighted the potential of Large Language Models (LLMs) and led to the development of a KE pipeline based on the open-source model Llama 3.3. The findings demonstrate a high capacity for extracting biographical events and relationships, with a good balance between precision and recall, which supports the validity of the approach. However, the results also point to the need for a more robust software architecture: LLM-based pipelines must become truly scalable before they can be integrated effectively into archival systems.

Keywords