Filolog (Jun 2025)
IT-SR-NER: SERBIAN-ITALIAN PARALLEL CORPUS FOR LEARNING SERBIAN AS A FOREIGN LANGUAGE
Abstract
This paper explores the applications of the It-Sr-NER parallel corpus in foreign language teaching. The corpus was developed in 2022 by researchers from the University of Turin and the Society for Linguistic Resources and Technologies (JeRTeh) in Belgrade, as part of the "Bridging gaps" project within the CLARIN infrastructure, and is the first parallel corpus for the Italian-Serbian language pair. It contains 10,000 sentences extracted from classical and modern Italian and Serbian literature, aligned to facilitate data exploration and the translation of words in context. The corpus is annotated for Named Entity Recognition (NER), with special attention to toponyms (including pluralia tantum) and personal names. The study addresses three key aspects: the difficulties that Serbian's rich morphology poses for beginners in recognising named entities and linking them to their lemmas; the utility of the corpus for intermediate and advanced learners studying lexical gaps (words without direct translation equivalents); and the potential of NER-annotated parallel corpora as an alternative to traditional bilingual dictionaries and digital tools. The paper shows how this type of corpus can serve both as an effective alternative to conventional resources and as a unique tool for specific types of linguistic research.
Keywords