IEEE Access (Jan 2025)

An Efficient Methodology for the Categorization of Software Requirements Using Natural Language Processing and Similarity Analysis

  • Rahat Izhar,
  • Kenneth Cosh,
  • Lachana Ramingwon,
  • Sakgasit Ramingwong,
  • Shahid N. Bhatti

DOI
https://doi.org/10.1109/access.2025.3568504
Journal volume & issue
Vol. 13
pp. 83591 – 83606

Abstract


Classifying software requirements into functional requirements (FRs) and non-functional requirements (NFRs) is indispensable for implementing software systems effectively. Traditionally, this task has relied on manual effort, a process that has proven both time-consuming and prone to errors and inaccuracies. Recent advances in machine learning (ML) offer promising avenues for automation, improving both the efficiency and the accuracy of requirement classification. This study proposes "NLPReqClassifier," a lightweight automated classification model that combines Term Frequency-Inverse Document Frequency (TF-IDF) weighting with cosine similarity. Built on the PROMISE dataset, which is continually enriched by academic and professional contributions, the model addresses shortcomings of traditional classification methods. It uses a dual-evaluation approach to ensure relevance and precision across varied software development contexts. By incorporating iterative feedback into the model's training process, the research meets academic standards as well as the practical demands of industry. The model was tested on both an academic dataset and a real-world industry dataset, with a particular focus on its application to Enterprise Resource Planning (ERP) systems. It accurately categorized a broad spectrum of software requirements and outperformed existing traditional methods in adaptability and efficiency, showing significant improvements in particular in its ability to adapt dynamically to new and evolving requirements. The dual evaluation confirmed the model's effectiveness, with high precision and recall in both controlled and practical environments.
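The abstract names the two building blocks (TF-IDF weighting and cosine similarity) but not how they are combined. The following is a minimal sketch of one common way to combine them, offered as an illustration only and not as the authors' NLPReqClassifier. It assumes a simple lowercase alphabetic tokenizer, raw term counts multiplied by a smoothed IDF (the same form as scikit-learn's default `smooth_idf`), and a 1-nearest-neighbour decision rule that assigns a new requirement the label of its most cosine-similar training requirement. The function names (`fit_idf`, `tfidf`, `classify`) and the `TRAINING` examples are hypothetical; the example requirements are invented and are not drawn from the PROMISE dataset.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens (deliberately simple)."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1 for every term seen in training."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector: raw term count times IDF; unseen terms are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(requirement, labelled, idf):
    """Return (label, score) of the most cosine-similar labelled requirement."""
    query = tfidf(requirement, idf)
    best_label, best_score = None, -1.0
    for text, label in labelled:
        score = cosine(query, tfidf(text, idf))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical, hand-written training requirements (not PROMISE data).
TRAINING = [
    ("The system shall allow the user to create a purchase order", "FR"),
    ("The system shall generate a monthly sales report", "FR"),
    ("The system shall respond to any search query within 2 seconds", "NFR"),
    ("The system shall be available 99.9 percent of the time", "NFR"),
    ("Only authorized users shall be able to access payroll data", "NFR"),
]

idf = fit_idf([text for text, _ in TRAINING])
label, score = classify(
    "The system shall allow the manager to approve a purchase order", TRAINING, idf
)
print(label)  # FR
```

Nearest-neighbour matching keeps the sketch short. A centroid per class (the mean TF-IDF vector of all FRs versus all NFRs) is an equally plausible reading of "TF-IDF plus cosine similarity," and the abstract does not say which rule, preprocessing, or weighting variant the paper actually uses.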

Keywords