Adding Data Quality to Federated Learning Performance Improvement

Ernesto Gurgel Valente Neto; Solon Alves Peixoto; Valderi Reis Quietinho Leithardt; Juan Francisco de Paz Santana; Julio C. S. Dos Anjos

doi:10.1109/access.2025.3578301

IEEE Access (Jan 2025)

Affiliations

Ernesto Gurgel Valente Neto: PPGETI, Federal University of Ceará, Fortaleza, Brazil
Solon Alves Peixoto: Department of Data Science, Federal University of Ceará Campus Itapajé, Itapajé, Brazil
Valderi Reis Quietinho Leithardt: Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, Lisboa, Portugal
Juan Francisco de Paz Santana: Expert Systems and Applications Laboratory, University of Salamanca, Salamanca, Spain
Julio C. S. Dos Anjos: Department of Data Science, Federal University of Ceará Campus Itapajé, Graduate Program in Teleinformatics Engineering (PPGETI/UFC) Technological Center Campus of Pici, Ceará, Fortaleza, Brazil

DOI: https://doi.org/10.1109/access.2025.3578301
Journal volume & issue: Vol. 13, pp. 126623–126648

Abstract

Massive data generation from Internet of Things (IoT) devices increases the demand for efficient data analysis to extract relevant and actionable insights. Federated Learning (FL) allows IoT devices to collaborate in training Artificial Intelligence (AI) models while preserving data privacy. However, selecting high-quality data for training remains a critical challenge in FL environments where data are not independent and identically distributed (non-IID). Poor-quality data introduces errors, delays convergence, and increases computational costs. To address these challenges, this study develops a data quality analysis algorithm for both FL and centralized environments. The proposed algorithm reduces computational costs, eliminates unnecessary data processing, and accelerates the convergence of AI models. The experiments used the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, and performance was evaluated with standard metrics from the literature, including accuracy, recall, F1 score, and precision. Results show a maximum observed execution time reduction of 56.49%, with an accuracy loss of approximately 0.50%.
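As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1 score could be computed for a multi-class classifier. It is not the authors' evaluation code: the function names, the choice of macro averaging, and the definition of execution time reduction as a percentage of the baseline run time are all assumptions made here for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme the paper uses.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def time_reduction_percent(baseline_seconds, filtered_seconds):
    """Execution time saved, as a percentage of the baseline run time
    (hypothetical helper; this definition is an assumption)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - filtered_seconds) / baseline_seconds
```

Under these assumptions, a run that takes 43.51 s after data filtering against a 100 s baseline would correspond to a reduction of about 56.49%; these timings are made-up numbers chosen only to show the arithmetic.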

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
