IEEE Access (Jan 2025)

Adding Data Quality to Federated Learning Performance Improvement

  • Ernesto Gurgel Valente Neto,
  • Solon Alves Peixoto,
  • Valderi Reis Quietinho Leithardt,
  • Juan Francisco de Paz Santana,
  • Julio C. S. Dos Anjos

DOI
https://doi.org/10.1109/access.2025.3578301
Journal volume & issue
Vol. 13
pp. 126623 – 126648

Abstract

Massive data generation from Internet of Things (IoT) devices increases the demand for efficient data analysis to extract relevant and actionable insights. Federated Learning (FL) addresses this demand by allowing IoT devices to collaboratively train Artificial Intelligence (AI) models while preserving data privacy. However, selecting high-quality data for training remains a critical challenge in FL environments with non-independent and identically distributed (non-iid) data. Poor-quality data introduces errors, delays convergence, and increases computational costs. To address these challenges, this study develops a data quality analysis algorithm for both FL and centralized environments. The proposed algorithm reduces computational costs, eliminates unnecessary data processing, and accelerates the convergence of AI models. The experiments used the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, and performance was evaluated with standard metrics from the literature: accuracy, recall, F1 score, and precision. Results show a maximum observed execution time reduction of 56.49%, with an accuracy loss of approximately 0.50%.

Keywords