IEEE Access (Jan 2025)
Adding Data Quality to Federated Learning Performance Improvement
Abstract
Massive data generation from Internet of Things (IoT) devices increases the demand for efficient data analysis to extract relevant and actionable insights. Federated Learning (FL) addresses this demand by allowing IoT devices to collaboratively train Artificial Intelligence (AI) models while preserving data privacy. However, selecting high-quality data for training remains a critical challenge in FL environments whose data are not independent and identically distributed (non-iid). Poor-quality data introduces errors, delays convergence, and increases computational costs. To address these challenges, this study develops a data quality analysis algorithm for both FL and centralized environments. The proposed algorithm reduces computational costs, eliminates unnecessary data processing, and accelerates the convergence of AI models. The experiments used the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, and performance was evaluated with metrics commonly used in the literature: accuracy, precision, recall, and F1 score. Results show an execution time reduction of up to 56.49%, with an accuracy loss of approximately 0.50%.
Keywords