Results in Engineering (Sep 2025)

WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks

  • Jinjiang Liu,
  • Yonghua Xie

DOI
https://doi.org/10.1016/j.rineng.2025.105930
Journal volume & issue
Vol. 27
p. 105930

Abstract

With the growing adoption of Unmanned Aerial Vehicles (UAVs) in surveillance, emergency response, and environmental monitoring, accurate small object detection under resource constraints remains a critical challenge. To address this issue, we propose WDFS-DETR (Wavelet-based Dual-stage Feature-enhanced Detection Transformer), a Transformer-based detection framework built upon RT-DETR (Real-Time Detection Transformer). The framework integrates four custom-designed components to jointly enhance detection accuracy and efficiency. First, BasicBlock-WTCM (Wavelet-based Transform Coordinate Module) improves small object perception by modeling spatial and channel semantics across scales. Second, the Dual-Stage Adaptive Normalization (DSAN) dynamically selects appropriate normalization strategies during training and inference to improve convergence and runtime performance. Third, the Feature-Focused Diffusion Pyramid Network (FFDPN) enhances context modeling and robustness via hierarchical multi-scale feature alignment and fusion. Finally, the boundary-aware Slide-VarifocalLoss combines sliding window mechanisms with class-weighted reweighting to address class imbalance and improve boundary localization. Experiments show that WDFS-DETR improves mAP@0.5 by 2.2 % over RT-DETR-r18 on VisDrone2019. When deployed on Jetson Orin Nano, inference speed improves from 38.6 FPS to 50.1 FPS, highlighting its suitability for real-time deployment on lightweight platforms. The model also generalizes well to UAVDT and DOTA datasets, demonstrating its applicability to real-world UAV-based detection tasks in complex engineering settings. Source code is available at: https://github.com/liuliuliu2002/WDFS-DETR.
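
To put the reported Jetson Orin Nano throughput in perspective, the two FPS figures from the abstract can be converted into a relative speedup and a per-frame latency. The short Python sketch below does only that arithmetic. The variable and function names are illustrative and do not come from the WDFS-DETR repository, and it assumes the 38.6 FPS figure refers to the RT-DETR-r18 baseline on the same device, which the abstract implies but does not state outright.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, for a given throughput."""
    return 1000.0 / fps


# Figures quoted in the abstract (Jetson Orin Nano).
# Assumption: 38.6 FPS is the RT-DETR-r18 baseline on the same hardware.
baseline_fps = 38.6
improved_fps = 50.1

# Ratio of throughputs and the same value expressed as a percentage gain.
speedup = improved_fps / baseline_fps
relative_gain_pct = (speedup - 1.0) * 100.0

# Per-frame time saved when moving from the baseline to the improved model.
latency_saved_ms = fps_to_latency_ms(baseline_fps) - fps_to_latency_ms(improved_fps)

print(f"speedup: {speedup:.2f}x ({relative_gain_pct:.1f}% more frames per second)")
print(f"latency: {fps_to_latency_ms(baseline_fps):.1f} ms -> "
      f"{fps_to_latency_ms(improved_fps):.1f} ms per frame "
      f"({latency_saved_ms:.1f} ms saved)")
```

Under these assumptions the change amounts to roughly 30% higher throughput, or about 6 ms less per frame. These are average figures and say nothing about latency variance on the device.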

Keywords