IEEE Access (Jan 2025)
DBTU-Net: A Dual Branch Network Fusing Transformer and U-Net for Skin Lesion Segmentation
Abstract
Dermoscopy, as a non-invasive diagnostic tool, plays a significant role in the early diagnosis of skin cancer and in improving patient survival rates. However, the complexity of skin lesion regions, the ambiguity of their boundaries, and issues such as hair occlusion pose challenges for skin lesion segmentation. Models based on convolutional neural networks (CNNs) and Transformers are currently widely used for segmenting skin lesion regions. However, CNN-based models struggle to model long-range dependencies, while Transformer-based models tend to pay less attention to local information, resulting in lower boundary accuracy. To address these issues, we propose a dual branch network fusing Transformer and U-Net (DBTU-Net). DBTU-Net uses an attention dense U-Net to capture local features and a vision Transformer (ViT) to model long-range dependencies and the local contribution score of the image, thereby extracting both local and global features comprehensively. The attention dense U-Net includes a triple fusion attention module that extracts features across the height, width, and channel dimensions, helping the U-Net capture interdependencies between channel and spatial locations. In addition, we apply channel and spatial fusion attention after the attention dense U-Net to fuse channel and spatial information, further enhancing the CNN branch's ability to capture long-range dependencies. DBTU-Net achieves accuracies of 0.9680, 0.9647, and 0.9623 on the ISIC-2017, ISIC-2018, and PH2 datasets, respectively, demonstrating strong generalization capability and superior performance in segmenting skin lesion regions.
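The dual-branch idea described above — a CNN path for local features and a Transformer path for long-range dependencies, fused into one representation — can be illustrated with a minimal, framework-free sketch. This is not the DBTU-Net architecture itself (the paper's attention dense U-Net, triple fusion attention, and channel/spatial fusion attention are far richer); it is a hypothetical NumPy toy showing only the generic pattern of concatenating a local (sliding-window) branch with a global (self-attention) branch over a sequence of patch tokens. All function names and shapes here are illustrative assumptions.

```python
import numpy as np

def self_attention(x):
    # Global branch: single-head scaled dot-product self-attention.
    # x: (n_tokens, d). Every token attends to every other token,
    # which is how the Transformer branch models long-range dependencies.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def local_window(x, k=3):
    # Local branch: a sliding-window average as a stand-in for convolution.
    # Each output token only sees its k-neighborhood, mimicking a CNN's
    # limited receptive field.
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + k].mean(axis=0) for i in range(x.shape[0])])

def dual_branch(x):
    # Fuse local and global features along the channel axis.
    return np.concatenate([local_window(x), self_attention(x)], axis=-1)

tokens = np.random.default_rng(0).normal(size=(16, 8))  # 16 patch tokens, 8 channels
fused = dual_branch(tokens)
print(fused.shape)  # (16, 16): 8 local + 8 global channels per token
```

In a real segmentation network, the concatenated features would then pass through further attention and decoder stages to produce a per-pixel lesion mask; here the sketch stops at the fusion step.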
Keywords