IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)
Spatial–Spectral Hierarchical Multiscale Transformer-Based Masked Autoencoder for Hyperspectral Image Classification
Abstract
Due to its excellent feature extraction capabilities, deep learning has become the mainstream method for hyperspectral image (HSI) classification. The transformer, with its powerful long-range relationship modeling ability, has become a popular model; however, it usually requires a large amount of labeled data for parameter training, which may be costly and impractical for HSI classification. Accordingly, based on self-supervised learning, this article proposes a spatial–spectral hierarchical multiscale transformer-based masked autoencoder (SSHMT-MAE) for HSI classification. First, after spatial–spectral feature embedding with a spatial–spectral feature extraction module, a grouped window attention module is introduced to process only the visible patches of HSIs during spatial–spectral reconstruction, addressing the increased computational complexity caused by filling in invisible patches in the traditional masked autoencoder (MAE) and avoiding unnecessary computation on masked patches. After that, a spatial–spectral hierarchical transformer is designed to build a hierarchical MAE structure, followed by a cross-feature fusion module that extracts multiscale spatial–spectral fusion features. This design not only helps the whole model learn fine-grained local spatial–spectral features within each region but also captures the long-range dependencies between different regions, generating rich multiscale spatial–spectral features with high-level semantic and low-level detail information for HSI classification. Extensive experiments on five public HSI datasets demonstrate the superiority of the proposed SSHMT-MAE model over several state-of-the-art methods.
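The efficiency claim above rests on the standard MAE idea of encoding only visible tokens. As a minimal illustrative sketch (not the authors' implementation; the function name `random_masking` and the ViT-style patch-embedding input are assumptions), the following PyTorch snippet shows how a random subset of patch tokens is kept and passed to the encoder while masked tokens are dropped:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens, MAE-style (hypothetical sketch).

    patches: (B, N, D) sequence of patch embeddings.
    Returns the visible tokens, a binary mask (1 = masked), and the
    indices needed to restore the original token order for decoding.
    """
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)  # random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)        # lowest scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D)
    )

    mask = torch.ones(B, N, device=patches.device)
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # back to original order
    return visible, mask, ids_restore
```

Because the encoder attends only over `visible` (a fraction 1 − mask_ratio of the tokens), its self-attention cost shrinks roughly quadratically with the mask ratio, which is the motivation for processing visible patches alone during reconstruction.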
Keywords