Journal of Cloud Computing: Advances, Systems and Applications (Jul 2025)

Memory-augment graph transformer based unsupervised detection model for identifying performance anomalies in highly-dynamic cloud environments

  • Huangyining Gao,
  • Ruyue Xin,
  • Peng Chen,
  • Xi Li,
  • Ning Lu,
  • Peng You

DOI
https://doi.org/10.1186/s13677-025-00766-5
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 18

Abstract

Cloud computing systems provide highly available and scalable computing, storage, and network resources to meet various service demands. Anomaly detection based on monitoring metrics plays a crucial role in identifying system defects and abnormal behaviors, ensuring the reliability and stability of cloud services. However, as the complexity of data increases and concurrent noise impacts cloud environments, server failures and abnormal events rise, making anomaly detection more challenging. To address these issues, we propose MemGT, an unsupervised multivariate time series anomaly detection method that offers high accuracy, good robustness, and noise resistance for various data patterns in complex cloud environments. Our approach utilizes a Transformer encoder and dynamic graph structure learning to extract spatio-temporal features of monitoring metrics in cloud computing systems in parallel. Additionally, we introduce a novel dynamic gated memory module to guide the Transformer encoder in extracting hidden features, thereby enhancing the model’s robustness to varying data patterns in dynamic cloud environments. To accurately distinguish between concurrent noise and real anomalies, we utilize a window-wise graph learning method, further improving the model’s noise resistance. We compared the detection performance of MemGT with 15 baseline methods across 8 public datasets. The experimental results demonstrate that our method achieves an average F1 score of 95.04%, surpassing state-of-the-art baseline methods by 24.80%.
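
The abstract does not describe how the window-wise graph is built, so the following is only a rough illustration of the general idea of deriving one graph per time window from multivariate monitoring metrics. It is not MemGT's learned graph structure: it swaps in a fixed Pearson-correlation, top-k neighbour rule, and the function names (`sliding_windows`, `window_graph`), the window length, the stride, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def sliding_windows(series, window, stride=1):
    """Split a (T, N) metric array into overlapping (window, N) slices."""
    T = series.shape[0]
    starts = range(0, T - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


def window_graph(win, top_k):
    """Boolean (N, N) adjacency for one window.

    Each metric is linked to the top_k other metrics with the highest
    Pearson correlation inside this window (an assumed, non-learned rule).
    """
    x = win.T                                   # (N, window)
    x = x - x.mean(axis=1, keepdims=True)       # centre each metric
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.maximum(norms, 1e-12)            # guard against constant metrics
    sim = x @ x.T                               # pairwise correlation
    np.fill_diagonal(sim, -np.inf)              # exclude self-loops
    neighbours = np.argsort(-sim, axis=1)[:, :top_k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, neighbours] = True
    return adj


# Synthetic example: metric 1 is an affine copy of metric 0,
# metrics 2 and 3 are independent noise.
rng = np.random.default_rng(0)
t = np.arange(200)
m0 = np.sin(t / 5.0)
data = np.column_stack([m0, 2.0 * m0 + 1.0,
                        rng.normal(size=200), rng.normal(size=200)])

windows = sliding_windows(data, window=50, stride=25)
graphs = np.stack([window_graph(w, top_k=1) for w in windows])
```

Because the graph is recomputed for every window, relationships between metrics are allowed to change over time, which is the property a window-wise approach relies on; a learned variant would replace the fixed correlation rule with trainable parameters.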

Keywords