IEEE Access (Jan 2025)
CQS-Attention: Scaling Up the Standard Attention Computation for Infinitely Long Sequences
Abstract
Transformer models suffer from prohibitively high memory consumption when sequences are long and standard self-attention is used. We develop a sequence parallelism scheme called CQS-Attention that breaks this sequence-length limit. A long sequence is divided into multiple overlapping subsequences; the attention of each subsequence is computed independently and then gathered into the exact attention of the original long sequence. CQS-Attention is a fork-join parallel model comprising three components: a Scheduler, Workers, and a Tiler. The Scheduler partitions the computation responsibility equally and in a completely mutually exclusive manner, while keeping each local subsequence as short as possible. Each Worker independently computes the standard attention of its assigned subsequence and transfers the local results to the Tiler, which produces the final attention. CQS-Attention thus makes attention computation embarrassingly parallel, so it performs well in terms of single-device memory consumption, computation time, numerical stability, and scalability. More importantly, it is fully compatible with state-of-the-art attention optimizations. Our code and supplementary information (SI) are available at https://github.com/CQS-Attention/CQS_Attention.
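To make the fork-join idea concrete, the following is a minimal NumPy sketch of how independently computed partial attention results can be merged exactly, using the standard log-sum-exp rescaling trick from blockwise softmax. It is an illustration under assumptions, not the paper's Scheduler/Worker/Tiler implementation: here the split is over key/value chunks rather than overlapping subsequences, and the names worker_attention and tiler_combine are hypothetical.

    import numpy as np

    def worker_attention(Q, K_chunk, V_chunk):
        """Standard attention of Q against one key/value chunk.

        Returns the unnormalized partial output together with the
        per-row softmax statistics (row max and exp-sum) that the
        tiler needs for an exact merge.
        """
        d = Q.shape[-1]
        S = Q @ K_chunk.T / np.sqrt(d)         # (n_q, n_k) score block
        m = S.max(axis=-1, keepdims=True)      # row-wise max for stability
        P = np.exp(S - m)                      # stabilized exponentials
        return P @ V_chunk, m, P.sum(axis=-1, keepdims=True)

    def tiler_combine(partials):
        """Merge per-chunk partials into the exact softmax attention."""
        O, m, l = partials[0]
        for O2, m2, l2 in partials[1:]:
            m_new = np.maximum(m, m2)          # joint row-wise max
            a, b = np.exp(m - m_new), np.exp(m2 - m_new)
            O = a * O + b * O2                 # rescale, accumulate numerators
            l = a * l + b * l2                 # rescale, accumulate denominators
            m = m_new
        return O / l                           # normalize once at the end

    # "Scheduler": split K/V into chunks; "Workers": independent partials.
    rng = np.random.default_rng(0)
    n, d, n_chunks = 512, 64, 4
    Q, K, V = rng.standard_normal((3, n, d))
    partials = [worker_attention(Q, Kc, Vc)
                for Kc, Vc in zip(np.array_split(K, n_chunks),
                                  np.array_split(V, n_chunks))]
    out = tiler_combine(partials)

    # Sanity check against monolithic standard attention.
    S = Q @ K.T / np.sqrt(d)
    ref = np.exp(S - S.max(-1, keepdims=True))
    ref = ref / ref.sum(-1, keepdims=True) @ V
    assert np.allclose(out, ref, atol=1e-6)

Because each partial result depends only on its own chunk, the per-chunk calls are embarrassingly parallel in the sense the abstract describes, and the final merge recovers the exact (not approximate) attention output.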
Keywords