


A PyTorch extension implementing symmetric power transformers, a variant of linear transformers that achieves transformer-level performance while scaling linearly with sequence length. The package provides efficient CUDA kernels that make it practical to process much longer sequences than standard quadratic attention allows.
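To make the linear-scaling claim concrete, here is a minimal pure-PyTorch sketch (not the package's CUDA implementation; the function names are illustrative). Power attention replaces softmax scores with (q·k)^p, and for even p this admits a feature map, shown here for p = 2, that turns the computation into a constant-state recurrence whose cost grows linearly with sequence length:

```python
import torch

def power_attention_reference(q, k, v, p=2):
    """Naive O(T^2) reference: causal attention with scores (q_i . k_j)^p.

    With an even exponent p the scores are non-negative, so a simple sum
    normalization replaces softmax. q, k, v have shape (B, T, d).
    """
    scores = torch.einsum("btd,bsd->bts", q, k).pow(p)
    scores = scores.tril()  # causal mask: zero out future positions
    weights = scores / scores.sum(-1, keepdim=True).clamp_min(1e-9)
    return weights @ v

def power_attention_recurrent(q, k, v):
    """Linear-time equivalent for p=2 via the feature map phi(x) = vec(x x^T),
    using (q . k)^2 = <phi(q), phi(k)>. State size is fixed, independent of T."""
    B, T, d = q.shape
    phi = lambda x: torch.einsum("bi,bj->bij", x, x).reshape(B, -1)
    S = q.new_zeros(B, d * d, d)  # running sum of phi(k_j) v_j^T
    z = q.new_zeros(B, d * d)     # running sum of phi(k_j)
    out = []
    for t in range(T):
        f = phi(k[:, t])
        S = S + torch.einsum("bf,bd->bfd", f, v[:, t])
        z = z + f
        qf = phi(q[:, t])
        num = torch.einsum("bf,bfd->bd", qf, S)
        den = torch.einsum("bf,bf->b", qf, z).clamp_min(1e-9)
        out.append(num / den.unsqueeze(-1))
    return torch.stack(out, dim=1)

# The two forms agree; the recurrent one does constant work per token.
q, k, v = (torch.randn(2, 16, 8) for _ in range(3))
assert torch.allclose(power_attention_reference(q, k, v),
                      power_attention_recurrent(q, k, v), atol=1e-4)
```

The recurrent form carries a fixed-size state per head instead of a key/value cache that grows with the sequence, which is where the linear-time behavior comes from; the package's fused kernels implement this far more efficiently than the Python loop above.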

The package includes a drop-in replacement for standard attention in transformer models. See train/model.py for a complete example of using power attention in a GPT-style model.
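Since train/model.py itself isn't reproduced here, the sketch below shows roughly what such a drop-in looks like: a GPT-style multi-head attention module whose softmax attention is swapped for power attention. The module name and the naive stand-in kernel are assumptions for illustration, not the package's actual API:

```python
import torch
import torch.nn as nn

def naive_power_attention(q, k, v, p=2):
    # O(T^2) stand-in for the package's fused CUDA kernel (see sketch above)
    s = torch.einsum("btd,bsd->bts", q, k).pow(p).tril()
    return (s / s.sum(-1, keepdim=True).clamp_min(1e-9)) @ v

class PowerSelfAttention(nn.Module):
    """Illustrative GPT-style multi-head attention block using power attention
    in place of softmax attention; class and helper names are hypothetical."""

    def __init__(self, d_model: int, n_heads: int, p: int = 2):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.head_dim, self.p = n_heads, d_model // n_heads, p
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # fold heads into the batch dim: (B, T, C) -> (B * n_heads, T, head_dim)
        split = lambda t: (t.view(B, T, self.n_heads, self.head_dim)
                            .transpose(1, 2)
                            .reshape(B * self.n_heads, T, self.head_dim))
        y = naive_power_attention(*map(split, (q, k, v)), p=self.p)
        y = y.view(B, self.n_heads, T, self.head_dim).transpose(1, 2).reshape(B, T, C)
        return self.proj(y)

# quick shape check
x = torch.randn(2, 32, 64)
print(PowerSelfAttention(d_model=64, n_heads=8)(x).shape)  # torch.Size([2, 32, 64])
```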
