doc: graph: sdpa: example and reference link for mqa
TaoLv committed Dec 20, 2024
1 parent f31db8f commit eff6df6
Showing 1 changed file with 8 additions and 0 deletions: doc/graph/complex_fusion/sdpa.md
@@ -125,8 +125,16 @@ example](https://github.com/oneapi-src/oneDNN/tree/main/examples/graph/sdpa.cpp)
demonstrating how to construct a typical floating-point SDPA pattern with the
oneDNN Graph API on CPU and GPU with different runtimes.

oneDNN also provides an [MQA (Multi-Query Attention)
example](https://github.com/oneapi-src/oneDNN/tree/main/examples/graph/mqa.cpp) [3]
demonstrating how to construct a floating-point MQA pattern with the same
pattern structure as the SDPA example but a different number of heads in the
Key and Value tensors. In MQA, Key and Value always have a single head, which is
shared by all heads of Query.
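To make the head-count difference concrete, the following is a minimal, naive C++ reference sketch of MQA written with plain loops. It does not use the oneDNN Graph API and is not taken from the mqa.cpp example. The function name `mqa_reference`, the row-major `[batch, heads, seq, head_size]` layout, the `1/sqrt(head_size)` scale, and the absence of an attention mask are all assumptions chosen for illustration. The key point is that the Key/Value offset depends only on the batch index, so every Query head reads the same single Key/Value head.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical naive MQA reference (not a oneDNN API). Assumed layouts:
//   Q, output: [batch, heads, seq, head_size], row-major
//   K, V:      [batch, 1,     seq, head_size], one head shared by all Q heads
// Assumed: scale = 1/sqrt(head_size), no attention mask.
std::vector<float> mqa_reference(const std::vector<float> &q,
        const std::vector<float> &k, const std::vector<float> &v,
        std::size_t batch, std::size_t heads, std::size_t seq,
        std::size_t head_size) {
    assert(q.size() == batch * heads * seq * head_size);
    assert(k.size() == batch * seq * head_size); // single Key head
    assert(v.size() == batch * seq * head_size); // single Value head

    std::vector<float> out(batch * heads * seq * head_size, 0.f);
    std::vector<float> scores(seq);
    const float scale = 1.f / std::sqrt(static_cast<float>(head_size));

    for (std::size_t b = 0; b < batch; ++b) {
        // K/V offsets depend only on b: every Query head h shares them.
        const float *kb = k.data() + b * seq * head_size;
        const float *vb = v.data() + b * seq * head_size;
        for (std::size_t h = 0; h < heads; ++h) {
            const float *qbh = q.data() + (b * heads + h) * seq * head_size;
            float *obh = out.data() + (b * heads + h) * seq * head_size;
            for (std::size_t i = 0; i < seq; ++i) {
                // scores[j] = scale * dot(Q[i], K[j]); track the max.
                float max_s = -std::numeric_limits<float>::infinity();
                for (std::size_t j = 0; j < seq; ++j) {
                    float s = 0.f;
                    for (std::size_t d = 0; d < head_size; ++d)
                        s += qbh[i * head_size + d] * kb[j * head_size + d];
                    scores[j] = s * scale;
                    if (scores[j] > max_s) max_s = scores[j];
                }
                // Numerically stable softmax over the key dimension.
                float sum = 0.f;
                for (std::size_t j = 0; j < seq; ++j) {
                    scores[j] = std::exp(scores[j] - max_s);
                    sum += scores[j];
                }
                // O[i] = sum_j softmax(scores)[j] * V[j]
                for (std::size_t j = 0; j < seq; ++j) {
                    const float p = scores[j] / sum;
                    for (std::size_t d = 0; d < head_size; ++d)
                        obh[i * head_size + d] += p * vb[j * head_size + d];
                }
            }
        }
    }
    return out;
}
```

For example, with batch 1, two Query heads, sequence length 2, head size 1, Q = {0, 0, 1, 1}, a shared K = {0, 1} and V = {1, 3}, the function returns approximately {2.0, 2.0, 2.46, 2.46}: the two heads produce different outputs even though they attend over the same Key and Value. Giving K and V `heads` heads and indexing them by `h` (as done for Q) would turn this sketch back into standard multi-head SDPA.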

## References

[1] Attention Is All You Need, https://arxiv.org/abs/1706.03762v7

[2] oneDNN Graph API documentation, https://oneapi-src.github.io/oneDNN/graph_extension.html

[3] Fast Transformer Decoding: One Write-Head is All You Need, https://arxiv.org/abs/1911.02150
