Fasten: A Library of Fast Segment Operators

Introduction

Fasten is a library for speeding up Heterogeneous Graph Neural Network (HGNN) workloads. The current version focuses on segmented matrix multiplication, a critical operator in HGNNs. Fasten exposes a simple interface, so it can be integrated into the existing graph library PyG with minimal changes. In operator-level benchmarks, Fasten achieved average speedups of 13.65x over CUTLASS and 4.72x over cuBLAS.
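The following NumPy sketch spells out what segmented matrix multiplication computes. The input rows are grouped into contiguous segments, one per node or edge type, and each segment is multiplied by its own weight matrix. This is an unoptimized reference loop, not Fasten's Triton kernel or its Python API. The function name `segment_matmul_reference` and the CSR-style `ptr` offset layout are assumptions made for illustration.

```python
import numpy as np

# Illustrative reference only (hypothetical helper); not Fasten's kernel or API.


def segment_matmul_reference(x, ptr, weights):
    """Unoptimized reference for segmented matrix multiplication.

    x:       [N, K] input rows, sorted so rows of the same type are contiguous
    ptr:     [T + 1] segment offsets; rows ptr[t]:ptr[t + 1] belong to type t
    weights: [T, K, M] one weight matrix per type
    returns: [N, M] with out[ptr[t]:ptr[t + 1]] = x[ptr[t]:ptr[t + 1]] @ weights[t]
    """
    num_types = weights.shape[0]
    if len(ptr) != num_types + 1 or ptr[0] != 0 or ptr[-1] != x.shape[0]:
        raise ValueError("ptr must hold T + 1 offsets covering all N rows")
    out = np.empty((x.shape[0], weights.shape[2]), dtype=np.result_type(x, weights))
    for t in range(num_types):
        start, end = ptr[t], ptr[t + 1]
        # Each segment is an ordinary dense GEMM with its own weight matrix;
        # an empty segment (start == end) contributes no rows.
        out[start:end] = x[start:end] @ weights[t]
    return out


# Usage: 6 rows split into segments of size 2, 0, and 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
ptr = np.array([0, 2, 2, 6])
w = rng.standard_normal((3, 4, 5))
y = segment_matmul_reference(x, ptr, w)
print(y.shape)  # (6, 5)
```

The loop launches one small GEMM per type. A dedicated segment operator can instead process all segments in a single kernel.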

Fasten vs CUTLASS

Fasten vs cuBLAS

Installation

Build Instructions

Install the PyTorch and Triton nightly builds. Fasten relies on relatively new Triton features, so older Triton releases may crash.

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly

Until Proton, Triton's profiler, is distributed with Triton's pip wheel, you may need to build Triton from source.

git clone https://github.com/Deep-Learning-Profiling-Tools/fasten.git && cd fasten
pip install .

Examples

Fasten's segment matrix multiplication operator has been integrated into several HGNN architectures in PyG, including RGCN, HGT, and RGAT. Instructions for running these examples are given below:

GNN Examples

RGCN

cd examples/rgcn
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgcn.py --device cuda --dataset AIFB
# With fasten
python rgcn.py --device cuda --mode fasten --dataset AIFB
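To show where the operator appears in one of these models, here is a NumPy sketch of a single RGCN propagation step. It uses the standard RGCN update with per-relation mean normalization:

h_i' = W_0 h_i + sum_r sum_{j in N_r(i)} W_r h_j / |N_r(i)|

Here N_r(i) is the set of nodes with a relation-r edge into node i. Sorting the edges by relation turns the per-relation transform into a segmented matrix multiplication. This is an illustration, not the code in examples/rgcn/rgcn.py. The function name and argument layout are hypothetical, and the code uses row vectors (h @ W).

```python
import numpy as np

# Hypothetical illustration of an RGCN step; not the code in examples/rgcn.


def rgcn_layer_sketch(h, src, dst, rel, weights, root_weight):
    """One RGCN-style propagation step with per-relation mean aggregation.

    h:           [N, K] node features
    src, dst:    [E] edge endpoints (messages flow src -> dst)
    rel:         [E] relation id of each edge, in [0, R)
    weights:     [R, K, M] one weight matrix per relation
    root_weight: [K, M] self-connection weight W_0
    """
    src, dst, rel = (np.asarray(a) for a in (src, dst, rel))
    num_nodes, num_rels = h.shape[0], weights.shape[0]

    # Sort edges by relation so each relation's messages form one contiguous segment.
    order = np.argsort(rel, kind="stable")
    src, dst, rel = src[order], dst[order], rel[order]
    ptr = np.concatenate(([0], np.cumsum(np.bincount(rel, minlength=num_rels))))

    # Segmented matmul: messages in segment r are transformed by weights[r].
    msgs = np.empty((len(src), weights.shape[2]), dtype=np.result_type(h, weights))
    for r in range(num_rels):
        s, e = ptr[r], ptr[r + 1]
        msgs[s:e] = h[src[s:e]] @ weights[r]

    # Mean normalization: divide by |N_r(i)|, the number of relation-r edges into node i.
    deg = np.zeros((num_nodes, num_rels))
    np.add.at(deg, (dst, rel), 1.0)
    msgs /= deg[dst, rel][:, None]

    # Self-connection plus scatter-add of messages into their destination nodes.
    out = h @ root_weight
    np.add.at(out, dst, msgs)
    return out
```

The stable sort keeps edges of the same relation in their original relative order. Each edge's own contribution guarantees a nonzero degree, so the division is always safe.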

HGT

cd examples/hgt
# Without fasten
# Available datasets are: DBLP, Freebase, AIFB, MUTAG, BGS, AM
python hgt.py --device cuda --example DBLP
# With fasten
python hgt.py --device cuda --mode fasten --example DBLP

RGAT

cd examples/rgat
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgat.py --device cuda --dataset MUTAG
# With fasten
python rgat.py --device cuda --mode fasten --dataset MUTAG

Benchmarking

cd test
pytest -vs test_op.py::test_perf
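It can help to convert benchmark timings to throughput. Every input row is multiplied by exactly one K x M weight matrix. A segmented matmul over N rows therefore performs 2 * N * K * M floating-point operations, however the rows are split into segments. The helpers below are hypothetical and are not part of test_op.py. They take a kernel time that you measured yourself.

```python
# Hypothetical benchmarking helpers; not part of Fasten's test suite.


def segment_matmul_flops(num_rows, in_features, out_features):
    """FLOPs of a segmented matmul: each row does a (1 x K) @ (K x M) product,
    i.e. 2 * K * M flops (one multiply and one add per term), independent of
    how the rows are partitioned into segments."""
    return 2 * num_rows * in_features * out_features


def achieved_tflops(num_rows, in_features, out_features, elapsed_ms):
    """Convert a measured kernel time in milliseconds into TFLOP/s."""
    flops = segment_matmul_flops(num_rows, in_features, out_features)
    return flops / (elapsed_ms * 1e-3) / 1e12
```

For example, 1024 rows with K = M = 128 amount to 33,554,432 FLOPs, so a 1 ms run corresponds to about 0.034 TFLOP/s. Counting FLOPs the same way for every backend keeps comparisons against cuBLAS and CUTLASS on equal footing.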

Compatibility

Supported Platforms

Linux

Supported Hardware

NVIDIA GPUs (Compute Capability 7.0+)

Software requirements

PyTorch >=2.2.0
Triton >=3.0.0
PyG >=2.6.0
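The sketch below checks an environment against the requirements above. The function names are hypothetical and not part of Fasten. The version parsing rests on an assumption: it compares only the leading numeric release. A nightly such as 2.3.0.dev20240101+cu121 therefore counts as 2.3.0, even though PEP 440 orders dev builds before the final release.

```python
import importlib
import re

# Hypothetical helpers (not part of Fasten). Minimums mirror the
# "Software requirements" and "Supported Hardware" lists above.
MIN_VERSIONS = {"torch": (2, 2, 0), "triton": (3, 0, 0), "torch_geometric": (2, 6, 0)}
MIN_COMPUTE_CAPABILITY = (7, 0)


def release_tuple(version):
    """Leading numeric release, padded to 3 parts: '2.3.0.dev20240101+cu121' -> (2, 3, 0).

    Assumption: dev/pre-release/local suffixes are ignored so nightlies count
    as their base release (looser than PEP 440 ordering).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements(versions, compute_capability):
    """Return a list of problems; an empty list means every minimum is met.

    versions: maps module name -> version string, or None if not installed.
    compute_capability: (major, minor) tuple, or None if no CUDA GPU was found.
    """
    problems = []
    for name, minimum in MIN_VERSIONS.items():
        installed = versions.get(name)
        if installed is None:
            problems.append(f"{name} is not installed")
        elif release_tuple(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} {installed} is older than {wanted}")
    if compute_capability is None:
        problems.append("no CUDA GPU detected")
    elif tuple(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} is below 7.0")
    return problems


def collect_environment():
    """Read module versions and the GPU's compute capability from this machine."""
    versions = {}
    for name in MIN_VERSIONS:
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            versions[name] = None
    capability = None
    if versions["torch"] is not None:
        import torch  # only imported when it is installed

        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability()
    return versions, capability
```

To use it, call check_requirements(*collect_environment()) and print each returned string. An empty list means the installed PyTorch, Triton, and PyG versions and the detected GPU meet the listed minimums. The sketch reads each module's __version__ attribute rather than pip distribution names, because a Triton nightly installed as triton-nightly still imports as triton.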

Publication

Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, and Jiajia Li. 2024. FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogeneous Graph Neural Networks. In Proceedings of the 38th ACM International Conference on Supercomputing (ICS’24), June 4–7, 2024, Kyoto, Japan.
