HyQuas is a Hybrid partitioner based Quantum circuit Simulation system on GPU that supports single-GPU, single-node-multi-GPU, and multi-node-multi-GPU quantum circuit simulation.
For single-GPU simulation, it provides two highly optimized methods, OShareMem and TransMM. The OShareMem method optimizes shared-memory based quantum circuit simulation. The TransMM method converts quantum circuit simulation to standard operations and enables the use of highly optimized libraries like cuBLAS and powerful hardware like Tensor Cores, yielding a speedup over previous gate-merging based simulation. Moreover, HyQuas can select the better simulation method for different parts of a given quantum circuit according to its pattern.
For distributed simulation, it provides a GPU-centric communication pipelining approach. It can utilize the high-throughput NVLink connections to make the simulation even faster while still preserving low communication traffic.
Experimental results show that HyQuas achieves speedups over state-of-the-art quantum circuit simulation systems both on a single GPU and on a GPU cluster.
- Get the source code

      git clone https://github.com/thu-pacman/HyQuas.git --recursive
- Specify the compute capability in `CMakeLists.txt` (`CUDA_NVCC_FLAGS`) and `third-party/cutt/Makefile` (`GENCODE_FLAGS`).
- Prepare the following dependencies
  - cmake (tested on 3.12.3)
  - cuda (tested on 10.2.89 and 11.0.2)
  - g++ (compatible with cuda)
  - cublas (same version as cuda)
  - openmpi (tested on 4.0.5)
  - nccl (fully tested on 2.9.6-1; 2.7.8-1 is known not to work because it blocks in an NCCL-simulated MPI_Sendrecv)
  Also update environment variables such as `CUDA_HOME`, `NCCL_ROOT`, `$PATH`, `$LIBRARY_PATH`, `$LD_LIBRARY_PATH`, and `CPATH` in `scripts/env.sh`.
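As a rough illustration of the environment setup described above, the following sketch exports the listed variables. The install prefixes `/usr/local/cuda` and `/opt/nccl` are assumptions chosen for the example, and the actual contents of `scripts/env.sh` in the repository may differ.

```shell
# Sketch of an environment setup; both install prefixes are assumptions.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"   # hypothetical CUDA prefix
export NCCL_ROOT="${NCCL_ROOT:-/opt/nccl}"         # hypothetical NCCL prefix

# Put the CUDA tools (nvcc) first on PATH.
export PATH="$CUDA_HOME/bin:$PATH"

# Link-time and run-time library search paths.
export LIBRARY_PATH="$CUDA_HOME/lib64:$NCCL_ROOT/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$NCCL_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Header search path for the CUDA and NCCL headers.
export CPATH="$CUDA_HOME/include:$NCCL_ROOT/include${CPATH:+:$CPATH}"
```

The `${VAR:+:$VAR}` form appends the previous value only when it is non-empty, which avoids a stray trailing colon in the search paths.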
- Compile the tensor transpose library `cutt`

      cd third-party/cutt
      make -j
- Specify the root directory

      export HYQUAS_ROOT=${The_directory_running_git_clone}/HyQuas
- Prepare the database for the time predictor

      mkdir -p evaluator-preprocess/parameter-files
      cd benchmark
      ./preprocess.sh
- Example usages of HyQuas: HyQuas uses all GPUs it can detect, so control the number of GPUs with `CUDA_VISIBLE_DEVICES`.
  - Run a single circuit on a single GPU

        cd scripts
        ./run-single.sh
  - Run a single circuit with multiple GPUs in one node

        cd scripts
        ./run-multi-GPU.sh
  - Run a single circuit with multiple GPUs on multiple nodes. Please modify the `-host` setting first.

        cd scripts
        ./run-multi-node.sh
  - Run all circuits and check the correctness (the script tries both with and without MPI)

        cd scripts
        CUDA_VISIBLE_DEVICES=0,1,2,3 ./check.sh
  - Please use the commands in `check.sh` to evaluate the performance of HyQuas, because the `run-*.sh` scripts compile the simulator in debug mode while `check.sh` compiles it in release mode.
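Since HyQuas uses every GPU it can detect, it can be convenient to build the `CUDA_VISIBLE_DEVICES` value programmatically. The sketch below defines `first_n_gpus`, a hypothetical helper that is not part of HyQuas; it assumes the devices you want are the contiguous indices starting at 0.

```shell
# first_n_gpus N: print the device list "0,1,...,N-1".
# Hypothetical helper for illustration; not part of HyQuas.
first_n_gpus() {
    n="$1"
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Prefix a comma only when the list is already non-empty.
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: restrict a multi-GPU run to the first two devices.
# (cd scripts && CUDA_VISIBLE_DEVICES="$(first_n_gpus 2)" ./run-multi-GPU.sh)

first_n_gpus 4   # prints 0,1,2,3
```

The run command is left commented out so the snippet can be tried without a GPU or a HyQuas checkout.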
For more ways to use our simulator (such as using only the OShareMem or TransMM method, or turning off the overlap of communication and computation), and for reproducing the results in our ICS'21 paper, please refer to our `benchmark/` directory.
It also supports the following unstable features. See our dev branch for details.
- Simulating more qubits by keeping the state in CPU memory while still computing on the GPU.
- An imperative mode, so that you do not need to explicitly call `c->compile();` and `c->run()`.
- Support for more control qubits.
- Support for some two-qubit gates.
- Fast measurement of quantum state.
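To give a sense of why keeping the state in CPU memory helps with larger circuits, the sketch below estimates the size of a full state vector, which holds 2^n amplitudes for n qubits. The function names are hypothetical, and the default of 16 bytes per amplitude (a double-precision complex number) is an assumption; the precision HyQuas actually uses is not stated here.

```shell
# state_vector_bytes QUBITS [BYTES_PER_AMPLITUDE]
# Prints the size in bytes of a full state vector with 2^QUBITS amplitudes.
# The default of 16 bytes per amplitude (complex double) is an assumption.
state_vector_bytes() {
    qubits="$1"
    bytes_per_amp="${2:-16}"
    echo $(( (1 << qubits) * bytes_per_amp ))
}

# state_vector_gib QUBITS [BYTES_PER_AMPLITUDE]
# Same size in whole GiB (integer division, so small states print 0).
state_vector_gib() {
    echo $(( $(state_vector_bytes "$1" "${2:-16}") >> 30 ))
}

state_vector_bytes 30   # prints 17179869184
state_vector_gib 34     # prints 256
```

Each additional qubit doubles the footprint, so under these assumptions a state that fits in GPU memory at 30 qubits needs 16 times as much space at 34 qubits.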
To cite HyQuas, you can use the following BibTeX:
@inproceedings{10.1145/3447818.3460357,
author = {Zhang, Chen and Song, Zeyu and Wang, Haojie and Rong, Kaiyuan and Zhai, Jidong},
title = {HyQuas: Hybrid Partitioner Based Quantum Circuit Simulation System on GPU},
year = {2021},
isbn = {9781450383356},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3447818.3460357},
doi = {10.1145/3447818.3460357},
booktitle = {Proceedings of the ACM International Conference on Supercomputing},
pages = {443–454},
numpages = {12},
keywords = {quantum computing, GPU computing, simulation},
location = {Virtual Event, USA},
series = {ICS '21}
}