
[RLlib] Support for mps (Apple Metal) GPUs in torch #28321

Open
mgerstgrasser opened this issue Sep 7, 2022 · 11 comments
Labels
enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical), rllib (RLlib related issues)

Comments

@mgerstgrasser
Contributor

Description

PyTorch on macOS now supports GPU acceleration on Metal GPUs (AMD GPUs on Intel Macs and Apple GPUs on M1/M2) through the mps backend. It would be nice if Ray / RLlib could make use of this.

As far as I understand, the only change needed on the torch side is to use torch.device("mps") instead of torch.device("cuda..."), so this would be a relatively small addition in RLlib's torch_policy_v2. I'm less clear on what would be needed for other parts of Ray to recognise mps devices as GPU resources.
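
To illustrate, a minimal sketch of the kind of fallback logic this would amount to on the torch side (pick_device is a hypothetical helper for illustration, not existing RLlib code):

import torch

def pick_device() -> torch.device:
    # Prefer CUDA if present, otherwise fall back to Apple's MPS backend,
    # otherwise stay on the CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")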

As a side note, in case anyone comes across this looking for GPU support on macOS: this already seems to work for tf2 via tensorflow-metal. Just pip install tensorflow-metal and set the framework to tf2 (not just tf), and RLlib should see and use your AMD or Apple GPU.
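
For reference, roughly what that configuration looks like with the AlgorithmConfig API (a sketch only; PPO and the environment are placeholder choices):

# pip install tensorflow-metal
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # example environment
    .framework("tf2")            # must be "tf2", not just "tf"
)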

Use case

It would be nice to have GPU acceleration for quick local debugging sessions.

@mgerstgrasser added the enhancement label Sep 7, 2022
@krfricke added the rllib, air, and P2 labels Sep 7, 2022
@mgerstgrasser
Contributor Author

As per this discussion, even if Ray doesn't detect Metal GPUs as resources, that's pretty easy to work around. So even just adding support in RLlib itself would be useful. Could we simply try mps devices if no CUDA devices are found? If so, this might just be a couple of additional lines of code.
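
For what it's worth, a minimal sketch of that workaround, assuming the Metal GPU just needs to be declared as a logical GPU resource when Ray starts:

import ray

# Ray's autodetection doesn't report Metal GPUs, so declare one logical GPU manually.
ray.init(num_gpus=1)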

@visuallization

GPU support for m1 would be great!

@ChaceAshcraft
Contributor

Would also like to see this happen!

@qazi0
Contributor

qazi0 commented May 10, 2023

Waiting for this too:)

@hippotilt

Same, that would help a lot :)

@ersinakinci

Would love to see this happen!

@sams-data

+1 this would be great

@anyscalesam removed the air label Oct 28, 2023
@arnaudlenain

Hey, any update for RLlib/Ray? Anything we can do to help?

@duburcqa
Contributor

duburcqa commented Nov 11, 2024

It would be nice to address this issue. I think it should be quite straightforward. After playing around, I found that it currently doesn't work only because RLlib relies on torch.cuda.device_count to check whether the desired GPU index is available. Instead, one should first check whether MPS is available; if so, there is exactly one GPU device. If not, then torch.cuda.device_count should be checked.

I'm currently monkey-patching torch to make it work:

import torch

# Keep a reference to the original CUDA device counter.
device_count_orig = torch.cuda.device_count

def device_count():
    # Report exactly one GPU when the MPS backend is available,
    # otherwise defer to the original CUDA count.
    if torch.backends.mps.is_available():
        return 1
    return device_count_orig()

# Patch torch so RLlib's GPU-index check passes on Metal devices.
torch.cuda.device_count = device_count
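
A hypothetical usage sketch, assuming the patch above is applied before the algorithm is built and a recent AlgorithmConfig-based setup (PPO and the environment are placeholder choices):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # example environment
    .framework("torch")
    .resources(num_gpus=1)       # the patched device_count lets this GPU index check pass
)
algo = config.build()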

@sashless

sashless commented Nov 14, 2024

I was eager to try that and added a GPU to my learner config:

.learners(
    num_gpus_per_learner=1,  # Set this to 1 to enable GPU training.
)
.resources(num_gpus=1)

and added a GPU to my Tune resources:

trainable_with_cpu_gpu = tune.with_resources(PPO, {"cpu": 4, "gpu": 1})

tuner = tune.Tuner(
    trainable_with_cpu_gpu,

and requested a GPU in ray.init:

ray.init(
    num_gpus=1,

Then I see Logical resource usage: 4.0/10 CPUs, 1.0/1 GPUs when starting the Tune job.

Unfortunately it doesn't tune and just "idles" in Pending status. What else do I need to do @duburcqa?

@duburcqa
Contributor

duburcqa commented Dec 8, 2024

Hmm... this is curious. That is all I had to do to make it work with PPO. But I'm not using Tune at all; I have written a kind of minimal equivalent of Tune to supervise my jobs, so I'm configuring the algorithms manually in Python scripts, without relying on configuration files at all.

Anyway, the performance is pretty bad. In the end, it runs slower than CPU-only...
