Allow multiple inference engines in single script #4384

Merged
5 commits merged into master from mrwyattii/allow-multiple-inf-engines
Sep 22, 2023

Conversation

mrwyattii
Contributor

The InferenceEngine relies on some global values and fails to clean up its workspaces, so running multiple models via DeepSpeed-Inference in a single script (see example below) produces errors like this:

RuntimeError: The specified pointer resides on host memory and is not registered with any CUDA device.

This PR adds a destroy() method that will be automatically called on subsequent invocations of deepspeed.init_inference().

example.py:

import os
import torch
import deepspeed
from transformers import pipeline

local_rank = int(os.environ.get("LOCAL_RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

model1 = "bigscience/bloom-560m"
task1 = "text-generation"
pipe1 = pipeline(task1, model1, torch_dtype=torch.float16, device=local_rank)
pipe1.model = deepspeed.init_inference(
    pipe1.model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    mp_size=world_size,
)
print(model1, pipe1("test one two"))

# Initializing a second engine previously failed; with this PR, the first
# engine's destroy() is called automatically by deepspeed.init_inference().
model2 = "gpt2"
task2 = "text-generation"
pipe2 = pipeline(task2, model2, torch_dtype=torch.float16, device=local_rank)
pipe2.model = deepspeed.init_inference(
    pipe2.model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    mp_size=world_size,
)
print(model2, pipe2("test one two"))

Co-authored-by: Jeff Rasley [email protected]

@mrwyattii mrwyattii removed this pull request from the merge queue due to a manual request Sep 22, 2023
@mrwyattii mrwyattii enabled auto-merge September 22, 2023 22:09
@mrwyattii mrwyattii disabled auto-merge September 22, 2023 22:56
@mrwyattii mrwyattii merged commit 4c35880 into master Sep 22, 2023
16 checks passed
@mrwyattii mrwyattii deleted the mrwyattii/allow-multiple-inf-engines branch September 22, 2023 23:11
CurryRice233 pushed a commit to CurryRice233/DeepSpeed that referenced this pull request Sep 28, 2023
* origin/master:
  Allow multiple inference engines in single script (microsoft#4384)
  adds triton flash attention2 kernel (microsoft#4337)
  Fix llama meta tensor loading in AutoTP and kernel injected inference (microsoft#3608)
  Fix min torch version (microsoft#4375)
  Fix multinode runner to properly append to PDSH_SSH_ARGS_APPEND (microsoft#4373)
  add the missing method (microsoft#4363)
  Openfold fix (microsoft#4368)
  deepspeed4science japanese blog (microsoft#4369)
  deepspeed4science chinese blog (microsoft#4366)
  Enable workflow dispatch on Torch 1.10 CI tests (microsoft#4361)
  Update conda env to have max pydantic version (microsoft#4362)
  add deepspeed4science blog link (microsoft#4364)
  added check to avoid undefined behavior when the input_id length is greater than max_tokens (microsoft#4349)
  Add the policy to run llama model from the official repo (microsoft#4313)
  fix deepspeed4science links (microsoft#4358)
  DeepSpeed4Science (microsoft#4357)
  Support InternLM (microsoft#4137)
  Pass base_dir to model files can be loaded for auto-tp/meta-tensor. (microsoft#4348)