Get Started

LMDeploy offers functionalities such as model quantization, offline batch inference, online serving, etc. Each function can be completed with just a few simple lines of code or commands.

Installation

Install lmdeploy with pip (python 3.8+) or from source

pip install lmdeploy

The default prebuilt package is compiled on CUDA 12. However, if CUDA 11+ is required, you can install lmdeploy by:

export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

Offline batch inference

import lmdeploy
pipe = lmdeploy.pipeline("internlm/internlm2_5-7b-chat")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

For more information on inference pipeline parameters, please refer to here.

Serving

LMDeploy offers various serving methods, choosing one that best meet your requirements.

Serving with openai compatible server
Serving with docker
Serving with gradio

Quantization

LMDeploy provides the following quantization methods. Please visit the following links for the detailed guide

4bit weight-only quantization
k/v quantization
w8a8 quantization

Useful Tools

LMDeploy CLI offers the following utilities, helping users experience LLM features conveniently

Inference with Command line Interface

lmdeploy chat internlm/internlm2_5-7b-chat

Serving with Web UI

LMDeploy adopts gradio to develop the online demo.

# install dependencies
pip install lmdeploy[serve]
# launch gradio server
lmdeploy serve gradio internlm/internlm2_5-7b-chat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_started.md

get_started.md

Get Started

Installation

Offline batch inference

Serving

Quantization

Useful Tools

Inference with Command line Interface

Serving with Web UI

Files

get_started.md

Latest commit

History

get_started.md

File metadata and controls

Get Started

Installation

Offline batch inference

Serving

Quantization

Useful Tools

Inference with Command line Interface

Serving with Web UI