Add LLaMA end-to-end benchmarking #19985

kunal-vaishnavi · 2024-03-20T01:48:54Z

Description

This PR adds a benchmarking script that measures end-to-end performance and saves the results in a CSV file.
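Here is a minimal Python sketch of what an end-to-end latency measurement with CSV output can look like. It is not the code in benchmark_e2e.py. The `generate` callable, the warm-up and timed run counts, and the CSV column names (`avg_latency_s`, `tokens_per_s`, `runs`) are hypothetical stand-ins chosen for illustration. The sketch discards warm-up runs, times each full generation call with `time.perf_counter`, and writes one summary row per configuration.

```python
import csv
import io
import statistics
import time
from typing import Callable, List


def benchmark_e2e(generate: Callable[[], int], warmup_runs: int = 2, timed_runs: int = 5) -> dict:
    """Time an end-to-end generation callable.

    `generate` is a hypothetical stand-in for a full prompt-to-output
    generation loop; it must return the number of new tokens it produced.
    """
    # Warm-up runs (e.g. session initialization, caches) are not measured.
    for _ in range(warmup_runs):
        generate()

    latencies: List[float] = []
    total_tokens = 0
    for _ in range(timed_runs):
        start = time.perf_counter()
        total_tokens += generate()
        latencies.append(time.perf_counter() - start)

    total_time = sum(latencies)
    return {
        "avg_latency_s": statistics.mean(latencies),
        # Throughput over all timed runs; guard against a zero timer delta.
        "tokens_per_s": total_tokens / total_time if total_time > 0 else float("inf"),
        "runs": timed_runs,
    }


def write_results_csv(rows: List[dict], stream) -> None:
    """Write benchmark rows to a CSV stream (hypothetical column names)."""
    writer = csv.DictWriter(stream, fieldnames=["avg_latency_s", "tokens_per_s", "runs"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    def fake_generate() -> int:
        time.sleep(0.01)  # simulate model work
        return 32  # pretend 32 new tokens were generated

    result = benchmark_e2e(fake_generate)
    buf = io.StringIO()
    write_results_csv([result], buf)
    print(buf.getvalue())
```

In practice, a real script would append one row per combination of batch size, prompt length, and generation length, so that results for many configurations end up in a single CSV file.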

Motivation and Context

With this PR, end-to-end performance can be easily measured for many large language models such as LLaMA-2. The performance numbers for LLaMA-2 are located here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama

onnxruntime/python/tools/transformers/models/llama/benchmark_e2e.py

onnxruntime/python/tools/transformers/models/llama/benchmark.py

Description: This PR updates the end-to-end benchmarking numbers for LLaMA-2. Motivation and Context: The numbers were gathered with the end-to-end benchmarking script in this PR (microsoft/onnxruntime#19985).


kunal-vaishnavi added 8 commits March 17, 2024 23:30

Add E2E benchmarking

cad5aab

Add license to files and changes suggested by linter

3abc6fe

Fix runtime errors

72c92b8

Update README

ac1f266

Update CSV labels

f3d9036

Add cache dir

45774c2

Add changes suggested by linter

8172c28

Fix comment formatting

5ff17cf

kunal-vaishnavi mentioned this pull request Mar 20, 2024

Update LLaMA end-to-end numbers microsoft/onnxruntime-inference-examples#396

Merged

github-advanced-security bot found potential problems Mar 20, 2024

onnxruntime/python/tools/transformers/models/llama/benchmark_e2e.py (dismissed)

github-advanced-security bot found potential problems Mar 20, 2024

onnxruntime/python/tools/transformers/models/llama/benchmark.py (fixed)

kunal-vaishnavi added 5 commits March 20, 2024 02:13

Fix linter error in CI that differs locally

33897a2

Update README

5736e14

Fix bug with input shapes

559a738

Merge branch 'main' into kvaishnavi/llama-e2e

c44005b

Add changes suggested by linter

f0ebf4d

kunal-vaishnavi added 5 commits March 21, 2024 01:50

Add longer prompts

b3971d1

Add error message for unrecognized prompt lengths

0b21f14

Add local directory option for loading Hugging Face models

30a5973

Clarify arg descriptions

2dc0762

Update local directory arg description

34110f5

RyanUnderhill approved these changes Mar 22, 2024

kunal-vaishnavi merged commit 6238e9c into microsoft:main Mar 22, 2024
90 of 94 checks passed

kunal-vaishnavi added the release:1.17.3 label Mar 22, 2024
