
Improve tensor allocations in servings #233

Merged: 2 commits merged into main from jk-serving-improvements on Aug 3, 2023


jonatanklosko (Member) commented on Aug 3, 2023

Closes #217.

  1. We want to always allocate tokenization inputs on the binary backend, because wrapping the tokenizer output that way is zero-copy, and there is no reason to involve XLA before the batch actually runs.

  2. A new :preallocate_params option moves params to the device defined by :defn_options. This is useful with multiple GPUs: we can load params into CPU memory and then use :preallocate_params, so that each serving partition allocates the params on its own device.
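A minimal sketch of point 1, assuming only Nx is installed. The token ids are hypothetical stand-ins for tokenizer output, and Bumblebee's own internals may build the tensor differently. The idea is that Nx.from_binary with the binary backend just wraps the existing binary instead of copying it into XLA memory up front:

```elixir
Mix.install([{:nx, "~> 0.6"}])

# Hypothetical token ids, packed as native-endian u32 values the way a
# native tokenizer might hand them back.
ids = [101, 7592, 2088, 102]
binary = for id <- ids, into: <<>>, do: <<id::unsigned-integer-size(32)-native>>

# The binary backend stores the binary as-is, so no XLA buffer is
# allocated at this point; that only happens once the computation runs.
input_ids =
  binary
  |> Nx.from_binary(:u32, backend: Nx.BinaryBackend)
  |> Nx.reshape({1, length(ids)})
```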
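The multi-GPU setup described in point 2 might look roughly like the following. This is a sketch, not code from the PR: the model repository, the serving name MyApp.FillMask, the compile shapes and the version constraints are illustrative assumptions, and it assumes a Bumblebee release that ships :preallocate_params. Params are loaded into host memory through EXLA's :host client, and each serving partition then moves them to its own device:

```elixir
Mix.install([
  {:bumblebee, "~> 0.4"},
  {:exla, ">= 0.0.0"}
])

# Assumption: keep freshly loaded params in host (CPU) memory, not on a GPU.
# Nx.default_backend/1 only affects the current process.
Nx.default_backend({EXLA.Backend, client: :host})

repo = {:hf, "bert-base-uncased"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

# With preallocate_params: true, each partition moves the params to the
# device defined by :defn_options (here, EXLA's default client).
serving =
  Bumblebee.Text.fill_mask(model_info, tokenizer,
    compile: [batch_size: 4, sequence_length: 32],
    defn_options: [compiler: EXLA],
    preallocate_params: true
  )

# partitions: true asks Nx.Serving to start one partition per device
# reported by the compiler.
{:ok, _pid} =
  Nx.Serving.start_link(
    serving: serving,
    name: MyApp.FillMask,
    batch_timeout: 100,
    partitions: true
  )
```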

jonatanklosko merged commit 333ba09 into main on Aug 3, 2023
jonatanklosko deleted the jk-serving-improvements branch on Aug 3, 2023 at 18:10
Linked issue closed by this pull request: Serving improvements (#217)