[pull] master from kserve:master #357

Merged 50 commits into opendatahub-io:master from kserve:master on May 17, 2024

pull[bot]
@pull pull bot commented May 15, 2024

See Commits and Changes for more details.


Created by pull[bot]

terrytangyuan and others added 30 commits April 7, 2024 10:17
* docs: Move Alibi explainer to docs

Signed-off-by: Yuan Tang <[email protected]>

* Empty-Commit

Signed-off-by: Yuan Tang <[email protected]>

* fix test

Signed-off-by: Yuan Tang <[email protected]>

* Empty-Commit

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
* build: Add flake8 and black to pre-commit hooks

Signed-off-by: Yuan Tang <[email protected]>

* fix path

Signed-off-by: Yuan Tang <[email protected]>

* pass config

Signed-off-by: Yuan Tang <[email protected]>

* fix flake8

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
…#3576)

* Set writable cache folder to avoid permission issue. Fixes #3562

Signed-off-by: Yuan Tang <[email protected]>

* Update huggingface_server.Dockerfile

Signed-off-by: Yuan Tang <[email protected]>

* Empty-Commit

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
* Add OpenAIModel support to model repository.

Signed-off-by: grandbora <[email protected]>

* Allow model server to register an openai model

Signed-off-by: grandbora <[email protected]>

* address comments

Signed-off-by: grandbora <[email protected]>

* fix format

Signed-off-by: grandbora <[email protected]>

* make black happy

Signed-off-by: grandbora <[email protected]>

* Python 3.9 cannot do isinstance on union type

Signed-off-by: grandbora <[email protected]>

* add comment

Signed-off-by: grandbora <[email protected]>

* Use a base model

Signed-off-by: grandbora <[email protected]>

* fix formatting

Signed-off-by: grandbora <[email protected]>

* Fix case

Co-authored-by: Dan Sun <[email protected]>
Signed-off-by: Bora <[email protected]>

* Fix case

Signed-off-by: grandbora <[email protected]>

---------

Signed-off-by: grandbora <[email protected]>
Signed-off-by: Bora <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
* updated xgboost to support json and ubj models

Signed-off-by: Andrews Arokiam <[email protected]>

* rename bst_model dir

Signed-off-by: Andrews Arokiam <[email protected]>

* bug fix

Signed-off-by: Andrews Arokiam <[email protected]>

* black format

Signed-off-by: Andrews Arokiam <[email protected]>

* black formatter

Signed-off-by: Andrews Arokiam <[email protected]>

* bug fix

Signed-off-by: Andrews Arokiam <[email protected]>

---------

Signed-off-by: Andrews Arokiam <[email protected]>
* google.golang.org/protobuf version upgrade

Signed-off-by: Andrews Arokiam <[email protected]>

* version upgrade

Signed-off-by: Andrews Arokiam <[email protected]>

---------

Signed-off-by: Andrews Arokiam <[email protected]>
* VLLM support for OpenAI Completions in HF server

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted imports

Signed-off-by: Gavrish Prabhu <[email protected]>

* minor fixes

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix lint

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix verify license

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix verify license

Signed-off-by: Gavrish Prabhu <[email protected]>

* Change base model

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix linter

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix tests

Signed-off-by: Gavrish Prabhu <[email protected]>

* Fix vllm Base and Chat Completion template

Signed-off-by: Gavrish Prabhu <[email protected]>

* Include Readme

Signed-off-by: Gavrish Prabhu <[email protected]>

* ignore file from linter and generate

Signed-off-by: Gavrish Prabhu <[email protected]>

* ignore file from linter and generate

Signed-off-by: Gavrish Prabhu <[email protected]>

* add codegen license

Signed-off-by: Gavrish Prabhu <[email protected]>

* bring in openai errors

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix linting

Signed-off-by: Gavrish Prabhu <[email protected]>

* Remove openai import and update openai types codegen cmd

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted import

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix tests after conflict

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix poetry lock

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix poetry lock

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove openai from extras

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix logprobs

Signed-off-by: Gavrish Prabhu <[email protected]>

* send json response content of type openai error

Signed-off-by: Gavrish Prabhu <[email protected]>

---------

Signed-off-by: Gavrish Prabhu <[email protected]>
Fix model server stop method

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Provide full

Signed-off-by: Yuan Tang <[email protected]>

* Move to cmd directory

Signed-off-by: Yuan Tang <[email protected]>

* Add helm charts

Signed-off-by: Yuan Tang <[email protected]>

* regen

Signed-off-by: Yuan Tang <[email protected]>

* fix conflict

Signed-off-by: Yuan Tang <[email protected]>

* rebase

Signed-off-by: Yuan Tang <[email protected]>

* remove unused

Signed-off-by: Yuan Tang <[email protected]>

* remove redundant files

Signed-off-by: Yuan Tang <[email protected]>

* Empty-Commit

Signed-off-by: Yuan Tang <[email protected]>

* Rename file

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
* go lint fix

Signed-off-by: Andrews Arokiam <[email protected]>

* commit for golangci

Signed-off-by: Andrews Arokiam <[email protected]>

* rewrite if-else to switch statement

Signed-off-by: Andrews Arokiam <[email protected]>

* fix for the response body

Signed-off-by: Andrews Arokiam <[email protected]>

---------

Signed-off-by: Andrews Arokiam <[email protected]>
Ignore protected namespaces. Don't set json_loads

Signed-off-by: Curtis Maddalozzo <[email protected]>
* build: Fix CRD copying in generate-install.sh

Signed-off-by: Yuan Tang <[email protected]>

* Empty-Commit

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Co-authored-by: Sivanantham <[email protected]>
Remove replace for golang.org/x/net and fix CVE-2023-45288 for qpext

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
test re run

Signed-off-by: Andrews Arokiam <[email protected]>
* OpenAI data models and endpoints from vLLM

Signed-off-by: Tessa Pham <[email protected]>

more components for OpenAI endpoints

Signed-off-by: Tessa Pham <[email protected]>

add OpenAI endpoints to router

Signed-off-by: Tessa Pham <[email protected]>

modify generate() in data plane

Signed-off-by: Tessa Pham <[email protected]>

class OpenAIModel

Signed-off-by: Tessa Pham <[email protected]>

delete and rename files

Signed-off-by: Tessa Pham <[email protected]>

add create_chat_completion() to OpenAIModel

Signed-off-by: Tessa Pham <[email protected]>

update routers and lint

Signed-off-by: Tessa Pham <[email protected]>

* Implement streaming

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Register OpenAI endpoints when appropriate

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Remove completion types from dataplane methods

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Add OpenAI endpoint support to huggingfaceserver

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Allow accessing headers and response in completion methods

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Create separate model for completion and chat completion requests

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Add stop function for handling model shutdown

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Add arg for remote code param

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Add option to allow selecting model backend

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Pin ray to 2.10.x

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Use correct type in tests

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Refactor encoder-decoder and decoder only models into separate classes.

Fix tests.

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Add more test cases. Factor models out into fixtures.

Pass loop as argument to the background request handler.

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Remove unnecessary None check

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Properly handle unsupported models.

Don't try to load table question answering models as they are not
supported.

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Remove models we don't support

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Pass in predictor config

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Fix test assertion. Remove debug lines

Signed-off-by: Curtis Maddalozzo <[email protected]>

---------

Signed-off-by: Tessa Pham <[email protected]>
Signed-off-by: Curtis Maddalozzo <[email protected]>
Co-authored-by: Tessa Pham <[email protected]>
* build: Remove misleading logs from minimal-crdgen.sh

Signed-off-by: Yuan Tang <[email protected]>

* Add file

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
* Fix v2 predict for hf

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add e2e test for hf

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix post processing and e2e image build

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Increase memory limit

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix output for v2

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add more tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Reduce parallelism

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Use backend argument

Signed-off-by: Dan Sun <[email protected]>

* Update to use chat completion endpoint

Signed-off-by: Dan Sun <[email protected]>

* Fix openai tests

Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
…erver (#3594)

* set SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY

Signed-off-by: Lize Cai <[email protected]>

* add doc on the default value

Signed-off-by: Lize Cai <[email protected]>

---------

Signed-off-by: Lize Cai <[email protected]>
Signed-off-by: Curtis Maddalozzo <[email protected]>
… backend (#3657)

* Assign device of input tensors

Signed-off-by: sailgpu <[email protected]>

* lint fix

Signed-off-by: sailgpu <[email protected]>

---------

Signed-off-by: sailgpu <[email protected]>
* Test image builds for ARM64 arch in CI

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Update lockfiles

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add ARM64 support for paddle

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Encoder-decoder models do not include input tokens in their output

Signed-off-by: Curtis Maddalozzo <[email protected]>

* Pass stopping criteria into streamer

Signed-off-by: Curtis Maddalozzo <[email protected]>

---------

Signed-off-by: Curtis Maddalozzo <[email protected]>
* Added the field AdditionalIngressDomains into the struct IngressConfig

Signed-off-by: Vincent Hou <[email protected]>

* Added the additional ingress domains into the hosts

Signed-off-by: Vincent Hou <[email protected]>

* Fixed the indentation

Signed-off-by: Vincent Hou <[email protected]>

* Added isvc name and namespace into the domain name

* Added the validation for the URLs

Signed-off-by: Vincent Hou <[email protected]>

* Validate the domain in the additionalIngressDomains

Signed-off-by: Vincent Hou <[email protected]>

* Create the hosts from the list of additionalIngressDomains

Signed-off-by: Vincent Hou <[email protected]>

* Change the way to validate the host

Signed-off-by: Vincent Hou <[email protected]>

* Change the validation error message

Signed-off-by: Vincent Hou <[email protected]>

* Revert the name to url

Signed-off-by: Vincent Hou <[email protected]>

* Get all the available domain list

Signed-off-by: Vincent Hou <[email protected]>

* gofmt -s -w the file

Signed-off-by: Vincent Hou <[email protected]>

* Add additionalIngressDomains into the charts

Signed-off-by: Vincent Hou <[email protected]>

* Added the comments and refactor the tests

Signed-off-by: Vincent Hou <[email protected]>

* Regenerate the manifests

Signed-off-by: Vincent Hou <[email protected]>

* Modify createHTTPMatchRequest, the charts and the test cases

Signed-off-by: Vincent Hou <[email protected]>

* Run make generate

Signed-off-by: Vincent Hou <[email protected]>

---------

Signed-off-by: Vincent Hou <[email protected]>
cmaddalozzo and others added 14 commits May 6, 2024 16:25
Signed-off-by: Curtis Maddalozzo <[email protected]>
upgrade vllm version

Signed-off-by: Johnu George <[email protected]>
Signed-off-by: Curtis Maddalozzo <[email protected]>
…Fixes #3452 (#3603)

* feat: Support customizable deployment strategy for RawDeployment mode

Signed-off-by: Yuan Tang <[email protected]>

* regen

Signed-off-by: Yuan Tang <[email protected]>

* lint

Signed-off-by: Yuan Tang <[email protected]>

* Correctly apply rollingupdate

Signed-off-by: Yuan Tang <[email protected]>

* address comments

Signed-off-by: Yuan Tang <[email protected]>

* Add validation

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
* Enable dtype for huggingface server

Signed-off-by: Dattu Sharma <[email protected]>

* Set float16 as default. Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Add small comment to make the changes understandable

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Adapt to new huggingfacemodel

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup merge :)

Signed-off-by: Dattu Sharma <[email protected]>

* Explicitly mention the behaviour of dtype flag on auto.

Signed-off-by: Dattu Sharma <[email protected]>

* Default to FP32 for encoder models

Signed-off-by: Dattu Sharma <[email protected]>

* Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Update poetry

Signed-off-by: Dattu Sharma <[email protected]>

* Use torch.float32 for tests explicitly

Signed-off-by: Dattu Sharma <[email protected]>

---------

Signed-off-by: Dattu Sharma <[email protected]>
* fix for extract zip from gcs

Signed-off-by: Andrews Arokiam <[email protected]>

* initial commit for gcs model download unittests

Signed-off-by: Andrews Arokiam <[email protected]>

* unittests for model download from gcs

Signed-off-by: Andrews Arokiam <[email protected]>

* black format fix

Signed-off-by: Andrews Arokiam <[email protected]>

* code verification

Signed-off-by: Andrews Arokiam <[email protected]>

---------

Signed-off-by: Andrews Arokiam <[email protected]>
* update wording for huggingface README

small update to make readme easier to understand

Signed-off-by: Alexa Griffith  <[email protected]>

* Update README.md

Signed-off-by: Alexa Griffith [email protected]

* Update python/huggingfaceserver/README.md

Co-authored-by: Filippe Spolti <[email protected]>
Signed-off-by: Alexa Griffith  <[email protected]>

* update vllm

Signed-off-by: alexagriffith <[email protected]>

* Update README.md

---------

Signed-off-by: Alexa Griffith  <[email protected]>
Signed-off-by: Alexa Griffith [email protected]
Signed-off-by: alexagriffith <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Filippe Spolti <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
* fix: HPA equality check should include annotations

Signed-off-by: Yuan Tang <[email protected]>

* Only watch related autoscalerclass annotation

Signed-off-by: Yuan Tang <[email protected]>

* simplify

Signed-off-by: Yuan Tang <[email protected]>

* Add missing delete action

Signed-off-by: Yuan Tang <[email protected]>

* fix logic

Signed-off-by: Yuan Tang <[email protected]>
---------

Signed-off-by: Yuan Tang <[email protected]>
fix huggingface runtime in chart

Signed-off-by: Dan Sun <[email protected]>
* fix huggingface runtime in chart

Signed-off-by: Dan Sun <[email protected]>

* Allow model_dir to be specified on template

Signed-off-by: Dan Sun <[email protected]>

* Default model_dir to /mnt/models for HF

Signed-off-by: Dan Sun <[email protected]>

* Lint format

Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Dan Sun <[email protected]>
* Fix: vLLM Model Supported check throwing circular dependency

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix return case

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix to check all arch in model config for vllm support

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix lint

Signed-off-by: Gavrish Prabhu <[email protected]>

---------

Signed-off-by: Gavrish Prabhu <[email protected]>
Fix: allow null in Finish reason

Signed-off-by: Gavrish Prabhu <[email protected]>
@openshift-merge-robot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci bot commented May 15, 2024

Hi @pull[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

openshift-ci bot commented May 15, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pull[bot]
Once this PR has been reviewed and has the lgtm label, please assign vedantmahabaleshwarkar for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot

PR needs rebase.

@pull pull bot added the merge-conflict label (Resolve conflicts manually) and removed the needs-rebase label on May 16, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 929471b into opendatahub-io:master May 17, 2024
1 check failed
Jooho pushed a commit to Jooho/kserve that referenced this pull request Jul 19, 2024
…storage-initializer-211

Red Hat Konflux update kserve-storage-initializer-211