
Support generation configuration for LLM #62

Open
Tracked by #204
RyanMarten opened this issue Nov 12, 2024 · 5 comments

RyanMarten commented Nov 12, 2024

e.g.

top_p
temperature
etc.

OpenAI completion parameters
https://platform.openai.com/docs/api-reference/chat/create
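
For reference, a minimal sketch of how these parameters are passed to the OpenAI chat completions endpoint (the model name and values are just illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative generation parameters this issue is about exposing.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    temperature=0.7,  # sampling temperature
    top_p=0.9,        # nucleus sampling
    max_tokens=256,   # cap on generated tokens
)
print(response.choices[0].message.content)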

RyanMarten self-assigned this Nov 12, 2024
RyanMarten commented Nov 12, 2024

https://docs.litellm.ai/docs/completion/input

LiteLLM very nicely tracks the supported OpenAI params for any model + provider:

litellm.get_supported_openai_params()
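
For example (the model name is illustrative; the function is LiteLLM's):

import litellm

# Ask LiteLLM which OpenAI-style params a given model/provider accepts.
supported = litellm.get_supported_openai_params(model="gpt-4o-mini")
print(supported)  # e.g. ["temperature", "top_p", "max_tokens", ...]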

Nice system for dealing with unsupported params:

By default, LiteLLM raises an exception if the openai param being passed in isn't supported.
To drop the param instead, set litellm.drop_params = True or completion(..drop_params=True).
This ONLY DROPS UNSUPPORTED OPENAI PARAMS.
LiteLLM assumes any non-openai param is provider specific and passes it in as a kwarg in the request body
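
A minimal sketch of that behavior (the global flag and the per-call drop_params kwarg are both from the LiteLLM docs; the model and param here are just illustrative):

import litellm

# Option 1: drop unsupported OpenAI params globally instead of raising.
litellm.drop_params = True

# Option 2: drop them for a single call only.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    top_p=0.9,        # silently dropped if this model/provider doesn't support it
    drop_params=True,
)

For reference, the full litellm.completion signature: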

def completion(
    model: str,
    messages: List = [],
    # Optional OpenAI params
    timeout: Optional[Union[float, int]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # openai v1.0+ new params
    response_format: Optional[dict] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[str] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    deployment_id=None,
    # soon to be deprecated params by OpenAI
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    # set api_base, api_version, api_key
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,  # pass in a list of api_base,keys, etc.
    # Optional liteLLM function params
    **kwargs,
) -> ModelResponse:

RyanMarten commented Nov 12, 2024

Based on litellm: https://docs.litellm.ai/docs/completion/input#input-params-1

The list is also in the code here: https://github.com/BerriAI/litellm/blob/main/litellm/main.py#L843

We will support:

    model: str,
    messages: List = [],
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[str] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    # set api_base, api_version, api_key
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,

Right now we default to structured output { "type": "json_schema", "json_schema": {...} } instead of json output { "type": "json_object" }. If we want to support both, we need to change the way we are doing it. See the API reference.

    response_format: Optional[dict] = None,
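
For context, the two response_format shapes referred to above (the field names follow the OpenAI API reference; the schema contents are illustrative):

# Structured output (our current default): the model must follow the given schema.
structured = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",  # illustrative schema
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        },
    },
}

# JSON mode: the model returns valid JSON, but no particular schema is enforced.
json_mode = {"type": "json_object"}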

We can support these later if we choose, but won't include them now:

    user: Optional[str] = None,
    deployment_id=None,

We will not support:

    timeout: Optional[Union[float, int]] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,

Others to think about:

    model_list: Optional[list] = None,  # pass in a list of api_base,keys, etc.
    # Optional liteLLM function params
    **kwargs,

Optional LiteLLM function params that look interesting:

input_cost_per_token: float (optional) - The cost per input token for the completion call
output_cost_per_token: float (optional) - The cost per output token for the completion call
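
A quick sketch of how those could be passed (both kwargs are listed in the LiteLLM input docs; the prices and model are illustrative):

import litellm

# Override per-token pricing so LiteLLM's cost tracking matches a custom deployment.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    input_cost_per_token=1.5e-7,   # illustrative price per input token
    output_cost_per_token=6.0e-7,  # illustrative price per output token
)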

RyanMarten commented Nov 12, 2024

https://docs.litellm.ai/docs/completion/batching#send-multiple-completion-calls-to-1-model

LiteLLM also supports passing messages as a list of lists, which sends multiple completions to one model.

We should benchmark this against running a thread pool over async completion calls with LiteLLM.

EDIT: added to #74
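
A minimal sketch of the list-of-lists batching call mentioned above (as I read those docs it goes through litellm.batch_completion; the model and prompts are illustrative):

import litellm

# Send several prompts to one model in a single call by passing a list of
# message lists, per the LiteLLM batching docs.
responses = litellm.batch_completion(
    model="gpt-4o-mini",
    messages=[
        [{"role": "user", "content": "What is 2 + 2?"}],
        [{"role": "user", "content": "Name a prime number."}],
    ],
)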

@RyanMarten

Lessons learned from #77

  1. It doesn't make sense to thread all of these args through all the functions. Instead, accept kwargs and have a programmatic way of adding them to the request body.

  2. Add these args to the cache fingerprint.

  3. Do we want to add all of these to GenericRequest? We can, but we don't have to.

Instead, we can just pass the kwargs to the constructor of the RequestProcessor and then use:

from litellm import get_supported_openai_params
supported_params = get_supported_openai_params(model="anthropic.claude-3", custom_llm_provider="bedrock")

Then, if a kwarg is in supported_params, we can add it to the request body.

We can also share this request-body-building code between the OpenAI batch and online processors so we don't have to duplicate it.
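
A rough sketch of that pattern (RequestProcessor, its constructor, and the filtering are hypothetical; only get_supported_openai_params is LiteLLM's):

from typing import Optional

from litellm import get_supported_openai_params


class RequestProcessor:
    """Hypothetical sketch: accept arbitrary generation kwargs and keep only
    the ones the target model/provider supports."""

    def __init__(self, model: str, custom_llm_provider: Optional[str] = None, **generation_kwargs):
        self.model = model
        supported = get_supported_openai_params(
            model=model, custom_llm_provider=custom_llm_provider
        ) or []
        # Only supported OpenAI-style params make it into the request body.
        self.generation_kwargs = {k: v for k, v in generation_kwargs.items() if k in supported}

    def build_request_body(self, messages: list) -> dict:
        # Shared between the batch and online processors so it isn't duplicated.
        return {"model": self.model, "messages": messages, **self.generation_kwargs}


# Usage: unsupported kwargs are filtered out before building the request body.
processor = RequestProcessor("gpt-4o-mini", temperature=0.7, top_p=0.9)
body = processor.build_request_body([{"role": "user", "content": "Hello"}])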

@RyanMarten

There is also the litellm.OpenAIConfig object: https://docs.litellm.ai/docs/completion/provider_specific_params
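
From that page, provider-level defaults can apparently be set once via the config object, roughly like this (the value and model are illustrative):

import litellm

# Set a provider-level default once; subsequent calls to that provider pick it up.
litellm.OpenAIConfig(max_tokens=256)

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)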
