Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Enhance Model Identifier Validation Logic to Use Regex for Flexibility and Robustness, in client.py #72

Open
ytiam opened this issue Nov 26, 2024 · 2 comments

Comments

@ytiam
Copy link

ytiam commented Nov 26, 2024

Code block:

if ":" not in model:

The current logic for validating and extracting the provider and model parts from the model string uses a split(":") approach. This method works for basic cases but lacks robustness and flexibility when dealing with complex model names or edge cases. I suggest replacing this logic with a regular expression-based approach, which will allow more precise validation and parsing.

Here’s the proposed step-by-step plan for the enhancement:

  1. Replace the split(":") logic with a regex pattern that ensures the model string adheres to the expected format of provider:model.
  2. The regex should support:
    • Alphanumeric characters (a-z, A-Z, 0-9).
    • Special characters like underscores (_), hyphens (-), and dots (.) in the model part.
    • Valid formats such as google:gemini-1.5-flash-8b and google:gemini_v2.3.
  3. Add meaningful error messages to guide users when the model string format is invalid.
  4. Update or add relevant unit tests to verify the robustness of the new regex-based validation logic.

Current Behavior

Currently, the model validation logic uses split(":") to separate the provider and model components. This approach:

  • Does not validate the format of the model string.
  • Can lead to errors or unexpected behavior if the input string contains extra colons or lacks proper structure.

Example:

Input: google:gemini-1.5-flash-8b
Output: Correctly identifies google as the provider and gemini-1.5-flash-8b as the model.

Input: invalid:bad:format
Output: Results in unintended splits or errors.

Input: missing_colon
Output: Fails without clear feedback to the user.


Expected Behavior

The new regex-based validation logic should:

  1. Accurately validate the format of the model string.
  2. Provide clear and specific error messages when the format is incorrect.
  3. Ensure that only valid formats (e.g., provider:model) are processed.

Example:

  • Input: google:gemini-1.5-flash-8b → Passes validation, correctly identifies parts.
  • Input: invalid:bad:format → Fails with an error: "Invalid model format. Expected 'provider:model', got 'invalid:bad:format'."
  • Input: missing_colon → Fails with an error: "Invalid model format. Expected 'provider:model', got 'missing_colon'."

Why This Enhancement Would Be Useful to Most aisuite Users

  1. Robustness: The enhancement ensures that all inputs conform to the expected format, reducing the risk of unexpected behavior or bugs caused by invalid model strings.
  2. Flexibility: By allowing special characters like -, _, and . in the model names, this enhancement supports a broader range of use cases.
  3. User Experience: Clear and descriptive error messages improve the developer experience, making it easier to debug and use the library.
  4. Alignment with Best Practices: Many modern libraries use regex for input validation as it provides better precision and maintainability.

Inspiration from Other Projects:

  • The proposed enhancement aligns with common standards for handling structured strings.

If approved, I am happy to implement this enhancement and contribute tests to ensure the changes meet the project's standards.

@rohitprasad15
Copy link
Collaborator

Is there a bug that you are reporting? Can you give an example that will fail with the current approach ?

@ytiam
Copy link
Author

ytiam commented Dec 8, 2024

Hi @rohitprasad15 thanks for replying.
I proposed this as an enhancement as the present logic might fail in couple of scenarios. Please see below few such scenarios,

  1. Leading or Trailing Colons

Screenshot from 2024-12-08 11-43-16

or

Screenshot from 2024-12-08 11-46-47

Note: In my opinion, string validation cases are always better to handle with Regex. Please let me know your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants