feat: added support for audio timestamp understanding to Google Vertex #4061

Open
wants to merge 5 commits into
base: main
from

timconnorz

Changes to have the Google Vertex provider support audioTimestamp understanding

  • Updated the google-cloud/vertex package to latest (v1.9.2) which is required for..
  • Added audioTimestamp to GoogleVertexSettings, which is passed to the GenerationConfig of the sdk

@shaper
Contributor

shaper commented Dec 11, 2024

Hi there, thank you for the contribution! As you may have seen, we recently shipped a 2.0 update to the google-vertex provider:

https://x.com/aisdk/status/1866044262409765270

As part of this we moved to using the Vertex AI Gemini REST API instead of the google-cloud/vertex package. It is likely pretty straightforward to add it using REST instead.

Looking briefly at the example on the page you linked, the submitted audio would be handled as a file attachment, which we already support, so I am not sure we need the cachedContent setting. We would need a way to add "generationConfig": { "audioTimestamp": true } to the request. I think this would require using experimental_providerMetadata to tag the message with the file, and then adding it to the request as needed, either in message conversion or just outside of it. @lgrammel may have further thoughts.
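For reference, here is a rough sketch of what the REST request body might look like with the flag set. The helper name buildAudioRequest, its parameters, and the gs:// URI are made up for illustration and are not part of the provider; the body shape (contents, parts, fileData, generationConfig) is my reading of the Gemini REST format and should be checked against the Vertex AI docs.

```typescript
// Hypothetical sketch: build a generateContent request body that attaches an
// audio file and optionally enables audio timestamp understanding.
// None of these names come from the provider code.

interface FileDataPart {
  fileData: { mimeType: string; fileUri: string };
}

interface TextPart {
  text: string;
}

type Part = FileDataPart | TextPart;

interface GenerateContentRequest {
  contents: Array<{ role: 'user' | 'model'; parts: Part[] }>;
  generationConfig?: { audioTimestamp?: boolean };
}

function buildAudioRequest(
  fileUri: string,
  mimeType: string,
  prompt: string,
  audioTimestamp: boolean,
): GenerateContentRequest {
  return {
    contents: [
      {
        role: 'user',
        // The audio goes in as a file attachment, followed by the text prompt.
        parts: [{ fileData: { mimeType, fileUri } }, { text: prompt }],
      },
    ],
    // Only add generationConfig when the flag is requested.
    ...(audioTimestamp ? { generationConfig: { audioTimestamp: true } } : {}),
  };
}

// Example usage with a placeholder bucket path (an assumption, not a real file).
const exampleBody = buildAudioRequest(
  'gs://example-bucket/sample.mp3',
  'audio/mpeg',
  'Transcribe this clip with timestamps.',
  true,
);
console.log(JSON.stringify(exampleBody, null, 2));
```

Leaving generationConfig out entirely when the flag is off keeps existing requests unchanged.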

We would need unit tests for new logic, demo scripts in examples/ai-core/src/{generate,stream}Text with a sample audio snippet, and added test cases similarly for generate/stream in the examples/ai-core/src/e2e/google-vertex.test.ts file.

If this sounds like a lot, we can put it in our feature request queue; please file an issue or link to one if it already exists.

@timconnorz
Author

timconnorz commented Dec 11, 2024

@shaper I've updated the PR; it's only two edits to support this now! You can use it by passing the audioTimestamp param to the model settings. This settings object is also where you configure other output-affecting parameters like structuredOutputs, safetySettings, etc., so I figured it made sense for it to live here.
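Roughly, the idea is that the setting is forwarded into the generation config alongside the per-call options. A simplified sketch of that mapping follows; the type and function names here (VertexSettingsSketch, CallOptionsSketch, toGenerationConfig) are hypothetical stand-ins and not the actual provider code.

```typescript
// Hypothetical sketch of forwarding a model-level setting into generationConfig.
// These interfaces are simplified stand-ins for the real settings types.

interface VertexSettingsSketch {
  audioTimestamp?: boolean;
}

interface CallOptionsSketch {
  temperature?: number;
  maxTokens?: number;
}

interface GenerationConfigSketch {
  temperature?: number;
  maxOutputTokens?: number;
  audioTimestamp?: boolean;
}

function toGenerationConfig(
  settings: VertexSettingsSketch,
  options: CallOptionsSketch,
): GenerationConfigSketch {
  return {
    temperature: options.temperature,
    maxOutputTokens: options.maxTokens,
    // Undefined values are dropped when the request is serialized with
    // JSON.stringify, so an unset flag never reaches the API.
    audioTimestamp: settings.audioTimestamp,
  };
}

const config = toGenerationConfig({ audioTimestamp: true }, { temperature: 0.2 });
console.log(JSON.stringify(config));
```

Because the flag lives on the settings object, it applies to every call made with that model instance rather than to a single message.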

image

@lgrammel
Collaborator

LGTM. We would need an example under examples/ai-core to see how this works, a changeset (patch release), and updated docs for vertex.

@timconnorz
Author

@lgrammel I've added an example, updated the docs, and added a changeset file. Let me know if this is satisfactory! Thanks for your guidance 😎

3 participants