feat: added support for audio timestamp understanding to Google Vertex #4061

Open
wants to merge 5 commits into
base: main
5 changes: 5 additions & 0 deletions .changeset/poor-apples-punch.md
@@ -0,0 +1,5 @@
---
'@ai-sdk/google': patch
---

feat: adding audioTimestamp support to GoogleGenerativeAISettings
7 changes: 7 additions & 0 deletions content/providers/01-ai-sdk-providers/11-google-vertex.mdx
@@ -294,6 +294,13 @@ The following optional settings are available for Google Vertex models:

  Optional. When enabled, the model will [use Google search to ground the response](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview).

- **audioTimestamp** _boolean_

  Optional. Enables timestamp understanding for audio files. Defaults to false.

  This is useful for generating transcripts with accurate timestamps.
  Only available for `gemini-1.5-pro-002` and `gemini-1.5-flash-002`.

You can use Google Vertex language models to generate text with the `generateText` function:

19 changes: 19 additions & 0 deletions examples/ai-core/src/e2e/google-vertex.test.ts
@@ -392,6 +392,25 @@ describe.each(Object.values(RUNTIME_VARIANTS))(
        expect(result.text.toLowerCase()).toContain('cat');
        expect(result.usage?.totalTokens).toBeGreaterThan(0);
      });

      it('should generate text from audio input', { timeout: LONG_TEST_MILLIS }, async () => {
        const model = vertex(modelId);
        const result = await generateText({
          model,
          messages: [
            {
              role: 'user',
              content: [
                {
                  type: 'text',
                  text: 'Output a transcript of spoken words. Break up transcript lines when there are pauses. Include timestamps in the format of HH:MM:SS.SSS.',
                },
                {
                  type: 'file',
                  data: Buffer.from(fs.readFileSync('./data/galileo.mp3')),
                  mimeType: 'audio/mpeg',
                },
              ],
            },
          ],
        });
        expect(result.text).toBeTruthy();
        expect(result.text.toLowerCase()).toContain('galileo');
        expect(result.usage?.totalTokens).toBeGreaterThan(0);
      });
    });

describe.each(MODEL_VARIANTS.embedding)('Embedding Model: %s', modelId => {
30 changes: 30 additions & 0 deletions examples/ai-core/src/generate-text/google-vertex-audio.ts
@@ -0,0 +1,30 @@
import { vertex } from '@ai-sdk/google-vertex';
import { generateText } from 'ai';
import 'dotenv/config';
import fs from 'node:fs';

async function main() {
  const result = await generateText({
    model: vertex('gemini-1.5-flash', { audioTimestamp: true }),
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Output a transcript of spoken words. Break up transcript lines when there are pauses. Include timestamps in the format of HH:MM:SS.SSS.',
          },
          {
            type: 'file',
            data: Buffer.from(fs.readFileSync('./data/galileo.mp3')),
            mimeType: 'audio/mpeg',
          },
        ],
      },
    ],
  });

  console.log(result.text);
}

main().catch(console.error);
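
The example above asks the model for timestamps in `HH:MM:SS.SSS` format. A minimal sketch of how such a transcript might be post-processed is shown here. It assumes each transcript line begins with a timestamp, optionally wrapped in square brackets; the model's output format is not guaranteed, and `TranscriptLine` and `parseTranscript` are hypothetical names that are not part of the AI SDK or this PR.

```ts
// Hypothetical helper for illustration; not part of the AI SDK or this PR.
// Assumption: each transcript line starts with HH:MM:SS.SSS, optionally in [brackets].

interface TranscriptLine {
  startMs: number; // offset from the start of the audio, in milliseconds
  text: string;
}

const TIMESTAMP_LINE = /^\[?(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]?\s*(.*)$/;

function parseTranscript(transcript: string): TranscriptLine[] {
  const lines: TranscriptLine[] = [];
  for (const raw of transcript.split('\n')) {
    const match = TIMESTAMP_LINE.exec(raw.trim());
    if (match === null) continue; // skip lines without a leading timestamp
    const hours = Number(match[1]);
    const minutes = Number(match[2]);
    const seconds = Number(match[3]);
    const millis = Number(match[4]);
    const startMs = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    lines.push({ startMs, text: (match[5] ?? '').trim() });
  }
  return lines;
}
```

Lines that do not match the assumed pattern are skipped rather than treated as errors, since the model may emit headings or blank lines around the transcript.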
1 change: 1 addition & 0 deletions packages/google/src/google-generative-ai-language-model.ts
@@ -109,6 +109,7 @@ export class GoogleGenerativeAILanguageModel implements LanguageModelV1 {
          this.supportsStructuredOutputs
            ? convertJSONSchemaToOpenAPISchema(responseFormat.schema)
            : undefined,
      audioTimestamp: this.settings.audioTimestamp,
    };

    const { contents, systemInstruction } =
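
The one-line change in this hunk forwards the optional setting into the generation config object. A minimal sketch of that pass-through pattern follows, using simplified stand-in types: `SketchSettings`, `SketchGenerationConfig`, and `buildGenerationConfig` are hypothetical and are not the package's actual code. In this sketch, `JSON.stringify` omits properties whose value is `undefined`, so leaving the setting unset produces the same serialized object as before the field existed.

```ts
// Simplified stand-ins for illustration; not the actual @ai-sdk/google types.
interface SketchSettings {
  audioTimestamp?: boolean | undefined;
}

interface SketchGenerationConfig {
  temperature?: number | undefined;
  audioTimestamp?: boolean | undefined;
}

function buildGenerationConfig(
  settings: SketchSettings,
  temperature?: number,
): SketchGenerationConfig {
  return {
    temperature,
    // Forwarded as-is; stays undefined when the caller did not set it.
    audioTimestamp: settings.audioTimestamp,
  };
}

// Serialize both variants to compare what would end up in a JSON body.
const withFlag = JSON.stringify(buildGenerationConfig({ audioTimestamp: true }, 0));
const withoutFlag = JSON.stringify(buildGenerationConfig({}, 0));
```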
9 changes: 9 additions & 0 deletions packages/google/src/google-generative-ai-settings.ts
@@ -55,6 +55,15 @@ Optional. A list of unique safety settings for blocking unsafe content.
      | 'BLOCK_ONLY_HIGH'
      | 'BLOCK_NONE';
  }>;
  /**
   * Optional. Enables timestamp understanding for audio-only files.
   * This is a preview feature.
   *
   * Available for the following models:
   * - gemini-1.5-pro-002
   * - gemini-1.5-flash-002
   */
  audioTimestamp?: boolean;

/**
Optional. When enabled, the model will use Google search to ground the response.
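
The `audioTimestamp` doc comment in this hunk lists two models that support the setting. A caller might want to guard against enabling it for other models; a minimal sketch follows, assuming the list in the comment is complete. `AUDIO_TIMESTAMP_MODELS` and `resolveAudioTimestamp` are hypothetical names, and nothing in this PR indicates that the SDK itself performs such a check.

```ts
// Hypothetical guard for illustration; the model list is copied from the doc comment in this PR.
const AUDIO_TIMESTAMP_MODELS: ReadonlySet<string> = new Set([
  'gemini-1.5-pro-002',
  'gemini-1.5-flash-002',
]);

function resolveAudioTimestamp(
  modelId: string,
  requested: boolean | undefined,
): boolean | undefined {
  // Unset or false needs no validation and is passed through unchanged.
  if (requested !== true) return requested;
  if (!AUDIO_TIMESTAMP_MODELS.has(modelId)) {
    throw new Error(
      `audioTimestamp is not listed as supported for model "${modelId}"`,
    );
  }
  return true;
}
```

Throwing is one possible design choice; logging a warning and passing the value through would be a softer alternative if the supported list is expected to grow.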