Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bedrock): add multimodal support for documents, images and videos #1888

Conversation

tzolov
Copy link
Contributor

@tzolov tzolov commented Dec 7, 2024

  • Add BedrockMediaFormat class to handle media format conversions for documents, images and videos
  • Enhance Media class with builder pattern and comprehensive format constants
  • Refactor BedrockProxyChatModel to support multimodal content handling
  • Add integration tests for PDF, image and video processing
  • Add unit tests for Media and BedrockMediaFormat classes
  • Upgrade AWS SDK version from 2.26.7 to 2.29.29
  • Remove redundant aws.sdk.version property in favor of awssdk.version
  • Documentation updates for Bedrock Converse API
    • Added multimodal support documentation (images, video, documents)
    • Added deprecation notices for existing Bedrock model implementations
    • Updated feature comparison table
    • Added warning notes about transitioning to Converse API

Resolves #1887

- Add BedrockMediaFormat class to handle media format conversions for documents, images and videos
- Enhance Media class with builder pattern and comprehensive format constants
- Refactor BedrockProxyChatModel to support multimodal content handling
- Add integration tests for PDF, image and video processing
- Add unit tests for Media and BedrockMediaFormat classes
- Upgrade AWS SDK version from 2.26.7 to 2.29.29
- Remove redundant aws.sdk.version property in favor of awssdk.version

Resolves spring-projects#1887
- Added multimodal support documentation (images, video, documents)
- Added deprecation notices for existing Bedrock model implementations
- Updated feature comparison table
- Added warning notes about transitioning to Converse API
public Builder data(URL url) {
Assert.notNull(url, "URL must not be null");
this.data = url.toString();
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra semi

*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add since on merge.

}

public Media build() {
return new Media(mimeType, data, id, name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should verify that required fields are set, I presume mimeType and media are required but there can be defaults generated for id and name. The default for no media name shouldn't be nope and it seems like id could be null? It is retrieved from the model response for the Audio response.

* @author Christian Tzolov
* @since 1.0.0
*/
public class BedrockMediaFormat {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make abstract

* @author Christian Tzolov
*/
@SpringBootTest(classes = BedrockNovaChatClientIT.Config.class)
@EnabledIfEnvironmentVariable(named = "AWS_ACCESS_KEY_ID", matches = ".*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use @RequiresAwsCredentials meta annotation instead


throw new IllegalArgumentException("Unknown location: " + request.location());
})
.inputType(WeatherRequest.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added .description("Get the weather for a city in Celsius")

note, that the description we normally use for this didn't work .description("Get the weather in location")

@@ -354,6 +361,91 @@ else if (prompt.getOptions() instanceof ChatOptions) {
.build();
}

private ContentBlock mapMediaToContentBlock(Media media) {

var mimeType = media.getMimeType();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should look to break up this multi-screen function into if/else structure with additional helper methods.

@markpollack
Copy link
Member

Did minor cleanup. I ran all the aws tests, made some small changes to have the tests be less brittle. getWeather function description that used to work, isn't working any more on nova.

@markpollack
Copy link
Member

so cool to see this.

merged in 51261e4 but may not get a green build in CI as the there is something going on the the Ollama tests.

@markpollack markpollack closed this Dec 9, 2024
@tzolov
Copy link
Contributor Author

tzolov commented Dec 9, 2024

I'm re-opening because I don't see the code merged in main

@tzolov tzolov reopened this Dec 9, 2024
@markpollack
Copy link
Member

now really merged! 9a5d61c

Made only id in Media nullable, kept two arg ctors at they are more terse than using the builder pattern. Added more tests for Media to ensure builder and ctor behave the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add multimodal support to Bedrock Converse integration
2 participants