Image generation with Dall-e #398
@ceicke I know, right? I'm really excited about tools. I'm adding a bunch of them. I want to turn this project into my own personal Jarvis / Samantha / Computer (Star Trek); pick your favorite sci-fi computer reference. :) On the API key, I already solved this in a branch so I just plucked it into main. Rebase and you can now access Current.user within your tool. commit: 00be174
195ebc4 to 38d6604
Can you share all 4 messages? The one you wrote, the 2 tool-related ones,
and then this was the 4th one, I believe:
…On Thu, Jun 13, 2024 at 8:15 AM Christoph Eicke ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In app/services/toolbox/dalle.rb
<#398 (comment)>:
> @@ -0,0 +1,19 @@
+class Toolbox::Dalle < Toolbox
+
+ describe :generate_an_image, <<~S
+ Generate an image based on what the user asks you to generate. You will pass the user's prompt and will get back a URL to an image.
Ok, I understand what you mean. Here is the Message that I got as a reply:
=> #<Message:0x0000ffff9b0cc6c8
id: 1048909790,
conversation_id: 962582977,
role: "assistant",
content_text:
"Here is an image of a bottle:\n\n![Bottle](https://oaidalleapiprodscus.blob.core.windows.net/private/org-3gmrqp9cIUIbQLYMoIkhGTAb/user-l5OLBM3CmF42GH5rlcxG3eQH/img-P0sNHzxaDyBXtRsqmzI5g1TS.png?st=2024-06-13T11%3A58%3A37Z&se=2024-06-13T13%3A58%3A37Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-06-12T18%3A49%3A07Z&ske=2024-06-13T18%3A49%3A07Z&sks=b&skv=2023-11-03&sig=4nxEJvSJmaIK%2BNnrrn1c%2BiLd3JSkJudSvYhSHiIv9uk%3D)",
created_at: Thu, 13 Jun 2024 12:58:49.176219000 UTC +00:00,
updated_at: Thu, 13 Jun 2024 12:58:56.704132000 UTC +00:00,
content_document_id: nil,
run_id: nil,
assistant_id: 1013349885,
processed_at: Thu, 13 Jun 2024 12:58:49.259744000 UTC +00:00,
index: 3,
version: 2,
cancelled_at: nil,
branched: false,
branched_from_version: nil,
content_tool_calls: {},
tool_call_id: nil,
versions: "">
As you can see, the .content_text contains the Markdown including an
image link. I think this part needs to be extracted, image downloaded and
attached to the Message object like you described and then this part of the
.content_text markup needs to be removed.
Is it like this, or am I missing something?
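For illustration only, the extraction described in this comment could look roughly like the sketch below. It is plain Ruby, not code from the PR: the `MARKDOWN_IMAGE` pattern and the `extract_markdown_images` helper are hypothetical names, and downloading and attaching the image is left out.

```ruby
# Hypothetical pattern: matches Markdown images of the form ![alt](url)
# and captures the URL part.
MARKDOWN_IMAGE = /!\[[^\]]*\]\(([^)\s]+)\)/

# Returns [image_urls, text_without_images] for a message body.
def extract_markdown_images(content_text)
  urls = content_text.scan(MARKDOWN_IMAGE).flatten
  cleaned = content_text.gsub(MARKDOWN_IMAGE, "").strip
  [urls, cleaned]
end

urls, text = extract_markdown_images(
  "Here is an image of a bottle:\n\n![Bottle](https://example.com/bottle.png)"
)
# urls holds the image links; text is the reply with the image markup removed.
```

This only covers the "find the link and strip the markup" half of the idea and assumes the URL contains no spaces or closing parentheses.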
I just pushed my changes... I can't quite get the document attached to the message and I don't understand why... maybe you can spot it. Code is not really nice to look at.
app/models/message.rb
Outdated
document = Document.build(message: self)
p document

image_urls.each do |image_url|
It doesn't make sense yet to iterate over multiple image_urls and then attach only one... WIP WIP WIP :-D
I'm taking a look. But the key thing is that the reason GPT-4o is including the image as markdown is just because it's doing its best to be helpful. We don't want to parse that markdown and grab the image; instead we want to tell GPT not to include the image in markdown because we're going to pre-attach it. I'll over-share a little bit here just in case the additional context helps you. First, I had a simple conversation, 1 message and 1 response: And in console I see there are 4 messages, as I expected. There are 2 hidden tool messages preceding the assistant's reply:
Message 2 of 4 is the API deciding to call a tool, which our system does. And then message 3 of 4 is our system creating the tool's response. This is the key one. This is the serialized hash that your function is returning as json, stored in the content text. Here is the key part:
First, I'll manually attach this image in console just to work through the code:
I just refreshed my conversation and now I see it: Now, how do we do this automatically and get it to skip the markdown? It's all about handling the 3rd message differently. The place this third message got created was this line of code: So here's my suggestion for how to get the image cleanly attached and then tell the AI not to do it as markdown:
@krschacht again, thanks for your patience and guiding me through it. It works fine. Almost. There is an error happening and here is the important part of it:
I redacted a bunch of stuff. I think what is happening is that it's trying to display the |
Hmm, this is tricky. I'm scratching my head on this one. But when I initially wrote this document / file / active storage stuff I never fully wrapped my mind around file variants and how processing of that worked. The root thing that I got stuck on was: Some methods of accessing a file variant (I think the typical methods) are blocking: if the variant doesn't exist it processes it right then and hangs until it's done. I didn't want this. It took me a bunch of trial and error to figure out how to check if a variant is processed in a non-blocking way so that I could show a fallback image / spinner and then just check again on some interval. I got it working, but it was definitely a case where I wrote some code without fully understanding what it was doing, and when I finally got it working I was done staring at that problem. :) I've been wanting to come back to it. I even have a tiny bit of work I did on it in another branch: There is a small diff on this file (linking straight to document.rb) and the one right below it: But that diff is solving a different problem I ran into with images. I don't think it solves the problem you're bumping into... But I'm not sure what problem you're bumping into... :) I think you should put a breakpoint in document.rb right here:
I assume file.attached? and variant.present? are both returning true. But they're somehow lying and the variant isn't fully done? It may take some trial and error on this command The problem with this breakpoint is that there may be a bug at the moment the breakpoint is hit, but by the time you get in console the image may have finished processing and when you execute I'm really not sure how to debug this... Oh, did you try uploading an image yourself into a conversation, through the app's UI? Does that fully work? You should be 100% certain that the existing image upload flow works. Ideally try uploading a really big image (10-20 MB so that it takes a few seconds to process) and confirm for yourself that you initially see a spinner on the image and then later it pops in. Note that there are two places to check: the image appearing in your conversation directly and, when you click the image, a different variant that opens in the modal. We want to make sure there isn't some deeper configuration issue in your environment which is breaking all image processing. Like maybe a corrupted libmagick install or something (I forget what library the app uses). The last thing that occurs to me is: active job is pushing a We could try to re-write the logic of
If image uploads fully work for you when done manually, and if poking around that binding.pry does not yield any clues, then I'd try this solution. I don't love it because it's not actually fixing the problem, it's just sidestepping it. It would be forcing actioncable pushed messages to never render images and then letting the front-end refresh the image. But I still can't imagine why an image variant check would fail in a queue context but work in a controller context.
@@ -103,7 +103,8 @@ def generate_url(key, expires_in:, filename:, disposition:, content_type:)
)

generated_url = url_helpers.rails_postgresql_service_url(verified_key_with_expiration,
**url_options,
# **url_options,
@krschacht had some time today to look into the exception again. It looks like the **url_options are nil. Replacing it with only_path: true solves the issue (at least in development). I am not sure if this is an acceptable fix.
Oh, interesting. That makes sense. I can imagine the URL or host configuration may be different when running in workers. Probably we can just leave that change in place for always. It’s not like it ever needed a full URL.
Pretty sure it will create a problem down the road for larger deployments with a CDN. But that's a future problem then.
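One way to reconcile the two concerns in this thread (nil options in development or workers, full URLs for a CDN later) might be to fall back to a path-only URL only when no URL options are configured. The sketch below is plain Ruby under that assumption; `url_helper_options` is a hypothetical helper and not part of the PR or of Rails.

```ruby
# Hypothetical helper: decide which options to splat into the URL helper.
# Falls back to a path-only URL when no host/protocol options are available,
# and otherwise passes the configured options through unchanged.
def url_helper_options(url_options)
  return { only_path: true } if url_options.nil? || url_options.empty?

  url_options
end

# Nothing configured (the situation reported above): path-only URL.
url_helper_options(nil)

# Host configured (e.g. a CDN deployment): options are kept as-is.
url_helper_options({ host: "cdn.example.com", protocol: "https" })
```

The result would then be splatted in place of `**url_options`, so a deployment that does configure a host keeps generating full URLs.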
Fix missing item separator.
I just converted the PR to a draft to help me stay organized in knowing which ones are waiting on me to review. But feel free to switch it back whenever you're ready.
Hey @krschacht, do you have an idea why the test is failing? I can't figure it out, especially since the test above it basically tests a very similar thing...
Hi @ceicke, I haven’t had a chance to dig in, sorry for the delay! I think I’ll have some time tomorrow AM. I’ve been doing some travel in England so a bit going on, but I’ll be catching up on things soon.
d = Document.new
d.file.attach(io: URI.open(url_of_dalle_generated_image), filename: 'image.png')
assistant_reply.documents << d
end
We should assume that @message.content_tool_calls can have multiple Dalle toolbox calls. This means that within msgs.each do |tool_message| each of those calls will set the url_of_dalle_generated_image, overwriting the previous.
So let's collect multiple url_of_dalle_generated_image into an array so that in this block here we push multiple documents into the assistant_reply.
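The collection step suggested in this review comment could be sketched as follows. This is a standalone approximation, not the PR's code: `ToolMessage` is a stand-in struct for the app's Message model, `collect_image_urls` is a hypothetical helper, and the `"url"` key is an assumption about what the Dalle tool returns in its JSON response.

```ruby
require "json"

# Stand-in for the app's Message model; only the fields used here.
ToolMessage = Struct.new(:role, :content_text)

# Collects one image URL per tool response instead of overwriting a single
# variable. Assumes each tool response is a JSON object stored in
# content_text and that the image is under a "url" key (assumption).
def collect_image_urls(tool_messages)
  tool_messages.filter_map do |tool_message|
    payload = JSON.parse(tool_message.content_text)
    payload["url"] if payload.is_a?(Hash)
  rescue JSON::ParserError
    nil # skip tool responses that are not valid JSON
  end
end

msgs = [
  ToolMessage.new("tool", { url: "https://example.com/one.png" }.to_json),
  ToolMessage.new("tool", { url: "https://example.com/two.png" }.to_json)
]

urls_of_dalle_generated_images = collect_image_urls(msgs)
```

Each collected URL could then be turned into its own document and pushed onto the assistant reply, so two Dalle calls yield two attachments.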
@ceicke I looked closely at the two failing test cases. In document_test.rb line 57, you just need to remove the assertion that the URL starts with "http" since it's now Also, can you mirror the structure of the test
After this, I think we can merge this in!
Co-authored-by: Keith Schacht <[email protected]>
Co-authored-by: Keith Schacht <[email protected]>
@@ -56,6 +65,52 @@ class GetNextAIMessageJobOpenaiTest < ActiveJob::TestCase
refute second_new_message.finished?, "This message SHOULD NOT be considered finished yet"
end

test "properly handles a tool response call from the assistant when images are included" do
@krschacht I tried to implement a test for the job, but I cannot get it to work. It already fails at adding the 2 new messages. Maybe I set up something wrong in the fixtures or in the setup
...
Hello! Any news about the image generation feature? I think this should be more of a priority than tokens, for example. Image generation based on LLM answers is the gold standard for chat apps nowadays. With this feature, hostedgpt will be one of the best open-source solutions for using key AI tools today (LLM and content generation), both individually and as a team. Especially if you also add a mode for simultaneous interaction with multiple neural networks (like in chathub).
Yes, sorry @unameisfine and @ceicke, this was on my plate. Ack. I'll make a point of digging in over the next couple days to unblock this. I know Chris has gotten it really close! Not to get too distracted, but unameisfine, I've never had a desire for simultaneous chatting with multiple LLMs. Can you share more about why that's something you would want to do? I suppose when I'm doing a quick eval of which LLM to use as a developer, but in that case there isn't a huge motivation to set up your own open-source self-hosted version since it's not an on-going use-case. You can just visit https://sdk.vercel.ai/playground or openrouter.ai and do the quick test there.
It can be found in the Chrome plugin Chathub. This feature, which allows chatting with several LLMs at the same time, can be used to compare answers when dealing with difficult questions.
continuing work on this here: #526
@krschacht this is wild... and so easy to implement.
On a more serious note, is this the direction you are looking for?
I would need help to get the current user in the Toolbox::Dalle class. Not sure how I can access it to get the openai_api_key.