Llama stream multiple models supported #3505
Conversation
This is fantastic, but I think it's causing some merge conflicts with some of your earlier changes. Can you rebase off `main`?
Or you can provide multiple messages. Nb. The default is for messages to be submitted in Llama2 format, if you are using a different backend model with `node-llama-cpp` then it is possible to specify the model format to use using the `streamingModel` option. The supported formats are:
- `llama2` - A Llama2 model, this is the default.
- `chatML` - A ChatML model
- `falcon` - A Falcon model
Why are the indents different for each?
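To make the documented option concrete, here is a minimal usage sketch based on the description above; the import paths, model path, and message classes are assumptions about the LangChain.js version this PR targets, not code from the PR itself:

```typescript
import { LlamaCpp } from "langchain/llms/llama_cpp";
import { SystemMessage, HumanMessage } from "langchain/schema";

// Assumed local model path; adjust to your environment.
const model = new LlamaCpp({ modelPath: "/path/to/model.gguf" });

// `streamingModel` is a call option, so it is passed per invocation.
// It defaults to "llama2" when omitted.
const stream = await model.stream(
  [
    new SystemMessage("You are a pirate."),
    new HumanMessage("Where is the treasure buried?"),
  ],
  { streamingModel: "chatML" }
);

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```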
```
@@ -29,8 +30,14 @@ export interface LlamaCppInputs
export interface LlamaCppCallOptions extends BaseLanguageModelCallOptions {
  /** The maximum number of tokens the response should contain. */
  maxTokens?: number;
  /** A function called when matching the provided token array */
  onToken?: (tokens: number[]) => void;
```
This is not going to be backwards compatible. Instead, we should mark it as deprecated.
The `onToken` callback was included in error and was not actually applied, as the eventual implementation used a raw call. I can mark it as deprecated and unused if preferred.
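If the deprecation route is taken, a minimal sketch of how that could look; the doc-comment wording and the import path are assumptions, not the PR's final code:

```typescript
import { BaseLanguageModelCallOptions } from "langchain/base_language";

export interface LlamaCppCallOptions extends BaseLanguageModelCallOptions {
  /** The maximum number of tokens the response should contain. */
  maxTokens?: number;
  /**
   * @deprecated Included in error and never applied; the streaming
   * implementation uses a raw call instead. Kept only for backwards
   * compatibility.
   */
  onToken?: (tokens: number[]) => void;
}
```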
```
   * 'falcon' - A Falcon model
   * 'genral' - Any other model, uses "### Human\n", "### Assistant\n" format
   */
  streamingModel?: string;
```
Use a union type or a check in the constructor that verifies the input is one of these?
Sounds like a good idea, I'll move this up to the constructor.
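For illustration, a minimal sketch of the two suggestions combined; the type alias, constant, and class names here are made up for the example and are not the PR's actual identifiers:

```typescript
// A union type narrows the option at compile time...
export type LlamaStreamingModel = "llama2" | "chatML" | "falcon" | "general";

// ...and a constructor check rejects bad values at runtime.
const SUPPORTED_STREAMING_MODELS: readonly string[] = [
  "llama2",
  "chatML",
  "falcon",
  "general",
];

class LlamaCppSketch {
  streamingModel: LlamaStreamingModel;

  constructor(inputs: { streamingModel?: string }) {
    const format = inputs.streamingModel ?? "llama2";
    if (!SUPPORTED_STREAMING_MODELS.includes(format)) {
      throw new Error(
        `Unsupported streamingModel "${format}"; expected one of: ${SUPPORTED_STREAMING_MODELS.join(", ")}.`
      );
    }
    this.streamingModel = format as LlamaStreamingModel;
  }
}
```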
```
  expect(chunks.length).toBeGreaterThan(1);
});

test.skip("test multi-mesage streaming call", async () => {
```
FYI, in case you were unaware, we don't run int tests in CI, so no need to add `skip`.
Thx.
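For context, a rough sketch of what such a multi-message streaming test might look like in full; the model path, message construction, and import paths are assumptions rather than the PR's actual test body:

```typescript
import { test, expect } from "@jest/globals";
import { LlamaCpp } from "langchain/llms/llama_cpp";
import { SystemMessage, HumanMessage } from "langchain/schema";

// Assumed: points at a local GGUF model file on the test machine.
const llamaPath = "/path/to/model.gguf";

test("test multi-message streaming call", async () => {
  const model = new LlamaCpp({ modelPath: llamaPath, temperature: 0.7 });

  const stream = await model.stream(
    [
      new SystemMessage("You are a helpful assistant."),
      new HumanMessage("Tell me a short story about a happy llama."),
    ],
    { streamingModel: "llama2" }
  );

  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }

  // Streaming should produce more than one chunk.
  expect(chunks.length).toBeGreaterThan(1);
});
```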
Overall looks good! I have a few comments, please re-request my review once resolved!
Resubmitted based off main under #3588.
This adds the `streamingModel` option to the streaming methods' properties; it can be set to `llama2`, `chatML`, `falcon`, or `general`. This then turns messages passed to the streaming model into the appropriate format for that model. If no option is specified, it defaults to `llama2`.
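To illustrate what "turns messages into the appropriate format" means in practice, here is a minimal sketch of per-format prompt wrapping; the helper name and the exact template strings are assumptions and are not taken from the PR's implementation:

```typescript
type StreamingModelFormat = "llama2" | "chatML" | "falcon" | "general";

interface ChatTurn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: wraps one chat turn in the prompt template
// commonly associated with each backend model family.
function formatTurn(turn: ChatTurn, format: StreamingModelFormat): string {
  switch (format) {
    case "llama2":
      // Llama2-style instruction wrapping for user turns.
      return turn.role === "user" ? `[INST] ${turn.content} [/INST]` : turn.content;
    case "chatML":
      // ChatML-style delimited turns.
      return `<|im_start|>${turn.role}\n${turn.content}<|im_end|>\n`;
    case "falcon":
      // Falcon-style "User:" / "Assistant:" turns.
      return `${turn.role === "user" ? "User" : "Assistant"}: ${turn.content}\n`;
    case "general":
      // Generic fallback using "### Human" / "### Assistant" markers.
      return `### ${turn.role === "user" ? "Human" : "Assistant"}\n${turn.content}\n`;
  }
}

// Example: format a two-turn conversation for a ChatML model.
const prompt = [
  { role: "system" as const, content: "You are a helpful assistant." },
  { role: "user" as const, content: "Hello!" },
]
  .map((turn) => formatTurn(turn, "chatML"))
  .join("");
```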