[Feature] Local LLM Support #53
I personally use LM Studio for my local LLM server and would love to use it with this project as well. Hopefully the dev can use this example Python code for future development (LM Studio exposes an OpenAI-compatible endpoint, so the existing OpenAI setup can be reused):

```python
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    # … (model and messages arguments elided)
)

print(completion.choices[0].message)
```
We're about to add Ollama support to LlamaIndexTS first (see run-llama/LlamaIndexTS#305); then it could be used in chat-llamaindex.
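For reference, a minimal sketch of what wiring up that Ollama class could look like, based on the constructor options that appear in the patches later in this thread (the exact signature is an assumption):

```ts
// Sketch only: assumes the LlamaIndexTS Ollama class accepts these options,
// matching the ones used in the patches further down this thread.
import { Ollama, serviceContextFromDefaults } from "llamaindex";

const llm = new Ollama({
  model: "mistral",                  // any model already pulled into the local Ollama instance
  baseURL: "http://localhost:11434", // default local Ollama endpoint
  temperature: 0.5,
});

// Plug the local LLM into the service context used by the chat engine.
const serviceContext = serviceContextFromDefaults({ llm });
```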
Some things are still missing for the Ollama class.
I've managed to make it work with LiteLLM (and Ollama behind it) by setting:

```diff
diff --git app/client/platforms/llm.ts app/client/platforms/llm.ts
index ddc316f..a907ee7 100644
--- app/client/platforms/llm.ts
+++ app/client/platforms/llm.ts
@@ -36,6 +36,9 @@ export const ALL_MODELS = [
"gpt-4-vision-preview",
"gpt-3.5-turbo",
"gpt-3.5-turbo-16k",
+ "mixtral_default",
+ "mistral",
+ "phi",
] as const;
export type ModelType = (typeof ALL_MODELS)[number];
```

And then creating a new bot using one of those models and adjusting its params.
@m0wer thanks! Cool hack! Yes […]
Thanks @marcusschiesser! But I found additional problems when uploading files and building for production. In LlamaIndexTS there is a list of valid OpenAI model names, so the type check fails. Maybe renaming mixtral as gpt-4 in LiteLLM would do the trick. I needed to do the […]. Now I don't know if I should go in the direction of extending the Ollama class or doing the rename in LiteLLM, to just be able to reuse most of the current chat-llamaindex code. Any advice or ideas are welcome ;-)
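For what it's worth, the rename route might look roughly like the sketch below. The `additionalSessionOptions.baseURL` override and the LiteLLM proxy address are assumptions, not something confirmed in this thread:

```ts
// Sketch only: reuse the existing OpenAI class, but route requests to LiteLLM,
// which serves the local model under an official OpenAI model name so the
// type check passes. additionalSessionOptions/baseURL and the proxy URL are assumptions.
import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4", // actually answered by LiteLLM -> Ollama behind the proxy
  temperature: 0.5,
  additionalSessionOptions: {
    baseURL: "http://localhost:8000/v1", // assumed LiteLLM proxy address
  },
});
```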
|
Yes, that works. But what I'm trying to do is to get the complete https://chat.llamaindex.ai/ to work with local LLMs. So either I fake the OpenAI API with LiteLLM, or I extend the Ollama class from LlamaIndexTS to support the missing methods.
To make it work with Ollama, I had to adapt my reverse proxy settings and make the following changes:

```diff
diff --git app/api/llm/route.ts app/api/llm/route.ts
index aa3066c..de9a806 100644
--- app/api/llm/route.ts
+++ app/api/llm/route.ts
@@ -4,7 +4,8 @@ import {
DefaultContextGenerator,
HistoryChatEngine,
IndexDict,
- OpenAI,
+ LLMMetadata,
+ Ollama,
ServiceContext,
SimpleChatHistory,
SummaryChatHistory,
@@ -120,6 +121,33 @@ function createReadableStream(
return responseStream.readable;
}
+class OllamaCustom extends Ollama {
+ maxTokens: number;
+
+ constructor(init: Partial<OllamaCustom> & {
+ model: string;
+ }) {
+ super(init);
+ this.maxTokens = init.maxTokens || 2048;
+ }
+
+ get metadata(): LLMMetadata {
+ return {
+ model: this.model,
+ temperature: this.temperature,
+ topP: this.topP,
+ maxTokens: this.maxTokens,
+ contextWindow: this.contextWindow,
+ tokenizer: undefined,
+ };
+ }
+
+ tokens(messages: ChatMessage[]): number {
+ let tokens = 10;
+ return tokens;
+ }
+}
+
export async function POST(request: NextRequest) {
try {
const body = await request.json();
@@ -146,11 +174,14 @@ export async function POST(request: NextRequest) {
);
}
- const llm = new OpenAI({
+ const llm = new OllamaCustom({
+ baseURL: "https://ollama.mydomain.tld",
model: config.model,
temperature: config.temperature,
topP: config.topP,
+ contextWindow: config.maxTokens,
maxTokens: config.maxTokens,
+ requestTimeout: 5 * 60 * 1000,
});
const serviceContext = serviceContextFromDefaults({
diff --git app/client/platforms/llm.ts app/client/platforms/llm.ts
index ddc316f..33273c9 100644
--- app/client/platforms/llm.ts
+++ app/client/platforms/llm.ts
@@ -31,11 +31,11 @@ export interface ResponseMessage {
}
export const ALL_MODELS = [
- "gpt-4",
- "gpt-4-1106-preview",
- "gpt-4-vision-preview",
- "gpt-3.5-turbo",
- "gpt-3.5-turbo-16k",
+ "mistral",
+ "mixtral_default",
+ "dolphin-mixtral:8x7b-v2.7-q4_K_M",
+ "llava",
+ "phi",
] as const;
export type ModelType = (typeof ALL_MODELS)[number];
diff --git package.json package.json
index 0ba2c1b..b6dfdfb 100644
--- package.json
+++ package.json
@@ -38,7 +38,7 @@
"dotenv": "^16.3.1",
"emoji-picker-react": "^4.4.12",
"encoding": "^0.1.13",
- "llamaindex": "0.0.0-20231110031459",
+ "llamaindex": "0.0.44",
"lucide-react": "^0.277.0",
"mermaid": "^10.3.1",
"nanoid": "^5.0.2", |
See #77 for how to use Ollama.
Would like to be able to run this with local LLM stacks like LiteLLM or Ollama.
Could you provide a parameter to specify the LLM and base URL?
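For illustration, the kind of parameter being asked for might look like the sketch below; `LLM_PROVIDER` and `LLM_BASE_URL` are hypothetical names, not environment variables chat-llamaindex currently reads:

```ts
// Hypothetical configuration hook: pick the LLM backend and base URL from the
// environment instead of hard-coding OpenAI. Variable names are made up.
import { Ollama, OpenAI } from "llamaindex";

export function createLLM(model: string, temperature?: number) {
  if (process.env.LLM_PROVIDER === "ollama") {
    return new Ollama({
      model,
      baseURL: process.env.LLM_BASE_URL ?? "http://localhost:11434", // same option as in the patch above
      temperature,
    });
  }
  // Cast needed because the OpenAI class only types official OpenAI model names.
  return new OpenAI({ model: model as any, temperature });
}
```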