
[Feature] Local LLM Support #53

Open
rapidarchitect opened this issue Dec 16, 2023 · 10 comments
@rapidarchitect

I would like to be able to run this with local LLM stacks like LiteLLM or Ollama.

Could you provide a parameter to specify the LLM and base URL?

@kagevazquez

I personally use LM Studio for my local LLM server and would love to use it with this as well.

Hopefully the devs can use this example Python code for future development.

# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Introduce yourself."},
    ],
    temperature=0.7,
)

print(completion.choices[0].message)
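
Roughly the same thing in TypeScript, as a minimal sketch assuming the openai v4 Node client (the base URL and dummy key just mirror the LM Studio defaults from the snippet above):

import OpenAI from "openai";

// Point to the local server; the key is a placeholder, as in the Python example.
const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "not-needed",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "local-model", // this field is currently unused
    messages: [
      { role: "system", content: "Always answer in rhymes." },
      { role: "user", content: "Introduce yourself." },
    ],
    temperature: 0.7,
  });

  console.log(completion.choices[0].message);
}

main().catch(console.error);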

@marcusschiesser
Collaborator

We're about to add Ollama support to LlamaIndexTS first, see run-llama/LlamaIndexTS#305 - then it could be used in chat-llamaindex
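
Once that lands, the swap in chat-llamaindex could presumably look something like the sketch below; the constructor options shown (model, baseURL) are assumptions, not the final API:

import { Ollama, serviceContextFromDefaults } from "llamaindex";

// Sketch: drop an Ollama LLM in where the route currently constructs an OpenAI one.
const llm = new Ollama({
  model: "mistral",                  // any model pulled into the local Ollama
  baseURL: "http://localhost:11434", // Ollama's default endpoint
});

// ...then pass this serviceContext to the index/chat engine as today.
const serviceContext = serviceContextFromDefaults({ llm });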

@m0wer

m0wer commented Jan 11, 2024

> We're about to add Ollama support to LlamaIndexTS first, see run-llama/LlamaIndexTS#305 - then it could be used in chat-llamaindex

Some things are missing for the Ollama class to fit into the current implementation: at least the maxTokens metadata entry and the tokens() method.
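
Concretely, chat-llamaindex seems to need roughly the following shape from the LLM; the interface names below are my own paraphrase, not the library's real declarations:

import type { ChatMessage } from "llamaindex";

// Paraphrase of what chat-llamaindex needs from an LLM beyond chat():
interface NeededMetadata {
  model: string;
  temperature: number;
  topP: number;
  maxTokens?: number;  // missing from Ollama's metadata today
  contextWindow: number;
  tokenizer?: unknown; // undefined is accepted (see the diff further down)
}

interface NeededLLMParts {
  metadata: NeededMetadata;
  // Not implemented by Ollama; SummaryChatHistory calls this to decide
  // when the conversation needs to be summarized.
  tokens(messages: ChatMessage[]): number;
}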

@m0wer

m0wer commented Jan 11, 2024

I've managed to make it work with LiteLLM (and Ollama behind it) by setting export OPENAI_BASE_URL=https://litellm.mydomain.tld/v1 and applying this change:

diff --git app/client/platforms/llm.ts app/client/platforms/llm.ts
index ddc316f..a907ee7 100644
--- app/client/platforms/llm.ts
+++ app/client/platforms/llm.ts
@@ -36,6 +36,9 @@ export const ALL_MODELS = [
   "gpt-4-vision-preview",
   "gpt-3.5-turbo",
   "gpt-3.5-turbo-16k",
+  "mixtral_default",
+  "mistral",
+  "phi",
 ] as const;

 export type ModelType = (typeof ALL_MODELS)[number];

Then I created a new bot using one of those models and adjusted its params.

@marcusschiesser
Collaborator

@m0wer thanks, cool hack! Yes, Ollama currently doesn't have tokens() implemented; that's why SummaryChatHistory is not working with it, but SimpleChatHistory should.
You can try setting sendMemory to false for your bot, see:
https://github.com/run-llama/chat-llamaindex/blob/aeee808134a9b267d22d3d48900ba7393e37cdbc/app/api/llm/route.ts#L167C1-L169
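
For context, those lines boil down to roughly this (paraphrased, not the exact source):

import { SimpleChatHistory, SummaryChatHistory } from "llamaindex";

// llm, config and messages are already in scope in route.ts.
// With sendMemory off, SimpleChatHistory is used, which never calls llm.tokens().
const chatHistory = config.sendMemory
  ? new SummaryChatHistory({ llm, messages })
  : new SimpleChatHistory({ messages });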

@m0wer

m0wer commented Jan 12, 2024

Thanks @marcusschiesser! But I found additional problems when uploading files and building for production: in LlamaIndexTS there is a list of valid OpenAI model names, so the type check fails. Maybe renaming mixtral to gpt-4 in LiteLLM would do the trick.
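
That check is presumably a const union of model names, much like ALL_MODELS in chat-llamaindex itself; a made-up illustration of why a custom name breaks the build:

// Made-up names, only to illustrate the kind of check that fails:
const KNOWN_OPENAI_MODELS = ["gpt-4", "gpt-4-1106-preview", "gpt-3.5-turbo"] as const;
type KnownOpenAIModel = (typeof KNOWN_OPENAI_MODELS)[number];

const ok: KnownOpenAIModel = "gpt-4";
// const bad: KnownOpenAIModel = "mixtral_default"; // rejected at compile time
console.log(ok);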

I needed to set sendMemory to false as you said to get it to work, but that was all for the development version.

Now I don't know whether I should go in the direction of extending the Ollama class or do the rename in LiteLLM so that I can reuse most of the current chat-llamaindex code.

Any advice or ideas are welcome ;-)

@marcusschiesser
Collaborator

> In LlamaIndexTS there is a list of valid OpenAI model names, so the type check fails.

Doesn't this example work with mixtral? https://github.com/run-llama/LlamaIndexTS/blob/main/examples/ollama.ts
It should work, as it's using model: string.

@m0wer

m0wer commented Jan 12, 2024

> > In LlamaIndexTS there is a list of valid OpenAI model names, so the type check fails.
>
> Doesn't this example work with mixtral? https://github.com/run-llama/LlamaIndexTS/blob/main/examples/ollama.ts
> It should work, as it's using model: string.

Yes, that works. But what I'm trying to do is get the complete https://chat.llamaindex.ai/ to work with local LLMs. So either I fake the OpenAI API with LiteLLM, or I extend the Ollama class from LlamaIndexTS to support the missing methods.

@m0wer

m0wer commented Jan 13, 2024

To make it work with Ollama, I had to adapt my reverse proxy settings and make the following changes:

diff --git app/api/llm/route.ts app/api/llm/route.ts
index aa3066c..de9a806 100644
--- app/api/llm/route.ts
+++ app/api/llm/route.ts
@@ -4,7 +4,8 @@ import {
   DefaultContextGenerator,
   HistoryChatEngine,
   IndexDict,
-  OpenAI,
+  LLMMetadata,
+  Ollama,
   ServiceContext,
   SimpleChatHistory,
   SummaryChatHistory,
@@ -120,6 +121,33 @@ function createReadableStream(
   return responseStream.readable;
 }
 
+class OllamaCustom extends Ollama {
+  maxTokens: number;
+
+  constructor(init: Partial<OllamaCustom> & {
+    model: string;
+  }) {
+    super(init);
+    this.maxTokens = init.maxTokens || 2048;
+  }
+
+  get metadata(): LLMMetadata {
+    return {
+      model: this.model,
+      temperature: this.temperature,
+      topP: this.topP,
+      maxTokens: this.maxTokens,
+      contextWindow: this.contextWindow,
+      tokenizer: undefined,
+    };
+  }
+
+  tokens(messages: ChatMessage[]): number {
+    let tokens = 10;
+    return tokens;
+  }
+}
+
 export async function POST(request: NextRequest) {
   try {
     const body = await request.json();
@@ -146,11 +174,14 @@ export async function POST(request: NextRequest) {
       );
     }
 
-    const llm = new OpenAI({
+    const llm = new OllamaCustom({
+      baseURL: "https://ollama.mydomain.tld",
       model: config.model,
       temperature: config.temperature,
       topP: config.topP,
+      contextWindow: config.maxTokens,
       maxTokens: config.maxTokens,
+      requestTimeout: 5 * 60 * 1000,
     });
 
     const serviceContext = serviceContextFromDefaults({
diff --git app/client/platforms/llm.ts app/client/platforms/llm.ts
index ddc316f..33273c9 100644
--- app/client/platforms/llm.ts
+++ app/client/platforms/llm.ts
@@ -31,11 +31,11 @@ export interface ResponseMessage {
 }
 
 export const ALL_MODELS = [
-  "gpt-4",
-  "gpt-4-1106-preview",
-  "gpt-4-vision-preview",
-  "gpt-3.5-turbo",
-  "gpt-3.5-turbo-16k",
+  "mistral",
+  "mixtral_default",
+  "dolphin-mixtral:8x7b-v2.7-q4_K_M",
+  "llava",
+  "phi",
 ] as const;
 
 export type ModelType = (typeof ALL_MODELS)[number];
diff --git package.json package.json
index 0ba2c1b..b6dfdfb 100644
--- package.json
+++ package.json
@@ -38,7 +38,7 @@
     "dotenv": "^16.3.1",
     "emoji-picker-react": "^4.4.12",
     "encoding": "^0.1.13",
-    "llamaindex": "0.0.0-20231110031459",
+    "llamaindex": "0.0.44",
     "lucide-react": "^0.277.0",
     "mermaid": "^10.3.1",
     "nanoid": "^5.0.2",

@marcusschiesser
Collaborator

See #77 for how to use Ollama.
