-
Notifications
You must be signed in to change notification settings - Fork 22
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
ea68456
commit 67095ee
Showing
1 changed file
with
98 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,99 @@ | ||
# chromem-go | ||
In-memory vector database for Go with Chroma-like interface | ||
|
||
[![Go Reference](https://pkg.go.dev/badge/github.com/philippgille/chromem-go.svg)](https://pkg.go.dev/github.com/philippgille/chromem-go) | ||
|
||
In-memory vector database for Go with Chroma-like interface. | ||
|
||
It's not a library to connect to the Chroma database. It's an in-memory database on its own, meant to enable retrieval augmented generation (RAG) applications in Go *without having to run a separate database*. | ||
As such, the focus is not scale or performance, but simplicity. | ||
|
||
> ⚠️ The initial implementation is fairly naive, with only the bare minimum in features. But over time we'll improve and extend it. | ||
## Interface | ||
|
||
Our inspiration, the [Chroma](https://www.trychroma.com/) interface, is the following (taken from their [README](https://github.com/chroma-core/chroma/blob/0.4.21/README.md)). | ||
|
||
```python | ||
import chromadb | ||
# setup Chroma in-memory, for easy prototyping. Can add persistence easily! | ||
client = chromadb.Client() | ||
|
||
# Create collection. get_collection, get_or_create_collection, delete_collection also available! | ||
collection = client.create_collection("all-my-documents") | ||
|
||
# Add docs to the collection. Can also update and delete. Row-based API coming soon! | ||
collection.add( | ||
documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well | ||
metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these! | ||
ids=["doc1", "doc2"], # unique for each doc | ||
) | ||
|
||
# Query/search 2 most similar results. You can also .get by id | ||
results = collection.query( | ||
query_texts=["This is a query document"], | ||
n_results=2, | ||
# where={"metadata_field": "is_equal_to_this"}, # optional filter | ||
# where_document={"$contains":"search_string"} # optional filter | ||
) | ||
``` | ||
|
||
Our Go library exposes the same interface: | ||
|
||
```go | ||
package main | ||
|
||
import "github.com/philippgille/chromem-go" | ||
|
||
func main() { | ||
// Set up chromem-go in-memory, for easy prototyping. Persistence will be added in the future. | ||
client := chromem.NewClient() | ||
|
||
// Create collection. GetCollection, GetOrCreateCollection, DeleteCollection will be added in the future. | ||
collection := client.CreateCollection("all-my-documents", nil, nil) | ||
|
||
// Add docs to the collection. Update and delete will be added in the future. | ||
// Row-based API will be added when Chroma adds it! | ||
_ = collection.Add(ctx, | ||
[]string{"doc1", "doc2"}, // unique ID for each doc | ||
nil, // We handle embedding automatically. You can skip that and add your own embeddings as well. | ||
[]map[string]string{{"source": "notion"}, {"source": "google-docs"}}, // Filter on these! | ||
[]string{"This is document1", "This is document2"}, | ||
) | ||
|
||
// Query/search 2 most similar results. Getting by ID will be added in the future. | ||
results, _ := collection.Query(ctx, | ||
"This is a query document", | ||
2, | ||
map[string]string{"metadata_field": "is_equal_to_this"}, // optional filter | ||
map[string]string{"$contains": "search_string"}, // optional filter | ||
) | ||
} | ||
``` | ||
|
||
Initially, only a minimal subset of all of Chroma's interface is implemented or exported, but we'll add more in future versions. | ||
|
||
## Features | ||
|
||
- Embedding creators: | ||
- [X] [OpenAI ada v2](https://platform.openai.com/docs/guides/embeddings/embedding-models) (default) | ||
- [X] Bring your own | ||
- [ ] [Mistral (API)](https://docs.mistral.ai/api/#operation/createEmbedding) | ||
- [ ] [ollama](https://ollama.ai/) | ||
- [ ] [LocalAI](https://github.com/mudler/LocalAI) | ||
- Similarity search: | ||
- [X] Exact nearest neighbor search using cosine similarity | ||
- [ ] Approximate nearest neighbor search with index | ||
- [ ] Hierarchical Navigable Small World (HNSW) | ||
- [ ] Inverted file flat (IVFFlat) | ||
- Filters: | ||
- [X] Document filters: `$contains`, `$not_contains` | ||
- [X] Metadata filters: Exact matches | ||
- [ ] Operators (`$and`, `$or` etc.) | ||
- Storage: | ||
- [X] In-memory | ||
- [ ] Persistent (file) | ||
- [ ] Persistent (others (S3, PostgreSQL, ...)) | ||
|
||
## Usage | ||
|
||
For a full, working example, using the vector database for retrieval augmented generation (RAG), see [example/main.go](example/main.go) |