How? some basic questions (get an specific ID) #95

Closed
jonathanhecl opened this issue Aug 14, 2024 · 3 comments · Fixed by #97

Comments


jonathanhecl commented Aug 14, 2024

How can I get a specific document by its ID? I have the ID in the metadata of another item.

doc.Query(ctx, "", 1, map[string]string{"id": x.Metadata["reference"]}, nil)

Or do I need to add the ID to the metadata?

Currently I add the data with:
c.Add(ctx, []string{chunk.ID}, embeddings, []map[string]string{addToMapString(map[string]string{"source": fmt.Sprintf("%s:%d", source.Name, i)}, chunk.Metadata)}, []string{chunk.Content})

Thanks

Owner

philippgille commented Aug 18, 2024

Hi 👋, I haven't had a use case for this yet, so I haven't implemented a c.Get(ctx, id) or anything similar.

Short-term:

One (ugly) workaround for now: add the documents with AddDocument(ctx context.Context, doc Document) error or AddDocuments(ctx context.Context, documents []Document, concurrency int) error instead, and set your own IDs both in the documents' ID field and in their metadata. Then you can get a specific document with a metadata query like the one you wrote. That's pretty inefficient, of course, but it could unblock you if you need this feature immediately.

Long-term:

But really there should be a separate c.Get(ctx, id) or similar for this. Would you be willing to contribute this feature?
c.Add and c.Query are meant to stay close to the Chroma interface, and Chroma's c.Get is defined here. Of its parameters, I think we only need ids, where and whereDocument. For filtering by the latter two, you can probably do the same as here. For getting by ID, just look the documents up in the internal map.
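To make the idea concrete, here is a minimal sketch of an ID lookup with an optional metadata filter over an in-memory map. It is not the library's implementation: the Document and Collection types, matchesWhere, Get and newDemoCollection are hypothetical stand-ins written only for illustration, and whereDocument filtering is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Document is a hypothetical stand-in for the library's document type.
type Document struct {
	ID       string
	Metadata map[string]string
	Content  string
}

// Collection is a hypothetical stand-in that keeps documents in a map keyed by ID.
type Collection struct {
	mu        sync.RWMutex
	documents map[string]*Document
}

// matchesWhere reports whether every key/value pair in where is present
// in the document's metadata. An empty or nil where matches everything.
func matchesWhere(doc *Document, where map[string]string) bool {
	for k, want := range where {
		got, ok := doc.Metadata[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// Get returns copies of the documents with the given IDs that also satisfy
// where. Unknown IDs are skipped, and results follow the order of ids.
// The copies are shallow, so the Metadata map is shared with the stored document.
func (c *Collection) Get(ids []string, where map[string]string) []Document {
	c.mu.RLock()
	defer c.mu.RUnlock()

	res := make([]Document, 0, len(ids))
	for _, id := range ids {
		doc, ok := c.documents[id]
		if !ok || !matchesWhere(doc, where) {
			continue
		}
		res = append(res, *doc)
	}
	return res
}

// newDemoCollection builds a small collection for the usage example.
func newDemoCollection() *Collection {
	return &Collection{documents: map[string]*Document{
		"1": {ID: "1", Metadata: map[string]string{"foo": "bar"}, Content: "hello world"},
		"2": {ID: "2", Metadata: map[string]string{"a": "b"}, Content: "hallo welt"},
	}}
}

func main() {
	c := newDemoCollection()
	for _, d := range c.Get([]string{"2", "missing", "1"}, nil) {
		fmt.Println(d.ID, d.Content)
	}
	// Only document "1" carries the metadata foo=bar.
	fmt.Println(len(c.Get([]string{"1", "2"}, map[string]string{"foo": "bar"})))
}
```

Because the caller supplies the IDs, the result order is deterministic even though the backing store is a map.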
Regarding limit/nResults I'm not sure if it's useful, because the data is kept in a map internally, so if we just return 10 out of 100, it can be 10 different documents each time, due to map elements being explicitly unordered (and actually randomized by Go to prevent anyone from relying on the order of elements). So I'd go without this one for now.
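For illustration only, this is roughly what a stable limit would require if it were ever wanted: collect the map keys, sort them, then cut. The helper name firstNIDs and the map[string]string shape are assumptions made for this sketch, not part of the library.

```go
package main

import (
	"fmt"
	"sort"
)

// firstNIDs returns up to n IDs from docs in ascending order.
// Sorting is what makes the result stable: ranging over a Go map
// directly yields the keys in an unspecified order that can differ
// between runs.
func firstNIDs(docs map[string]string, n int) []string {
	ids := make([]string, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	if n < 0 {
		n = 0
	}
	if n > len(ids) {
		n = len(ids)
	}
	return ids[:n]
}

func main() {
	docs := map[string]string{"c": "third", "a": "first", "b": "second"}
	fmt.Println(firstNIDs(docs, 2)) // prints "[a b]"
}
```

The sort costs O(n log n) on every call, which is one more reason to leave the limit parameter out until there is a concrete need for it.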

If you don't have the capacity to work on this, I can do it as well.

@philippgille
Owner

PS: Also not recommended, but if you need it ASAP and can't wait for the c.Get to be implemented, another alternative is to use reflection. ⚠️ Use at your own risk. Like in this black box test (no access to unexported fields):

package chromem_test

import (
	"context"
	"fmt"
	"reflect"
	"testing"
	"unsafe"

	"github.com/philippgille/chromem-go"
)

func TestCollection_Reflection(t *testing.T) {
	// Create collection
	db := chromem.NewDB()
	name := "test"
	metadata := map[string]string{"foo": "bar"}
	vectors := []float32{-0.40824828, 0.40824828, 0.81649655} // normalized version of `{-0.1, 0.1, 0.2}`
	embeddingFunc := func(_ context.Context, _ string) ([]float32, error) {
		return vectors, nil
	}
	c, err := db.CreateCollection(name, metadata, embeddingFunc)
	if err != nil {
		t.Fatal("expected no error, got", err)
	}
	if c == nil {
		t.Fatal("expected collection, got nil")
	}

	// Add documents
	ids := []string{"1", "2"}
	metadatas := []map[string]string{{"foo": "bar"}, {"a": "b"}}
	contents := []string{"hello world", "hallo welt"}
	err = c.Add(context.Background(), ids, nil, metadatas, contents)
	if err != nil {
		t.Fatal("expected nil, got", err)
	}

	// Access the collection's internal documents map via reflection
	docMapField := reflect.ValueOf(c).Elem().FieldByName("documents")
	if !docMapField.IsValid() {
		t.Fatal("expected to be able to access the documents map via reflection")
	}
	if docMapField.Kind() != reflect.Map {
		t.Fatal("documents field is not a map")
	}
	// Use unsafe operations to access the unexported field
	docMapPtr := unsafe.Pointer(docMapField.UnsafeAddr())
	docMapVal := reflect.NewAt(docMapField.Type(), docMapPtr).Elem()
	docMap, ok := docMapVal.Interface().(map[string]*chromem.Document)
	if !ok {
		t.Fatal("expected to be able to do type assertion on the documents map")
	}
	if len(docMap) != 2 {
		t.Fatalf("expected 2 documents, got %d", len(docMap))
	}

	fmt.Printf("Got by ID: %v\n", docMap["1"].Content) // Running the test with `-v` flag prints "Got by ID: hello world"
}

Running the test with -v flag prints "Got by ID: hello world".

Owner

philippgille commented Sep 1, 2024

I went with just .GetByID(ctx, id) for now. The more Chroma-like Get with additional where and whereDocument filters can be added later: #97

Hope that works for you! Otherwise let me know and I can adapt it.
