Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: option to only import/export selected collections to/from a DB #88

Merged
merged 4 commits into from
Jul 3, 2024

Conversation

iwilltry42
Copy link
Contributor

This PR does two things:

  1. add the variadic collections parameter to the (non-deprecated) import/export functions so one can import/export only selected collections
  2. makes sure that documents are actually persisted to disk when imported into a persistent database

Copy link
Owner

@philippgille philippgille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! 🙇‍♂️

Do you have a use case where someone wants to export only a subset of collections, and deleting the other collections is not an option?

db.go Outdated Show resolved Hide resolved
Comment on lines +262 to +272
err = c.persistMetadata()
if err != nil {
return fmt.Errorf("couldn't persist collection metadata: %w", err)
}
for _, doc := range c.documents {
docPath := c.getDocPath(doc.ID)
err = persistToFile(docPath, doc, c.compress, "")
if err != nil {
return fmt.Errorf("couldn't persist document to %q: %w", docPath, err)
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point! 👍

If you want you can split this into two PRs, one for the fix/improvement, and one for the new feature. But up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have any other change requests apart from the now-resolved ones, I'd leave this in one PR, if you don't mind :)

db.go Outdated Show resolved Hide resolved
db.go Show resolved Hide resolved
db_test.go Outdated Show resolved Hide resolved
db_test.go Outdated Show resolved Hide resolved
db_test.go Show resolved Hide resolved
db_test.go Outdated Show resolved Hide resolved
@iwilltry42
Copy link
Contributor Author

Do you have a use case where someone wants to export only a subset of collections, and deleting the other collections is not an option?

Yeah, over in our gptscript-ai/knowledge tool, we're playing around with sharing knowledge bases (datasets/collections).
E.g. I build a collection with only documentation about topic A and another one with documentation about topic B and then want to share those independently with other users or integrate them into separate larger knowledge servers.
Without this functionality, I'd either have to always share the whole database with collections that are not needed or first build collection A, export it, delete it, then build knowledge base B. That would prevent me from building them continuously in one database and export them separately.

Co-authored-by: Philipp Gillé <[email protected]>
@iwilltry42 iwilltry42 force-pushed the feat/export-collection branch from 22b5e2e to a4a5653 Compare July 3, 2024 08:29
@iwilltry42 iwilltry42 requested a review from philippgille July 3, 2024 08:29
@philippgille
Copy link
Owner

Ah right, the existing DB can still be a persistent one where you continue to use all existing collections, and the export is just for sharing, not for backing up or as an alternative to the immediate persistence. 👍

Thanks again 🙇‍♂️

@philippgille philippgille merged commit 935ec30 into philippgille:main Jul 3, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants