Support memory-mapped on-disk Indices #4
Comments
I suppose that each new insertion into an indexed table forces the engine to update the whole index BLOB, so database writes are done twice, which makes it slow. And if the index files are not present in the folder, the code can recreate them from the content... (is it stored in 2 places?)
In this proposal, for memory-mapped on-disk indexes, it won't be stored twice. By default, the Faiss index is stored inside a "shadow table" in your SQLite DB, but this option would instead store it on disk as a separate file. It'll still work the same from a user's perspective (i.e. the same SELECT and INSERT statements), but under the hood the storage of the actual index would be different. Right now the "shadow table" indexes are slow because we re-write the entire index at the end of every transaction that INSERT'ed into or DELETE'ed from a vss0 table. That involves exporting the index to an in-memory buffer, then re-writing the shadow table with the new contents, which isn't great. But if the Faiss index was its own file and memory mapped, then updates wouldn't be as drastic.
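To make the cost difference concrete, here is a minimal Python sketch using only the standard library. It is a toy model and not how sqlite-vss or Faiss actually serialize an index: the "index" is assumed to be a flat array of float32 vectors, each insert is treated as its own transaction, and the `shadow_index` table and `demo.faissindex` file name are hypothetical. It only counts how many bytes each strategy writes.

```python
import mmap
import os
import sqlite3
import struct
import tempfile

DIM = 2               # vector dimensions, as in the a(2) column from this thread
VEC_BYTES = DIM * 4   # one float32 vector


def shadow_table_insert(db, vector):
    """Whole-blob rewrite: load the full index, append, write everything back."""
    row = db.execute("SELECT idx FROM shadow_index WHERE rowid = 1").fetchone()
    blob = (row[0] if row else b"") + struct.pack(f"{DIM}f", *vector)
    db.execute(
        "INSERT OR REPLACE INTO shadow_index(rowid, idx) VALUES (1, ?)", (blob,)
    )
    return len(blob)  # bytes written for this one insert


def mmap_insert(path, slot, vector):
    """In-place update: only the bytes of a single vector are touched."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        offset = slot * VEC_BYTES
        m[offset:offset + VEC_BYTES] = struct.pack(f"{DIM}f", *vector)
        m.flush()
    return VEC_BYTES


def demo(n=100):
    """Insert n vectors both ways and return (blob_bytes, mmap_bytes) written."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shadow_index(idx BLOB)")  # hypothetical shadow table
    blob_bytes = mmap_bytes = 0
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.faissindex")  # hypothetical file name
        with open(path, "wb") as f:
            f.truncate(n * VEC_BYTES)  # preallocate room for n vectors
        for i in range(n):
            v = (float(i), float(i) + 0.5)
            blob_bytes += shadow_table_insert(db, v)
            mmap_bytes += mmap_insert(path, i, v)
    db.close()
    return blob_bytes, mmap_bytes
```

In this model the blob rewrite grows quadratically with the number of inserts, while the memory-mapped file grows linearly. A real Faiss index has more structure than a flat array, so the actual savings would depend on the index type.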
Thinking about this more: Instead of a This is so we can easily support future storage backends like #30
@dleviminzi ok, I applied the new vss0 constructor parser to the main branch. You should be able to add a I also changed a bit of the logic of the
create virtual table vss_demo using vss0(
  a(2) storage_type=faiss_ondisk
)
It currently saves vectors to the file:
But, when I change the
Mostly because I don't think the
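For readers following along, a column definition such as `a(2) storage_type=faiss_ondisk` can be split into a name, a dimension count, and key=value options. The Python sketch below is only an illustration of that idea; the grammar it accepts is an assumption based on the single example in this thread, and `parse_column` is a hypothetical helper, not the actual vss0 constructor parser.

```python
import re

# Assumed shape: name(dimensions) followed by optional key=value options.
COLUMN_RE = re.compile(r"^\s*(?P<name>\w+)\s*\(\s*(?P<dims>\d+)\s*\)(?P<rest>.*)$")


def parse_column(definition):
    """Parse one column definition into name, dimensions, and options."""
    m = COLUMN_RE.match(definition)
    if m is None:
        raise ValueError(f"not a column definition: {definition!r}")
    options = {}
    for token in m.group("rest").split():
        key, sep, value = token.partition("=")
        if not (sep and key and value):
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    return {
        "name": m.group("name"),
        "dimensions": int(m.group("dims")),
        "options": options,
    }
```

Keeping the options in a plain dictionary means an unrecognized `storage_type` value can be rejected in one place, and new storage backends would only add a new accepted value.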
I'll look through the changes and give it a go today.
Yeah that makes sense.
The underlying Faiss indices are stored in SQLite shadow tables, which can't be mmapped with the IO_FLAG_MMAP flag.
One solution: Introduce a new option to store a vss0 column index on disk, allowing mmapped, larger-than-memory indices.
Then, your directory would look like:
sqlite3_db_filename() would be useful here.
One problem: It's kind of nice to have all Faiss indices stored in one file in the SQLite database, and this config option would instead mean users would have to move multiple files around instead of a single SQLite file. But since this is an "optimization" feature that's not enabled by default, I think it'll be ok.
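As a rough illustration of how the database filename could be used to place an index file beside the database, here is a Python sketch. It reads the path through `PRAGMA database_list`, since Python's `sqlite3` module does not expose `sqlite3_db_filename()` directly. The `<table>.<column>.faissindex` naming scheme and the `index_path_for` helper are hypothetical and are not the directory layout proposed in this issue.

```python
import os
import sqlite3


def db_filename(db, schema="main"):
    """Return the on-disk path of an attached database, or None if it has no file."""
    for _seq, name, file in db.execute("PRAGMA database_list"):
        if name == schema:
            return file or None  # in-memory databases report an empty path
    return None


def index_path_for(db, table, column):
    """Build a sibling path for an on-disk index next to the database file."""
    path = db_filename(db)
    if path is None:
        return None  # no directory to anchor the index file to
    # Hypothetical naming scheme, chosen only for this example.
    return os.path.join(os.path.dirname(path), f"{table}.{column}.faissindex")
```

An in-memory database has no directory to anchor to, so this sketch returns `None` there; an implementation would need to decide whether to fall back to shadow-table storage or raise an error in that case.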