Improve query performance #47

philippgille · 2024-03-16T14:09:59Z

We added benchmarks in #46.

We then used them, together with the CPU and memory profiles gathered from them, to improve performance.

⚠️ This PR only addresses the vector similarity search, not the metadata or full text filtering.

Each individual improvement is a separate commit. Overall, query duration drops by roughly 60-80% and allocated memory (B/op) by more than 99%:

goos: linux
goarch: amd64
pkg: github.com/philippgille/chromem-go
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                    │    before     │                after                │
                                    │    sec/op     │    sec/op     vs base               │
Collection_Query_NoContent_100-8       413.7µ ±  4%   109.9µ ±  1%  -73.44% (p=0.002 n=6)
Collection_Query_NoContent_1000-8     2759.4µ ±  0%   536.8µ ±  1%  -80.55% (p=0.002 n=6)
Collection_Query_NoContent_5000-8     12.980m ±  1%   4.985m ± 15%  -61.60% (p=0.002 n=6)
Collection_Query_NoContent_25000-8     66.56m ±  1%   14.97m ± 10%  -77.51% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   282.41m ±  3%   56.50m ± 11%  -79.99% (p=0.002 n=6)
Collection_Query_100-8                 416.7µ ±  2%   110.0µ ±  0%  -73.61% (p=0.002 n=6)
Collection_Query_1000-8               2792.8µ ± 23%   536.8µ ±  0%  -80.78% (p=0.002 n=6)
Collection_Query_5000-8               15.643m ±  1%   4.869m ±  5%  -68.88% (p=0.002 n=6)
Collection_Query_25000-8               78.29m ±  1%   15.01m ±  3%  -80.82% (p=0.002 n=6)
Collection_Query_100000-8             338.54m ±  5%   56.48m ±  4%  -83.32% (p=0.002 n=6)
geomean                                12.97m         3.008m        -76.81%

                                    │     before      │                after                │
                                    │      B/op       │     B/op      vs base               │
Collection_Query_NoContent_100-8      1211.007Ki ± 0%   6.330Ki ± 0%  -99.48% (p=0.002 n=6)
Collection_Query_NoContent_1000-8     12082.16Ki ± 0%   34.83Ki ± 0%  -99.71% (p=0.002 n=6)
Collection_Query_NoContent_5000-8      60394.2Ki ± 0%   162.8Ki ± 0%  -99.73% (p=0.002 n=6)
Collection_Query_NoContent_25000-8    301962.1Ki ± 0%   794.8Ki ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   1179.510Mi ± 0%   3.057Mi ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_100-8                1211.006Ki ± 0%   6.329Ki ± 0%  -99.48% (p=0.002 n=6)
Collection_Query_1000-8               12082.11Ki ± 0%   34.83Ki ± 0%  -99.71% (p=0.002 n=6)
Collection_Query_5000-8                60394.1Ki ± 0%   162.8Ki ± 0%  -99.73% (p=0.002 n=6)
Collection_Query_25000-8              301962.1Ki ± 0%   794.8Ki ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_100000-8             1179.510Mi ± 0%   3.057Mi ± 0%  -99.74% (p=0.002 n=6)
geomean                                  49.13Mi        155.0Ki       -99.69%

                                    │     before     │               after               │
                                    │   allocs/op    │ allocs/op   vs base               │
Collection_Query_NoContent_100-8         238.00 ± 0%   44.00 ± 0%  -81.51% (p=0.002 n=6)
Collection_Query_NoContent_1000-8       2038.50 ± 0%   44.00 ± 0%  -97.84% (p=0.002 n=6)
Collection_Query_NoContent_5000-8      10039.00 ± 0%   44.00 ± 0%  -99.56% (p=0.002 n=6)
Collection_Query_NoContent_25000-8     50038.00 ± 0%   44.00 ± 0%  -99.91% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   200038.00 ± 0%   44.00 ± 0%  -99.98% (p=0.002 n=6)
Collection_Query_100-8                   238.00 ± 0%   44.00 ± 0%  -81.51% (p=0.002 n=6)
Collection_Query_1000-8                 2038.00 ± 0%   44.00 ± 0%  -97.84% (p=0.002 n=6)
Collection_Query_5000-8                10038.00 ± 0%   44.00 ± 0%  -99.56% (p=0.002 n=6)
Collection_Query_25000-8               50038.00 ± 0%   44.00 ± 0%  -99.91% (p=0.002 n=6)
Collection_Query_100000-8             200038.50 ± 0%   44.00 ± 0%  -99.98% (p=0.002 n=6)
geomean                                  8.661k        44.00       -99.49%

Benchmarked on Framework Laptop 13 (first generation).

Benchmarked before the first commit of this PR, and after.

Benchmarked with: go test -benchmem -run=^$ -count 6 -bench . (a count of 6 because benchstat, which was used to print the diff shown above, asks for it).

philippgille added 9 commits March 12, 2024 22:54

Unlock documents lock earlier

914a1b9

Not relevant for single query, but for concurrent ones
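
A minimal sketch of the idea behind this commit, not the actual chromem-go code: hold the read lock only long enough to collect the documents, then release it before the expensive similarity work, so concurrent queries and writers are not held up. The `store` and `doc` types and the `snapshot` method are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// doc and store are hypothetical stand-ins, not chromem-go's types.
type doc struct {
	ID        string
	Embedding []float32
}

type store struct {
	mu   sync.RWMutex
	docs map[string]doc
}

// snapshot collects the documents while holding the read lock and releases
// the lock before returning, so the caller can run the (slow) similarity
// search without blocking writers. The copy is shallow: the Embedding slices
// still share their backing arrays with the store.
func (s *store) snapshot() []doc {
	s.mu.RLock()
	out := make([]doc, 0, len(s.docs))
	for _, d := range s.docs {
		out = append(out, d)
	}
	s.mu.RUnlock() // released before any expensive work starts
	return out
}

func main() {
	s := &store{docs: map[string]doc{
		"a": {ID: "a", Embedding: []float32{1, 0}},
		"b": {ID: "b", Embedding: []float32{0, 1}},
	}}
	docs := s.snapshot()

	// The lock is free again, so a writer does not have to wait for the query.
	s.mu.Lock()
	s.docs["c"] = doc{ID: "c", Embedding: []float32{1, 1}}
	s.mu.Unlock()

	fmt.Println(len(docs), len(s.docs))
}
```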

Stop copying all doc structs when returning doc similarities

4db0732

Use sub slices instead of channel to pass documents into goroutines

ff7e80f
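
A rough sketch of what handing sub slices to goroutines can look like, assuming a hypothetical `scoreAll` helper rather than the PR's actual code: the input is split into contiguous chunks, each goroutine receives its own sub slice of the embeddings and the matching sub slice of the output, and no channel is needed to feed the workers.

```go
package main

import (
	"fmt"
	"sync"
)

// dot returns the dot product of a and b. It assumes both have the same length.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// scoreAll is a hypothetical helper: it splits embeddings into at most
// `workers` contiguous sub slices and scores each one in its own goroutine.
// Every goroutine writes only to its own part of scores, so no lock is needed.
func scoreAll(query []float32, embeddings [][]float32, workers int) []float32 {
	scores := make([]float32, len(embeddings))
	if workers < 1 {
		workers = 1
	}
	chunk := (len(embeddings) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < len(embeddings); start += chunk {
		end := start + chunk
		if end > len(embeddings) {
			end = len(embeddings)
		}
		wg.Add(1)
		go func(in [][]float32, out []float32) {
			defer wg.Done()
			for i, e := range in {
				out[i] = dot(query, e)
			}
		}(embeddings[start:end], scores[start:end])
	}
	wg.Wait()
	return scores
}

func main() {
	query := []float32{1, 0}
	embeddings := [][]float32{{1, 0}, {0, 1}, {0.5, 0.5}}
	fmt.Println(scoreAll(query, embeddings, 2)) // prints [1 0 0.5]
}
```

Slicing does not copy the underlying data, so splitting the work this way costs only a few slice headers per goroutine.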

Use channel for goroutine results instead of shared slice + lock

cc86f2c

Also gets rid of sync.WaitGroup

Revert "Use channel for goroutine results instead of shared slice + l…

089007b

…ock" This reverts commit cc86f2c.

Only normalize vector if it's not normalized yet

15d1858

For now we check this by computing the length. In the future we could pass a flag if it's already known whether a vector is normalized, which is the case for many embedding models.

Turn slice of pointers to slice of structs

503c3ce

Greatly reduces the number of allocations. For a query of 5,000 documents, it goes from ~5000 allocations to ~50. The number of allocations is now also constant, i.e. 50 even for querying 100,000 documents.
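
To illustrate why this change matters, here is a small sketch with a hypothetical `docSim` type (the PR's real types may differ). With a slice of pointers, each element is typically its own heap allocation, so allocations grow with the number of documents. With a slice of structs, a single allocation for the backing array holds all elements.

```go
package main

import "fmt"

// docSim is a hypothetical result type used only for this illustration.
type docSim struct {
	docIdx     int
	similarity float32
}

// simsAsPointers builds a slice of pointers: one allocation for the slice
// plus, typically, one heap allocation per element.
func simsAsPointers(scores []float32) []*docSim {
	out := make([]*docSim, 0, len(scores))
	for i, s := range scores {
		out = append(out, &docSim{docIdx: i, similarity: s})
	}
	return out
}

// simsAsValues builds a slice of structs: the elements live inside the
// slice's backing array, so the count of allocations does not depend on
// the number of documents.
func simsAsValues(scores []float32) []docSim {
	out := make([]docSim, len(scores))
	for i, s := range scores {
		out[i] = docSim{docIdx: i, similarity: s}
	}
	return out
}

func main() {
	scores := []float32{0.1, 0.9, 0.4}
	fmt.Println(len(simsAsPointers(scores)), len(simsAsValues(scores)))
}
```

The difference can be measured with the same -benchmem flag used for the benchmarks in this PR.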

Add "normalized" parameter to skip check if normalization is known

ff28a38

Revert "Add "normalized" parameter to skip check if normalization is …

05c4f76

…known" This reverts commit ff28a38.

philippgille force-pushed the query-perf branch from fb2d415 to bffa34a on March 16, 2024 14:21

Normalize vectors on embedding creation instead of querying

579fd46

- Normalizes only once instead of each time
- Embedding creation takes time anyway, while query should be as fast as possible

philippgille force-pushed the query-perf branch from bffa34a to 579fd46 on March 16, 2024 14:23

philippgille added 2 commits March 16, 2024 18:11

Clarify query duration in examples

5656523

Update README

1410612

philippgille force-pushed the query-perf branch from 1e3525a to 1410612 on March 16, 2024 18:27

philippgille merged commit acb1e3f into main Mar 16, 2024
2 checks passed

philippgille deleted the query-perf branch March 16, 2024 18:29
