Partitioned HNSW Deeplake Side Changes. #2847

sounakr · 2024-05-08T10:48:40Z

🚀 🚀 Pull Request

This PR is the Deeplake-side implementation of Partitioned HNSW. With Partitioned HNSW, the HNSW index is divided into a number of partitions. This is useful when the data is large and the index has to scale: a single HNSW index does not scale well, so partitioning is a way to accommodate large volumes of data.
Partitions are defined in the index params. For example, if we create 5 partitions and the dataset has 1,000,000 rows, each partition will hold 200,000 rows.
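The row arithmetic above (1,000,000 rows / 5 partitions = 200,000 rows each) can be sketched as a contiguous, near-equal split of row indices. This is a minimal illustration under an assumption: `partition_ranges` is a hypothetical helper, and libdeeplake may assign rows to partitions differently (for example, when the row count is not divisible by the partition count).

```python
def partition_ranges(num_rows: int, num_partitions: int) -> list[range]:
    """Split row indices [0, num_rows) into contiguous, near-equal partitions.

    Hypothetical helper for illustration only; libdeeplake's actual
    partitioning logic may differ.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    base, extra = divmod(num_rows, num_partitions)
    ranges = []
    start = 0
    for p in range(num_partitions):
        # The first `extra` partitions absorb one leftover row each.
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    parts = partition_ranges(1_000_000, 5)
    print([len(r) for r in parts])  # [200000, 200000, 200000, 200000, 200000]
```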

Through the VectorStore API:
from deeplake.core.vectorstore import VectorStore

# dest: dataset path; TOKEN: Activeloop API token
vs = VectorStore(
    path=dest,
    exec_option="compute_engine",
    index_params={
        "threshold": 1,
        "distance_metric": "COS",
        "additional_params": {
            "efConstruction": 200,
            "M": 16,
            "partitions": 5,
        },
    },
    token=TOKEN,
    verbose=True,
    overwrite=True,
)

Through the Deeplake API:
ds = vs.dataset
params = {
    "efConstruction": 200,
    "M": 16,
    "partitions": 32,
}
ds.embedding.create_vdb_index("hnsw_1", distance="cosine_similarity", additional_params=params)

Querying is unchanged: the TQL query is sent to all partitions simultaneously, and the best matches are returned.
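The fan-out query can be sketched in plain Python. This is a minimal illustration, not libdeeplake's implementation: the helper names are hypothetical, an exact cosine scan stands in for each partition's HNSW search, and a thread pool stands in for however the engine actually dispatches the query. With exact per-partition search, taking the k best hits from every partition and then the k best of those candidates gives the same top-k as a scan over all rows.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_partition(partition, query, k):
    """Stand-in for an HNSW search inside one partition (exact scan here).

    `partition` is a list of (row_id, vector) pairs; returns up to k
    (score, row_id) tuples, best first.
    """
    scored = ((cosine_similarity(vec, query), row_id) for row_id, vec in partition)
    return heapq.nlargest(k, scored)


def search_all_partitions(partitions, query, k):
    """Query every partition concurrently, then keep the k best hits overall."""
    with ThreadPoolExecutor() as pool:
        per_partition = pool.map(lambda part: search_partition(part, query, k), partitions)
        candidates = [hit for hits in per_partition for hit in hits]
    return [row_id for _, row_id in heapq.nlargest(k, candidates)]


if __name__ == "__main__":
    partitions = [
        [(0, [1.0, 0.0]), (1, [0.0, 1.0])],
        [(2, [0.9, 0.1]), (3, [-1.0, 0.0])],
    ]
    print(search_all_partitions(partitions, [1.0, 0.0], k=2))  # [0, 2]
```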

Incremental index maintenance is enabled for partitioned HNSW. When new rows are added, rows are updated, or the topmost rows are removed, the partitioned HNSW index is maintained automatically.
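The PR does not describe how rows map to partitions during incremental updates. The sketch below assumes one simple scheme, purely for illustration: each partition holds at most `capacity` consecutive rows, so a row's partition is `row // capacity`. Under that assumption an update only touches the partitions containing the modified rows, and appended rows either fill the last partition or open a new one. `PartitionMap` and its methods are hypothetical names, and row removal is not modeled.

```python
class PartitionMap:
    """Hypothetical bookkeeping for incremental maintenance of a partitioned index.

    Assumes every partition holds at most `capacity` consecutive rows, so a
    row's partition can be computed directly from its index.
    """

    def __init__(self, num_rows: int, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.num_rows = num_rows

    @property
    def num_partitions(self) -> int:
        # Ceiling division: a partially filled last partition still counts.
        return -(-self.num_rows // self.capacity)

    def partition_of(self, row: int) -> int:
        if not 0 <= row < self.num_rows:
            raise IndexError(row)
        return row // self.capacity

    def append(self, count: int) -> set[int]:
        """Add `count` rows at the end; return the partitions that must be updated."""
        first = self.num_rows
        self.num_rows += count
        return {self.partition_of(r) for r in range(first, self.num_rows)}

    def update(self, rows) -> set[int]:
        """Return the partitions touched when the given rows are modified in place."""
        return {self.partition_of(r) for r in rows}


if __name__ == "__main__":
    pm = PartitionMap(num_rows=1_000_000, capacity=200_000)
    print(pm.num_partitions)           # 5
    print(pm.update([5, 450_000]))     # {0, 2}
    print(pm.append(10))               # {5}
```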

To delete the partitioned HNSW index:
ds.embedding.delete_vdb_index("hnsw_1")

Impact

Partitioned indexes are much faster to create and achieve high recall. This feature is helpful whenever indexing has to be done at scale.

CLAassistant · 2024-05-27T11:14:35Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ activesoull
✅ sounakr
✅ khustup2
❌ azat-manukyan

sonarcloud · 2024-06-13T09:40:36Z

Quality Gate failed

Failed conditions
7.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

sounakr added 6 commits May 8, 2024 16:17

Partitioned HNSW Deeplake Side Changes.

5e99d16

Partitioned HNSW Deeplake Side Changes. Deserialization changes for i…

607e81a

…ncremental dmls.

Partitioned HNSW Deeplake Side Changes. Test case addition.

2f67b52

Partitioned HNSW Deeplake Side Changes. Rectify Test Case

3d153bc

Partitioned HNSW Deeplake Side Changes. Rectify Test Case Phase1

2d1f376

Partitioned HNSW Deeplake Side Changes. Don't compare partition numbe…

8655c0f

…r in additional params as they are mutable.

sounakr marked this pull request as ready for review June 3, 2024 02:06

sounakr requested a review from nvoxland-al June 3, 2024 02:06

sounakr and others added 8 commits June 4, 2024 19:13

Partitioned HNSW Deeplake Side Changes.

ef7e0d0

Partitioned HNSW Deeplake Side Changes. Deserialization changes for i…

140da23

…ncremental dmls.

Partitioned HNSW Deeplake Side Changes. Test case addition.

00d3927

Partitioned HNSW Deeplake Side Changes. Rectify Test Case

6856074

Partitioned HNSW Deeplake Side Changes. Rectify Test Case Phase1

88d48ba

Partitioned HNSW Deeplake Side Changes. Don't compare partition numbe…

4486016

…r in additional params as they are mutable.

bumb libdeeplake to 0.0.130

c3cc62f

libdeeplake to 0.0.129

b906611

azat-manukyan force-pushed the partitioned_hnsw branch from 8f86704 to b906611 June 4, 2024 15:13

Bump libdeeplake version.

3e2f2af

khustup2 approved these changes Jun 4, 2024

sounakr added 6 commits June 5, 2024 09:57

- Merge branch 'main' of https://github.com/activeloopai/deeplake in…

f01b6d0

…to partitioned_hnsw

Reformatting.

9058dbc

- Merge branch 'main' of https://github.com/activeloopai/deeplake in…

c6857de

…to partitioned_hnsw

Bump Up version.

6a7fc70

Adding Incremental test cases.

9e6e1d3

Fixing Test cases.

6ae45da

sounakr changed the title ~~[WIP]Partitioned HNSW Deeplake Side Changes.~~ Partitioned HNSW Deeplake Side Changes. Jun 5, 2024

Bump libdeeplake version.

817e1ef

azat-manukyan approved these changes Jun 6, 2024

Bump libdeeplake version.

6ac1a9b

Merge branch 'main' into partitioned_hnsw

addca99

khustup2 merged commit 31959f2 into main Jun 13, 2024
7 of 10 checks passed

khustup2 deleted the partitioned_hnsw branch June 13, 2024 10:16

nvoxland-al pushed a commit that referenced this pull request Jun 18, 2024

Revert "Partitioned HNSW Deeplake Side Changes. (#2847)"

cbc71a4

This reverts commit 31959f2.
