planner: don't calc the heavy expression used in ORDER BY stmt twice #58208
Signed-off-by: “EricZequan” <[email protected]>
Hi @EricZequan. Thanks for your PR. PRs from untrusted users cannot be marked as trusted. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @breezewish PTAL~
/ok-to-test
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #58208 +/- ##
================================================
+ Coverage 73.2252% 73.9225% +0.6972%
================================================
Files 1681 1682 +1
Lines 463134 471572 +8438
================================================
+ Hits 339131 348598 +9467
+ Misses 103197 102159 -1038
- Partials 20806 20815 +9
Flags with carried forward coverage won't be shown.
The rest LGTM
pkg/planner/core/casetest/vectorsearch/testdata/ann_index_suite_out.json
/hold until two approvals from planner
/retest
/retest
/retest
rest lgtm
/retest
/retest
rest LGTM
/retest
/unhold
byItemIndex := make([]int, 0)
for i, byItem := range p.ByItems {
	if ContainHeavyFunction(byItem.Expr) {
		byItemIndex = append(byItemIndex, i)
	}
}
if fixValue && len(byItemIndex) > 0 {
Could you add a plan unit test to cover this case (multiple heavy byItems together with some non-heavy byItems)?
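To illustrate the mixed case the reviewer asks about, here is a small, self-contained Go sketch of the idea behind the byItemIndex loop: collect the indexes of heavy ORDER BY items, compute each heavy expression once as a new projection column, and make the by-item refer to that column. Expr, ByItem, ProjCol, rewriteHeavyByItems, the Heavy flag (a stand-in for ContainHeavyFunction) and the heavy_col_N names are all hypothetical simplifications, not TiDB's planner types or API.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified stand-in for a planner expression.
type Expr struct {
	Text  string // e.g. "vec_cosine_distance(vec, '[1,2,3]')" or a column name
	Heavy bool   // stand-in for what ContainHeavyFunction would report
}

// ByItem is a hypothetical ORDER BY item.
type ByItem struct {
	Expr Expr
	Desc bool
}

// ProjCol is one output column of the child projection: Name := Expr.
type ProjCol struct {
	Name string
	Expr Expr
}

// rewriteHeavyByItems records which by-items are heavy, appends each heavy
// expression to the projection as a new column, and makes the by-item
// reference that column so the expression is evaluated only once.
func rewriteHeavyByItems(byItems []ByItem, proj []ProjCol) ([]ByItem, []ProjCol, []int) {
	var byItemIndex []int
	for i, item := range byItems {
		if item.Expr.Heavy {
			byItemIndex = append(byItemIndex, i)
		}
	}
	if len(byItemIndex) == 0 {
		return byItems, proj, nil // nothing heavy: leave the plan unchanged
	}
	newItems := append([]ByItem(nil), byItems...)
	newProj := append([]ProjCol(nil), proj...)
	for _, idx := range byItemIndex {
		name := fmt.Sprintf("heavy_col_%d", idx)
		newProj = append(newProj, ProjCol{Name: name, Expr: byItems[idx].Expr})
		newItems[idx] = ByItem{Expr: Expr{Text: name}, Desc: byItems[idx].Desc}
	}
	return newItems, newProj, byItemIndex
}

func main() {
	// Mixed case: two heavy by-items and one plain column.
	items := []ByItem{
		{Expr: Expr{Text: "vec_cosine_distance(vec, '[1,2,3]')", Heavy: true}},
		{Expr: Expr{Text: "id"}},
		{Expr: Expr{Text: "vec_l2_distance(vec, '[1,2,3]')", Heavy: true}, Desc: true},
	}
	proj := []ProjCol{{Name: "id", Expr: Expr{Text: "id"}}, {Name: "vec", Expr: Expr{Text: "vec"}}}
	newItems, newProj, idx := rewriteHeavyByItems(items, proj)
	fmt.Println("heavy indexes:", idx)
	for _, it := range newItems {
		fmt.Println("order by", it.Expr.Text, "desc =", it.Desc)
	}
	for _, c := range newProj {
		fmt.Println("project", c.Name, ":=", c.Expr.Text)
	}
}
```

Non-heavy by-items keep their original expression and position, so the sort order (including each item's DESC flag) is unchanged; only the heavy items are redirected to precomputed columns.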
// └─Byitem: vec_distance(vec, '[1,2,3]')
// └─Schema: id, vec
//
// New: DataSource(id, vec) -> Projection(id, vec->dis) -> TopN(by dis) -> Projection(id)
It seems this comment is incorrect? The projection does not actually eliminate any columns; it appends new ones:
New: DataSource(id, vec) -> Projection(id, vec, vec->dis) -> TopN(by dis) -> Projection(id)
These comments will be fixed in the next cherry-pick~
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: AilinKid, zanmato1984
What problem does this PR solve?
Issue Number: ref #54245, close #56318
Problem Summary:
Cherry-pick of: https://github.com/tidbcloud/tidb-cse/pull/1426
In the original plan, the root recalculates the distance for a vector search even though the store node has already calculated it. For example:
After this PR, the plan reuses the distance column produced by the store node and avoids exchanging the vector column.
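A toy Go sketch of why reusing the distance column also removes the need to ship vectors: each store node sends back only (id, dis) after its partial TopN, and the root merges those candidates by dis without ever seeing a vector. The candidate type, partialTopN and rootTopN are hypothetical names, and the distances are taken as already computed on each store node; this is an illustration of a two-phase TopN, not TiDB's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is what a store node returns after its partial TopN:
// the row id plus the distance it already computed (no vector column).
type candidate struct {
	id  int
	dis float64
}

// partialTopN keeps the n nearest candidates from one input list.
func partialTopN(local []candidate, n int) []candidate {
	out := append([]candidate(nil), local...)
	sort.Slice(out, func(i, j int) bool { return out[i].dis < out[j].dis })
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// rootTopN merges partial results by the pushed-down dis column; it never
// receives vectors, so it has no reason to recompute any distance.
func rootTopN(parts [][]candidate, n int) []int {
	var all []candidate
	for _, p := range parts {
		all = append(all, p...)
	}
	merged := partialTopN(all, n)
	ids := make([]int, len(merged))
	for i, c := range merged {
		ids[i] = c.id
	}
	return ids
}

func main() {
	store1 := partialTopN([]candidate{{1, 0.40}, {2, 0.10}, {3, 0.90}}, 2)
	store2 := partialTopN([]candidate{{4, 0.05}, {5, 0.30}}, 2)
	fmt.Println(rootTopN([][]candidate{store1, store2}, 2)) // prints [4 2]
}
```

Because the final TopN needs only the smallest distances overall, each store's local top n is sufficient input for the merge, and only the small (id, dis) pairs cross the network.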
What changed and how does it work?
We add getPushedDownTopN4VectorSearch to build the partial TopN and give it a child plan, physicalProjection, that computes the distance column the partial TopN sorts by. The root plan then reuses that column instead of recomputing the distance.
With 768-dimensional vectors and 10,000 rows, we measured the execution time of:
SELECT id FROM table_name ORDER BY Vec_Cosine_Distance(embedding, search_vector) LIMIT 10;
and observed a performance improvement of about 20% or more.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.