Draft: Fulltext remove slow ORDER BY SQL and reduce memory usage #20702

Draft
wants to merge 70 commits into
base: main

cpegeric
Contributor

@cpegeric cpegeric commented Dec 10, 2024

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #20213

What this PR does / why we need it:

  • Remove the slow ORDER BY from the fulltext SQL.
  • For phrase search, use an inner join and apply the position in the runtime filter.
  • Group the SQL result by doc_id and produce a per-document count vector [N]uint8, where N is the number of keywords in the search string. Each index of the vector corresponds to Pattern.Index. Only one row per doc_id is stored in the hash table, to minimize memory usage.
  • Ignore position for OR operations.
  • Cluster the secondary index table by doc_id.
  • Store hash table values in a memory pool that can spill.
  • Optimize the SQL with JOIN.
  • Support printing the SQL with explain verbose.
  • Support Chinese in boolean mode.
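
The grouping step described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Row` type, the `AggregateByDoc` function, and the choice to saturate counts at 255 are all assumptions made for the example, and a plain Go map stands in for the spillable hash table.

```go
package main

// Row is a hypothetical shape for one SQL result row: the document it
// belongs to and the index of the matched keyword (Pattern.Index in the PR).
type Row struct {
	DocID        int64
	PatternIndex int
}

// AggregateByDoc groups rows by doc_id and keeps exactly one entry per
// document: a count vector of length nKeywords, indexed by PatternIndex.
// Assumption: counts saturate at 255 instead of wrapping around.
func AggregateByDoc(rows []Row, nKeywords int) map[int64][]uint8 {
	agg := make(map[int64][]uint8)
	for _, r := range rows {
		// Skip rows whose keyword index is out of range for this query.
		if r.PatternIndex < 0 || r.PatternIndex >= nKeywords {
			continue
		}
		counts, ok := agg[r.DocID]
		if !ok {
			// First row seen for this doc_id: allocate its single vector.
			counts = make([]uint8, nKeywords)
			agg[r.DocID] = counts
		}
		// Saturating increment so a very frequent keyword cannot overflow uint8.
		if counts[r.PatternIndex] < 255 {
			counts[r.PatternIndex]++
		}
	}
	return agg
}
```

With this layout, memory grows with the number of distinct doc_id values rather than with the number of matched rows, which is the property the description is after. Calling `AggregateByDoc` on rows for three keywords yields, for each document, a three-element vector of how many times each keyword matched.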

Labels
kind/bug: Something isn't working
size/XXL: Denotes a PR that changes 2000+ lines

2 participants