[WIP] Add a (jaccard) similarity method to nodegraph and bitstorage #1842

Closed
luizirber wants to merge 6 commits from feature/nodegraph_distance

luizirber
Member

  • Is it mergeable?
  • make test: Did it pass the tests?
  • make clean diff-cover: If it introduces new functionality in
    scripts/, is it tested?
  • make format diff_pylint_report cppcheck doc pydocstyle: Is it well
    formatted?
  • Did it change the command-line interface? Only backwards-compatible
    additions are allowed without a major version increment. Changing file
    formats also requires a major version number increment.
  • For substantial changes or changes to the command-line interface, is it
    documented in CHANGELOG.md? See keepachangelog
    for more details.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)

for (uint64_t index = 0; index < tablebytes; index++) {
    // First, get how many values in common we have
    intersection += __builtin_popcountll(me[index] & ot[index]);
    union_size += __builtin_popcountll(me[index] | ot[index]);
Member

Let me see if I'm grokking this:

  • bitwise AND --> buckets where both Bloom filters have 1
  • bitwise OR --> buckets where either Bloom filter has 1
  • __builtin_popcountll --> optimized function for counting buckets that satisfy the specified criteria

Member Author

__builtin_popcountll is a glorified count function: it returns how many bits are set to 1 in its argument. A pure Python equivalent would be:

>>> a = 0b0101
>>> b = 0b1010
>>> bin(a & b).count('1')
0
>>> bin(a | b).count('1')
4
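
To tie this back to the loop in the diff, here is a minimal Python sketch of the same per-word accumulation followed by the final ratio. The function name `jaccard_similarity`, the use of `bytes` objects as tables, and the choice to return 1.0 for two empty tables are all illustrative assumptions, not the PR's actual API or behavior. Note that it compares set bits in the tables, as the loop does, not the underlying k-mer sets.

```python
def jaccard_similarity(table_a: bytes, table_b: bytes) -> float:
    """Jaccard similarity of the set bits in two equal-length bit tables.

    Hypothetical helper for illustration only; not part of khmer.
    """
    if len(table_a) != len(table_b):
        raise ValueError("tables must have the same size")

    intersection = 0
    union_size = 0
    for x, y in zip(table_a, table_b):
        # Bits set in both tables (bitwise AND), counted like popcount.
        intersection += bin(x & y).count("1")
        # Bits set in either table (bitwise OR).
        union_size += bin(x | y).count("1")

    # Assumption: two tables with no bits set are treated as identical.
    return intersection / union_size if union_size else 1.0


a = bytes([0b0101, 0b1111])
b = bytes([0b1010, 0b0011])
print(jaccard_similarity(a, b))  # 2 shared bits / 8 bits in the union -> 0.25
```

The C++ loop does the same thing 64 bits at a time; the sketch goes byte by byte only because iterating over `bytes` yields one integer per byte.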

luizirber force-pushed the feature/nodegraph_distance branch from ef4e0ed to f125487 on July 19, 2018 23:46
luizirber force-pushed the feature/nodegraph_distance branch from cfc461f to be7949c on August 9, 2018 20:21
luizirber force-pushed the feature/nodegraph_distance branch from be7949c to 013006c on January 4, 2019 01:59
luizirber closed this on Feb 13, 2023

2 participants