Memory leak in .jaccard #685

Closed
ag14774 opened this issue Jun 5, 2019 · 3 comments

ag14774 commented Jun 5, 2019

import sourmash

def get_average_dist(file_list):
    file_list = [str(f) + '.sig' for f in file_list]
    signatures = [sourmash.load_one_signature(f) for f in file_list]
    print("loaded")
    dist = 0
    counter = 0
    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            dist += signatures[i].jaccard(signatures[j])
            counter += 1
    return dist / counter

With 3375 signatures, I am trying to calculate the distance between every pair of signatures and then take the average. As you can see, I am not creating a list of all the combinations, only a list of signatures, which takes less than 800 MB. Once execution enters the loop, memory usage increases very quickly with each iteration. Any ideas what is going wrong here?
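For reference, the all-pairs average described above can be sketched without sourmash. This is only an illustration under stated assumptions: plain Python sets stand in for signatures, and `jaccard_similarity` and `average_pairwise` are hypothetical helper names, not part of the sourmash API. Note that the Jaccard index is a similarity, so a distance would be one minus this value.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (hypothetical helper)."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def average_pairwise(items, similarity=jaccard_similarity):
    """Average `similarity` over every unordered pair, keeping only a running sum."""
    total = 0.0
    count = 0
    # combinations() yields pairs lazily, so no list of all pairs is ever built.
    for x, y in combinations(items, 2):
        total += similarity(x, y)
        count += 1
    if count == 0:
        raise ValueError("need at least two items to form a pair")
    return total / count


# Plain sets stand in for loaded signatures in this sketch.
example_sets = [{1, 2}, {2, 3}, {1, 2}]
print(average_pairwise(example_sets))
```

Because only `total` and `count` persist across iterations, a loop of this shape should use roughly constant extra memory, which is why steady growth during the loop points at the comparison call itself.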

@camillescott
Contributor

There was indeed a memory leak in a function called by jaccard -- a heap-allocated object was not being deleted. I put up a PR to fix it.
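A leak of this kind lives in native code, so Python-level tools such as `tracemalloc` would probably not see it; watching the process's resident memory is one way to confirm it. The following is a minimal sketch, assuming a Unix system where the standard-library `resource` module is available. `peak_rss` and `rss_growth` are hypothetical helper names, and the units of `ru_maxrss` differ by platform (kilobytes on Linux, bytes on macOS).

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(fn, repeats=100_000):
    """Call fn() `repeats` times and return how much the peak RSS rose."""
    before = peak_rss()
    for _ in range(repeats):
        fn()
    return peak_rss() - before


# Hypothetical usage, with `a` and `b` being two already-loaded signatures:
#     print(rss_growth(lambda: a.jaccard(b)))
# A call that frees everything it allocates should show little or no growth.
print(rss_growth(lambda: None, repeats=1000))
```

Growth that scales with `repeats` for a call that should not retain anything is a strong hint that memory is being allocated and never released.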

@luizirber
Member

This is now released in 2.0.1, thanks @camillescott!

@ctb
Contributor

ctb commented Aug 2, 2019

Fixed in #687.
