
10x run leaks memory like crazy #559

Closed · olgabot opened this issue Oct 26, 2018 · 5 comments
@olgabot (Collaborator) commented Oct 26, 2018

TL;DR: Running sourmash compute on larger 10X bam files crashed our 2TB ram machine (!!!)

I've been trying to run sourmash compute on a few 10x bam files with 3458 and 610 barcodes; previously I had tested files with 150 and 625 barcodes with no problem. Because the bam file is sorted by coordinate, the code iterates over each alignment, checks whether that alignment's barcode has already been added, adds it if not, and then adds the sequence. Since it's unknown a priori which sequences correspond to which barcodes, every barcode's data has to stay resident, and this ends up taking a LOT of memory. I crashed our 2TB ram machine running sourmash compute on these two files 😱

The options I see are:

  • Refactor the 10x bam code to first sort the bam file by barcode, then write the signature generation as an iterator that yields a signature each time it hits a new barcode, writing each one to file before purging it from memory (see the sketch after this list)
  • Require a sorted-by-barcode bam file as input (annoying to lazy users like me)
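For the first option, here's a minimal sketch of what I mean, assuming the bam has already been sorted by the CB (cell barcode) tag, e.g. with `samtools sort -t CB in.bam -o sorted.bam`. The `make_minhash` factory is a stand-in (something like `lambda: sourmash.MinHash(n=500, ksize=31)`), not sourmash's actual compute code:

```python
# Minimal sketch, assuming the bam is already sorted by the CB tag.
# `make_minhash` is a hypothetical factory, not sourmash's real API.
import pysam

def signatures_by_barcode(bam_path, make_minhash):
    """Yield (barcode, minhash) pairs, one barcode at a time."""
    current_barcode, mh = None, None
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for aln in bam:
            # skip alignments with no cell barcode or no sequence
            if not aln.has_tag("CB") or aln.query_sequence is None:
                continue
            barcode = aln.get_tag("CB")
            if barcode != current_barcode:
                if mh is not None:
                    yield current_barcode, mh  # previous barcode is complete
                current_barcode, mh = barcode, make_minhash()
            mh.add_sequence(aln.query_sequence, force=True)
    if mh is not None:
        yield current_barcode, mh  # flush the last barcode
```

The caller could then write each signature to disk as it's yielded, so only one barcode's sketch is ever in memory at a time.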

What do you think?

@ctb (Contributor) commented Nov 1, 2018

I have no real knowledge here :). What about building signatures for all barcodes simultaneously?
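One way to read that suggestion, as a rough sketch only: a dict of minhashes keyed by barcode, filled in a single pass. Note this keeps every barcode's sketch resident at once, which is exactly the scaling concern above (`make_minhash` is the same hypothetical factory as in the earlier sketch):

```python
# Sketch of the "all barcodes simultaneously" idea: every barcode's
# minhash lives in one dict for the whole pass over the bam.
import pysam

def signatures_all_at_once(bam_path, make_minhash):
    sketches = {}  # barcode -> minhash, all resident at the same time
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for aln in bam:
            if not aln.has_tag("CB") or aln.query_sequence is None:
                continue
            bc = aln.get_tag("CB")
            if bc not in sketches:
                sketches[bc] = make_minhash()
            sketches[bc].add_sequence(aln.query_sequence, force=True)
    return sketches
```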

@ctb (Contributor) commented Aug 2, 2019

Does #685/#687 fix this?

@olgabot (Collaborator, Author) commented Aug 3, 2019

Haven't tested it yet, but at a glance, those PRs address the compare functionality, whereas this issue was happening just on compute: the file included all ~700k possible barcodes, and inefficient data structures were used to store them all. @pranathivemuri is working on a fix to reading the 10x data here: https://github.com/pranathivemuri/sourmash/blob/pranathi-10x/sourmash/commands.py

We may also need a filter that only allows barcodes with at least N reads, to remove the 'bad' barcodes and reduce the total memory used (roughly sketched below).
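Something like this could work for that filter — a rough sketch that does a counting pass first; the cutoff value here is purely illustrative and would need tuning:

```python
# First pass: count reads per CB tag, then keep barcodes with at
# least `min_reads`. The default cutoff is an illustrative guess.
from collections import Counter
import pysam

def passing_barcodes(bam_path, min_reads=10):
    counts = Counter()
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for aln in bam:
            if aln.has_tag("CB"):
                counts[aln.get_tag("CB")] += 1
    return {bc for bc, n in counts.items() if n >= min_reads}
```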

@luizirber (Member) commented:

Was this fixed with bam2fasta, @olgabot?

@olgabot (Collaborator, Author) commented Jan 3, 2020 via email
