Comparing metagenomes help? #294

rachelporetsky · 2017-07-13T17:03:11Z

I have data from a single water sample that was filtered through a big filter followed by a small filter to capture attached vs. free-living microbes. We assembled the sequences from both sets of filters together as well as separately and our assemblies are different.

Do you recommend comparing both reads and assemblies to GenBank microbial genomes? Or to each other (i.e., assemblies to assemblies and reads to reads) or reads from one to assemblies from the other?
You mentioned k-mer trimming first to get Jaccard distances when comparing reads to reads. How do I do this with sourmash?

ctb · 2018-02-26T12:42:41Z

Partially done in #419, "a practical guide."

ctb · 2020-04-04T13:30:02Z

I think this has been addressed now.

ReneKat · 2020-06-07T21:12:31Z

Hello @ctb and sourmash Team!

I have been through the tutorials and Practical Guide which have all been extremely helpful. However, I have reached a snag I was hoping to get some guidance on.

I want to use sourmash to compare 40 metagenomic environmental water samples: 20 sampling sites over 2 seasons. I have assembled reads from each sample using metaSPAdes and computed signatures for each assembly using combinations of k = 21, 31, 51 and scaled = 10, 100, 1000, 10000:

sourmash compute -k 21 --scaled=1000 ${prefix}_MS_scaffolds.fasta --merge ${prefix} -o ${prefix}_k21_1000.sig

sourmash compare *.sig -k 21 -o ./samples01_21_1000
sourmash plot --labels samples01_21_1000

It is known from 16S DNAseq data that some sites should be clustering, especially the same site sampled at different time periods. However, my matrix plot is barely clustering any sites regardless of the k-mer + scaled combination.
The sampling coverage for each sample is low, on average only 3X, with the per sample N50 between 550bp-1200bp.

Is it possible that I'm computing the assembly signatures wrong? I was unsure if I should compute signatures on the reads or the contigs.

sourmash sig describe BR1_2018_k21_1000.sig

== This is sourmash version 3.3.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 1 signatures total.

signature filename: BR1_2018_k21_1000.sig
signature: BR1_2018
source file: BR1_2018_MS_scaffolds.fasta
md5: 13ffdebef827e79bd5a3e92ec4431ae0
k=21 molecule=DNA num=0 scaled=1000 seed=42 track_abundance=0
size: 3679
signature license: CC0

Please let me know if more information is needed. I appreciate your assistance in using sourmash for my project.

Best Regards,
René

ctb · 2020-06-22T13:49:23Z

hi @ReneKat sorry for delay in responding - in the future, just file a new issue, that way it pops up when I'm going through the issue tracker :)

ctb · 2020-06-22T13:54:32Z

a few thoughts --

assembly may be eliminating a lot of your sample, due to the low coverage.
Jaccard similarity is really stringent compared to 16S: you're taking into account not just the shared k-mers, but all of the k-mers (including those that are not shared). With metagenomes, we tend to see the behavior I think you're describing.
when you say your samples are "barely clustering", is that a description of the dendrogram branch lengths or of the shading in the plot matrix? The latter can be adjusted with --vmax, and I often go down to a vmax of 0.1 in order to see trace similarities.
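To make the point about Jaccard stringency concrete, here is a minimal sketch in plain Python (not sourmash code). The integer sets are hypothetical stand-ins for the hashed k-mers of two metagenomes, and the function names are made up for illustration. It shows how unshared k-mers pull the Jaccard value down even when a sizeable fraction of one sample is present in the other.

```python
def jaccard(a, b):
    """Shared k-mers divided by ALL k-mers seen in either sample."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of sample a's k-mers that also occur in sample b."""
    return len(a & b) / len(a) if a else 0.0


# Hypothetical samples: integers stand in for hashed k-mers.
sample_a = set(range(0, 1000))     # 1000 distinct k-mers
sample_b = set(range(900, 3000))   # 2100 distinct k-mers, 100 shared with a

print(f"Jaccard:          {jaccard(sample_a, sample_b):.3f}")
print(f"containment(a,b): {containment(sample_a, sample_b):.3f}")
```

In this toy case 10% of sample_a is found in sample_b, yet the Jaccard similarity is only about 3% because the 2,900 unshared k-mers all count in the denominator. Values that small are easy to lose in a plot matrix with the default shading.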

I would suggest sticking with k=21 and scaled=1000, and computing the signatures from the reads rather than the assembly. If you e-mail me the .sig files at [email protected] I can poke at them a bit and see if I can find something you're missing.
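For intuition about what sketching reads at k=21 and scaled=1000 involves, here is a toy illustration in plain Python. It is an assumption-laden sketch, not sourmash's implementation: sourmash uses its own hash function and data structures, while this example uses hashlib's BLAKE2b, and the function names are hypothetical. The idea it shows is that every k-mer in every read is hashed and only hashes below 2^64 / scaled are kept, so no assembly step is needed.

```python
import hashlib

MAX_HASH = 2**64                      # size of the 64-bit hash space
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def sketch_reads(reads, k=21, scaled=1000):
    """Toy scaled sketch: keep k-mer hashes that fall in the lowest 1/scaled of hash space."""
    threshold = MAX_HASH // scaled
    kept = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue              # skip k-mers containing N or other symbols
            digest = hashlib.blake2b(canonical(kmer).encode(), digest_size=8).digest()
            value = int.from_bytes(digest, "big")
            if value < threshold:
                kept.add(value)
    return kept
```

Because the same threshold is applied to every sample, sketches built this way from raw reads can be compared directly, and low-coverage regions that an assembler would discard still contribute k-mers.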

ctb mentioned this issue Sep 28, 2017

document "correct" khmer pre-processing approach for k-mer trimming #273

Closed

ctb mentioned this issue Feb 18, 2018

what's needed for a 2.0 release? #174

Closed

ctb mentioned this issue Feb 26, 2018

[MRG] Add more documentation, including a practical guide to parameters and preprocessing. #419

Merged

5 tasks

ctb mentioned this issue Mar 9, 2018

Hackfest thoughts 3/9/18 #434

Closed

ctb closed this as completed Apr 4, 2020

ctb reopened this Jun 22, 2020

ctb closed this as completed Jan 10, 2021
