strict read quality and end trimming leads to mispaired files #96

yannickwurm · 2021-09-29T13:38:59Z

In the first practical, running cutadapt with relatively stringent parameters (e.g. a quality-trim of 20-25) changes the fastq files, perhaps by converting many nucleotides to N.
Those changes (that masking?) lead kmc2 to drop some reads,
so the fastqs output by kmc are out of sync.
This means the subsequent cutadapt step does not run properly, because cutadapt expects properly paired reads as input.

We need to do one of the following:

find a way for kmc to not drop reads (is there an option in a newer version of KMC that enables masking, through N or lowercase, instead?)
or add a step where we manually drop orphan reads to ensure the files are in sync before running paired cutadapt
or use something other than kmc (probably not)
or maybe a newer version of cutadapt has a "check my reads are paired and skip the orphaned ones" option

and/or replace it with a different process that is less susceptible to extreme cleaning

This issue doesn't seem to appear if they use lenient quality cutoffs
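
For the "manually drop orphan reads" option, a minimal sketch in Python of what such a resync step might look like. The names `read_key`, `fastq_records` and `drop_orphans` are hypothetical and not part of the practical. It assumes plain 4-line FASTQ records, that both mates share a read ID (optionally with a `/1` or `/2` suffix), and that kmc keeps the surviving reads in their original relative order.

```python
import itertools


def read_key(header):
    """Return the read ID shared by both mates (hypothetical helper)."""
    fields = header[1:].split()  # drop '@' and any description after whitespace
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):  # old-style mate suffix
        name = name[:-2]
    return name


def fastq_records(path):
    """Yield (key, record_text) for each 4-line FASTQ record in path."""
    with open(path) as handle:
        while True:
            lines = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
            if not lines:
                return
            if len(lines) < 4 or not lines[0].startswith("@"):
                raise ValueError(f"malformed FASTQ record in {path}")
            yield read_key(lines[0]), "\n".join(lines) + "\n"


def drop_orphans(r1_in, r2_in, r1_out, r2_out):
    """Write only reads whose ID occurs in both inputs; return pairs kept."""
    # First pass: collect the IDs present in both files.
    shared = {key for key, _ in fastq_records(r1_in)}
    shared &= {key for key, _ in fastq_records(r2_in)}
    # Second pass: copy the shared reads, preserving each file's order.
    kept = 0
    for src, dst in ((r1_in, r1_out), (r2_in, r2_out)):
        kept = 0
        with open(dst, "w") as out:
            for key, record in fastq_records(src):
                if key in shared:
                    out.write(record)
                    kept += 1
    return kept
```

Calling `drop_orphans("reads.pe1.fq", "reads.pe2.fq", "synced.pe1.fq", "synced.pe2.fq")` (file names are placeholders) would write two files containing only the reads that still have a mate. The ID sets are held in memory, which should be fine for the practical's small dataset but may not scale to large runs.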

yannickwurm · 2021-09-29T13:59:07Z

I encourage the assistants to resolve this one this year

piplus2 · 2022-07-26T15:28:52Z

I have added a note in the doc:

Note:
If you trim too much of your sequence (i.e. too large values for --cut and --quality-cutoff), you increase the likelihood of eliminating important information. Additionally, if the trimming is too aggressive, some sequences may be discarded completely, which will cause problems in the subsequent steps of the pre-processing.
For this example, we suggest keeping --cut below 5 and --quality-cutoff below 10.
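
As an illustration of the limits in the note, here is a hedged Python sketch that builds a paired-end cutadapt command and refuses values outside the suggested range. The function name `cutadapt_command` and the file names are hypothetical, and the practical's actual command may use different options. `--cut` and `--quality-cutoff` come from the note, while `-o` and `-p` are cutadapt's options for the first and second output files in paired mode.

```python
import shlex

MAX_CUT = 5       # the note suggests keeping --cut below this
MAX_QUALITY = 10  # and --quality-cutoff below this


def cutadapt_command(r1_in, r2_in, r1_out, r2_out, cut, quality_cutoff):
    """Build a paired-end cutadapt command within the note's suggested limits."""
    # cutadapt accepts a negative --cut to trim from the 3' end, so compare
    # the absolute value against the limit.
    if abs(cut) >= MAX_CUT:
        raise ValueError(f"--cut {cut} is not below {MAX_CUT}")
    if not 0 <= quality_cutoff < MAX_QUALITY:
        raise ValueError(f"--quality-cutoff {quality_cutoff} is not below {MAX_QUALITY}")
    return [
        "cutadapt",
        "--cut", str(cut),
        "--quality-cutoff", str(quality_cutoff),
        "-o", r1_out,
        "-p", r2_out,
        r1_in, r2_in,
    ]


# Placeholder file names; print the command rather than running it.
print(shlex.join(cutadapt_command(
    "reads.pe1.fq", "reads.pe2.fq",
    "trimmed.pe1.fq", "trimmed.pe2.fq",
    cut=4, quality_cutoff=9,
)))
```

The check is only a guard for this example; it does not make stricter settings safe, it just stops them from being passed on by accident.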

I've also corrected the text: cutadapt fails in those cases; it doesn't drop the unpaired reads.

yannickwurm added bug important labels Sep 29, 2021
