Missing reads in Transcriptome BAM file when using --quantMode TranscriptomeSAM #2253

suzietallon · 2024-12-12T10:20:24Z

Hi everyone,

I am using STAR version 2.7.11a to map RNA-seq reads onto a reference genome, and I am encountering an issue with the --quantMode TranscriptomeSAM option. Specifically, some reads that span two CDS regions (interrupted by an intron) are missing from the transcriptome BAM file, even though they are present in the genome BAM file.

Previously, I mapped these RNA-seq reads with minimap2 directly onto the CDS sequences, and those reads aligned perfectly. However, with STAR, these reads are missing from the transcriptome BAM file, which I discovered when some expected SNPs were absent in downstream analyses.

Here is an example of one such read pair, as found in the genome BAM file (I used samtools view to get the read information):

A01968:75:H7GYKDSX5:1:2427:31530:29825	163	Chromosome_1	89005271	255	6S36M7220N98M	=	89012593	7410	TGGTCTTGTGAAGATGTCCAAGTCTCAGATATGTCTCTTCAGGATTACATTGCAGTCAAAGAGAAATACGCCAAATATATCCCACATTCTGCTGGGCGTTATGCTGCTAAACGGTTCAGAAAGGCTCAGTGTCCTATTGT	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFF,FFFF	NH:i:1	HI:i:1	AS:i:191	nM:i:15

A01968:75:H7GYKDSX5:1:2427:31530:29825	83	Chromosome_1	89012593	255	52S88M	=	89005271	-7410	CAAAGAGAAATACGCCAAATATATCCCACATTCTGCTGGGCGTTATGCTGCTAAACGGTTCAGAAAGGCTCAGTGTCCTATTGTTGAACGTGTCACCAACAGTCTTATGATGCATGGTCGTAACAATGGAAAGAAATTGA	FFFFFFFFFF,FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:191	nM:i:15
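To make the two records above easier to compare against an annotation, the POS and CIGAR fields can be decoded into reference coordinates. The following is a minimal sketch using only the standard library; the function names `aligned_blocks` and `soft_clipped` are illustrative and not part of STAR or samtools. It assumes standard SAM semantics: POS is 1-based, M/=/X/D/N consume the reference, and S/I/H/P do not.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Decode a SAM POS/CIGAR pair.

    Returns (blocks, introns), each a list of 1-based inclusive
    (start, end) reference intervals. Blocks are the aligned
    segments; introns are the N gaps between them.
    """
    blocks, introns = [], []
    ref = pos            # next reference position to be consumed
    block_start = None   # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":
            # These operations consume the reference inside a block.
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":
            # A skipped region closes the current block.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            introns.append((ref, ref + length - 1))
            ref += length
        # S, I, H and P do not consume the reference.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks, introns


def soft_clipped(cigar):
    """Total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


if __name__ == "__main__":
    # The two mates shown above (flags 163 and 83).
    print(aligned_blocks(89005271, "6S36M7220N98M"))
    # ([(89005271, 89005306), (89012527, 89012624)], [(89005307, 89012526)])
    print(aligned_blocks(89012593, "52S88M"))
    # ([(89012593, 89012680)], [])
    print(soft_clipped("6S36M7220N98M"), soft_clipped("52S88M"))
    # 6 52
```

For this pair, the spliced mate covers 89005271-89005306 and 89012527-89012624 with a gap at 89005307-89012526, and the two mates carry 6 and 52 soft-clipped bases respectively.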

I made a simple sketch to better visualize the read mapping:

Here is the command I used:

STAR --genomeDir $GenomeIndexDirectory --readFilesIn $FastqInputFolder/$forward $FastqInputFolder/$reverse --readFilesCommand gunzip -c --runThreadN 16 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM --outMultimapperOrder Random --outFileNamePrefix $WorkingDirectory/AlignedReads/$ind

I expected this read to appear in the transcriptome BAM file, as it maps to two CDS regions separated by an intron.
I saw issue #315 and checked my GFF file, but there don’t seem to be any apparent problems with it.

What criteria does STAR use to include reads in the transcriptome BAM file?
Could the issue be related to splicing annotations (e.g., incomplete or incorrect intron definitions)?
Are there specific parameters I can adjust to include such reads in the transcriptome BAM file?
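One way to test the annotation question directly is to check whether the gap in the read lines up with annotated feature boundaries. The sketch below is an assumption-laden illustration, not STAR's own logic: `junction_annotated` is a hypothetical helper, the intron coordinates 89005307-89012526 are derived from POS 89005271 and CIGAR 6S36M7220N98M, and the GFF lines in the example are invented placeholders rather than content from the real annotation. It only checks that some feature ends just before the gap and some feature starts just after it, without requiring both to belong to the same transcript.

```python
def junction_annotated(gff_lines, seqid, intron_start, intron_end, feature="exon"):
    """Check an intron against feature boundaries in GFF/GTF lines.

    Coordinates are 1-based inclusive. Returns (left_ok, right_ok):
    left_ok  is True if some feature ends at intron_start - 1,
    right_ok is True if some feature starts at intron_end + 1.
    """
    starts, ends = set(), set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[0] != seqid or cols[2] != feature:
            continue
        starts.add(int(cols[3]))
        ends.add(int(cols[4]))
    return (intron_start - 1) in ends, (intron_end + 1) in starts


# Hypothetical annotation lines, for illustration only.
EXAMPLE_GFF = [
    "##gff-version 3",
    "Chromosome_1\tsrc\texon\t89005200\t89005306\t.\t+\t.\tID=e1;Parent=t1",
    "Chromosome_1\tsrc\texon\t89012527\t89012700\t.\t+\t.\tID=e2;Parent=t1",
]

if __name__ == "__main__":
    print(junction_annotated(EXAMPLE_GFF, "Chromosome_1", 89005307, 89012526))
    # (True, True)
```

To run it on a real annotation, pass an open file handle instead of `EXAMPLE_GFF`, and set `feature="CDS"` if the file only contains CDS records. A result of `(False, ...)` or `(..., False)` would suggest the junction seen in the genome BAM does not match the annotated boundaries.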

Thank you for your help!
