Missing reads in Transcriptome BAM file when using --quantMode TranscriptomeSAM #2253

Open
suzietallon opened this issue Dec 12, 2024 · 0 comments


suzietallon commented Dec 12, 2024

Hi everyone,

I am using STAR version 2.7.11a to map RNA-seq reads onto a reference genome, and I am encountering an issue with the --quantMode TranscriptomeSAM option. Specifically, some reads that span two CDS regions (interrupted by an intron) are missing from the transcriptome BAM file, even though they are present in the genome BAM file.

Previously, I mapped these RNA-seq reads with minimap2 directly onto the CDS sequences, and those reads aligned perfectly. However, with STAR, these reads are missing from the transcriptome BAM file, which I discovered when some expected SNPs were absent in downstream analyses.

Here is an example of one such read pair, as found in the genome BAM file (I used samtools view to get the read information):

A01968:75:H7GYKDSX5:1:2427:31530:29825	163	Chromosome_1	89005271	255	6S36M7220N98M	=	89012593	7410	TGGTCTTGTGAAGATGTCCAAGTCTCAGATATGTCTCTTCAGGATTACATTGCAGTCAAAGAGAAATACGCCAAATATATCCCACATTCTGCTGGGCGTTATGCTGCTAAACGGTTCAGAAAGGCTCAGTGTCCTATTGT	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFF,FFFF	NH:i:1	HI:i:1	AS:i:191	nM:i:15

A01968:75:H7GYKDSX5:1:2427:31530:29825	83	Chromosome_1	89012593	255	52S88M	=	89005271	-7410	CAAAGAGAAATACGCCAAATATATCCCACATTCTGCTGGGCGTTATGCTGCTAAACGGTTCAGAAAGGCTCAGTGTCCTATTGTTGAACGTGTCACCAACAGTCTTATGATGCATGGTCGTAACAATGGAAAGAAATTGA	FFFFFFFFFF,FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:191	nM:i:15
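For reference, the genomic blocks covered by these two records can be worked out from the POS and CIGAR fields. Below is a minimal Python sketch of that arithmetic; the helper name `aligned_blocks` is my own and is not part of STAR or samtools. It treats `N` as an intron gap and ignores soft clips, which do not consume reference positions.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks of an alignment.

    pos   -- leftmost mapped position (SAM POS field, 1-based)
    cigar -- SAM CIGAR string
    Blocks are split at N (skipped region / intron) operations only.
    """
    blocks = []
    ref = pos          # next reference position to be consumed
    start = None       # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=XD":
            # These operations consume reference bases inside a block.
            if start is None:
                start = ref
            ref += n
        elif op == "N":
            # An intron closes the current block and skips reference bases.
            if start is not None:
                blocks.append((start, ref - 1))
                start = None
            ref += n
        # I, S, H and P do not consume reference positions.
    if start is not None:
        blocks.append((start, ref - 1))
    return blocks


# The two records shown above (POS and CIGAR copied from the samtools output).
print(aligned_blocks(89005271, "6S36M7220N98M"))
# [(89005271, 89005306), (89012527, 89012624)]
print(aligned_blocks(89012593, "52S88M"))
# [(89012593, 89012680)]
```

The outer coordinates (89005271 to 89012680) span 7410 bases, which agrees with the TLEN value reported in the records.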

I made a simple sketch to better visualize the read mapping:

[Image: sketch of the read mapping (screenshot "Capture d’écran du 2024-12-12 11-12-19")]

Here is the command I used:

STAR --genomeDir $GenomeIndexDirectory --readFilesIn $FastqInputFolder/$forward $FastqInputFolder/$reverse --readFilesCommand gunzip -c --runThreadN 16 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM --outMultimapperOrder Random --outFileNamePrefix $WorkingDirectory/AlignedReads/$ind

I expected this read to appear in the transcriptome BAM file, as it maps to two CDS regions separated by an intron.
I saw issue #315 and checked my GFF file, but I could not find any problems with it.

What criteria does STAR use to include reads in the transcriptome BAM file?
Could the issue be related to splicing annotations (e.g., incomplete or incorrect intron definitions)?
Are there specific parameters I can adjust to include such reads in the transcriptome BAM file?
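To make the first question concrete, this is the kind of check I would naively expect, written as a Python sketch. It is only my assumption about what "consistent with the annotation" could mean, not STAR's actual logic. The function name `blocks_match_transcript` and the exon coordinates in the usage example are hypothetical placeholders, not values taken from my GFF file.

```python
def blocks_match_transcript(blocks, exons):
    """Check whether alignment blocks fit one transcript's exon structure.

    blocks -- list of 1-based inclusive (start, end) alignment blocks,
              split at introns, in genomic order
    exons  -- list of 1-based inclusive (start, end) exons of ONE transcript,
              sorted by start
    Assumed rule (hypothetical, not taken from STAR's source): every block
    lies inside consecutive exons, and every internal block boundary
    coincides exactly with an exon boundary.
    """
    if not blocks:
        return False
    # Find the exon that contains the start of the first block.
    first = None
    for i, (exon_start, exon_end) in enumerate(exons):
        if exon_start <= blocks[0][0] <= exon_end:
            first = i
            break
    if first is None:
        return False
    last_block = len(blocks) - 1
    for j, (block_start, block_end) in enumerate(blocks):
        if first + j >= len(exons):
            return False                      # more blocks than exons left
        exon_start, exon_end = exons[first + j]
        if block_start < exon_start or block_end > exon_end:
            return False                      # block leaves the exon
        if j > 0 and block_start != exon_start:
            return False                      # acceptor side not on exon start
        if j < last_block and block_end != exon_end:
            return False                      # donor side not on exon end
    return True


# Hypothetical exon coordinates, for illustration only.
exons = [(89005200, 89005306), (89012527, 89012700)]
read1 = [(89005271, 89005306), (89012527, 89012624)]
read2 = [(89012593, 89012680)]
print(blocks_match_transcript(read1, exons))
print(blocks_match_transcript(read2, exons))
```

If the annotated exon ended even a few bases before the read's donor site, this check would fail, which is why I am asking whether slightly off intron definitions could explain the missing reads.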

Thank you for your help!
