Invalid segment Error when building reference of the rat genome #340
Comments
The problem is that 183055283:183055283 is an invalid coordinate: it has length 0. Such a coordinate should not exist in a properly formatted GTF file, so this looks to me like a GTF formatting issue rather than a kb ref issue. When you grep for those coordinates, what do the problematic lines in the GTF file look like?
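For anyone who wants to inspect the offending records without grep, here is a minimal sketch of the same search in Python. The helper name `find_coordinate_lines` is made up for illustration and is not part of kb-python or ngs_tools. It assumes a standard tab-separated GTF with start and end in columns 4 and 5, and it matches only those two columns rather than any occurrence of the digits in the line.

```python
def find_coordinate_lines(lines, coordinate, context=2):
    """Return (line_number, text) pairs for GTF lines whose start or end
    column equals `coordinate`, plus `context` flanking lines on each side.

    Hypothetical helper for inspection only; line numbers are 1-based.
    """
    target = str(coordinate)
    hits = set()
    for i, line in enumerate(lines):
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # GTF columns 4 and 5 (indices 3 and 4) hold start and end.
        if len(fields) >= 5 and target in (fields[3], fields[4]):
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            hits.update(range(first, last))
    return [(i + 1, lines[i].rstrip("\n")) for i in sorted(hits)]


# Example usage (the path is a placeholder):
# with open("annotation.gtf") as handle:
#     for number, text in find_coordinate_lines(handle.readlines(), 183055283):
#         print(number, text, sep="\t")
```

Reading the whole file into memory keeps the sketch short; for a very large annotation a streaming version with a small buffer of previous lines would be gentler on RAM.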
Here are the results of the grep with two flanking lines:
It seems that in all cases these are two neighbouring exons with no intron in between. Maybe the script should merge them in such cases?
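To make the observation above concrete, here is a rough sketch of how two back-to-back exons could yield a zero-length interval, and what merging them might look like. This rests on an assumption that the thread does not confirm: that the failing segment is the gap computed between consecutive exons in 0-based, half-open coordinates. The functions `intron_gaps` and `merge_adjacent_exons` are hypothetical and are not taken from kb ref or ngs_tools.

```python
def intron_gaps(exons):
    """Yield 0-based half-open (start, end) gaps between consecutive exons.

    `exons` holds 1-based inclusive (start, end) pairs, as in a GTF file.
    An exon (s, e) covers [s - 1, e) in 0-based half-open terms, so the gap
    between (s1, e1) and the next exon (s2, e2) is [e1, s2 - 1).
    """
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        yield (end1, start2 - 1)


def merge_adjacent_exons(exons):
    """Merge exons that touch (next start == previous end + 1) or overlap."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Illustrative coordinates, not taken from the rat annotation.
exons = [(100, 200), (201, 300), (500, 600)]
gaps_before = list(intron_gaps(exons))
gaps_after = list(intron_gaps(merge_adjacent_exons(exons)))
```

With these made-up exons, the first gap before merging is (200, 200), an empty interval of the same shape as [183055283:183055283). After merging, only the non-empty gap remains. Whether merging is the right behaviour for the tool is a separate design question, since it changes the exon structure recorded in the annotation.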
Hi, @kim-fehl,
and then running
I'm using the kb ref command from the kallisto-bustools package with the RefSeq genome mRatBN7.2 of Rattus norvegicus. Here is the command:
kb ref -i mRATBN7.cdna.idx -g mRATBN7.t2g -f1 mRATBN7.cdna.fna ~/genomes/GCF_015227675.2_mRatBN7.2_genomic.fna ~/genomes/GCF_015227675.2_mRatBN7.2_genomic.gtf
The command fails with the following error message:
ngs_tools.gtf.Segment.SegmentError: Invalid segment [183055283:183055283)
To make it work, I have to remove the lines with the error-causing coordinates from the GTF file, like this:
kb ref -i mRATBN7.cdna.idx -g mRATBN7.t2g -f1 mRATBN7.cdna.fna ~/genomes/GCF_015227675.2_mRatBN7.2_genomic.fna <(grep -vE "183055283|128787877|145599560|105962786|24017317|23500941" ~/genomes/GCF_015227675.2_mRatBN7.2_genomic.gtf)
Probably these coordinates can help you to investigate the issue. Thank you!
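The grep filter above drops every line that contains one of those digit strings anywhere, including inside a longer number or an attribute. A narrower variant, sketched here as an assumption rather than a tested fix, drops only records whose start or end column is exactly one of the listed coordinates. The function name `drop_lines_with_coordinates` is hypothetical, and I have not checked whether kb ref accepts the resulting file.

```python
# Coordinates copied from the grep pattern in the workaround above.
BAD_COORDINATES = frozenset({
    "183055283", "128787877", "145599560",
    "105962786", "24017317", "23500941",
})


def drop_lines_with_coordinates(lines, bad=BAD_COORDINATES):
    """Return GTF lines, omitting records whose start or end is in `bad`.

    Comment lines and malformed lines (fewer than 5 tab-separated
    fields) are passed through unchanged.
    """
    kept = []
    for line in lines:
        fields = line.split("\t")
        is_record = not line.startswith("#") and len(fields) >= 5
        if is_record and (fields[3] in bad or fields[4] in bad):
            continue  # skip the record with the error-causing coordinate
        kept.append(line)
    return kept


# Example usage (paths are placeholders):
# with open("input.gtf") as src, open("filtered.gtf", "w") as dst:
#     dst.writelines(drop_lines_with_coordinates(src))
```

Like the grep version, this removes whole records, so the affected transcripts end up with fewer exon lines than the original annotation describes.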