Some warning messages during centrifuger-download. "unexpected end of file", "invalid compressed data--format violated", "invalid compressed data--length error" #8
Comments
Could you please download one of the genomic files directly, for example with "curl -s -o test.fna.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/975/215/GCA_000975215.1_Cael_CB4856_1.0/GCA_000975215.1_Cael_CB4856_1.0_genomic.fna.gz", and check whether test.fna.gz is intact? |
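One way to run the integrity check suggested above is `gzip -t`, which reads the whole compressed stream and reports errors such as "unexpected end of file" without writing any output. The following is only a minimal sketch: `check_gz` is a hypothetical helper name, not part of Centrifuger, and it assumes `gzip` is on the PATH.

```shell
#!/usr/bin/env bash
# check_gz: hypothetical helper, not part of Centrifuger.
# Returns 0 and prints "OK" if the gzip stream decompresses cleanly,
# otherwise prints "CORRUPT" and returns 1.
check_gz() {
    local f="$1"
    if gzip -t "$f" 2>/dev/null; then
        echo "OK: $f"
    else
        echo "CORRUPT: $f"
        return 1
    fi
}

# Usage, with the file name from the curl command above:
#   check_gz test.fna.gz
```

A truncated download (for example one cut off by a dropped connection) would be expected to show up as CORRUPT here.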
I checked the integrity of the downloaded fna.gz file by
|
This is very strange... Does the seqid2taxid.map look right? For example, does it have about 109 rows? What is the content of the line containing "CM003206.1"? |
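Both questions above can be answered with standard tools. This is a rough sketch: `map_summary` is a hypothetical helper name, and the only assumption about the file is that it is plain text with one record per line.

```shell
#!/usr/bin/env bash
# map_summary: hypothetical helper, not part of Centrifuger.
# Prints the number of rows in the map file, then every line
# that contains the given sequence ID as a literal substring.
map_summary() {
    local map="$1" seqid="$2"
    echo "rows: $(wc -l < "$map" | tr -d ' ')"
    grep -F -- "$seqid" "$map" || echo "no line contains $seqid"
}

# Usage, with the names from the question above:
#   map_summary seqid2taxid.map CM003206.1
```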
Oh, I just noticed the other issue with classification results is also from you. I guess this is a strange issue from gzip then... |
Why would seqid2taxid.map contain about 109 rows? I found duplicated records in seqid2taxid.map. This could be an issue with the GenBank data itself, e.g.
|
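For anyone who wants to look for duplicated records in their own seqid2taxid.map, a sketch along these lines may help. It assumes the file is tab-separated with the sequence ID in the first column; `dup_seqids` and `dup_lines` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Centrifuger.
# Assumption: the map file is tab-separated, sequence ID in column 1.

# Print each sequence ID that appears on more than one line.
dup_seqids() {
    cut -f1 "$1" | sort | uniq -d
}

# Print each complete line that appears more than once.
dup_lines() {
    sort "$1" | uniq -d
}

# Usage:
#   dup_seqids seqid2taxid.map
#   dup_lines seqid2taxid.map
```

Comparing the two outputs distinguishes exact duplicate rows from the same sequence ID being mapped to different taxonomy IDs.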
That's my mistake. I had only tried downloading some of the vertebrate and fungi genomes for the test. It seems the parsing of the seqid2taxid.map file was correct. |
I updated Centrifuger to v1.0.3-r119 because centrifuger-download was not responding. The actual cause turned out to be maintenance at NCBI yesterday. The seqid2taxid.map file also seems correct.
Could the previous download issue be due to a slow network? |
I think you are right: the download was slow and interrupted due to the NCBI maintenance (where did you find the notice? Just curious). Glad it worked out today! |
The same centrifuger-download script did not work the second time, which was distressing, so I updated Centrifuger. It still did not work yesterday. Eventually I found the download problem and stopped trying until the next day. I can download the data now. However, why does the page still show "Notice: Upcoming Maintenance Downtime"? It's weird. I will close this issue. There are also some problems in parsing seqid_to_taxid.map, which will be mentioned in #9. |
Hi, when I use the following code to download the genome, there are some warning messages in the logs. Is that a problem I should solve? The centrifuger (Centrifuger v1.0.1-r89) was installed via conda.
The logs (partial):