Crashes due to trying to unzip file that is not zipped #380

Closed
esrice opened this issue Oct 11, 2024 · 6 comments · Fixed by #382 or #403
Labels
bug Something isn't working

Comments

Contributor

esrice commented Oct 11, 2024

Description of the bug

I specified an un-gzipped whitelist file with the --barcode_whitelist parameter. In the STAR_ALIGN step, it tries to unzip this file, which causes gzip to crash, which causes the step to fail. This is the offending line of .command.sh:

--soloCBwhitelist <(gzip -cdf 3M-february-2018.txt)

I will try to fix and submit a PR in the next day or two.

Command used and terminal output

$ nextflow run nf-core/scrnaseq \
-profile singularity \
--input ../samples.csv \
--fasta: ../../ref/bGalGal1b_modified.fa \
--gtf: ../../ref/bGalGal1b_modified_filtered.gtf \
--protocol 10XV3 \
--aligner star \
--outdir out \
--barcode_whitelist /mnt/pixstor/data/esrbhb/3M-february-2018.txt \
--save_reference

ERROR ~ Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (D2)'

Caused by:
Process NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (D2) terminated with an error exit status (104)

Command executed:

STAR \
--genomeDir star \
--readFilesIn D2_S2_L001_R2_001.fastq.gz D2_S2_L001_R1_001.fastq.gz \
--runThreadN 16 \
--outFileNamePrefix D2. \
--soloCBwhitelist <(gzip -cdf 3M-february-2018.txt) \
--soloType CB_UMI_Simple \
--soloFeatures Gene \
--soloUMIlen 12 \
--sjdbGTFfile bGalGal1b_modified_genes.gtf \
--outSAMattrRGline ID:D2 'SM:D2' \
--readFilesCommand zcat --runDirPerm All_RWX --outWigType bedGraph --twopassMode Basic --outSAMtype BAM SortedByCoordinate \

if [ -f D2.Unmapped.out.mate1 ]; then
mv D2.Unmapped.out.mate1 D2.unmapped_1.fastq
gzip D2.unmapped_1.fastq
fi
if [ -f D2.Unmapped.out.mate2 ]; then
mv D2.Unmapped.out.mate2 D2.unmapped_2.fastq
gzip D2.unmapped_2.fastq
fi

if [ -d D2.Solo.out ]; then
# Backslashes still need to be escaped (nextflow-io/nextflow#67)
find D2.Solo.out \( -name "*.tsv" -o -name "*.mtx" \) -exec gzip {} \;
fi

cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN":
star: $(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS

Command exit status:
104

Command output:
(empty)

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
gzip: invalid magic

EXITING because of FATAL ERROR: CB whitelist file /dev/fd/63 is empty.
SOLUTION: provide non-empty whitelist.

Oct 11 07:18:04 ...... FATAL ERROR, exiting

Work dir:
/mnt/pixstor/warrenwc-lab/users/edward/nxf_work/1a/6d24d1b3b8d570f7e134a16d877d51

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details
-[nf-core/scrnaseq] Pipeline completed with errors-
WARN: Killing running tasks (1)

Relevant files

No response

System information

  • Nextflow version: 24.04.3
  • nf-core/scrnaseq version: v2.7.1-g4171377
  • slurm executor
  • singularity profile
  • CentOS

esrice added the bug label Oct 11, 2024
esrice added a commit to esrice/scrnaseq that referenced this issue Oct 11, 2024
esrice mentioned this issue Oct 11, 2024

Member

grst commented Oct 28, 2024

Hi,

thanks for reporting.
Have you validated this by running gzip -cdf 3M-february-2018.txt on your file manually? I ask because the -f flag of gzip should already deal with non-compressed files.

Contributor Author

esrice commented Oct 28, 2024

Oh, weird. As you predicted, running that command manually works just fine. So I don't understand why the same command appears to fail inside the pipeline, leaving it with an empty whitelist, or why my attempted fix (see PR) of only running gzip if the filename ends in ".gz" prevents this from happening. Do you have any ideas?
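
The extension-based workaround described here can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name cat_whitelist is hypothetical, and it assumes compressed whitelists always carry a .gz suffix.

```shell
# Hypothetical helper (not taken from the pipeline or the PR):
# print a whitelist to stdout, decompressing only when the
# filename ends in ".gz" and passing other files through unchanged.
cat_whitelist() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;   # compressed: decompress to stdout
        *)    cat "$1" ;;        # plain text: copy as-is
    esac
}
```

It could then stand in for the unconditional call, for example --soloCBwhitelist <(cat_whitelist 3M-february-2018.txt), so gzip is never asked to read an uncompressed file.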

Member

grst commented Oct 28, 2024

Can you try running it inside the cellranger container? Maybe it has a different version of gzip...

Contributor Author

esrice commented Oct 28, 2024

Ah yup that's the problem:

$ gzip -cdf /mnt/pixstor/data/esrbhb/3M-february-2018.txt # this works
$ singularity exec -B /mnt https://depot.galaxyproject.org/singularity/star:2.7.10b--h9ee0642_0 gzip -cdf /mnt/pixstor/data/esrbhb/3M-february-2018.txt
gzip: invalid magic

My system gzip is v1.9 but the container gzip is BusyBox v1.32.1.
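
Since the two gzip implementations disagree on how -f treats uncompressed input, one alternative to relying on either behaviour is to check the file's first two bytes, which are 0x1f 0x8b for any gzip stream. A minimal sketch, assuming head -c and the POSIX od options -An -tx1 are available in the container (not verified for the BusyBox build here); the function names is_gzip and cat_maybe_gzipped are hypothetical:

```shell
# Hypothetical helpers, not part of the pipeline.

# Succeeds if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Print the file to stdout, decompressing only if it really is gzip data.
cat_maybe_gzipped() {
    if is_gzip "$1"; then
        gzip -cd "$1"
    else
        cat "$1"
    fi
}
```

Unlike a filename check, this does not depend on the whitelist having a .gz suffix, at the cost of one extra read of the file header.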

Member

grst commented Oct 28, 2024

ok, then your PR should fix this. Many thanks for checking!

Member

grst commented Nov 7, 2024

Closed by #382
