nchernia edited this page Feb 9, 2017 · 21 revisions

Juicer

Overview

What is Juicer?

Juicer is a one-click pipeline for processing terabase scale Hi-C datasets. Using Juicer, you can:
  • Go from raw fastq files to Hi-C maps binned at many resolutions
  • Automatically annotate loops and contact domains with the Juicer tools
  • Run the pipeline in the cloud, on LSF, Univa, or SLURM, or on a single CPU

Juicer Quick Start

  1. Choose where to run Juicer: a cluster system or a single CPU. Juicer is currently available in the cloud on AWS, on LSF, Univa, or SLURM, or on a single CPU.
  2. Be sure you know how to load the required software on your system; the software may go by slightly different names on different cluster systems.
  3. Log in to your cluster.
  4. Install the appropriate Juicer scripts for your system in a directory; we will assume this directory is /home/user/juicedir
  5. Under /home/user/juicedir, there should be a folder references that contains the reference fasta file for your genome and the BWA index files. If these files already exist elsewhere, you can soft-link them; otherwise, download the fasta file from UCSC and run bwa index on it.
  6. Under /home/user/juicedir, you should also create a folder restriction_sites. This should contain your restriction site file. You can create this file using the generate_site_positions.py Python script, or download a pre-built one from the Juicer AWS mirror.
  7. [Optional, only for deep maps] Create the bedfile folder
  8. Create a custom directory (e.g. mkdir -p /custom/filepath/MyHIC)
  9. Download the [test data]. Create a fastq directory under the top directory (e.g. cd /custom/filepath/MyHIC; mkdir fastq)
  10. Soft-link or copy your fastq files (zipped or unzipped) to that directory
  11. Type screen then launch Juicer:
    /local/path/scripts/juicer.sh [options]
    
    where /local/path is the folder containing the scripts folder that ships with this distribution. The most important options are -g <genomeID> and -s <restriction_site>. Juicer will split the files if necessary and then launch.
  12. Check on the running jobs with the appropriate command for your cluster: bjobs for LSF and AWS, squeue for SLURM, qstat for Univa. The single CPU script will run until it finishes or exits.
  13. If there are no jobs left, type cat debug/finalcheck*; you should see a "Pipeline successfully completed" message.
  14. Results are available in the aligned directory. The Hi-C maps are in inter.hic (for MAPQ > 0) and inter_30.hic (for MAPQ >= 30). The Hi-C maps can be loaded in Juicebox and explored. They can also be used for automatic feature annotation and to extract matrices at specific resolutions.
  15. These results also include automatic feature annotation. The output files include a genome-wide annotation of loops and, whenever possible, the CTCF motifs that anchor them (identified using the HiCCUPS algorithm). The files also include a genome-wide annotation of contact domains (identified using the Arrowhead algorithm). The formats of these files are described in the Juicebox tutorial online; both files can be loaded into Juicebox as a 2D annotation.
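
The directory setup in steps 4 to 10 and the completion check in step 13 can be sketched as a few shell functions. This is a minimal sketch, not part of the Juicer distribution: the function names setup_juicer_layout, link_fastqs, and pipeline_completed are hypothetical, and the check assumes the debug folder sits under the top directory from which Juicer was launched.

```shell
# Sketch only: helper functions mirroring the quick start steps.
# All function names here are hypothetical, not Juicer commands.

# Steps 5, 6, 8, 9: create the folders the quick start describes.
#   $1 = Juicer install directory (e.g. /home/user/juicedir)
#   $2 = top directory for one experiment (e.g. /custom/filepath/MyHIC)
setup_juicer_layout() {
    juicedir=$1
    topdir=$2
    mkdir -p "$juicedir/references" "$juicedir/restriction_sites" || return 1
    mkdir -p "$topdir/fastq" || return 1
}

# Step 10: soft-link fastq files (zipped or unzipped) into the fastq folder.
#   $1 = directory holding the fastq files (use an absolute path so
#        the links stay valid)
#   $2 = top directory for the experiment
link_fastqs() {
    srcdir=$1
    topdir=$2
    for f in "$srcdir"/*.fastq "$srcdir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip a pattern that matched nothing
        ln -sf "$f" "$topdir/fastq/" || return 1
    done
}

# Step 13: succeed only if a finalcheck file reports success.
# Assumes debug/ lives under the top directory.
#   $1 = top directory for the experiment
pipeline_completed() {
    topdir=$1
    grep -q "Pipeline successfully completed" "$topdir"/debug/finalcheck* 2>/dev/null
}

# Example usage (paths are the placeholders used in this page):
#   setup_juicer_layout /home/user/juicedir /custom/filepath/MyHIC
#   link_fastqs /path/to/raw/fastq /custom/filepath/MyHIC
#   pipeline_completed /custom/filepath/MyHIC && echo "done"
```

Keeping the steps as functions that take the two directories as arguments lets the same sketch be reused for several experiments that share one Juicer install directory.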