Creating an integrated structural variant (SV) callset is difficult. The code in this project is designed to merge SVs in a consistent and straightforward way. The inputs to mergeSVcallers are Tabix-indexed VCF files, and the output is a single merged VCF file. mergeSVcallers can be re-run iteratively.
Please feel free to join the SV merge quest!
git clone --recursive https://github.com/zeeev/mergeSVcallers.git
cd mergeSVcallers/
make
Usage:
mergeSVcallers -a ref.fasta -f a.vcf.gz,b.vcf.gz -t WHAM,LUMPY -s 500
Required:
-a - <STRING> - The samtools faidx indexed FASTA file
-f - <STRING> - A comma separated list of Tabix indexed VCF files
-t - <STRING> - A comma separated list of tags/identifiers for each file
Optional:
-s - <INT> - Merge SVs when both breakpoints are within N bp of each other [100]
-r - <FLOAT> - Reciprocal overlap also required [0]
Info:
-This tool provides a simple set of operations to merge SVs.
-Output is unsorted.
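The `-s` and `-r` options describe a merge test on pairs of SVs. The sketch below illustrates that test in Python; it is not the tool's actual implementation. The `SV` class, the function names, the same-chromosome check and the inclusive comparisons (`<=`, `>=`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: chromosome plus two breakpoints."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two fractions of each SV covered by the other."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    len_a, len_b = a.end - a.start, b.end - b.start
    if shared <= 0 or len_a <= 0 or len_b <= 0:
        return 0.0
    return min(shared / len_a, shared / len_b)


def should_merge(a: SV, b: SV, max_dist: int = 100, min_overlap: float = 0.0) -> bool:
    """Assumed reading of -s (max_dist) and -r (min_overlap); defaults mirror the usage text."""
    if a.chrom != b.chrom:
        return False
    # Both breakpoints, not just one, must be close to their counterparts.
    close = abs(a.start - b.start) <= max_dist and abs(a.end - b.end) <= max_dist
    return close and reciprocal_overlap(a, b) >= min_overlap


wham = SV("chr1", 1000, 5000)
lumpy = SV("chr1", 1300, 5200)
print(should_merge(wham, lumpy, max_dist=500))  # True: breakpoints differ by 300 and 200 bp
print(should_merge(wham, lumpy))                # False with the default of 100 bp
```

With a wider `-s` the two calls collapse into one merged record; raising `-r` adds a second, size-aware check that matters most for large events whose breakpoints are close in absolute terms but whose extents differ.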
##Tested Tools
- WHAM-GRAPHENING
- LUMPY
- GENOME STRIP CNVs
- GENOME STRIP DELETION
- DELLY
- VARIATION HUNTER
##TODO
- create a test suite
- merge by reciprocal overlap
- add a splitter function
- add translocation functionality
There are two utility scripts for quickly generating Venn diagrams from the merged VCF file produced by mergeSVcallers. The first script generates the input data for the plot script:
perl vennGen.pl --file ../merged.test.vcf --patterns "WHAM-,LUMPY-,GENOME-STRIP" --names WHAM,LUMPY,GS > plottest-data.txt
plottest-data.txt is a Boolean data frame that records, for each merged SV, which of the named callers it intersects. The output is then passed to the simple [R] plotting script.
R --vanilla < plotVenn.R --args plottest-data.txt testing DEL 50
The first argument is the data file. The second argument is a file prefix for the plot. The third argument is the SV type that you want to plot. The fourth argument is the minimum SVLEN. This script uses the gplots package in [R] to generate a PDF in the same directory.
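To make the filtering and counting concrete, here is a minimal Python sketch of how the Venn regions could be tallied from such a Boolean table. The column layout (one TRUE/FALSE column per caller plus `TYPE` and `SVLEN`) is a hypothetical stand-in, as are the example rows; the real output of vennGen.pl and the logic inside plotVenn.R may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical table layout and made-up rows, for illustration only.
EXAMPLE = """\
WHAM\tLUMPY\tGS\tTYPE\tSVLEN
TRUE\tTRUE\tFALSE\tDEL\t1200
TRUE\tFALSE\tFALSE\tDEL\t30
FALSE\tTRUE\tTRUE\tDEL\t800
TRUE\tTRUE\tTRUE\tDUP\t5000
TRUE\tTRUE\tFALSE\tDEL\t-640
"""


def venn_counts(handle, callers, sv_type, min_len):
    """Count merged SVs per combination of callers, after type and length filters."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        # VCF deletions often carry a negative SVLEN, so compare the absolute length.
        if row["TYPE"] != sv_type or abs(int(row["SVLEN"])) < min_len:
            continue
        region = tuple(name for name in callers if row[name] == "TRUE")
        if region:
            counts[region] += 1
    return counts


counts = venn_counts(io.StringIO(EXAMPLE), ["WHAM", "LUMPY", "GS"], "DEL", 50)
for region, n in sorted(counts.items()):
    print("&".join(region), n)
```

Each key in `counts` corresponds to one region of the Venn diagram (for example, SVs seen by WHAM and LUMPY but not GS), which is the quantity a Venn plot displays.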
Here is an example plot (This is just an example of poorly matched samples):