4.6.0.0
Download release: gatk-4.6.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.6.0.0 release:
- We've fixed a serious CRAM writing bug that affects GATK versions 4.3 through 4.5 and Picard versions 2.27.3 through 3.1.1. This bug can, in limited cases, lead to reads with an incorrect base sequence being written. See this comment to GATK issue 8768 and the full release notes below for more details on what conditions trigger the bug.
  - To help users detect whether their CRAM files are affected, we've released a CRAM scanning tool called `CRAMIssue8768Detector` that can detect whether a particular CRAM file is affected by this bug. If you suspect that some of your CRAM files may have been affected, please run this tool on them for confirmation!
- By overwhelming popular demand, we've switched back to using the standard `./.` representation for no-calls in `GenotypeGVCFs` and `GenomicsDB` instead of `0/0` with `DP=0`. This reverts the change described in our article GenotypeGVCFs and the death of the dot.
  - We intend to publish a new article shortly to replace that older article with further details on this change. When we do so, we'll link to it from here.
- The `Mutect2` germline resource can now have split multiallelic format
- Added an `--inverted-read-filter` argument to allow for selecting reads that fail read filters from the command line easily
- We've fixed a number of issues with HTTP support, mainly affecting the loading of side inputs such as indices over HTTP
- Reduced the number of layers in the GATK docker image to help users running into docker quota issues
Full list of changes:
- Important CRAM writing bug fix and detection tool
  - We've updated to `HTSJDK` 4.1.1 and `Picard` 3.2.0 (#8900), which fix a serious bug in the CRAM writing code first reported in GATK issue 8768
    - This issue affects GATK versions 4.3.0.0 through 4.5.0.0, and is fixed in GATK 4.6.0.0.
    - This issue also affects Picard versions 2.27.3 through 3.1.1, and is fixed in Picard 3.2.0.
    - The bug is triggered when writing a CRAM file using one of the affected GATK/Picard versions, and both of the following conditions are met:
      - At least one read is mapped to the very first base of a reference contig
      - The file contains more than one CRAM container (10,000 reads) with reads mapped to that same reference contig
    - When both of these conditions are met, the resulting CRAM file may have corrupt containers associated with that contig containing reads with an incorrect sequence.
    - Since many common references such as hg38 have N's at the very beginning of the autosomes and X/Y, many pipelines will not be affected by this bug. However, users of a telomere-to-telomere reference, users doing mitochondrial calling, and users with reads aligned to the alt sequences will want to scan their CRAM files for possible corruption.
    - The other mitigating circumstance is that when a CRAM is affected, the signal will be overwhelmingly obvious, with the mismatch rate typically jumping from sub-1% to 80-90% for the affected regions, making it likely to be caught by standard QC processes.
  - We've released a CRAM scanning tool called `CRAMIssue8768Detector` (#8819) that can detect whether a particular CRAM file is affected by this bug. If you suspect that some of your CRAM files may have been affected, please run this tool on them for confirmation!
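As a rough illustration of the two trigger conditions above, the following Python sketch flags contigs that could be at risk in a file written by an affected version. It is not a substitute for `CRAMIssue8768Detector`, which inspects the CRAM itself. The function name and the `(contig, start)` input format are hypothetical, and "more than one container" is approximated as "more than 10,000 reads on the contig", which may not match how a given writer actually splits containers.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Reads per CRAM container, taken from the release note above.
CONTAINER_SIZE = 10_000


def contigs_possibly_affected(
    reads: Iterable[Tuple[str, int]],
    container_size: int = CONTAINER_SIZE,
) -> List[str]:
    """Return contigs that meet both trigger conditions (approximately).

    `reads` yields (contig, 1-based alignment start) pairs, extracted with
    whatever reader you already use. This is a hypothetical input format.
    """
    read_counts = defaultdict(int)
    starts_at_first_base = set()
    for contig, start in reads:
        read_counts[contig] += 1
        if start == 1:  # condition 1: a read mapped to the very first base
            starts_at_first_base.add(contig)
    # Condition 2 (approximation): enough reads to need a second container.
    return sorted(
        contig
        for contig in starts_at_first_base
        if read_counts[contig] > container_size
    )


# Toy data: chrM has a read at base 1 and 10,001 reads in total;
# chr1 has many reads but none at base 1.
toy_reads = [("chrM", 1)] + [("chrM", 100)] * 10_000 + [("chr1", 10_001)] * 20_000
print(contigs_possibly_affected(toy_reads))  # ['chrM']
```

A contig flagged here only means the preconditions for the bug are present. Whether a file is actually corrupt still has to be confirmed with the detector tool or with mismatch-rate QC.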
- Joint Calling
  - We've switched back to using the standard `./.` representation for no-calls in `GenotypeGVCFs` and `GenomicsDB` instead of `0/0` with `DP=0` (#8715) (#8741) (#8759)
    - This reverts the change described in our article GenotypeGVCFs and the death of the dot
  - Fix for `GenotypeGVCFs` with mixed ploidy sites (#8862)
  - Fix for `GnarlyGenotyper` when PLs are null (#8878)
  - Fixed bug in `ReblockGVCF` when removing annotations (#8870)
  - Enable `ReblockGVCF` to subset AS annotations that aren't "raw" (pipe-delimited) (#8771)
  - Remove header lines in `ReblockGVCF` when we remove FORMAT annotations (#8895)
  - `ReblockGVCF`: Add malaria spanning deletion exception regression test with fix (#8802)
  - Restore some `GnarlyGenotyper` tests (#8893)
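Downstream scripts may encounter both no-call representations if they process VCFs produced before and after this change. The Python sketch below shows one way to recognize either form in a single sample column. The helper names are hypothetical, and treating every `0/0` genotype with `DP` of 0 as a no-call is an assumption that you should check against your own data.

```python
from typing import Dict


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair FORMAT keys with one sample's values, e.g. 'GT:DP' and '0/0:0'."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() stops at the shorter list, which tolerates dropped trailing fields.
    return dict(zip(keys, values))


def is_no_call(format_field: str, sample_field: str) -> bool:
    """Return True for './.' or for the older '0/0' with DP=0 form."""
    sample = parse_sample(format_field, sample_field)
    genotype = sample.get("GT", ".")
    alleles = genotype.replace("|", "/").split("/")
    if all(allele == "." for allele in alleles):
        return True  # standard representation restored in this release
    # Older representation: hom-ref genotype with zero depth.
    return all(allele == "0" for allele in alleles) and sample.get("DP") == "0"


print(is_no_call("GT:DP", "./.:0"))   # True
print(is_no_call("GT:DP", "0/0:0"))   # True
print(is_no_call("GT:DP", "0/0:12"))  # False
```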
- HaplotypeCaller
  - Fixed exceptions in `HaplotypeCaller` caused by long deletions that overhang into the assembly window (#8731)
- Mutect2
  - The `Mutect2` germline resource can now have split multiallelic format (#8837)
  - Make the `Mutect2` haplotype and clustered events filters smarter about germline events (#8717)
  - Added the DragSTR model to the Mutect2 WDL (#8716)
  - Improvements to `Mutect2`'s `Permutect` training data mode (#8663)
  - Bigger `Permutect` tensors and `Permutect` test datasets can be annotated with truth VCF (#8836)
  - `Mutect2` WDL and GetSampleName can handle multiple sample names in BAM headers (#8859)
  - `Permutect` dataset engine outputs contig and read group indices, not names (#8860)
  - Normal artifact LOD is now defined without the extra minus sign (#8668)
- CNV Calling
  - Fixed the GT header in `PostprocessGermlineCNVCalls`'s `--output-genotyped-intervals` output (#8621)
- SV Calling
- Flow-based Calling
- Notable Enhancements
  - Added an `--inverted-read-filter` argument to allow for selecting reads that fail read filters from the command line easily (#8724)
  - Inverted `SoftClippedReadFilter` to conform to the standard filtering logic (#8888)
  - Reduced the number of docker layers in the GATK image from 44 to 16 (#8808)
  - `VariantFiltration`: added a `--mask-description` argument to write custom mask filter description in VCF header (#8831)
  - `GatherVcfsCloud` is no longer beta (#8680)
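The idea behind an inverted read filter can be sketched in a few lines of Python: a filter is a predicate that returns true for reads to keep, and inverting it keeps exactly the reads that the original would have dropped. This is a conceptual sketch only, not GATK's implementation. The `Read` class, the helper names, and the mapping-quality threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Read:
    """Hypothetical stand-in for an aligned read."""
    name: str
    mapping_quality: int


# A read filter returns True when the read should be kept.
ReadFilter = Callable[[Read], bool]


def min_mapq_filter(threshold: int) -> ReadFilter:
    """Keep reads whose mapping quality is at least `threshold`."""
    return lambda read: read.mapping_quality >= threshold


def invert(read_filter: ReadFilter) -> ReadFilter:
    """Keep exactly the reads that `read_filter` would reject."""
    return lambda read: not read_filter(read)


def apply_filter(reads: Iterable[Read], read_filter: ReadFilter) -> List[Read]:
    """Return the reads that pass `read_filter`, in their original order."""
    return [read for read in reads if read_filter(read)]


reads = [Read("a", 60), Read("b", 5)]
passing = apply_filter(reads, min_mapq_filter(20))
failing = apply_filter(reads, invert(min_mapq_filter(20)))
print([r.name for r in passing])  # ['a']
print([r.name for r in failing])  # ['b']
```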
- Miscellaneous Changes
  - `GetPileupSummaries` now uses the standard `MappingQualityReadFilter` instead of a custom `--min-mapping-quality` argument (#8781)
  - `Funcotator`: suppress a log message about b37 contigs when not doing b37/hg19 conversion (#8758)
  - Output the new image name at the end of a successful cloud docker build (#8627)
  - Exclude the test folder from code coverage calculations (#8744)
  - Removed deprecated genomes in the cloud docker image that was causing CNN WDL test failures (#8891)
  - Re-commit large test files as lfs stubs (#8769)
  - Standardize test results directory between normal/docker tests (#8718)
  - Improve failure message in `VariantContextTestUtils` (#8725)
  - Update the `setup_cloud` github action (#8651)
  - Parameterize the logging frequency for ProgressLogger in `GatherVcfsCloud` (#8662)
- Documentation
  - Updated the README to include a list of popular software included in the docker image (#8745)
- Dependencies
  - Updated `HTSJDK` to 4.1.1, which fixes the CRAM writing bug described above (#8900)
  - Updated `Picard` to 3.2.0, which fixes the CRAM writing bug described above (#8900)
  - Updated `GenomicsDB` to 1.5.3, which supports M1 Macs and switches no-call representation back to `./.` (#8710) (#8759)
  - Updated `http-nio` to 1.1.1, which fixes several URL-handling bugs with HTTP support (#8889)
  - Updated several miscellaneous dependencies to fix security vulnerabilities (#8898)