Releases: pachterlab/kb_python
v0.29.1
Updates since version 0.28.2:
Major:
- Upgraded kallisto to 0.51.1 and bustools to 0.44.1
- Added lr-kallisto (--long) option and enabled k>31
- Added kb extract
- Added various kallisto binaries (w/ and w/o optimizations; w/ and w/o long k-mer sizes)
Other:
- Allow -i NONE in kb ref to create t2g+fasta but no index
- Various bug fixes (pandas version dependency, adata.X in nac containing total matrix, summing matrices not mishandling scientific notation, etc.)
- Ended support for python 3.7
v0.28.2
v0.28.1
v0.28.0
Implements all the updates detailed in the protocols paper: https://doi.org/10.1101/2023.11.21.568164
- kallisto version 0.50.1
- bustools version 0.43.1
v0.27.3
General
- Bumped ngs-tools>=1.7.3.
ref
- [DEPRECATION] Split index generation using -n has been fully deprecated. (Thanks to @amcdavid for catching a bug)
count
- Fixed a minor issue with --workflow kite:10xFB, where bustools project would be called before bustools correct (the order should be opposite). This fix required a bump to the ngs-tools dependency.
- Support for --workflow lamanno for -x smartseq3.
- [DEPRECATION] Counting using split indices by providing a comma-delimited list to -i has been fully deprecated.
- Support for whitelist (-w option) for bulk, smartseq2 and smartseq3 technologies.
- Added support for -x 10XV3_ULTIMA.
v0.27.2
v0.27.1
General
- [DEPRECATION] Support for split indices (with the -n option) will be deprecated in the next major release. It is now recommended to pass the --include-attribute and --exclude-attribute options, similar to Cellranger's mkref options (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references), to kb ref to reduce index size and memory usage.
ref
- A remote URL may be provided as the fasta (genomic FASTA) and/or gtf (gene annotation GTF) arguments. Support from ngs_tools 1.5.13.
- GTF is now allowed to have 0-length segments (pachterlab/kallisto#340).
count
- [DEPRECATION] Technology SMARTSEQ is now deprecated. All future uses should use BULK, SMARTSEQ2 or SMARTSEQ3.
- Genes that do not have a gene name will now have their gene IDs in the gene_name column (or the adata.var_names if --gene-names is used).
- Support for --workflow lamanno for -x BULK and -x SMARTSEQ2 technologies.
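The gene-name fallback described above can be illustrated with a minimal Python sketch. This is not the kb implementation: the function name resolve_gene_names and the input layout (pairs of gene ID and gene name) are assumptions made for illustration, and the second and third gene IDs are made-up placeholders.

```python
def resolve_gene_names(genes):
    """Return one display name per gene, falling back to the gene ID.

    `genes` is an iterable of (gene_id, gene_name) pairs. This layout is
    an assumption for the sketch, not the structure kb uses internally.
    """
    resolved = []
    for gene_id, gene_name in genes:
        # Treat None, empty, and whitespace-only names as missing.
        if gene_name is None or not gene_name.strip():
            resolved.append(gene_id)
        else:
            resolved.append(gene_name)
    return resolved


# The last two IDs are hypothetical placeholders without a gene name.
genes = [
    ("ENSG00000141510", "TP53"),
    ("ENSG00000999999", ""),
    ("ENSG00000888888", None),
]
names = resolve_gene_names(genes)
print(names)
```

Genes without a name keep a usable label this way, so the gene_name column never contains empty entries.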
v0.27.0
General
- Added the compile command. See below for more information. (#139)
- Fixed an issue where a call to kallisto would hang indefinitely due to a full stderr buffer.
- Changed docstring style to Google-style. Added typings to all functions.
- Updated kallisto binaries to v0.48.0.
- Updated bustools binaries to v0.41.0.
- Added binary compatibility checks. If a binary is incompatible, kb compile is suggested.
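The stderr hang mentioned above is a common pitfall when wrapping command-line tools: if a parent process waits for a child to exit without reading its stderr pipe, the child blocks as soon as the pipe buffer fills. The release notes do not say how kb fixed it, so the following Python sketch only shows one standard way to avoid the problem, using a stand-in child process instead of kallisto. The helper name run_and_drain is hypothetical.

```python
import subprocess
import sys

# Stand-in child that writes about 1 MB to stderr, which is more than a
# typical OS pipe buffer holds. It plays the role of a verbose tool.
CHILD_CODE = "import sys; sys.stderr.write('x' * 1_000_000); print('done')"


def run_and_drain(command):
    """Run `command` and read stdout and stderr until the process exits."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() drains both pipes concurrently and then waits for exit.
    # Calling process.wait() before reading could block forever once the
    # stderr pipe is full.
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


returncode, stdout, stderr = run_and_drain([sys.executable, "-c", CHILD_CODE])
print(returncode, stdout.strip(), len(stderr))
```

An alternative with the same effect is to read each pipe from its own thread, which also allows streaming the output to a logger while the process runs.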
compile
- This command can be used to compile the kallisto and/or bustools binary from source. At the most basic level, it downloads the latest release source distributions from the respective GitHub repositories, compiles them, and places them where kb can automatically detect them.
- The target positional argument specifies which binary (or both) to compile. Possible values are kallisto, bustools and all.
- The --url optional argument may be provided with a URL to a remote archive that will be used instead of the latest GitHub release. When this option is used, target may not be all.
- The --ref optional argument may be provided with a commit hash or git tag. When this option is used, target may not be all.
- The -o optional argument may be used to place the compiled binaries in a different directory. Note that if this option is used, --kallisto and --bustools options will have to be set appropriately when running ref or count.
- The --view option may be used to simply view what binaries (their locations and versions) will be used by kb.
- The --remove option may be used to remove existing compiled binaries.
- The --overwrite option may be used to overwrite existing compiled binaries.
- The kallisto compilation follows https://pachterlab.github.io/kallisto/source and has the same dependencies.
- The bustools compilation follows https://bustools.github.io/source and has the same dependencies.
- The --cmake-arguments argument may be used to pass in a string of additional arguments to pass directly to the cmake command. For instance, to manually specify additional include directories, --cmake-arguments "-DCMAKE_CXX_FLAGS='-I /usr/include'"
- Note that the compilation is performed in shared mode, which means the binary will contain links to shared libraries (i.e. not statically linked).
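The argument rules listed above can be sketched as a small Python helper that assembles a kb compile argument list and rejects the combinations the notes rule out. The helper build_compile_command and its parameter names are hypothetical. It only builds the list and does not run kb, and the tag v0.48.0 is an illustrative value.

```python
VALID_TARGETS = ("kallisto", "bustools", "all")


def build_compile_command(target, url=None, ref=None, out_dir=None,
                          cmake_arguments=None):
    """Assemble an argument list for `kb compile` (hypothetical helper)."""
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {VALID_TARGETS}")
    # Per the notes above, --url and --ref may not be used with target `all`.
    if target == "all" and (url is not None or ref is not None):
        raise ValueError("--url and --ref may not be used with target 'all'")

    command = ["kb", "compile", target]
    if url is not None:
        command += ["--url", url]
    if ref is not None:
        command += ["--ref", ref]
    if out_dir is not None:
        command += ["-o", out_dir]
    if cmake_arguments is not None:
        # Passed as a single string, as in the example in the notes.
        command += ["--cmake-arguments", cmake_arguments]
    return command


command = build_compile_command(
    "kallisto",
    ref="v0.48.0",
    cmake_arguments="-DCMAKE_CXX_FLAGS='-I /usr/include'",
)
print(command)
```

Building the command as a list rather than a single string avoids shell quoting problems with the nested quotes in the cmake arguments. The list can then be handed to subprocess.run.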
ref
- Added --include-attribute and --exclude-attribute options which can be used to include/exclude specific GTF entries based on their attributes. The argument to these options must be in the form of a key:value pair, where key is a GTF attribute name and value is the value of the aforementioned attribute to include/exclude. Only one of these two options may be specified, and each option may be specified more than once. When multiple --include-attribute are provided, GTF entries that have any one of the attributes will be processed. When multiple --exclude-attribute are provided, GTF entries that have any one of the attributes will not be processed.
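The include/exclude semantics described above can be sketched in Python as follows. This is an illustration of the stated rules, not the kb implementation: the function names are hypothetical, and each GTF entry is simplified to a dictionary holding one value per attribute key.

```python
def parse_attribute(option):
    """Split a `key:value` option string into a (key, value) pair."""
    key, separator, value = option.partition(":")
    if not separator or not key or not value:
        raise ValueError(f"expected key:value, got {option!r}")
    return key, value


def keep_entry(attributes, include=(), exclude=()):
    """Decide whether a GTF entry is processed.

    `attributes` maps attribute names to values for one entry (a
    simplification: real GTF attributes may repeat a key).
    """
    if include and exclude:
        raise ValueError("only one of include/exclude may be specified")
    if include:
        # Processed if the entry matches ANY of the included pairs.
        return any(attributes.get(k) == v for k, v in include)
    if exclude:
        # Skipped if the entry matches ANY of the excluded pairs.
        return not any(attributes.get(k) == v for k, v in exclude)
    return True


include = [parse_attribute(o) for o in
           ("gene_biotype:protein_coding", "gene_biotype:lncRNA")]
entries = [
    {"gene_id": "G1", "gene_biotype": "protein_coding"},
    {"gene_id": "G2", "gene_biotype": "pseudogene"},
    {"gene_id": "G3", "gene_biotype": "lncRNA"},
]
kept = [e["gene_id"] for e in entries if keep_entry(e, include=include)]
print(kept)
```

With the same two pairs passed as exclude instead, only G2 would be kept, since any single match is enough to drop an entry.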
count
- Added --filter-threshold option to specify the barcode filter threshold. This option may only be used when also providing --filter bustools and indicates the minimum number of times a barcode must appear to be retained from filtering. (#142)
- Added --strand option to override automatic strandedness setting by kallisto bus. Available options are unstranded, forward, and reverse.
- Changed the transcript_ids column to be a semicolon-delimited string instead of a list (only applicable when --tcc is provided) as a workaround for an issue with writing lists to h5ad with h5py>=3. (#141)
- Added BULK and SMARTSEQ2 technologies. The two technologies behave identically. The FASTQs may be provided either directly via command-line (only for multiplexed samples), in which case kb will perform demultiplexing, or as a single batch definition text file (only for demultiplexed samples). See the https://pachterlab.github.io/kallisto/manual section about batch.txt for formatting. This batch text file may also contain remote URLs to FASTQ files, which will be streamed for supported operating systems. Additionally, added --parity, --fragment-l and --fragment-s options, which may only be provided for these technologies. The first must always be provided, indicating the parity of the reads (single, paired), and the latter two may only be provided when --parity single is also provided, specifying the mean length of the fragments and standard deviation of the fragment lengths.
- [DEPRECATION] The SMARTSEQ technology has been deprecated and will be removed in the next release. Instead, SMARTSEQ2 should be used. See previous point for more information.
- Added SMARTSEQ3 technology.
- The full binary path is used for --dry-run instead of an alias.
- Added --umi-gene option, which deduplicates UMIs by gene. Cannot be used with smartseq or bulk technologies.
- Added --em option, which estimates gene abundances using the EM algorithm. Cannot be used with smartseq or bulk technologies, or with --tcc.
- Fixed an issue that occurs when the path given to the -o option of bustools count already exists, but as a directory (for instance, counts_unfiltered/cells_x_genes). Such directories are now removed before running the command.
- Improved output file validation so that all expected files must exist.
- Added --gene-names option, which may only be used with --h5ad or --loom and not --tcc. By specifying this option, the output h5ad or loom matrix will be aggregated by gene names instead of IDs.
- Added support for the following technologies: BDWTA (BD Rhapsody), SPLIT-SEQ, Visium (10x).
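The constraints on --parity, --fragment-l and --fragment-s for the BULK and SMARTSEQ2 technologies can be sketched as a Python check that runs before a kb count command is assembled. The helper build_read_options is hypothetical and only mirrors the rules stated in the notes above. It is not how kb validates its arguments, and the fragment values in the example are illustrative.

```python
BULK_LIKE = ("BULK", "SMARTSEQ2")


def build_read_options(technology, parity=None, fragment_l=None,
                       fragment_s=None):
    """Return the parity/fragment options for `kb count` (hypothetical)."""
    has_fragment = fragment_l is not None or fragment_s is not None

    if technology.upper() not in BULK_LIKE:
        # These options may only be provided for BULK and SMARTSEQ2.
        if parity is not None or has_fragment:
            raise ValueError("--parity/--fragment-* require BULK or SMARTSEQ2")
        return []

    # --parity must always be provided for these technologies.
    if parity not in ("single", "paired"):
        raise ValueError("--parity must be 'single' or 'paired'")
    # Fragment length statistics are only allowed for single-end reads.
    if has_fragment and parity != "single":
        raise ValueError("--fragment-l/--fragment-s require --parity single")

    options = ["--parity", parity]
    if fragment_l is not None:
        options += ["--fragment-l", str(fragment_l)]
    if fragment_s is not None:
        options += ["--fragment-s", str(fragment_s)]
    return options


options = build_read_options("BULK", parity="single",
                             fragment_l=200, fragment_s=20)
print(options)
```

Checking these combinations up front gives a clear error message before any index or FASTQ file is touched.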