merge devel into master #143

Merged
merged 28 commits into from
Jan 19, 2022
@Lioscro (Collaborator) commented Jul 25, 2021

General

  • Added the compile command. See below for more information. (OSError: [Errno 8] Exec format error #139)
  • Fixed an issue where a call to kallisto would hang indefinitely due to a full stderr buffer.
  • Changed docstring style to Google-style. Added typings to all functions.
  • Updated kallisto binaries to v0.48.0.
  • Updated bustools binaries to v0.41.0.
  • Added binary compatibility checks. If a binary is incompatible, kb compile is suggested.
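
The stderr hang and the compatibility check can be illustrated together. The following is a minimal sketch, not the actual kb implementation: `run_binary` is a hypothetical helper, and the error message is an assumption. It relies on `subprocess.run` with `capture_output=True`, which drains stdout and stderr while the child runs so a full pipe buffer cannot block it, and it maps the "Exec format error" (errno 8, `ENOEXEC`) to a hint about kb compile.

```python
import errno
import subprocess
from typing import List, Tuple


def run_binary(command: List[str]) -> Tuple[int, str, str]:
    """Hypothetical helper: run a command, return (returncode, stdout, stderr)."""
    try:
        # capture_output=True makes subprocess.run read both pipes until the
        # process exits, so a chatty child cannot stall on a full stderr pipe.
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as e:
        # errno 8 (ENOEXEC) is raised when the OS cannot execute the file,
        # e.g. a binary built for a different platform.
        if e.errno == errno.ENOEXEC:
            raise RuntimeError(
                f"{command[0]} is not compatible with this machine; "
                "consider building it from source with `kb compile`."
            ) from e
        raise
    return result.returncode, result.stdout, result.stderr
```

A hang of the kind described above typically appears when a process is started with piped output and the parent waits for it to exit without reading the pipes.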

compile

  • This command can be used to compile the kallisto and/or bustools binary from source. At the most basic level, it downloads the latest release source distributions from the respective GitHub repositories, compiles them, and places them where kb can automatically detect them.
  • The target positional argument specifies which binary (or both) to compile. Possible values are kallisto, bustools and all.
  • The --url optional argument may be provided with a URL to a remote archive that will be used instead of the latest GitHub release. When this option is used, target may not be all.
  • The --ref optional argument may be provided with a commit hash or git tag. When this option is used, target may not be all.
  • The -o optional argument may be used to place the compiled binaries in a different directory. Note that if this option is used, --kallisto and --bustools options will have to be set appropriately when running ref or count.
  • The --view option may be used to simply view what binaries (their locations and versions) will be used by kb.
  • The --remove option may be used to remove existing compiled binaries.
  • The --overwrite option may be used to overwrite existing compiled binaries.
  • The kallisto compilation follows https://pachterlab.github.io/kallisto/source and has the same dependencies.
  • The bustools compilation follows https://bustools.github.io/source and has the same dependencies.
  • The --cmake-arguments option may be used to provide a string of additional arguments that is passed directly to the cmake command. For instance, to manually specify additional include directories: --cmake-arguments "-DCMAKE_CXX_FLAGS='-I /usr/include'"
  • Note that the compilation is performed in shared mode, which means the binary will contain links to shared libraries (i.e. not statically linked).
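
The argument rules above can be sketched in Python. This is an illustration under assumptions, not the code in kb_python/compile.py: `check_compile_options` and `split_cmake_arguments` are hypothetical names, and splitting the cmake string with `shlex.split` is one plausible way to turn it into an argument list.

```python
import shlex
from typing import List, Optional

TARGETS = ("kallisto", "bustools", "all")


def check_compile_options(
    target: str, url: Optional[str] = None, ref: Optional[str] = None
) -> None:
    """Hypothetical check: raise ValueError if the options conflict."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}")
    # --url and --ref each point at a single source tree, so they cannot be
    # combined with the `all` target.
    if target == "all" and (url is not None or ref is not None):
        raise ValueError("--url and --ref may not be used with target `all`")


def split_cmake_arguments(cmake_arguments: Optional[str]) -> List[str]:
    """Split the --cmake-arguments string using shell-like quoting rules."""
    return shlex.split(cmake_arguments) if cmake_arguments else []
```

With the example from the notes, `split_cmake_arguments("-DCMAKE_CXX_FLAGS='-I /usr/include'")` yields a single argument, `-DCMAKE_CXX_FLAGS=-I /usr/include`, because the quoted part stays attached to the flag.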

ref

  • Added --include-attribute and --exclude-attribute options which can be used to include/exclude specific GTF entries based on their attributes. The argument to these options must be in the form of a key:value pair, where key is a GTF attribute name and value is the value of the aforementioned attribute to include/exclude. Only one of these two options may be specified, and each option may be specified more than once. When multiple --include-attribute are provided, GTF entries that have any one of the attributes will be processed. When multiple --exclude-attribute are provided, GTF entries that have any one of the attributes will not be processed.
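
The include/exclude semantics can be sketched as follows. This is a simplified illustration, not the kb implementation: `keep_entry` and `parse_pair` are hypothetical names, and the regular expression only handles quoted attribute values in the GTF attribute column.

```python
import re
from typing import List, Optional, Tuple

# Matches `key "value"` pairs in a GTF attribute column. Unquoted values
# (e.g. numeric ones) are ignored in this simplified sketch.
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_pair(pair: str) -> Tuple[str, str]:
    """Split a `key:value` option argument on the first colon."""
    key, sep, value = pair.partition(":")
    if not sep or not key or not value:
        raise ValueError(f"expected key:value, got {pair!r}")
    return key, value


def keep_entry(
    attribute_field: str,
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> bool:
    """Return True if a GTF entry should be processed."""
    if include and exclude:
        raise ValueError("only one of include/exclude may be specified")
    # A set of (key, value) tuples, so repeated keys such as `tag` all count.
    attributes = set(ATTRIBUTE_RE.findall(attribute_field))
    if include:
        # Keep the entry if it has ANY of the requested attributes.
        return any(parse_pair(p) in attributes for p in include)
    if exclude:
        # Drop the entry if it has ANY of the excluded attributes.
        return not any(parse_pair(p) in attributes for p in exclude)
    return True
```

For example, with `include=["gene_biotype:protein_coding", "tag:CCDS"]`, an entry carrying `tag "CCDS"` is kept even if its biotype differs.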

count

  • Added --filter-threshold option to specify the barcode filter threshold. This option may only be used together with --filter bustools and sets the minimum number of times a barcode must appear to be retained after filtering. (How to manually change the threshold of UMI filtering in Kb count wrapper #142)
  • Added --strand option to override automatic strandedness setting by kallisto bus. Available options are unstranded, forward, and reverse.
  • Changed the transcript_ids column to be a semicolon-delimited string instead of a list (only applicable when --tcc is provided) as a workaround for an issue with writing lists to h5ad with h5py>=3. (error in writing transcript_ids for tcc h5ad #141)
  • Added BULK and SMARTSEQ2 technologies. The two technologies behave identically. The FASTQs may be provided either directly on the command line (only for multiplexed samples), in which case kb will perform demultiplexing, or as a single batch definition text file (only for demultiplexed samples). See the batch.txt section of https://pachterlab.github.io/kallisto/manual for formatting. The batch text file may also contain remote URLs to FASTQ files, which will be streamed on supported operating systems. Additionally, added --parity, --fragment-l and --fragment-s options, which may only be provided for these technologies. --parity must always be provided and indicates the parity of the reads (single, paired). --fragment-l and --fragment-s may only be provided together with --parity single and specify the mean and the standard deviation of the fragment lengths, respectively.
  • DEPRECATION: The SMARTSEQ technology has been deprecated and will be removed in the next release. SMARTSEQ2 should be used instead. See the previous point for more information.
  • Added SMARTSEQ3 technology.
  • The full binary path is used for --dry-run instead of an alias.
  • Added --umi-gene option, which deduplicates UMIs by gene. Cannot be used with smartseq or bulk technologies.
  • Added --em option, which estimates gene abundances using the EM algorithm. Cannot be used with smartseq or bulk technologies, or with --tcc.
  • Fixed an issue that occurred when the path given to the -o option of bustools count already existed as a directory (for instance, counts_unfiltered/cells_x_genes). Such directories are now removed before running the command.
  • Improved output file validation so that all expected files must exist.
  • Added --gene-names option, which may only be used with --h5ad or --loom and not with --tcc. When this option is specified, the output h5ad or loom matrix is aggregated by gene names instead of IDs.
  • Added support for the following technologies: BDWTA (BD Rhapsody), SPLIT-SEQ, Visium (10x).
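
The option constraints listed above for count can be summarized as a single check. The sketch below is illustrative only and is not taken from kb_python: `check_count_options` and its parameter names are hypothetical, and `BULK_LIKE` is an assumption about which technologies the notes mean by "smartseq or bulk".

```python
from typing import List, Optional

# Assumption: the technologies referred to as "smartseq or bulk" above.
BULK_LIKE = {"BULK", "SMARTSEQ2"}


def check_count_options(
    technology: str,
    filter_method: Optional[str] = None,
    filter_threshold: Optional[int] = None,
    parity: Optional[str] = None,
    fragment_l: Optional[float] = None,
    fragment_s: Optional[float] = None,
    umi_gene: bool = False,
    em: bool = False,
    tcc: bool = False,
    gene_names: bool = False,
    h5ad: bool = False,
    loom: bool = False,
) -> List[str]:
    """Return a list of constraint violations (empty if the options are valid)."""
    errors = []
    bulk_like = technology.upper() in BULK_LIKE
    has_fragment = fragment_l is not None or fragment_s is not None

    if filter_threshold is not None and filter_method != "bustools":
        errors.append("--filter-threshold requires --filter bustools")
    if bulk_like:
        if parity is None:
            errors.append("--parity is required for this technology")
        elif has_fragment and parity != "single":
            errors.append("--fragment-l/--fragment-s require --parity single")
        if umi_gene:
            errors.append("--umi-gene is not supported for this technology")
        if em:
            errors.append("--em is not supported for this technology")
    elif parity is not None or has_fragment:
        errors.append(
            "--parity/--fragment-l/--fragment-s are only for BULK and SMARTSEQ2"
        )
    if em and tcc:
        errors.append("--em may not be used with --tcc")
    if gene_names and (tcc or not (h5ad or loom)):
        errors.append("--gene-names requires --h5ad or --loom, without --tcc")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every conflicting option at once.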

@Lioscro Lioscro closed this Aug 20, 2021
@Lioscro Lioscro reopened this Aug 20, 2021
@codecov-commenter commented Jan 19, 2022

Codecov Report

Merging #143 (068c9ef) into master (6036007) will increase coverage by 0.20%.
The diff coverage is 96.20%.

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   96.60%   96.80%   +0.20%     
==========================================
  Files          12       14       +2     
  Lines        1268     1630     +362     
==========================================
+ Hits         1225     1578     +353     
- Misses         43       52       +9     
Impacted Files               Coverage Δ
kb_python/dry/__init__.py    93.75% <75.00%> (-6.25%) ⬇️
kb_python/dry/count.py       78.57% <78.57%> (ø)
kb_python/count.py           95.29% <94.89%> (-0.18%) ⬇️
kb_python/compile.py         95.19% <95.19%> (ø)
kb_python/config.py          95.60% <97.61%> (-1.50%) ⬇️
kb_python/utils.py           95.95% <99.02%> (+4.32%) ⬆️
kb_python/constants.py       100.00% <100.00%> (ø)
kb_python/dry/utils.py       100.00% <100.00%> (ø)
kb_python/ref.py             100.00% <100.00%> (ø)
kb_python/report.py          100.00% <100.00%> (ø)
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6036007...068c9ef.

@Lioscro Lioscro merged commit c2af1f8 into master Jan 19, 2022
@Lioscro Lioscro deleted the devel branch January 19, 2022 06:30