BPCells to h5AD #49

Closed
jonathan-columbiau opened this issue Sep 26, 2023 · 11 comments

@jonathan-columbiau

Hi,

Thanks for creating this great tool! I'd like to use data I currently have stored in a BPCells matrix with a library only found in Python and that takes in h5ad Anndata files - does BPCells have functionality to write matrices to h5ad?

Thanks!

@bnprks
Owner

bnprks commented Sep 27, 2023

Currently no, but I've been waiting for an excuse to add the functionality. I'll try adding it over the next couple days and post back here how it goes

bnprks added a commit that referenced this issue Oct 20, 2023
- Support writing sparse matrices to AnnData files
- Allow AnnData matrices of type uint32_t, float, and double rather than
  just float. (This is confirmed working in the Python library, so it
  seems reasonable to have here)
- Dim names will automatically work properly when writing to X or
  layers/*. Other cases might not pick up the dimension names from the
  obs or var annotations.
@bnprks
Owner

bnprks commented Oct 20, 2023

Sorry for the delay on this, but I've finally pushed an update which adds the function write_matrix_anndata_hdf5(). This should make it possible to write a BPCells matrix either to a standalone h5ad file, or as an extra matrix in an existing h5ad

@bnprks bnprks closed this as completed Oct 20, 2023
@jonathan-columbiau
Author

Thanks, really appreciate it!

@Dario-Rocha

Hello again, this package is getting better every day!
At the moment I have a similar task to the OP, but R fails to find the function:

Error: 'write_matrix_anndata_hdf5' is not an exported object from 'namespace:BPCells'

Even though the function is there in the help
I've just reinstalled BPCells package

Thanks a lot again for your help!

@bnprks
Owner

bnprks commented Dec 15, 2023

Oh, thanks for the heads up @Dario-Rocha! It looks like I hadn't regenerated the NAMESPACE file, so the function wasn't getting exported. This should be fixed now by commit f1b9f6b. (As a temporary workaround in the future, you can use BPCells::: with three colons to access unexported methods, though please still let me know if I've forgotten to export something.)

@Dario-Rocha

Great! That worked. Now I am only wondering whether BPCells can write the expression matrix along with the obs and var data in the h5ad file.

@bnprks
Owner

bnprks commented Dec 19, 2023

Right now the intended behavior is as follows (though correct me if you're seeing something else happen):

  1. BPCells can write sparse matrices either to the main matrix X (by default), a layer of the matrix (by setting the group to layers/my_matrix_name), or even under varm or obsm if desired.
  2. For compatibility purposes, BPCells will write a barebones obs or var group if that doesn't already exist in the file, which just contains a 0-based row or column index.

Assuming you are wanting to write additional data to obs or var, BPCells doesn't have additional support for that right now. I'd be happy to accept a contribution adding this support, but I haven't built it personally because handling factors seems a bit tricky to get right and BPCells doesn't have much other metadata-related functionality right now.

(If you just want a quick way to pass metadata yourself, I'd recommend hdf5r and checking out the Anndata format docs, or even using reticulate to write the metadata directly using the AnnData python package from within R)

@Dario-Rocha

Thank you! I understand.

@ggruenhagen3

I'm posting this here in case it helps others who were in my situation.

I was unable to access the data matrix after converting to an h5ad. For example, trying to access the first 5 genes and cells would give an error that said "ValueError: unsupported data types in input". Eventually, I found a solution on my own. I changed the data type of the matrix to an integer.

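{r}
# Write the SCT counts matrix in AnnData (h5ad) layout
write_matrix_anndata_hdf5(obj[["SCT"]]$counts, "sct_counts.h5")
{python}
import scanpy as sc

adata = sc.read_h5ad("sct_counts.h5")
adata.X[0:5, 0:5]              # -> gives the ValueError
adata.X = adata.X.astype(int)  # solution: cast the matrix to an integer dtype
adata.X[0:5, 0:5]              # this now works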

@bnprks
Owner

bnprks commented Feb 8, 2024

Thanks for the advice @ggruenhagen3! If I remember correctly the Anndata specification doesn't require any particular data type for matrices stored on disk, but evidently the python package implementation is not so flexible when reading from disk. I suppose your workaround might be the best option for now, or I believe calling mat <- convert_matrix_type(mat, "float") prior to writing in BPCells would be another option to match up with apparent type limitations in scanpy/anndata

@ggruenhagen3

@bnprks I had tried convert_matrix_type with every available option (i.e. "float", "uint32_t", "double"), but all resulted in an error in Python when trying to use the matrix. The problem may lie with anndata in Python?

For the record, I am using the following versions: R 4.3.1, BPCells_0.1.0, Seurat_5.0.0, anndata in R version 0.7.5.6 (not sure that this one is relevant), python 3.9.18, scanpy 1.9.3, and anndata in python version 0.8.0 (I had tried other versions, including 0.10.something).

ycli1995 added a commit to ycli1995/BPCells that referenced this issue Mar 1, 2024
…nprks#49 (comment)]

* When data type of `indptr` is fixed to `int64`, the .h5ad created by `write_matrix_anndata_hdf5`
will work with python `sc.read_h5ad`.
bnprks added a commit that referenced this issue Mar 15, 2024
* [cpp] Force the `X/indptr` to be int64 for `createAnnDataMatrix`, solving #76
For additional details, see:
- #49 (comment)
---------

Co-authored-by: Ben Parks <[email protected]>
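
For files written before this fix, one possible workaround is to rewrite the indptr dataset as int64 directly with h5py. This is only a sketch: the helper name force_indptr_int64 is hypothetical, and it assumes the sparse matrix is stored under an HDF5 group (X by default) containing an indptr dataset, as the commit messages above describe. Whether this alone resolves the ValueError on a given anndata version is not confirmed here.

```python
import h5py
import numpy as np


def force_indptr_int64(h5ad_path, group="X"):
    """Rewrite `<group>/indptr` as int64 in place and return the old dtype.

    Hypothetical helper: assumes `group` is a sparse-matrix group that
    contains an `indptr` dataset.
    """
    with h5py.File(h5ad_path, "r+") as f:
        grp = f[group]
        old = grp["indptr"]
        old_dtype = old.dtype
        if old_dtype != np.int64:
            values = old[:].astype(np.int64)
            attrs = dict(old.attrs)  # keep any attributes on the dataset
            del grp["indptr"]
            new = grp.create_dataset("indptr", data=values)
            new.attrs.update(attrs)
    return old_dtype
```

Running this on a copy of the file first seems prudent, since the dataset is deleted and recreated in place.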