Extend benchmarks to zarr v3 #14

falexwolf · 2024-11-05T14:05:50Z

From a discussion with @ilan-gold (Slack ref).

@Koncopd, let's add this. Ilan is not in a particular rush though.

falexwolf · 2024-12-09T09:23:39Z

Following up here with more detailed suggestions from @ilan-gold regarding benchmarking zarr v3 (Slack ref).

So the way I would do it is:

Add a benchmark for the single-dataset case, using zarr v3 both with and without our zarrs extension (https://github.com/ilan-gold/zarrs-python), through both the read_elem_as_dask API and the sparse_dataset API.

For the multi-dataset case, use the concat API to make one giant dataset, with every X loaded from read_elem_as_dask, tested with and without the zarrs extension.
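To make the scope concrete, the two suggestions above can be enumerated as a small configuration matrix. This is only a sketch: the labels, `benchmark_matrix`, and `time_call` are hypothetical names for illustration, not part of anndata or zarrs-python, and the actual read calls would be plugged in where `fn` is passed.

```python
from itertools import product
from time import perf_counter

# Hypothetical labels; they only name the configurations listed above.
READ_APIS = ("read_elem_as_dask", "sparse_dataset")
CODEC_PIPELINES = ("zarr-v3", "zarr-v3+zarrs")


def benchmark_matrix():
    """Enumerate (case, read API, codec pipeline) configurations."""
    # Single-dataset case: every read API with and without zarrs.
    single = [
        ("single", api, codec)
        for api, codec in product(READ_APIS, CODEC_PIPELINES)
    ]
    # Multi-dataset case: concat over read_elem_as_dask only.
    multi = [
        ("multi-concat", "read_elem_as_dask", codec)
        for codec in CODEC_PIPELINES
    ]
    return single + multi


def time_call(fn, repeats=3):
    """Return the best wall-clock time of `fn` over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        fn()
        best = min(best, perf_counter() - start)
    return best


if __name__ == "__main__":
    for config in benchmark_matrix():
        print(config)
```

That gives four single-dataset configurations and two multi-dataset ones, six runs in total per dataset size.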

The branch for zarr v3 is here: scverse/anndata#1726. For writing out synthetic data, I would only write out the X matrix using ad.io.write_elem since I think string handling is a bit iffy (although you can try). Similarly, for reading in the dataset I would only do something like AnnData(X=read_elem_as_dask("path_to.zarr")). I don't think the indices make a huge difference, but maybe I'm wrong.

We'll be improving the zarrs performance, but I think it's in a good state. I'm seeing about a 2x boost in general over zarr v3, which should already be a big perf improvement. Also, I noticed you don't clear the disk cache in the scripts, or I missed it. That might be skewing results!
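On the disk cache point, one way to get cold reads without root is to ask the kernel to evict a store's files from the page cache before each timed run. Below is a minimal sketch, assuming a Linux machine and a directory-based store on local disk; `evict_from_page_cache` is a hypothetical helper, not something from the benchmark scripts. The system-wide alternative on Linux is running `sync` and then writing `3` to `/proc/sys/vm/drop_caches` as root.

```python
import os
from pathlib import Path


def evict_from_page_cache(store_path):
    """Ask the kernel to drop cached pages for every file under `store_path`.

    Returns the number of files processed. The request is advisory: the
    kernel may keep pages, e.g. if another process still has them mapped.
    """
    if not hasattr(os, "posix_fadvise"):
        raise OSError("os.posix_fadvise is not available on this platform")
    count = 0
    for path in Path(store_path).rglob("*"):
        if not path.is_file():
            continue
        fd = os.open(path, os.O_RDONLY)
        try:
            # Flush dirty pages first; only clean pages can be dropped.
            os.fsync(fd)
            # offset=0, len=0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
        count += 1
    return count
```

Calling this on the store directory right before each timed read should keep a warm cache from a previous run out of the comparison.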

@ilan-gold, can you clarify whether any of this is for streaming from S3 or is it all for local datasets?

ilan-gold · 2024-12-09T09:25:08Z

It is only for local: ilan-gold/zarrs-python#44.

We should probably also make things truly async before testing the HTTP stuff, but we can test it once the PR is merged, if you'd like.

ilan-gold · 2024-12-09T09:25:36Z

Also, it is only sparse. I am going to open a PR for integer indexing on dense soon, but it will not be done until next year due to the holidays.

falexwolf · 2024-12-09T19:25:53Z

Got it! 🤔

falexwolf assigned Koncopd and falexwolf Nov 5, 2024

falexwolf changed the title ~~Run dense array benchmark against zarr v3~~ Extend benchmarks to zarr v3 Dec 9, 2024
