Investigate granges rust crate #150

Open
ghuls opened this issue Jun 19, 2024 · 4 comments

Comments

@ghuls
Contributor

ghuls commented Jun 19, 2024

Investigate the granges Rust crate as a way to provide GenomicRanges-like functionality: https://github.com/vsbuffalo/granges

@tshauck
Member

tshauck commented Jun 20, 2024

Thanks! I'm aware of it... I recently added a page to the docs showing how to work with the genomicranges package, but it's unlikely to be as fast as granges.

If you have a sec, would you be able to briefly sketch out the API you'd imagine? Are these like methods on the result object, something else?

Also, FWIW, you can recreate a lot of granges-like functionality with SQL if you're comfortable with it.
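
To make the SQL point concrete, here is a minimal sketch of an interval-overlap join. It uses Python's built-in `sqlite3` rather than biobear, and the table names (`regions`, `signal`) and column names (`chrom`, `start`, `stop`, `name`, `value`) are made up for illustration, so treat it as an outline of the query shape and not as biobear's actual API.

```python
import sqlite3

# Hypothetical tables and columns, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (chrom TEXT, start INTEGER, stop INTEGER, name TEXT);
CREATE TABLE signal  (chrom TEXT, start INTEGER, stop INTEGER, value REAL);
""")
conn.executemany("INSERT INTO regions VALUES (?, ?, ?, ?)", [
    ("chr1", 100, 200, "peak_a"),
    ("chr1", 500, 600, "peak_b"),
    ("chr2", 100, 200, "peak_c"),
])
conn.executemany("INSERT INTO signal VALUES (?, ?, ?, ?)", [
    ("chr1", 150, 250, 1.5),
    ("chr1", 200, 300, 2.0),
    ("chr1", 590, 700, 0.5),
    ("chr2", 300, 400, 3.0),
])

# Two half-open intervals [start, stop) on the same chromosome overlap
# when each one starts before the other one stops.
overlaps = conn.execute("""
SELECT r.name, s.start, s.stop, s.value
FROM regions AS r
JOIN signal AS s
  ON r.chrom = s.chrom
 AND r.start < s.stop
 AND s.start < r.stop
ORDER BY r.name, s.start
""").fetchall()

for row in overlaps:
    print(row)
```

The join condition is the whole trick: an equality on the chromosome plus two inequalities on the coordinates. How fast it runs depends on how the engine evaluates those inequalities.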

@ghuls
Contributor Author

ghuls commented Jun 20, 2024

Yes, I saw the genomicranges integration, but it is very slow if you have a lot of intersections. For example, intersecting a BED file with a bigWig took more than 1 hour yesterday, and 15 minutes with the latest version of iranges:
BiocPy/GenomicRanges#98

With my own implementation based on Polars and ncls (the interval-intersection library behind pyranges), it takes less than 17 seconds.

> Also, FWIW, you can recreate a lot of granges-like functionality with SQL if you're comfortable with it.

When you have a lot of intersections, this will likely be slow if you don't use specific structures that can handle intervals efficiently.
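
As a rough illustration of what an interval-aware approach looks like, here is a small sweep-line overlap join in plain Python. This is not how ncls works (ncls is built on nested containment lists); it is a simpler stand-in technique, and the function name `overlap_pairs` is hypothetical. It assumes half-open `(start, end)` intervals with `start < end`, all on a single chromosome.

```python
import heapq

def overlap_pairs(a, b):
    """Return sorted (i, j) pairs where a[i] overlaps b[j].

    Intervals are half-open (start, end) tuples with start < end.
    """
    # Merge both inputs into one list of events ordered by start position.
    events = sorted(
        [(s, e, 0, i) for i, (s, e) in enumerate(a)]
        + [(s, e, 1, j) for j, (s, e) in enumerate(b)]
    )
    # One min-heap of (end, index) per side, holding intervals still "open".
    active = ([], [])
    pairs = []
    for start, end, side, idx in events:
        other = active[1 - side]
        # Expire intervals on the other side that ended at or before this start.
        while other and other[0][0] <= start:
            heapq.heappop(other)
        # Everything left on the other side started earlier and is still open,
        # so it overlaps the current interval.
        for _, other_idx in other:
            pairs.append((idx, other_idx) if side == 0 else (other_idx, idx))
        heapq.heappush(active[side], (end, idx))
    return sorted(pairs)

regions = [(100, 200), (500, 600)]
signal = [(150, 250), (200, 300), (590, 700)]
print(overlap_pairs(regions, signal))
```

After the initial sort, the work in the loop is proportional to the number of intervals plus the number of overlaps reported (with a log factor for the heap), instead of comparing every interval against every other one. For multi-chromosome data, one would group by chromosome first and run this per group.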

> If you have a sec, would you be able to briefly sketch out the API you'd imagine? Are these like methods on the result object, something else?

I haven't looked closely at it yet, so I have no idea at the moment.

Another similar crate: https://github.com/noamteyssier/bedrs

@tshauck
Member

tshauck commented Jun 20, 2024

Thanks yeah, I agree with all of that.

My initial thought is something similar to how biobear works with VCF/BAM indices: ideally it would be compatible with SQL first, and then expose a more Pythonic API on top of that.

@jkanche

jkanche commented Jun 21, 2024

> but it's unlikely to be as fast as granges.

Definitely not going to reach Rust-like speeds in Python :)

Our initial focus has been on bringing Bioconductor-like representations to Python. It's time I found some focused time to optimize the methods that have been implemented.
