Select several items from a category axis #296

Open
henryiii opened this issue Jan 2, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@henryiii
Member

henryiii commented Jan 2, 2020

We currently have slice and single-item selection (single-item selection is implemented internally as a length-one slice plus a sum), but there is no way to select a list of items from a category axis. Basically this:

If I have a run_number axis with {1,2,5,7}, I would like to be able to select 1, or {1, 5}. The latter will require new syntax in UHI (probably [0,2]::bh.sum or bh.loc([1,5])::bh.sum, for consistency with numpy). There's actually a fairly easy workaround for summing; you can just do:

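from functools import reduce
from operator import add

import boost_histogram as bh

# hist is an existing histogram; add up the per-item selections one by one
reduce(add, (hist[bh.loc(item)] for item in {1, 5}))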

We need some way to select a subset of categories: select 2 and 7, but not 4, for example. This would return a new, smaller category axis. Once we get full UHI, this could be implemented through a UHI object. Or we could come up with a new syntax, like [2,7]::bh.sum or something like that. Note that in NumPy this would be a list; there is no need to combine lists with slices, but we would often like to combine lists with actions.
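To make the requested behavior concrete, here is a minimal sketch in plain Python. It is a toy model, not the boost-histogram API: the histogram is just a tuple of category labels plus a tuple of counts, and the helper names select_categories and select_and_sum are hypothetical. It shows the two outcomes discussed above: a list selection that returns a smaller category axis, and a list selection combined with a sum action that removes the axis.

```python
from functools import reduce
from operator import add

# Toy stand-in for a 1D histogram with a category axis (an assumption for
# illustration only): parallel tuples of category labels and bin counts.
categories = (1, 2, 5, 7)
counts = (10.0, 20.0, 30.0, 40.0)


def select_categories(categories, counts, wanted):
    """Return a smaller (categories, counts) pair holding only `wanted` labels.

    The order of the original axis is preserved; unknown labels raise KeyError.
    """
    wanted = set(wanted)
    missing = wanted - set(categories)
    if missing:
        raise KeyError(f"categories not on axis: {sorted(missing)}")
    kept = [(c, n) for c, n in zip(categories, counts) if c in wanted]
    new_categories = tuple(c for c, _ in kept)
    new_counts = tuple(n for _, n in kept)
    return new_categories, new_counts


def select_and_sum(categories, counts, wanted):
    """Select `wanted` labels, then sum them, which removes the axis."""
    _, selected = select_categories(categories, counts, wanted)
    return reduce(add, selected)


# List selection: a new, smaller category axis.
sub_categories, sub_counts = select_categories(categories, counts, [2, 7])

# List selection combined with a sum action: the axis disappears.
total = select_and_sum(categories, counts, [1, 5])
```

In this toy model the selection is by label, which matches the bh.loc([1,5]) spelling; an index-based variant would correspond to [0,2].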

This is easy to implement if it is supported upstream - is it @HDembinski? Basically a way to give a new category axis that is a subset of an existing one and ask for boost-histogram to copy one to the other would be enough.

This was mentioned in #274, but was not the focus of that issue. It was brought up independently on Gitter by @paulgessinger.

@HDembinski
Member

Adding this to Boost.Histogram is not so easy, because it requires a large change to reduce. Currently, reduce can only select ranges; it is not possible to pick only bins 1, 3, 5, ...

@henryiii changed the title from "Select an item or several items from a category axis" to "Select several items from a category axis" on Jan 3, 2020
@henryiii changed the title from "Select several items from a category axis" to "Select an item or several items from a category axis" on Jan 3, 2020
@paulgessinger

How about a range of length one? I’m interested in slicing just one category from a category axis. Would that work, or is that even possible already?

@henryiii
Member Author

henryiii commented Jan 3, 2020

What about adding a way to "pick" axis bins in Boost.Histogram C++? The current workaround is just that, a workaround, and that is why it breaks for boost-histogram with a category axis; picking would be useful for single-item selection on any axis. So you could say:

hist.pick(/*axis*/ 2, /*bin*/ 4)

And it would remove the "2" axis, keeping only the contents in the 4th bin for the rest of the axes. (Syntax only for example, probably not ideal - some way to pick multiple axes in one go would be better)

Currently you have to sum over a slice of length 1 (unless I'm missing something).
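As a rough illustration of the semantics (not a proposal for the actual C++ interface), here is a pure-Python sketch on a 2D nested list. The functions pick and slice_and_sum are hypothetical names, and the nested list only stands in for histogram contents. It shows that picking one bin gives the same contents as summing over a slice of length one, which is the workaround in use today.

```python
def pick(hist2d, axis, index):
    """Remove `axis` from a 2D nested list, keeping only bin `index` on it."""
    if axis == 0:
        return list(hist2d[index])
    if axis == 1:
        return [row[index] for row in hist2d]
    raise ValueError("axis must be 0 or 1")


def slice_and_sum(hist2d, axis, start, stop):
    """Sum bins start..stop-1 along `axis`, which also removes that axis."""
    if axis == 0:
        return [sum(col) for col in zip(*hist2d[start:stop])]
    if axis == 1:
        return [sum(row[start:stop]) for row in hist2d]
    raise ValueError("axis must be 0 or 1")


# Toy contents: 2 bins on axis 0, 3 bins on axis 1.
hist2d = [
    [1, 2, 3],
    [4, 5, 6],
]

# Picking bin 2 on axis 1 ...
picked = pick(hist2d, axis=1, index=2)

# ... matches summing over the length-one slice [2:3] on the same axis.
summed = slice_and_sum(hist2d, axis=1, start=2, stop=3)
```

The difference is that pick never needs the notion of a slice on the axis, so it would not run into the restriction on slicing category axes.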

@henryiii
Member Author

henryiii commented Jan 3, 2020

> How about a range of length one? I’m interested in slicing just one category from a category axis. Would that work, or is that even possible already?

h[:, bh.loc("signal")] is short for h[:, bh.loc("signal"):bh.loc("signal")+1:bh.sum]. This normally works, but category histograms explicitly disable slicing, even by bin number (which is what the loc turns into).

The problem is that slices are not allowed on categories, even by bin number. Bin numbers are normally not very meaningful for categories (they are not sorted, etc.), so you probably do not want to slice but rather to pick from a list (which is, as @HDembinski mentioned, not implemented at the moment). Picking from a list would also only work for category (and integer) histograms, as other histograms would need a new type of piecewise axis to produce a result.

@henryiii
Member Author

henryiii commented Jan 3, 2020

If we had full UHI support, it would be easy to write a functor that selects categories and use that instead of bh.sum. But we don't support it yet.

@HDembinski
Member

@paulgessinger I think I can easily add slicing of a category axis: you specify a range of indices, and the resulting histogram has a category axis with that sub-range. This would solve your issue. Let's target this case for now, and leave the more complicated "pick the following non-contiguous indices" for later (but we should probably also support that).

@HDembinski
Member

I added it to my issues. I am currently busy with some other stuff, but I will give this high priority.

@henryiii added this to the 0.7.0 milestone on Jan 8, 2020
@henryiii changed the title from "Select an item or several items from a category axis" to "Select several items from a category axis" on Jan 8, 2020
@henryiii removed this from the 0.7.0 milestone on Jan 8, 2020
@henryiii added the enhancement label on Jan 8, 2020
@HDembinski
Member

Adding @pfackeldey as another person interested in this feature. It should really be implemented upstream, in the C++ Boost.Histogram library. I will try to work on this ASAP, but I am also happy to receive PRs at https://github.com/boostorg/histogram.
