Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing new IndexStyle / type of index? #38284

Closed
Luapulu opened this issue Nov 2, 2020 · 17 comments
Closed

Implementing new IndexStyle / type of index? #38284

Luapulu opened this issue Nov 2, 2020 · 17 comments

Comments

@Luapulu
Copy link

Luapulu commented Nov 2, 2020

I've been working on adding a TiledIndex with a IndexTiled <: IndexStyle, for all sorts of arrays that are best indexed into via an index representing the tile and an index within that tile. See issue in TiledIteration.jl. Along the way I had a few thoughts:

  1. Should reduce operations and possibly a number of others dispatch on IndexStyle, thus allowing users to implement new index types along with efficient algorithms for those index types.
  2. Same thought goes for Base.to_index and Base.to_indices. Should these dispatch on the IndexStyle, so that indices are converted to the correct type, based on the IndexStyle. This would allow any user defined index to work with any array type as long as the relevant to_index and to_indices functions are implemented!
  3. Should we add a section in the manual on implementing new index types. It could extend the section on Arrays with custom indices

May be I've misunderstood the point of IndexStyle. If that's the case, can anyone point me to some resources on implementing custom indexing? The docs only have a section for adding offset indexing, but not for more complicated indexing.

@timholy
Copy link
Member

timholy commented Nov 2, 2020

This seems like an appropriate usage to me. I've got some deadlines so may not get to you quickly, but I don't think you're off-base to be thinking this way.

@Luapulu
Copy link
Author

Luapulu commented Nov 5, 2020

I've started working on some of these ideas in TiledArrays.jl.

@johnnychen94
Copy link
Member

johnnychen94 commented Nov 6, 2020

I may have misunderstood the issue here. After JuliaArrays/TiledIteration.jl#23 (v0.3.0), TileIteration has changed to array interface, e.g.,:

julia> A = rand(200, 50);

julia> TileIterator(axes(A), (128,8))
2×7 TileIterator{2,Tuple{TiledIteration.CoveredRange{StepRange{Int64,Int64},TiledIteration.LengthAtMost},TiledIteration.CoveredRange{StepRange{Int64,Int64},TiledIteration.LengthAtMost}}}:
 (1:128, 1:8)    (1:128, 9:16)    (1:128, 17:24)    (1:128, 25:32)    (1:128, 33:40)    (1:128, 41:48)    (1:128, 49:50)
 (129:200, 1:8)  (129:200, 9:16)  (129:200, 17:24)  (129:200, 25:32)  (129:200, 33:40)  (129:200, 41:48)  (129:200, 49:50)

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

The TiledIterator is an array to help index into another. With a ‘TiledArray’ I mean an Array that contains the actual data, that can, or even must be, indexed into tile by tile — An easy example being chunked data on disk.

The problem is, adding a new kind of index currently requires rewriting a whole lot of code. Reduce operations, broadcasting, views, iteration, sorting, reshaping, etc. have to be rewritten algorithmically, but you also have to implement a whole system of conversions between different indices, i.e. cartesian to tile, tile to cartesian, tile of of one tiling style to tile of another tiling style and this has to work for any kind of regular tiling.

This is why I thought it might be possible to hijack Base.to_indices. You’d still have to rewrite the algorithmic aspects for the new index type, but at least you wouldn’t also have to reimplement conversion between indices everywhere. Just define to_indices once and you’d be done. This also allows indexing into already existing array types with a new index.

The idea then is to also make to_indices dispatch on IndexStyle, so as to allow different arrays to have different index styles, even if they are subtypes of each other, or otherwise related in the type tree.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

More importantly, sort, reduce, broadcast, etc. also have to dispatch on IndexStyle, for all of this to work.

@johnnychen94
Copy link
Member

johnnychen94 commented Nov 6, 2020

So if I understand it correctly, if At is a TiledArray, then At[1] returns an array here? Does it bring any actual performance improvement compared to A[TileItertor(...)[1]] where A is the normal array?

Or you just want to store the tile metadata to normal array A, and use it to create R = TileIterator(...) so that indexing A[R[1]] is efficient? If this is the case, then a very thin array wrapper can be made for this purpose and there's no need to create a new IndexStyle.

I'm a little skeptical to mix the normal array world and the fancy block array world.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

No, At[1] would return the first element of At, not an array. At[TileIndex(1)] would return the first tile (an array in most cases) and something like At[TileIndex(1), 1, 2, 3] would return the element (1, 2, 3) within the first tile. The user would rarely use TileIndex(1) explicitly. Instead, operations like sum or At .*= 3 would use TileIndex implicitly.

@johnnychen94
Copy link
Member

johnnychen94 commented Nov 6, 2020

At[TileIndex(1)] would return the first tile (an array in most cases) and something like At[TileIndex(1), 1, 2, 3] would return the element (1, 2, 3) within the first tile.

It's not very clear to me why TileIterator doesn't fit the usage. Say R = TileIterator(...), then A[R[1]...] return the first tile. It's not implemented yet, but it's doable to make TileIterator with eltype CartesianIndices so that A[R[1]] works.

I still don't get why we need a new IndexStyle for this usage.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

One example where IndexStyle come in useful: eachindex dispatches on IndexStyle to deliver an efficient iterator over all the indices in an array. For tiled arrays, that iterator should return indices tile by tile. So, something like:

eachindex(::IndexTiled, A) = ((ti, inds) for inds in eachindex(A[ti]) for ti in eachtileindex(A))

Just to clarify, I very much intend on using TileIterator. The goal is not to replace, but to build on top of.

@mbauman
Copy link
Member

mbauman commented Nov 6, 2020

This is indeed different from how IndexStyle has thus far been used — it really only says whether linear or cartesian indexing is fastest, and doesn't change the order of iteration. In fact, I wouldn't be surprised if there were generic code that assumes order of iteration of an eachindex is columnar (or at least in sync with another array's eachindex).

As a concrete counter point, we don't use a TransposedStyle to change the iteration order for adjoints. Maaaaybe we could, but this gets into #15648 territory.

@johnnychen94
Copy link
Member

johnnychen94 commented Nov 6, 2020

Just to clarify, I very much intend on using TileIterator. The goal is not to replace, but to build on top of.

This is exactly my worries. I'm worried that the changes to add a new IndexStyle, which will be a big change, does not bring any actual improvement on both performance and API(IIUC).

There's no API for eachindex(::IndexStyle, A...) and there is no API for getindex(::IndexStyle, A, inds...), a bunch of such API changes is almost not worth doing if it only marginally improves the usage API. Not to mention that these changes would be very likely to break a lot of codes in downstream Julia packages.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

It might not be public, but it's there

eachindex(::IndexLinear, A::AbstractArray) = (@_inline_meta; OneTo(length(A)))

As far as performance, in the same way that remainder division makes it slow to go from linear index to cartesian index, it is slow to go from cartesian index to tile index. The reverse is faster, especially when you have a buffer that you're filling with values from a larger array. You want to fill the buffer once, do everything you need with it and then fill it with new values for the next chunk / tile / piece.

#15648 is very interesting, thanks for the link.

May be hijacking IndexStyle is not the way to go, but there must (or ought to) be some way to specify how an array should be iterated over and indexed into?

@mbauman
Copy link
Member

mbauman commented Nov 6, 2020

I think you could still use an IndexTiled style here... but the trick would be if you could get a cleverly fast iterator to return a TileIndex(tileid, i, j) that iterated column major. Getting that iterator to be fast in column major order would be very tough though.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

So, punt on the iteration order part and just tackle the multiplication being faster than remainder division part using IndexStyle?

If that's the case, it isn't much of a solution for something like DIskArrays.jl because iteration order is the key issue there. Unfortunately, that's also the main thing I want to work on :(.

It might help with stuff like virtual concatenation and more complex tiling though.

@Luapulu
Copy link
Author

Luapulu commented Nov 6, 2020

Just found BlockArrays.jl. That might solve my concrete issues at the moment.

I still wonder if an array interface that allows for different iteration order is possible. Not sure what role IndexStyle would have in such an interface.

@Luapulu
Copy link
Author

Luapulu commented Nov 9, 2020

ArrayInterface.jl seems to be dealing with some of these ideas. I’ll close this, since that’s where these sorts of discussions seem to be taking place.

@Luapulu Luapulu closed this as completed Nov 9, 2020
@timholy
Copy link
Member

timholy commented Nov 9, 2020

That seems reasonable. There's also https://github.com/timholy/ArrayIteration.jl but the parts of it that haven't already moved to Base are mostly focused on the problem of sparsity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants