Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: make transpose non-recursive #23424

Closed
wants to merge 1 commit into from
Closed

Conversation

andyferris
Copy link
Member

@andyferris andyferris commented Aug 24, 2017

Step 2 of JuliaLang/LinearAlgebra.jl#408 "Taking matrix transposes seriously". Currently WIP - I hope to at least add some more unit tests and have the design questions below answered.

This PR continues to have a recursive adjoint for doing linear algebra (on possibly nested arrays, or other user defined structures) but removes the recursion for transpose. In effect, transpose is being "gifted" to data applications as a simple way to rearrange data in containers (like permutedims) - for example, we can now do ["abc", "def"].'. I also have corrected any non-recursive adjoint behavior as I found it.

One consequence which seems odd at first but makes more sense when you think about it is that some linear algebra applications will need to use conj(m') instead of m.'. This doesn't seem so bad since this is actually how legitimate linear algebra uses of .' arises anyway.

The design for RowVector extends on what was already there, adding recursive AdjointArray wrappers just like the existing lazy ConjArray wrapper. The idea is to do the same for transpose(::AbstractMatrix) and adjoint(::AbstractMatrix) soon, so that transpose there is lazy and we can remove Ac_mul_B and so-on. Future work will also involve moving functions into appropriate files, but since work is ongoing I probably would prefer to do that in a separate PR.

Two design questions:

  1. Should we define a conjadjoint(x) function? It seems to come up occassionally, and would be easy to elide in appropriate cases, e.g. via conjadjoint(x::Number) = x. (Going the opposite direction, we could remove ConjAdjointArray...)
  2. Should Symmetric and issymmetric be recursive? Their names make it seem a bit like they should go with transpose and work on "data", however most common usage is for linear algebra and it is a part of LinAlg. The two choices are issymmetric(m) = m == m.' or issymmetric(m) = m == conj(m').

CC: @StefanKarpinski @stevengj @andreasnoack @Sacha0

@andyferris
Copy link
Member Author

Also, this a prerequisite of JuliaLang/LinearAlgebra.jl#450 (which requires a non-recursively transposing RowVector to work properly).

Copy link
Member

@fredrikekre fredrikekre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A general question: everything is @inlined, is that necessary?

@inline adjoint(a::ConjArray) = ConjAdjointArray(parent(a))

"""
AdjointArray(array)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

0+0im 1+1im
```
"""
struct AdjointArray{T,N,A<:AbstractArray} <: AbstractArray{T,N}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps struct AdjointArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N} or perhaps the element type of A can be different from T?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N stays the same but the eltype can change via the mapping through adjoint. I can do struct AdjointArray{T,N,A<:AbstractArray{<:Any,N}} <: AbstractArray{T,N}.


@inline AdjointArray(a::AbstractArray{T,N}) where {T,N} = AdjointArray{adjoint_type(T),N,typeof(a)}(a)

const AdjointVector{T,V<:AbstractVector} = AdjointArray{T,1,V}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const AdjointVector{T,V<:AbstractVector{T}} unless eltype(V) sometimes is not T?

const AdjointVector{T,V<:AbstractVector} = AdjointArray{T,1,V}
@inline AdjointVector(v::AbstractVector{T}) where {T} = AdjointArray{adjoint_type(T),1,typeof(v)}(v)

const AdjointMatrix{T,M<:AbstractMatrix} = AdjointArray{T,2,M}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const AdjointMatrix{T,M<:AbstractMatrix{T}}

const AdjointMatrix{T,M<:AbstractMatrix} = AdjointArray{T,2,M}
@inline AdjointMatrix(m::AbstractMatrix{T}) where {T} = AdjointArray{adjoint_type(T),2,typeof(m)}(m)

# This type can cause the element type to change under conjugation - e.g. an array of complex arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this answers my own question above then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@inline conj(a::AdjointArray) = ConjAdjointArray(parent(a))

"""
ConjAdjointArray(array)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

@@ -929,7 +929,7 @@ issymmetric(x::Number) = x == x
"""
ishermitian(A) -> Bool

Test whether a matrix is Hermitian.
Test whether a matrix is Hermitian (such that `A = adjoint(A)`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

==

@@ -13,44 +13,37 @@ vector can be multiplied by a matrix on its right (such that `v.' * A = (A.' * v
differs from a `1×n`-sized matrix by the facts that its transpose returns a vector and the
inner product `v1.' * v2` returns a scalar, but will otherwise behave similarly.
"""
struct RowVector{T,V<:AbstractVector} <: AbstractMatrix{T}
struct RowVector{T,V<:AbstractVector{T}} <: AbstractMatrix{T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

0+0im 1-1im
```
"""
struct ConjAdjointArray{T,N,A<:AbstractArray} <: AbstractArray{T,N}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be ConjAdjointArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}

@andreasnoack
Copy link
Member

Great that you are working on this. A few comments after a first read

  • Did you trash the idea of using mapped arrays?
  • AdjointArray being elementwise and not flipping the sizes seems unfortunate to me. If this is the best way to structure the functionality it might be better with a different name.
  • Are we concerned about the number of lazy array wrappers here?
  • I think issymmetric(m) = m == conj(m') is the right thing with these changes since I think issymmetric will continue to mainly be a linear algebra operation.

@mbauman mbauman added arrays [a, r, r, a, y, s] linear algebra Linear algebra labels Aug 24, 2017
@andyferris
Copy link
Member Author

andyferris commented Aug 25, 2017

@andreasnoack Yes, I think a MappedArray in Base would be better, and solves your first three points. I just wasn't sure how popular that would be. Maybe it's a separate PR?

I think you are right regarding issymmetric (and thus Symmetric).

@StefanKarpinski
Copy link
Member

If we're going to have some kinds of deferred computation views of arrays in Base, why not do the full blown thing? It seems somewhat simpler than having a bunch of special cases and then having the general version in a package.

@timholy
Copy link
Member

timholy commented Aug 25, 2017

Feel free to steal wildly from https://github.com/JuliaArrays/MappedArrays.jl. Note there are two types, differing in terms of whether you can or cannot define setindex! (for which you need the inverse transformation).

@timholy
Copy link
Member

timholy commented Aug 25, 2017

A general question: everything is @inlined, is that necessary?

AFAICT you almost never need manual @inline/@noinline anymore. Moreover, sometimes it solves problems (e.g., excessive compilation time in corner cases) not to use it.

@andyferris
Copy link
Member Author

andyferris commented Aug 29, 2017

AFAICT you almost never need manual @inline/@noinline anymore. Moreover, sometimes it solves problems (e.g., excessive compilation time in corner cases) not to use it.

Right - this is wonderful Tim :) I haven't had the chance to try it out with large StaticArrays yet. Does the heuristic take into account that the overhead of performing a function call (as opposed to inlining) scales with the size of the function arguments and return value? (for example: doing the equivalent a map(f, t) over a very simple f and very long isbits tuple t might be worth inlining for arbitrary long t. Here the cost of the function body scales as length(t), as does overhead of the function call.)

@andyferris andyferris force-pushed the ajf/vectortranspose2 branch from 11ef34b to aef7238 Compare August 29, 2017 11:16
@andyferris
Copy link
Member Author

If we're going to have some kinds of deferred computation views of arrays in Base, why not do the full blown thing? It seems somewhat simpler than having a bunch of special cases and then having the general version in a package.

I tend to agree... but I'm not sure what the community feels, or the best way forward. I mean - we need deferred computation of transpose and adjoint to preserve the nice Ac_mul_B behavior. But what else gets deferred? All map(f, array)s? broadcast? *, /, anything and everything in Base?

@andyferris
Copy link
Member Author

Feel free to steal wildly from https://github.com/JuliaArrays/MappedArrays.jl. Note there are two types, differing in terms of whether you can or cannot define setindex! (for which you need the inverse transformation).

@timholy MappedArrays.jl looks cool. I took the ideas and merged the two classes into a single MappedArray here: https://github.com/JuliaLang/julia/compare/ajf/mappedarray?expand=1.

If I get it working I'll open a PR - any design opinions?

@JeffBezanson
Copy link
Member

Yes might as well bring in the general MappedArray here --- looks like a small amount of code.

@Jutho
Copy link
Contributor

Jutho commented Sep 12, 2017

As a side note when I see the name ConjAdjoint popping up: some literature refers to normal transpose as adjoint, and explicitly specifies Hermitian adjoint for the former ctranspose, e.g.
https://en.wikipedia.org/wiki/Transpose_of_a_linear_map.

I don't think conjugate adjoint is a defined term, and therefore, no confusion could arise about its meaning, but I thought it was worth bringing up the above, since interpreting adjoint as transpose might lead to interpreting conjugate adjoint as ctranspose.

@andyferris
Copy link
Member Author

andyferris commented Nov 23, 2017

FYI when I get tests passing, I'd like to merge this (not there yet, still bugs).

Any last comments on e.g. MappedArray are welcome (we can fix afterwards also).

@Sacha0
Copy link
Member

Sacha0 commented Nov 23, 2017

I much appreciate the effort to move JuliaLang/LinearAlgebra.jl#57 and JuliaLang/LinearAlgebra.jl#408 forward that this pull request represents, and am on board with most of the associated plan. I would much like to also be on board with the terminology choices and downstream impact that this pull request advances, and so regularly try to convince myself that those choices yield the best set of tradeoffs available.

Each time I try to convince myself of that position, I run aground on the same set of misgivings. One of those misgivings is the substantial implementation complexity this choice forces upon LinAlg: Conflating transpose and array-flip forces LinAlg to handle both, requiring LinAlg support a much larger set of type combinations in common operations.

To illustrate, consider mul(A, B) where A and B are bare, adjoint-wrapped, transpose-wrapped, or array-flip-wrapped Matrixs. Without conflating transpose and array-flip, A can be a Matrix, an adjoint-wrapped Matrix, or a transpose-wrapped Matrix (and likewise B). So mul(A, B) need support nine type combinations. But conflating transpose and array-flip, A can be a Matrix, an adjoint-wrapped Matrix, a transpose-wrapped Matrix, or an array-flip-wrapped Matrix (and likewise B). So now mul(A, B) need support sixteen type combinations.

This problem worsens exponentially with the number of arguments. For example, without conflation mul!(C, A, B) need support twenty-seven type combinations, whereas with conflation mul!(C, A, B) need support sixty-four type combations. And of course adding Vectors and non-Matrix matrix/operator types to the mix further complicates things.

This side effect seems worth considering before moving forward with this change. Best!

@dlfivefifty
Copy link
Contributor

I think that's more of a comment about the current design of LinAlg. I think a redesign using traits, like IsBlasStrided would help.

I've sketched out a way this can work in a BandedMatrices.jl branch:

https://github.com/JuliaMatrices/BandedMatrices.jl/blob/feat-bandslice/src/generic/interface.jl

It dispatches by the trait type, not the matrix type. So for example if blasstructure(A::AbstractMatrix) returns BlasStrided, then it can dispatch multiplication to gemv.

@Sacha0
Copy link
Member

Sacha0 commented Nov 23, 2017

@dlfivefifty, can you expand? I do not follow how traits would address the problem of needing to handle an additional wrapper type with different semantics. Best!

@dlfivefifty
Copy link
Contributor

If it’s BLAS, knowing the stride and pointer is sufficient. So the type just needs to overload those. The trait helps determine if the BLAS interface is implemented.

@Sacha0
Copy link
Member

Sacha0 commented Nov 23, 2017

@dlfivefifty, these considerations are not confined to BLAS operations over dense arrays of BLAS-compatible scalar types, and in any case I am not certain I follow? Additionally, a complete redesign of LinAlg's structure does not seem in the cards in the next few weeks, even if somehow traits would address this concern?

@Jutho
Copy link
Contributor

Jutho commented Nov 23, 2017

I agree with @dlfivefifty . Pure julia fallback methods don't need to know whether a matrix is bare or wrapped, if only getindex and setindex! are correctly implemented (unless maybe to implement loops in the correct order, but even that could be solved). So it is only for BLAS that one needs to know, but there indeed just having every type correctly return the pointer, strides and e.g. the correct flag (C, T or N) means a cost just linear in the number of different wrappers, independent of the number of arguments, if properly implemented.

@Sacha0
Copy link
Member

Sacha0 commented Nov 23, 2017

Perhaps I see the difference in perspective 🙂:

The responses above seem to assume that operations over transpose-wrapped and array-flip-wrapped objects can always share an underlying implementation. This assumption holds in the most common and familiar cases, for example in operations that map to BLAS calls, where the semantics of transpose and array-flip coincide. Under that assumption, agreed, the additional complexity required to handle both transpose and array-flip in LinAlg would in some sense be superficial; that is, one can imagine a hypothetical system that could gracefully handle that additional complexity at some shallow level in call chains.

Even in such a system, however, handling both transpose and array-flip would still require an appreciable level of additional complexity. And lovely as such a system would be, the existing system neither is nor is likely to evolve into such a system by feature freeze.

And in general, operations over transpose-wrapped and array-flip-wrapped objects cannot always share an (efficient) underlying implementation. The difference in behavior between transpose-wrapped and array-flip-wrapped objects --- that the former is recursive, whereas the latter is not -- necessarily propagates all the way down to the elementary operations that constitute a given higher level operation, and so cannot always be efficiently handled at shallow levels in call chains.

Pure julia fallback methods don't need to know whether a matrix is bare or wrapped, if only getindex and setindex! are correctly implemented (unless maybe to implement loops in the correct order, but even that could be solved).

That statement is, to say the least, a minor trivialization of the importance and difficulty of writing efficient kernels for even simple operations that do not immediately map to e.g. BLAS calls 😄. Most operations in LinAlg that fall under the heading "pure Julia fallbacks" are not implemented in the way you describe, and, were they, they would be much slower. Best!

@Jutho
Copy link
Contributor

Jutho commented Nov 23, 2017

That statement is, to say the least, a minor trivialization of the importance and difficulty of writing efficient kernels for even simple operations that do not immediately map to e.g. BLAS calls 😄. Most operations in LinAlg that fall under the heading "pure Julia fallbacks" are not implemented in the way you describe, and, were they, they would be much slower. Best!

The most important aspect to take into account is loop order in combination with proper blocking no? Which could be checked by looking at the strides rather than at what particular type of wrapper it is (if the data is not strided than it's probably impossible to decide on an optimal strategy). I've been working on a package (https://github.com/Jutho/Strided.jl) to deal with strided data with arbitrary strides (such that permutedims, transpose and ctranspose, views with ranges and special reshapes which don't break strides are all lazy), and where an arbitrary combination of such strided arrays can be combined in an efficient map(!) construct and an optimal loop order and blocking strategy is chosen (at runtime, so some overhead for small arrays), even including multithreading.

Still different from linear algebra operations, but I would think similar constructions are possible. But indeed, not likely in the current structure of LinAlg.

@andyferris
Copy link
Member Author

@Sacha0 Yes, important points to consider. My thoughts on the way forward:

  • With eltype being a BLAS type, we have bare arrays, transposed arrays, conjugated arrays and transposed conjugated arrays. I see the 16 dispatch possibilities of * are the generalization of the set of Ac_mul_B operations we have and are the closure of * operations under conj and transpose of the inputs. This PR does it's best to make sure that adjoint turns into transpose(conj(...)) wherever possible (when eltype is Number).

  • If we wrote a blocking (cache-oblivious) method for * I suspect we could get closer to BLAS speeds with a single generic algorithm. It would only care about strides and so on, and the RowVector and TransposedMatrix wrappers could report these correctly, and with luck the ConjArray wrapper might not introduce overhead.

  • I imagine we could generalize the above cache-oblivious strided indexing order to other indexing patterns that understand sparseness.

Perhaps I'm being overly optimisitc, but with the right traits and so-on I thought we might get to a decent internal system in the future (not v1.0).

@Sacha0
Copy link
Member

Sacha0 commented Nov 23, 2017

I've been working on a package (https://github.com/Jutho/Strided.jl) to deal with strided data with arbitrary strides (such that permutedims, transpose and ctranspose, views with ranges and special reshapes which don't break strides are all lazy), and where an arbitrary combination of such strided arrays can be combined in an efficient map(!) construct and an optimal loop order and blocking strategy is chosen (at runtime, so some overhead for small arrays), even including multithreading.

@Jutho, this work sounds super exciting! I much look forward to checking this package out! 😃

The most important aspect to take into account is loop order in combination with proper blocking no? Which could be checked by looking at the strides rather than at what particular type of wrapper it is (if the data is not strided than it's probably impossible to decide on an optimal strategy).

Remember that: (1) LinAlg also supports various sparse and structured storage types, various triangular and symmetric/hermitian views into those and other storage types, implicit representations of various linear operators, etc; and (2) that matrix multiplication is only one among many operations that LinAlg supports. Efficient strategies for performing various of these of operations on various combinations of these types exist and are implemented in LinAlg. Additional layers of combinatorial types introduce substantial complexity, even when underlying implementations can be efficiently shared (which unfortunately is not always the case). Best!

@dlfivefifty
Copy link
Contributor

These can each be handled by a trait, IsTriangularStorage , IsBandedStorage , etc. and dispatch on the trait type instead of the matrix type.

This would allow user defined matrices to also benefit from optimal implementation.

@Sacha0
Copy link
Member

Sacha0 commented Nov 25, 2017

These responses implicitly affirm that gracefully handling this approach's additional complexity would require substantial additions to and/or structural overhaul of LinAlg, underscoring rather than mitigating concerns over this additional complexity. That such things are much easier said than done is also worth bearing in mind :). Best!

@andyferris andyferris force-pushed the ajf/vectortranspose2 branch 4 times, most recently from 68acd8e to 3232750 Compare November 26, 2017 07:14
 * RowVector no longer transposes elements
 * MappedArray used to propagate conj, transpose, adjoint, etc
@andyferris
Copy link
Member Author

Since we aren't doing non-recursive transpose, I'll close this PR.

Data users that have been following this, please use the new, friendlier permutedims for non-recursive transpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants