Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add the iteratenz API #167

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

add the iteratenz API #167

wants to merge 3 commits into from

Conversation

SobhanMP
Copy link
Member

@SobhanMP SobhanMP commented Jun 26, 2022

I couldn't find anything like this so i thought i'd make a small implementation. the idea is that this makes iteration on sparsearrays much easier, no need to deal with colptr & friends. It's still pretty fast. But this begs the question, has this been in the api and did i miss it?

This should make adding special matrice operations easy.

julia> using BenchmarkTools, SparseArrays, LinearAlgebra

julia> A = I + sprandn(10_000, 10_000, 0.001);

julia> function f(A)
           c = zero(eltype(A))
           for (i, j, v) in SparseArrays.iternz(A)
               if (i + j) % 2 == 0
                   c += v 
               end
           end
           return c
       end
f (generic function with 1 method)

julia> function g(A)
           c = zero(eltype(A))
           for (i, j, v) in zip(findnz(A)...)
               if (i + j) % 2 == 0
                   c += v 
               end
           end
           return c
       end
g (generic function with 1 method)

julia> function h(A)
           c = zero(eltype(A))
           @inbounds for j in axes(A, 2),
               ind in A.colptr[j]: A.colptr[j + 1] - 1
               i = A.rowval[ind]
               if (i + j) % 2 == 0
                   c += A.nzval[ind]
               end
           end
       
           c
       end
h (generic function with 1 method)

julia> function k(A)
           c = zero(eltype(A))
           cptr = A.colptr
           rv = A.rowval
           nzv = A.nzval
           @inbounds for j in axes(A, 2),
               ind in cptr[j]: cptr[j + 1] - 1
               i = rv[ind]
               if (i + j) % 2 == 0
                   c += nzv[ind]
               end
           end
       
           c
       end 
k (generic function with 1 method)

julia> function l(A)
           c = zero(eltype(A))
           @inbounds for j in axes(A, 2),
               ind in SparseArrays.getcolptr(A)[j]:SparseArrays.getcolptr(A)[j + 1] - 1
               i = SparseArrays.getrowval(A)[ind]
               if (i + j) % 2 == 0
                   c += SparseArrays.getnzval(A)[ind]
               end
           end
           c
       end 
l (generic function with 1 method)

julia> @assert f(A) == g(A) == h(A) == k(A) == l(A)

julia> @benchmark f($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  149.655 μs  272.489 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     156.244 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   162.705 μs ±  18.228 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▃▁▁▃▄▂▂▂▂▇▃▂▁▁▁▃▁▁▁▁▁                           ▁▄          ▂
  █████████████████████████▇▇▇██▆▅▆▆▆▆▆▅▆▆▆▅▆▅▅▄▅▅▆▆██▇▄▅▆▅▆█▆▆ █
  150 μs        Histogram: log(frequency) by time        227 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g($A)
BenchmarkTools.Trial: 9411 samples with 1 evaluation.
 Range (min  max):  301.049 μs    4.224 ms  ┊ GC (min  max): 0.00%  55.37%
 Time  (median):     393.407 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   527.771 μs ± 346.183 μs  ┊ GC (mean ± σ):  8.61% ± 12.64%

  █▆▅▅▅▄▄▄▄▃▂▃▂▂▂▂▂▁▁▁▁▁▁▁      ▁▁   ▁▂                         ▂
  ██████████████████████████▇██████▇█████████▇▆▇▆▅▆▅▆▆▆▅▅▄▄▄▅▅▄ █
  301 μs        Histogram: log(frequency) by time        1.8 ms <

 Memory estimate: 2.52 MiB, allocs estimate: 6.

julia> @benchmark h($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  370.690 μs  559.212 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     389.377 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   397.357 μs ±  21.079 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▃▇█▆▅▄▃▁                                                  
  ▁▁▂▃▇████████▇▅▄▃▃▃▄▄▅▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▂▂▂▁▁▁▁▁▁▁▁▁ ▃
  371 μs           Histogram: frequency by time          472 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark k($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  380.104 μs  556.205 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     394.372 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   401.398 μs ±  17.597 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▃▆█▇▃                                                    
  ▁▁▂▃▄▇█████▇▃▃▂▂▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  380 μs           Histogram: frequency by time          470 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark l($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  377.320 μs  619.899 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     388.873 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   395.194 μs ±  16.974 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▄█▇▄▁                                                     
  ▁▁▂▅██████▆▄▃▃▂▂▂▂▃▂▂▂▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  377 μs           Histogram: frequency by time          463 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  145.030 μs  290.142 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     156.375 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   163.418 μs ±  18.701 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▇█▆▃▁▄▄▃▃▃▂▇▃▃▂▁▃▃▂▂▁▂▁▁▂▁▁▁   ▁                 ▄▃        ▂
  ▃▃▁█████████████████████████████████████▇█▇█▇▇█▇▆▇▆▇██▇▆▆▇▇██ █
  145 μs        Histogram: log(frequency) by time        225 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

tho my problem that it makes no sense that it's faster than manual iteration :/

@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2022

Codecov Report

Attention: Patch coverage is 96.29630% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.14%. Comparing base (f51b64c) to head (2e16fcc).
Report is 214 commits behind head on main.

Files with missing lines Patch % Lines
src/sparsematrix.jl 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   92.11%   92.14%   +0.02%     
==========================================
  Files          11       11              
  Lines        7203     7230      +27     
==========================================
+ Hits         6635     6662      +27     
  Misses        568      568              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rayegun
Copy link
Member

rayegun commented Jun 26, 2022

This is great, I have a prototype version based on ArrayIteration.jl, but having a version with no dependencies is great and very valuable.

@SobhanMP
Copy link
Member Author

SobhanMP commented Jun 28, 2022

@ViralBShah what do you think about implementing the same for all special matrices (symmetric, transpose, diagonal, etc)? this could give rise to fast generic implementations for many of the functions like dot. if you think it's a good idea, where should i submit a pull request?

@rayegun
Copy link
Member

rayegun commented Jun 28, 2022

That should almost certainly be experimented with in a separate package first.

@ViralBShah
Copy link
Member

Agree. I would put that in a separate package.

@SobhanMP
Copy link
Member Author

i made https://github.com/SobhanMP/SparseExtra.jl for now, guess i'll add a bunch of other stuff while i'm at it...

@ViralBShah
Copy link
Member

ViralBShah commented Jul 12, 2022

I would love to understand why this is faster? Will we use this iterator across other parts of SparseArrays?

Maybe we revisit once your SparseExtra.jl package has more stuff in it and we can then decide if it makes sense to bring it in here, or register separately.

@SobhanMP
Copy link
Member Author

SobhanMP commented Jul 12, 2022

note that it's not faster than the last function (the pattern used almost everywhere in Sparsearrays). accessing via the getcolptr is faster than naming a variable (i don't know why). it could be used to simplify a lot of code. for instance

const S_or_SS{Tv, Ti} = Union{
    AbstractSparseMatrixCSC{Tv, Ti},
    [i{AbstractSparseMatrixCSC{Tv, Ti}} 
        for i in [Symmetric, LowerTriangular, UpperTriangular, Transpose, Adjoint, Hermitian]]...}
function dot(x::AbstractVector, 
    A::S_or_SS{Tv},
    y) where Tv  
    acc = zero(promote_type(eltype(x), eltype(A), eltype(y)))
    for (v, i, j) in iternz(A)
        acc += x[i] * v * y[j]
    end
end

would be a correct implementation for 1 indexed x and y and all sort of special matrix. It could also be used for the dense version but the dense twice as slow.

admittedly, my implementation is missing a lot of the functions for this to work right now, but that's the idea.

@ViralBShah ViralBShah marked this pull request as draft July 13, 2022 13:25
@SobhanMP
Copy link
Member Author

@ViralBShah i've updated the readme of my repo, feel free to check it out

@SobhanMP
Copy link
Member Author

@ViralBShah should this stay in SparseExtra.jl (and close this) or should i make this a pull request (in contrast to a draft)?

this would make closing #23, #83, and possibly other dot implementation possible. Think transpose of diagonal matrix of sparse vector. It would also simplify every function that has a fast loop over sparse arrays.

would maybe close #50
ref https://sobhanmp.github.io/SparseExtra.jl/iternz/

@ViralBShah
Copy link
Member

ViralBShah commented Aug 10, 2022

If the design is ready, let's bring it in. I will look at the SparseExtra repo soon.

@ViralBShah
Copy link
Member

ViralBShah commented Aug 10, 2022

@Wimmerer Can you review too? If good enough to bring here, we can do finer reviews in a PR (if necessary).

@ViralBShah
Copy link
Member

ViralBShah commented Aug 10, 2022

I am persuaded enough by the real nice documentation. Let's open the PR. 1.9 feature freeze may be soon, so let's get it in. Let's also get the design doc in. The real moment of truth will be when we start using it throughout the library.

@fredrikekre @KristofferC @dkarrasch Any comments on the API in https://sobhanmp.github.io/SparseExtra.jl/iternz/, which we are discussing bringing into SparseArrays.jl?

@rayegun
Copy link
Member

rayegun commented Aug 10, 2022

I like it. A lite version of ArrayIteration.jl, should simplify the code base a lot.

Thoughts on exporting? I lean towards no for now.

@SobhanMP
Copy link
Member Author

in term of API it's done (at least the outward facing part) but there are three things worth considering.

  • skip_col/skip_row_to skip to the next element - 1 so that calling next on their returned state gives the desired element.
  • how about some infrastructure for matrices that are not column major
  • does it need to support arrays that don't start at one? (for algebraic ones)

@ViralBShah
Copy link
Member

We can keep it internal for a whole release cycle and export it later. My thought is to get something in that is small and addresses the current data structure, use it widely in the package, and then expand it to be more general.

@SobhanMP
Copy link
Member Author

@fredrikekre @KristofferC @dkarrasch Any comments on the API in https://sobhanmp.github.io/SparseExtra.jl/iternz/, which we are discussing bringing into SparseArrays.jl?

@SobhanMP
Copy link
Member Author

SobhanMP commented Sep 19, 2022

@ViralBShah
Copy link
Member

ViralBShah commented Sep 20, 2022

In the link, the API just shows, which feels nice enough to bring in.

all(iternz(x)) do (v, k...)
    x[k...] == v
end

I hadn't seen the rest of the API here https://sobhanmp.github.io/SparseExtra.jl/. I'm not sure if it makes sense to bring all of it into SparseArrays.jl. What's the thinking behind bringing all of it into SparseArrays.jl?

@SobhanMP
Copy link
Member Author

I hadn't seen the rest of the API here

i don't think it makes sense to bring that in SparseArrays. the parallel \ is nice but i'm not sure there is any merit in bringing it here.

@dkarrasch
Copy link
Member

So I finally took a look at SparseExtra.jl. It's very nice that it composes with outer wrappers! What seems to be currently missing is that for Symmetric and Hermitian, the elements of the viewed triangle are transposed and adjoint, respectively. That doesn't show up in cases with real eltypes, but it will show up, for instance, in 3-arg dot with a (complex) Hermitian matrix. Moreover, when you read diagonal elements, you need to take real part in the Hermitian case, and, more generally, call symmetric or hermitian on the underlying element; see

https://github.com/JuliaLang/julia/blob/690a5f67c13fd23c7b48e60c31bfa565c0eee861/stdlib/LinearAlgebra/src/symmetric.jl#L236-L255

Unfortunately, I wasn't able to follow all the details to propose how to fix this. I believe that these are the only two wrappers that have non-trivial manipulations, so when we get those right, this could be ported here and measured for performance.

Also, x-ref JuliaLang/julia#40103.

@SobhanMP
Copy link
Member Author

SobhanMP commented Mar 3, 2024

@dkarrasch The links were very helpful. I think that this works as intended now.

julia> A = Symmetric([randn(20, 20) for _ in 1:20, _ in 1:20]);
julia> sum(sum(A[i, j] - v for (v, i, j) in iternz(A))) == 0
true
julia> A = Hermitian([randn(20, 20) + randn(20, 20) * 1im for _ in 1:20, _ in 1:20]);
julia> sum(sum(A[i, j] - v for (v, i, j) in iternz(A))) == 0
true

Are there still interesting in this? Should I update the pull request? Do we want more tests?

@ViralBShah
Copy link
Member

I thought the main issue here was the significant performance gap. I would be supportive of bringing the iterator in - it is a nice API, but perhaps we can't migrate the internals to it until it is on par or faster. However, it can still be made available to users (with an appropriate note).

@rayegun @dkarrasch @jishnub @fredrikekre Thoughts?

@SobhanMP
Copy link
Member Author

SobhanMP commented Mar 7, 2024

I thought the main issue here was the significant performance gap.

I don't think that this is an issue look at the plots in
https://sobhanmp.github.io/SparseExtra.jl/iternz/

The loss of performance is fairly small. IMHO, even if we do not replace existing code, we can at least make fast fallbacks for all methods with this.

@ViralBShah
Copy link
Member

ViralBShah commented Mar 7, 2024

I am onboard with introducing it but would be good at least to hear from @rayegun

@SobhanMP
Copy link
Member Author

@rayegun (ping)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants