Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guarantee inbounds iteration over Array{T} #27386

Merged
merged 5 commits into from
Jun 27, 2018

Conversation

haampie
Copy link
Contributor

@haampie haampie commented Jun 2, 2018

I think the new iteration protocol really shines in this PR :). Fixes #27384 and replaces #15291.

As an example, extrema(itr) is a generic function that applies to any iterator, and hence it cannot safely use @inbounds. Therefore currently:

julia> @benchmark extrema(v) setup = (v = rand(Int, 1000)) # 0.7.0-alpha.0
  median time:      1.132 μs (0.00% GC)

After this PR however, we don't need bounds checking and we can have SIMD and everything 🎉 :

julia> @benchmark extrema(v) setup = (v = rand(Int, 1000))
  median time:      391.347 ns (0.00% GC)

It preserves the new behaviour of #27079.

@haampie
Copy link
Contributor Author

haampie commented Jun 2, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

@Sacha0
Copy link
Member

Sacha0 commented Jun 2, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

(Nanosoldier listens only to members regrettably :).)

@haampie
Copy link
Contributor Author

haampie commented Jun 2, 2018

Ah, thx, but I think I should have been more patient since the tests are failing. Will fix soon.

@nalimilan nalimilan added performance Must go faster arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol labels Jun 2, 2018
base/array.jl Outdated
@inbounds return (A[1], 1)
end

function iterate(A::Array, i::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is that iterate is not necessarily a private interface, so if somebody passed say iterate(A, -1), that could cause trouble. Perhaps a better thing to do would be to teach LLVM more about array lengths such that it can elide the boundscheck itself when the iteration protocol is used correctly.

Copy link
Contributor Author

@haampie haampie Jun 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simpler, high-level solution exists by taking the state a struct that wraps i, with an inner constructor that sets i = 1. I think the only way to break things then is to reinterpret a negative integer as that state struct, but well... who would do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try using an unsigned type and starting the state at i=0. That way all inputs, other than the one we explicitly check for would be valid.

Copy link
Contributor Author

@haampie haampie Jun 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, maybe a pattern to enforce users not to break the state:

struct State
  i::Int
  State() = new(1)
  State(prev::State) = new(prev.i + 1)
end

State() # State(1)
State(State()) # State(2)
State(-1) # error
first(reinterpret(State, [-1])) # the only far-fetched way to break it I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're currently asserting i <= N && 0 < i <= N, dropping just the duplicate test (and keeping the i > 0 || throw(BoundsError())) seems like it would still be a strict improvement

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@haampie
Copy link
Contributor Author

haampie commented Jun 2, 2018

Seems there's an anti-pattern where iterate(iterator, i) is being used as a substitute for getindex(iterator, i).

The tests that fail are all iterators that wrap other iterators and dictate the initial state of the inner iterator. In my opinion a wrapper like that should call iterate(inner_iterator), obtain preferably an immutable state of the inner iterator, and pass that through in the next iteration.

Example from the tests:

mutable struct CharStr; chars::Vector{Char}; end
Base.iterate(x::CharStr, i::Integer=1) = iterate(x.chars, i)  # CharStr dictates the state of a Vector{Char} iterator.

More obvious example:

function iterate(itr::PartitionIterator{<:Vector}, state=1)
   iterate(itr.c, state) === nothing && return nothing # this is not iterating, but indexing
   l = state
   r = min(state + itr.n-1, length(itr.c))
   return view(itr.c, l:r), r + 1
end

@JeffBezanson
Copy link
Member

In my opinion a wrapper like that should call iterate(inner_iterator)

That is completely true; one should never make assumptions about the state of an arbitrary iterator. But I think that can be relaxed somewhat if the type of the iterator is known, since then it is possible to know about its state.

@andreasnoack
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs = ":master")

@haampie
Copy link
Contributor Author

haampie commented Jun 3, 2018

Example:

julia> @btime mapreduce(abs, +, 1, v) setup = (v = rand(Int, 10000) .% 10)

11.164 μs on master vs 2.340 μs on this branch :)

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@haampie
Copy link
Contributor Author

haampie commented Jun 3, 2018

The pairs iterator also requires the call site to provide @inbounds. On 0.7.0-alpha.0 when summing index times value:

> @benchmark mapreduce(prod, +, 0, pairs(v)) setup = (v = rand(Int, 100))
  median time:      357.021 ns (0.00% GC)

compared to this branch with the zip iterator:

> @benchmark mapreduce(prod, +, 0, zip(LinearIndices(v), v)) setup = (v = rand(Int, 100))
  median time:      94.672 ns (0.00% GC)

which suggests that we should implement pairs in terms of zip -- previously this was avoided because it was slower. Not only is it faster, it is also safer:

> w = rand(10)
> for (k, v) in zip(LinearIndices(w), w)
     k == 2 && resize!(w, 1)
     println(v)
   end
0.5219358747840877
0.8023332880886593

versus

> w = rand(10)
> for (k, v) in pairs(w) # we cannot use inbounds on the call site!
     k == 2 && resize!(w, 1)
     println(v)
   end
0.08481281339520486
0.37336917588944907
ERROR: BoundsError: attempt to access 1-element Array{Float64,1} at index [3]

@Sacha0
Copy link
Member

Sacha0 commented Jun 3, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

Edit: Woops, should have read the logs before cycling Nanosoldier. Thanks Kris!

@KristofferC
Copy link
Member

ERROR: LoadError: std has been moved to the package StatsBase.jl.
Run `Pkg.add("StatsBase")` to install it, restart Julia,
and then run `using StatsBase` to load it.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]

dont think rerunning nanosoldier will help until that is fixed

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@haampie
Copy link
Contributor Author

haampie commented Jun 4, 2018

Hm, one thing I did not consider is that restricting the state type in iterate(::Array, ::SomeStateType) will make iterate([1,2,3], 1) dispatch to iterate(::AbstractArray, ::Any), rather than throwing a MethodError.

I only found out about this because in #27415 there's a failing doctest 😕. The issue is:

collect(Iterators.rest([1,2,3,4,5,6], 2)) # [2]

which is a result of Iterators.rest assuming it can set the state of its inner iterator:

iterate(i::Rest, st=i.st) = iterate(i.itr, st) # :(

So, the only reasonable fix I can think of is to restrict the state type in iterate(::AbstractArray, ::AbstractArrayIteratorState).

Another thing is making the Rest iterator work (efficiently). Obviously the generic fallback should be to iterate just i.st times, and then yield values; but maybe one can wrap random access iterators in a view.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2018

So, the only reasonable fix I can think of is to restrict the state type in iterate(::AbstractArray, ::AbstractArrayIteratorState).

Alternatively, I think I'd just define:

function iterate(A::Array, s)
    s isa ArrayIteratorState || throw(ArgumentError("…"))

We could even make that a bit more graceful with an inner function and a deprecation.

@vtjnash
Copy link
Member

vtjnash commented Jun 4, 2018

I seriously don't think we should make an iterator state type. As Keno mentioned, we can just make this an unsigned type and then we get the full range check for free (this is also how we emit bounds-checks for array):

(i % UInt) - 1 > length(a) ? nothing : @inbounds a[i]

@mbauman
Copy link
Member

mbauman commented Jun 4, 2018

My major hesitation with UInt is this: Julia 0.6 allows:

julia> next([1,2,3], 0x1)
(1, 2)

Yes, it's technically an abuse of the iteration protocol to do something like this, but it's an abuse that's fairly widespread. This would incur surprising off-by-one effects, whereas the State object could throw errors or depwarns.

But I think that can be relaxed somewhat if the type of the iterator is known, since then it is possible to know about its state.

@vtjnash
Copy link
Member

vtjnash commented Jun 4, 2018

This would incur surprising off-by-one effects

I think I handled this in the right direction, but I didn't test my code snippet. If it's not working quite right, that should be a simple matter of fixing the offset of the bounds-check. Note closely that my code is using a modulus trick of unsigned number overflow that's fairly common in embedded code, not just a trivial comparison.

@haampie
Copy link
Contributor Author

haampie commented Jun 4, 2018

To add to the comment of @mbauman, some bugs have gone unnoticed up to now which would probably not have been around if there was more type safety:

1. The Rest iterator is flawed as it assumes the inner iterator has an integer states (some form of indexing basically);
2. https://github.com/JuliaLang/julia/blob/master/base/iterators.jl#L373 where the state of the one iterator is passed to the other iterator.

I'd choose type safety over one-liners then. Also, it more or less enforces one to stop using iterators as indexing functions.

@Keno
Copy link
Member

Keno commented Jun 4, 2018

The Rest iterator is obviously flawed as it assumes the inner iterator has an integer states (some form of indexing basically);

On this point, I had proposed at some point a rest-like iterator that takes an index (the difference between this and a subarray being that pairs(rest(a, i)) would iterate parent indices).

@haampie
Copy link
Contributor Author

haampie commented Jun 4, 2018

Maybe I'm missing something trivial in the conversion trick, but assuming the first state is UInt(1), how do we prevent overflow of UInt to 0 when indexing?

I get the trick 😅

@mbauman
Copy link
Member

mbauman commented Jun 4, 2018

Something like this:

julia> @eval Base function iterate(A::Array, i=1)
           (i % UInt) - 1 < length(A) ? (@inbounds A[i], i+1) : nothing
       end

julia> iterate([1,2,3], 0)

julia> iterate([1,2,3], 1)
(1, 2)

julia> iterate([1,2,3], 2)
(2, 3)

julia> iterate([1,2,3], 3)
(3, 4)

julia> iterate([1,2,3], 4)

@JeffBezanson
Copy link
Member

I would also prefer not to add an ArrayIteratorState. It has to be ok to use a simple integer as a state.

I don't see where the Rest iterator assumes the state is an integer?

@haampie haampie force-pushed the fix-inbounds-iteration-over-array branch from ae53af8 to e4dd4f9 Compare June 5, 2018 06:27
@KristofferC
Copy link
Member

Rerunning benchmarks after inline change

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@mbauman
Copy link
Member

mbauman commented Jun 25, 2018

Holy cow, that's amazing.

Just a few regressions that are far outweighed by the sea of ✅ . I feel bad pulling these out because all these red ❌s look horrible by themselves. This post needs more ✅. ✅✅✅. Or just click the report. ✅✅✅. 🎉

ID time ratio memory ratio
["collection", "queries & updates", "(\"Dict\", \"Any\", \"in\", \"false\")"] 3.78 (25%) ❌ Inf (1%) ❌
["collection", "queries & updates", "(\"Dict\", \"Any\", \"push!\", \"new\")"] 2.88 (25%) ❌ 2.50 (1%) ❌
["collection", "queries & updates", "(\"Dict\", \"Any\", \"setindex!\", \"new\")"] 2.79 (25%) ❌ 2.50 (1%) ❌
["collection", "queries & updates", "(\"Set\", \"Any\", \"in\", \"false\")"] 4.58 (25%) ❌ Inf (1%) ❌
["collection", "queries & updates", "(\"Set\", \"Any\", \"push!\", \"new\")"] 3.61 (25%) ❌ 4.00 (1%) ❌
["collection", "queries & updates", "(\"Vector\", \"Any\", \"in\", \"true\")"] 1.75 (25%) ❌ 1.93 (1%) ❌
["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Set\")"] 1.13 (25%) 1.44 (1%) ❌
["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Set\", \"Set\")"] 1.05 (25%) 1.41 (1%) ❌
["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Vector\")"] 1.08 (25%) 1.32 (1%) ❌
["sparse", "constructors", "(\"IJV\", 100)"] 1.50 (15%) ❌ 2.29 (1%) ❌

I'll look into this; my guess is that we just need a few extra @inlines sprinkled around.

@KristofferC
Copy link
Member

KristofferC commented Jun 25, 2018

Hm, those regressions weren't there before the force inline to iterate, https://github.com/JuliaCI/BaseBenchmarkReports/blob/5cff370bc9f8ab2b614417d5cc9ad7781441d339/bb5b625_vs_948b088/report.md which was made to fix #27386 (comment).

Although the force inline also brought some extra improvements.

@mbauman
Copy link
Member

mbauman commented Jun 25, 2018

Yup, exactly, which is why I'm thinking we're now pushing a few functions that call iterate over their inlining threshold.

@Sacha0
Copy link
Member

Sacha0 commented Jun 25, 2018

(Amazing work as always @haampie! ❤️)

@nalimilan
Copy link
Member

Has anybody been able to reproduce the skipmissing improvements locally? It's really weird that I don't see them...

@KristofferC
Copy link
Member

KristofferC commented Jun 26, 2018

Alpha vs this branch:

julia> A = [rand() > 0.9 ? missing : rand(Int) for i in 1:10000];

julia> f(A) = sum(skipmissing(A))
f (generic function with 1 method)

# alpha
julia> @btime f(A)
  42.513 μs (5 allocations: 80 bytes)
-678814004979352337

# branch
julia> @btime f(A)
  22.888 μs (2 allocations: 32 bytes)
-166018160176763731

@nalimilan
Copy link
Member

Thanks, but alpha is too old, it lacks recent compiler improvements. :-/

@KristofferC
Copy link
Member

On beta they are indeed the same. Odd.

@nalimilan
Copy link
Member

OK, that's intriguing... See also a similar phenomenon at #27743 (comment).

@mbauman
Copy link
Member

mbauman commented Jun 26, 2018

I cannot reproduce skipmissing either. This did hit JuliaCI/Nanosoldier.jl#48, so the PR got the additional advantage of 8eba7c4, and d7bf89f, but I don't see how that would affect skipmissing.

On the regression front, the Dict{Any}/Set{Any} regressions are spurious. The issue is here:

https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L299

This line non-deterministically allocates depending upon the hash structure of the dictionary. If we never have a spurious hash collision, then we need to access the Array{Any} — an unstable operation that allocates. Except that Nanosoldier tries to keep its randomization the same between runs… ooooh, but that gets foiled by JuliaCI/Nanosoldier.jl#48: 8eba7c4 changes it.

Oh, and now I understand how the skipmissing test also changed. The random data were different between runs. That took far too long.

Alright, let's try this again:

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nalimilan
Copy link
Member

Looks like Nanosoldier doesn't want to start... :-/

@ararslan
Copy link
Member

Restarted the server. @nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@mbauman
Copy link
Member

mbauman commented Jun 27, 2018

Ok, that's slightly better. Looks like Nanosoldier did the correct comparison this time. We still have a handful of big regressions flagged (16-17x !?), but I cannot reproduce them locally.

@KristofferC
Copy link
Member

@nanosoldier runbenchmarks("collection", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC
Copy link
Member

Good to go then?

@mbauman mbauman merged commit c404bb7 into JuliaLang:master Jun 27, 2018
@nalimilan
Copy link
Member

It's so nice to see SIMD instructions with functions like this!

function mysum(X)
    s = zero(eltype(X))
    for x in X
        s += x
    end
    s
end

Do you think this solution could be used with any AbstractArray, or do we need a more general fix?

@haampie haampie deleted the fix-inbounds-iteration-over-array branch June 28, 2018 10:53
@haampie
Copy link
Contributor Author

haampie commented Jun 28, 2018

It's not easy, see @mbauman's comment here #27384 (comment)

jrevels pushed a commit that referenced this pull request Jul 2, 2018
* Guarantee inbounds iteration over Array{T}

* Replace indexing with an iterator with actual indexing

* Make the CharStr iterator unaware of the initial state of a Vector{Char} iterator

* Fix tests

* Force inlining of ::Array iterate function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.