Guarantee inbounds iteration over Array{T} #27386

haampie · 2018-06-02T16:32:18Z

I think the new iteration protocol really shines in this PR :). Fixes #27384 and replaces #15291.

As an example, extrema(itr) is a generic function that applies to any iterator, and hence it cannot safely use @inbounds. Therefore currently:

julia> @benchmark extrema(v) setup = (v = rand(Int, 1000)) # 0.7.0-alpha.0
  median time:      1.132 μs (0.00% GC)

After this PR however, we don't need bounds checking and we can have SIMD and everything 🎉 :

julia> @benchmark extrema(v) setup = (v = rand(Int, 1000))
  median time:      391.347 ns (0.00% GC)

It preserves the new behaviour of #27079.

haampie · 2018-06-02T16:57:51Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

Sacha0 · 2018-06-02T17:30:24Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

(Nanosoldier listens only to members regrettably :).)

haampie · 2018-06-02T18:04:31Z

Ah, thx, but I think I should have been more patient since the tests are failing. Will fix soon.

Keno · 2018-06-02T19:14:57Z

base/array.jl

+    @inbounds return (A[1], 1)
+end
+
+function iterate(A::Array, i::Int)


The problem with this is that iterate is not necessarily a private interface, so if somebody passed say iterate(A, -1), that could cause trouble. Perhaps a better thing to do would be to teach LLVM more about array lengths such that it can elide the boundscheck itself when the iteration protocol is used correctly.

I think a simpler, high-level solution exists by taking the state a struct that wraps i, with an inner constructor that sets i = 1. I think the only way to break things then is to reinterpret a negative integer as that state struct, but well... who would do that.

We could try using an unsigned type and starting the state at i=0. That way all inputs, other than the one we explicitly check for would be valid.

For the record, maybe a pattern to enforce users not to break the state:

struct State i::Int State() = new(1) State(prev::State) = new(prev.i + 1) end State() # State(1) State(State()) # State(2) State(-1) # error first(reinterpret(State, [-1])) # the only far-fetched way to break it I think.

Since we're currently asserting i <= N && 0 < i <= N, dropping just the duplicate test (and keeping the i > 0 || throw(BoundsError())) seems like it would still be a strict improvement

nanosoldier · 2018-06-02T19:31:10Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

haampie · 2018-06-02T22:00:30Z

Seems there's an anti-pattern where iterate(iterator, i) is being used as a substitute for getindex(iterator, i).

The tests that fail are all iterators that wrap other iterators and dictate the initial state of the inner iterator. In my opinion a wrapper like that should call iterate(inner_iterator), obtain preferably an immutable state of the inner iterator, and pass that through in the next iteration.

Example from the tests:

mutable struct CharStr; chars::Vector{Char}; end
Base.iterate(x::CharStr, i::Integer=1) = iterate(x.chars, i)  # CharStr dictates the state of a Vector{Char} iterator.

More obvious example:

function iterate(itr::PartitionIterator{<:Vector}, state=1)
   iterate(itr.c, state) === nothing && return nothing # this is not iterating, but indexing
   l = state
   r = min(state + itr.n-1, length(itr.c))
   return view(itr.c, l:r), r + 1
end

JeffBezanson · 2018-06-03T00:29:45Z

In my opinion a wrapper like that should call iterate(inner_iterator)

That is completely true; one should never make assumptions about the state of an arbitrary iterator. But I think that can be relaxed somewhat if the type of the iterator is known, since then it is possible to know about its state.

andreasnoack · 2018-06-03T06:34:36Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

haampie · 2018-06-03T07:54:04Z

Example:

julia> @btime mapreduce(abs, +, 1, v) setup = (v = rand(Int, 10000) .% 10)

11.164 μs on master vs 2.340 μs on this branch :)

nanosoldier · 2018-06-03T08:35:11Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

haampie · 2018-06-03T11:23:17Z

The pairs iterator also requires the call site to provide @inbounds. On 0.7.0-alpha.0 when summing index times value:

> @benchmark mapreduce(prod, +, 0, pairs(v)) setup = (v = rand(Int, 100))
  median time:      357.021 ns (0.00% GC)

compared to this branch with the zip iterator:

> @benchmark mapreduce(prod, +, 0, zip(LinearIndices(v), v)) setup = (v = rand(Int, 100))
  median time:      94.672 ns (0.00% GC)

which suggests that we should implement pairs in terms of zip -- previously this was avoided because it was slower. Not only is it faster, it is also safer:

> w = rand(10)
> for (k, v) in zip(LinearIndices(w), w)
     k == 2 && resize!(w, 1)
     println(v)
   end
0.5219358747840877
0.8023332880886593

versus

> w = rand(10)
> for (k, v) in pairs(w) # we cannot use inbounds on the call site!
     k == 2 && resize!(w, 1)
     println(v)
   end
0.08481281339520486
0.37336917588944907
ERROR: BoundsError: attempt to access 1-element Array{Float64,1} at index [3]

Sacha0 · 2018-06-03T16:53:49Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

Edit: Woops, should have read the logs before cycling Nanosoldier. Thanks Kris!

KristofferC · 2018-06-03T17:02:37Z

ERROR: LoadError: std has been moved to the package StatsBase.jl.
Run `Pkg.add("StatsBase")` to install it, restart Julia,
and then run `using StatsBase` to load it.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]

dont think rerunning nanosoldier will help until that is fixed

nanosoldier · 2018-06-03T18:54:27Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

haampie · 2018-06-04T18:42:42Z

Hm, one thing I did not consider is that restricting the state type in iterate(::Array, ::SomeStateType) will make iterate([1,2,3], 1) dispatch to iterate(::AbstractArray, ::Any), rather than throwing a MethodError.

I only found out about this because in #27415 there's a failing doctest 😕. The issue is:

collect(Iterators.rest([1,2,3,4,5,6], 2)) # [2]

which is a result of Iterators.rest assuming it can set the state of its inner iterator:

iterate(i::Rest, st=i.st) = iterate(i.itr, st) # :(

So, the only reasonable fix I can think of is to restrict the state type in iterate(::AbstractArray, ::AbstractArrayIteratorState).

Another thing is making the Rest iterator work (efficiently). Obviously the generic fallback should be to iterate just i.st times, and then yield values; but maybe one can wrap random access iterators in a view.

mbauman · 2018-06-04T18:48:24Z

So, the only reasonable fix I can think of is to restrict the state type in iterate(::AbstractArray, ::AbstractArrayIteratorState).

Alternatively, I think I'd just define:

function iterate(A::Array, s)
    s isa ArrayIteratorState || throw(ArgumentError("…"))

We could even make that a bit more graceful with an inner function and a deprecation.

vtjnash · 2018-06-04T19:07:10Z

I seriously don't think we should make an iterator state type. As Keno mentioned, we can just make this an unsigned type and then we get the full range check for free (this is also how we emit bounds-checks for array):

(i % UInt) - 1 > length(a) ? nothing : @inbounds a[i]

mbauman · 2018-06-04T19:12:56Z

My major hesitation with UInt is this: Julia 0.6 allows:

julia> next([1,2,3], 0x1)
(1, 2)

Yes, it's technically an abuse of the iteration protocol to do something like this, but it's an abuse that's fairly widespread. This would incur surprising off-by-one effects, whereas the State object could throw errors or depwarns.

But I think that can be relaxed somewhat if the type of the iterator is known, since then it is possible to know about its state.

vtjnash · 2018-06-04T19:25:11Z

This would incur surprising off-by-one effects

I think I handled this in the right direction, but I didn't test my code snippet. If it's not working quite right, that should be a simple matter of fixing the offset of the bounds-check. Note closely that my code is using a modulus trick of unsigned number overflow that's fairly common in embedded code, not just a trivial comparison.

haampie · 2018-06-04T19:25:11Z

To add to the comment of @mbauman, some bugs have gone unnoticed up to now which would probably not have been around if there was more type safety:

~~1. The Rest iterator is flawed as it assumes the inner iterator has an integer states (some form of indexing basically);~~
2. https://github.com/JuliaLang/julia/blob/master/base/iterators.jl#L373 where the state of the one iterator is passed to the other iterator.

I'd choose type safety over one-liners then. Also, it more or less enforces one to stop using iterators as indexing functions.

Keno · 2018-06-04T19:27:05Z

The Rest iterator is obviously flawed as it assumes the inner iterator has an integer states (some form of indexing basically);

On this point, I had proposed at some point a rest-like iterator that takes an index (the difference between this and a subarray being that pairs(rest(a, i)) would iterate parent indices).

haampie · 2018-06-04T20:23:13Z

~~Maybe I'm missing something trivial in the conversion trick, but assuming the first state is UInt(1), how do we prevent overflow of UInt to 0 when indexing?~~

I get the trick 😅

mbauman · 2018-06-04T20:34:46Z

Something like this:

julia> @eval Base function iterate(A::Array, i=1)
           (i % UInt) - 1 < length(A) ? (@inbounds A[i], i+1) : nothing
       end

julia> iterate([1,2,3], 0)

julia> iterate([1,2,3], 1)
(1, 2)

julia> iterate([1,2,3], 2)
(2, 3)

julia> iterate([1,2,3], 3)
(3, 4)

julia> iterate([1,2,3], 4)

JeffBezanson · 2018-06-05T02:37:25Z

I would also prefer not to add an ArrayIteratorState. It has to be ok to use a simple integer as a state.

I don't see where the Rest iterator assumes the state is an integer?

KristofferC · 2018-06-25T09:05:30Z

Rerunning benchmarks after inline change

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-06-25T20:10:19Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

mbauman · 2018-06-25T20:16:51Z

Holy cow, that's amazing.

Just a few regressions that are far outweighed by the sea of ✅ . I feel bad pulling these out because all these red ❌s look horrible by themselves. This post needs more ✅. ✅✅✅. Or just click the report. ✅✅✅. 🎉

ID	time ratio	memory ratio
`["collection", "queries & updates", "(\"Dict\", \"Any\", \"in\", \"false\")"]`	3.78 (25%) ❌	Inf (1%) ❌
`["collection", "queries & updates", "(\"Dict\", \"Any\", \"push!\", \"new\")"]`	2.88 (25%) ❌	2.50 (1%) ❌
`["collection", "queries & updates", "(\"Dict\", \"Any\", \"setindex!\", \"new\")"]`	2.79 (25%) ❌	2.50 (1%) ❌
`["collection", "queries & updates", "(\"Set\", \"Any\", \"in\", \"false\")"]`	4.58 (25%) ❌	Inf (1%) ❌
`["collection", "queries & updates", "(\"Set\", \"Any\", \"push!\", \"new\")"]`	3.61 (25%) ❌	4.00 (1%) ❌
`["collection", "queries & updates", "(\"Vector\", \"Any\", \"in\", \"true\")"]`	1.75 (25%) ❌	1.93 (1%) ❌
`["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Set\")"]`	1.13 (25%)	1.44 (1%) ❌
`["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Set\", \"Set\")"]`	1.05 (25%)	1.41 (1%) ❌
`["collection", "set operations", "(\"BitSet\", \"Int\", \"intersect\", \"Vector\")"]`	1.08 (25%)	1.32 (1%) ❌
`["sparse", "constructors", "(\"IJV\", 100)"]`	1.50 (15%) ❌	2.29 (1%) ❌

I'll look into this; my guess is that we just need a few extra @inlines sprinkled around.

KristofferC · 2018-06-25T20:28:28Z

Hm, those regressions weren't there before the force inline to iterate, https://github.com/JuliaCI/BaseBenchmarkReports/blob/5cff370bc9f8ab2b614417d5cc9ad7781441d339/bb5b625_vs_948b088/report.md which was made to fix #27386 (comment).

Although the force inline also brought some extra improvements.

mbauman · 2018-06-25T20:30:45Z

Yup, exactly, which is why I'm thinking we're now pushing a few functions that call iterate over their inlining threshold.

Sacha0 · 2018-06-25T21:35:56Z

(Amazing work as always @haampie! ❤️)

nalimilan · 2018-06-26T08:02:02Z

Has anybody been able to reproduce the skipmissing improvements locally? It's really weird that I don't see them...

KristofferC · 2018-06-26T08:57:47Z

Alpha vs this branch:

julia> A = [rand() > 0.9 ? missing : rand(Int) for i in 1:10000];

julia> f(A) = sum(skipmissing(A))
f (generic function with 1 method)

# alpha
julia> @btime f(A)
  42.513 μs (5 allocations: 80 bytes)
-678814004979352337

# branch
julia> @btime f(A)
  22.888 μs (2 allocations: 32 bytes)
-166018160176763731

nalimilan · 2018-06-26T08:59:09Z

Thanks, but alpha is too old, it lacks recent compiler improvements. :-/

KristofferC · 2018-06-26T09:04:47Z

On beta they are indeed the same. Odd.

nalimilan · 2018-06-26T11:43:38Z

OK, that's intriguing... See also a similar phenomenon at #27743 (comment).

mbauman · 2018-06-26T15:24:37Z

I cannot reproduce skipmissing either. This did hit JuliaCI/Nanosoldier.jl#48, so the PR got the additional advantage of 8eba7c4, and d7bf89f, but I don't see how that would affect skipmissing.

On the regression front, the Dict{Any}/Set{Any} regressions are spurious. The issue is here:

https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L299

This line non-deterministically allocates depending upon the hash structure of the dictionary. If we never have a spurious hash collision, then we need to access the Array{Any} — an unstable operation that allocates. Except that Nanosoldier tries to keep its randomization the same between runs… ooooh, but that gets foiled by JuliaCI/Nanosoldier.jl#48: 8eba7c4 changes it.

Oh, and now I understand how the skipmissing test also changed. The random data were different between runs. That took far too long.

Alright, let's try this again:

@nanosoldier runbenchmarks(ALL, vs = ":master")

nalimilan · 2018-06-26T16:13:05Z

Looks like Nanosoldier doesn't want to start... :-/

ararslan · 2018-06-26T17:16:27Z

Restarted the server. @nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-06-26T22:45:44Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

mbauman · 2018-06-27T20:30:31Z

Ok, that's slightly better. Looks like Nanosoldier did the correct comparison this time. We still have a handful of big regressions flagged (16-17x !?), but I cannot reproduce them locally.

KristofferC · 2018-06-27T20:40:43Z

@nanosoldier runbenchmarks("collection", vs=":master")

nanosoldier · 2018-06-27T22:08:02Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

KristofferC · 2018-06-27T22:21:53Z

Good to go then?

nalimilan · 2018-06-28T10:33:35Z

It's so nice to see SIMD instructions with functions like this!

function mysum(X)
    s = zero(eltype(X))
    for x in X
        s += x
    end
    s
end

Do you think this solution could be used with any AbstractArray, or do we need a more general fix?

haampie · 2018-06-28T11:29:17Z

It's not easy, see @mbauman's comment here #27384 (comment)

* Guarantee inbounds iteration over Array{T} * Replace indexing with an iterator with actual indexing * Make the CharStr iterator unaware of the initial state of a Vector{Char} iterator * Fix tests * Force inlining of ::Array iterate function

nalimilan added performance Must go faster arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol labels Jun 2, 2018

Keno reviewed Jun 2, 2018

View reviewed changes

haampie mentioned this pull request Jun 2, 2018

Don't ask the call site to provide @inbounds when iterating over an array #27384

Closed

haampie mentioned this pull request Jun 4, 2018

Simplify zip iterator and improve performance in certain cases #27415

Closed

haampie force-pushed the fix-inbounds-iteration-over-array branch from ae53af8 to e4dd4f9 Compare June 5, 2018 06:27

Force inlining of ::Array iterate function

86cca96

nalimilan mentioned this pull request Jun 26, 2018

Add optimized mapreduce implementation for SkipMissing #27743

Merged

mbauman merged commit c404bb7 into JuliaLang:master Jun 27, 2018

nalimilan mentioned this pull request Jun 28, 2018

Add @inbounds annotations to any(), all() and count() #21402

Closed

haampie deleted the fix-inbounds-iteration-over-array branch June 28, 2018 10:53

haampie mentioned this pull request Aug 23, 2018

Remove bounds checks in iterate(::Tuple) #28847

Merged

KristofferC mentioned this pull request Mar 21, 2019

while i <= L makes function >5x slower the while i < (L+1) #31416

Closed

mbauman mentioned this pull request Apr 8, 2021

Annotate iterate(::AbstractArray) with inbounds #40397

Closed

Guarantee inbounds iteration over Array{T} #27386

Guarantee inbounds iteration over Array{T} #27386

Conversation

haampie commented Jun 2, 2018 • edited Loading

haampie commented Jun 2, 2018

Sacha0 commented Jun 2, 2018 • edited Loading

haampie commented Jun 2, 2018

Keno Jun 2, 2018

Choose a reason for hiding this comment

haampie Jun 2, 2018 • edited Loading

Choose a reason for hiding this comment

Keno Jun 2, 2018

Choose a reason for hiding this comment

haampie Jun 2, 2018 • edited Loading

Choose a reason for hiding this comment

vtjnash Jun 2, 2018

Choose a reason for hiding this comment

nanosoldier commented Jun 2, 2018

haampie commented Jun 2, 2018 • edited Loading

JeffBezanson commented Jun 3, 2018

andreasnoack commented Jun 3, 2018

haampie commented Jun 3, 2018

nanosoldier commented Jun 3, 2018

haampie commented Jun 3, 2018 • edited Loading

Sacha0 commented Jun 3, 2018 • edited Loading

KristofferC commented Jun 3, 2018

nanosoldier commented Jun 3, 2018

haampie commented Jun 4, 2018 • edited Loading

mbauman commented Jun 4, 2018

vtjnash commented Jun 4, 2018

mbauman commented Jun 4, 2018

vtjnash commented Jun 4, 2018

haampie commented Jun 4, 2018 • edited Loading

Keno commented Jun 4, 2018

haampie commented Jun 4, 2018 • edited Loading

mbauman commented Jun 4, 2018 • edited Loading

JeffBezanson commented Jun 5, 2018

KristofferC commented Jun 25, 2018

nanosoldier commented Jun 25, 2018

mbauman commented Jun 25, 2018 • edited Loading

KristofferC commented Jun 25, 2018 • edited Loading

mbauman commented Jun 25, 2018

Sacha0 commented Jun 25, 2018

nalimilan commented Jun 26, 2018

KristofferC commented Jun 26, 2018 • edited Loading

nalimilan commented Jun 26, 2018

KristofferC commented Jun 26, 2018

nalimilan commented Jun 26, 2018

mbauman commented Jun 26, 2018

nalimilan commented Jun 26, 2018

ararslan commented Jun 26, 2018

nanosoldier commented Jun 26, 2018

mbauman commented Jun 27, 2018

KristofferC commented Jun 27, 2018

nanosoldier commented Jun 27, 2018

KristofferC commented Jun 27, 2018

nalimilan commented Jun 28, 2018

haampie commented Jun 28, 2018

haampie commented Jun 2, 2018 •

edited

Loading

Sacha0 commented Jun 2, 2018 •

edited

Loading

haampie Jun 2, 2018 •

edited

Loading

haampie Jun 2, 2018 •

edited

Loading

haampie commented Jun 2, 2018 •

edited

Loading

haampie commented Jun 3, 2018 •

edited

Loading

Sacha0 commented Jun 3, 2018 •

edited

Loading

haampie commented Jun 4, 2018 •

edited

Loading

haampie commented Jun 4, 2018 •

edited

Loading

haampie commented Jun 4, 2018 •

edited

Loading

mbauman commented Jun 4, 2018 •

edited

Loading

mbauman commented Jun 25, 2018 •

edited

Loading

KristofferC commented Jun 25, 2018 •

edited

Loading

KristofferC commented Jun 26, 2018 •

edited

Loading