Simplify zip iterator and improve performance in certain cases #27415

haampie · 2018-06-04T09:27:16Z

This is a separate PR built on top of #27386. It removes the Zip2 struct: what used to be Zip1 already returns a 1-tuple and is therefore a valid base case (it turns out @StefanKarpinski was right in the comments here: eaf5a95, but maybe only after Zip1 was updated).

Further, it fixes a bug in isdone where the tail of the zip iterator was never checked (https://github.com/JuliaLang/julia/blob/master/base/iterators.jl#L373). A test for this still needs to be added.

I've built this on top of #27386 to show that we now finally get fully optimized code in some cases -- note the lack of @inbounds:

$ ./julia
> using BenchmarkTools
> function f(v)
     s = 0
     for (i, w) = zip(LinearIndices(v), v)
       s += i * w
     end
     s
   end
> @benchmark f(v) setup = (v = rand(Int, 1000))

On current master:

  median time:      546.735 ns (0.00% GC)

On this branch:

  median time:      245.483 ns (0.00% GC)

This is now on par with writing for (i, w) = enumerate(v) and with the handwritten loop:

@inbounds for i = 1 : length(v)
  s += i * v[i]
end

StefanKarpinski · 2018-06-04T12:42:44Z

Very cool! Test failure in test/iterators looks legitimate though. It would be nice to be able to define enumerate(v) simply as zip(LinearIndices(v), v) and have that be maximally efficient.
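A minimal check of the equivalence suggested above, assuming an ordinary 1-based Vector (for arrays with other axes, LinearIndices follows the axes while enumerate counts from 1, so the two can differ). The variable names are illustrative only:

```julia
v = [10, 20, 30]

# Pairs produced by zipping the linear indices with the values.
zipped = collect(zip(LinearIndices(v), v))

# Pairs produced by the built-in enumerate.
enumerated = collect(enumerate(v))

# For a plain Vector both yield the same (index, value) tuples.
println(zipped == enumerated)
```

This only shows that the two produce the same elements; it says nothing about whether they compile to the same code, which is the point of this PR.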

haampie · 2018-06-04T13:29:43Z

Yeah, test failures are annoying, but potentially there's still a workaround without the machinery proposed in #27412.

For vectorization it seems crucial to call iterate first on both iterators, and only afterwards check the state. This corresponds to the case isdone(a) === isdone(b) === missing. Otherwise we have at least one stateful iterator, in which case early returns are required.
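The ordering described above can be sketched by hand for two iterators. This is not the code from this PR; dot_iterate_first is a hypothetical helper that advances both iterators first and only then tests either result:

```julia
# Hypothetical helper (not from the PR): sum of products over two iterators,
# calling iterate on both before checking whether either one is exhausted.
function dot_iterate_first(a, b)
    s = 0
    ya = iterate(a)
    yb = iterate(b)
    while ya !== nothing && yb !== nothing
        (x, sa) = ya
        (y, sb) = yb
        s += x * y
        # Advance both iterators unconditionally; the checks happen
        # together in the loop condition.
        ya = iterate(a, sa)
        yb = iterate(b, sb)
    end
    return s
end

println(dot_iterate_first([1, 2, 3], [4, 5, 6]))
```

Because both iterators are always advanced, one of them may be advanced once more than needed when the other finishes first. That is harmless for stateless iterators such as arrays, but with a stateful iterator it would consume an element, which is why that case needs early returns. Whether a loop like this actually vectorizes depends on the compiler version and the element types.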

haampie · 2018-06-04T15:00:26Z

It seems this passes the iterators tests ~~while retaining the performance in the example above :)~~, hm, not really, apparently some last changes spoiled it.

Can we somehow tell the compiler that the first branch in zip_iterate is almost always true?

haampie · 2018-06-04T15:31:19Z

Alright, things get vectorized again. This stuff is extremely fragile, so maybe I should add comments with warnings about spoiling vectorization.

It could be way more robust and simpler if we did not have to worry about stateful iterators.

edit: I stumbled upon an inference issue in the case of 3+ iterators -- let's see if it can be fixed.


bramtayl · 2018-06-05T14:51:20Z

n.b. #26765

haampie · 2018-06-05T15:44:40Z

Thanks for the pointer, I wasn't aware of that issue.

haampie · 2018-06-05T22:21:46Z

Some observations: https://gist.github.com/haampie/5dd6803b217709c4629d3eeb75c16258

Basically, we can do really well if we specialize for stateless iterators. Type inference seems perfect even with recursion up to 14 calls of iterate deep (cc @bramtayl); some daunting looking loops that have closed-form expressions get optimized away; and vectorization kicks in.

I cannot get all these great results to work with the current way stateful iterators are handled. If I find time I'll take a look at how we could use simple traits to work around this.

haampie · 2018-06-06T14:32:26Z

I think I might give up on this PR -- at least on the recursive structure. It's really more of an art than a science to get performance right.

The results in the comment above look promising, but it turns out it does not generate optimal code for

function vectorize_multiple_products(xs::Vector{T}, ys::Vector{T}, zs::Vector{T}) where {T}
    s = zero(T)
    for (x,y,z) = zip(xs, ys, zs)
        s += x * y * z
    end
    s
end

@benchmark vectorize_multiple_products(xs, ys, zs) setup = (xs = rand(1000); ys = rand(1000); zs = rand(1000))

This is about 60% slower than the current implementation of zip, unless one interchanges the tail = iterate(z.tail, state[2]) and head === nothing && return nothing lines. With that change, however, the compiler no longer recognizes that it can simplify optimize_away_loop_with_closed_form. In the recursive setting it seems virtually impossible to fix all problems.
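To make the two lines being interchanged concrete, here is a toy recursive zip. It is a sketch under assumptions, not the implementation from this branch: ToyZip1, ToyZip, toyzip and sum_products are hypothetical names, and only the head/tail/state[2] naming follows the lines quoted above.

```julia
# Toy recursive zip; all type and function names here are hypothetical.
struct ToyZip1{H}
    head::H
end

struct ToyZip{H, T}
    head::H
    tail::T   # a ToyZip or ToyZip1 wrapping the remaining iterators
end

toyzip(a) = ToyZip1(a)
toyzip(a, rest...) = ToyZip(a, toyzip(rest...))

# Base case: wrap each element of the single iterator in a 1-tuple.
function Base.iterate(z::ToyZip1, state...)
    head = iterate(z.head, state...)
    head === nothing && return nothing
    return (head[1],), head[2]
end

# First step of the recursive case: there is no state yet.
function Base.iterate(z::ToyZip)
    head = iterate(z.head)
    tail = iterate(z.tail)              # tail is advanced before head is checked
    head === nothing && return nothing
    tail === nothing && return nothing
    return (head[1], tail[1]...), (head[2], tail[2])
end

# Later steps: state is the tuple (head state, tail state).
function Base.iterate(z::ToyZip, state)
    head = iterate(z.head, state[1])
    tail = iterate(z.tail, state[2])    # swapping this line with the next one
    head === nothing && return nothing  # gives the early-return ordering
    tail === nothing && return nothing
    return (head[1], tail[1]...), (head[2], tail[2])
end

function sum_products(xs, ys, zs)
    s = zero(eltype(xs))
    for (x, y, z) in toyzip(xs, ys, zs)
        s += x * y * z
    end
    return s
end

println(sum_products([1, 2], [3, 4], [5, 6]))
```

Both orderings return the same values for stateless iterators; the difference discussed in this thread is only in the code the compiler generates, which this sketch does not measure.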

haampie force-pushed the simplify-zip branch from 5701e79 to 7d2c0e6 Compare June 4, 2018 09:41

haampie force-pushed the simplify-zip branch from 7d2c0e6 to 95f30a0 Compare June 4, 2018 14:58

haampie force-pushed the simplify-zip branch from 95f30a0 to f68afde Compare June 4, 2018 15:26

haampie mentioned this pull request Jun 4, 2018

Guarantee inbounds iteration over Array{T} #27386

Merged

haampie force-pushed the simplify-zip branch from f68afde to 23669f7 Compare June 5, 2018 07:24

haampie added 4 commits June 5, 2018 11:58

Guarantee inbounds iteration over Array{T}

4ce9066

Replace indexing with an iterator with actual indexing

45174ab

Make the CharStr iterator unaware of the initial state of a Vector{Ch…

8923c1b

…ar} iterator

Remove the Zip2 struct, since Zip1 already forms a valid base case.

20723d8

Furthermore, reorder the instructions for the fast path to enable vectorization in some cases.

haampie force-pushed the simplify-zip branch from 23669f7 to 20723d8 Compare June 5, 2018 08:58

StefanKarpinski requested a review from JeffBezanson June 5, 2018 14:55

haampie closed this Jun 6, 2018

haampie deleted the simplify-zip branch June 6, 2018 14:33
