
Change Zip data structure to a flat one #29238

Merged: 16 commits from mh/zip merged into master on Nov 15, 2018
Conversation

@martinholters (Member)

Instead of creating a zip of more than two iterators by forming a tree structure, store all iterators in a tuple. It should be possible to get rid of Zip1 and Zip2, but I wasn't sure of a good benchmark to quickly check locally whether the more general code was sub-optimal in the short case, hence WIP—ideas welcome!

Fixes #26765.
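For readers skimming the thread, here is a minimal, self-contained sketch of the flat-tuple idea (illustrative only, not the actual Base implementation; it ignores the isdone/Stateful handling and the inference concerns discussed below):

```julia
# Illustrative FlatZip: one struct holding all component iterators in a tuple,
# instead of nesting two-iterator Zip wrappers into a tree.
struct FlatZip{Is<:Tuple}
    is::Is
end
flatzip(iters...) = FlatZip(iters)

Base.length(z::FlatZip) = minimum(length, z.is)

function Base.iterate(z::FlatZip, states = map(_ -> (), z.is))
    # each state is () on the first call and (componentstate,) afterwards
    ys = map((it, st) -> iterate(it, st...), z.is, states)
    any(y -> y === nothing, ys) && return nothing
    return map(y -> y[1], ys), map(y -> (y[2],), ys)
end

collect(flatzip(1:3, "abc", (10, 20, 30)))  # [(1, 'a', 10), (2, 'b', 20), (3, 'c', 30)]
```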

@martinholters (Member, Author)

@Keno you have implemented this elaborate dance:

julia/base/iterators.jl, lines 295 to 316 at 3a15e6b:
function zip_iterate(a, b, sta, stb) # the states are either Tuple{} or Tuple{Any}
    da, db = isdone(a), isdone(b)
    da === true && return nothing
    db === true && return nothing
    if da === missing
        ya = iterate(a, sta...)
        ya === nothing && return nothing
    end
    if db === missing
        yb = iterate(b, stb...)
        yb === nothing && return nothing
    end
    if da === false
        ya = iterate(a, sta...)
        ya === nothing && return nothing
    end
    if db === false
        yb = iterate(b, stb...)
        yb === nothing && return nothing
    end
    return (ya, yb)
end

IIUC, you first iterate those iterators where isdone yielded missing because that has a chance to bail out early, i.e. if the first iterator gives nothing there is no need to even call iterate on the second one. Do you have an example where that makes a tangible difference? (I have not replicated this in full for the Zip iterator here yet, and a test case would help me understand the issue when doing so.)

@haampie (Contributor) commented Sep 18, 2018

About the isdone stuff: if iterator 1 is not exhausted but stateful and iterator 2 is exhausted, then you should not advance iterator 1 in the next iterate call. An example is where you have one iterator reading lines from a file, zipped with a range 1:5 or something -- iterator 1 should not end up consuming line number 6.
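A runnable sketch of that hazard (the file reader is stood in for by a Stateful over strings; on a Julia version with the fixed zip, the final assertion holds):

```julia
using Base.Iterators: Stateful

lines = Stateful(["line $i" for i in 1:10])  # stand-in for an iterator reading lines from a file
for (ln, i) in zip(lines, 1:5)
    println(i, ": ", ln)
end
# zip must not have consumed element 6 of the stateful iterator:
@assert popfirst!(lines) == "line 6"
```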

See also my attempt in #27511.

This test covers this issue: https://github.com/JuliaLang/julia/blob/master/test/iterators.jl#L537-L542

Shouldn't you remove Zip1 and Zip2 in this PR? Or maybe I'm just not reading it carefully enough.

Edit: I would be a big fan of finally getting this zip business fixed!

@martinholters (Member, Author) commented Sep 19, 2018

Stateful iterators! Of course! [hand against forehead]

Thanks for the pointers!

@martinholters force-pushed the mh/zip branch 2 times, most recently from 4f8e9e3 to 6663c8f on September 20, 2018 at 14:04
@martinholters (Member, Author)

I did some benchmarking, collecting zipped UnitRanges, as those should contribute relatively little runtime themselves. Here is what I got:

| collect(...) | master | this PR |
| --- | --- | --- |
| zip(1:1) | 28.990 ns | 28.008 ns |
| zip(1:1,1:1) | 28.710 ns | 30.705 ns |
| zip(1:1,1:1,1:1) | 32.954 ns | 34.288 ns |
| zip(1:1,1:1,1:1,1:1) | 304.869 ns | 118.260 ns |
| zip(1:1000) | 1.090 μs | 1.097 μs |
| zip(1:1000,1:1000) | 1.767 μs | 1.781 μs |
| zip(1:1000,1:1000,1:1000) | 2.305 μs | 2.351 μs |
| zip(1:1000,1:1000,1:1000,1:1000) | 757.447 μs | 4.971 μs |

The only regressions above measurement noise here are for zip(1:1,1:1) and zip(1:1,1:1,1:1), so apparently the setup time in these cases is slightly higher. I'm inclined not to worry about this. On the other hand, I'm surprised by the improvement for 4 zipped iterators, where the old Zip is still type-stable (at least on the surface).

However, I have no idea whether these benchmarks are relevant, and BaseBenchmarks does not include any direct benchmarks of zip performance, so I'm all ears for proposals here.

That said, I think the code is now in a state where review makes sense (unless CI finds something).
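For reference, a sketch of how individual rows of the table above can be reproduced (assumes the BenchmarkTools package; absolute numbers vary by machine and Julia version):

```julia
using BenchmarkTools

@btime collect(zip(1:1, 1:1));
@btime collect(zip(1:1000, 1:1000, 1:1000, 1:1000));
```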

@martinholters changed the title from "WIP: Change Zip data structure to a flat one" to "Change Zip data structure to a flat one" on Sep 21, 2018
@haampie (Contributor) commented Sep 21, 2018

I think the benchmark in the last row hits a type instability:

using BenchmarkTools  # needed for @btime

f() = collect(zip(1:1000,1:1000,1:1000,1:1000))
@btime f()

gives 1.254 ms (16439 allocations: 678.52 KiB) on Julia 1.0 and 6.328 μs (2 allocations: 31.33 KiB) on this branch.

@@ -540,10 +540,73 @@ end
@test isempty(collect(zip(b,a)))
@test !isempty(a)
end
let a = Iterators.Stateful("a"), b = "", c = Iterators.Stateful("c")
@test isempty(collect(zip(a,b,c)))
Review comment (Member): Incorrect indentation for these

Reply (Member Author): Well spotted!

Reply (Member): What can I say, I have an eye for pedantry :P

@test length(collect(zip(a,b,c))) == 1
@test !isempty(a)
@test !isempty(c)
end
Review comment (Member): Here too

@martinholters (Member, Author)

AppVeyor 64-bit says:

Error in testset bigint:
Test Failed at C:\projects\julia\julia-2baf73b8f9\share\julia\test\bigint.jl:422
  Expression: cmp(big(x), x) === cmp(x, big(x)) === 0
   Evaluated: -1 === 1 === 0
ERROR: LoadError: Test run finished with errors

I think this is due to a bug in big:

julia> big(Int8(-128))
-18446744073709551488

(x is chosen randomly in the failed test above, so may just have hit this case.)

@StefanKarpinski (Member)

Failure is unrelated: introduced by #29165, fixed by #29340.

@martinholters (Member, Author) commented Sep 25, 2018

Hm, master has improved its handling of the old Zip implementation (EDIT: not true, see the comments below):

| collect(...) | master | this PR |
| --- | --- | --- |
| zip(1:1) | 27.867 ns | 28.849 ns |
| zip(1:1,1:1) | 28.878 ns | 30.777 ns |
| zip(1:1,1:1,1:1) | 32.988 ns | 34.253 ns |
| zip(1:1,1:1,1:1,1:1) | 115.974 ns | 121.897 ns |
| zip(1:1000) | 1.069 μs | 1.069 μs |
| zip(1:1000,1:1000) | 1.781 μs | 1.613 μs |
| zip(1:1000,1:1000,1:1000) | 2.413 μs | 2.367 μs |
| zip(1:1000,1:1000,1:1000,1:1000) | 3.204 μs | 4.749 μs |

Inferability is still slightly better:

julia> x = zip(1:1, ["a"], (1.0,), Base.OneTo(1), Iterators.repeated("a"), 1.0:0.2:2.0,
               (1 for i in 1:1), Iterators.Stateful(["a"]), (1.0 for i in 1:2, j in 1:3), 1);

julia> z = Iterators.filter(x -> x[1] == 1, x);

julia> Base.return_types(first, Tuple{typeof(z)})
1-element Array{Any,1}:   # this PR
 Tuple{Int64,String,Float64,Int64,String,Float64,Int64,String,Float64,Int64}
1-element Array{Any,1}:   # master and 1.0
 Tuple{Int64,Any,Float64,Any,String,Any,Int64,String,Float64,Int64}

But I'm not sure the balance struck by the PR in its current form is better than master. If anyone wants to take a stab at tweaking this further, welcome! I won't have time to work on this for at least one week.

@martinholters (Member, Author)

Or I might be doing something very wrong, because I now see significantly better timing on 1.0, too. Compare e.g.

julia> f() = collect(zip(1:1000,1:1000,1:1000,1:1000))
f (generic function with 1 method)

julia> @btime f()
  2.864 μs (2 allocations: 31.33 KiB)

with #29238 (comment).

@haampie (Contributor) commented Sep 25, 2018

Which version exactly? For reference:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
julia> f() = collect(zip(1:1000,1:1000,1:1000,1:1000));
julia> @benchmark f()
BenchmarkTools.Trial: 
  memory estimate:  678.52 KiB
  allocs estimate:  16439
  --------------
  minimum time:     619.442 μs (0.00% GC)
  median time:      625.912 μs (0.00% GC)
  mean time:        681.834 μs (5.98% GC)
  maximum time:     31.411 ms (97.53% GC)
  --------------
  samples:          7326
  evals/sample:     1

@martinholters (Member, Author)

Ah, it's a question of compilation (inference) order. I created the data for the tables above with

for N in (1,1000), M in (1,2,3,4)
    @btime collect(zip($(Iterators.repeated(1:N, M)...,)...))
end

That gives the very fast timings on 1.0, also for timing f() in the same session later on. Reversing the order of values for M (the number of zipped iterators) and using a fresh session gives the bad timing again, also on master. This PR makes the performance independent of the compilation/inference order (which is good), but the performance is a bit worse than the best that was possible before (which is not so good). Why can't I have my cake and eat it too?

@oscardssmith (Member)

Why would performance be different for different orders? That sounds like a bug which should be fixed.

@martinholters (Member, Author)

Inference will limit its work on recursive functions (it has to, in order to avoid doing infinite work). However, if it hits a previously computed result before it reaches its limit, it will use that. That's why running inference on increasing problem sizes often produces better results than immediately running it on the largest problem size.
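Purely as an illustration of how to observe this (using the same Base.return_types tool as in the comparison above; on the pre-PR implementation the result can depend on what was compiled earlier in the session):

```julia
z4 = zip(1:1, 1:1, 1:1, 1:1)
Base.return_types(iterate, Tuple{typeof(z4)})  # pre-PR, a fresh session vs. a warmed-up one may differ
```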

@haampie (Contributor) commented Sep 28, 2018

So, what is left to do in this PR? Is the only remaining issue that we have to figure out why this PR is somewhat slower than master in the cases where inference happens to work well there?

Commits pushed:
- Instead of creating a `zip` of more than two iterators by forming a tree structure, store all iterators in a tuple.
- Don't advance isdone=false iterators if any other has reached its end.
@haampie (Contributor) commented Oct 4, 2018

I think there is still an inference test missing for iterate(::Zip, state), no? I could rebase #27516 so that inference testing is less awkward. But that could also happen after this is merged.

@martinholters (Member, Author)

I'd still appreciate a review by another pair of eyes. (I've poked at this for long enough that I'm quite likely somewhat blind to stupid mistakes/obvious possible improvements.) That said, I'll nevertheless go ahead and merge in the next couple of days unless any objections are voiced (or CI hits something).

@martinholters (Member, Author)

> I think there is still an inference test missing for iterate(::Zip, state)

True. @inferred(first(Iterators.drop(z, 1))) should accomplish that, right?
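A sketch of what such a test could look like (iterators chosen here purely for illustration; @inferred throws if the return type is not inferred concretely, and dropping one element forces a call to iterate(::Zip, state)):

```julia
using Test

z = zip(1:3, ["a", "b", "c"], (1.0, 2.0, 3.0))
@inferred first(Iterators.drop(z, 1))  # expected value: (2, "b", 2.0)
```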

@martinholters (Member, Author)

Even though there are no zip-specific benchmarks (yet), zip might still be used somewhere in the existing ones, so: @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@martinholters (Member, Author)

Ah, I had lost track of this one... I'll re-trigger CI and also @nanosoldier runbenchmarks(ALL, vs = ":master") and if all goes well will merge soonish.

@StefanKarpinski (Member)

Both failures are the now-standard #29923.

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC added the "potential benchmark (Could make a good benchmark in BaseBenchmarks)" label on Nov 14, 2018
@StefanKarpinski (Member) commented Nov 14, 2018

#29923 again, otherwise looks fine.

@martinholters (Member, Author)

From the 👍 to the benchmark results, I take it that the few regressions are within the expected fluctuations? (I have not been looking at nanosoldier outputs that much recently.) In any case, they seem unrelated at least at first glance.

Regarding the "benchmark wanted" label: The new ["misc", "iterators", "zip(...)"] benchmarks exercise the zip iterator. Unfortunately, it depends on the evaluation order whether the current master zip is successfully inferred. Apparently, we're hitting the benign case, which is why this PR does not show up as a prominent improvement. Do you consider that insufficient, @KristofferC, or did you miss the existence of these new benchmarks?

@KristofferC (Member)

If this is just fixing inference order, feel free to remove the label. I also want to point out that the "benchmark beneficial" label is not a requirement for merging, nor is it even a request for the author to make a PR to BaseBenchmarks. It is just a tag, so that when someone feels like adding stuff to BaseBenchmarks (JuliaCI/BaseBenchmarks.jl#233), they can filter on the label.

@martinholters merged commit e209c3d into master on Nov 15, 2018
@martinholters deleted the mh/zip branch on November 15, 2018 at 07:49
@martinholters removed the "potential benchmark (Could make a good benchmark in BaseBenchmarks)" label on Nov 15, 2018
@martinholters (Member, Author)

The benchmarks cover zip now, and the inferability is verified by the tests added in this PR, so I don't think we need more benchmarks.

tkf pushed a commit to tkf/julia that referenced this pull request on Nov 21, 2018: "Instead of creating a `zip` of more than two iterators by forming a tree structure, store all iterators in a tuple."
@ararslan (Member) commented Nov 28, 2018

This broke the Bridge package, which was using Zip2. Edit: OnlineStats too.

@StefanKarpinski (Member) commented Nov 28, 2018

I think the Bridge package needs to adjust then since Zip2 was not public (but good to know!).
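A hypothetical migration sketch for affected packages (the function name my_count is made up): a method that used to be defined on the removed internal Zip2 type can target the new flat Zip type, or better, drop the type restriction entirely.

```julia
# before (pre-1.1, relied on the internal two-iterator type):
#   my_count(z::Base.Iterators.Zip2) = length(z)
# after (1.1+, where zip(...) returns the flat, still-internal Zip type):
my_count(z::Base.Iterators.Zip) = length(z)

my_count(zip(1:3, 4:6))  # 3
```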

@ararslan (Member)

@mschauer, @joshday

@joshday commented Nov 28, 2018

OnlineStats should be an easy fix. I defined a method for some function on Zip2 in the tests only.

@mschauer (Contributor)

Great to get the heads up. No problem from my side, this PR is very welcome.
