-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add subsets(collection,Val{k})
#13
Add subsets(collection,Val{k})
#13
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13 +/- ##
==========================================
+ Coverage 87.79% 88.64% +0.84%
==========================================
Files 2 2
Lines 295 317 +22
==========================================
+ Hits 259 281 +22
Misses 36 36
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a few comments; be sure to re-run performance checks to ensure my suggestions don't negatively impact performance.
I have a feeling a couple of them will reduce allocations.
src/IterTools.jl
Outdated
sidx = ((_,idx...)->idx)(idx.data...) # Type-stable version of idx.data[2:end] | ||
x = map(i->xs[i], sidx) | ||
|
||
begin # i = findlast(i->idx[i] != length(xs)-K+i-1, 2:K+1)+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
begin
doesn't introduce a scope. Is that what you were intending to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I am using the code blocks to indicate that these bits of code are the type-stable versions of the commented snippets.
src/IterTools.jl
Outdated
|
||
function next(it::StaticSizeBinomial{K,C}, idx) where {K,C} | ||
xs = it.xs | ||
sidx = ((_,idx...)->idx)(idx.data...) # Type-stable version of idx.data[2:end] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be shift(idx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated (see below).
src/IterTools.jl
Outdated
function next(it::StaticSizeBinomial{K,C}, idx) where {K,C} | ||
xs = it.xs | ||
sidx = ((_,idx...)->idx)(idx.data...) # Type-stable version of idx.data[2:end] | ||
x = map(i->xs[i], sidx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x = xs[sidx]
should work just as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note that the above two changes might change the eltype
, which should be reflected elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to work:
julia> (1:5)[(2,4,3)]
ERROR: ArgumentError: invalid index: (2, 4, 3)
Stacktrace:
[1] getindex(::UnitRange{Int64}, ::Tuple{Int64,Int64,Int64}) at ./abstractarray.jl:883
julia> collect(1:5)[(2,4,3)]
ERROR: ArgumentError: invalid index: (2, 4, 3)
Stacktrace:
[1] getindex(::Array{Int64,1}, ::Tuple{Int64,Int64,Int64}) at ./abstractarray.jl:883
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it does work if we index with a StaticArray
. Changed it to x = xs[shift(idx)].data
. I think we should anyway return a tuple and not a StaticArray
here.
src/IterTools.jl
Outdated
xs::Container | ||
end | ||
|
||
iteratorsize(::Type{<:StaticSizeBinomial}) = HasLength() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if the container has length; see the Subsets
type. Also define iteratoreltype
as this will only have eltype if Container
has eltype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the implementation of Binomial
, which has the same issues. The point with iteratoreltype
is fair enough and should be changed for all three cases of subsets
. All three implementations require length(xs)
, though, so all subsets
iterators have a length whenever they're defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems several iterators in this package do not define iteratoreltype
correctly. So it might be worth opening up another PR to fix this for all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binomial doesn't have the same issues, as it only accepts a Vector
which always has an eltype and length.
src/IterTools.jl
Outdated
eltype(::Type{StaticSizeBinomial{K,C}}) where {K,C} = NTuple{K,eltype(C)} | ||
length(it::StaticSizeBinomial{K,C}) where {K,C} = binomial(length(it.xs),K) | ||
|
||
subsets(xs,::Type{Val{K}}) where {K} = StaticSizeBinomial{K,typeof(xs)}(xs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth replacing subsets(xs, k::Integer)
with this implementation as well? It wouldn't be as fast as this but might it be as fast or faster than the Binomial
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, although I am not sure whether it is worth the effort. Imagine you want to list all subsets of size n-1
of a set of size n
, for n = O(10_000)
. That means you would have tuples of length O(n)
in your code, which seems odd. And it would be a breaking change, so a bit painful to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that makes sense. Perhaps include a comment about the tradeoffs?
src/IterTools.jl
Outdated
subsets(xs,::Type{Val{K}}) where {K} = StaticSizeBinomial{K,typeof(xs)}(xs) | ||
|
||
using StaticArrays | ||
function start(it::StaticSizeBinomial{K,C}) where {K,C} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave the parameters off the type when they're not used; C
is not used here and in other places, and neither K
nor C
is used in done
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Still getting used to the <:Type
syntax.
src/IterTools.jl
Outdated
using StaticArrays | ||
function start(it::StaticSizeBinomial{K,C}) where {K,C} | ||
n = length(it.xs) | ||
return MVector((K <= n ? 0 : 1, ntuple(i->i,Val{K})...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ntuple(identity, Val{K}())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
src/IterTools.jl
Outdated
eltype(::Type{StaticSizeBinomial{K,C}}) where {K,C} = NTuple{K,eltype(C)} | ||
length(it::StaticSizeBinomial{K,C}) where {K,C} = binomial(length(it.xs),K) | ||
|
||
subsets(xs,::Type{Val{K}}) where {K} = StaticSizeBinomial{K,typeof(xs)}(xs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be subsets(xs, ::Val{K})
instead (you'll have to adjust the tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subsets(xs, ::Val{K})
syntax is from 0.7 onwards, and mere mortals like me are still on 0.6. This will be a trivial change once 0.7 gets officially released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subsets(xs, ::Val{K})
works just fine on 0.6. You just need to pass Val{K}()
.
All the above requests should be addressed now. Unfortunately, I found a bug in my code which meant that I effectively only iterated over all subsets of size
I further found the culprit for the extra allocations: In conclusion, this implementation of |
Found an alternative implementation which avoids
|
src/IterTools.jl
Outdated
n = length(it.xs) | ||
return MVector((K <= n ? 0 : 1, ntuple(identity,Val{K}())...)) | ||
end | ||
@generated minus1(::Val{A}) where {A} = :(Val{$(A-1)}()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jameson has said "no" to this in the past. However, you can use first
and Base.tail
for inferred head and tail on tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced pop()
with
@inline pop(t::NTuple) = reverse(Base.tail(reverse(t))), t[end]
Performance seems to stay the same. Don't really see the problem with the old code, though.
Did you benchmark this on 0.7 as well? Inference has changed in ways that might affect this sort of code (it did in AxisArrays, for example). |
I'm trying to benchmark on 0.7, but |
I think you could checkout this branch: JuliaIO/JSON.jl#226 Or wait for it to be merged. |
I added documentation for the new method and benchmarked on 0.7. The performance gets worse indeed.
|
Sorry, I realised I never came round to fix |
Could you rebase to resolve the conflicts? I'll revisit this today. |
I believe I resolved the conflicts, but as it is the first time of me doing this I might have well screwed up some things. |
It turns out the Thanks for your perseverance! |
Nope, I forgot when and why I put the Thank you for the mentoring, it's been very instructive for me! |
Yup, ran all the same benchmarks. You're welcome!! |
Implements #12.
Performance shoot-out:
Conclusions: