-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Made WeightVec
a subtype of RealVector
#248
Conversation
@@ -281,6 +288,9 @@ Base.mean{T<:Number,W<:Real}(A::AbstractArray{T}, w::WeightVec{W}, dim::Int) = | |||
|
|||
|
|||
###### Weighted median ##### | |||
function Base.median(v::AbstractArray, w::WeightVec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this method be unnecessary since all you're doing is throwing a MethodError
? Presumably without the method, it would also be a MethodError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to added this because this test was failing with a BoundsError
instead. It appears that making WeightVec
a RealVector
allowed it to dispatch to a method in base.
julia> median([4 3 2 1 0], weights(wt))
ERROR: BoundsError: attempt to access 2-element Array{Any,1} at index [3]
in mapslices(::Base.#median!, ::Array{Int64,2}, ::StatsBase.WeightVec{Int64,Array{Int64,1},Int64}) at ./abstractarray.jl:1619
in median(::Array{Int64,2}, ::StatsBase.WeightVec{Int64,Array{Int64,1},Int64}) at ./statistics.jl:579```
I figured it was better to manually throw a `MethodError` rather than letting it hit the `BoundsError` in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. This is fine then. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, the current behavior isn't that different:
julia> median([4 3 2 1 0], weights([1,2,3,4,5]))
ERROR: MethodError: no method matching start(::StatsBase.WeightVec{Int64,Array{Int64,1}})
Closest candidates are:
start(::SimpleVector) at essentials.jl:259
start(::Base.MethodList) at reflection.jl:560
start(::ExponentialBackOff) at error.jl:107
...
Stacktrace:
[1] append_any(::StatsBase.WeightVec{Int64,Array{Int64,1}}, ::Vararg{StatsBase.WeightVec{Int64,Array{Int64,1}},N} where N) at ./essentials.jl:170
[2] median(::Array{Int64,2}, ::StatsBase.WeightVec{Int64,Array{Int64,1}}) at ./statistics.jl:619
So I'm not sure this MethodError
is really needed. We typically don't throw them for all possible cases that fail.
It would be great if you could add tests for this. |
src/weights.jl
Outdated
immutable WeightVec{W,Vec<:RealVector} | ||
values::Vec | ||
sum::W | ||
immutable WeightVec{T, V<:RealVector, S} <: RealVector{T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better do WeightVec{S<:Real, T<:Real, V<:AbstractVector{T}} <: AbstractVector{T}
. That should simplify constructors since the parameter could be inferred from the input arguments.
EDIT: RealVector
is just an alias used for conciseness, but it's not useful when defining types since it's redundant with the information provided by T<:Real
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really useful to allow typeof(sum)
and eltype(values)
to be different? Having three parameters for such a simple type seems overly complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this too, then I realized that it might be useful to allow storing weights using a small type (UInt8
, Float16
) to save space, and yet be able to store the sum in a type that won't overflow or lose precision too easily (Int64
, BigInt
, Float64
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view was that all of these could be useful in the cases like WeightVec{Bool, NullableArray{Bool}, Int}
, but I figured you'd be more likely to want to dispatch on V
than S
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan Isn't WeightVec{S<:Real, T<:Real, V<:AbstractVector{T}} <: AbstractVector{T}
triangular dispatch which is only supported in 0.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan I've updated the parameterization to match your suggestion, but I needed to version check whether we can use triangular dispatch (implemented in this PR).
- Parameterization to `WeightVec{S<:Real, T<:Real, T<:AbstractVector}` - Uses triangular dispatch for julia version passed v"0.6.0-dev.2123"
src/weights.jl
Outdated
end | ||
|
||
function WeightVec{T<:Real, V<:AbstractVector{T}}(vs::V) | ||
sum_ = sum(vs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing underscore is a bit weird. Why not use s
or any other simple name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly just habit as I try to avoid using single character variable names. For math and stats code I tend to use the name of the function name with an underscore as my variable names if I can't think of anything better.
src/weights.jl
Outdated
sum::S | ||
end | ||
|
||
function WeightVec{S<:Real, T<:Real, V<:AbstractVector{T}}(vs::V, s::S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to move this function and the next one out of the version-dependent block using V<:RealVector
and replacing T
with eltype(vs)
. That is, just slightly adapting the existing constructors.
src/weights.jl
Outdated
end | ||
|
||
""" | ||
WeightVec(vs, [wsum]) | ||
function WeightVec{S<:Real, V<:RealVector}(vs::V, s::S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is incorrect, the signature should just reflect how the user can call the function, not how it's implemented.
src/weights.jl
Outdated
|
||
Construct a `WeightVec` with weight values `vs` and sum of weights `wsum`. | ||
If omitted, `wsum` is computed. | ||
""" | ||
WeightVec{Vec<:RealVector,W<:Real}(vs::Vec,wsum::W) = WeightVec{W,Vec}(vs, wsum) | ||
WeightVec(vs::RealVector) = WeightVec(vs, sum(vs)) | ||
function WeightVec{S<:Real, V<:RealVector}(vs::V, s::S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can actually use s::S=sum(vs)
to get rid of the next method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I didn't realize that was valid syntax in 0.5 (ie: referencing an earlier argument when setting the default value).
src/weights.jl
Outdated
@@ -281,6 +290,9 @@ Base.mean{T<:Number,W<:Real}(A::AbstractArray{T}, w::WeightVec{W}, dim::Int) = | |||
|
|||
|
|||
###### Weighted median ##### | |||
function Base.median(v::AbstractArray, w::WeightVec) | |||
throw(MethodError(median, (typeof(v), typeof(w)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hadn't noticed that one. Should pass (v, w)
instead of their types. Otherwise, looks good to me.
Should I squash my commits or is a more verbose history preferred here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! No need for squashing, we can do it from GitHub when merging.
I'm pretty sure this change is what broke DataArrays. Weighted |
@ararslan Can you post the specific error? This seems to work.
Not the exact same version of StatsBase, but pretty close for this use case. |
Can you post the code which triggers the failure? |
From the DataArrays tests: da1 = DataArray(randn(128))
da2 = DataArray(randn(128))
da1[1:3:end] = NA
da2[1:2:end] = NA
mean(da1, weights(da2)) The last line is now a
|
@ararslan That makes sense cause |
Removing the restriction |
OK. I guess we could remove the |
Perhaps, but I'm not sure it's worth loosening that restriction; I don't think it makes sense to have a weight vector whose sum is missing. |
Perhaps it would make more sense to require that the input to a weight constructor be a |
Well, there's no point in restricting the type of underlying array. Let's just remove that test from DataArrays if we don't have a use case for it. |
@ararslan To answer your specific question, it was initially |
Seems like a better approach for that use case would be to use |
Not necessarily, since you can |
Okay ¯\_(ツ)_/¯ |
Cool, DataArrays is all good now. However, it appears this change still introduced ambiguities with Base methods:
|
Shouldn't the first two go away once those deprecations are removed? Not sure about the last one though. |
Yeah the first two aren't too worrisome, it's mostly the last one. This previously wasn't a problem because we didn't have |
Could we do something like |
We can't overload a Base function with Base types though; this is the sinful act of type piracy. |
All the more reason for it not to be a Base function? :) |
Wait also does this even count as type piracy cause we're overloading our own behaviour and not base? If StatsBase wasn't loading these would result in a |
If we extend a Base function using Base types, regardless of how we define the behavior, we've committed |
Well in that case we're already committing type piracy with |
Oh dang you're right. If we get rid of those we should be okay on ambiguities. We should commence... the purge. |
Indeed it's surprising that we ship this method. It's not even documented in StatsBase. Let's remove it, the ambiguities due to deprecations are OK (though they shouldn't be hard to fix if we wanted to). |
I think it makes sense to either keep that method or add it to Base which already has a |
I guess we could add it to Base. But since it's not exported, it's not really an issue whether a method is missing or not. |
Ah, cool. I extrapolated incorrectly from |
That wouldn't help us for 0.5 and 0.6 though, since if we add the method to Base then it will only be available in 0.7+. |
We can just define an internal function that behaves like |
As far as i can tell it's only being used here, so we could probably rename it to |
Not worth a deprecation IMO since it's a Base function. |
Certainly not worth a deprecation. I think the easiest solution is to define the method on older Julia versions using a |
Seems like
WeightVec
should support the same functionality as regular vectors.