Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for weighted StatsBase methods #28

Merged
merged 7 commits into from
Nov 17, 2020

Conversation

rofinn
Copy link
Collaborator

@rofinn rofinn commented Sep 17, 2020

This PR doesn't wrap everything, but it wraps most of the main weighted functions that we use at Invenia. Closes #26

@mcabbott
Copy link
Owner

This looks good, thanks.

I have one design question. In some sense a natural place for AbstractWeights to live is as one of the key-vectors, attached to a dimension. Is this something which ought to be made easy? I'm not sure how, probably it would it be too confusing for mean(A) to behave completely differently for axiskeys(A,2) isa AbstractWeights compared to a vector, and you may also want mean(A, dims=2) to ignore the keys if they have a different meaning.

@rofinn
Copy link
Collaborator Author

rofinn commented Sep 18, 2020

Yeah, I feel like having the keys be statistical weights would be a bit confusing... at least for our use cases. We typically want to do things like compute the exponentially weighted mean along a :time dimension which would already have ordered DateTime keys. I suppose if you could embed the weights alongside the existing keys then that would be kinda slick, but I'm not sure if that's worth the extra complications. I think a more useful addition might be to support wrapped weights. For example:

mean(A::KeyedArray, wv::KeyedVector{T, <:AbstractWeights}; kwargs...)

That could still be useful for seeing how weights map to each key along that dimension.

@rofinn
Copy link
Collaborator Author

rofinn commented Sep 22, 2020

Alright, I've rebased and added support for KeyedArray weight vectors. I've also opted to wrap the covariance estimation methods related to a NameDims.jl issue. invenia/NamedDims.jl#120

@rofinn
Copy link
Collaborator Author

rofinn commented Sep 22, 2020

Queued job seems to have completed, but github wasn't notified.

@mcabbott
Copy link
Owner

mcabbott commented Oct 4, 2020

I dropped the ball here too, sorry.

Re using key-vector as weights, maybe I start to agree that this isn't such a great idea.

Is CovarianceEstimation a fairly lightweight package? I guess if invenia/NamedDims.jl#122 loads it, then there's no cost to loading it here too. While if NamedDims uses Requires.jl, then possibly this should follow that?

src/functions.jl Outdated Show resolved Hide resolved
src/functions.jl Outdated Show resolved Hide resolved
@rofinn
Copy link
Collaborator Author

rofinn commented Oct 4, 2020

Yeah, CovarianceEstimation.jl would only add StatsBase.jl, which we're considering adding here anyways. My issue with Requires.jl is that the load time performance hasn't been great (might be better on julia >= 1.5 though). It also doesn't let you specify dependency requirements which is particularly problematic for packages like StatsBase and CovarianceEstimation which are still pre-1.0.

@mcabbott
Copy link
Owner

mcabbott commented Oct 5, 2020

OK, I think Requires has got faster, but haven't timed things exhaustively. It is loaded by NamedDims already, but I also don't know whether that influences things. I am a little surprised how many things StatsBase depends on, despite its name: https://juliahub.com/ui/Packages/StatsBase/EZjIG/0.33.1?t=1 But that's not the end of the world.

@rofinn
Copy link
Collaborator Author

rofinn commented Oct 24, 2020

I am a little surprised how many things StatsBase depends on, despite its name

Yeah, that's fair. I think the plan is for a lot of the core functionality to move into the Statistics stdlib? That being said, it's only 4 non-stdlibs and those package are minimal enough to not have further indirect dependencies. I guess my comparison is packages like Flux.jl https://juliahub.com/ui/Packages/Flux/QdkVy/0.11.1?t=1

@rofinn
Copy link
Collaborator Author

rofinn commented Oct 29, 2020

Sorry, I got confused. Is there anything in particular you wanted me to fix with this PR (e.g., use Requires.jl)?

@mcabbott
Copy link
Owner

Have just been busy, sorry. Will look properly soon.

Copy link
Owner

@mcabbott mcabbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took forever.

I think this is fine but we should compact some things, by making some more loops, by shortening some bodies, and by deleting some methods.

src/statsbase.jl Outdated Show resolved Hide resolved
test/_packages.jl Outdated Show resolved Hide resolved
src/statsbase.jl Outdated Show resolved Hide resolved
src/statsbase.jl Outdated Show resolved Hide resolved
src/statsbase.jl Outdated Show resolved Hide resolved
src/statsbase.jl Outdated Show resolved Hide resolved
@rofinn
Copy link
Collaborator Author

rofinn commented Nov 16, 2020

Alright, looks like that managed to remove about 100 lines of largely duplicated code.

src/statsbase.jl Outdated Show resolved Hide resolved
src/functions.jl Outdated Show resolved Hide resolved
@mcabbott mcabbott merged commit d8e8580 into mcabbott:master Nov 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrappers for StatsBase
2 participants