feature(geomean): add geomean function #6223

Merged
merged 10 commits into from
Jul 15, 2024

Conversation

prabhathc
Contributor

@prabhathc prabhathc commented Jun 13, 2022

Add geomean function, closes #6152!
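For readers unfamiliar with the statistic, here is a minimal sketch of what a geometric mean function computes. It is illustrative only and is not the code added in this PR; the name `geometricMean` and the choice to return `NaN` for empty or non-positive input are assumptions made for the example.

```javascript
// Illustrative sketch only; not the implementation from this PR.
// The geometric mean of n positive numbers is the n-th root of their product.
// It is computed here as exp(mean(log(x))) so that long arrays do not
// overflow the intermediate product.
function geometricMean(values) {
    if (values.length === 0) return NaN;
    var logSum = 0;
    for (var i = 0; i < values.length; i++) {
        // Assumption for this sketch: zero, negative and non-numeric inputs are rejected.
        if (!(values[i] > 0)) return NaN;
        logSum += Math.log(values[i]);
    }
    return Math.exp(logSum / values.length);
}
```

For example, `geometricMean([2, 8])` is 4 up to floating-point rounding, whereas the arithmetic mean of the same values is 5.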

@prabhathc prabhathc changed the title feature(geomean): added geomean function + upkeep feature(geomean): added geomean function Jun 13, 2022
@prabhathc prabhathc changed the title feature(geomean): added geomean function feature(geomean): add geomean function Jun 13, 2022
@prabhathc prabhathc marked this pull request as draft June 13, 2022 00:49
@acxz acxz force-pushed the feature/geomean branch 2 times, most recently from e2c41a9 to cf2dc9a Compare June 13, 2022 01:09
@prabhathc prabhathc marked this pull request as ready for review June 13, 2022 01:13
@acxz acxz force-pushed the feature/geomean branch from cf2dc9a to 74ba473 Compare June 13, 2022 01:13
src/plots/plots.js (outdated review thread, resolved)
draftlogs/6223_add.md (outdated review thread, resolved)
@nicolaskruchten
Contributor

@alexcjohnson should this be supported in other places like histfunc etc?

@acxz
Contributor

acxz commented Jun 14, 2022

@nicolaskruchten is that referring to just geomean or the entire categoryorder attribute?

@nicolaskruchten
Contributor

geomean: all the other aggregation functions you can sort by (sum, mean, median) are also supported in a few other places like histfunc, so it would be nice to keep that parallelism going if possible.
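As a rough illustration of the parallelism mentioned above, the sketch below orders categories by an aggregated value, with the geometric mean sitting alongside sum, mean and median. The names `aggFns` and `orderCategories` are hypothetical and do not mirror plotly.js internals; the geometric mean entry assumes strictly positive values.

```javascript
// Hypothetical aggregation table; names are illustrative, not plotly.js internals.
var aggFns = {
    sum: function(v) {
        return v.reduce(function(a, b) { return a + b; }, 0);
    },
    mean: function(v) {
        return aggFns.sum(v) / v.length;
    },
    median: function(v) {
        var s = v.slice().sort(function(a, b) { return a - b; });
        var mid = Math.floor(s.length / 2);
        return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
    },
    // Assumes strictly positive values.
    'geometric mean': function(v) {
        var logs = v.map(function(x) { return Math.log(x); });
        return Math.exp(aggFns.sum(logs) / v.length);
    }
};

// Group values by category, aggregate each group, then sort the category names.
function orderCategories(categories, values, aggName, descending) {
    var groups = {};
    categories.forEach(function(cat, i) {
        (groups[cat] = groups[cat] || []).push(values[i]);
    });
    var sign = descending ? -1 : 1;
    return Object.keys(groups).sort(function(a, b) {
        return sign * (aggFns[aggName](groups[a]) - aggFns[aggName](groups[b]));
    });
}

var cats = ['a', 'b', 'a', 'b', 'c'];
var vals = [1, 2, 16, 2, 3];
var byGeoMean = orderCategories(cats, vals, 'geometric mean'); // b, c, a
var bySum = orderCategories(cats, vals, 'sum');                // c, b, a
```

The two orderings differ because category `a` has the largest geometric mean (about 4) and the largest sum (17), while `b` and `c` swap places between the two statistics.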

@acxz
Contributor

acxz commented Jun 14, 2022

I'd be down to tackle that in another PR if you can list where else to add the geomean statistic.

@archmoj
Contributor

archmoj commented Jun 14, 2022

I wonder whether geomean is the best name, since it could be read as referring to our geo subplots.
How do you feel about geometric mean or g-mean as alternatives?
There is also the harmonic mean, which one may want to add later. Right?
In that case something like h-mean works for me, but harmean does not :)

@archmoj
Contributor

archmoj commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!
Do you use any special flags when pushing?

@acxz
Contributor

acxz commented Jun 14, 2022

Yeah, harmean doesn't sound so nice.
I don't particularly like g-mean either since it may not be obvious. I'd actually prefer geometric_mean over g-mean.

Edit: a quick Google/Bing search for g-mean returns the Wikipedia article on the geometric mean, so I suppose it is descriptive enough.

@acxz
Contributor

acxz commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!

I was gonna ask you guys the same thing lol.
There were commits where the CI was triggered, but I believe this stopped when we made this PR a draft and then undid that action.

@acxz
Contributor

acxz commented Jun 14, 2022

I can try rebasing these commits on master and force pushing to see if that triggers the CI?

@archmoj
Contributor

archmoj commented Jun 14, 2022

I can try rebasing these commits on master and force pushing to see if that triggers the CI?

Force pushing is discouraged on open pull requests as it could confuse reviewers.
And your branch is only 11 commits behind upstream master. So that shouldn't be the reason.
Let's see whether the next commit you push to this PR triggers the tests on CircleCI.

@archmoj
Contributor

archmoj commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!

I was gonna ask you guys the same thing lol. There were commits where the CI was triggered, but I believe this stopped when we made this PR a draft and then undid that action.

@antoinerg any idea how we could trigger CI runs on this pull?

@alexcjohnson
Collaborator

I wonder whether geomean is the best name, since it could be read as referring to our geo subplots.

I see your point, though I think in context it's pretty clear. Looking around I see gmean used in scipy.stats and various npm packages, geomean used in MATLAB and Excel, and geometric_mean used in the Python stdlib statistics package. I'm comfortable leaving it as geomean but @archmoj if you feel strongly about it I'd be OK with either gmean or geometric mean (with a space, not a dash or underscore, so the full values would be geometric mean ascending or geometric mean descending)

Harmonic mean commonly refers to the inverse of the mean of inverses (see e.g. npm and the Python stdlib), so let's stay away from that. But given that we all agree harmean is unpleasant, if we ever did add it we'd need to use hmean or harmonic mean; perhaps that argues against geomean for consistency? In that case I'm leaning toward geometric mean, clarity over terseness.
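To make the definition concrete, here is a minimal sketch of the harmonic mean as "the inverse of the mean of inverses". The function name `harmonicMean` is hypothetical, and the sketch assumes non-empty input with strictly positive values; nothing like it is added in this PR.

```javascript
// Illustrative sketch only. Assumes a non-empty array of strictly positive numbers.
// harmonic mean = 1 / mean(1 / x) = n / sum(1 / x)
function harmonicMean(values) {
    var inverseSum = 0;
    for (var i = 0; i < values.length; i++) {
        inverseSum += 1 / values[i];
    }
    return values.length / inverseSum;
}
```

For `[1, 4, 4]` the inverses sum to 1.5, so the harmonic mean is 3 / 1.5 = 2, while the arithmetic mean of the same values is 3.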

should this be supported in other places like histfunc etc?

Yes, that would be nice. I only see two more places we could add this, and one of them (the aggregate transform) is deprecated so I'm happy to ignore that. So histfunc (attribute values, bin functions, which get applied here) is really the only place this needs to be added. @nicolaskruchten am I missing anything?

Two frustrating things though:

(1) we weren't consistent about the naming - for histfunc and the aggregate transform we said avg rather than mean. I would suggest that we add mean as a synonym for avg and maybe eventually deprecate avg, then add geometric mean alongside this.

(2) The implementation for histfunc is different from that used in category ordering, in that it loops over the array only once, calculating all the quantities it needs for all the bins simultaneously (i.e. sums and counts), then in a separate step it loops over all the bins to complete the calculation (divide each sum by each count).
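A geometric mean could in principle follow the same two-step pattern by accumulating sums of logarithms instead of plain sums. The sketch below is an assumption about how that might look, not the actual histfunc code: `binnedGeometricMean`, its arguments, and the use of `null` for empty bins are all hypothetical, and values are assumed to be strictly positive.

```javascript
// Hypothetical sketch of single-pass binning; not plotly.js histfunc code.
// binIndices[i] is the bin that values[i] falls into (0 .. nBins - 1).
function binnedGeometricMean(binIndices, values, nBins) {
    var logSums = [];
    var counts = [];
    var i;
    for (i = 0; i < nBins; i++) {
        logSums.push(0);
        counts.push(0);
    }

    // Step 1: a single loop over the data, accumulating per-bin log sums and counts.
    for (i = 0; i < values.length; i++) {
        var bin = binIndices[i];
        logSums[bin] += Math.log(values[i]);
        counts[bin]++;
    }

    // Step 2: a loop over the bins to finish the calculation.
    var result = [];
    for (i = 0; i < nBins; i++) {
        result.push(counts[i] ? Math.exp(logSums[i] / counts[i]) : null);
    }
    return result;
}
```

With `binIndices = [0, 0, 2]`, `values = [2, 8, 5]` and three bins, the first bin comes out near 4, the second is empty, and the third is near 5.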

src/lib/stats.js (outdated review thread, resolved)
@archmoj
Contributor

archmoj commented Jun 14, 2022

I'd vote for geometric mean too, if there is an election here :)

@nicolaskruchten
Contributor

Two frustrating things though

Ah yeah I remember these. Well, probably reason enough not to do extra work here then :)

@acxz
Contributor

acxz commented Jul 8, 2022

@archmoj I'm not sure how to fix the following error: https://app.circleci.com/pipelines/github/plotly/plotly.js/10836/workflows/443b723f-0ded-4052-9189-1977b51761ae/jobs/238687?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary

9) takes the geometric mean of all values per category across traces of type ohlc
     calculated data and points category ordering by value
     Error: Expected $[1][1] = 2.82842712474619 to equal 2.8284271247461903.
    at <Jasmine>
    at /tmp/test/jasmine/tests/calcdata_test.js:1057:1 <- /tmp/8028cfc1362ed8eb2b52841f495d61eb.browserify.js:283624:41

10) takes the geometric mean of all values per category across traces of type candlestick
     calculated data and points category ordering by value
     Error: Expected $[1][1] = 2.82842712474619 to equal 2.8284271247461903.
    at <Jasmine>
    at /tmp/test/jasmine/tests/calcdata_test.js:1057:1 <- /tmp/8028cfc1362ed8eb2b52841f495d61eb.browserify.js:283624:41

Also if you could tell me how to run just a specific test locally that would be great!

@acxz acxz force-pushed the feature/geomean branch from 077c485 to e49fd08 Compare July 8, 2022 17:54
@acxz acxz force-pushed the feature/geomean branch from e49fd08 to 447021b Compare July 8, 2022 17:55
@gvwilson gvwilson self-assigned this Jul 3, 2024
@archmoj
Contributor

archmoj commented Jul 10, 2024

@prabhathc Could you please fetch upstream/master and merge master into this branch to resolve the conflicts?

@acxz
Contributor

acxz commented Jul 10, 2024

@archmoj done. The merge caused the geometric mean name to be single-quoted, unlike the others in aggFn. Please confirm whether that is okay in terms of code style.

Note: the same test failure from 2 years ago still occurs, and I'm still not sure how to resolve it.
Is there any way to relax the tolerance on the equality check?
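One general way to relax such a comparison is to check that the two numbers agree within a small relative tolerance instead of requiring exact equality. The sketch below is only an illustration of that idea, using the two values from the failure log above; the helper `approxEqual` and its default tolerance are assumptions, and this is not necessarily how the test was eventually fixed. Jasmine also ships a `toBeCloseTo(expected, precision)` matcher for single numbers, which might serve the same purpose if the assertion is applied element by element.

```javascript
// Hypothetical helper; the name and default tolerance are assumptions.
// Returns true when the two numbers agree to within a relative tolerance.
function approxEqual(actual, expected, relTol) {
    relTol = relTol === undefined ? 1e-12 : relTol;
    var scale = Math.max(Math.abs(actual), Math.abs(expected), 1);
    return Math.abs(actual - expected) <= relTol * scale;
}

// The two values reported in the failing test differ only in the last digits.
var actual = 2.82842712474619;
var expected = 2.8284271247461903;

var strictlyEqual = actual === expected;             // false
var closeEnough = approxEqual(actual, expected);     // true
```

Differences of this size typically come from evaluating the same mathematical quantity through different sequences of floating-point operations, so a tolerance-based check tends to be more robust than exact equality for computed statistics.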

@archmoj
Contributor

archmoj commented Jul 13, 2024

To reflect the changes to the API, please run

npm run schema && git add test/plot-schema.json && git commit -m "update plot-schema diff"

@archmoj archmoj added this to the v2.34.0 milestone Jul 13, 2024
@acxz
Contributor

acxz commented Jul 14, 2024

@archmoj done, all the tests finally pass!
Thanks for your help in pushing this PR through!

Contributor

@archmoj archmoj left a comment

Nicely done.
💃
Thank you!

Labels
community community contribution feature something new
Development

Successfully merging this pull request may close these issues.

[Feature Request] Add "geometric mean" to categorical ordering
6 participants