Request: A quantile function for ordinal data only #27367
How about adding an argument to |
For reference here is the relevant code. We would just add a new function that's just like |
I agree that this would be useful. This was discussed a bit in https://github.com/JuliaLang/julia/issues/19190#issuecomment-257885325, #19359 (comment), and in the very old #1333. |
Would it be so bad to change the default quantile to type 1 (lower value)? It sounds like it would simplify a lot of things (type stability, quantiles of ordinal data). Type 7 is also harder to extend with weights. |
Doesn't Stata default to types 6 or 2 depending on the commands (see this)? Anyway, I think we should use the same type for |
Actually, yes, you are right. Stata uses type 2, which gives the median as the average of the two central elements. |
To be clear, Julia doesn't have implementations for all the different quantile versions described here, right? Should there be an effort to implement all 9? That seems excessive, but maybe there is more demand. Maybe I should write a |
No, we only support one variant currently. I'm not sure there's a real demand to support all variants, most software only implement a few of them. But having a way to compute them would still be useful to replicate results, either in a package or in Base. |
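To make the differences between the variants in this thread concrete, here is a small sketch in Python (for illustration only, not Julia's implementation) of three of Hyndman and Fan's nine definitions, numbered as in R's `type` argument: type 1 (inverse of the empirical distribution function), type 2 (like type 1, but averaging at discontinuities, which is what gives Stata's median) and type 7 (linear interpolation, the default in both Julia and R). The function name `hf_quantile` and its `qtype` parameter are hypothetical.

```python
import math


def hf_quantile(values, p, qtype=7):
    """Hypothetical sketch of Hyndman-Fan quantile types 1, 2 and 7."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("quantile of an empty collection is undefined")
    if not 0 <= p <= 1:
        raise ValueError(f"quantile level {p} is outside [0, 1]")

    def x(k):
        # 1-based order statistic x_(k), clamped to [1, n] at the edges
        return xs[min(max(k, 1), n) - 1]

    if qtype == 7:
        # Linear interpolation at position h = (n - 1) p + 1
        h = (n - 1) * p + 1
        lo = math.floor(h)
        return x(lo) + (h - lo) * (x(lo + 1) - x(lo))

    # Types 1 and 2 both start from n p = j + g
    j = math.floor(n * p)
    g = n * p - j
    if qtype == 1:
        # Inverse ECDF: always returns an observation
        return x(j) if g == 0 else x(j + 1)
    if qtype == 2:
        # Like type 1, but averages the two neighbors when n p is an integer
        return (x(j) + x(j + 1)) / 2 if g == 0 else x(j + 1)
    raise ValueError(f"unsupported quantile type {qtype}")


xs = [1, 2, 3, 4]
for t in (1, 2, 7):
    print(t, hf_quantile(xs, 0.5, t))
```

Only type 1 returns an element of the input, so it is the only one of the three that needs nothing beyond `<`; types 2 and 7 average or interpolate and therefore need arithmetic on the elements.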
There are a variety of types of data for which an order is defined, but other mathematical operations are not. It seems to be the consensus, for instance, that the `Date` type should not have `+` or `/` defined for it. However, if you have a vector of dates, you might still want to know the "quantiles" of those dates. Because `<` is defined, you can sort a vector of dates, so you can ask "What is the 25th percentile of dates in my vector?"

You can't do this with the current `quantile` function, because when the requested quantile falls between two observations, it interpolates between them (for example, the median of an even-length vector is the mean of the two middle values), and that requires arithmetic on the elements.

R's `quantile` function has the argument `type`, and `quantile(x, ..., type = 1)` returns an actual observation (the inverse of the empirical distribution function) instead of interpolating; for the median of an even-length vector, this is the lower of the two middle values.

I am currently working on a better `describe` function for returning summary statistics of a `DataFrame`, and I think it would be useful to return a quantile-like value for ordinal data. Unfortunately, such a function is not defined either here or in `StatsBase`.

`quantile` is a super well-written function in `Base`, clever enough to sort only the values between the minimum and maximum quantiles asked for. Writing an ordinal quantile function in `StatsBase` would essentially mean rewriting the `quantile` function entirely. Rather, I think it makes sense to add a new method, call it `ordquantile` or something, that keeps everything in the current `quantile` function except the part that interpolates between the two neighboring values, and returns the lower value instead.

Does this reasoning make sense for it to live here?
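A minimal sketch of the proposed behavior, written in Python for illustration rather than as the Julia implementation. `ordquantile` is the hypothetical name suggested above. The sketch assumes the same position rule as the default (type 7) quantile, h = (n - 1)p counted from zero, and returns the lower neighbor `xs[floor(h)]` instead of interpolating. Because it only sorts and indexes, it needs nothing but `<` on the elements, so it works on `datetime.date` values.

```python
import math
from datetime import date


def ordquantile(values, ps):
    """Hypothetical ordinal quantile (name taken from the proposal above).

    Uses the type 7 position h = (n - 1) * p (0-based), but returns the
    lower of the two neighboring order statistics instead of interpolating.
    Elements only need to support `<`: they are sorted and indexed, never
    added, subtracted or divided.
    """
    xs = sorted(values)  # a real implementation could keep Base's partial sort
    n = len(xs)
    if n == 0:
        raise ValueError("ordquantile of an empty collection is undefined")
    result = []
    for p in ps:
        if not 0 <= p <= 1:
            raise ValueError(f"quantile level {p} is outside [0, 1]")
        h = (n - 1) * p                   # fractional 0-based position, as in type 7
        result.append(xs[math.floor(h)])  # lower neighbor, no interpolation
    return result


dates = [date(2018, 3, 1), date(2018, 1, 1), date(2018, 4, 1), date(2018, 2, 1)]
print(ordquantile(dates, [0.0, 0.25, 0.5, 1.0]))
```

Note that this lower-neighbor rule is not identical to R's `type = 1` for every p: with five values and p = 0.7, it returns the third-smallest value, whereas type 1 returns the fourth-smallest. Both return observations of the data, which is the property that matters for ordinal types. Since h is computed in floating point, levels whose exact position is an integer can occasionally round down by one; a production version would need to guard against that.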