quantile chokes on integral vectors if it has to interpolate #1333

Closed
HarlanH opened this issue Oct 4, 2012 · 11 comments

Comments

@HarlanH
Contributor

HarlanH commented Oct 4, 2012

julia> quantile([1,2,3,4.], .5)
2.5

julia> quantile([1,2,3,4], .5)
InexactError()
 in assign at array.jl:493
 in quantile at statistics.jl:361
 in quantile at statistics.jl:368

Anyone want to argue the correct behavior? Return a float? Round to the nearest integer? Fail with a nicer error message?
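For readers following along, here is a minimal sketch of the "return a float" option. It assumes the error comes from storing an interpolated value in an array with the input's integer element type, which the `assign at array.jl` frame suggests but which is not confirmed here. `quantile7` is a hypothetical name, not the function in statistics.jl; it uses the linear-interpolation rule that the thread later calls variant 7, written in current Julia syntax.

```julia
# Hypothetical sketch, not the code in statistics.jl.
# Linear-interpolation quantile that always computes in floating point.
function quantile7(x::AbstractVector{<:Real}, p::Real)
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    isempty(x) && throw(ArgumentError("x must not be empty"))
    v = sort(float.(x))          # promote first, so interpolated values are representable
    n = length(v)
    h = (n - 1) * p + 1          # fractional 1-based position in the sorted data
    lo = floor(Int, h)
    hi = min(lo + 1, n)
    return v[lo] + (h - lo) * (v[hi] - v[lo])
end

quantile7([1, 2, 3, 4], 0.5)     # 2.5

# By contrast, storing a non-integer value in an Int array raises InexactError:
# out = zeros(Int, 1); out[1] = 2.5
```

With this approach the result is a float for every integer input, including cases where no interpolation is needed.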

@StefanKarpinski
Member

Oo. This is a bit of a tough call. If it returns a float whenever it has to interpolate, then you'd want the answer to always be a float, which is kind of a bummer.

@HarlanH
Contributor Author

HarlanH commented Oct 4, 2012

Yep. Boost gives several options, defaulting to using ceiling for quantiles above 50% and floor for quantiles below 50%:
http://www.boost.org/doc/libs/1_51_0/libs/math/doc/sf_and_dist/html/math_toolkit/policy/pol_ref/discrete_quant_ref.html

I do agree that the type of the argument should be preserved, so that quantiles of floats return floats and quantiles of integers return integers. So the equivalent of Boost's "real" would just be quantile(float(x), .5), although that won't be as fast. Then we could have an optional argument that can be one of the iround/iceil/ifloor functions, which would cover those three of Boost's options. I'm not sure how to specify the other two, the round_outwards and round_inwards options, as their behavior depends on the value(s) of the quantiles. Any clever ideas that avoid an enum or a string parameter?
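A rough sketch of the optional rounding argument, under a few assumptions: `iquantile`, its `rounding` keyword, and the policy function names are all hypothetical, and the code is written in current Julia, where `round(Int, x)`, `ceil(Int, x)` and `floor(Int, x)` take the place of `iround`/`iceil`/`ifloor`. One way to avoid an enum or string is to pass a function that receives the probability `p` along with the interpolated value, so that the outward and inward policies are ordinary functions too.

```julia
# Hypothetical sketch: interpolate in floating point, then let the caller pick the rounding.
# `rounding` takes (value, p) so that policies depending on p need no enum or string.
function iquantile(x::AbstractVector{<:Integer}, p::Real;
                   rounding = (v, p) -> round(Int, v))
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    v = sort(x)
    n = length(v)
    h = (n - 1) * p + 1                      # fractional 1-based position
    lo = floor(Int, h)
    hi = min(lo + 1, n)
    q = v[lo] + (h - lo) * (v[hi] - v[lo])   # interpolated value before rounding
    return rounding(q, p)
end

# Policies in the spirit of Boost's options (names are illustrative).
round_down(v, p) = floor(Int, v)
round_up(v, p)   = ceil(Int, v)
# Floor below the median, ceiling above it. How Boost treats p == 0.5 exactly is not
# reproduced here; this sketch rounds up in that case.
round_outwards(v, p) = p < 0.5 ? floor(Int, v) : ceil(Int, v)
round_inwards(v, p)  = p < 0.5 ? ceil(Int, v) : floor(Int, v)

iquantile([1, 2, 3, 4], 0.5; rounding = round_down)        # 2
iquantile([1, 2, 3, 4], 0.5; rounding = round_up)          # 3
iquantile([1, 2, 3, 4], 0.25; rounding = round_outwards)   # 1
```

The trade-off is that every policy must accept two arguments, even those that ignore `p`.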


@nolta
Member

nolta commented Oct 4, 2012

Since we've implemented quantile variant 7, I think by definition we have to return 2.5.

@HarlanH
Contributor Author

HarlanH commented Oct 4, 2012

The problem is that "variant 7" as described by R doesn't apply in a language like Julia that doesn't convert most numbers to floats. As Stefan says, we don't want to be returning different types from quantile() depending on the values passed in. So we need to either say that quantile(Number) always returns Float, which would keep the existing behavior but force the return type, or write quantile(Float)->Float and quantile(Int)->Int with different behavior.


@nolta
Member

nolta commented Oct 4, 2012

> The problem is that "variant 7" as described by R doesn't apply in a language like Julia that doesn't convert most numbers to floats.

What? Sure it does. It's just math.

> ... or write quantile(Float)->Float and quantile(Int)->Int with different behavior.

That's just too weird and magical.

If we want quantile to return the same type as the input, we should switch to variant 1 (like Mathematica) or variant 3.
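For comparison, a sketch of variant 1 as it is usually defined: the inverse of the empirical CDF, which only ever selects an existing element, so the result has the element type of the input. `quantile1` is a hypothetical name, and this is an illustration in current Julia syntax rather than a proposed patch.

```julia
# Hypothetical sketch of variant 1: the smallest element whose empirical CDF is >= p.
function quantile1(x::AbstractVector, p::Real)
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    isempty(x) && throw(ArgumentError("x must not be empty"))
    v = sort(x)
    k = max(1, ceil(Int, length(v) * p))   # index of the order statistic to return
    return v[k]
end

quantile1([1, 2, 3, 4], 0.5)     # 2, an Int
quantile1([1, 2, 3, 4.], 0.5)    # 2.0, a Float64
```

No arithmetic is done on the data, so this also works for any sortable element type, at the cost of never returning a value between two observations.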

@HarlanH
Contributor Author

HarlanH commented Oct 4, 2012

Would it make sense for quantile to return variant 1 for Integer arguments and variant 7 for Floats? I don't see why we have to be consistent for data of different types. To me, it'd be more than a little weird for the median of a set of integers to be a float.


@nolta
Member

nolta commented Oct 4, 2012

> To me, it'd be more than a little weird for the median of a set of integers to be a float.

Wait, what?

julia> median([1,2,3,4])
2.5

@HarlanH
Contributor Author

HarlanH commented Oct 4, 2012

Someone else is going to have to decide what the proper behavior is here! Perhaps my intuitions are uncommon! I would have thought that, since integers aren't closed under division, a definition of median that uses division would have to round to keep the results in the same domain as the input.

Another idea would be to have separate quantile() and iquantile() functions, where the former returns floats for integer input and the latter (which only works on an integer domain) always returns an int. This would sort of mirror the way the round() and iround() functions work.

Incidentally, this works nicely, preserving input type:

julia> x=[1//2, 2//3, 6//11, 9//102]
4-element Rational{Int64} Array:
1//2
2//3
6//11
3//34

julia> quantile(x, .5)
23//44

julia> typeof(ans)
Rational{Int64}
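The value above can be checked by hand. With four elements, the 0.5 quantile under linear interpolation is the mean of the two middle order statistics, and that arithmetic stays within the rationals. A short check in current Julia syntax (an illustration, not the library's code path):

```julia
# Sort the same four rationals and average the two middle values exactly.
x = sort([1//2, 2//3, 6//11, 9//102])   # 3//34, 1//2, 6//11, 2//3
(x[2] + x[3]) // 2                      # 23//44
```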


@johnmyleswhite
Member

I'm reasonably sure that the definition of the median in most textbooks is consistent with Mike's assertion, which only guarantees closure for the rationals.

-- John


@HarlanH
Contributor Author

HarlanH commented Oct 5, 2012

Do we have a consensus then that the proper course of action is to special-case integers so that the returned quantile values are floats? Are there other types that we should be worried about? Complex numbers aren't ordered, so they won't work, of course.


@johnmyleswhite
Member

My initial reaction is that there shouldn't be special cases: everything that's a subtype of Real should return a floating point answer as the median. While the rationals may be closed under this operation, it seems like that's the special case.

Maybe a good way to reason about this is to think carefully about how one wants a quantile() function to behave, since median is just a special case. My instinct is that you can only put in types that derive from Real and you always get a floating point back out. I believe the implementations in distributions.jl behave this way because Rmath does.
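To make the "any Real in, floating point out" rule concrete, here is a small sketch in current Julia syntax. `float_median` is a hypothetical name; the result type is taken from `float(T)` for element type `T`, which is one possible reading of "floating point", not something the thread settles.

```julia
# Hypothetical sketch: accept any Real element type, always return a floating-point value.
function float_median(x::AbstractVector{T}) where {T<:Real}
    isempty(x) && throw(ArgumentError("x must not be empty"))
    v = sort(x)
    n = length(v)
    mid = (n + 1) ÷ 2
    F = float(T)                 # e.g. Float64 for Int and Rational{Int}, Float32 for Float32
    return isodd(n) ? F(v[mid]) : (F(v[mid]) + F(v[mid + 1])) / 2
end

float_median([1, 2, 3, 4])       # 2.5
float_median([1//2, 3//2])       # 1.0, the exact rational result is not preserved
```

Under this rule the rational case gives up exactness in exchange for a single, predictable family of return types.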

Do other languages actually define medians over any totally ordered set that's not the reals? My assumption is no, but I don't really know.

-- John

