Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnitSimplex #102

Open
cscherrer opened this issue Oct 6, 2021 · 11 comments
Open

UnitSimplex #102

cscherrer opened this issue Oct 6, 2021 · 11 comments
Labels

Comments

@cscherrer
Copy link
Contributor

cscherrer commented Oct 6, 2021

Instead of checking that the sum is one, UnitSimplex seems to check that it's at most one:

julia> [0.3,0.6]  UnitSimplex(2)
true

julia> [0.3,0.7]  UnitSimplex(2)
true

julia> [0.3,0.8]  UnitSimplex(2)
false

It also doesn't seem to check the length (though UnitSimplex(Val(2)) does check this):

julia> [0.3,0.4,0.2]  UnitSimplex(2)
true

julia> [0.3,0.4,0.3]  UnitSimplex(2)
true

julia> [0.3,0.4,0.4]  UnitSimplex(2)
false

EDIT: I just realized the first of these is intentional. This is unusual, maybe a different use of "unit simplex" across fields? In statistics we use this for the support of Dirichlet distributions.

@cscherrer cscherrer changed the title Bug in UnitSimplex UnitSimplex Oct 6, 2021
@daanhb
Copy link
Member

daanhb commented Oct 6, 2021

Thanks for reporting. The length is definitely an issue. For the first examples, I was interpreting the simplex as a volume I guess.

@cscherrer
Copy link
Contributor Author

How would you express the "stats simplex"?

@daanhb
Copy link
Member

daanhb commented Oct 6, 2021

Depends - what is the "stats simplex" exactly? :-)

It seems you can ask for the boundary of a simplex, which in 2D currently is the full triangle, hence its boundary is a collection of edges:

julia> boundary(UnitSimplex(Val(2)))
D₃  D₁  D₂

D₁ = [-1.0, 1.0] .* (0.0..1.0 (Unit)) .+ [1.0, 0.0]
D₂ = [0.0, -1.0] .* (0.0..1.0 (Unit)) .+ [0.0, 1.0]
D₃ = [1.0, 0.0] .* (0.0..1.0 (Unit)) .+ [0.0, 0.0]

It is a bit clunky though.

@cscherrer
Copy link
Contributor Author

The simplex I have in mind is
https://en.wikipedia.org/wiki/Simplex

@cscherrer
Copy link
Contributor Author

A few nice things about that definition:

  • It's more symmetric, in that all edges have the same length. If you include the origin, edges can either be length one or sqrt(2)
  • It's useful for representing probabilities
  • It's equivalent topologically, you just need to go up a dimension

Maybe there are fields where it's standard to include the origin? I just haven't seen this formulation before

@daanhb
Copy link
Member

daanhb commented Oct 6, 2021

I didn't really give it much thought. I wanted at least one polytope in each dimension, so all others could be represented by mapping it. The simplest definition would be best. Also, I'd like a representation both of the volume and of the boundary, that also seems an issue here.

@daanhb daanhb added the 0.6 label Oct 6, 2021
@cscherrer
Copy link
Contributor Author

By "here" do you mean in the current implementation? I think this is very easy for the definition I'm used to

@daanhb
Copy link
Member

daanhb commented Oct 6, 2021

Yes, I meant the question of what to implement here in DomainSets. The difference between == 1 and <= 1 confused me, because it seems to refer to the difference between a boundary and a volume (i.e. circle versus disk), but I see now that the standard simplex you have in mind is an n-dimensional object in an (n+1) dimensional space. It does have a simple definition. From the wikipedia reference, I guess I implemented what they refer to as "corner of cube" or a "standard orthogonal simplex", which involves the origin.

Clearly the name UnitSimplex is not a good one if it leads to confusion, and changing the name or the domain is a breaking change, hence I labelled this 0.6.

In practice I'm interested in having a reference triangle that maps to any other triangle. I'm not too keen on making the reference triangle live in a higher dimensional space, as that would introduce a host of numerical issues (the map would be rectangular, and verifying membership becomes sensitive to rounding errors).

@cscherrer
Copy link
Contributor Author

Yes, rounding errors are unavoidable if you try to do in directly on embeddings. Have you ever done any work with the Stan language? It focuses on Hamiltonian Monte Carlo sampling, which can only sample over ℝⁿ. But we usually want a lot more than than, so they have all kinds of transforms. The approach has become pretty standard.

So we don't usually ask whether a parameter is in whatever manifold we're interested in, but instead have a Markov chain walk around in ℝⁿ and map it into the space we want, so it's there by construction. You do sometimes need to go the other way; in that case, I usually project to the space, maybe with some tolerance if that's an issue.

@daanhb daanhb closed this as completed in f88bca3 Oct 7, 2021
@daanhb
Copy link
Member

daanhb commented Oct 7, 2021

Okay. I'm not familiar with Stan, but the approach makes sense. In DomainSets we have in and approx_in with a tolerance, and I do try to avoid mixing them.

As for changes in DomainSets I'd like to keep the current implementation for now because it is practical, but I do wonder about its name. Even if there is just a difference in fields, it's best to avoid confusion. Perhaps it could be OrthogonalSimplex?

@dlfivefifty what is a "unit simplex" to you? See https://en.wikipedia.org/wiki/Simplex#The_standard_simplex

@daanhb daanhb reopened this Oct 7, 2021
@daanhb
Copy link
Member

daanhb commented Oct 7, 2021

(the issue was closed by my recent commit, but that only fixes the length-bug, not the naming)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants