Semidefinite programs are challenging to solve, so it's crucial to exploit as much structure as possible when solving them.
This makes writing a generic SDP solver challenging, since it seems a tailored implementation would always be faster.
That's where the extensibility of MOI/JuMP can actually be really helpful.
One particular structure is that the matrices `A_i` in the SDPA format are actually low-rank.
This is the case, for instance, in the MAX-CUT problem, where they have rank 1, and in Sum-of-Squares (in particular with specific bases such as the interpolation basis).
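To see concretely why solvers care, here is a minimal numpy sketch (not MOI code) of the rank-1 case: for `A = a aᵀ`, the Frobenius inner product `<A, X>` collapses to the quadratic form `aᵀ X a`, so `A` never needs to be formed.

```python
import numpy as np

# For a rank-1 matrix A = a a^T, the Frobenius inner product <A, X>
# equals the quadratic form a^T X a, so the n x n matrix A never needs
# to be formed or multiplied explicitly.
rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal(n)
X = rng.standard_normal((n, n))
X = X + X.T  # symmetric, like an SDP variable

dense = np.trace(np.outer(a, a) @ X)  # forms A explicitly
lowrank = a @ X @ a                   # never forms A
assert np.isclose(dense, lowrank)
```

The low-rank evaluation needs only O(n²) work and O(n) extra storage per constraint, which is where the reported speedups come from.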
Some solvers already support this:
If we make it part of the interface, support might also appear in other solvers (@mariohsouto and @joaquimg told me it would be easy to add in ProxSDP.jl, for instance).
This is actually supported by YALMIP, as detailed in https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf. YALMIP does not even compute the matrix products; it keeps them in symbolic form, so the user does not even have to specify the structure. But let's assume that the user gives this explicitly for now.
I guess two important questions to see if it's worth it are:

1. Does exploiting the low-rank structure actually improve performance significantly?
2. How should we represent this structure in MOI?

I'll try to convince you that it's "Yes and Yes".

About 1., it improves performance quite a lot, and this has been reported in numerous places. For instance, https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf says:
> To be fair, it should be pointed out that the theoretical idea exploited in this paper was incorporated already in LMI Lab (Gahinet et al. 1995), which was one of the first public implementations of an SDP solver, tailored specifically for control problems. However, many years of active research in the SDP field has led to much more efficient algorithms, and it is our goal to leverage on this. A major reason for pursuing the idea in this paper is that it is slightly embarrassing for people coming from the SDP field, that the now over 15 year old LMI Lab solver actually outperforms state-of-the-art general purpose SDP solvers on some small- to medium-scale control problems.
About 2., tl;dr: I detail below several natural ideas with their pros and cons; the short version is that #2198 is the one I think we should go for.
We have two options: defining a new set or defining a new function. We want to avoid defining new functions as much as possible, so let's first try defining a new set.
A first possibility is to merge the PSD set and the low-rank constraint together.
So we could have the constraint `sum_i <A_i, X_i> = b`, where `<., .>` is the Frobenius inner product of symmetric matrices and the `A_i` are in LDLᵀ form. And then you would do:
```julia
X = Vector{MOI.VariableIndex}[]
for ...
    Xi, cXi = MOI.add_constrained_variables(model, MOI.PositiveSemidefiniteConeTriangle(...))
    push!(X, Xi)
end
MOI.add_constraint(model, MOI.VectorOfVariables(reduce(vcat, X)), MOI.LowRankFrobeniusEqualTo(A, 1.0))
```
And then of course, there would be a bridge from `F`-in-`LowRankFrobeniusEqualTo` into `F`-in-`EqualTo`.
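To make concrete what such a bridge would have to compute, here is a minimal numpy sketch (the function name and triangle layout are hypothetical, not MOI's API): it expands a low-rank `A = U D Uᵀ` into explicit coefficients on the upper-triangle entries of `X`, which is exactly the step that loses the structure.

```python
import numpy as np

def frobenius_coefficients(U, D):
    """Coefficients of <A, X> as a linear function of the upper-triangle
    entries of X, where A = U @ diag(D) @ U.T is given in LDL^T form.
    Off-diagonal entries of X appear twice in <A, X>, hence the factor 2."""
    A = U @ np.diag(D) @ U.T  # the bridge materializes A, losing the structure
    n = A.shape[0]
    iu = np.triu_indices(n)
    coeffs = A[iu].copy()
    coeffs[iu[0] != iu[1]] *= 2.0
    return coeffs

# Check against the dense inner product on a random symmetric X.
rng = np.random.default_rng(0)
n, r = 4, 2
U = rng.standard_normal((n, r))
D = rng.standard_normal(r)
X = rng.standard_normal((n, n))
X = X + X.T
A = U @ np.diag(D) @ U.T
iu = np.triu_indices(n)
assert np.isclose(frobenius_coefficients(U, D) @ X[iu], np.trace(A @ X))
```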
There are a few things to think about here.
What if the user gives an invalid vector of variables?
The vector of variables needs to be a concatenation of PSD matrices of the right size for solvers to work.
If a user gives a vector of variables that isn't like that, we should bridge it into `EqualTo` so that the solver still works, but there is no way to see that from the constraint type, so we won't be able to detect it when bridging and we will pass the constraint as-is to the solver.
I think we should simply not complicate things here for a use case that does not exist. The user is not allowed to use non-SDP variables in this set, similarly to the `Indicator` case where the first variable has to be binary. If they don't respect that, they should expect an error.
What about variable bridges?
If a variable of the `VectorOfVariables` gets variable-bridged, it will turn into a `VectorAffineFunction`, which won't be supported by the solver anymore, so the constraint will then be bridged into `ScalarAffineFunction`-in-`EqualTo`.
This is actually a good thing: it means the vector of variables wasn't a concatenation of PSD variables, so there is nothing to worry about on this side.
What if we dualize?
We of course want this to work with Dualization, so what's the dual of this set? Just like `EqualTo`, this is not a cone, so it does not really have a dual. However, we can make it work by implementing `Dualization._dual_set` so that one variable is added to the dual. Then, in `MOI.Utilities.dot_coefficients`, we can multiply this variable by the matrices `A_i`, expressed in symmetric-vectorized form (without the low-rank structure), so that it is added to the `VectorAffineFunction`-in-`PSDCone` along with the duals of the other constraints of each PSD variable. The sad part of this story is that we lose the low-rank structure.
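The symmetric-vectorized form mentioned above rests on the standard `svec` identity; here is a small numpy sketch (a generic convention for illustration, not necessarily MOI's exact scaling) showing that scaling off-diagonal entries by `sqrt(2)` makes the ordinary dot product of vectorized matrices equal the Frobenius inner product:

```python
import numpy as np

def svec(S):
    """Symmetric vectorization: stack the upper triangle column by column,
    scaling off-diagonal entries by sqrt(2) so that
    svec(A) . svec(X) equals the Frobenius inner product <A, X>."""
    n = S.shape[0]
    out = []
    for j in range(n):
        for i in range(j + 1):
            out.append(S[i, j] if i == j else np.sqrt(2.0) * S[i, j])
    return np.array(out)

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = A + A.T
X = rng.standard_normal((n, n))
X = X + X.T
assert np.isclose(svec(A) @ svec(X), np.trace(A @ X))
```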
In the dual, to keep the low-rank structure, we would need something like (assuming the PSD matrix only appears in `LowRankFrobeniusEqualTo` constraints and not in any `EqualTo` constraint):

```
sum y_i A_i is PSD
```

where the `A_i` are low-rank.
The `sum y_i A_i` part could be represented by a `VectorAffineFunction`, but then the low-rank structure of the `A_i` would be lost, so how do we represent it?
Maybe with a set
Wait, isn't that a convex cone? So why don't we use the dual of that in standard conic form so that it's easy to dualize?
Ok, so according to [Rockafellar, "Convex Analysis", Corollary 16.3.2], `dual(adjoint(M), S) = preimage(M, dual(S))`.
So here we have `M(y) = sum y_i A_i` and `<M(y), Q> = sum y_i <A_i, Q> = <y, (<A_1, Q>, ..., <A_m, Q>)>`, so the dual set is the set of vectors of the form `(<matrices[1], Q>, ..., <matrices[m], Q>)` where `Q` is PSD.
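This adjoint computation is easy to sanity-check numerically. The sketch below (plain numpy, with rank-1 `A_i` as in MAX-CUT) verifies `<M(y), Q> = <y, (<A_1, Q>, ..., <A_m, Q>)>` for `M(y) = sum y_i A_i`:

```python
import numpy as np

# M(y) = sum_i y_i A_i with rank-1 A_i = a_i a_i^T.  Its adjoint M* maps a
# symmetric Q to the vector (<A_1, Q>, ..., <A_m, Q>), which is exactly the
# form of the dual set described above.  Check <M(y), Q> == <y, M*(Q)>.
rng = np.random.default_rng(0)
n, m = 4, 3
a = [rng.standard_normal(n) for _ in range(m)]
A = [np.outer(ai, ai) for ai in a]
y = rng.standard_normal(m)
Q = rng.standard_normal((n, n))
Q = Q @ Q.T  # a PSD matrix

My = sum(yi * Ai for yi, Ai in zip(y, A))
Mstar_Q = np.array([np.trace(Ai @ Q) for Ai in A])
assert np.isclose(np.trace(My @ Q), y @ Mstar_Q)
```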
Here, that's preventing `Q` from being used in a scalar product with anything other than low-rank equality constraints, because `Q` is not part of the set itself!
If we want to allow `Q` to be used in classical `ScalarAffineFunction`-in-`EqualTo` constraints, then we need to consider that the dual will become:

```
sum y_i A_i + sum z_i B_i is PSD
```

where the `A_i` are low-rank and the `B_i` are not low-rank.
The `sum z_i B_i` part would be represented by a `VectorAffineFunction`, but how do we represent the `sum y_i A_i` part?
Maybe with a set
where the first `length(matrices)` dimensions would be `y` and the last dimensions would be `sum z_i B_i`.
Wait, so the first dimensions should be a `VectorOfVariables` and the last ones should be a `VectorAffineFunction`; isn't that an issue? No, we are already doing that for `IndicatorSet`.
Again, according to [Rockafellar, "Convex Analysis", Corollary 16.3.2], `dual(adjoint(M), S) = preimage(M, dual(S))`.
So here we have `M(y, x) = x + sum y_i A_i` and `<M(y, x), Q> = sum y_i <A_i, Q> + <x, Q> = <(y, x), ((<A_1, Q>, ..., <A_m, Q>), Q)>`, so the set is the set of vectors of the form `(<matrices[1], Q>, ..., <matrices[m], Q>, Q)` where `Q` is PSD. Note that this time we include `Q` at the end.
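The same numerical sanity check works for this extended map. The numpy sketch below verifies that the adjoint of `M(y, x) = x + sum y_i A_i` sends `Q` to `((<A_1, Q>, ..., <A_m, Q>), Q)`, which is why `Q` now appears explicitly at the end of the set:

```python
import numpy as np

# Extended map M(y, x) = x + sum_i y_i A_i, where x is itself a symmetric
# matrix.  Its adjoint maps Q to ((<A_1, Q>, ..., <A_m, Q>), Q), so the dual
# set carries Q explicitly at the end.  Check <M(y, x), Q> against
# <y, (<A_i, Q>)_i> + <x, Q>.
rng = np.random.default_rng(0)
n, m = 4, 3
A = [np.outer(v, v) for v in (rng.standard_normal(n) for _ in range(m))]
y = rng.standard_normal(m)
x = rng.standard_normal((n, n))
x = x + x.T  # symmetric
Q = rng.standard_normal((n, n))
Q = Q @ Q.T  # PSD

M = x + sum(yi * Ai for yi, Ai in zip(y, A))
lhs = np.trace(M @ Q)
rhs = y @ np.array([np.trace(Ai @ Q) for Ai in A]) + np.trace(x @ Q)
assert np.isclose(lhs, rhs)
```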
So I think there are two options here.
The first one is the pair of dual cones `PlusSumLowRankPSD` and `LowRankFrobeniusAndPSD`. It's good because it also allows `Q` to be used outside, but if a solver only supports matrices that are purely low-rank, or only matrices with no low-rank structure, it has no way to communicate that through `supports_constraint`.
The second one, `LowRankFrobeniusOfPSD` and `SumLowRankPSD`, is simpler and allows solvers that only support that case to communicate it, but we should check which solvers are in this case, and it does not really allow `Q` to be used in the objective. So even the MAX-CUT problem, which has `n` rank-1 constraints on one `n x n` PSD matrix and a linear objective, wouldn't fall into that category, because of the objective. The dual version of this is that the symmetric matrix of the objective needs to be in the second part of the `constants` field of the `VectorAffineFunction` in the `PlusSumLowRankPSD` constraint.
So I would probably start experimenting with the first option and see what happens.