Semidefinite programs are challenging to solve, so it's crucial to exploit as much structure as possible when solving them.
This makes writing a generic SDP solver challenging, since it seems a tailored implementation would always be faster.
That's where the extensibility of MOI/JuMP can actually be really helpful.
One particular structure is that the matrices `A_i` in the SDPA format are actually low-rank.
This is the case for instance in the MAX-CUT problem, where they have rank 1, and in Sum-of-Squares (in particular with specific bases such as the interpolation basis).
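To make the rank-1 structure concrete, here is a minimal Python sketch (plain lists, no dependencies, made-up data) showing that the MAX-CUT constraint matrices `e_i e_i^T` have rank 1 and that `<e_i e_i^T, X>` just picks out a diagonal entry of `X`:

```python
# MAX-CUT SDP constraints: <e_i e_i^T, X> = 1 pins the diagonal of X.
# The constraint matrix A_i = e_i e_i^T is rank 1, so storing the
# factor e_i instead of the full n x n matrix saves O(n^2) memory.

def outer(u, v):
    """Outer product u * v^T as a list of lists."""
    return [[ui * vj for vj in v] for ui in u]

def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

e1 = [1.0, 0.0, 0.0]
A1 = outer(e1, e1)          # rank-1 constraint matrix

X = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]       # some symmetric X

# <A_1, X> picks out the (1, 1) entry of X.
print(frobenius(A1, X))     # 1.0
```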
Some solvers already support this:
- https://github.com/jump-dev/DSDP.jl
- https://github.com/davidsd/sdpb
- https://github.com/jump-dev/SDPT3.jl (this https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf mentions that you might have to use a STRUL extension with it, not sure what's the status now)
- https://github.com/nanleij/ClusteredLowRankSolver.jl/ (@nanleij)
- https://github.com/kocvara/Loraine.m for which the pure-Julia implementation is soon to be open sourced (@kocvara)
- https://github.com/alpyurtsever/SketchyCGAL supports constraints on the trace, which can be expressed by using `LinearAlgebra.I` as `A[i]`
If we do make it part of the interface, support might also appear in other solvers (@mariohsouto and @joaquimg told me it would be easy to add in ProxSDP.jl, for instance).
This is actually supported by YALMIP, as detailed in https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf. YALMIP does not even compute the matrix products but keeps them in symbolic form, so that the user does not even have to specify the structure. But let's assume that the user gives it explicitly for now.
I guess two important questions to see if it's worth it are:
- Does it improve performance a lot?
- Is it easy to add support for it?

I'll try to convince you that the answer is "Yes and Yes".
About 1., it improves performance quite a lot, and this has been reported in numerous places. For instance, https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf says:

> To be fair, it should be pointed out that the theoretical idea exploited in this paper was incorporated already in LMI Lab (Gahinet et al. 1995), which was one of the first public implementations of an SDP solver, tailored specifically for control problems. However, many years of active research in the SDP field has led to much more efficient algorithms, and it is our goal to leverage on this. A major reason for pursuing the idea in this paper is that it is slightly embarrassing for people coming from the SDP field, that the now over 15 year old LMI Lab solver actually outperforms state-of-the-art general purpose SDP solvers on some small- to medium-scale control problems.
About 2., tl;dr: I detail below different natural ideas with their pros and cons; the short version is that #2198 is the one I think we should go for.
We have two options: defining a new set or defining a new function. We want to avoid defining new functions as much as possible, so let's first try defining a new set.
A first possibility is to merge the PSD set and the low-rank constraints together.
So we could have the constraint that `sum_i <A_i, X_i> = b`, where `<., .>` is the Frobenius inner product of symmetric matrices and the `A_i` are in LDLT form:
```julia
struct LDLT{T}
    L::Matrix{T}
    diagonal::Vector{T}
end

struct LowRankFrobeniusEqualTo{T}
    matrices::Vector{LDLT{T}}
    constant::T
end
```
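To spell out what the LDLT encoding buys us: with `A = L * Diagonal(diagonal) * L'`, the Frobenius product `<A, X>` reduces to a sum of quadratic forms and never needs the dense `A`. A hedged numeric sketch in Python (plain lists, made-up data, hypothetical helper names):

```python
# A rank-r matrix in "LDLT" form: A = L * diag(d) * L^T, with L an n x r
# factor and d a length-r diagonal. <A, X> then reduces to
# sum_k d[k] * (l_k^T X l_k), which only touches the factors.

def quad_form(x, M):
    """x^T M x for a vector x and matrix M (lists of lists)."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def ldlt_frobenius(L_cols, d, X):
    """<L diag(d) L^T, X> = sum_k d[k] * (l_k^T X l_k)."""
    return sum(dk * quad_form(lk, X) for lk, dk in zip(L_cols, d))

def dense(L_cols, d):
    """Reconstruct A = L diag(d) L^T, only for comparison."""
    n = len(L_cols[0])
    return [[sum(dk * lk[i] * lk[j] for lk, dk in zip(L_cols, d))
             for j in range(n)] for i in range(n)]

L_cols = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]   # columns of L (rank 2)
d = [1.0, -3.0]
X = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]

A = dense(L_cols, d)
frob_dense = sum(a * x for ra, rx in zip(A, X) for a, x in zip(ra, rx))
print(ldlt_frobenius(L_cols, d, X), frob_dense)  # both print -4.0
```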
And then you would do:

```julia
X = Vector{MOI.VariableIndex}[]
for ...
    Xi, cXi = MOI.add_constrained_variables(model, MOI.PositiveSemidefiniteConeTriangle(...))
    push!(X, Xi)
end
MOI.add_constraint(model, MOI.VectorOfVariables(reduce(vcat, X)), MOI.LowRankFrobeniusEqualTo(A, 1.0))
```
And then of course, there would be a bridge from `F`-in-`LowRankFrobeniusEqualTo` into `F`-in-`EqualTo`.
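For intuition, here is a rough Python sketch (made-up data, hypothetical helper names) of the expansion such a bridge would perform: each low-rank `A` is densified into coefficients on the upper-triangle vectorization of `X`, with off-diagonal coefficients doubled so the resulting scalar affine expression equals `<A, X>`:

```python
# Sketch of what a LowRankFrobeniusEqualTo -> EqualTo bridge would do:
# expand A = L diag(d) L^T into dense coefficients on the column-wise
# upper-triangle vectorization of X, doubling off-diagonal entries
# (each appears twice in <A, X> since A and X are symmetric).

def dense(L_cols, d):
    n = len(L_cols[0])
    return [[sum(dk * lk[i] * lk[j] for lk, dk in zip(L_cols, d))
             for j in range(n)] for i in range(n)]

def triangle_coefficients(L_cols, d):
    """Coefficients c such that dot(c, vec_tri(X)) == <L diag(d) L^T, X>."""
    A = dense(L_cols, d)
    n = len(A)
    coeffs = []
    for j in range(n):
        for i in range(j + 1):
            coeffs.append(A[i][j] if i == j else 2 * A[i][j])
    return coeffs

def vec_tri(X):
    """Column-wise upper-triangle vectorization of a symmetric matrix."""
    n = len(X)
    return [X[i][j] for j in range(n) for i in range(j + 1)]

L_cols, d = [[1.0, 1.0]], [2.0]              # rank-1: A = 2 * ones(2, 2)
X = [[3.0, 1.0], [1.0, 5.0]]
c = triangle_coefficients(L_cols, d)
lhs = sum(ci * xi for ci, xi in zip(c, vec_tri(X)))
print(c, lhs)   # [2.0, 4.0, 2.0] 20.0, and indeed <A, X> = 2*(3+1+1+5) = 20
```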
There are a few things to think about here.

What if the user gives an invalid vector of variables? The vector of variables needs to be a concatenation of PSD matrices of the right sizes for solvers to work. If a user gives a vector of variables that isn't like that, we should bridge it into `EqualTo` so that the solver still works, but there is no way to see that from the constraint type, so we won't be able to detect it when bridging and we will pass the constraint as-is to the solver. I think we should simply not complicate things for a use case that does not exist: the user is not allowed to use non-SDP variables in this set, similarly to the `Indicator` case where the first variable has to be binary. If they don't follow this, they should expect an error.
What about variable bridges? If a variable of the `VectorOfVariables` gets variable-bridged, then it will turn into a `VectorAffineFunction`, which won't be supported anymore by the solver, so it will then be bridged into `ScalarAffineFunction`-in-`EqualTo`. This is actually a good thing, because it means the vector of variables wasn't a concatenation of PSD variables, so there is no worry on this side.
What if we dualize? We of course want this to work with Dualization, so what's the dual of this set? Just like `EqualTo`, this is not a cone, so it does not really have a dual. However, we can make it work by implementing `Dualization._dual_set` so that one variable is added to the dual, and then, in `MOI.Utilities.dot_coefficients`, we can multiply this variable by the matrices `A_i`, expressed in symmetric-vectorized form (without the low-rank structure), so that it would be added to the `VectorAffineFunction`-in-`PSDCone` along with the dual of the other constraints of each PSD variable. The sad part of this story is that we lose the low-rank structure.
In the dual, to keep the low-rank structure (assuming the PSD matrix only appears in `LowRankFrobeniusEqualTo` constraints and not in any `EqualTo` constraint), we would need something like

```
sum y_i A_i is PSD
```
where the `A_i` are low-rank. But how do we represent the `sum y_i A_i` part?
Maybe with a set:

```julia
struct SumLowRankPSD{T}
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end
```
Wait, isn't that a convex cone? So why don't we use the dual of that in standard conic form so that it's easy to dualize?
Ok, so according to [Rockafellar, "Convex Analysis", Corollary 16.3.2], `dual(adjoint(M), S) = preimage(M, dual(S))`.
So here we have `M(y) = sum y_i A_i` and `<M(y), Q> = sum y_i <A_i, Q> = <y, (<A_1, Q>, ..., <A_m, Q>)>`, so the dual set is
```julia
struct LowRankFrobeniusOfPSD{T}
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end
```
which is defined as the set of vectors of the form `(<matrices[1], Q>, ..., <matrices[m], Q>)` where `Q` is PSD.
Here, that's preventing `Q` from being used in a scalar product with anything other than low-rank equality constraints, because `Q` is not part of the set itself!
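The adjoint identity behind this derivation is easy to check numerically: for `M(y) = sum y_i A_i` we have `<M(y), Q> = <y, (<A_1, Q>, ..., <A_m, Q>)>`. A small Python sketch with made-up data:

```python
# Numeric check of the adjoint used in the duality argument:
# for M(y) = sum_i y_i * A_i, the adjoint is M*(Q) = (<A_1, Q>, ..., <A_m, Q>),
# i.e. <M(y), Q> == <y, M*(Q)> for every y and Q.

def frobenius(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def M(y, mats):
    """M(y) = sum_i y_i * A_i."""
    n = len(mats[0])
    return [[sum(yi * Ai[r][c] for yi, Ai in zip(y, mats))
             for c in range(n)] for r in range(n)]

def M_adjoint(Q, mats):
    """M*(Q) = (<A_1, Q>, ..., <A_m, Q>)."""
    return [frobenius(Ai, Q) for Ai in mats]

A1 = [[1.0, 2.0], [2.0, 4.0]]    # rank-1: (1, 2)(1, 2)^T
A2 = [[1.0, 0.0], [0.0, 0.0]]    # rank-1: e1 e1^T
y = [0.5, -1.5]
Q = [[2.0, 1.0], [1.0, 3.0]]     # some symmetric Q

lhs = frobenius(M(y, [A1, A2]), Q)
rhs = sum(yi * zi for yi, zi in zip(y, M_adjoint(Q, [A1, A2])))
print(lhs, rhs)   # both sides equal 6.0
```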
If we want to allow `Q` to be used in classical `ScalarAffineFunction`-in-`EqualTo` constraints, then we need to consider that the dual will become

```
sum y_i A_i + sum z_i B_i is PSD
```

where the `A_i` are low-rank and the `B_i` are not low-rank.
The `sum z_i B_i` part would be represented by a `VectorAffineFunction`, but how do we represent the `sum y_i A_i` part?
Maybe with a set:

```julia
struct PlusSumLowRankPSD{T}
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end
```
where the first `length(matrices)` dimensions would be `y` and the last dimensions would be `sum z_i B_i`.
Wait, so the first dimensions should be `VectorOfVariables` and the last ones should be a `VectorAffineFunction`, isn't that an issue? No, we are already doing that for `IndicatorSet`.
Again, according to [Rockafellar, "Convex Analysis", Corollary 16.3.2], `dual(adjoint(M), S) = preimage(M, dual(S))`.
So here we have `M(y, x) = x + sum y_i A_i` and `<M(y, x), Q> = sum y_i <A_i, Q> + <x, Q> = <(y, x), ((<A_i, Q>)_i, Q)>`,
so the set is

```julia
struct LowRankFrobeniusAndPSD{T}
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end
```
which is defined as the set of vectors of the form `(<matrices[1], Q>, ..., <matrices[m], Q>, Q)` where `Q` is PSD. Note that this time we include `Q` at the end.
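This extended adjoint can also be checked numerically: `<M(y, x), Q> = sum y_i <A_i, Q> + <x, Q>`, which is exactly why `Q` itself is appended at the end of the vector. A small Python sketch with made-up data:

```python
# Numeric check of the extended adjoint: for M(y, x) = x + sum_i y_i * A_i
# (x a free symmetric matrix), <M(y, x), Q> splits into the low-rank
# pairings (<A_i, Q>)_i plus the plain pairing <x, Q>.

def frobenius(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A1 = [[0.0, 1.0], [1.0, 0.0]]
A2 = [[1.0, 0.0], [0.0, -1.0]]
y = [2.0, 3.0]
x = [[1.0, 0.5], [0.5, 1.0]]
Q = [[4.0, 1.0], [1.0, 2.0]]

# M(y, x) = x + sum_i y_i * A_i
Myx = [[x[r][c] + sum(yi * Ai[r][c] for yi, Ai in zip(y, [A1, A2]))
        for c in range(2)] for r in range(2)]

lhs = frobenius(Myx, Q)
rhs = sum(yi * frobenius(Ai, Q) for yi, Ai in zip(y, [A1, A2])) + frobenius(x, Q)
print(lhs, rhs)   # both sides equal 17.0
```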
So I think there are two options here.

- The first one is the pair of dual cones `PlusSumLowRankPSD` and `LowRankFrobeniusAndPSD`. It's good because it also allows `Q` to be used outside, but if a solver only supports matrices with pure low-rank structure, or only matrices with no low-rank structure, then it won't know how to communicate that with `supports_constraint`.
- The second one, `LowRankFrobeniusOfPSD` and `SumLowRankPSD`, is simpler and allows solvers that only support that case to communicate it, but we should check which solvers are in this case, and it does not really allow `Q` to be used in the objective. So even the MAX-CUT problem, which has `n` rank-1 constraints on one `n x n` PSD matrix and a linear objective, wouldn't fall into that category because of the objective. The dual version of this is that the symmetric matrix of the objective needs to be in the second part of the `constants` field of the `VectorAffineFunction` in the `PlusSumLowRankPSD` constraint.
So I would probably start experimenting with the first option and see what happens.