0x043 Convex Optimization
1. Convex Sets
1.1. Affine Set
Definition (affine set) A set \(C \subseteq R^n\) is affine if for any \(x_1, x_2 \in C\), then \(\theta x_1 + (1-\theta)x_2 \in C\) where \(\theta \in R\)
note: Affine set is a subspace plus an offset
Definition (affine hull) The affine hull of \(C\) is the affine combination of all points in \(C\). The affine dimension of \(C\) is the dimension of its affine hull, which is the dimension of the underlying subspace
solution set of linear equations
The solution set of a system of linear equations is an affine set
hyperplane
A hyperplane is also a solution set of linear equation, therefore affine set
1.2. Convex Set
Definition (convex set) A set \(C\) is convex if for any \(x1,x2 \in C\), and any $0 \leq \theta \leq 1 $, then
Note the difference between affine set and convex set is the range of \(\theta\)
Definition (convex hull) The convex hull of \(C\) is the convex combination of all points in \(C\)
halfspace
A halfspace is a convex set
ball and ellipsoid
A norm ball is a convex set
A lreated family are ellipsoids, they are also convex
polyhedra and simplex
A polyhedron is defined as the solution set of a finite number of halfspaces and hyperplanes, therefore convex
Simplexes are an important family of polyhedra, it is defined with \(k+1\) affinely independent points \(v_0, ..., v_k\) (i.e. : \(v_1-v_0, v_2-v_0, ...\) are independent), the simplex is given by their convex hull
1.3. Operations that preserve convexity
Proposition (Intersection preserves convexity) The intersection of any number of convex sets is a convex set
For example, polyhedron is the intersection of halfspaces and hyperplanes
positive semidefinite cone
positive semidefinite cone can be expressed
which is the intersection of infinite number of halfspaces $ { X \in S^n | z^T X z \geq 0 }$, and so is convex
Proposition (convex set and halfspaces) a closed convex set \(S\) is the intersection of all halfspaces that contain it
Proposition (Affine function preserves convexity) \(y = Ax + b\) and its reverse preserves convexity, which means suppose \(S \subset R^n\) is convex and \(f: R^n \to R^m\) is an affine function, then the image and reverse image
are both convex
limear matrix inequality
Solution set of linear matrix inequality (LMI)
is convex because it is the inverse image of the positive semidefinite cone under the affine function \(f(x) = B - A(x)\)
Definition (perspective function) The perspective function \(P: R^{n+1} \to R^n\) with the domain \(R^{n} \times R_}\) is
Proposition (perspective function preserves convexity) If \(C \subset \text{dom}(P)\) is convex, then its image is convex
If \(C \subset R^n\) is convex, then the inverse image is also convex
Definition (linear-fractional function) The linear fractional function is the composition of perspective function with an affine function
\(\(f(x) = \frac{Ax + b}{c^Tx + d}\)\) where \(\text{dom }f = \{x | c^T x + d > 0 \}\)
Affine function and linear function can be thought as the special cases of linear-fraction by setting \(c=0\)
Proposition (Linear-fractional and perspective function preserve convexity) The perspective functions, linear-fractional functions preserve convexity
1.4. Cone Set
Definition (cone) A set is cone if for every \(x \in C, \theta \geq 0\) then \(\theta x \in C\).
Definition (convex cone) A set is called convex cone if it is convex and cone
norm cones
The norm cone is convex cone
A special case of norm cone is the second order cone (also known as the Lorentz cone or ice-cream cone)
Definition (conic hull) The conic hull of \(C\) is the conic combination of all points in \(C\)
Definition (proper cone) A cone \(K \subset R^n\) is called a proper cone if it satisfies the following:
- convex
- closed
- solid (has noempty interiors)
- pointed (contains no line)
positive semidefinite cone
The set of symmetric positive semidefinite matrices is a proper cone
Definition (normal cone) The normal cone of a set \(C\) at a boundary point \(x_0\) denoted by \(N_C(x_0)\) such that
1.5. Generalized Inequalities
Definition (generalized inequality) A generalized inequality defined with respect to a proper cone \(K\) is a partial ordering on \(R^n\)
The following two generalized inequalities are commonly used
nonnegative orthant and componentwise inequality
The nonnegative orthant \(K=R^n_{+}\) is a proper cone. The associated generalized inequality \(\preceq_K\) is the componentwise inequality \((\forall{i})x_i \leq y_i\)
positive semidefinite cone and matrix inequality
The positive semidefinite cone is a proper cone. The associated inequality \(\preceq_K\) means
Definition (minimum, maximum) \(x \in S\) is the minimum element of \(S\) with respect to \(\preceq_{K}\) if for all \(y \in S\), \(x \preceq_{K} y\)
Definition (minimal, maximal) \(x \in S\) is the minimal element of \(S\) with respect to \(\preceq_{K}\) if $y \preceq_{K} x $, then \(y = x\)
1.6. Seperating and Supporting Hyperplanes
Theorem (separating hyperplane theorem) Let \(A,B\) be two disjoint convex set in \(R^n\), there exists a nonzero vector \(v\)
and scalar \(c\), such that for all \(x \in A, y \in B\),
The converse, however, is not necessarily true: the existence of a separating hyperplane implies \(A,B\) are disjoint.
We need additional constraints to make it true, for example, any two convex sets \(A,B\), at least one of which is open, are disjoint iff there exists a separating hyperplane.
Theorem (supporting hyperplane theorem) For a non-empty convex set \(C\) and its boundary point \(x_0\), there exists a supporting hyperplane to \(C\) at \(x_0\)
1.7. Dual Cones
Definition (dual cone) The following set \(K^{*}\) is called the dual cone with respect to cone \(K\) convex
Notice the similarity to the orthogonal space \(K^\perp = \{ y | y^Tx = 0 (\forall{x \in K}) \}\)
Lemma (dual cone is convex) Dual cone is a convex cone even \(K\) is neither convex nor cone
Again notice the similarity to the conjugate operation (i.e: conjugated function will be convex even the original is not)
dual cone of subspace
The dual cone of a subspace \(V\) is its orthogonal complement
dual cone of PSD cone
With the standard inner product over matrix
The positive semidefinite cone \(S^n_{+}\) is self-dual
dual cone of norm cone
The dual of norm cone \(K=\{ (x,t) | ||x|| \leq t \}\) is the cone defined by the dual norm
where the dual norm is defined by
Notice the similarity between the dual norm and matrix norm. Dual norm is a measure of size for a continuous linear function defined on a normed vector space.
Definition (dual generalized inequality) Suppose cone \(K\) is proper, then the dual cone \(K^*\) is also proper which induces the generalized inequality \(\prec_{K^*}\)
Lemma (dual generalized inequalities preserves order)
Criterion (dual characterization of minimum) \(x\) is the minimum element of \(S\) with respect to \(K\) iff for every \(\lambda \prec_{K^*} 0\), \(x\) is the unique minimizer of \(\lambda^T z\) over \(z \in S\)
Criterion (dual characterization of minimal) \(x\) is the minimal element of \(S\) with respect to \(K\) iff \(x\) minimizes a \(\lambda^T x\) for a particular \(\lambda \prec_{K^{*}} 0\)
2. Convex Functions
2.1. Basic Properties and Examples
Definition (Convex function) A function \(f: R^n \to R\) is convex if its domain is a convex set and for all \(x, y \in dom(f), \theta \in [0, 1]\) we have
Note that, convexity of the domain is a prerequisite for convexity of a function. When we call a function convex, we imply that its domain is convex
A function is strictly convex if strict inequality holds whenever $x \neq y \land \theta \in (0, 1) $
Similarly, a function \(f\) is (strictly) concave iff \(-f\) is (strictly) convex
affine function
An affine function is a convex function whose form is
norm
A norm \(|| \cdot ||\) is also an convex function become accoring to the triangle inequality, we have
max function
The max function is convex on \(R^n\) \(\(f(x)= max \{ x_1, ..., x_n \}\)\)
log-sum-exp
The log-sum-exp is convex on \(R^n\)
log-determinant
The log-det function is concave on \(S^n_{\)
Corollary (convex \(\land\) concave = affine) all affine (and also linear) functions are both convex and concave, and vice versa
Under the assumption of differentiability, we have two equivalent statements of convexity.
Criterion (1st order condition) Suppose \(f\) is differentiable. Then \(f\) is convex iff \(dom(f)\) is convex and \(\forall x, y \in dom(f)\)
It states that the first taylor approximation is a global underestimator for convex function. Conversely , if the first Taylor approximation is aways a global underestimator, then the function is convex.
Criterion (2nd order condition) Suppose \(f\) is twice differentiable, then \(f\) is convex iff \(dom(f)\) is convex and its Hessian is positive semidefinite \(\forall x \in dom(f)\)
For a function on \(\mathbb{R}\), this reduces to the simple condition \(f''(x) \geq 0\)
quadratic function
Consider the quadratic function \(f: \mathbb{R} \to \mathbb{R}\) given by
with \(P \in S^n\). \(f(x)\) is convex iff \(P \succeq 0\)
Definition (sublevel set, superlevel set) The \(\alpha\)-sublevel set of \(f: R^n \to R\) is defined as
Corollary (sublevel set and convex) Sublevel sets of a convex function are convex. Note that the converse is not true: a function can have all its sublevel sets convex, but not be a convex function. (e.g: \(f(x) = e^{-x}\))
Definition (epigraph) The epigraph of a function \(f: R^n \to R\) is defined as
\(dom(f)\) can be seen as a projection of \(epi(f)\).
Theorem (epigraph and convexity) A function is convex iff its epigraph is a convex set
Many results for convex functions can be proved geometrically by using epigraphs and apply results for convex sets.
convex set and indicator function
A convex set \(X\) can also be mapped to a indicator function \(\delta(x|X)\). A set if convex iff its indicator function is convex
2.2. Closedness and Semicontinuity
Definition (closed function) A function \(f\) is a closed function if its epigraph is a closed set.
Definition (semicontinuous) A function \(f\) is called lower semicontinuous at \(x \in \mathcal{X}\) if
for every sequence \({x_k} \subset \mathcal{X}\) with \(x_k \to x\). We say that \(f\) is lower semicontinuous at each point \(x\) in its domain \(\mathcal{X}\). Upper semicontinuous is defined that \(-f\) is lower semicontinuous.
2.3. Subgradient
Subgradient can be defined wrt non-differentiable function, it always exists.
Definition (subgradient) A subgradient of a convex function \(f\) at \(x\) is any \(g\) such that
Definition (subdifferential) Set of all subgradient is called subdifferential
If \(f\) is differentiable then, \(\partial f(x) = \{ \nabla f(x) \}\)
subdifferential of indicator function
Consider the indicator function \(I_C(x)\)
For \(x \in C\), \(\partial I_C(x) = N_C(x)\) where \(N_C(x)\) is the normal cone at \(x\)
Lemma (calculus of subgradient)
- scaling: \(\partial (\alpha f) = \alpha (\partial f)\)
- addition: \(\partial (f_1 + f_2) = \partial f_1 + \partial f_2\)
- affine: \(\partial g(x) = A^T \partial f(Ax+b)\) where \(g(x)=f(Ax+b)\)
2.4. Operations that preserve convexity
Proposition (nonnegative weighted sum) Nonnegative weight sum of convex function is convex function
Proposition (affine mapping) Affine mapping preserves convexity
least square problem is convex
that least square problem is a convex problem because norm is convex and its inside is an affine function.
Proposition (pointwise maximum) If \(f_1, f_2\) are convex function then their pointwise maximum is also convex
Extending this to infinite set of convex, we have the supremum
Proposition (supremum) If for each \(y\), \(f(x,y)\) is convex in \(x\), then its supreme over \(y\) is also convex
where the domain is
Proposition (scalar composition) Suppose \(f(x) = h(g(x)))\) where \(h: R \to R\) and \(g: R^n \to R\), then the composition rules is
- \(f\) is convex if \(h\) is convex, extended \(h\) is nondecreasing and \(g\) is convex
- \(f\) is convex if \(h\) is convex, extened \(h\) is nonincreasing and \(g\) is concave
Proposition (vector composition) Suppose \(f(x) = h(g_1(x), ..., g_k(x))\) where \(h: R^k \to R, g_i: R^n \to R\). \(f\) is convex if \(h\) is convex and nondecreasing in each argument, \(g\) is convex
Proposition (infimum) If \(f\) is convex in $(x,y) $ and \(C\) is a convex nonempty set, then the function \(g\) is convex if some \(g(x) > -\infty\)
Notice the difference between previous sup and inf: the inf requires \(y\) to be minimized over a convex set where sup does not require anything.
distance to a set
The distance of a point \(x\) to a set \(S\) is defined as
This function is convex
Proposition (perspective of a function) If \(f: R^n \to R\), then the perspective function of \(f\) is the function \(g: R^{n+1} \to R\) defined by
with domain
euclid norm squared
The perspective function on \(f(x)=x^Tx\) is convex
KL divergence
Applying perspective to the convex function \(f(x)=-\log(x)\), we know the convexity of the following
Taking sum, we show the KL divergence is convex over the joint discrete probability distribution \((p,q)=(p_1, ..., p_n, q_1, ..., q_n)\) where \(p=(p_1, ..., p_n), q=(q_1, ..., q_n)\)
2.5. Conjugate function
Definition (conjugate function) Let \(f: R^n \to R\). The function \(f*: R^n \to R\), defined over the dual space as
The domain of conjugate function consists of \(y \in R^n\) for which the sup is bounded.
Conjugate function \(f*\) is a convex function because it is taking sup over a affine function of \(y\). This is true regardless of \(f\)'s convexity.
conjugate of convex quadratic function
The conjugate of \(f(x) = \frac{1}{2}x^TQx\) with \(Q\) positive definite, then
conjugate of indicator function
The conjugate of indicator function f a set \(S\), i.e \(I(x) = 0\) with \(dom(I)=S\) is the support function of \(S\)
conjugate of log-sum-exp
The conjugate of log-sum-exp \(f(x) = \log (\sum_i e^{x_i})\) is the negative entropy function
with the domain restricted to the probability simplex.
Corollary (Fenchel's inequality) from the conjugate definition, it is obvious that
Lemma (conjugate's conjugate) If \(f\) is convex and closed, then
2.6. Quasiconvex functions
Definition (quasiconvex) A function \(R^n \to R\) is called quasiconvex if its domain and all sublevel sets
for all \(\alpha \in R\) are convex. A function is quasiconcave if \(-f\) is quasiconvex. It is quasilinear if it is both quasiconvex and quasiconcave
log is quasilinear
\(\log x\) on \(R_}\) is both quasiconvex and quasiconcave therefore quasilinear.
linear-fractional function is quasilinear
The function
with \(dom f=\{ x | c^Tx+d > 0 \}\) is quasiconvex and quasiconcave (quasilinear)
Criterion (quasiconvex functions on R) A continous function \(R \to R\) is quasiconvex in following cases
- \(f\) is nondecreasing
- \(f\) is nonincreasing
- \(f\) is nonincreasing until a point and nondecreasing after that
Criterion (Jensen's inequality for quasiconvex function) A function \(f\) is quasiconvex iff domain is convex and for all \(x,y\) and \(\theta \in [0, 1]\)
Criterion (first order condition) Suppose \(f: R^n \to R\) is differentiable. Then \(f\) is quasiconvex iff domain is convex and for all \(x,y \in dom(f)\)
Criterion (second order condition) Suppose \(f\) is twice differentiable. If \(f\) is quasiconvex, then for all \(x \in dom(f), y \in R^n\)
2.7. Log-Concave Function
Definition (log-convex) A function \(f\) is log concave iff \(\log f(x)\) is concave: for all \(x,y \in \text{dom} f, \theta \in [0, 1]\), we have
which means the value of average is at least the geometric mean
Notice log-convex and log-concave are not symmetric:
- A log-convex function is convex and is quaiconvex, a log-concave function is quasiconcave, however it is not necessarily a concave function.
- log-convex is preserved under sum, but log-concave is not.
2.8. Convexity wrt generalized inequality
Definition (K-nondecreasing) A function \(f: R^n \to R\) is called \(K\)-nondecreasing if
matrix monotone
- \(tr(X^{-1})\) is matrix decreasing on \(S_{^n\)
- \(\det(X)\) is matrix increasing on \(S_{++}^n\)
Criterion (monotonicity) A differentiable function \(f\), with convex domain, is \(K\)-nondecreasing iff
This is analagous to \(f\) is an nondecreasing function iff \(f'(x) > 0\). Notice the difference here is it requires the dual inequality \(K^*\)
Definition (K-convex) A function \(f: R^n \to R^m\) is called K-convex if for all \(x,y\) and \(\theta \in [0, 1]\)
3. Duality
3.1. Lagrange Dual Function
The main advantage of the Lagrangian dual function is, that it is concave even if the problem is not convex.
Definition (Lagrangian) The Lagrangian \(L: R^n \times R^m \times R^p \to R\) associated with the standard form optimization problem is
\(\lambda\) is called the Lagrange multiplier and \(\nu\) is called dual variables. Domain \(\mathcal{D} = \mathrm{dom} f_i \cap \mathrm{dom} h_i\)
Definition (dual function) The Lagrange dual function is the minimum value over \(x\)
Dual function over \((\lambda, \nu)\) is concave even the original problem is not convex.
Lemma (lower bound property) One of the motivations of Lagrangian is to gives the lower bound on the optimal value of the original problem.
This is because for any feasible point \(x\), we have \(f_i(x) \leq 0, h_i(x)=0\), for any \(\lambda \geq 0\)
Then we know
Taking \(\inf_x\) yields the conclusion of the lower bound.
3.2. Lagrange Dual Problem
Each pair of \((\lambda, \nu)\) gives a lower bound to the primal problem, a natural question is to ask what is the best lower bound \(g(\lambda, \nu)\).
Definition (dual problem) The Lagrange dual problem associated with the primal problem is a convex optimization problem even the primal problem is not
Dual feasible means there exists a pair \((\lambda, \nu)\) such that \(\lambda \succeq 0, g(\lambda, \nu) > -\infty\). We refer to \((\lambda^*, \nu^*)\) as dual optimal if they are optimal for the dual problem
The dual problem is always a convex optimization problem even the primal problem is not convex
dual problem of LP
Consider the standard form of LP
The dual function is when \(A^T\nu - \lambda + c = 0\)
Combined with the constraint \(\lambda \succeq 0\). The dual problem is
Proposition (weak duality) The optimal value of the dual problem, denoted \(d^*\) is by definition, the best lower bound on the optimal value of the primal problem \(p^*\), therefore
This property is called the weak duality
Proposition (strong duality) The strong duality holds when the optimal duality gap is zero
The strong duality does not always hold, one of condition to make it hold is the Slater's condition
Criterion (Slater, Sufficiency) The primal problem is convex optimization (\(f_i(x)\) is convex) and there exists a strictly feasible point (i.e. \(x \in \text{relint}(\mathcal{D})\))
In the case of affine constraint, the inequality needs to be strict. Therefore LP always achieves the strong duality.
Slater's condition also implies that the dual optimal value is attained when \(d^* > -\infty\)
Definition (proof, certificate) If we can find a dual feasible \((\lambda, \nu)\), we establish a lower bound on the optimal value of the primal problem: \(p^* \geq g(\lambda, \nu)\). Then the point provides proof or certificate that \(p^* \geq g(\lambda, \nu)\)
3.3. KKT Condition
See this pdf note
Criterion (Karush-Kuhn-Tucker, KKT) For any optimization problem, the following conditions are called KKT condition:
3.3.1. Necessasity
If strong duality holdes (i.e: \(x^*, \lambda^*, \nu^*\) are the primal and dual solutions with zero duality gap), the KKT is a necessary condition (i.e: \(x^*, \lambda^*, \nu^*\) must satisfy KKT condition).
For example, the strong duality is guaranteed if it satisfies the Slater's condition. Note that primal needs not to be convex, the necessity is guaranteed even for non-convex under the strong duality condition.
This can be proved by
The inequality should be all equality:
- \(f_0(x^*) = g(\lambda^*, \nu^*)\) is guaranteed by the strong duality
- \(\inf_x L(x, \lambda^*, \nu^*) = L(x^*, \lambda^*, \nu^*)\) implies \(x^*\) is a stationary point for \(L(x, \lambda^*, \nu^*)\), therefore, stationary condition should be satisfied
- \(\leq L(x^*, \lambda^*, \nu^*) = f_0(x^*)\) implies the complementary slackness
Additionally, \(x^*, \lambda^*, \nu^*\) are feasible points, so both primal and dual feasibility must be satisfied.
3.3.2. Sufficiency
If the primal problem is convex, then it also provides the sufficiency condition: if there exists \(x^*, \lambda^*, \nu^*\) satisfies KKT, then they are primal and dual optimal.
This can be proved by
The first equality is implied by: \(L(x, \lambda^*, \nu^*)\) is convex over \(x\) when the primal problem is convex and \(\lambda^* \geq 0\)
The second equality is implied by complementary slackness and feasibility condition
4. Reference
[1] Boyd, Stephen, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
[2] Bertsekas, Dimitri P. Convex optimization theory. Belmont: Athena Scientific, 2009.