# 0x043 Convex Optimization

## 1. Convex Sets

### 1.1. Affine Set

**Definition (affine set)** A set \(C \subseteq R^n\) is affine if for any \(x_1, x_2 \in C\) and any \(\theta \in R\), we have \(\theta x_1 + (1-\theta)x_2 \in C\)

note: an affine set is a subspace plus an offset

**Definition (affine hull)** The affine hull of \(C\) is the set of all affine combinations of points in \(C\). The affine dimension of \(C\) is the dimension of its affine hull, i.e. the dimension of the underlying subspace

solution set of linear equations

The solution set of a system of linear equations \(\{x \mid Ax = b\}\) is an affine set

hyperplane

A hyperplane \(\{x \mid a^Tx = b\}\) is also the solution set of a linear equation, therefore an affine set

### 1.2. Convex Set

**Definition (convex set)** A set \(C\) is convex if for any \(x_1, x_2 \in C\) and any \(0 \leq \theta \leq 1\),

\[\theta x_1 + (1-\theta)x_2 \in C\]

Note the difference between affine set and convex set is the range of \(\theta\)

**Definition (convex hull)** The convex hull of \(C\) is the set of all convex combinations of points in \(C\)

halfspace

A halfspace \(\{x \mid a^Tx \leq b\}\) is a convex set

ball and ellipsoid

A norm ball \(\{x \mid \|x - x_c\| \leq r\}\) is a convex set

A related family are ellipsoids \(\{x \mid (x-x_c)^T P^{-1} (x-x_c) \leq 1\}\) with \(P \succ 0\); they are also convex

polyhedra and simplex

A polyhedron is defined as the solution set of a finite number of halfspace and hyperplane constraints \(\{x \mid Ax \preceq b, Cx = d\}\), and is therefore convex

Simplexes are an important family of polyhedra. A simplex is defined by \(k+1\) affinely independent points \(v_0, ..., v_k\) (i.e. \(v_1-v_0, ..., v_k-v_0\) are linearly independent); the simplex is their convex hull

### 1.3. Operations that preserve convexity

**Proposition (Intersection preserves convexity)** The intersection of any number of convex sets is a convex set

For example, a polyhedron is the intersection of halfspaces and hyperplanes

positive semidefinite cone

The positive semidefinite cone can be expressed as

\[S^n_{+} = \bigcap_{z \neq 0} \{ X \in S^n \mid z^T X z \geq 0 \}\]

which is the intersection of an infinite number of halfspaces \(\{ X \in S^n \mid z^T X z \geq 0 \}\), one for each \(z\), and so is convex
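The halfspace view can be spot-checked numerically for \(2 \times 2\) symmetric matrices, where positive semidefiniteness reduces to nonnegative diagonal entries and determinant. A minimal sketch (the matrices `X`, `Y` and the helper names are illustrative, not from the text):

```python
import random

def quad_form(X, z):
    # z^T X z for a 2x2 symmetric matrix X = [[a, b], [b, c]]
    (a, b), (_, c) = X
    return a * z[0] ** 2 + 2 * b * z[0] * z[1] + c * z[1] ** 2

def is_psd_2x2(X):
    # 2x2 criterion: nonnegative diagonal entries and determinant
    (a, b), (_, c) = X
    return a >= 0 and c >= 0 and a * c - b * b >= 0

random.seed(0)
X = [[2.0, -1.0], [-1.0, 2.0]]   # PSD example
Y = [[1.0, 3.0], [3.0, 1.0]]     # indefinite example

# X lies in every sampled halfspace {X | z^T X z >= 0}; Y violates some of them
zs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
assert is_psd_2x2(X) and all(quad_form(X, z) >= 0 for z in zs)
assert not is_psd_2x2(Y) and any(quad_form(Y, z) < 0 for z in zs)
```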

**Proposition (convex set and halfspaces)** A closed convex set \(S\) is the intersection of all halfspaces that contain it

**Proposition (affine function preserves convexity)** The affine map \(y = Ax + b\) and its inverse preserve convexity: suppose \(S \subseteq R^n\) and \(C \subseteq R^m\) are convex and \(f: R^n \to R^m\) is affine, then the image and the inverse image

\[f(S) = \{ f(x) \mid x \in S \}, \qquad f^{-1}(C) = \{ x \mid f(x) \in C \}\]

are both convex

linear matrix inequality

The solution set of a linear matrix inequality (LMI)

\[\{ x \mid A(x) = x_1 A_1 + \cdots + x_n A_n \preceq B \}\]

is convex because it is the inverse image of the positive semidefinite cone under the affine function \(f(x) = B - A(x)\)

**Definition (perspective function)** The perspective function \(P: R^{n+1} \to R^n\) with domain \(\text{dom}(P) = R^{n} \times R_{++}\) is

\[P(z, t) = z/t\]

**Proposition (perspective function preserves convexity)** If \(C \subseteq \text{dom}(P)\) is convex, then its image \(P(C)\) is convex

If \(C \subseteq R^n\) is convex, then the inverse image \(P^{-1}(C)\) is also convex

**Definition (linear-fractional function)** The linear-fractional function is the composition of the perspective function with an affine function

\[f(x) = \frac{Ax + b}{c^Tx + d}, \qquad \text{dom } f = \{x \mid c^T x + d > 0 \}\]

Affine and linear functions can be thought of as special cases of linear-fractional functions, obtained by setting \(c=0, d=1\)

**Proposition (linear-fractional and perspective functions preserve convexity)** Perspective functions and linear-fractional functions preserve convexity

### 1.4. Cone Set

**Definition (cone)** A set \(C\) is a cone if for every \(x \in C\) and \(\theta \geq 0\), \(\theta x \in C\).

**Definition (convex cone)** A set is called a convex cone if it is both convex and a cone

norm cones

The norm cone \(\{ (x,t) \mid \|x\| \leq t \}\) is a convex cone

A special case of the norm cone is the *second-order* cone (also known as the Lorentz cone or ice-cream cone), obtained with the Euclidean norm

**Definition (conic hull)** The conic hull of \(C\) is the set of all conic combinations of points in \(C\)

**Definition (proper cone)** A cone \(K \subset R^n\) is called a proper cone if it satisfies the following:

- convex
- closed
- solid (has nonempty interior)
- pointed (contains no line)

positive semidefinite cone

The set of symmetric positive semidefinite matrices is a proper cone

**Definition (normal cone)** The normal cone of a set \(C\) at a boundary point \(x_0\), denoted \(N_C(x_0)\), is

\[N_C(x_0) = \{ g \mid g^T (x - x_0) \leq 0 \text{ for all } x \in C \}\]

### 1.5. Generalized Inequalities

**Definition (generalized inequality)** A generalized inequality defined with respect to a proper cone \(K\) is a partial ordering on \(R^n\):

\[x \preceq_K y \iff y - x \in K\]

The following two generalized inequalities are commonly used

nonnegative orthant and componentwise inequality

The nonnegative orthant \(K=R^n_{+}\) is a proper cone. The associated generalized inequality \(\preceq_K\) is the componentwise inequality \((\forall{i})x_i \leq y_i\)

positive semidefinite cone and matrix inequality

The positive semidefinite cone is a proper cone. The associated inequality \(X \preceq_K Y\) means that \(Y - X\) is positive semidefinite

**Definition (minimum, maximum)** \(x \in S\) is the minimum element of \(S\) with respect to \(\preceq_{K}\) if for all \(y \in S\), \(x \preceq_{K} y\)

**Definition (minimal, maximal)** \(x \in S\) is a minimal element of \(S\) with respect to \(\preceq_{K}\) if for \(y \in S\), \(y \preceq_{K} x\) implies \(y = x\)

### 1.6. Separating and Supporting Hyperplanes

**Theorem (separating hyperplane theorem)** Let \(A,B\) be two disjoint convex sets in \(R^n\). Then there exist a nonzero vector \(v\) and a scalar \(c\) such that for all \(x \in A, y \in B\),

\[v^T x \leq c \quad \text{and} \quad v^T y \geq c\]

The converse, however, is not necessarily true: the existence of a separating hyperplane does not imply that \(A,B\) are disjoint.

We need additional constraints to make it true, for example, any two convex sets \(A,B\), at least one of which is open, are disjoint iff there exists a separating hyperplane.

**Theorem (supporting hyperplane theorem)** For a non-empty convex set \(C\) and any boundary point \(x_0\), there exists a supporting hyperplane to \(C\) at \(x_0\): a hyperplane \(\{x \mid a^Tx = a^Tx_0\}\) with \(a \neq 0\) such that \(a^Tx \leq a^Tx_0\) for all \(x \in C\)

### 1.7. Dual Cones

**Definition (dual cone)** The following set \(K^{*}\) is called the dual cone with respect to a cone \(K\)

\[K^* = \{ y \mid y^T x \geq 0 \text{ for all } x \in K \}\]

Notice the similarity to the orthogonal space \(K^\perp = \{ y | y^Tx = 0 (\forall{x \in K}) \}\)

**Lemma (dual cone is convex)** The dual cone is a convex cone even if \(K\) is neither convex nor a cone

Again notice the similarity to the conjugate operation (i.e. the conjugate function is convex even if the original function is not)

dual cone of subspace

The dual cone of a subspace \(V\) is its orthogonal complement \(V^* = V^\perp\)

dual cone of PSD cone

With the standard inner product over matrices \(\langle X, Y \rangle = \text{tr}(XY)\), the positive semidefinite cone \(S^n_{+}\) is self-dual: \((S^n_{+})^* = S^n_{+}\)

dual cone of norm cone

The dual of the norm cone \(K=\{ (x,t) \mid \|x\| \leq t \}\) is the cone defined by the dual norm

\[K^* = \{ (y,s) \mid \|y\|_* \leq s \}\]

where the dual norm is defined by

\[\|y\|_* = \sup \{ y^T x \mid \|x\| \leq 1 \}\]

Notice the similarity between the dual norm and the matrix norm: the dual norm is a measure of size for a continuous linear function defined on a normed vector space.

**Definition (dual generalized inequality)** Suppose the cone \(K\) is proper; then the dual cone \(K^*\) is also proper, and it induces the generalized inequality \(\preceq_{K^*}\)

**Lemma (dual generalized inequalities preserve order)**

\[x \preceq_K y \iff \lambda^T x \leq \lambda^T y \text{ for all } \lambda \succeq_{K^*} 0\]

**Criterion (dual characterization of minimum)** \(x\) is the minimum element of \(S\) with respect to \(\preceq_K\) iff for every \(\lambda \succ_{K^*} 0\), \(x\) is the unique minimizer of \(\lambda^T z\) over \(z \in S\)

**Criterion (dual characterization of minimal)** If \(x\) minimizes \(\lambda^T z\) over \(z \in S\) for some \(\lambda \succ_{K^{*}} 0\), then \(x\) is minimal. The converse holds when \(S\) is convex, with \(\lambda \succeq_{K^*} 0\)

## 2. Convex Functions

### 2.1. Basic Properties and Examples

**Definition (convex function)** A function \(f: R^n \to R\) is convex if its domain is a convex set and for all \(x, y \in dom(f)\) and \(\theta \in [0, 1]\) we have

\[f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y)\]

Note that, convexity of the domain is a prerequisite for convexity of a function. When we call a function convex, we imply that its domain is convex

A function is strictly convex if strict inequality holds whenever \(x \neq y\) and \(\theta \in (0, 1)\)

Similarly, a function \(f\) is (strictly) **concave** iff \(-f\) is (strictly) convex

affine function

An affine function \(f(x) = a^T x + b\) is a convex function

norm

A norm \(\| \cdot \|\) is also a convex function because, by the triangle inequality and homogeneity, we have

\[\|\theta x + (1-\theta) y\| \leq \theta \|x\| + (1-\theta) \|y\|\]

max function

The max function \(f(x)= \max \{ x_1, ..., x_n \}\) is convex on \(R^n\)

log-sum-exp

The log-sum-exp function \(f(x) = \log (e^{x_1} + \cdots + e^{x_n})\) is convex on \(R^n\)
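Convexity of log-sum-exp can be spot-checked by sampling random pairs of points and verifying Jensen's inequality \(f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)\). A small sketch (function name, sample count, and ranges are arbitrary choices):

```python
import math, random

def logsumexp(x):
    # numerically stable log(sum_i exp(x_i))
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

random.seed(1)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    t = random.random()
    mid = [t * a + (1 - t) * b for a, b in zip(x, y)]
    # Jensen's inequality for a convex function (small tolerance for rounding)
    assert logsumexp(mid) <= t * logsumexp(x) + (1 - t) * logsumexp(y) + 1e-9
```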

log-determinant

The log-det function \(f(X) = \log \det X\) is concave on \(S^n_{++}\)

**Corollary (convex \(\land\) concave = affine)** All affine (and also linear) functions are both convex and concave; conversely, any function that is both convex and concave is affine

Under the assumption of differentiability, we have two equivalent statements of convexity.

**Criterion (1st order condition)** Suppose \(f\) is differentiable. Then \(f\) is convex iff \(dom(f)\) is convex and for all \(x, y \in dom(f)\)

\[f(y) \geq f(x) + \nabla f(x)^T (y - x)\]

It states that the first-order Taylor approximation is a global underestimator of a convex function. Conversely, if the first-order Taylor approximation is always a global underestimator, then the function is convex.

**Criterion (2nd order condition)** Suppose \(f\) is twice differentiable. Then \(f\) is convex iff \(dom(f)\) is convex and its Hessian is positive semidefinite: \(\nabla^2 f(x) \succeq 0\) for all \(x \in dom(f)\)

For a function on \(\mathbb{R}\), this reduces to the simple condition \(f''(x) \geq 0\)
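The scalar condition \(f''(x) \geq 0\) can be checked numerically with a central finite difference. A sketch using the convex softplus function \(f(x) = \log(1+e^x)\) as an illustrative example:

```python
import math

f = lambda x: math.log(1 + math.exp(x))   # softplus, a smooth convex function

def second_derivative(f, x, h=1e-4):
    # central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# f''(x) = e^x / (1 + e^x)^2 >= 0 everywhere, consistent with convexity
for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    approx = second_derivative(f, x)
    exact = math.exp(x) / (1 + math.exp(x))**2
    assert abs(approx - exact) < 1e-5 and approx >= 0
```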

quadratic function

Consider the quadratic function \(f: \mathbb{R}^n \to \mathbb{R}\) given by

\[f(x) = \frac{1}{2} x^T P x + q^T x + r\]

with \(P \in S^n\). \(f(x)\) is convex iff \(P \succeq 0\)

**Definition (sublevel set, superlevel set)** The \(\alpha\)-sublevel set of \(f: R^n \to R\) is defined as

\[C_\alpha = \{ x \in dom(f) \mid f(x) \leq \alpha \}\]

The \(\alpha\)-superlevel set is defined analogously with \(f(x) \geq \alpha\)

**Corollary (sublevel set and convex)** Sublevel sets of a convex function are convex. Note that the converse is not true: a function can have all its sublevel sets convex without being convex (e.g. \(f(x) = -e^{x}\))
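A standard counterexample of this kind is \(f(x) = -e^{x}\): every sublevel set is an interval (hence convex), yet the function is concave, not convex. A quick numerical check (the test points are arbitrary):

```python
import math

g = lambda x: -math.exp(x)   # concave, hence not convex, but quasiconvex

# quasiconvexity (Jensen for quasiconvex): g(tx+(1-t)y) <= max(g(x), g(y))
for (x, y, t) in [(0.0, 2.0, 0.5), (-1.0, 3.0, 0.25), (1.0, 5.0, 0.9)]:
    assert g(t * x + (1 - t) * y) <= max(g(x), g(y)) + 1e-12

# but convexity fails, e.g. at x=0, y=2, t=0.5
x, y, t = 0.0, 2.0, 0.5
assert g(t * x + (1 - t) * y) > t * g(x) + (1 - t) * g(y)
```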

**Definition (epigraph)** The epigraph of a function \(f: R^n \to R\) is defined as

\[epi(f) = \{ (x, t) \mid x \in dom(f), f(x) \leq t \}\]

\(dom(f)\) can be seen as the projection of \(epi(f)\) onto \(R^n\).

**Theorem (epigraph and convexity)** A function is convex iff its epigraph is a convex set

Many results for convex functions can be proved geometrically by using epigraphs and applying results for convex sets.

convex set and indicator function

A convex set \(X\) can also be mapped to an indicator function \(\delta(x|X)\), which is \(0\) on \(X\) and \(+\infty\) outside. A set is convex iff its indicator function is convex

### 2.2. Closedness and Semicontinuity

**Definition (closed function)** A function \(f\) is a *closed* function if its epigraph is a closed set.

**Definition (semicontinuous)** A function \(f\) is called lower semicontinuous at \(x \in \mathcal{X}\) if

\[f(x) \leq \liminf_{k \to \infty} f(x_k)\]

for every sequence \(\{x_k\} \subset \mathcal{X}\) with \(x_k \to x\). We say that \(f\) is lower semicontinuous if it is lower semicontinuous at each point \(x\) in its domain \(\mathcal{X}\). \(f\) is upper semicontinuous if \(-f\) is lower semicontinuous.

### 2.3. Subgradient

Subgradients can be defined for non-differentiable functions; for a convex function, a subgradient always exists at any point in the relative interior of its domain.

**Definition (subgradient)** A subgradient of a convex function \(f\) at \(x\) is any \(g\) such that

\[f(y) \geq f(x) + g^T (y - x) \text{ for all } y\]

**Definition (subdifferential)** The set of all subgradients at \(x\) is called the subdifferential

\[\partial f(x) = \{ g \mid f(y) \geq f(x) + g^T (y - x) \text{ for all } y \}\]

If \(f\) is differentiable, then \(\partial f(x) = \{ \nabla f(x) \}\)
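The subgradient inequality can be verified directly for \(f(x) = |x|\), whose subdifferential at \(0\) is \([-1, 1]\). A sketch checking the inequality on a grid of test points (grid range and helper name are illustrative):

```python
f = abs   # f(x) = |x| is convex but not differentiable at 0

def is_subgradient(g, x, points):
    # check f(y) >= f(x) + g*(y - x) on a grid of test points
    return all(f(y) >= f(x) + g * (y - x) - 1e-12 for y in points)

pts = [i / 10 for i in range(-50, 51)]
# at x = 0, every g in [-1, 1] is a subgradient ...
assert all(is_subgradient(g, 0.0, pts) for g in [-1.0, -0.5, 0.0, 0.5, 1.0])
# ... while anything outside [-1, 1] is not
assert not is_subgradient(1.5, 0.0, pts)
assert not is_subgradient(-2.0, 0.0, pts)
```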

subdifferential of indicator function

Consider the indicator function \(I_C(x)\), which is \(0\) for \(x \in C\) and \(+\infty\) otherwise.

For \(x \in C\), \(\partial I_C(x) = N_C(x)\) where \(N_C(x)\) is the normal cone at \(x\)

**Lemma (calculus of subgradient)**

- scaling: \(\partial (\alpha f) = \alpha (\partial f)\) for \(\alpha > 0\)
- addition: \(\partial (f_1 + f_2) = \partial f_1 + \partial f_2\)
- affine: \(\partial g(x) = A^T \partial f(Ax+b)\) where \(g(x)=f(Ax+b)\)

### 2.4. Operations that preserve convexity

**Proposition (nonnegative weighted sum)** A nonnegative weighted sum of convex functions \(f = w_1 f_1 + \cdots + w_m f_m\) with \(w_i \geq 0\) is convex

**Proposition (affine mapping)** Affine mapping preserves convexity: if \(f\) is convex, then \(g(x) = f(Ax + b)\) is convex

least square problem is convex

The least squares problem \(\min_x \|Ax - b\|_2^2\) is a convex problem because the norm is convex and its argument is an affine function.

**Proposition (pointwise maximum)** If \(f_1, f_2\) are convex functions, then their pointwise maximum \(f(x) = \max \{ f_1(x), f_2(x) \}\) is also convex

Extending this to an infinite set of convex functions, we have the supremum

**Proposition (supremum)** If for each \(y \in \mathcal{A}\), \(f(x,y)\) is convex in \(x\), then its supremum over \(y\)

\[g(x) = \sup_{y \in \mathcal{A}} f(x, y)\]

is also convex, where the domain is

\[dom(g) = \{ x \mid (x,y) \in dom(f) \text{ for all } y \in \mathcal{A}, \ \sup_{y \in \mathcal{A}} f(x,y) < \infty \}\]

**Proposition (scalar composition)** Suppose \(f(x) = h(g(x))\) where \(h: R \to R\) and \(g: R^n \to R\). The composition rules are

- \(f\) is convex if \(h\) is convex, the extended-value extension of \(h\) is nondecreasing, and \(g\) is convex
- \(f\) is convex if \(h\) is convex, the extended-value extension of \(h\) is nonincreasing, and \(g\) is concave

**Proposition (vector composition)** Suppose \(f(x) = h(g_1(x), ..., g_k(x))\) where \(h: R^k \to R, g_i: R^n \to R\). Then \(f\) is convex if \(h\) is convex and nondecreasing in each argument and each \(g_i\) is convex

**Proposition (infimum)** If \(f\) is convex in \((x,y)\) and \(C\) is a convex nonempty set, then the function

\[g(x) = \inf_{y \in C} f(x, y)\]

is convex, provided \(g(x) > -\infty\) for some \(x\)

Notice the difference between the previous sup and this inf: the inf requires \(y\) to range over a convex set and \(f\) to be jointly convex in \((x,y)\), whereas the sup requires neither.

distance to a set

The distance of a point \(x\) to a set \(S\) is defined as

\[dist(x, S) = \inf_{y \in S} \|x - y\|\]

This function is convex when \(S\) is convex, since \(\|x - y\|\) is jointly convex in \((x,y)\)

**Proposition (perspective of a function)** If \(f: R^n \to R\) is convex, then the perspective function of \(f\), the function \(g: R^{n+1} \to R\) defined by

\[g(x, t) = t f(x/t)\]

with domain

\[dom(g) = \{ (x,t) \mid x/t \in dom(f), \ t > 0 \}\]

is convex

euclidean norm squared

The perspective of \(f(x)=x^Tx\) is \(g(x,t) = x^Tx/t\), which is convex for \(t > 0\)

KL divergence

Applying the perspective operation to the convex function \(f(x)=-\log x\), we obtain the convexity of

\[g(x, t) = t \log t - t \log x\]

on \(R^2_{++}\). Taking sums, we see that the KL divergence

\[D_{KL}(p \| q) = \sum_i p_i \log \frac{p_i}{q_i}\]

is convex over the joint discrete probability distribution \((p,q)=(p_1, ..., p_n, q_1, ..., q_n)\) where \(p=(p_1, ..., p_n), q=(q_1, ..., q_n)\)
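Both nonnegativity and joint convexity of the KL divergence can be spot-checked on small discrete distributions. A sketch with arbitrary example distributions:

```python
import math

def kl(p, q):
    # KL divergence sum_i p_i log(p_i / q_i) for strictly positive distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.7, 0.3], [0.4, 0.6]
t = 0.3
pm = [t * a + (1 - t) * b for a, b in zip(p1, p2)]
qm = [t * a + (1 - t) * b for a, b in zip(q1, q2)]

assert kl(p1, q1) >= 0 and kl(p2, q2) >= 0           # nonnegativity
# joint convexity in (p, q): KL at the mixture is at most the mixture of KLs
assert kl(pm, qm) <= t * kl(p1, q1) + (1 - t) * kl(p2, q2) + 1e-12
```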

### 2.5. Conjugate function

**Definition (conjugate function)** Let \(f: R^n \to R\). The conjugate function \(f^*: R^n \to R\), defined over the dual space, is

\[f^*(y) = \sup_{x \in dom(f)} (y^T x - f(x))\]

The domain of the conjugate function consists of the \(y \in R^n\) for which the supremum is finite.

The conjugate function \(f^*\) is convex because it is a pointwise supremum of functions affine in \(y\). This is true regardless of the convexity of \(f\).

conjugate of convex quadratic function

The conjugate of \(f(x) = \frac{1}{2}x^TQx\) with \(Q\) positive definite is

\[f^*(y) = \frac{1}{2} y^T Q^{-1} y\]
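In one dimension (with \(Q\) a positive scalar \(q\)), the conjugate \(f^*(y) = y^2/2q\) can be verified by brute-force maximization of \(yx - f(x)\) over a fine grid, and Fenchel's inequality \(f(x) + f^*(y) \geq xy\) can be checked alongside. A sketch (grid bounds and test values are arbitrary):

```python
q = 2.0                        # positive definite "Q" in one dimension
f = lambda x: 0.5 * q * x * x

def conjugate(y, lo=-10.0, hi=10.0, n=200001):
    # brute-force sup_x (y*x - f(x)) over a fine grid
    step = (hi - lo) / (n - 1)
    return max(y * (lo + i * step) - f(lo + i * step) for i in range(n))

for y in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    closed_form = y * y / (2 * q)      # f*(y) = (1/2) y Q^{-1} y
    assert abs(conjugate(y) - closed_form) < 1e-6

# Fenchel's inequality: f(x) + f*(y) >= x*y for all x, y
for x in [-2.0, 0.3, 1.7]:
    for y in [-3.0, 0.0, 2.0]:
        assert f(x) + y * y / (2 * q) >= x * y - 1e-12
```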

conjugate of indicator function

The conjugate of the indicator function of a set \(S\), i.e. \(I(x) = 0\) with \(dom(I)=S\), is the support function of \(S\)

\[I^*(y) = \sup_{x \in S} y^T x\]

conjugate of log-sum-exp

The conjugate of the log-sum-exp function \(f(x) = \log (\sum_i e^{x_i})\) is the negative entropy function

\[f^*(y) = \sum_i y_i \log y_i\]

with the domain restricted to the probability simplex.

**Corollary (Fenchel's inequality)** From the conjugate definition, it is immediate that

\[f(x) + f^*(y) \geq x^T y\]

**Lemma (conjugate's conjugate)** If \(f\) is convex and closed, then \(f^{**} = f\)

### 2.6. Quasiconvex functions

**Definition (quasiconvex)** A function \(f: R^n \to R\) is called *quasiconvex* if its domain and all sublevel sets

\[S_\alpha = \{ x \in dom(f) \mid f(x) \leq \alpha \}\]

for all \(\alpha \in R\) are convex. A function is quasiconcave if \(-f\) is quasiconvex. It is quasilinear if it is both quasiconvex and quasiconcave

log is quasilinear

\(\log x\) on \(R_{++}\) is both quasiconvex and quasiconcave, therefore quasilinear.

linear-fractional function is quasilinear

The function

\[f(x) = \frac{a^Tx + b}{c^Tx + d}\]

with \(dom f=\{ x \mid c^Tx+d > 0 \}\) is quasiconvex and quasiconcave (quasilinear)

**Criterion (quasiconvex functions on R)** A continuous function \(f: R \to R\) is quasiconvex iff at least one of the following holds:

- \(f\) is nondecreasing
- \(f\) is nonincreasing
- \(f\) is nonincreasing until a point and nondecreasing after that

**Criterion (Jensen's inequality for quasiconvex functions)** A function \(f\) is quasiconvex iff its domain is convex and for all \(x,y\) and \(\theta \in [0, 1]\)

\[f(\theta x + (1-\theta) y) \leq \max \{ f(x), f(y) \}\]

**Criterion (first order condition)** Suppose \(f: R^n \to R\) is differentiable. Then \(f\) is quasiconvex iff its domain is convex and for all \(x,y \in dom(f)\)

\[f(y) \leq f(x) \implies \nabla f(x)^T (y - x) \leq 0\]

**Criterion (second order condition)** Suppose \(f\) is twice differentiable. If \(f\) is quasiconvex, then for all \(x \in dom(f), y \in R^n\)

\[y^T \nabla f(x) = 0 \implies y^T \nabla^2 f(x) y \geq 0\]

### 2.7. Log-Concave Function

**Definition (log-concave)** A function \(f\) is log-concave iff \(\log f(x)\) is concave: for all \(x,y \in \text{dom} f, \theta \in [0, 1]\), we have

\[f(\theta x + (1-\theta) y) \geq f(x)^{\theta} f(y)^{1-\theta}\]

which means the value at an average point is at least the geometric mean of the values

Notice log-convex and log-concave are not symmetric:

- A log-convex function is convex and quasiconvex; a log-concave function is quasiconcave, but not necessarily concave.
- Log-convexity is preserved under sums, but log-concavity is not.

### 2.8. Convexity wrt generalized inequality

**Definition (K-nondecreasing)** A function \(f: R^n \to R\) is called \(K\)-nondecreasing if

\[x \preceq_K y \implies f(x) \leq f(y)\]

matrix monotone

- \(tr(X^{-1})\) is matrix decreasing on \(S_{++}^n\)
- \(\det(X)\) is matrix increasing on \(S_{++}^n\)

**Criterion (monotonicity)** A differentiable function \(f\) with convex domain is \(K\)-nondecreasing iff

\[\nabla f(x) \succeq_{K^*} 0 \text{ for all } x \in dom(f)\]

This is analogous to the fact that \(f\) is a nondecreasing function iff \(f'(x) \geq 0\). Notice that the gradient condition here uses the dual inequality \(K^*\)

**Definition (K-convex)** A function \(f: R^n \to R^m\) is called \(K\)-convex if for all \(x,y\) and \(\theta \in [0, 1]\)

\[f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y)\]

## 3. Duality

### 3.1. Lagrange Dual Function

The main advantage of the Lagrange dual function is that it is concave even if the primal problem is not convex.

**Definition (Lagrangian)** The Lagrangian \(L: R^n \times R^m \times R^p \to R\) associated with the standard form optimization problem

\[\min f_0(x) \quad \text{s.t.} \quad f_i(x) \leq 0, \ h_i(x) = 0\]

is

\[L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)\]

\(\lambda_i, \nu_i\) are called the Lagrange multipliers, and the vectors \(\lambda, \nu\) are called the dual variables. The domain is \(\mathcal{D} = \bigcap_i \mathrm{dom}(f_i) \cap \bigcap_i \mathrm{dom}(h_i)\)

**Definition (dual function)** The Lagrange dual function is the infimum of the Lagrangian over \(x\)

\[g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)\]

The dual function of \((\lambda, \nu)\) is concave even if the original problem is not convex, since it is a pointwise infimum of functions affine in \((\lambda, \nu)\).

**Lemma (lower bound property)** One of the motivations of the Lagrangian is that it gives a lower bound on the optimal value \(p^*\) of the original problem: for any \(\lambda \succeq 0\),

\[g(\lambda, \nu) \leq p^*\]

This is because for any feasible point \(\tilde{x}\), we have \(f_i(\tilde{x}) \leq 0, h_i(\tilde{x})=0\), so for any \(\lambda \succeq 0\)

\[\sum_i \lambda_i f_i(\tilde{x}) + \sum_i \nu_i h_i(\tilde{x}) \leq 0\]

Then we know

\[L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_i \lambda_i f_i(\tilde{x}) + \sum_i \nu_i h_i(\tilde{x}) \leq f_0(\tilde{x})\]

Taking \(\inf_x\) on the left and the infimum over feasible \(\tilde{x}\) on the right yields the lower bound.

### 3.2. Lagrange Dual Problem

Each pair \((\lambda, \nu)\) gives a lower bound on the primal optimal value; a natural question is what the best lower bound \(g(\lambda, \nu)\) is.

**Definition (dual problem)** The Lagrange dual problem associated with the primal problem is

\[\max_{\lambda, \nu} g(\lambda, \nu) \quad \text{s.t.} \quad \lambda \succeq 0\]

which is a convex optimization problem even if the primal problem is not

**Dual feasible** means there exists a pair \((\lambda, \nu)\) such that \(\lambda \succeq 0, g(\lambda, \nu) > -\infty\). We refer to \((\lambda^*, \nu^*)\) as dual optimal if they are optimal for the dual problem

**The dual problem is always a convex optimization problem even if the primal problem is not convex**

dual problem of LP

Consider the standard form LP

\[\min c^T x \quad \text{s.t.} \quad Ax = b, \ x \succeq 0\]

The dual function is

\[g(\lambda, \nu) = \inf_x \left( c^Tx + \nu^T (Ax - b) - \lambda^T x \right) = \begin{cases} -b^T\nu & A^T\nu - \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}\]

Combined with the constraint \(\lambda \succeq 0\), the dual problem is

\[\max -b^T\nu \quad \text{s.t.} \quad A^T\nu + c \succeq 0\]
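The LP dual can be checked on a tiny instance: minimize \(x_1 + 2x_2\) subject to \(x_1 + x_2 = 1, x \succeq 0\), whose dual reduces to maximizing \(-\nu\) subject to \(\nu \geq -1\). A sketch solving both sides by direct enumeration (the problem data are made up for illustration):

```python
# primal: minimize c^T x  s.t.  x1 + x2 = 1, x >= 0   (standard-form LP)
c = [1.0, 2.0]

# parameterize the feasible segment as x = (s, 1-s), s in [0, 1], and minimize
vals = [c[0] * (i / 1000) + c[1] * (1 - i / 1000) for i in range(1001)]
p_star = min(vals)

# dual: maximize -b^T nu  s.t.  A^T nu + c >= 0, i.e. nu >= -1 and nu >= -2,
# so the dual optimum is attained at nu = -1
nu_star = -1.0
d_star = -1.0 * nu_star   # -b^T nu with b = [1]

assert abs(p_star - 1.0) < 1e-9
assert d_star == 1.0
assert d_star <= p_star + 1e-9        # weak duality
assert abs(p_star - d_star) < 1e-9    # strong duality holds for this LP
```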

**Proposition (weak duality)** The optimal value of the dual problem, denoted \(d^*\), is by definition the best lower bound on the optimal value \(p^*\) of the primal problem, therefore

\[d^* \leq p^*\]

This property is called weak duality

**Proposition (strong duality)** Strong duality holds when the optimal duality gap is zero

\[d^* = p^*\]

Strong duality does not always hold; one condition that guarantees it is Slater's condition

**Criterion (Slater, Sufficiency)** The primal problem is convex (each \(f_i(x)\) is convex and the equality constraints are affine) and there exists a strictly feasible point, i.e. \(x \in \text{relint}(\mathcal{D})\) with \(f_i(x) < 0\) for all \(i\)

For affine inequality constraints, the inequalities need not hold strictly. Therefore, an LP achieves strong duality whenever the primal or dual problem is feasible.

Slater's condition also implies that the dual optimal value is attained when \(d^* > -\infty\)

**Definition (proof, certificate)** If we can find a dual feasible \((\lambda, \nu)\), we establish a lower bound on the optimal value of the primal problem: \(p^* \geq g(\lambda, \nu)\). The point \((\lambda, \nu)\) provides a proof or certificate that \(p^* \geq g(\lambda, \nu)\)

### 3.3. KKT Condition


**Criterion (Karush-Kuhn-Tucker, KKT)** For an optimization problem with differentiable \(f_i, h_i\), the following conditions are called the KKT conditions:

- primal feasibility: \(f_i(x) \leq 0\), \(h_i(x) = 0\)
- dual feasibility: \(\lambda \succeq 0\)
- complementary slackness: \(\lambda_i f_i(x) = 0\) for all \(i\)
- stationarity: \(\nabla f_0(x) + \sum_i \lambda_i \nabla f_i(x) + \sum_i \nu_i \nabla h_i(x) = 0\)

#### 3.3.1. Necessity

If strong duality holds (i.e. \(x^*, \lambda^*, \nu^*\) are primal and dual solutions with zero duality gap), KKT is a necessary condition (i.e. \(x^*, \lambda^*, \nu^*\) must satisfy the KKT conditions).

For example, strong duality is guaranteed when Slater's condition is satisfied. Note that the primal need not be convex: necessity holds even for non-convex problems under strong duality.

This can be proved by the chain

\[f_0(x^*) = g(\lambda^*, \nu^*) = \inf_x L(x, \lambda^*, \nu^*) \leq L(x^*, \lambda^*, \nu^*) \leq f_0(x^*)\]

The inequalities must all be equalities:

- \(f_0(x^*) = g(\lambda^*, \nu^*)\) is guaranteed by strong duality
- \(\inf_x L(x, \lambda^*, \nu^*) = L(x^*, \lambda^*, \nu^*)\) implies \(x^*\) is a stationary point of \(L(x, \lambda^*, \nu^*)\); therefore the stationarity condition is satisfied
- \(L(x^*, \lambda^*, \nu^*) = f_0(x^*)\) implies \(\sum_i \lambda_i^* f_i(x^*) = 0\), i.e. complementary slackness

Additionally, \(x^*, \lambda^*, \nu^*\) are feasible points, so both primal and dual feasibility must be satisfied.

#### 3.3.2. Sufficiency

If the primal problem is convex, the KKT conditions are also sufficient: if there exist \(x^*, \lambda^*, \nu^*\) satisfying KKT, then they are primal and dual optimal.

This can be proved by

\[g(\lambda^*, \nu^*) = L(x^*, \lambda^*, \nu^*) = f_0(x^*)\]

The first equality is implied by the fact that \(L(x, \lambda^*, \nu^*)\) is convex in \(x\) (the primal problem is convex and \(\lambda^* \succeq 0\)), so the stationarity condition makes \(x^*\) a global minimizer of \(L(x, \lambda^*, \nu^*)\)

The second equality is implied by complementary slackness and the feasibility conditions
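The sufficiency argument can be illustrated on a one-dimensional convex problem: minimize \(x^2\) subject to \(1 - x \leq 0\), where \(x^* = 1, \lambda^* = 2\) satisfy all four KKT conditions. A sketch (the problem and multipliers are chosen by hand for illustration):

```python
# toy convex problem: minimize f0(x) = x^2  subject to  f1(x) = 1 - x <= 0
f0 = lambda x: x * x
x_star, lam_star = 1.0, 2.0

# KKT conditions at (x*, lambda*):
assert 1 - x_star <= 0                      # primal feasibility
assert lam_star >= 0                        # dual feasibility
assert lam_star * (1 - x_star) == 0         # complementary slackness
assert 2 * x_star + lam_star * (-1) == 0    # stationarity: f0' + lam * f1' = 0

# sufficiency: x* is indeed the constrained optimum
feasible = [1 + i / 100 for i in range(0, 300)]
assert all(f0(x) >= f0(x_star) for x in feasible)
```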

## 4. Reference

[1] Boyd, Stephen, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[2] Bertsekas, Dimitri P. Convex optimization theory. Belmont: Athena Scientific, 2009.