0x411 Convex Optimization

Convex Sets

Affine and convex sets

Definition (affine set) A set $C \subseteq R^n$ is affine if for any $x_1, x_2 \in C$ and any $\theta \in R$, we have $\theta x_1 + (1-\theta)x_2 \in C$

note: an affine set is a subspace plus an offset

Definition (affine hull) The affine hull of $C$ is the set of all affine combinations of points in $C$. The affine dimension of $C$ is the dimension of its affine hull, i.e. the dimension of the underlying subspace

Definition (convex set) A set $C$ is convex if for any $x_1, x_2 \in C$ and any $0 \leq \theta \leq 1$, we have $\theta x_1 + (1-\theta)x_2 \in C$. Equivalently, $C$ is convex if it contains the line segment between any two of its points.

Definition (convex hull) The convex hull of $C$ is the set of all convex combinations of points in $C$

Definition (cone) A set $C$ is a cone if for every $x \in C$ and $\theta \geq 0$, $\theta x \in C$. A set is called a convex cone if it is both convex and a cone

Definition (conic hull) The conic hull of $C$ is the set of all conic combinations of points in $C$

Definition (proper cone) A cone $K \subseteq R^n$ is called a proper cone if it is convex, closed, solid (has nonempty interior), and pointed (contains no line)

Operations that preserve convexity

Proposition (Intersection preserves convexity) The intersection of any number of convex sets is a convex set

For example

  • a polyhedron is the intersection of finitely many halfspaces and hyperplanes
  • the positive semidefinite cone can be expressed as $\bigcap_{z \neq 0} \{ X \in S^n \mid z^T X z \geq 0 \}$, the intersection of infinitely many halfspaces $\{ X \in S^n \mid z^T X z \geq 0 \}$, and is therefore convex

In fact, a closed convex set $S$ is the intersection of all halfspaces that contain it

$$ S = \bigcap \{ H \mid H \text{ a halfspace}, S \subseteq H \} $$
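The halfspace description of the positive semidefinite cone can be spot-checked numerically; a minimal sketch (the matrix and sample count are illustrative choices):

```python
import numpy as np

# Build a PSD matrix X = A A^T and verify that z^T X z >= 0 holds for
# many random directions z, i.e. X lies in each sampled halfspace.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = A @ A.T  # A A^T is always positive semidefinite

# small tolerance guards against floating-point round-off
ok = all(z @ X @ z >= -1e-9 for z in rng.standard_normal((1000, 3)))
print(ok)
```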

Proposition (Affine function preserves convexity) The image and the inverse image of a convex set under an affine function $f(x) = Ax + b$ are convex

Definition (perspective function) The perspective function $P: R^{n+1} \to R^n$, with domain $\mathrm{dom}\, P = R^{n} \times R_{++}$, is $P(z, t)=z/t$

Definition (linear-fractional function) A linear-fractional function is the composition of a perspective function with an affine function

Proposition (Linear-fractional and perspective functions preserve convexity) Perspective functions and linear-fractional functions preserve convexity

Generalized Inequalities

Definition (generalized inequality) A generalized inequality defined with respect to a proper cone $K$ is a partial order

$$x \preceq_{K} y \iff y-x \in K$$

$$x \prec_{K} y \iff y-x \in \operatorname{int} K$$

Definition (minimum, maximum) $x \in S$ is the minimum element of $S$ with respect to $\preceq_{K}$ if for all $y \in S$, $x \preceq_{K} y$

Definition (minimal, maximal) $x \in S$ is a minimal element of $S$ with respect to $\preceq_{K}$ if $y \in S$ and $y \preceq_{K} x$ imply $y = x$

Separating and Supporting Hyperplanes

Theorem (separating hyperplane theorem) Let $A,B$ be two disjoint convex sets in $R^n$. Then there exist a nonzero vector $v$ and a scalar $c$ such that for all $x \in A, y \in B$,

$$\langle x, v \rangle \geq c, \quad \langle y, v \rangle \leq c$$

Theorem (supporting hyperplane theorem) For a non-empty convex set $C$ and its boundary point $x_0$, there exists a supporting hyperplane to $C$ at $x_0$

Dual Cones and Generalized Inequalities

Definition (dual cone) The following set $K^{*}$ is called the dual cone with respect to cone $K$

$$K^{*} = \{ y |(\forall{ x \in K}) x^T y \geq 0 \}$$

Note that the dual cone of a subspace is the orthogonal complement of the subspace
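As a concrete instance (the orthant choice is illustrative): for $K = R^n_+$, testing $x^T y \geq 0$ on the generators of $K$ shows the nonnegative orthant is self-dual, $K^* = K$:

```python
import numpy as np

# For K = R^n_+, x^T y >= 0 for all x >= 0 iff it holds on the generators
# e_1, ..., e_n, i.e. iff every component y_i >= 0.  Hence K* = R^n_+.
def in_dual_cone_orthant(y):
    return bool(np.all(np.asarray(y) >= 0))

print(in_dual_cone_orthant([1.0, 0.0, 2.0]))   # True: y is in K*
print(in_dual_cone_orthant([1.0, -0.5, 2.0]))  # False: e_2^T y < 0
```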

Properties (dual generalized inequalities)

  • $x \preceq_K y \iff (\forall \lambda \succeq_{K^{*}} 0)\ \lambda^T x \leq \lambda^T y$
  • $x \prec_K y \iff (\forall \lambda \succeq_{K^{*}} 0, \lambda \neq 0)\ \lambda^T x < \lambda^T y$

Criterion (dual characterization of minimum) $x$ is the minimum element of $S$ with respect to $\preceq_K$ iff for every $\lambda \succ_{K^*} 0$, $x$ is the unique minimizer of $\lambda^T z$ over $z \in S$

Criterion (dual characterization of minimal) If $x$ minimizes $\lambda^T z$ over $z \in S$ for some particular $\lambda \succ_{K^{*}} 0$, then $x$ is a minimal element of $S$ with respect to $\preceq_K$ (the converse holds, with $\lambda \succeq_{K^*} 0$, when $S$ is convex)

Convex Functions

Basic Properties and Examples

Definition (Convex function) A function $f: R^n \to R$ is convex if its domain is a convex set and for all $x, y \in \mathrm{dom}\, f$ and $\theta \in [0, 1]$ we have

$$ f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta)f(y)$$

A function is strictly convex if strict inequality holds whenever $x \neq y$ and $\theta \in (0, 1)$
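The defining inequality can be checked directly on an example; a small sketch for the (illustrative) convex function $f(x) = x^2$:

```python
# Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) over a
# grid of theta values for the convex function f(x) = x**2.
f = lambda x: x**2

x, y = -1.0, 3.0
ok = all(
    f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
    for t in [i / 100 for i in range(101)]  # theta in [0, 1]
)
print(ok)
```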

Definition (Concave function) A function $f$ is (strictly) concave iff $-f$ is (strictly) convex

Proposition (convex $\land$ concave = affine) All affine (and hence linear) functions are both convex and concave; conversely, any function that is both convex and concave is affine

Criterion (1st order condition) Suppose $f$ is differentiable. Then $f$ is convex iff $dom(f)$ is convex and

$$(\forall x, y \in \mathrm{dom}\, f)\quad f(y) \geq f(x) + \nabla f(x)^T (y-x)$$

The first-order Taylor approximation of a convex function is thus a global underestimator
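A quick numerical illustration of the underestimator property, using $f(x) = e^x$ as an assumed example:

```python
import math

# The tangent line is a global underestimator: for convex f(x) = exp(x),
# f(y) >= f(x) + f'(x) * (y - x) at every pair of points.
f = math.exp       # f(x) = e^x, convex on R
fprime = math.exp  # f'(x) = e^x

pts = [-2.0, -0.5, 0.0, 1.0, 2.5]
ok = all(f(y) >= f(x) + fprime(x) * (y - x) - 1e-12 for x in pts for y in pts)
print(ok)
```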

Criterion (2nd order condition) Suppose $f$ is twice differentiable, then $f$ is convex iff $dom(f)$ is convex and its Hessian is positive semidefinite

$$(\forall x \in dom(f)) \nabla^2 f(x) \succeq 0$$

Definition (sublevel set, superlevel set) The $\alpha$-sublevel set of $f: R^n \to R$ is defined as

$$C_\alpha = \{ x \in \mathrm{dom}\, f \mid f(x) \leq \alpha \}$$

The $\alpha$-superlevel set is defined analogously with $f(x) \geq \alpha$. Sublevel sets of a convex function are convex (the converse does not hold)

Definition (epigraph) The epigraph of a function $f: R^n \to R$ is defined as

$$\text{epi } f = \{ (x,t) | x \in dom f, f(x) \leq t \}$$

Operations that preserve convexity

Proposition (nonnegative weighted sum) A nonnegative weighted sum of convex functions is convex

$$f = w_1 f_1 + \cdots + w_m f_m, \quad w_i \geq 0$$

Proposition (affine mapping) Composition with an affine mapping preserves convexity: if $f$ is convex, then so is

$$g(x) = f(Ax + b)$$

Proposition (supremum) If $f(x,y)$ is convex in $x$ for each $y$, then its pointwise supremum over $y$ is also convex

$$g(x) = \sup_{y} f(x,y)$$

Proposition (composition) Suppose $f(x) = h(g_1(x), \ldots, g_k(x))$ where $h: R^k \to R$ and $g_i: R^n \to R$. Then $f$ is convex if $h$ is convex and nondecreasing in each argument and each $g_i$ is convex

Proposition (infimum) If $f$ is convex in $(x,y)$ and $C$ is a convex nonempty set, then the function $g$ below is convex, provided $g(x) > -\infty$ for some $x$

$$g(x) = \inf_{y \in C} f(x,y) $$

The conjugate function

Definition (conjugate function) Let $f: R^n \to R$. The conjugate function $f^*: R^n \to R$ is defined as

$$f^*(y) = \sup_{x \in \mathrm{dom}\, f } (y^T x - f(x))$$
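The supremum can be approximated on a grid; a sketch for the illustrative choice $f(x) = x^2/2$, whose conjugate is $f^*(y) = y^2/2$:

```python
import numpy as np

# Approximate f*(y) = sup_x (y*x - f(x)) for f(x) = x**2 / 2 by maximizing
# over a fine grid; the closed form is f*(y) = y**2 / 2.
xs = np.linspace(-10, 10, 200001)
fx = 0.5 * xs**2

def conjugate(y):
    return np.max(y * xs - fx)

for y in [0.0, 1.0, -2.0]:
    print(conjugate(y), 0.5 * y**2)  # grid value vs closed form
```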

Convex Optimization Problems

Convex Optimization

A fundamental property of convex optimization problems is that any local optimal point is also global optimal

Definition (convex optimization) A convex optimization problem is of the form

$$\text{minimize } f_0(x)$$

$$\text{s.t. } f_i(x) \leq 0, \quad a_i^T x = b_i$$

where $f_i$ are convex functions

Criterion (optimality) Consider the constrained problem and suppose the objective $f_0$ is differentiable. Then $x$ is optimal iff $x$ lies in the feasible set $X$ and

$$(\forall y \in X)\ \nabla f_0(x)^T (y-x) \geq 0$$

For an unconstrained problem, the condition reduces to the well known condition

$$\nabla f_0(x) = 0$$
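For an unconstrained convex quadratic, the condition $\nabla f_0(x) = 0$ is a linear system; a minimal sketch (the matrix and vector are illustrative):

```python
import numpy as np

# For f(x) = 0.5 x^T P x + q^T x with P positive definite, optimality
# grad f(x) = P x + q = 0 gives x* = -P^{-1} q.
P = np.array([[2.0, 0.5], [0.5, 1.0]])  # positive definite
q = np.array([1.0, -1.0])

x_star = np.linalg.solve(P, -q)
grad = P @ x_star + q
print(np.allclose(grad, 0.0))  # gradient vanishes at the minimizer
```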

Definition (quadratic program, QP) The convex optimization problem is called a quadratic program (QP) if it can be expressed in the form

$$\text{minimize } \frac{1}{2} x^T P x + q^T x + r$$

$$\text{s.t. } Gx \preceq h, \quad Ax = b$$

where $P \in S^n_{+}$
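When only equality constraints are present, a QP reduces to a linear KKT system; a sketch with illustrative data (inequality constraints omitted for brevity):

```python
import numpy as np

# Equality-constrained QP: minimize 0.5 x^T P x + q^T x s.t. A x = b.
# Stationarity P x + A^T nu = -q plus feasibility A x = b give the
# linear KKT system [[P, A^T], [A, 0]] [x; nu] = [-q; b].
P = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

K = np.block([[P, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x, nu = sol[:2], sol[2:]
print(x)                       # minimizer on the constraint set
print(np.allclose(A @ x, b))   # feasibility holds
```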

Definition (second-order cone programming, SOCP) A second-order cone program (SOCP) has the following form

$$\text{minimize } f^T x$$

$$\text{s.t. } \| A_i x + b_i \|_{2} \leq c_i^T x + d_i, \quad i = 1, \ldots, m$$

Definition (semidefinite programming, SDP) When $K = S^k_{+}$, the cone of positive semidefinite matrices, the associated conic form problem is called a semidefinite program (SDP)

$$\text{minimize }c^T x$$

$$\text{s.t. } x_1 F_1 + … + x_n F_n + G \preceq 0, Ax = b$$

where $G, F_i \in S^k$


Lagrange Dual Problem

Definition (Lagrangian) The Lagrangian $L: R^n \times R^m \times R^p \to R$ associated with the standard form optimization problem is

$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)$$

$\lambda_i$ and $\nu_i$ are called the Lagrange multipliers; together, $(\lambda, \nu)$ are the dual variables. The domain is $\mathcal{D} = \bigcap_{i=0}^{m} \mathrm{dom}\, f_i \cap \bigcap_{i=1}^{p} \mathrm{dom}\, h_i$

Definition (dual function) The Lagrange dual function is the infimum of the Lagrangian over $x$

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$$

The dual function is concave in $(\lambda, \nu)$ even if the original problem is not convex.

One motivation for the Lagrangian is that the dual function gives lower bounds on the optimal value of the original problem: for any $\lambda \succeq 0$ and any $\nu$, $g(\lambda, \nu) \leq p^*$.

Definition (dual problem) The Lagrange dual problem associated with the primal problem is a convex optimization problem even if the primal problem is not

$$\text{maximize } g(\lambda, \nu) $$

$$\text{subject to } \lambda \succeq 0 $$

A pair $(\lambda, \nu)$ is dual feasible if $\lambda \succeq 0$ and $g(\lambda, \nu) > -\infty$. We refer to $(\lambda^*, \nu^*)$ as dual optimal if they are optimal for the dual problem

Proposition (weak duality) The optimal value of the dual problem, denoted $d^*$, is by definition the best lower bound on the optimal value $p^*$ of the primal problem; therefore

$$d^* \leq p^*$$

This property is called weak duality
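Weak duality can be seen on a worked example (assumed for illustration, not from the text): minimize $x^2$ subject to $x \geq 1$:

```python
import numpy as np

# Primal: minimize x**2 s.t. 1 - x <= 0, so f_0(x) = x**2, f_1(x) = 1 - x.
# Dual function: g(lam) = inf_x [x**2 + lam*(1 - x)] = lam - lam**2 / 4,
# which is concave in lam.
def g(lam):
    return lam - lam**2 / 4.0

p_star = 1.0  # primal optimum, attained at x = 1
lams = np.linspace(0.0, 5.0, 501)
print(all(g(l) <= p_star + 1e-12 for l in lams))  # weak duality: g <= p*
d_star = max(g(l) for l in lams)
print(d_star)  # maximized near lam = 2; the gap is zero for this problem
```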

Proposition (strong duality and Slater’s condition) Strong duality holds when the optimal duality gap is zero

$$d^* = p^*$$

Strong duality does not always hold. One sufficient condition is Slater’s condition: there exists a strictly feasible point, i.e. some $x$ with

$$f_i(x) < 0, \quad i = 1, \ldots, m, \quad Ax = b$$

Definition (proof, certificate) A dual feasible point $(\lambda, \nu)$ provides a proof, or certificate, of the lower bound $p^* \geq g(\lambda, \nu)$ on the optimal value of the primal problem

Criterion (Karush-Kuhn-Tucker, KKT) Let $x^*, (\lambda^*, \nu^*)$ be any primal and dual optimal points with zero duality gap. Then

$$ f_i(x^*) \leq 0, \quad i = 1, \ldots, m $$

$$h_i(x^*) = 0, \quad i = 1, \ldots, p$$

$$\lambda_i^* \geq 0, \quad i = 1, \ldots, m$$

$$\lambda_i^* f_i(x^*) = 0, \quad i = 1, \ldots, m$$

$$ \nabla f_0 (x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i (x^*) = 0 $$

If a convex optimization problem with differentiable objective and constraint functions satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality
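The KKT conditions can be verified on a toy problem (an assumed example): minimize $x^2$ subject to $1 - x \leq 0$, whose primal-dual optimum is $x^* = 1$, $\lambda^* = 2$:

```python
# KKT check for: minimize x**2 s.t. 1 - x <= 0 (an assumed illustrative
# problem); the primal-dual optimal pair is x* = 1, lam* = 2.
x_star, lam_star = 1.0, 2.0

conditions = {
    "primal feasibility": (1.0 - x_star) <= 0.0,
    "dual feasibility": lam_star >= 0.0,
    "complementary slackness": lam_star * (1.0 - x_star) == 0.0,
    # stationarity: d/dx [x**2 + lam*(1 - x)] = 2x - lam = 0 at (x*, lam*)
    "stationarity": (2.0 * x_star - lam_star) == 0.0,
}
print(all(conditions.values()))
```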


[1] Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.