---
typora-copy-images-to: ./img
---

$$ \def\R{\mathbb R} \def\abs#1{\left |#1\right |} \def\norm#1{\left|\left|#1\right|\right|} $$

# CS450

GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.

Spring 2018: Advanced Algorithms

[TOC]

## Greedy & matroid

  • theory of computation
    • computable/decidable : halt
    • NL : nondeterministic logarithmic-space
    • P : deterministic TM polytime
    • NP : solution verified in polytime by deterministic TM or solved by nondeterministic TM in polytime
    • NP-complete : hardest problem in NP
    • NP-hard : at least as hard as hardest problem of NP, reduced in polytime, maybe not even decidable, not enforced to be elements of NP
    • PSPACE : TM in polyspace
    • EXPTIME : deterministic TM in exptime
    • EXPSPACE : deterministic TM in exponential space
  • minimum spanning tree
    • input : connected undirected graph $G=(V, E)$ with edge-weights $w:E\to\R$
    • output : spanning tree $T$ of maximum weight $\sum_{e\in T}w(e)$, with $n-1$ edges (minimization is equivalent, negate the weights)
  • greedy algorithm : $\Theta(\abs{E}\log\abs{E})$
    • lines 3-4 : almost linear time using UNION-FIND
    • correctness
      • downward closed : $(V, Y)$ acyclic implies $(V, X)$ acyclic $\forall X\subseteq Y$
      • $X, Y$ acyclic edge sets with $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X+e$ acyclic
      • proof : $S=\{s_1,\ldots, s_{n-1}\}$ spanning tree returned, edges in decreasing weight order; suppose for contradiction there exists a higher-weight $T=\{t_1,\ldots,t_{n-1}\}$, also in decreasing order; let $p$ be the smallest index s.t. $w(t_p)>w(s_p)$; there must exist $e\in\{t_1,\ldots, t_p\}\setminus\{s_1,\ldots, s_{p-1}\}$ s.t. $\{s_1,\ldots, s_{p-1},e\}$ is acyclic, so greedy should have added $e$ instead
      • acyclic graph with $k$ edges has $n-k$ components
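
A minimal Python sketch of this greedy (Kruskal) with union-find; the edge representation and names are illustrative assumptions:

```python
def kruskal(n, edges):
    """Maximum-weight spanning forest: scan edges in decreasing weight,
    keep an edge whenever it does not close a cycle (union-find test)."""
    parent = list(range(n))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):   # O(|E| log |E|) dominates
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding (u, v) keeps the edge set acyclic
            parent[ru] = rv
            tree.append((u, v, w))
    return tree
```
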
  • matroid : $M=(E,J)$ on finite ground set $E$ and family $J$ of subsets called independent sets
    • $X\subseteq Y$ and $Y\in J$ implies $X\in J$
    • $X, Y \in J$ and $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X\cup \{e\} \in J$
    • independent sets often of exponential size, assume access to an independence oracle (whether a given $I\subseteq E$ belongs to $J$)
    • base : maximal independent set, all bases have the same cardinality
    • Rado/Gale/Edmonds theorem : greedy finds a maximum-weight base iff $M$ is a matroid
      • $\Leftarrow$ : greedy correctness
      • $\Rightarrow$ : suppose not, violate at least one of two axioms
        • downward-closed : $S\subset T$ s.t. $S\not\in J$ and $T\in J$, consider $w_i=\begin{cases}2 & i \in S\\ 1 & i\in T\setminus S\\ 0 & \text{o/w}\end{cases}$, greedy first picks $S_1\subset S$, then at most $T\setminus S$, and returns $2\abs{S_1}+\abs{T\setminus S}$ which is less than $w(T)$
        • exchange : $S, T\in J$ independent sets s.t. $\abs{S}<\abs{T}$ and $\forall i\in T\setminus S$, $S+i\not\in J$, consider $w_i=\begin{cases}1+\frac{1}{2\abs{S}} & i\in S\\ 1 & i\in T\setminus S\\ 0 & \text{o/w}\end{cases}$, greedy picks all of $S$ and returns $\abs{S}+1/2$ which is less than $\abs{T}$
    • graphic matroid : $E$ edges of graph, $J=\{F\subseteq E : F \text{ acyclic}\}$
    • $k$-uniform matroid : $J=\{X\subseteq E : \abs{X}\le k\}$, rank $k$
    • partition matroid : $J=\{X\subseteq E : \abs{E_i\cap X}\le k_i\}$ for disjoint sets $E_i$, $i=1,\ldots, l$
    • linear matroid : defined from matrix $A$, $E$ index set of cols, $X\subseteq E$ with $A_X$ matrix of cols indexed by $X$, $J=\{X\subseteq E : \operatorname{rank}(A_X)=\abs{X}\}$
    • truncated matroid : $M_k=(E,J_k)$ s.t. $J_k=\{X\in J:\abs{X}\le k\}$
    • matroid intersection : $M_1=(E,J_1)$ and $M_2=(E,J_2)$ give $M_1\cap M_2=(E,J_1\cap J_2)$, in general not a matroid (the cardinality axiom may fail)
      • Edmonds/Lawler : efficient algorithm for finding max-weight independent set in intersection two matroids
    • laminar matroid : $J=\{S\subseteq E : \abs{S\cap X}\le k_X~~\forall X\in F\}$ with $F$ a laminar family of subsets of ground set $E$ (any two sets disjoint or nested) and $\{k_X\}_{X\in F}$ positive integers
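
A minimal Python sketch of the generic matroid greedy (the Rado/Gale/Edmonds direction above), assuming nonnegative weights and an independence oracle; the oracle name and the toy 2-uniform example are illustrative, not from the notes:

```python
def matroid_greedy(ground_set, weight, is_independent):
    """Max-weight independent set: scan in decreasing weight, keep what fits."""
    S = set()
    for e in sorted(ground_set, key=weight, reverse=True):
        if is_independent(S | {e}):        # one oracle call per element
            S.add(e)
    return S

# toy example: 2-uniform matroid on {a, b, c}
base = matroid_greedy({'a', 'b', 'c'},
                      {'a': 3, 'b': 1, 'c': 2}.get,
                      lambda X: len(X) <= 2)    # -> {'a', 'c'}
```
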
  • graph orientation : replace each undirected edge by an arc in either direction
  • bipartite matching
    • matching : $M\subseteq E$ s.t. every vertex incident to at most one edge of $M$, $\abs{{e\in M : v\in e}}\le 1~~\forall v\in V$
    • input : bipartite graph $G=(V, E)$
    • output : matching of max size
    • matroid intersection : vertex bipartition $A$ and $B$, $E=\bigcup\{\delta(v):v\in A\}$ with $\delta(v)$ the edges incident to $v$, $M_A=(E, J_A)$ with $J_A=\{F\subseteq E : \abs{F\cap\delta(v)}\le 1~~\forall v\in A\}$ and similarly $M_B$
    • alternating path : alternate between edges in $M$ and $E\setminus M$
    • augmenting path : alternating path in which first and last vertices unmatched
    • maximum matching : iff no augmenting path
      • $\Rightarrow$ : contradiction, $P$ augmenting path, $M'=M\Delta P$ greater cardinality
      • $\Leftarrow$ : contradiction, $M^*$ maximum matching with $\abs{M^*}>\abs{M}$, $Q=M\Delta M^*$ has more edges from $M^*$ than from $M$, each vertex incident to at most one edge in $M\cap Q$ and $M^*\cap Q$, so an augmenting path must exist
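
A short Python sketch of maximum matching via augmenting paths (Kuhn's algorithm); the adjacency representation is an assumption for illustration:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """adj[a] lists B-neighbors of vertex a in A; by the theorem above the
    matching is maximum exactly when no augmenting path remains."""
    match_b = [-1] * n_right           # match_b[b] = A-vertex matched to b

    def try_augment(a, seen):
        for b in adj[a]:
            if not seen[b]:
                seen[b] = True
                # b is free, or its current partner can be re-matched elsewhere
                if match_b[b] == -1 or try_augment(match_b[b], seen):
                    match_b[b] = a
                    return True
        return False

    size = 0
    for a in range(n_left):
        if try_augment(a, [False] * n_right):
            size += 1
    return size, match_b
```
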
  • colorful spanning trees : intersection of partition matroids
    • input : undirected graph
    • output : spanning tree with all edges different color
  • arborescences : intersection of graphic and partition matroids
    • input : directed graph $D=(V, A)$, special root vertex $r\in V$
    • output : spanning tree directed away from $r$ (every vertex reachable from $r$ by a directed path)

## Linear programming

  • linear programming : optimize linear objective function for $n$ variables $x_1,\ldots,x_n\in\R$ subject to $m$ linear constraints
    • minimize $\sum_{i=1}^n c_i x_i$ s.t. $\sum_i e_{i,j}x_i=b_j$, $\sum_i d_{i,k} x_i\ge g_k$, $\sum_i f_{i,p}x_i\le l_p$
    • relaxed : approximation of optimal result for integer programs
  • simplex method : start from extreme point, move to better neighbors since convex polytope, bad exptime examples
  • ellipsoid method : binary search for optimal value, start with an ellipsoid surrounding the feasible region and test its center, if not feasible cut the ellipsoid in two parts using a violated constraint, polytime but slow in practice
  • convex combination of points : $\sum_{i=1}^n\lambda_i x_i$ with $\sum_{i=1}^n \lambda_i = 1$ and $\lambda_i\in [0,1]$
  • extreme point : if cannot be written as convex combination of other feasible solutions
  • convex polyhedron : $P=\{x\in\R^n : Ax=b, x\ge 0\}$
    • if bounded : convex polytope
  • optimum : if feasible region bounded, there exists an extreme point optimum
    • proof : $x^*$ optimal, any feasible point written as convex combination of extreme points $x^{(i)}$, $x^*=\sum_i\lambda_i x^{(i)}$, $c$ objective vector, $c^\top x^*=c^\top(\sum_i\lambda_i x^{(i)})=\sum_i\lambda_i c^\top x^{(i)}$ implies existence of $x^{(i)}$ s.t. $c^\top x^{(i)}\ge c^\top x^*$
  • maximum weight bipartite perfect matching
    • input : undirected graph $G=(V, E)$ partitioned into $A$ and $B$ with edge-weights $w:E\to\R$
    • output : perfect matching $M$ that maximize $w(M)=\sum_{e\in M} w(e)$
    • LP : maximize $\sum_{e\in E} x_ew_e$ s.t. $\sum_{e=(a,b)\in E}x_e=1~~\forall a\in A$ and $\sum_{e=(a,b)\in E} x_e=1~~\forall b\in B$ with $x_e\ge 0~~\forall e\in E$
    • any extreme point is integral
      • $G=(V_1,V_2, E)$, suppose for contradiction $E_f=\{e\in E: 0< x_e^* < 1\}\not=\emptyset$; it must contain a cycle $e_1,\ldots,e_{2k}$ (of even length, since bipartite); let $y_e=\begin{cases}x_e^*+\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*-\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & \text{o/w}\end{cases}$ and $z_e=\begin{cases}x_e^*-\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*+\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & \text{o/w}\end{cases}$ with $\epsilon =\min\{x_e^*,(1-x_e^*):e\in E_f\}$; then $x^*=\frac{1}{2}(y+z)$ contradicts extreme point, after showing $y,z$ are feasible (the alternating $\pm\epsilon$ preserves the degree constraints)
  • $k$-regular graph : degree of each vertex equals $k$
  • bipartite vertex cover
    • input : undirected graph $G=(V, E)$ with node-weight $w:V\to\R$
    • output : vertex cover $C$ s.t. $e\cap C\not=\emptyset~~\forall e\in E$ that minimizes $w(C)=\sum_{v\in C}w(v)$
    • LP : minimize $\sum_{v\in V}x_v w(v)$ s.t. $x_u+x_v\ge 1~~\forall\{u, v\}\in E$ and $0\le x_v\le 1~~\forall v\in V$
    • any extreme point is integral : $V_f=\{v : 0<x_v^*<1\}$, $A_f=V_f\cap A$ and $B_f=V_f\cap B$, same arguments on those sets
  • duality
    • primal : minimize $\sum_{i=1}^n c_i x_i$ s.t. $\sum_{i=1}^n A_{ji} x_i \geq b_j~~\forall j = 1,\dots,m$, $x\ge 0$
    • dual : maximize $\sum_{j=1}^m b_j y_j$ s.t. $\sum_{j=1}^m A_{ji} y_j \leq c_i~~\forall i = 1,\dots,n$, $y\ge 0$
    • maximization problem : replace $x_i$ with $-x_i$ in constraints and objective function
    • primal-feasible : $x$ feasible solution to primal
    • weak : $x$ primal-feasible and $y$ dual-feasible implies $\sum _ { i = 1} ^ { n } c _ { i } x _ { i } \geq \sum _ { j = 1} ^ { m } b _ { j } y _ { j }$
    • strong : $x$ primal optimal solution and $y$ dual optimal solution implies $\sum _ { i = 1} ^ { n } c _ { i } x _ { i } = \sum _ { j = 1} ^ { m } b _ { j } y _ { j }$
    • unbounded primal/dual : dual/primal infeasible
    • examples : max-flow = min-cut
    • complementary slackness : feasible $x$ and $y$ are both optimal iff $x_i > 0\Rightarrow c_i = \sum_{j=1}^m A_{ji} y_j~~\forall i = 1,\dots,n$ and $y_j > 0\Rightarrow b_j = \sum_{i=1}^n A_{ji} x_i~~\forall j = 1,\ldots,m$
    • short notation
      • primal : $\min\left\{ c^T x : Ax \geq b, x \geq 0\right\}$
      • dual : $\max\left\{ b^T y : A^T y \leq c, y \geq 0\right\}$
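
A tiny worked pair (numbers chosen here for illustration): the primal $\min\{2x_1+3x_2 : x_1+x_2\ge 4,\ x\ge 0\}$ has dual $\max\{4y : y\le 2,\ y\le 3,\ y\ge 0\}$; both attain value $8$ (at $x=(4,0)$ and $y=2$), matching strong duality, and complementary slackness checks out: $x_1>0$ pairs with the tight dual constraint $y\le 2$, and $y>0$ pairs with the tight primal constraint $x_1+x_2\ge 4$.
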
  • bipartite maximum cardinality matching
    • input : $G=(A\cup B, E)$
    • output : matching $M$
    • primal : maximize $\sum _ { e \in E } x _ { e }$ s.t. $\sum _ { e = ( a ,b ) \in E } x _ { e } \leq 1\quad \forall a \in A$, $\sum _ { e = ( a ,b ) \in E } x _ { e } \leq 1\quad \forall b \in B$ and $x_e\ge 0$
    • dual : vertex cover relaxation $C$, minimize $\sum_{v \in A \cup B} y_v$ s.t. $y_a + y_b \geq 1~~\text{for } (a,b) \in E$ and $y_v\ge 0$
    • weak duality : $| M | \leq | C |$
    • Konig theorem : both problems integral, $|M^\star| = |C^\star|$
  • minimum cost bipartite perfect matching : continued
    • min-cost : minimize $\sum _ { e \in E } c ( e ) x _ { e }$ s.t. $\sum _ { b \in B : ( a ,b ) \in E } x _ { a b } = 1\quad \forall a \in A$ and $\sum _ { a \in A : ( a ,b ) \in E } x _ { a b } = 1\quad \forall b \in B$ with $x _ { e } \geq 0\quad \forall e \in E$
      • solution : inefficient, take dual to get unweighted graph solvable with augmenting paths
      • rewrite : minimize $\sum _ { e \in E } c ( e ) x _ { e }$ s.t $\sum _ { b \in B : ( a ,b ) \in E } x _ { a b } \geq 1\quad \forall a \in A$, $- \sum _ { b \in B : ( a ,b ) \in E } x _ { a b } \geq - 1\quad \forall a \in A$, $\sum _ { a \in A : ( a ,b ) \in E } x _ { a b } \geq 1\quad \forall b \in B$ and $-\sum _ { a \in A : ( a ,b ) \in E } x _ { a b } \geq - 1\quad \forall b \in B$ with $x _ { e } \geq 0\quad \forall e \in E$
      • dual : maximize $\sum_{a \in A}(u_a^+ - u_a^-) + \sum_{b \in B}(v_b^+ - v_b^-)$ s.t. $(u_a^+ - u_a^-) + (v_b^+ - v_b^-) \leq c(e)~~\forall e = (a,b) \in E$ with $u_a^+, u_a^-, v_b^+, v_b^- \geq 0~~\forall a \in A, b \in B$
      • simplified dual : replace $u _ { a } ^ { + } - u _ { a }^{-}$ by $u_a$ and same for $v$
  • Hall theorem : $n$-by-$n$ bipartite graph $G=(A\cup B, E')$ has a perfect matching iff $|S| \leq |N(S)|~~\forall S\subseteq A$, where $N(S)=\{b\in B : \exists a\in S, (a,b)\in E'\}$ denotes the neighborhood of $S$
  • hungarian algorithm
    • intuition : reduce weighted problem to unweighted one by maintaining a dual solution feasible at all times
    • minimum cost perfect matching existence : iff feasible dual solution s.t. $u _ { a } + v _ { b } = c ( e )~~\forall e=(a,b)\in M$
    • algorithm : $O(n^3)$ not trivial
      • initialization : $v _ { b } = 0\quad u _ { a } = \min _ { b \in B } c _ { a b }$
      • consider $G' = (A \cup B, E')$ with set of tight edges $E' = \left\{ e = (a,b) \in E : u_a + v_b = c(e) \right\}$
      • find maximum cardinality matching in $G'$ : if perfect matching, stop, else find $S\subseteq A$ s.t. $\abs{S}>\abs{N(S)}$
      • choose $\epsilon > 0$ and improve $u_a' = \begin{cases} u_a + \epsilon & \text{if } a \in S \\ u_a & \text{if } a \notin S \end{cases}$ and $v_b' = \begin{cases} v_b - \epsilon & \text{if } b \in N(S) \\ v_b & \text{if } b \notin N(S) \end{cases}$ which remains dual feasible
        • edges in $S\times N(S)$ : unchanged $+\epsilon -\epsilon$
        • edges in $(A\setminus S)\times(B\setminus N(S))$ : unchanged
        • edges in $(A\setminus S)\times N(S)$ : decreased by $\epsilon$, stop being tight
        • edges in $S\times (B\setminus N(S))$: increased by $\epsilon$ but were not tight (not in $E'$)
      • dual value increase : by $(\abs{S} - \abs{N(S)})\epsilon$
      • to get tight edge choose $\epsilon$ as large as possible for which dual remains feasible : $\epsilon =\min _{e = ( a ,b ) \in S \times ( B\setminus N(S) ) } c ( e ) - u _ { a } - v _ { b } > 0$
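
A compact Python sketch of an $O(n^3)$ implementation maintaining the dual potentials $u, v$ (a classical formulation with internal 1-indexing; it assumes a complete $n\times n$ cost matrix and is one standard realization, not the course's exact pseudocode):

```python
import math

def hungarian(cost):
    """Min-cost perfect matching; u, v are the dual variables kept feasible,
    and each phase grows the matching along tight edges (u_a + v_b = c(e))."""
    n = len(cost)
    u = [0.0] * (n + 1); v = [0.0] * (n + 1)
    p = [0] * (n + 1)                      # p[j] = row matched to column j
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        p[0] = i; j0 = 0
        minv = [math.inf] * (n + 1); used = [False] * (n + 1)
        while True:                        # search for an augmenting tight path
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):         # dual update by epsilon = delta
                if used[j]:
                    u[p[j]] += delta; v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                          # augment along the found path
            j1 = way[j0]; p[j0] = p[j1]; j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, n + 1)]   # (row, column) pairs
```
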
  • NP-hard : finding a solution likely to be intractable, no algorithm (assuming P$\not=$NP) satisfies all of the following
    • efficient : polynomial time
    • reliable : works for any input instance
    • optimality : find an optimal solution
  • relax reliability : heuristics design to work well in practice
  • relax optimality : approximation algorithms
  • $\alpha$-approximation : runs in polytime and outputs a solution $S$ s.t.
    • minimization : $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \alpha$
    • maximization : $\frac{\operatorname{profit}(S)}{\operatorname{profit}(\text{optimal solution})} \geq \alpha$
    • lower bound : because optimum hard to compute, $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \frac{\operatorname{cost}(S)}{\text{lower bound on opt}} \leq \alpha$
  • LP popular framework : given formulation as integer LP (binary $x_i\in\{0,1\}$), relax to LP ($x_i\in[0,1]$), solve LP to get $x^*$ and round the solution without losing too much
  • vertex cover
    • input : $G=(V,E)$, vertices weights $w:V\to\R_+$
    • output : set $C\subseteq V$ of minimum weight s.t. $u\in C$ or $v\in C$ $\forall \{u, v\}\in E$
    • relax : $x_v\in[0,1]$
    • ILP : minimize $\sum_{v \in V} w(v) x_v$ s.t. $x_u + x_v \geq 1~~\forall\{u, v\}\in E$ with $x_v\in\{0,1\}$
    • $2$-approximation LP : $x_v\in[0,1]$ with rounding $C = \left\{ v \in V : x_v^* \geq \frac{1}{2}\right\}$ which is feasible
      • weight of $C$ : at most twice the optimal $\sum _ { v \in C } w ( v ) = \sum _ { v \in V : x _ { v }^* \geq \frac { 1} { 2} } w ( v ) \leq \sum _ { v \in V : x _ { v }^* \geq \frac { 1} { 2} } 2x _ { v } ^ { * } w ( v ) = 2\sum _ { v \in V } x _ { v } ^ { * } w ( v ) = 2L P _ { O P T } \leq 2V C _ { O P T }$
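
A sketch of this relax-and-round recipe, assuming `scipy` is available for the LP (the function name and data layout are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_2approx(n, edges, w):
    """Solve the LP relaxation, then keep every vertex with x_v* >= 1/2."""
    # minimize sum_v w(v) x_v  s.t.  x_u + x_v >= 1 per edge, 0 <= x <= 1
    A_ub = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        A_ub[i, u] = A_ub[i, v] = -1.0          # -x_u - x_v <= -1
    res = linprog(c=w, A_ub=A_ub, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n, method="highs")
    return {v for v in range(n) if res.x[v] >= 0.5}
```
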
  • integrality gap : $g = \max _ { I \in \mathcal { I } } \frac { O P T ( I ) } { O P T _ { L P } ( I ) }$ with $\mathcal I $ set of all instances of given problem, bound power of linear relaxation
    • vertex cover : complete graph on $n$ vertices, OPT $=n-1$ while LP can set all $x_v=\frac{1}{2}$, so $g \geq \frac{n-1}{n/2} = 2 - \frac{2}{n}$
  • set cover
    • input : universe $\mathcal{U} =\{e_1,\ldots, e_n\}$, family of subsets $\mathcal T=\{S_1,\ldots, S_m\}$ and cost function $c:\mathcal T\to\R_+$
    • output : find collection $C$ of subsets of minimum cost that cover all elements
    • ILP : minimize $\sum_{i=1}^m x_i \cdot c(S_i)$ s.t. $\sum_{S_i : e \in S_i} x_i \geq 1~~\forall e\in\mathcal U$ with $x_i\in\{0,1\}$
    • LP : $x_i\in[0,1]$ with $C=\{S_i:x_i^*\ge \frac{1}{f}\}$ where each element belongs to at most $f$ sets, giving an $f$-approximation
    • randomized rounding : better
      • algorithm
        • solve LP : get optimal solution $x^*$
        • choose positive integer constant $d$ : start with empty result set $C$, repeat $d\ln(n)$ times
        • for $i=1,\ldots, m$ : add set $S_i$ to solution $C$ with probability $x_i^*$ choosing independently for each set
      • expected cost of all sets added : $E[\text{rounded cost}]= \sum _ { i = 1} ^ { m } c \left( S _ { i } \right) P r \left[ S _ { i } \text{ is added } \right] = \sum _ { i = 1} ^ { m } c \left( S _ { i } \right) x _ { i } ^ { * } = L P _ { O P T }$
      • expected cost after $d\ln(n)$ executions : at most $d \cdot \ln ( n ) \cdot \sum _ { i = 1} ^ { m } c \left( S _ { i } \right) x ^ { * } \leq d \cdot \ln ( n ) \cdot L P _ { O P T } \leq d \cdot \ln ( n ) \cdot O P T$
      • probability constraint unsatisfied after single execution : at most $\frac{1}{e}$ as $x_1+\cdots+x_k\ge 1$ with $1-x\le e^{-x}$
      • probability of feasible solution : at least $1-\frac{1}{n^{d-1}}$, using union bound
      • output feasible solution of cost at most $4d\ln(n)\cdot OPT$ : with probability greater than $\frac{1}{2}$, use expected cost and Markov inequality to bound cost $\Pr[\text{cost} > 4\mu]\le\frac{1}{4}$, bound infeasibility by $\frac{1}{n^{d-1}}\le \frac{1}{n}$, union bound gives success at least $1-\frac{1}{4}-\frac{1}{n}$
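
A hedged Python sketch of the rounding step, assuming the LP optimum `x_star` was already computed (e.g. with scipy as above); `sets` maps index to subset and `d` is the constant from the analysis:

```python
import math, random

def round_set_cover(x_star, sets, universe, d=2):
    """Each round adds S_i independently with probability x_i*; after
    d*ln(n) rounds the result is feasible with probability >= 1 - 1/n^(d-1)."""
    C = set()
    for _ in range(max(1, round(d * math.log(len(universe))))):
        for i, xi in enumerate(x_star):
            if random.random() <= xi:
                C.add(i)
    covered = set().union(*(sets[i] for i in C)) if C else set()
    return C if covered >= universe else None    # unlikely: report failure
```
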

## Simplex & multiplicative weight update

  • weighted majority algorithm : sequential game between omniscient adversary and aggregator advised by $N$ experts
    • algorithm with halving ($\epsilon=\frac{1}{2}$)
      • assign each expert $i$ weight $w_i^{(1)}=1$
      • for every $t$ : predict yes/no based on weighted majority
      • for every mistaken expert : $w_i^{(t+1)}=w_i^{(t)}/2$
      • any sequence of duration $T$ and expert $i$ : $\#\text{majority mistakes}\le 2.41 (\#\text{ of } i\text{'s mistakes} + \log_2 N)$
    • potential function : $\Phi ^ { ( t ) } = \sum _ { i \in [ N ] } w _ { i } ^ { ( t ) }$
    • $\epsilon$ update rule : $w_i^{(t+1)}=(1-\epsilon)w_i^{(t)}$ achieves $\#\text{majority mistakes}\leq 2(1+\epsilon)(\#\text{ of } i\text{'s mistakes})+O(\log N/\epsilon)$
      • lowerbound : $\Phi^{(T+1)} = \sum_{j \in [N]} w_j^{(T+1)} \geq w_i^{(T+1)} = (1-\epsilon)^{\#\text{ of } i\text{'s mistakes}}$
      • upperbound : $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot (1-\epsilon/2)^{\#\text{ of WM mistakes}}$, as on each mistake at least half of the total weight gets multiplied by $(1-\epsilon)$
    • at worst : twice as many mistakes as best expert
  • hedge
    • setup
      • each expert $i$ gives some advice
      • allocator picks distribution : $\vec { p } ^ { ( t ) } = \left( p _ { 1} ^ { ( t ) } ,\dots ,p _ { N } ^ { ( t ) } \right)$ over the experts
      • adversary determines cost vector : $\vec { m } ^ { ( t ) } = \left( m _ { 1} ^ { ( t ) } ,\dots ,m _ { N } ^ { ( t ) } \right) \in [ - 1,1] ^ { N }$, negative number means profitable to follow the specific expert
      • allocator suffer : $\vec { p } ^ { ( t ) } \cdot \vec { m } ^ { ( t ) } = \sum _ { i \in [ N ] } p _ { i } ^ { ( t ) } m _ { i } ^ { ( t ) }$
    • algorithm
      • assign each expert $i$ : $w_i^{(1)}=1$
      • pick distribution : $p _ { i } ^ { ( t ) } = w _ { i } ^ { ( t ) } / \Phi ^ { ( t ) }$with $\Phi ^ { ( t ) } = \sum _ { i \in [ N ] } w _ { i } ^ { ( t ) }$
      • observe cost : $w _ { i } ^ { ( t + 1) } = w _ { i } ^ { ( t ) } \cdot e ^ { - \varepsilon \cdot m _ { i } ^ { ( t ) } }$
    • bounds : $\sum _ { t = 1} ^ { T } \vec { p } ^ { ( t ) } \cdot \vec { m } ^ { ( t ) } \leq \sum _ { t = 1} ^ { T } m _ { i } ^ { ( t ) } + \frac { \ln N } { \epsilon } + \epsilon T$
    • lowerbound : $\Phi ^ { ( T + 1) } = \sum _ { j \in [ N ] } w _ { j } ^ { ( T + 1) } \geq w _ { i } ^ { ( T + 1) } = \exp \left( - \epsilon \sum _ { t = 1} ^ { T } m _ { i } ^ { ( t ) } \right)$
    • upperbound : $\Phi^{(t+1)} = \sum_{j \in [N]} w_j^{(t)} \cdot \exp(-\epsilon m_j^{(t)}) \leq \sum_{j \in [N]} w_j^{(t)}(1 - \epsilon m_j^{(t)} + \epsilon^2) \leq \Phi^{(t)} \cdot \exp(\epsilon^2 - \epsilon\, \vec{p}^{(t)}\cdot\vec{m}^{(t)})$, hence $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot \exp(\epsilon^2 T - \epsilon \sum_{t=1}^T \vec{p}^{(t)}\cdot\vec{m}^{(t)})$, using the Taylor expansion $e^x \leq 1 + x + x^2$ for $x\in[-1,1]$ and $(m_j^{(t)})^2 \leq 1$
    • generalisation : costs in $[-\rho,\rho]^N$ with width $\rho$
      • if $T \geq \left( 4\rho ^ { 2} \ln N \right) / \epsilon ^ { 2}$ : for any expert $\frac { 1} { T } \sum _ { t = 1} ^ { T } \vec { p } ^ { ( t ) } \cdot \vec { m } ^ { ( t ) } \leq \frac { 1} { T } \sum _ { t = 1} ^ { T } m _ { i } ^ { ( t ) } + 2\epsilon$
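
A minimal Python sketch of Hedge as stated above; the cost vectors are assumed to be revealed online, here passed as a list for simplicity:

```python
import math

def hedge(cost_vectors, eps):
    """Play p_t = w_t / Phi_t, then update w multiplicatively from m_t in
    [-1,1]^N; total cost <= min_i sum_t m_i^t + ln(N)/eps + eps*T."""
    N = len(cost_vectors[0])
    w = [1.0] * N
    total = 0.0
    for m in cost_vectors:
        phi = sum(w)
        p = [wi / phi for wi in w]                     # distribution over experts
        total += sum(pi * mi for pi, mi in zip(p, m))  # cost suffered this round
        w = [wi * math.exp(-eps * mi) for wi, mi in zip(w, m)]
    return total
```
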
  • covering LP : associate an expert with each constraint of the LP and rewrite $\sum_{i=1}^n d_i x_i \geq b$ as the weighted sum of all constraints, assign maximum value to variables with the highest ratio of constraint coefficient to objective coefficient
    • reduced LP : oracle, minimize $\sum_{j=1}^n c_j x_j$ s.t. $\left(\sum_{i=1}^m p_i A_i\right) \cdot x \geq \sum_{i=1}^m p_i b_i$ for $0 \leq x \leq 1$
      • fractional knapsack
        • sort and relabel $x$ s.t. $d_1/c_1\ge\cdots\ge d_n/c_n$
        • let : $k = \min\left\{ j : \sum_{i=1}^j d_i \geq b \right\}$ and $\ell = b - \sum_{i=1}^{k-1} d_i$
        • set : $x_i=1$ for $i=1,\ldots,k-1$, $x_k=\ell/d_k$, $x_i=0$ for $i=k+1,\ldots,n$
    • algorithm
      • assign each constraint $i$ : $w _ { i } ^ { ( 1) }=1$
      • pick distribution : $p _ { i } ^ { ( t ) } = w _ { i } ^ { ( t ) } / \Phi ^ { ( t ) }$ with $\Phi ^ { ( t ) } = \sum _ { i \in [ N ] } w _ { i } ^ { ( t ) }$
      • cost vector
        • solution to reduced LP : $x^{(t)}$ noting $c ^ { T } x ^ { ( t ) }$ at most cost of optimal solution to LP
        • cost of constraint $i$ : $m_i^{(t)} = \sum_{j=1}^n A_{ij} x_j^{(t)} - b_i = A_i x^{(t)} - b_i$ (positive when the constraint is satisfied, so its weight decreases)
      • update : $w _ { i } ^ { ( t + 1) } = w _ { i } ^ { ( t ) } \cdot e ^ { - \varepsilon \cdot m _ { i } ^ { ( t ) } }$
      • output : average of constructed solutions $\overline { x } = \frac { 1} { T } \sum _ { t = 1} ^ { T } x ^ { ( t ) }$, almost feasible
    • cost : $c ^ { T } \overline { x } = c ^ { T } \left( \frac { 1} { T } \sum _ { t \in [ T ] } x ^ { ( t ) } \right)$ at most cost of optimal solution since each $x^{(t)}$ no higher cost than optimal solution (properties of reduced LP)
    • parameters : $T = (4n^2 \ln m)/\epsilon^2$ leads $\overline x$ to satisfy every constraint up to $1-2\epsilon$ (e.g. $\sum_{e \in S} x_e \geq 1-2\epsilon$) and obtain a feasible approximately optimal solution with $x^* = \frac{1}{1-2\epsilon}\overline{x}$
  • simplex method : fast in practice, at worst exponential
    • slack variables : $s_1,\ldots,s_m$ that turn the inequalities into equalities, non-negative
    • current objective function : $z$
    • simplex tableau : all constraints and object function
      • non-zero variables : left-hand side
    • maximize one variable with positive coefficient while respecting constraints : pivot between $s_1\leftrightarrow x_2$
    • each constraint can enforce an upper bound (without having a negative variable), give no upper bound (multiple of the same kind) or say nothing (missing current variable)
    • until no increase possible : ignore value of slack variables
    • technicalities
      • unboundedness : objective can grow to infinity
      • degeneracy : might need to pivot two variables to increase one even though current value is zero, avoid cycling through such swaps by lexicographic ordering
      • initial vertex : phase I to find initial starting point, might require another solver
      • infeasibility : should be detected by phase I

## Randomized algorithm

  • randomized algorithms : in addition to input, take random source of bits and produce output
    • correctness probability : $p>0$, running $k$ times boosts it to $1-(1-p)^k$
    • approximation : for small $p$, use $1-p \approx e^{-p}$
  • Las Vegas algorithms : randomized, always correct, polytime in expectation
  • minimum cut problem
    • input : $G = ( V ,E )$ with $n = | V |$
    • output : partition into two subsets joined by a minimum number of edges $\min_{\emptyset \subset S \subset V} |E(S,\overline{S})|$ with $E(S,\overline{S}) := \{(u,v)\in E : u \in S, v \notin S\}$
    • contraction procedure : $(u,v)$ in $G$, merge both endpoints to create new node $uv$, might lead to parallel edges
    • hand-shake lemma : in any graph $\sum _ { v \in V } d ( v ) = 2| E |$ with $d(v)$ degrees of $v$
    • Karger : $O(n^4\log n)$ by boosting $O(n^2\log n)$ times as raw $O(n^2)$ with success $\Omega(1/n^2)$ to reach $1-\frac{1}{n}$ success
      • size of minimum cut : $| E \left( S ^ { * } ,\overline { S ^ { * } } \right) |=k$
      • uniform random edge in min-cut : at most $\mathbb{P}[e \in E(S^*,\overline{S^*})] = \frac{k}{|E|} \leq \frac{k}{nk/2} = \frac{2}{n}$ by hand-shake, as $d(v) = |E(\{v\},\overline{\{v\}})| \geq k$ implies $|E|\geq nk/2$
      • idea : choose $n/4$ edges, with probability $1/2$ none of them in min-cut, contract them, new graph will have at most $3n/4$ vertices, recurse, after $O(\log n)$ steps, two super nodes and probability at least $(1/2)^{O(\log n)}$ no edges in min-cut contracted
      • for each contraction, size of minimum cut does not decrease : contract one edge at a time to avoid parallel contractions
      • probability of returning min-cut : at least $\frac{2}{n(n-1)}$
        • $A_i$ event that edge picked in step $i$ not in $E \left( S^* ,\overline { S ^ { * } } \right)$
        • by the chain rule : $\mathbb{P}[A_1,\ldots,A_{n-2}] = \mathbb{P}[A_1]\,\mathbb{P}[A_2 | A_1]\,\mathbb{P}[A_3 | A_1, A_2]\cdots\mathbb{P}[A_{n-2} | A_1, A_2,\ldots,A_{n-3}]$ with $\mathbb{P}[A_i | A_1,\ldots,A_{i-1}] \geq 1 - \frac{2}{n-i+1}$
        • lowerbound : $\mathbb{P}[A_1,\dots,A_{n-2}] \geq \prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right) = \frac{2}{n(n-1)} = 1/\binom{n}{2}$
      • corollary : any graph has at most $\binom{n}{2}$ min-cuts
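
A sketch of one run of the contraction procedure, implemented with rejection sampling over an edge list plus union-find (an implementation choice, not the course's pseudocode); boost by repeating $O(n^2\log n)$ times:

```python
import random

def karger_once(n, edges):
    """Contract uniformly random non-self-loop edges until 2 super-nodes
    remain, then count the surviving crossing edges (a candidate min-cut)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    while components > 2:
        u, v = random.choice(edges)       # self-loops rejected and redrawn
        ru, rv = find(u), find(v)
        if ru != rv:                      # contract one edge at a time
            parent[ru] = rv
            components -= 1
    return sum(1 for u, v in edges if find(u) != find(v))
```
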
    • Karger-Stein : $O(n^2\log^3 n)$ by boosting $\log^2 n$ times as raw $O(n^2\log n)$ with success $\Omega(1/\log n)$ to reach $1-1/n$ success
      • probability of returning min-cut : at least $1/(2\log n)$
        • probability of not contracting any edge for $n-n/\sqrt{2}$ steps : at least $\frac { r - 2} { r } \cdot \frac { r - 3} { r - 1} \cdots \frac { r / \sqrt { 2} - 2} { r / \sqrt { 2} } \approx \frac { ( r / \sqrt { 2} ) ^ { 2} } { r ^ { 2} } = 1/ 2$
        • show : $\frac { 1} { 2} \left( p + p - p ^ { 2} \right) = p - p ^ { 2} / 2\geq \frac { 1} { 2\log n }$ with $p$ probability of success of any of the two copies, in worst case $p = 1/ ( 2\log ( n / \sqrt { 2} ) )$
  • master method : $T(n)=aT(\frac{n}{b})+f(n)$
    • $f ( n ) = O \left( n ^ { \log _ { b } a - \epsilon } \right)$ : $T ( n ) = \Theta \left( n ^ { \log _ { b } a } \right)$
    • $f ( n ) = \Theta \left( n ^ { \log _ { b } a } \right)$ : $T ( n ) = \Theta \left( n ^ { \log _ { b } a } \log n \right)$
    • $f ( n ) = \Omega \left( n ^ { \log _ { b } a + \epsilon } \right)$ and $a \cdot f ( n / b ) \leq c \cdot f ( n )$ : $T ( n ) = \Theta ( f ( n ) )$
  • polynomial identity testing : given polynomials $p$, $q$ defined over a common set of variables $x$, determine whether $p(x)-q(x)=0$ for all values, with oracle access (no individual term known, but can evaluate at any specific input)
    • naive expansion : intractable e.g. for determinants, which have exponentially many terms
    • monomial : product of powers of variables with non negative exponents
      • degree : sum of all exponents involved
    • polynomial : sum of monomials, each referred to as a term
      • degree : largest degree among its terms
    • determinant : $\operatorname{det} ( A ) = \sum _ { \sigma : [ n ] \rightarrow [ n ] } \operatorname{sgn} ( \sigma ) \prod _ { i = 1} ^ { n } A _ { i ,\sigma ( i ) }$ with permutation $\sigma$
    • Schwartz-Zippel lemma : $p(x_1,\dots,x_n)$ nonzero polynomial of $n$ variables with degree $d$, let $S$ finite subset of $\mathbb R$ with at least $d$ elements in it, if we assign $x_1,\ldots,x_n$ values from $S$ independently and uniformly at random then $\mathbb{P}[p(x_1,\ldots,x_n) = 0] \leq \frac{d}{|S|}$
      • base case : $n=1$, a univariate nonzero polynomial of degree $d$ has at most $d$ roots
      • strong induction : assume lemma holds for less than $n$ variables
        • fix $x _ { 1} ,\dots ,x _ { n - 1}$ arbitrarily : $p \left( x _ { n } \right) = a _ { k } x _ { n } ^ { k } + a _ { k - 1} x _ { n } ^ { k - 1} + \ldots + a _ { 1} x _ { n } ^ { 1} + a _ { 0}$ becomes a univariate polynomial of $x_n$ and degree $k\leq d$
        • thus : $\mathbb{P}[p(x_n) = 0] \leq \frac{k}{|S|} \leq \frac{d}{|S|}$, but need to argue the bound is unaffected by the choice of $x_1,\ldots, x_{n-1}$, which is not immediate (the leading coefficient may vanish)
        • argue that this adversarial scenario is rare (deferred decision below)
  • matrix identity testing : whether $AB=C$ without matrix multiplication
    • randomized : build random vector $x\in\mathbb R^n$ by choosing each coordinate independently and uniformly from $S$, $x_i \sim \text{Uniform}(S)$, test $ABx=Cx$ and conclude $AB=C$ if equal
      • one-sided error : if $AB\not=C$, $\mathbb{P}_{x_i\sim S}[ABx \neq Cx] \geq 1-\frac{1}{|S|}$, as only one $x_j$ value would satisfy $x_j(a_{ij} - c_{ij}) = -\sum_{k\neq j}(a_{ik} - c_{ik}) x_k$ and by independence the other coordinates have no effect, $\mathbb{P}_{x\sim S}[\langle a_i, x\rangle = \langle c_i, x\rangle] \leq \frac{1}{|S|}$
      • principle of deferred decision : random choices made only when relevant, careful when relaxing fixed assumptions, must hold regardless of their values
      • long division for polynomials : $p(x) = d(x) q(x) + r(x)$ with $p(x)$ degree $d$, divisor $d(x)$ degree $k\le d$, quotient $q(x)$ degree at most $d-k$ and remainder $r(x)$ degree at most $k-1$
      • largest degree $k$ in all monomial division : $p \left( x _ { 1} ,\dots ,x _ { n } \right) = x _ { n } ^ { k } q \left( x _ { 1} ,\dots ,x _ { n - 1} \right) + r \left( x _ { 1} ,\dots ,x _ { n } \right)$
      • reuse principle of deferred decision : $\mathbb { P } _ { x _ { 1} ,\ldots ,x _ { n - 1} \sim S} \left[ q \left( x _ { 1} ,\ldots ,x _ { n - 1} \right) = 0\right] \leq \frac { d - k } { | S | }$ and if $q\not = 0$ then $p(x_1,\ldots, x_n)$ is univariate in $x_n$ and $x_n^k$ nonzero $\mathbb { P } _ { x _ { n } \sim S } [ p = 0| q \neq 0] \leq \frac { k } { | S | }$
      • finish : $\mathbb { P } [ p = 0] = \mathbb { P } [ p = 0| q = 0] \cdot \mathbb { P } [ q = 0] + \mathbb { P } [ p = 0| q \neq 0] \mathbb { P } [ q \neq 0]\leq \mathbb { P } [ q = 0] + \mathbb { P } [ p = 0| q \neq 0]\leq \frac { d - k } { | S | } + \frac { k } { | S | } = \frac { d } { | S | }$
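
A short Python sketch of the randomized test (Freivalds-style), exactly the $ABx=Cx$ check above; each run costs $O(n^2)$ instead of a matrix multiplication:

```python
import random

def product_equals(A, B, C, S=range(100), runs=10):
    """One-sided error: a 'False' answer is certain, a 'True' answer may
    err with probability at most (1/|S|)^runs when AB != C."""
    n = len(A)
    for _ in range(runs):
        x = [random.choice(S) for _ in range(n)]
        Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
        Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        if ABx != Cx:
            return False
    return True
```
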
  • perfect bipartite graph matching existence
    • perfect matching : involve every vertex in matching
    • deterministic algorithm : $O ( | E | \sqrt { | X | + | Y | } )$
    • randomized : build the square matrix $A$ with $A_{ij}=x_{ij}$ if $x_i$ and $y_j$ connected, else $0$; a perfect matching exists iff the determinant of $A$ is not identically $0$
      • $\Rightarrow$ : a perfect matching, being a bijection, can be seen as a permutation $f$ on $[n]$, and $\prod_{i=1}^n A_{i,f(i)} = \prod_{i=1}^n x_{i,f(i)}$ is a monomial of the determinant when $\sigma=f$
      • $\Leftarrow$ : suppose determinant different from $0$, at least one permutation with all terms non zero, which is a bijection
      • correctness probability : $1-1/n$ for $| S | \geq n ^ { 2}$
      • find the matching : choose $|S| \gg n$ with $x_{ij} = 2^{w_{ij}}$, the $w_{ij}$ chosen independently and uniformly at random from $S$; with high probability $\det(A) = 2^{w(M)}(\pm 1 + \text{even number})$ with $w(M)$ the weight of the minimum-weight perfect matching; for every $(x_i,y_j)$ delete the edge and test whether the minimum weight changes (then the edge belongs to the matching)
  • general graph matching : exists iff determinant of the Tutte matrix $A$ is not identically zero
    • skew-symmetric matrix : Tutte matrix $A_{ij}=x_{ij}$ if vertices $v_i$ and $v_j$ connected with $i<j$, $A_{ij}=-x_{ji}$ if connected with $i>j$, else $0$

## Sampling & concentration inequalities

  • Markov inequality : non-negative $X$, $\mathbb{P}[X \geq k \cdot \mathbb{E}[X]] \leq \frac{1}{k}$ or $\mathbb{P}[X\ge k]\le\frac{\mathbb{E}[X]}{k}$ as $\mathbb{E}[X] = \sum_i i\cdot\mathbb{P}[X=i] \geq \sum_{i \geq k} i \cdot \mathbb{P}[X=i] \geq \sum_{i \geq k} k \cdot \mathbb{P}[X=i] = k \cdot \mathbb{P}[X \geq k]$
    • central limit theorem : $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_i - \mu\right) \rightarrow \mathcal{N}(0,\sigma^2)$ as $n\to\infty$
  • Chebyshev inequality : $\mathbb { P } [ | X - \mathbb { E } X | > \epsilon ] < \frac { \operatorname{Var} ( X ) } { \epsilon ^ { 2} }$ or $\mathbb { P } [ | X - \mathbb { E } [ X ] | > k \sigma ] \leq \frac { 1} { k ^ { 2} } $ as Markov can be applied to variance
    • variance : $\operatorname{Var} ( X ) = \mathbb { E } \left[ ( X - \mathbb { E }[ X] ) ^ { 2} \right] =\mathbb { E } \left[ X ^ { 2} \right] - \mathbb { E } [ X ] ^ { 2}$
    • 2nd moment at least 1st moment squared : as variance always positive
    • standard dev : $\sigma = \sqrt { \operatorname{Var} ( X ) }$
    • pairwise independent : $\mathbb { E } \left[ X _ { i } X _ { j } \right] = \mathbb { E } \left[ X _ { i } \right] \mathbb { E } \left[ X _ { j } \right]$ for all $i,j$ and $\operatorname{Var} \left( X _ { 1} + \cdots + X _ { n } \right) = \operatorname{Var} \left( X _ { 1} \right) + \cdots + \operatorname{Var} \left( X _ { n } \right)$
  • Chernoff inequality : $\mathbb{P}(X\ge a)\le e^{-ta}\,\mathbb{E}[e^{tX}]$ for any $t>0$, applied to sums of mutually independent random variables
    • Bernoulli sum : $X = \sum_{i=1}^n X_i$, $\mu = \mathbb{E}[X] = \sum_{i=1}^n p_i$
      • upper tail : $\mathbb { P } [ X \geq ( 1+ \delta ) \mu ] \leq e ^ { - \frac { \delta ^ { 2} } { 2+ \delta } \mu } \text{ for all } \delta > 0$
      • lower tail : $\mathbb { P } [ X \leq ( 1- \delta ) \mu ] \leq e ^ { - \frac { \delta ^ { 2} } { 2} \mu } \text{ for all } \delta > 0$
      • application : $\mathbb { P } \left[ | \frac { \sum X _ { i } } { n } - p | \geq \epsilon \right]\leq \mathbb { P } \left[ \sum X _ { i } \geq ( 1+ \epsilon / p ) p n \right] + \mathbb { P } \left[ \sum X _ { i } \leq ( 1- \epsilon / p ) p n \right] \leq e ^ { - \frac { \epsilon ^ { 2} n } { 3} } + e ^ { - \frac { \epsilon ^ { 2} n } { 2} } $
    • bounded sum : $a \leq X _ { i } \leq b \text{ for all } i$, $X = \sum _ { i = 1} ^ { n } X _ { i }$, $\mu = \mathbb { E } [ X ]$
      • upper tail : $\mathbb { P } [ X \geq ( 1+ \delta ) \mu ] \leq e ^ { - \frac { 2\delta ^ { 2} \mu ^ { 2} } { n ( b - a ) ^ { 2} } } \text{ for all } \delta > 0$
      • lower tail : $\mathbb { P } [ X \leq ( 1- \delta ) \mu ] \leq e ^ { - \frac { \delta ^ { 2} \mu ^ { 2} } { n ( b - a ) ^ { 2} } } \text{ for all } \delta > 0$
    • proof : $\mathbb { P } \left[ e ^ { s X } \geq e ^ { s a } \right]\leq \frac { \mathbb { E } \left[ e ^ { s X } \right] } { e ^ { s a } }$, independence use to expand $\mathbb { E } \left[ e ^ { s X } \right] = \mathbb { E } \left[ e ^ { s \left( X _ { 1} + X _ { 2} + \cdots + X _ { n } \right) } \right] = \prod _ { i = 1} ^ { n } \mathbb { E } \left[ e ^ { s X _ { i } } \right]$ which can be bounded

## Hashing

  • hashing : $h : U \rightarrow \{1,2,\dots,N\}$ with $N \ll |U|$ and small subset $S\subset U$, uniform and independent mapping
    • table : $T [ h ( x ) ]$, insert, delete, query, with linked list for collisions
    • no worst case guarantee : for fixed hash, use randomly $h$ from family
    • collision indicator : $I_y = 1$ if $h(y)=h(x)$ else $0$
    • chain length : $L_x = 1 + \sum_{y \in S : y \neq x} I_y$ with $\mathbb{E}[L_x] = 1 + \frac{|S|-1}{N}$
    • uniform hash function : storing a truly random $h$ needs $|\mathcal{U}|\log N$ bits
  • $2$-universal hash families : $\mathbb { P } _ { h \in \mathcal { H } } [ h ( x ) = h ( y ) ] \leq \frac { 1} { N }$ for $x\not=y\in U$, can be relaxed to a small constant
    • design : choose prime $p \in \{|U|,\ldots,2|U|\}$, define $f_{a,b}(x) = ax + b \mod p$ ($a,b \in [p]$, $a \neq 0$) and $h_{a,b}(x) = f_{a,b}(x) \mod N$
    • lemma : for $x\not=y$ and $s\not=t$, the system $ax + b = s \mod p$, $ay + b = t \mod p$ has exactly one solution $a = (x-y)^{-1}(s-t)$, $b = s - ax$, so $\mathbb P[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] = \frac{1}{p(p-1)}$ over the random choice of hash function
      • $2$-universal : $\mathcal{H} = \left\{h_{a,b} : a,b \in [p] \wedge a \neq 0\right\}$ as $\mathbb{P}[h_{a,b}(x) = h_{a,b}(y)] = \sum_{s,t \in [p] : s \neq t} 1_{s = t \mod N} \cdot \mathbb{P}[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] \leq \frac{1}{p(p-1)}\frac{p(p-1)}{N}$
      • storage : $O(\log|U|)$ bits
    • expected # collisions : $\sum_{x \neq y \in S} \mathbb{P}_{h \in \mathcal{H}}[h(x) = h(y)] \leq \binom{|S|}{2}/N$
    • two-layer tables : $s_i$ collisions at 1st layer, 2nd layer of size $s_i^2$, $\mathbb{E}\left[\sum_i s_i^2\right] = \mathbb{E}\left[\sum_i s_i(s_i - 1)\right] + \mathbb{E}\left[\sum_i s_i\right] = \frac{|S|(|S|-1)}{N} + |S| \leq 2|S|$
    • table size : adapted as it grows
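
A minimal sketch of the $h_{a,b}$ construction; the fixed Mersenne prime is an illustrative choice standing in for "a prime in $\{|U|,\ldots,2|U|\}$":

```python
import random

def make_hash(N, p=2_147_483_647):
    """Sample h_{a,b}(x) = ((a*x + b) mod p) mod N from the 2-universal family."""
    a = random.randrange(1, p)        # a != 0
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % N

h = make_hash(N=1024)                 # storing just (a, b): O(log |U|) bits
```
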
  • pairwise independence : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = s, h(y) = t] = \frac{1}{N^2}$ for $x\not=y$ and any $s, t$
    • $1$-wise : $\mathbb { P } _ { h \in \mathcal { H } } [ h ( x ) = s ] = \frac { 1} { N }$
    • $k$-wise : $f _ { a _ { 0} ,\dots ,a _ { k - 1} } ( x ) = a _ { k - 1} x ^ { k - 1} + \ldots + a _ { 1} x + a _ { 0}$, from the Vandermonde matrix, stored in $O ( k \log | U | )$
    • load balancing
      • balls and bins : $\mathbb{P}[\text{bin } i \text{ gets more than } k \text{ elements}] \leq \binom{n}{k} \cdot \frac{1}{n^k} \leq \frac{1}{k!}$
      • Stirling formula : $k! \approx \sqrt{2\pi k}\left(\frac{k}{e}\right)^k$
      • choose : $k = O \left( \frac { \log n } { \log \log n } \right)$ implies $\frac { 1} { k ! } \leq \frac { 1} { n ^ { 2} }$ and $\mathbb { P } [ \exists \text{ bin } \geq k \text{ balls } ] \leq n \cdot \frac { 1} { n ^ { 2} } = \frac { 1} { n }$ giving $\max \operatorname{load} \leq O \left( \frac { \log n } { \log \log n } \right)$ with probability $1-1/n$
    • power of two choices : pick two random bins, choose the one with fewer balls, maximal load drops to $O ( \log \log n )$

## Streaming algorithms

  • streaming algorithm
    • stream : $\sigma = \langle a_1, a_2,\dots,a_m\rangle$ with values from universe $[n] = \{1,\dots,n\}$
    • from left to right
    • small space : $s$ random-access memory bits, want $s = o(\min\{m,n\})$, holy grail at $s = O(\log m + \log n)$
    • passes : $p=1$ ideally
  • frequent items deterministically : $f_1+\cdots+f_n=m$
    • majority problem : output $j$ if $f_j>m/2$ exists, but deterministic space $\Omega(\min\{m,n\})$
    • frequent problem with parameter $k$ : output set $\left\{j : f_j > m/k\right\}$, but deterministic space $\Omega(\min\{m,n\})$
    • Misra-Gries algorithm : $p=1$, parameter $k$
      • algo
        • initialize : empty associative array $A$
        • process $j$ : if $j \in \operatorname{keys}(A)$, $A[j] = A[j] + 1$; else if $|\operatorname{keys}(A)| < k - 1$, $A[j] = 1$; else foreach $\ell \in \operatorname{keys}(A)$, $A[\ell] = A[\ell] - 1$ and remove $\ell$ if $A[\ell] = 0$
        • output : answer queries, report $\hat{f}_a = A[a]$ ($0$ if absent)
      • space : at most $k-1$ pairs, $O ( k ( \log n + \log m ) )$
      • analysis : $f_j - \frac{m}{k} \leq \hat{f}_j \leq f_j$ as there are at most $m/k$ decrements, can do a 2nd pass to get the exact result
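
A direct Python transcription of Misra-Gries; stream elements are assumed hashable:

```python
def misra_gries(stream, k):
    """At most k-1 counters; returns A with f_j - m/k <= A.get(j, 0) <= f_j."""
    A = {}
    for j in stream:
        if j in A:
            A[j] += 1
        elif len(A) < k - 1:
            A[j] = 1
        else:                          # decrement every counter, drop zeros
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A

print(misra_gries([1, 1, 1, 2, 1, 3], k=2))   # {1: 2}: the heavy hitter survives
```
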
  • number of distinct elements : $d(\sigma)=|\{j : f_j > 0\}|$, proved impossible to compute exactly with a deterministic small-space algorithm
    • guarantee sought : $\Pr[d(\sigma)/3 \leq A(\sigma) \leq 3 d(\sigma)] \geq 1 - \delta$ with space $O(\log(1/\delta)\log n)$
    • lemma : there exists a pairwise independent hash family s.t. $h$ can be sampled with $O(\log n)$ random bits and evaluated in $O(\log n)$ space
    • zeros function : $\operatorname{zeros}(p) = \max\left\{i : 2^i \text{ divides } p\right\}$, # trailing zeros in binary representation
    • algo
      • initialize : choose $h$ and $z=0$
      • process : $\operatorname{zeros} ( h ( j ) ) > z$ implies $z = \operatorname{zeros} ( h ( j ) )$
      • output : $2^{z+1/2}$
    • analysis : $X_{r,j}$ event for "$\operatorname{zeros} ( h ( j ) ) \geq r$", $Y _ { r } = \sum _ { j : f _ { j } > 0} X _ { r ,j }$, $t$ end value
      • restate : $Y _ { r } > 0\Leftrightarrow t \geq r$ into $Y _ { r } = 0\Leftrightarrow t \leq r - 1$
      • uniformly distributed : $\mathbb { E } \left[ X _ { r ,j } \right] = \operatorname{Pr} [ \operatorname{zeros} ( h ( j ) ) \geq r ] = \operatorname{Pr} \left[ 2^ { r } \operatorname{divides} h ( j ) \right] = \frac { 1} { 2^ { r } }$
      • estimate mean and variance : $\mathbb{E}[Y_r] = \frac{d}{2^r}$ and $\operatorname{Var}[Y_r] = \sum_{j : f_j > 0}\left(\mathbb{E}[X_{r,j}^2] - \mathbb{E}[X_{r,j}]^2\right) \leq \sum_{j : f_j > 0}\mathbb{E}[X_{r,j}^2] = \sum_{j : f_j > 0}\mathbb{E}[X_{r,j}] = \frac{d}{2^r}$
      • inequality : $\operatorname{Pr} \left[ Y _ { r } > 0\right] = \operatorname{Pr} \left[ Y _ { r } \geq 1\right] \leq \frac { \mathbb { E } \left[ Y _ { r } \right] } { 1} = \frac { d } { 2^ { r } }$ and $\operatorname{Pr} \left[ Y _ { r } = 0\right] \leq \operatorname{Pr} \left[ | Y _ { r } - \mathbb { E } \left[ Y _ { r } \right] | \geq \frac { d } { 2^ { r } } \right] \leq \frac { \operatorname{Var} \left[ Y _ { r } \right] } { \left( d / 2^ { r } \right) ^ { 2} } \leq \frac { 2^ { r } } { d }$
      • estimate : $\hat { d } = 2^ { t + 1/ 2}$
      • upper : $a$ smallest integer s.t. $2^ { a + 1/ 2} \geq 3d$, $\operatorname{Pr} [ \hat { d } \geq 3d ] = \operatorname{Pr} [ t \geq a ] = \operatorname{Pr} \left[ Y _ { a } > 0\right] \leq \frac { d } { 2^ { a } } \leq \frac { \sqrt { 2} } { 3}$
      • lower : $b$ largest integer s.t. $2^ { b + 1/ 2} \leq d / 3$, $\operatorname{Pr} [ \hat { d } \leq d / 3] = \operatorname{Pr} [ t \leq b ] = \operatorname{Pr} \left[ Y _ { b + 1} = 0\right] \leq \frac { 2^ { b + 1} } { d } \leq \frac { \sqrt { 2} } { 3}$
      • conclusion : high failure bound, increase approximation or use median trick
    • median trick : run $k$ copies in parallel with mutually independent random hash functions, output the median
      • chernoff : expect $\le k\frac{\sqrt{2}}{3}$ copies to exceed $3d$, so the median exceeds with probability $\leq 2^{-\Omega(k)}$; choose $k=\Theta(\log(1/\delta))$ for failure probability at most $\delta$
    • space : hashing $O(\log n)$, store $z$ $O(\log\log n)$, thus $O(\log(1/\delta)\log n)$
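
A hedged Python sketch of the whole estimator with the median trick; the ad-hoc $(ax+b) \bmod p$ hashes stand in for the pairwise independent family from the lemma:

```python
import random, statistics

def distinct_elements(stream, copies=16, bits=32):
    p = (1 << 61) - 1                               # prime larger than 2^bits
    hashes = [(random.randrange(1, p), random.randrange(p))
              for _ in range(copies)]
    z = [0] * copies
    for j in stream:
        for i, (a, b) in enumerate(hashes):
            h = ((a * j + b) % p) % (1 << bits)
            zeros = (h & -h).bit_length() - 1 if h else bits   # trailing zeros
            z[i] = max(z[i], zeros)
    return statistics.median(2 ** (zi + 0.5) for zi in z)      # ~ d(sigma)
```
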
  • frequency vector : $\mathbf{f} = (f_1,\dots,f_n)$ where $f_i = |\{j : a_j = i\}|$ and $f_1 + f_2 + \dots + f_n = m$
    • second moment : $F _ { 2} = \sum _ { i = 1} ^ { n } f _ { i } ^ { 2}$
  • downsampling : sampling the stream down to size at most $O(\sqrt{n})$, not expected to get an estimate better than $\sqrt{n}$ from the truth
  • $k$-wise independent : family of hash functions $H = \{h : [n] \rightarrow U\}$ if for any $k$ distinct $(x_1,\cdots,x_k)\in[n]^k$ and any $(u_1,\cdots,u_k)\in U^k$, $\Pr_{h \in H}[h(x_1) = u_1 \wedge \cdots \wedge h(x_k) = u_k] = \left(\frac{1}{|U|}\right)^k$
  • AMS $\ell_2$ frequency vector norm
    • algo
      • init : pick 4-wise independent hash function $h : [n] \rightarrow \{-1,+1\}$, let $\sigma_i=h(i)$ so $\sigma \in \{-1,1\}^n$ and $Z=0$
      • process : element of value $i$ as $Z = Z + \sigma _ { i }$
      • output : $Z^2$ (we have $Z = \sum _ { i = 1} ^ { n } f _ { i } \sigma _ { i }$)
    • analysis
      • unbiased : $\mathbb { E } \left[ Z ^ { 2} \right] = \mathbb { E } \left[ \sum _ { i \in [ n ] } \sigma _ { i } ^ { 2} f _ { i } ^ { 2} \right] + \mathbb { E } \left[ \sum _ { i \neq j \in [ n ] } \sigma _ { i } \sigma _ { j } f _ { i } f _ { j } \right]= \mathbb { E } \left[ \sum _ { i \in [ n ] } f _ { i } ^ { 2} \right] + \sum _ { i \neq j \in [ n ] } \mathbb { E } \left[ \sigma _ { i } \right] \mathbb { E } \left[ \sigma _ { j } \right] f _ { i } f _ { j }= | f | _ { 2} ^ { 2}$
      • $Z ^ { 4} = \left( \sum _ { i \in [ n ] } \sigma _ { i } f _ { i } \right) \left( \sum _ { j \in [ n ] } \sigma _ { j } f _ { j } \right) \left( \sum _ { k \in [ n ] } \sigma _ { k } f _ { k } \right) \left( \sum _ { l \in [ n ] } \sigma _ { l } f _ { l } \right)= \sum _ { i \in [ n ] } f _ { i } ^ { 4} + 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}$
        • all indexes equal : $\sum_{i \in [n]}\sigma_i^4 f_i^4 = \sum_{i \in [n]} f_i^4$
        • indexes matched 2 by 2 : $\binom{4}{2}\sum_{i<j}(\sigma_i \sigma_j f_i f_j)^2 = 6\sum_{i<j} f_i^2 f_j^2$
        • single unmatched multiplier : 4-wise independent give $0$ as $\mathbb { E } \left[ \sigma _ { i } \right] = 0$ for any $1\le i\leq n$
      • bounded variance : $\operatorname{Var}[Z^2] = \sum_i f_i^4 + 6\sum_{i<j} f_i^2 f_j^2 - \sum_i f_i^4 - 2\sum_{i<j} f_i^2 f_j^2 = 4\sum_{i<j} f_i^2 f_j^2 \leq 2\left(\sum f_i^2\right)^2 = 2\norm{\mathbf{f}}_2^4$
    • improve precision : repeat iid $t = \frac { 6} { \epsilon ^ { 2} }$ copies with output $Z _ { 1} ^ { 2} ,Z _ { 2} ^ { 2} \cdots Z _ { t } ^ { 2}$ and average $\tilde { Z } ^ { 2} = \frac { 1} { t } \sum _ { i = 1} ^ { t } Z _ { i } ^ { 2}$
      • same mean by linearity
      • smaller variance : $\frac{\operatorname{Var}(Z^2)}{t} \leq \frac{2}{t}\norm{\mathbf{f}}_2^4$, by chebyshev $\Pr\left[|\tilde{Z}^2 - \norm{\mathbf{f}}_2^2| > \epsilon\norm{\mathbf{f}}_2^2\right] \leq \frac{(2/t)\norm{\mathbf{f}}_2^4}{\epsilon^2\norm{\mathbf{f}}_2^4} \leq \frac{2}{t\epsilon^2} \leq \frac{1}{3}$
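
A Python sketch of the AMS counter averaged over $t$ copies; the random degree-3 polynomial mod a prime, mapped to $\pm 1$, stands in for a 4-wise independent family (an approximation, stated here as an assumption):

```python
import random

def ams_f2(stream, t=100):
    """Each copy keeps Z = sum_i f_i * sigma_i; Z^2 is an unbiased estimate
    of ||f||_2^2, and averaging t copies shrinks the variance by 1/t."""
    p = (1 << 61) - 1
    coeffs = [[random.randrange(p) for _ in range(4)] for _ in range(t)]

    def sigma(c, i):                   # pseudo 4-wise independent +/-1 value
        v = (c[0] + c[1] * i + c[2] * i * i + c[3] * i ** 3) % p
        return 1 if v % 2 else -1

    Z = [0] * t
    for i in stream:
        for k in range(t):
            Z[k] += sigma(coeffs[k], i)
    return sum(z * z for z in Z) / t
```
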

## Locality-sensitive hashing & approximate nearest neighbor search

  • nearest neighbor search problem : NNS, set of $n$ points $P \subset \mathbb { R } ^ { d }$, find $\min _ { p \in P } \operatorname{dist} ( p ,q )$ given $q \in \mathbb { R } ^ { d }$
    • distance function : l2
    • sublinear objective : $O(\log n)$ time, $O(n)$ memory
    • $k$-d trees : partition space using coordinate-aligned planes, fail to beat naive looping when $d=\Omega(\log n)$
    • curse of dimensionality : cube of side length $a$ intersects $2^d$ other cells
    • reduction : approximate $\operatorname{dist}(p,q) \leq c \cdot \min_{s \in P}\operatorname{dist}(s,q)$ with $c>1$
      • $ANNS(c,r)$ : build data structure s.t. for any $q$, if $\exists p$ with $\operatorname{dist}(p,q) \leq r$, it returns $p'$ with $\operatorname{dist}(p',q) \leq c \cdot r$
      • solve $ANNS(c(1-\epsilon),r)$ for $r\in\{\delta, (1+\epsilon)\delta, (1+\epsilon)^2\delta,\dots,1\}$ with $\delta$ the minimum possible distance (bit precision $1/\delta$), report the minimal value, $O(\log\frac{1}{\delta})$ time and space overhead
    • locally sensitive hash LSH : hash function sensitive to distance, high probability to map to same point if $\operatorname{dist}(p,q) \leq r$ and to different points if $\operatorname{dist}(p,q) > c \cdot r$
      • $\mathcal{H}$ called $(r, c \cdot r, p_1, p_2)$-LSH : if for all $p$, $q$, $\operatorname{dist}(p,q) \leq r \Rightarrow \mathbb{P}[h(p) = h(q)] \geq p_1$ and $\operatorname{dist}(p,q) \geq c \cdot r \Rightarrow \mathbb{P}[h(p) = h(q)] \leq p_2$, ideally $p_1 \gg p_2$
      • boost : $p_1$ close to $1$ and $p_2$ to $\frac{1}{n}$
      • example : $\ell_1$ and $h_i(p) = p_i$ the $i$th bit of $p$, $\mathbb{P}[h(p) = h(q)] = \frac{\#\text{ bits in common}}{\text{total bits}} = \frac{d - |p-q|_1}{d} = 1 - \frac{|p-q|_1}{d}$ is $(r, c \cdot r, e^{-r/d}, e^{-c \cdot r/d})$-LSH
      • make $p_2$ small : $k$ independent hash function, $\operatorname{dist} ( p ,q ) \geq c \cdot r \Rightarrow \mathbb { P } [ h ( p ) = h ( q ) ] \leq p _ { 2} ^ { k }$ for $h ( p ) = \left[ h _ { 1} ( p ) ,\ldots ,h _ { k } ( p ) \right]$, choose $\ell$ independent copies of $h$ $f _ { \ell } ( p ) = \left[ h _ { \ell ,1} ( p ) ,\dots ,h _ { \ell ,k } ( p ) \right]$ s.t. $\mathbb { P } \left[ \exists i | f _ { i } ( p ) = f _ { i } ( q ) \right]\geq 1- \left( 1- p _ { 1} ^ { k } \right) ^ { \ell }$, choose $k$ s.t. $p_2^k=1/n$, choose $\ell \propto n ^ { \rho } \ln n$
        • time overhead : $O(\ell)$ maps to the same value as $\mathbb{E}[\#\{p : \operatorname{dist}(p,q) > c \cdot r, f_i(p) = f_i(q)\}] \leq n \cdot p_2^k \leq 1$
      • make $p_1$ close to $1$ : if $\operatorname{dist}(p,q) \leq r$, $\mathbb{P}[\exists i : f_i(p) = f_i(q)] \geq 1 - (1-p_1^k)^\ell = 1 - (1-p_2^{\rho k})^\ell = 1 - (1-n^{-\rho})^\ell \approx 1 - e^{-\ell n^{-\rho}} = 1 - 1/n$
      • assume : $p_1 = p_2^\rho$ with $\rho < 1$ determines running time/memory
      • algo : hash every point into the $\ell$ tables using $f_1,\ldots,f_\ell$; on query $q$, inspect the points colliding with $q$ in some table
      • complexity : memory $O \left( n ^ { 1+ \rho } \right)$ and time $O \left( n ^ { \rho } \right)$
        • space : maintains $O(\ell)$ hash tables, store $n = | P |$ hash value of $k$ dimensional vector, $O ( \ell \cdot n \cdot k ) = O \left( n ^ { 1+ \rho } \ln n \frac { \log n } { \log \left( 1/ p _ { 2} \right) } \right)$
        • time : compute $f_i(q)$, $dist(p,q)$, $O ( d ( \ell + | O | ) + \ell \cdot k ) = O \left( n ^ { \rho } \ln ( n ) \left( d + \frac { \log n } { \log \left( 1/ p _ { 2} \right) } \right) + d \right)$ with $|O|$ number of points at distance $c\cdot r$ from $q$
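
A small Python sketch of the scheme for Hamming distance with bit-sampling hashes ($h_i(p)=p_i$, as in the example above); parameters `k`, `l` follow the $p_2^k=1/n$, $\ell\propto n^\rho\ln n$ recipe and are left to the caller:

```python
import random
from collections import defaultdict

def build_lsh(points, d, k, l):
    """l tables, each keyed by k sampled coordinates of the d-bit points."""
    tables = []
    for _ in range(l):
        coords = [random.randrange(d) for _ in range(k)]
        table = defaultdict(list)
        for p in points:
            table[tuple(p[c] for c in coords)].append(p)
        tables.append((coords, table))
    return tables

def query(tables, q, r, c):
    """Return some point within c*r of q, probing the bucket of q in each table."""
    for coords, table in tables:
        for p in table.get(tuple(q[j] for j in coords), []):
            if sum(pi != qi for pi, qi in zip(p, q)) <= c * r:
                return p
    return None
```
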

## Submodular function optimization

  • set function : $f : 2^N \rightarrow \mathbb{R}$ assigns a real value to every subset $S \subseteq N$
    • marginal contribution $u$ to $S$ : $f ( u | S ) = f ( S \cup { u } ) - f ( S )$
    • submodular : $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq N$ or iff $f(u | A) \geq f(u | B)$ for all $A \subseteq B \subseteq N$ and each $u \in N \setminus B$, diminishing returns, discrete derivative non-increasing, analogous to concavity
      • $\Rightarrow$ : $f ( A \cup { u } ) + f ( B ) \geq f ( A \cup { u } \cup B ) + f \left( \left( A \cup { u } \right) \cap B \right)=f ( { u } \cup B ) + f ( A )$
      • $\Leftarrow$ : $f(D) - f(C \cap D) = \sum_{i=1}^h f((C \cap D) \cup D_i) - f((C \cap D) \cup D_{i-1}) \geq \sum_{i=1}^h f(d_i | (C \cup D_{i-1})) = f(C \cup D) - f(C)$ with $C, D \subseteq N$, $D \setminus C = \left\{d_1, d_2,\dots,d_h\right\}$ and $D_i = \left\{d_j : 1 \leq j \leq i\right\}$
    • examples
      • size of graph cut : $\delta(S) = |\{(u,v) : u \in S, v \in V \setminus S\}|$ with $S\subseteq V$, $\delta(v | S) = E(v, V \setminus (S \cup \{v\})) - E(v, S)$ where $E(v, T)$ counts edges between $v$ and $T$
      • union count : $c(S) = |\cup_{i \in S} T_i|$ with collection of sets $T_1, T_2,\dots,T_n$ s.t. $T_i \subseteq \mathbb{N}$, $c(i | S) = |T_i \cup \cup_{j \in S} T_j| - |\cup_{j \in S} T_j| = |T_i \setminus \cup_{j \in S} T_j|$
      • matroid rank : with $C$ maximal independent in $A \cap B$ extended to $D$ maximal independent in $A \cup B$, $r(A \cup B) + r(A \cap B) = |D| + |C| \leq |D \cap (A \cup B)| + |D \cap (A \cap B)| = |D \cap B| + |D \cap A| \leq r(B) + r(A)$
      • text summaries
      • influence maximization
      • entropy : $H(A) = -\sum_a \mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\log\left(\mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\right)$
    • submodular minimization : find $\min_{S \subseteq N} f(S)$
      • ellipsoid method : decide whether $x$ optimal or compute subgradient $\nabla g ( x )$ in polytime subject to poly many inequality
    • Lovasz extension : $\hat{f}(z) = \mathbb{E}_{\lambda \sim [0,1]}\left[f\left(\left\{i : z_i \geq \lambda\right\}\right)\right]$ with $\lambda$ uniform, for every $z \in [0,1]^n$
      • explicit : assume $z_0 = 1 \geq z_1 \geq z_2 \geq \dots \geq z_n \geq 0 = z_{n+1}$ and $S_0 = \emptyset \subseteq S_1 \subseteq S_2 \subseteq \cdots \subseteq S_n = [n]$ with $S_i = \{1,\ldots,i\}$; if $\lambda \in (z_{i+1}, z_i]$ (probability equal to $z_i - z_{i+1}$) for $i \in [n] \cup \{0\}$ then $\left\{j : z_j \geq \lambda\right\} = S_i$, so $\hat{f}(z) = \sum_{i=0}^n (z_i - z_{i+1}) f(S_i)$
      • assuming constant time evaluation oracle
      • $\hat f$ convex iff $f$ submodular : only "if" part by LP duality
        • assume : $f ( 0) = 0$ or define $f ^ { \prime } = f - f ( 0 )$
        • primal : $\hat f(z)=g ( z ) = \max _ { x } z ^ { T } x$ s.t. $\sum _ { i \in S } x _ { i } \leq f ( S )$ $\forall S \subseteq N$ and $\sum _ { i \in N } x _ { i } = f ( N )$ with $g(z)$ convex
        • maximum of linear function over convex set always convex : $g \left( \lambda z _ { 1} + ( 1- \lambda ) z _ { 2} \right) = \max _ { x } \left( \lambda z _ { 1} ^ { T } x + ( 1- \lambda ) z _ { 2} ^ { T } x \right) \leq \lambda \max _ { x } z _ { 1} ^ { T } x + ( 1- \lambda ) \max _ { x } z _ { 2} ^ { T } x =\lambda g \left( z _ { 1} \right) + ( 1- \lambda ) g \left( z _ { 2} \right)$
        • dual : minimize $\sum_{S \subseteq N} y_S f(S)$ s.t. $\sum_{S \ni i} y_S = z_i$ $\forall i \in N$ and $y_S \geq 0$ $\forall S \subseteq N$
        • weak duality : enforce $z ^ { T } x \leq \sum _ { S \subset N } y _ { S } f ( S )$
        • strategy : guess primal and dual solutions and verify optimality, since feasible $x^*,y^*$ with $z^Tx^*=\sum_{S\subseteq N}y^*_Sf(S)$ are both optimal
        • assume : $z_0=1\geq z_1\geq z_2\geq\ldots\geq z_n\geq 0=z_{n+1}$, $S_i=\{1,2,\dots,i\}$
        • define : $x_i^*=f(S_i)-f(S_{i-1})$ and $y_S^*=\begin{cases}z_i-z_{i+1} & \text{for }S=S_i\text{ with }i\in[n]\\ 0 & \text{otherwise}\end{cases}$
        • claim
          • $z ^ { T } x ^ { * } = \hat { f } ( z )= \sum _ { i = 1} ^ { n } z _ { i } \cdot \left( f \left( S _ { i } \right) - f \left( S _ { i - 1} \right) \right)= \sum _ { i = 0} ^ { n } \left( z _ { i } - z _ { i + 1} \right) f \left( S _ { i } \right) = \sum _ { S \subseteq N } y _ { S } ^ { * } f ( S )$
          • $y^*$ dual feasible : all entries non-negative and $\sum_{S\ni i}y_S^*=\sum_{j\geq i}y_{S_j}^*=(z_i-z_{i+1})+(z_{i+1}-z_{i+2})+\cdots+(z_{n-1}-z_n)+(z_n-z_{n+1})=z_i-z_{n+1}=z_i$
          • $x^*$ primal feasible : need $\sum_{i\in S}x_i^*\leq f(S)$, by induction on $|S|$ with $i$ the largest element of $S$ (so $S\cup S_{i-1}=S_i$ and $S\cap S_{i-1}=S\setminus\{i\}$) : rearrange $f(S)+f(S_{i-1})\geq f(S\cup S_{i-1})+f(S\cap S_{i-1})=f(S_i)+f(S\setminus\{i\})$ into $f(S)\geq f(S_i)-f(S_{i-1})+f(S\setminus\{i\})=x_i^*+f(S\setminus\{i\})\geq\sum_{j\in S}x_j^*$
      • convex closure : $f^-:[0,1]^n\rightarrow\mathbb{R}$ with $f^-(z)=\min_{D(z)}\mathbb{E}_{S\sim D(z)}[f(S)]$ over distributions $D(z)$ on subsets whose marginal probabilities are given by $z$ ($\Pr_{S\sim D(z)}[u\in S]=z_u$); by strong duality $\hat f=f^-$ for submodular $f$
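
A minimal sketch of the explicit formula above (sort the coordinates, then telescope over the chain $S_0\subseteq S_1\subseteq\dots\subseteq S_n$); `lovasz_extension` and the triangle-cut demo are hypothetical names for illustration, not from the course material:

```python
def lovasz_extension(f, z):
    """Evaluate f^(z) = sum_i (z_i - z_{i+1}) f(S_i) with z sorted decreasingly."""
    n = len(z)
    order = sorted(range(n), key=lambda i: -z[i])  # indices by decreasing z
    zs = [1.0] + [z[i] for i in order] + [0.0]     # pad with z_0 = 1, z_{n+1} = 0
    value, S = 0.0, set()
    for i in range(n + 1):                         # S ranges over S_0 = {} ... S_n = [n]
        value += (zs[i] - zs[i + 1]) * f(S)
        if i < n:
            S = S | {order[i]}
    return value

# demo on the cut function of a triangle (submodular), f(S) = |delta(S)|
edges = [(0, 1), (1, 2), (0, 2)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(lovasz_extension(cut, [0.9, 0.5, 0.1]))  # ~1.6, matching E_lambda[f({i : z_i >= lambda})]
```
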
  • cardinality constrained monotone maximization
    • assume : all functions $f:2^N\rightarrow\mathbb{R}$ non-negative, monotone ($f(S)\leq f(T)$ $\forall S\subseteq T\subseteq N$) and normalized ($f(\emptyset)=0$)
    • constrained : find set $S\subseteq N$ of size at most $k$ maximizing $f(S)$
    • generalize greedy to submodular : repeatedly choose $u\not\in S$ with maximal marginal gain $f(u\mid S)$, suboptimal in general (see the sketch below)
      • if $f ( S ) = \sum _ { e \in S } w _ { e }$ : standard greedy
      • constant factor approximation : $f(S)\ge ( 1- 1/ e ) f ( O) \approx 0.632f ( O)$
        • claim : $\sum_{e\in T\setminus S}f(e\mid S)\geq f(T\cup S)-f(S)$ for $S,T\subseteq N$ : order the elements of $T\setminus S$ as $e_1,e_2,\dots,e_{|T\setminus S|}$ and let $T_i$ be the set containing the first $i$ elements of this ordering, so $T_0=\emptyset$ and $T_{|T\setminus S|}=T\setminus S$, then $\sum_{e\in T\setminus S}f(e\mid S)=\sum_{i=1}^{|T\setminus S|}f(e_i\mid S)\geq\sum_{i=1}^{|T\setminus S|}f(e_i\mid S\cup T_{i-1})=\sum_{i=1}^{|T\setminus S|}\left[f(T_i\cup S)-f(T_{i-1}\cup S)\right]=f(T\cup S)-f(S)$
        • proof : since $f$ monotone, assume $|O|=k$ as we can add elements without decreasing value, let $S_i=\{u_j : j\leq i\}$ contain the first $i$ elements selected, with $S_0=\emptyset$ and $S_k=S$, thus $k\cdot f(u_i\mid S_{i-1})\geq|O\setminus S_{i-1}|\cdot f(u_i\mid S_{i-1})\geq\sum_{e\in O\setminus S_{i-1}}f(e\mid S_{i-1})\geq f(S_{i-1}\cup O)-f(S_{i-1})\geq f(O)-f(S_{i-1})$, which gives $f(S_i)\geq\left(1-\frac{1}{k}\right)f(S_{i-1})+\frac{1}{k}f(O)$ and by induction $f(S)\geq(1-(1-\frac{1}{k})^k)f(O)\geq(1-e^{-1})f(O)$ since $(1-\frac{1}{k})^k\leq e^{-1}$ by $1+x\leq e^x$
    • can be generalized to any matroid constraint
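
A sketch of the greedy rule above for the cardinality-constrained case, assuming only value-oracle access to a monotone submodular $f$; the coverage instance is a made-up example:

```python
def greedy_max(f, N, k):
    """Add k times the element with maximal marginal gain f(u | S):
    a (1 - 1/e)-approximation for monotone submodular f."""
    S = set()
    for _ in range(k):
        u = max((x for x in N if x not in S), key=lambda x: f(S | {x}) - f(S))
        S.add(u)
    return S

# demo with the union-count (coverage) function c(S) = |union_{i in S} T_i|
T = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}}
coverage = lambda S: len(set().union(*[T[i] for i in S]))
print(greedy_max(coverage, T.keys(), 2))  # {0, 2}, covering 7 elements
```
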
  • unconstrained submodular maximization : double greedy from two sides, growing $X_i$ from $\emptyset$ and shrinking $Y_i$ from $N$ one element at a time (see the sketch below)
    • $1/3$-approximation
      • claim : for every $1\leq i\leq n$, $a_i+b_i\geq 0$ where $a_i=f(X_{i-1}\cup\{u_i\})-f(X_{i-1})$ and $b_i=f(Y_{i-1}\setminus\{u_i\})-f(Y_{i-1})$ : since $(X_{i-1}\cup\{u_i\})\cap(Y_{i-1}\setminus\{u_i\})=X_{i-1}$ and $(X_{i-1}\cup\{u_i\})\cup(Y_{i-1}\setminus\{u_i\})=Y_{i-1}$ for $X_{i-1}\subseteq Y_{i-1}$ and $u_i\in Y_{i-1}$, submodularity gives $a_i+b_i=f(X_{i-1}\cup\{u_i\})+f(Y_{i-1}\setminus\{u_i\})-\left[f(X_{i-1})+f(Y_{i-1})\right]\geq 0$
      • claim : $\left[f(X_i)+f(Y_i)\right]-\left[f(X_{i-1})+f(Y_{i-1})\right]\geq f(OPT_{i-1})-f(OPT_i)$ for $OPT_i=(OPT\cup X_i)\cap Y_i$ : in case $a_i\geq b_i$ (so $a_i\geq 0$ since $a_i+b_i\geq 0$), $X_i=X_{i-1}\cup\{u_i\}$ and $Y_i=Y_{i-1}$, and the left side equals $a_i$; if $u_i\in OPT_{i-1}$ then $OPT_i=OPT_{i-1}$ and the claim is trivial, otherwise $OPT_i=OPT_{i-1}\cup\{u_i\}$ and since $OPT_{i-1}\subseteq Y_{i-1}\setminus\{u_i\}$ submodularity gives $f(OPT_i)-f(OPT_{i-1})=f(u_i\mid OPT_{i-1})\geq f(u_i\mid Y_{i-1}\setminus\{u_i\})=f(Y_{i-1})-f(Y_{i-1}\setminus\{u_i\})=-b_i\geq -a_i$; the case $a_i<b_i$ is symmetric
      • proof : summing $\sum_{i=1}^{n}\left[f(OPT_{i-1})-f(OPT_i)\right]\leq\sum_{i=1}^{n}\left[f(X_i)+f(Y_i)-f(X_{i-1})-f(Y_{i-1})\right]$ gives $f(OPT_0)-f(OPT_n)\leq f(X_n)+f(Y_n)-f(X_0)-f(Y_0)\leq f(X_n)+f(Y_n)$, and since $OPT_0=OPT$ and $OPT_n=X_n=Y_n$ this results in $f(OPT)\leq 3\cdot f(X_n)$
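
A sketch of the deterministic double greedy above ($a_i$, $b_i$ as in the claims); the path-cut demo is a made-up, non-monotone instance:

```python
def double_greedy(f, N):
    """Scan elements once, growing X from {} and shrinking Y from N:
    a 1/3-approximation for non-negative submodular f."""
    X, Y = set(), set(N)
    for u in N:
        a = f(X | {u}) - f(X)   # a_i: gain of adding u to X
        b = f(Y - {u}) - f(Y)   # b_i: gain of removing u from Y
        if a >= b:
            X.add(u)
        else:
            Y.remove(u)
    return X                    # X == Y at the end

# demo: cut function of the path 0-1-2 (non-negative, non-monotone submodular)
edges = [(0, 1), (1, 2)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(double_greedy(cut, [0, 1, 2]))  # {0, 2}, the optimal cut of value 2
```
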

Online algorithms

  • difference from streaming algo : an online algo must commit to decisions before seeing the whole input
  • competitive ratio : ratio of the algorithm's cost to the best cost achievable with full knowledge of the input in advance, $\max_I\frac{ALG(I)}{OPT(I)}$ for minimization, $\min_I\frac{ALG(I)}{OPT(I)}$ for maximization
    • alternative definition : $ALG(I)\leq r\cdot OPT(I)+c$ with $c$ constant and $r$ the competitive ratio; when $c=0$, $r$ is called the strict competitive ratio
  • rent or buy skis : rent at $R$ per day or buy at $B$; rent until the total rent paid reaches $B$, then buy, giving competitive ratio $2$ (sketch below)
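
A toy simulation of the break-even rule, with hypothetical names; the worst case is a season that ends right after the purchase:

```python
def ski_rental(R, B, days):
    """Rent at R per day until the rent paid reaches B, then buy at B."""
    cost, rent_paid = 0, 0
    for _ in range(days):
        if rent_paid < B:       # still below the break-even point: rent
            cost += R
            rent_paid += R
        else:                   # rent paid reached B: buy once and stop
            cost += B
            break
    return cost

# season ends right after buying: cost ~2B against OPT = min(days * R, B) = B
print(ski_rental(R=1, B=10, days=11), "vs OPT", min(11 * 1, 10))  # 20 vs 10
```
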
  • monotone submodular function maximization with cardinality constraint : $poly(k)$ memory independent of the stream length
    • examples : document summarization (pick $k$ out of many documents), influence maximization (give away $k$ products)
    • assuming $\text{OPT}=\max_{S\subseteq N : |S|\leq k}f(S)$ is known : keep an arriving element $e$ iff $|S|<k$ and $f(e\mid S)\geq\mathrm{OPT}/(2k)$, then $\mathrm{OPT}/2\leq f(S)$ (see the sketch after the proof)
    • proof : ordered elements $e _ { 1 } , e _ { 2 } , \ldots , e _ { n }$, intermediate set $S_i$ and $S=S_n$
      • case $|S|=k$ : every selected item had marginal contribution at least $\mathrm{OPT}/(2k)$, so $f(S)=f(\{e_{i_1},e_{i_2},\dots,e_{i_k}\})\geq k\cdot\frac{\mathrm{OPT}}{2k}=\frac{\mathrm{OPT}}{2}$
      • case $|S|<k$ : every item not selected has $f(e\mid S)<\mathrm{OPT}/(2k)$ (true at arrival time, preserved by submodularity); with the optimal solution ordered as $\{o_1,o_2,\ldots,o_{k'}\}=O\setminus S$, monotonicity gives $\mathrm{OPT}\leq f(S\cup O)=f(S)+f(O\mid S)=f(S)+\sum_{i=1}^{k'}f(o_i\mid S\cup\{o_1,\ldots,o_{i-1}\})\leq f(S)+\sum_{i=1}^{k'}f(o_i\mid S)\leq f(S)+k'\cdot\frac{\mathrm{OPT}}{2k}\leq f(S)+\frac{\mathrm{OPT}}{2}$
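
A single-pass sketch of the thresholding rule above, assuming the guess for $\mathrm{OPT}$ is given (the full algorithm maintains logarithmically many guesses in parallel); names are illustrative:

```python
def stream_threshold(f, stream, k, opt_guess):
    """Keep an arriving element while |S| < k iff its marginal gain
    is at least opt_guess / (2k); then f(S) >= opt_guess / 2."""
    S = set()
    for e in stream:
        if len(S) < k and f(S | {e}) - f(S) >= opt_guess / (2 * k):
            S.add(e)
    return S

# demo: coverage function, elements arriving one by one
T = {0: {1, 2}, 1: {2, 3}, 2: {4, 5, 6}, 3: {1}}
coverage = lambda S: len(set().union(*[T[i] for i in S]))
print(stream_threshold(coverage, [0, 1, 2, 3], k=2, opt_guess=5))  # {0, 2}, f = 5
```
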
  • caching
    • assume : a cache miss brings the requested memory page into the cache, evicting another page if the cache is full
    • deterministic : cache of size $k$
      • least recently used (LRU) : competitive ratio $k$ (sketch below); divide the request sequence into phases, where phase $i$ begins the first time the $k$-th distinct page is seen after phase $i-1$ began; $OPT$ makes at least one cache miss per phase while LRU makes at most $k$
      • first in first out (FIFO) : competitive ratio $k$
      • least frequently used (LFU) : unbounded competitive ratio
      • last in first out (LIFO) : unbounded competitive ratio
      • cannot do better than $k$ deterministically : with $k+1$ pages, an adversary looks at the cache content and always requests the missing page, so the algorithm misses every request, while any given page is requested again no sooner than after $k$ requests, letting $OPT$ miss at most once every $k$ requests
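
A minimal LRU simulation counting misses (hypothetical helper, not course code); the adversarial cyclic pattern below is exactly the lower-bound instance:

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Simulate an LRU cache of size k and count the misses."""
    cache = OrderedDict()            # ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # refresh: now most recently used
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict the least recently used
            cache[page] = True
    return misses

# cycling through k + 1 pages makes LRU miss every single request
print(lru_misses([1, 2, 3, 4] * 5, k=3))  # 20 misses out of 20 requests
```
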
    • randomized : marking algo, each page holds a 1-bit mark set when the page is recently used; on a miss evict a uniformly random unmarked page, and when all cached pages are marked unmark them all (sketch below)
      • competitive ratio : $2 H _ { k } = 2 \cdot \left( \frac { 1 } { 1 } + \frac { 1 } { 2 } + \ldots + \frac { 1 } { k } \right) = O ( \log k )$
      • no randomized algo has competitive ratio better than $H_k$
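
A sketch of the marking algorithm described above, under the phase structure (unmark everything when all cached pages are marked); `marking_misses` is an illustrative name:

```python
import random

def marking_misses(requests, k):
    """Randomized marking: mark on request, evict a random unmarked page
    on a miss, start a new phase once all cached pages are marked."""
    cache, marked, misses = set(), set(), 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) == k:
                if not (cache - marked):        # all marked: new phase
                    marked = set()
                victim = random.choice(list(cache - marked))
                cache.remove(victim)            # evict a uniform unmarked page
            cache.add(page)
        marked.add(page)                        # requested pages get marked
    return misses

random.seed(0)
print(marking_misses([1, 2, 3, 4] * 5, k=3))    # miss count on the adversarial cycle
```
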
    • secretary problem : random arrival order, $n$ unique candidates, hire or not at arrival (1 position only)
      • good strategy : explore the first $n/2$ candidates, then take the first one better than the whole first half, $\Pr[\text{hiring best candidate}]\geq\Pr[\text{best candidate in 2nd half}]\cdot\Pr[\text{2nd best among first }n/2\mid\text{best in 2nd half}]\geq(1/2)(1/2)=1/4$
      • bad strategy : hire the first candidate, $\Pr[\text{hiring best candidate}]=1/n$
      • optimal strategy : reject the first $r-1$ candidates, then take the first one better than all seen so far, $\Pr[\text{hiring best candidate}]=\sum_{i=1}^{r-1}0+\frac{1}{n}\sum_{i=r}^{n}\Pr[\text{best of the first }i-1\text{ applicants is in first }r-1\mid i\text{-th is best}]=\frac{1}{n}\sum_{i=r}^{n}\frac{r-1}{i-1}$; optimizing $\frac{r-1}{n}\cdot\sum_{i=r}^{n}\frac{1}{i-1}\approx\frac{r}{n}\int_r^n\frac{1}{x}dx=\frac{r}{n}\cdot\ln(n/r)$ gives $r=n/e$ with success probability $1/e$ (simulation below)
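
A quick Monte Carlo check of the $r\approx n/e$ threshold rule (illustrative names; candidate ranks are a uniformly random permutation):

```python
import math
import random

def hires_best(ranks, r):
    """Reject the first r - 1 candidates, then hire the first one better
    than everything seen so far; True iff the best candidate is hired."""
    best_prefix = max(ranks[:r - 1], default=float("-inf"))
    for i in range(r - 1, len(ranks)):
        if ranks[i] > best_prefix:   # first record after the exploration phase
            return ranks[i] == max(ranks)
    return False

n, trials = 100, 20000
r = round(n / math.e) + 1            # optimal threshold r ~ n/e
wins = sum(hires_best(random.sample(range(n), n), r) for _ in range(trials))
print(wins / trials)                 # ~ 1/e ~ 0.37
```
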

Spectral graph theory

  • spectral graph theory : studies how eigenvalues and eigenvectors of the adjacency matrix relate to combinatorial properties
  • adjacency matrix : for $G=(V,E)$ with $|V|=n$, the matrix $A\in\mathbb{R}^{n\times n}$ with $A_{ij}=1$ iff $\{i,j\}\in E$
  • $d$-regular : degree of each vertex is $d$
  • eigenvector $v$ with eigenvalue $\lambda$ : $Mv=\lambda v$; for symmetric $M$ the eigenvectors form an orthonormal basis
  • normalized adjacency matrix of a $d$-regular graph : $M=\frac{1}{d}A$, called the random walk matrix
    • symmetric : $n$ real eigenvalues $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_n$ (with multiplicity) and orthonormal eigenvectors
    • signal $x$ on graph : $y=Mx$ gives $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}$, the average value under $x$ of $i$'s neighbours
      • $\lambda_1\le 1$ : pick $i$ maximizing $x(i)$, then $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}\leq\sum_{\{i,j\}\in E}\frac{x(i)}{d}=x(i)$
      • $\lambda _ { 2 } = 1 \Leftrightarrow \text { G is disconnected}$
        • suppose disconnected : for a connected component $S$, define $v_2(i)=\begin{cases}1/|S| & \text{if }i\in S\\ -1/|V\setminus S| & \text{if }i\in V\setminus S\end{cases}$, perpendicular to the all-$1$'s vector $v_1$; every neighbour of $i$ lies in the same component, so $y(i)=\frac{1}{d}\sum_{\{j,i\}\in E}v_2(j)=\frac{1}{d}\sum_{\{j,i\}\in E}v_2(i)=v_2(i)$, hence $Mv_2=v_2$
        • suppose connected : $v_2\perp v_1$ forces $v_2(i)\neq v_2(j)$ somewhere, so pick $i$ with maximal $v_2(i)$ having a neighbour $j^*$ with $v_2(i)>v_2(j^*)$, then $y(i)=\frac{1}{d}\sum_{\{j,i\}\in E}v_2(j)\leq\frac{1}{d}\left(\sum_{\{j,i\}\in E : j\neq j^*}v_2(i)+v_2(j^*)\right)<v_2(i)$, so $Mv_2\neq v_2$
      • $\lambda _ { n } = - 1 \Leftrightarrow \text { one component of } G \text { is bipartite }$
    • number of connected components : $|\{i \mid \lambda_i=1\}|$
  • mixing time : of the walk $M^kp$; a disconnected graph never reaches the uniform distribution, and a bipartite graph oscillates, hence the same issue
    • convergence : if $\max(|\lambda_2|,|\lambda_n|)\leq 1-\epsilon$ then $\norm{M^kp-(\frac{1}{n},\ldots,\frac{1}{n})}_2\leq\frac{1}{n^c}$ after $k=\frac{c}{\epsilon}\log n=O(\frac{1}{\epsilon}\log n)$ steps, i.e. the walk is at any vertex with prob $\approx\frac{1}{n}$ (simulation below)
    • proof : in the eigenbasis $p=\sum_{i=1}^{n}\alpha_iv_i$ with $\alpha_i=\langle p,v_i\rangle$ and $\alpha_1=\langle p,v_1\rangle=\frac{1}{\sqrt{n}}$, so $\norm{M^kp-(\frac{1}{n},\ldots,\frac{1}{n})}_2=\norm{\sum_{i=2}^{n}\alpha_i\lambda_i^kv_i}_2\leq(1-\epsilon)^k\norm{\sum_{i=2}^{n}\alpha_iv_i}_2$ since $|\lambda_i|\leq 1-\epsilon$ for $i\geq 2$; taking $k=\frac{c}{\epsilon}\log n$ and $A=\norm{\sum_{i=2}^{n}\alpha_iv_i}_2\leq 1$ gives $\norm{M^kp-(\frac{1}{n},\ldots,\frac{1}{n})}_2\leq A(1-\epsilon)^{\frac{\log n}{\epsilon}c}\leq A\left(\frac{1}{e}\right)^{c\log n}\leq\frac{1}{n^c}$
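
A small simulation of $M^kp$ on a $d$-regular graph (illustrative; $K_4$ has $\max(|\lambda_2|,|\lambda_n|)=1/3$, so it mixes in a few steps):

```python
def walk(adj, d, k, start):
    """Apply the random walk matrix M = A/d k times to a point mass."""
    n = len(adj)
    p = [0.0] * n
    p[start] = 1.0
    for _ in range(k):
        q = [0.0] * n
        for u in range(n):
            for v in adj[u]:
                q[v] += p[u] / d   # each vertex spreads its mass to neighbours
        p = q
    return p

# K4 is 3-regular with normalized eigenvalues 1, -1/3, -1/3, -1/3
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(walk(adj, d=3, k=10, start=0))  # ~ [0.25, 0.25, 0.25, 0.25]
```
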
  • connectivity : for partitioning, a more balanced cut may be preferable to the minimum one (which can be as silly as a single vertex)
  • conductance
    • cut : for $S\subseteq V$, $h(S)=\frac{|\delta(S)|}{d\cdot\min\{|S|,|V\setminus S|\}}$ with $\delta(S)$ the set of edges crossing the cut, roughly measuring the fraction of edges leaving $S$
    • graph : $h ( G ) = \min _ { S \subset V } h ( S )$
  • Cheeger's inequalities : $\frac { 1 - \lambda _ { 2 } } { 2 } \leq h ( G ) \leq \sqrt { 2 \left( 1 - \lambda _ { 2 } \right) }$
    • algo spectral-partitioning : sort vertices by their $v_2$ value and return the best of the $n-1$ sweep cuts; very fast, $O(|V|\log|V|+|E|)$, and $v_2$ can be approximated (see the sketch at the end of this section)
    • Rayleigh quotient : $\frac { x ^ { T } M x } { x ^ { T } x }$
      • first eigenvalue : $\lambda _ { 1 } = \max _ { x \in \mathbb { R } ^ { n } } \frac { x ^ { T } M x } { x ^ { T } x }$
        • $\lambda _ { 1 } \leq \max _ { x \in \mathbb { R } ^ { n } } \frac { x ^ { T } M x } { x ^ { T } x }$ : $\frac { v _ { 1 } ^ { T } M v _ { 1 } } { v _ { 1 } ^ { T } v _ { 1 } } = \frac { v _ { 1 } ^ { T } \lambda _ { 1 } v _ { 1 } } { v _ { 1 } ^ { T } v _ { 1 } } = \lambda _ { 1 } \frac { v _ { 1 } ^ { T } v _ { 1 } } { v _ { 1 } ^ { T } v _ { 1 } } = \lambda _ { 1 }$
        • $\lambda _ { 1 } \geq \max _ { x \in \mathbb { R } ^ { n } } \frac { x ^ { T } M x } { x ^ { T } x }$ : $\frac { y ^ { T } M y } { y ^ { T } y } = \frac { \sum _ { i = 1 } ^ { n } \alpha _ { i } ^ { 2 } \lambda _ { i } } { \sum _ { i = 1 } ^ { n } \alpha _ { i } ^ { 2 } } \leq \lambda _ { 1 } \frac { \sum _ { i = 1 } ^ { n } \alpha _ { i } ^ { 2 } } { \sum _ { i = 1 } ^ { n } \alpha _ { i } ^ { 2 } } = \lambda _ { 1 }$
      • second eigenvalue : $\lambda_2=\max_{x\in\mathbb{R}^n : x\perp v_1}\frac{x^TMx}{x^Tx}$
    • proof $\frac{1-\lambda_2}{2}\leq h(G)$ : $1-\lambda_2=\min_{x\in\mathbb{R}^n : x\perp v_1}\frac{x^T(I-M)x}{x^Tx}=\min_{x\in\mathbb{R}^n : x\perp v_1}\frac{\sum_{\{i,j\}\in E}(x(i)-x(j))^2}{d\cdot x^Tx}$ ($I-M$ is the normalized Laplacian), a continuous relaxation of the cut problem; assume $|S|\leq|V|/2$ and define $y(i)=\begin{cases}1-|S|/|V| & \text{if }i\in S\\ -|S|/|V| & \text{if }i\notin S\end{cases}$ (note $y\perp v_1$); only crossing edges contribute, so $\sum_{\{i,j\}\in E}(y(i)-y(j))^2=|\delta(S)|\cdot(1-|S|/|V|+|S|/|V|)^2=|\delta(S)|$ and $y^Ty=\sum_{i\in V}y(i)^2=|S|(1-|S|/|V|)^2+(|V|-|S|)(|S|/|V|)^2=\frac{|S|(|V|-|S|)}{|V|}\geq|S|/2$, hence $1-\lambda_2\leq\frac{\sum_{\{i,j\}\in E}(y(i)-y(j))^2}{d\cdot y^Ty}$ and $\frac{1-\lambda_2}{2}\leq\frac{1}{2}\frac{|\delta(S)|}{d\cdot|S|/2}=\frac{|\delta(S)|}{d\cdot|S|}=h(S)$
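
A sketch of the sweep-cut version of spectral partitioning (using numpy's exact eigendecomposition rather than the fast approximation of $v_2$ the notes allude to); the 6-cycle demo is illustrative:

```python
import numpy as np

def spectral_partition(adj, d):
    """Sort vertices by v2 and return the sweep cut of minimal conductance;
    by Cheeger, the returned cut satisfies h(S) <= sqrt(2 (1 - lambda_2))."""
    n = len(adj)
    A = np.zeros((n, n))
    for u, nbrs in adj.items():
        for v in nbrs:
            A[u, v] = 1.0
    _, eigvecs = np.linalg.eigh(A / d)     # eigenvalues in ascending order
    v2 = eigvecs[:, -2]                    # eigenvector of the 2nd largest
    order = np.argsort(-v2)                # sweep from the largest v2 value
    best_S, best_h, S = None, float("inf"), set()
    for i in range(n - 1):                 # the n - 1 prefix cuts
        S.add(int(order[i]))
        crossing = sum(1 for u in S for v in adj[u] if v not in S)
        h = crossing / (d * min(len(S), n - len(S)))
        if h < best_h:
            best_S, best_h = set(S), h
    return best_S, best_h

# demo: the 6-cycle (2-regular); the best sweep cut is an arc of 3 vertices
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(spectral_partition(adj, d=2))        # h = 2 / (2 * 3) = 1/3
```
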