Advanced Decomposition Methods Part I: all is one, one is all
Antonio Frangioni
Dipartimento di Informatica, Università di Pisa
COST/MINO Ph.D. School on Advanced Optimization Methods
Roma — June 8, 2016
Outline
1 Block-Structured (Mixed-Integer NonLinear) Programs
2 Dual decomposition (Dantzig-Wolfe/Lagrangian/Column Generation)
3 Primal decomposition (Benders’/Resource)
4 The Integer Case
5 Example Applications (the Reformulation before the Reformulation)
6 Conclusions (for now)
Block-Structured Programs
Many applications of Mixed-Integer NonLinear Programming are large-scale: millions/billions of variables/constraints
Good news: (almost) all large-scale problems are block-structured
Usually several nested forms of structure, but two main ones:
[figure: constraint-matrix sketches of the block-diagonal and the staircase-structured case]
block-diagonal ≡ complicating constraints
staircase-structured ≡ complicating variables
Relaxing constraints / fixing variables yields independent subproblems
=⇒ much easier because of size and/or structure (integrality, . . . )
A. Frangioni (DI — UniPi) Advanced Decomposition Methods I Roma 2016 3 / 42
Example I: Two-stage Stochastic (Linear) Programs
Problems involving decisions over time and uncertainty
First-stage (here-and-now) decisions x, constraints E0 x ≤ b0
Set S of scenarios, realization known only after deciding x
Recourse decisions zs, different for each scenario s ∈ S, constraints E0s x + Es zs ≤ bs
Minimize here-and-now cost plus average cost of recourse actions
min { c0 x + Σ_{s∈S} πs cs zs : E0 x ≤ b0 , E0s x + Es zs ≤ bs  s ∈ S }
Extends to multi-stage (structure repeats “fractally” into each Es)
Often other structures inside Es, network a common one
Extends to nonlinear risk measures (CVaR, . . . ), integer variables, . . .
Many applications: energy [1], water, logistics, telecom, finance, . . .
[1] Tahanan, van Ackooij, F., Lacalandra “Large-scale Unit Commitment under uncertainty” 4OR 2015
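A minimal numerical sketch of the point above (a toy instance invented here, not from the slides): for a fixed first-stage decision x, the recourse problem separates by scenario, so the expected cost is c0 x plus a sum of |S| independent subproblem values. The one-variable recourse LP is solved in closed form.

```python
# Toy two-stage model (hypothetical data): buy x units now at unit cost
# c0; scenario s has demand d_s with probability pi_s; any shortage is
# covered by recourse at unit cost q.

def recourse_cost(x, d, q=3.0):
    # v_s(x) = min { q*z : z >= d - x , z >= 0 }  -- trivial 1-variable LP
    return q * max(d - x, 0.0)

def expected_cost(x, scenarios, c0=1.0):
    # c0*x + sum_s pi_s * v_s(x): one independent subproblem per scenario
    return c0 * x + sum(pi * recourse_cost(x, d) for pi, d in scenarios)

scenarios = [(0.5, 10.0), (0.3, 20.0), (0.2, 30.0)]   # (pi_s, d_s)
costs = {x: expected_cost(x, scenarios) for x in (0.0, 10.0, 20.0, 30.0)}
```

The per-scenario solves share no variables, which is exactly what makes the value function decompose.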
Example II: (Linear) Multicommodity Network Design
Graph G = (N, A), multicommodity network design model
min Σ_{k∈K} Σ_{(i,j)∈A} dk cijk xijk + Σ_{(i,j)∈A} fij zij   (1)
Σ_{(i,j)∈A} xijk − Σ_{(j,i)∈A} xjik = { 1 if i = sk , −1 if i = tk , 0 otherwise }   i ∈ N , k ∈ K   (2)
Σ_{k∈K} dk xijk ≤ uij zij   (i, j) ∈ A   (3)
xijk ∈ [0, 1]   (i, j) ∈ A , k ∈ K   (4)
zij ∈ {0, 1}   (i, j) ∈ A   (5)
K ≡ commodities ≡ (sk, tk, dk) (not completely generic)
Pervasive structure in most of combinatorial optimization
Many applications: logistics, transportation, telecom, energy, . . .
Nonlinear objective function/constraints (energy, delay [2], . . . )
[2] F., Galli, Scutellà “Delay-Constrained Shortest Paths: Approximation Algorithms and SOCP Models” JOTA, 2015
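A small check (a hypothetical helper written for illustration, not slide material) of the flow-conservation constraints (2) for one commodity: on a tiny digraph, the net outflow must be +1 at the source sk, −1 at the sink tk, and 0 elsewhere.

```python
# flow maps each arc (i, j) to the value x_ij^k in [0, 1]; flow_balance
# returns the left-hand side of constraint (2) at every node.

def flow_balance(nodes, flow):
    bal = {i: 0.0 for i in nodes}
    for (i, j), x in flow.items():
        bal[i] += x   # flow leaving node i
        bal[j] -= x   # flow entering node j
    return bal

# one unit routed along the path 1 -> 2 -> 3 (source 1, sink 3)
nodes = [1, 2, 3, 4]
flow = {(1, 2): 1.0, (2, 3): 1.0, (1, 4): 0.0, (4, 3): 0.0}
bal = flow_balance(nodes, flow)
```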
Dual decomposition, a.k.a.
Inner Approximation
Dantzig-Wolfe decomposition
Lagrangian Relaxation
Column Generation
Block-diagonal Convex (Linear) Program
Block-diagonal program: convex X, n “complicating” constraints
(Π)  max { cx : Ax = b , x ∈ X }
e.g., X = { x : Ex ≤ d } = ⨂_{k∈K} Xk = { xk : Ek xk ≤ dk }
(|K| large =⇒ (Π) very large), Ax = b linking constraints
We can efficiently optimize upon X, for different reasons:
a bunch of (many, much smaller) problems instead of a large one
X has (the Xk have) structure (shortest path, knapsack, . . . )
(much more so than solving the whole of (Π), anyway)
In other words, we could efficiently solve (Π) if the linking constraints were removed: how to exploit it?
Dantzig-Wolfe reformulation
Dantzig-Wolfe reformulation [3]: X convex =⇒ represent it by points
X = { x = Σ_{x̄∈X} x̄ θx̄ : Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X }
then reformulate (Π) in terms of the convex multipliers θ
(Π)  max  c ( Σ_{x̄∈X} x̄ θx̄ )
      A ( Σ_{x̄∈X} x̄ θx̄ ) = b
      Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X
only n + 1 rows (but how many columns?)
note that “x̄ ∈ X” is an index, not a constraint (θ is the variable)
A rather semi-infinite program, but “only” x̄ ∈ ext X needed
Not that this makes it any less infinite, unless
X is a polytope (compact polyhedron) =⇒ finite set of vertices
[3] Dantzig, Wolfe “The Decomposition Principle for Linear Programs” Operations Research 1960
Dantzig-Wolfe reformulation (cont.d)
Could this ever be a good idea? Actually, it could:
polyhedra may have few faces and many vertices . . . or vice-versa
n-cube: |xi| ≤ 1 ∀ i, 2n faces, 2^n vertices
n-co-cube: Σi |xi| ≤ 1, 2^n faces, 2n vertices
Except, most often the number of vertices is too large
[figure: the same feasible region written as Ax = b, Ex ≤ d and as AXθ = b, eθ = 1]
a (linear) program with (exponentially/infinitely) many columns
But, efficiently optimize over X =⇒ generate vertices (≡ columns)
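A quick check of the face/vertex trade-off above (illustrative code, not from the slides): the n-cube has 2^n vertices (all sign patterns), while the n-co-cube (cross-polytope) has only 2n of them (the signed unit vectors).

```python
from itertools import product

def cube_vertices(n):
    # every sign pattern in {-1, +1}^n is a vertex of the n-cube
    return [v for v in product((-1.0, 1.0), repeat=n)]

def cocube_vertices(n):
    # the vertices of the cross-polytope are the signed unit vectors
    verts = []
    for i in range(n):
        for s in (-1.0, 1.0):
            v = [0.0] * n
            v[i] = s
            verts.append(tuple(v))
    return verts

n = 3
cube = cube_vertices(n)      # 2^3 = 8 vertices
cocube = cocube_vertices(n)  # 2*3 = 6 vertices
```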
Dantzig-Wolfe decomposition ≡ Column Generation
B ⊂ X (small), solve the restriction of (Π) with X → B, i.e.,
(ΠB)  max  Σ_{x̄∈B} (c x̄) θx̄
      Σ_{x̄∈B} (A x̄) θx̄ = b
      Σ_{x̄∈B} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ B
“master problem” (B small, not too costly)
note how the parentheses have moved: linearity is needed (for now)
If B contains the “right” columns, x∗ = Σ_{x̄∈B} x̄ θ∗x̄ is optimal for (Π)
How do I tell if B contains the “right” columns? Use duality
(∆B)  min { yb + v : v ≥ c x̄ − y (A x̄)  x̄ ∈ B }
    = min fB(y) = max { c x̄ + y (b − A x̄) : x̄ ∈ B }
one constraint for each x̄ ∈ B
Dantzig-Wolfe decomposition ≡ Lagrangian relaxation
Dual of (Π): (∆) ≡ (∆X) (many constraints)
fB = lower approximation of the Lagrangian function
(Πy)  f (y) = max { cx + y (b − Ax) : x ∈ X }
Assumption: optimizing over X is “easy” for each objective =⇒ obtaining x̄ s.t. f (y) = c x̄ + y (b − A x̄) is “easy”
Important: (Πy) Lagrangian relaxation [4], f (y) ≥ v(Π) = v(∆) ∀y, provided (Πy) is solved exactly (or at least an f̄ ≥ f (y) is used)
Thus, (∆B) outer approximation of the Lagrangian dual
(∆)  min { f (y) = max { cx + y (b − Ax) : x ∈ X } }
[4] Geoffrion “Lagrangean relaxation for integer programming” Mathematical Programming Study 1974
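A tiny numerical illustration of f(y) ≥ v(Π) (a toy instance invented here, not slide material): with X = {0,1}^3 finite and one linking constraint ax = b, the Lagrangian function is computable by enumeration, and every multiplier y gives a valid upper bound.

```python
from itertools import product

c = (4.0, 3.0, 2.0)
a = (2.0, 1.0, 1.0)   # single linking constraint  a x = b
b = 2.0
X = list(product((0, 1), repeat=3))

def f(y):
    # Lagrangian function: relax  a x = b  with multiplier y
    return max(sum(ci * xi for ci, xi in zip(c, x))
               + y * (b - sum(ai * xi for ai, xi in zip(a, x)))
               for x in X)

# optimal value of (Pi): enumerate the feasible points of X
v_pi = max(sum(ci * xi for ci, xi in zip(c, x))
           for x in X if sum(ai * xi for ai, xi in zip(a, x)) == b)

bounds = [f(y) for y in (-2.0, 0.0, 1.0, 3.0, 10.0)]
```

Note f(0) just drops the linking constraint; other multipliers can give much tighter bounds.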
Lagrangian duality vs. Linear duality
Note about the LP case (X = { x : Ex ≤ d }):
(∆)  min yb + max { (c − yA)x : Ex ≤ d }
   ≡ min yb + min { wd : wE = c − yA , w ≥ 0 }
   ≡ min { yb + wd : wE + yA = c , w ≥ 0 }
   ≡ exactly the linear dual of (Π)
y “partial” duals: duals w of Ex ≤ d “hidden” in the subproblem
There is only one duality
Will repeatedly come in handy
Dantzig-Wolfe decomposition ≡ Dual row generation
Primal/dual optimal solutions x∗/(v∗, y∗) out of (ΠB)/(∆B)
x∗ feasible to (Π), so optimal ⇐⇒ (v∗, y∗) feasible to (∆)
⇐⇒ v∗ ≥ (c − y∗A)x  ∀x ∈ X
⇐⇒ v∗ ≥ max { (c − y∗A)x : x ∈ X }
In fact: v∗ ≥ (c − y∗A)x̄ ≡ y∗b + v∗ ≥ f (y∗) =⇒
v(Π) ≥ cx∗ = y∗b + v∗ ≥ f (y∗) ≥ v(∆) ≥ v(Π) =⇒ x∗/(v∗, y∗) optimal
Otherwise, B = B ∪ { x̄ }: add a new column to (ΠB) / row to (∆B), rinse, repeat
Clearly finite if ext X is, globally convergent anyway:
the cutting-plane algorithm for convex programs [5] (applied to (∆))
[5] Kelley “The Cutting-Plane Method for Solving Convex Programs” J. of the SIAM, 1960
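A minimal 1-D cutting-plane loop on the Lagrangian dual (an illustrative sketch under simplifying assumptions, not the slides' implementation): X is a small finite set, the oracle maximizes cx + y(b − ax) by enumeration, and the master minimizes the max of the collected affine cuts over an interval by checking the endpoints and all pairwise cut intersections.

```python
from itertools import product, combinations

c, a, b = (4.0, 3.0, 2.0), (2.0, 1.0, 1.0), 2.0
X = list(product((0, 1), repeat=3))

def oracle(y):
    # returns f(y) and the cut (intercept, slope) from the maximizer
    best = max(X, key=lambda x: sum(ci * xi for ci, xi in zip(c, x))
               + y * (b - sum(ai * xi for ai, xi in zip(a, x))))
    cx = sum(ci * xi for ci, xi in zip(c, best))
    slope = b - sum(ai * xi for ai, xi in zip(a, best))
    return cx + y * slope, (cx, slope)      # cut: f(y') >= cx + slope*y'

def solve_master(cuts, lo=-10.0, hi=10.0):
    # minimize the max of affine cuts over [lo, hi]: the optimum sits at
    # an endpoint or at the intersection of two cuts
    cand = [lo, hi]
    for (i1, s1), (i2, s2) in combinations(cuts, 2):
        if s1 != s2:
            y = (i2 - i1) / (s1 - s2)
            if lo <= y <= hi:
                cand.append(y)
    fB = lambda y: max(i + s * y for i, s in cuts)
    y_star = min(cand, key=fB)
    return y_star, fB(y_star)

cuts = [oracle(0.0)[1]]
for _ in range(20):
    y_star, lb = solve_master(cuts)   # model value fB(y*) <= f(y*)
    f_val, cut = oracle(y_star)
    if f_val <= lb + 1e-9:            # model matches f at y*: optimal
        break
    cuts.append(cut)
```

Each iteration either proves optimality or adds a violated cut, exactly the add-column/add-row loop above.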
Geometry of the Lagrangian dual
[figure: the cutting-plane model fB built from the cuts generated at x1, . . . , x6, with minimizer y∗ and value v∗]
v∗ = fB(y∗) = v(ΠB), lower bound on v(∆)
Optimal solution x̄ gives a separator between (v∗, y∗) and epi f
(c x̄ , A x̄) = new row in (∆B) (subgradient of f at y∗)
Dantzig-Wolfe decomposition ≡ Inner Approximation
“Abstract” view of (ΠB): conv(B) inner approximation of X
(ΠB)  max { cx : Ax = b , x ∈ conv(B) }
x∗ solves the Lagrangian relaxation of (ΠB) with y∗, i.e.,
x∗ ∈ argmax { (c − y∗A)x : x ∈ conv(B) }
=⇒ (c − y∗A)x ≤ (c − y∗A)x∗ for each x ∈ conv(B) ⊆ X
(c − y∗A)x̄ = max { (c − y∗A)x : x ∈ X } ≥ (c − y∗A)x∗
Column x̄ has positive reduced cost
(c − y∗A)(x̄ − x∗) = (c − y∗A)x̄ − cx∗ + y∗b = (c − y∗A)x̄ − v∗ > 0
=⇒ x̄ ∉ conv(B) =⇒ makes sense to add x̄ to B
Geometry of Dantzig-Wolfe/Column Generation
[figure: conv(B) (inner approximation of X) intersected with Ax = b, and the reduced-cost direction c − y∗A]
c − y∗A separates conv(B) ∩ { Ax = b } from all x ∈ X better than x∗
Thus, optimizing it allows finding new points (if any)
Issue: conv(B) ∩ { Ax = b } must be nonempty
Extension I: the Unbounded Case
X unbounded ⇐⇒ rec X ⊃ {0} =⇒ f (y) = v(Πy) = ∞ happens
X = conv( ext X = X0 ) + cone( ext rec X = X∞ )
B = ( B0 ⊂ X0 ) ∪ ( B∞ ⊂ X∞ ) = { points x̄ } ∪ { rays χ̄ } =⇒
(ΠB)  max  c ( Σ_{x̄∈B0} x̄ θx̄ + Σ_{χ̄∈B∞} χ̄ θχ̄ )
      A ( Σ_{x̄∈B0} x̄ θx̄ + Σ_{χ̄∈B∞} χ̄ θχ̄ ) = b
      Σ_{x̄∈B0} θx̄ = 1
      θx̄ ≥ 0  x̄ ∈ B0 , θχ̄ ≥ 0  χ̄ ∈ B∞
In (∆B), constraints y (A χ̄) ≥ c χ̄ (a.k.a. “feasibility cuts”)
(Πy∗) unbounded ⇐⇒ (c − y∗A) χ̄ > 0 for some χ̄ ∈ rec X (violated constraint) =⇒ B∞ = B∞ ∪ { χ̄ }
(∆) = min { f (y) : y ∈ Y }, (Πy∗) provides either subgradients of f (a.k.a. “optimality cuts”), or violated valid inequalities for Y [5]
Extension II: the Nonlinear Case
Nonlinear case: c(·) concave, A(·) component-wise convex
(Π)  max { c(x) : A(x) ≤ b , x ∈ X }
(∆)  min { f (y) = yb + max { c(x) − yA(x) : x ∈ X } : y ≥ 0 }
Any x̄ ∈ X still gives f (y) ≥ c(x̄) + y(b − A(x̄)), same (∆B) / (ΠB)
c( Σ_{x̄∈B} x̄ θx̄ ) ≥ Σ_{x̄∈B} c(x̄) θx̄ (c(·) concave)
A( Σ_{x̄∈B} x̄ θx̄ ) ≤ Σ_{x̄∈B} A(x̄) θx̄ ≤ b (A(·) convex) =⇒
(ΠB) safe inner approximation (v(ΠB) ≤ v(Π))
Basically everything keeps working, but you may need a constraint qualification [6] (usually easy to get)
[6] Lemaréchal, Hiriart-Urruty “Convex Analysis and Minimization Algorithms” Springer, 1993
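A quick numeric check of why (ΠB) stays a safe inner approximation (toy functions chosen here for illustration, not from the slides): for concave c(·) and convex A(·), Jensen's inequality makes any convex combination of points of X at least as good in the objective and still feasible.

```python
from math import sqrt

c_fun = lambda x: sqrt(x)   # concave on x >= 0
A_fun = lambda x: x * x     # convex

points = [0.0, 1.0, 4.0]    # some x_bar in B (here X is a subset of R+)
theta = [0.2, 0.3, 0.5]     # convex multipliers, summing to 1
x_mix = sum(t * p for t, p in zip(theta, points))
c_mix = sum(t * c_fun(p) for t, p in zip(theta, points))   # underestimates c(x_mix)
A_mix = sum(t * A_fun(p) for t, p in zip(theta, points))   # overestimates A(x_mix)
```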
Primal decomposition, a.k.a.
Outer Approximation
Benders’ decomposition
Resource decomposition
Staircase-structured Convex (Linear) Program
Staircase-structured program: convex X, “complicating” variables
(Π)  max { cx + ez : Dx + Ez ≤ d , x ∈ X }
e.g., Dx + Ez ≤ d ≡ Dk x + Ek zk ≤ dk  k ∈ K (|K| large) =⇒
Z(x) = { z : Ez ≤ d − Dx } = ⨂_{k∈K} Zk(x) = { zk : Ek zk ≤ dk − Dk x }
We can efficiently optimize upon Z(x), for different reasons:
a bunch of (many, much smaller) problems instead of a large one
Z(x) has (the Zk(x) have) structure (shortest path, knapsack, . . . )
(much more so than solving the whole of (Π), anyway)
In other words, we could efficiently solve (Π) if the linking variables were fixed: how to exploit it?
Benders’ reformulation
Benders’ reformulation: define the convex value function
(B)  max { cx + v(x) = max { ez : Ez ≤ d − Dx } : x ∈ X }
(note: clearly v(x) = −∞ happens)
Clever trick [7]: use duality to reformulate the inner problem
v(x) = min { w(d − Dx) : w ∈ W = { w : wE = e , w ≥ 0 } }
so that W does not depend on x
As usual, W = conv( ext W = W0 ) + cone( ext rec W = W∞ ) =⇒
(B)  max  cx + v
      v ≤ w̄(d − Dx)  w̄ ∈ W0
      0 ≤ ω̄(d − Dx)  ω̄ ∈ W∞
      x ∈ X
still very large, but we can generate w̄ / ω̄ by computing v(x)
[7] Benders “Partitioning Procedures for Solving Mixed-Variables Programming Problems” Numerische Mathematik, 1962
Benders’ decomposition
Select (small) B = ( B0 ⊂ W0 ) ∪ ( B∞ ⊂ W∞ ), solve the master problem
(BB)  max  cx + v
      v ≤ w̄(d − Dx)  w̄ ∈ B0
      0 ≤ ω̄(d − Dx)  ω̄ ∈ B∞
      x ∈ X
    = max { cx + vB(x) : x ∈ X ∩ VB } , where
vB(x) = min { w̄(d − Dx) : w̄ ∈ B0 } ≥ v(x) , VB ⊇ dom v
Find (primal) optimal solution x∗, compute v(x∗), get either w̄ or ω̄, update either B0 or B∞, rinse & repeat
Benders’ decomposition ≡ Cutting Plane approach to (B) [5]
Spookily similar to the Lagrangian dual, ain’t it?
Except, constraints are now attached to dual objects w̄ / ω̄
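The rinse-and-repeat loop above can be sketched on a toy instance (invented here, not from the slides): maximize −x + 2z subject to z ≤ 5 − x, z ≤ 3, z ≥ 0, with x ∈ {0, . . . , 5}. The inner LP's dual vertices are w = (2, 0) and w = (0, 2), giving the cuts v ≤ 2(5 − x) and v ≤ 6; the tiny master is solved by enumeration over x.

```python
def subproblem(x):
    # v(x) = max{ 2z : z <= 5-x , z <= 3 , z >= 0 } = min(2*(5-x), 6);
    # the tight dual vertex yields the corresponding Benders cut
    if 2 * (5 - x) <= 6:
        return 2.0 * (5 - x), (2.0, 0.0)   # cut: v <= 2*(5 - x)
    return 6.0, (0.0, 2.0)                 # cut: v <= 6

def master(cuts, M=1e6):
    # max{ -x + v : v <= w1*(5-x) + w2*3 for all cuts } over x in {0..5};
    # M caps v while no optimality cut is present
    best = None
    for x in range(6):
        v = min([M] + [w1 * (5 - x) + w2 * 3 for w1, w2 in cuts])
        if best is None or -x + v > best[2]:
            best = (x, v, -x + v)
    return best

cuts = []
for _ in range(10):
    x, v, ub = master(cuts)       # master value = upper bound
    v_true, cut = subproblem(x)
    if v_true >= v - 1e-9:        # no cut violated: x is optimal
        break
    cuts.append(cut)
```

The loop terminates once the master's estimate vB(x) matches the true v(x) at its own optimizer.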
Benders is Lagrange . . .
Block-diagonal case
(Π)  max { cx : Ax = b , Ex ≤ d }
(∆)  min { yb + wd : wE + yA = c , w ≥ 0 }
Think of y as complicating variables in (∆), you get
(∆)  min yb + min { wd : wE = c − yA , w ≥ 0 }
   = min yb + max { (c − yA)x : Ex ≤ d }
i.e., the Lagrangian dual of (Π)
The value function of (∆) is the Lagrangian function of (Π)
. . . Lagrange is Benders . . .
Dual of (Π) (linear case X = { x : Ax = b }):
(Π)  max { cx + ez : Dx + Ez ≤ d , Ax = b }
(∆)  min { yb + wd : yA + wD = c , wE = e , w ≥ 0 }
Lagrangian dual of the dual constraints yA + wD = c (multiplier x):
(∆)  max { min { yb + wd + (c − yA − wD)x : wE = e , w ≥ 0 } }
   = max cx + min { y (b − Ax) + w (d − Dx) : wE = e , w ≥ 0 }
   = max cx + min_y { y (b − Ax) } + min { w (d − Dx) : wE = e , w ≥ 0 }
   = max { cx + max { ez : Dx + Ez ≤ d } : Ax = b }
i.e., Benders’ reformulation of (Π)
The Lagrangian function of (∆) is the value function of (Π)
. . . and Both are the Cutting Plane Algorithm
Both Lagrange and Benders boil down to min { φ(λ) : λ ∈ Λ }
with Λ and φ convex, nondifferentiable, both only implicitly known by means of a (potentially costly) oracle that, given λ̄, provides:
either φ(λ̄) < ∞ and ḡ ∈ ∂φ(λ̄) ≡ φ(λ) ≥ φ(λ̄) + ḡ(λ − λ̄)
or φ(λ̄) = ∞ and a valid cut for Λ violated by λ̄
“Natural” algorithm: the Cutting Plane method [5] ≡ revised simplex method with mechanized pricing in the discrete case
Many other variants/algorithms possible (cf. Part II)
The Nonlinear Case
Each f (x, ·) and G(x, ·) concave, Z convex:
(Π)  max { f (x, z) : G(x, z) ≥ 0 , x ∈ X , z ∈ Z }
(B)  max { v(x) : x ∈ X }
where v(x) = max { f (x, z) : G(x, z) ≥ 0 , z ∈ Z }
(B) ≡ (Π) without assumptions on f (·, z), G(·, z) and X (hard)
Which duality would you use? Lagrangian [8], of course
v(x) = min { max { f (x, z) + λG(x, z) : z ∈ Z } : λ ≥ 0 }
Under an appropriate constraint qualification, two cases occur:
either ∃ λ̄ ≥ 0 , z̄ ∈ Z s.t. v(x∗) = f (x∗, z̄) + λ̄G(x∗, z̄) > −∞
or v(x∗) = −∞ =⇒ { z ∈ Z : G(x∗, z) ≥ 0 } = ∅ =⇒ ∃ ν̄ ≥ 0 , z̄ ∈ Z s.t. max { ν̄G(x∗, z) : z ∈ Z } = ν̄G(x∗, z̄) < 0
[8] Geoffrion “Generalized Benders Decomposition” JOTA, 1972
The Nonlinear Case (cont.d)
General form of the master problem
(B)  max  v
      v ≤ max { f (x, z) + λ̄G(x, z) : z ∈ Z }  λ̄ ∈ Λ0
      0 ≤ max { ν̄G(x, z) : z ∈ Z }  ν̄ ∈ Λ∞
      x ∈ X
Er . . . how on Earth do you manage those nasty “max”?
Must be that the “max” can be done independently of x! Example: f (zi) concave, univariate
max { Σi xi f (zi) : Σi xi zi ≤ c , zi ≥ 0 , Ax ≤ b , x ≥ 0 }
v(x) = minλ { Σi max { xi (f (zi) − λzi) : zi ≥ 0 } + λc }
v(x) ≤ Σi xi max { f (zi) − λ̄zi : zi ≥ 0 } + λ̄c
can optimize on the z independently from the x =⇒ “normal” linear cuts
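The univariate-concave example above with a concrete f chosen here for illustration, f(z) = sqrt(z): then max{ sqrt(z) − λz : z ≥ 0 } = 1/(4λ) for λ > 0, so every multiplier λ̄ yields the linear-in-x cut v(x) ≤ Σi xi/(4λ̄) + λ̄c, and minimizing over λ gives v(x) = sqrt(c Σi xi) exactly.

```python
from math import sqrt

def cut_value(x, lam, c=4.0):
    # value at x of the cut obtained for multiplier lam > 0
    return sum(xi / (4.0 * lam) for xi in x) + lam * c

def v(x, c=4.0):
    # closed-form inner optimum for f(z) = sqrt(z): v(x) = sqrt(c*sum(x))
    return sqrt(c * sum(x))

x = (1.0, 2.0, 1.0)
lam_star = sqrt(sum(x)) / (2.0 * sqrt(4.0))   # minimizing multiplier
vals = [cut_value(x, lam) for lam in (0.1, 0.3, lam_star, 1.0, 3.0)]
```

Every cut overestimates v(x), and the cut at λ* is tight: the nasty inner “max” collapsed into ordinary linear cuts.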
The Integer Case
Block-structured Integer Programs
What if X is combinatorial (e.g., X = { x ∈ Zn : Ex ≤ d })?
(Π)  max { cx : Ax = b , x ∈ X }
The Lagrangian dual is
(∆)  min yb + max { (c − yA)x : x ∈ X }
nothing changes if we can still efficiently optimize over X, e.g. due to size (decomposition) and/or structure (integrality)
. . . except we are solving a different problem:
(Π̄)  max  c ( Σ_{x̄∈X} x̄ θx̄ )
      A ( Σ_{x̄∈X} x̄ θx̄ ) = b
      Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X
    ≡ max { cx : Ax = b , x ∈ conv(X) }
i.e., a (potentially good) relaxation of (Π)
Block-structured Integer Programs (cont.d)
Good news: (Π̄) better (not worse) than the continuous relaxation (conv(X) ⊆ { x ∈ Rn : Ex ≤ d })
Bad news: if (Πy) “too easy” (conv(X) = { x ∈ Rn : Ex ≤ d }, a.k.a. the integrality property), then (Π̄) same as the continuous relaxation
Trade-off: (Πy) must be easy, but not too easy (no free lunch)
Anyway, at best gives good bounds =⇒ Branch & Bound with DW/Lagrangian/CG ≡ Branch & Price
Branching nontrivial: may destroy subproblem structure =⇒ branch on x (but (ΠB) is on θ)
Lamentably little support from off-the-shelf tools: the master problem gives a valid bound only at termination, although the subproblem always gives one (but not associated to a continuous feasible solution)
Digression: How to Choose your Lagrangian relaxation
There may be many choices
(Π)  max { cx : Ax = b , Ex ≤ d , x ∈ Zn }
(Π′y)  max { cx + y(b − Ax) : x ∈ X′ = { x ∈ Zn : Ex ≤ d } }
(Π″w)  max { cx + w(d − Ex) : x ∈ X″ = { x ∈ Zn : Ax = b } }
The best between (∆′) and (∆″) depends on the integrality of X′, X″:
if both have it, both (∆′) and (∆″) ≡ continuous relaxation
if only one has it, the one that does not
but if both don’t have it?
Here comes Lagrangian decomposition [9] (scale by 1/2)
(Π) ≡ max { cx′ + cx″ : x′ ∈ X′ , x″ ∈ X″ , x′ = x″ }
(Πλ)  max { (c + λ)x′ : x′ ∈ X′ } + max { (c − λ)x″ : x″ ∈ X″ }
(∆̄) ≡ (Π̄)  max { cx : x ∈ conv(X′) ∩ conv(X″) }
better than both (but need to solve two hard subproblems)
[9] Guignard, Kim “Lagrangean decomposition: a model yielding stronger lagrangean bounds” Math. Prog., 1987
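A brute-force look at the Lagrangian decomposition bound (a toy instance invented here, not from the slides): split x into copies x′, x″ with the objective split c/2 + c/2, relax x′ = x″ with multiplier λ, and evaluate the dual function on a grid. Weak duality gives d(λ) ≥ v(Π) for every λ, and on this instance the best grid value sits strictly above v(Π): the decomposition bound still has a gap, but it equals the optimum over conv(X′) ∩ conv(X″).

```python
X1 = [(0, 0), (1, 0), (0, 1)]   # first "easy" discrete set
X2 = [(0, 0), (1, 1), (1, 0)]   # second "easy" discrete set
c = (1.0, 2.0)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def d(lam):
    # d(lam) = max_{X1} (c/2 + lam)x' + max_{X2} (c/2 - lam)x''
    half = tuple(ci / 2 for ci in c)
    p = tuple(hi + li for hi, li in zip(half, lam))
    m = tuple(hi - li for hi, li in zip(half, lam))
    return max(dot(p, x) for x in X1) + max(dot(m, x) for x in X2)

v_pi = max(dot(c, x) for x in X1 if x in X2)   # exact integer optimum

grid = [k * 0.25 for k in range(-12, 13)]
dmin = min(d((l1, l2)) for l1 in grid for l2 in grid)
```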
Geometry of Lagrangian Decomposition
[figure: the red and blue relaxed regions and their grey intersection]
Intersection between red and blue ≡ grey ≡ continuous relaxation
Lagrangian relaxation of blue constraints shrinks the red (=⇒ grey) part
Lagrangian relaxation of red constraints shrinks the blue (=⇒ grey) part
Lagrangian decomposition (both red and blue constraints) shrinks both =⇒ the grey part more
But the intersection of convex hulls is larger (bad) than the convex hull of the intersection
Digression: Alternative Good Formulations for conv (X )
(Under mild assumptions) conv(X) is a polyhedron =⇒
conv(X) = { x ∈ Rn : Ẽx ≤ d̃ }
There are good formulations for the problem
Except, practically all good formulations are too large
[figure: the original formulation Ex ≤ d vs. the tighter Ẽx ≤ d̃, both intersected with Ax = b]
Very few exceptions (integrality property ≈ networks)
Good part: working in the natural variable space
But a few more variables do as a lot more constraints
Row generation/polyhedral approaches
The good news is: rows can be generated incrementally
[figure: cuts separating the current optimum from conv(X) ∩ { Ax = b }, added incrementally]
Relevant concept: separator
Branch & Cut
R = (small) subset of rows (indices), ERx ≤ dR reduced set
Solve the outer approximation to (Π̄)
(Π̄R)  max { cx : Ax = b , ERx ≤ dR }
feed the separator with the primal optimal solution x∗
Separator for (several sub-families of) facets of conv(X)
Several general approaches, countless specialized ones
Most often separators are hard combinatorial problems themselves (though using general-purpose MIP codes is an option [10])
May tail off, branching useful far before having solved (Π̄X)
[10] Fischetti, Lodi, Salvagnin “Just MIP it!” MATHEURISTICS, Ann. Inf. Syst., 2009
Branch & Cut vs. Branch & Price
Which is best?
Row generation naturally allows multiple separators
Very well integrated in general-purpose solvers (but harder to exploit “complex” structures)
Column generation naturally allows very unstructured separators
Simpler to exploit “complex” structures (but much less developed software tools)
Column generation is row generation in the dual
Then, of course, Branch & Cut & Price (nice, but software issues remain and possibly worsen)
Staircase-structured Integer Programs
What if X = { x ∈ Zn : Ex ≤ d } combinatorial?
(Π̄)  max { cx + ez : Ax + Bz ≤ b , x ∈ X }
Nothing changes . . . except (BB) now is combinatorial =⇒ hard
However, (BW) now is equivalent to (Π̄) =⇒ no branching needed (unless for solving (BB)) =⇒ no Branch & Benders’
Conversely, everything breaks down if z ∈ Zm: there is no (workable, exact) dual of an Integer Program
Can do with “approximated” duals (strong formulations, RLT [11], . . . ) but equivalence lost =⇒ branching again
[11] Sen, Sherali “Decomposition with branch-and-cut approaches for two-stage stochastic MIP” Math. Prog., 2006
Example Applications, a.k.a.
the Reformulation
before the Reformulation
(Very) Classical decomposition approaches for (2SILP)
Here-and-now decisions are naturally complicating variables
The (expected) value function decomposes by scenario
v(x) = c0 x + Σ_{s∈S} πs min { cs zs : Es zs ≤ bs − E0s x }
Alternative approach: split variables, introduce copy constraints
min  c0 x + Σ_{s∈S} πs cs zs
     E0 x ≤ b0
     E0s xs + Es zs ≤ bs , xs = x  s ∈ S
relax them in a Lagrangian fashion
The Lagrangian approach chooses all variables for all scenarios (no unfeasibility), tries to make here-and-now agree by changing prices
Difference more pronounced in multi-stage programs
Classical decomposition approaches for (MCND)
Design (z) variables are “naturally” linking/complicating
What remains is flows/paths: convex even if integer
Benders’ cuts are metric inequalities [12] defining multiflow feasibility
Resource decomposition [13]: add artificial linking variables
dk xijk ≤ ukij , Σ_{k∈K} ukij ≤ uij
Different possible linking constraints:
(3) =⇒ flow (shortest path) relaxation (integrality property ≡ “easy”)
(2) =⇒ knapsack relaxation (only one integer variable per problem)
different efficiency (algorithm-dependent [14,15]), others possible
[12] Costa, Cordeau, Gendron “Benders, metric and cutset inequalities for multicommodity capacitated network design” Comput. Optim. Appl., 2009
[13] Kennington, Shalaby “An Effective Subgradient Procedure for Minimal Cost Multicomm. Flow Problems” Man. Sci., 1977
[14] Crainic, F., Gendron “Bundle-based relaxation methods for multicommodity capacitated fixed charge network design” Disc. Appl. Math., 2001
[15] F., Gendron, Gorgone “On the computational efficiency of subgradient methods: a case study in combinatorial optimization” Math. Prog. Comp. (submitted), 2015
Conclusions (for now)
Conclusions (part I)
Structured (Integer) Programs are challenging, but structure can be exploited: main tools are reformulation + duality
Two different approaches, “primal” and “dual”
Different twists, different conditions to work:
who is complicating (constraints vs. variables), but tricks (≡ other reformulations) can be used to create the desired structure
who is reformulated (subproblem vs. master problem)
where integer/nonconvexity can be (subproblem vs. master problem)
where branching/cutting is done (subproblem vs. master problem)
where/which nonlinearities can be easily dealt with
(For linear programs) Lagrange is Benders’ in the dual, and vice-versa
Both boil down to the 50+-years-old Cutting Plane algorithm [5]
Has it aged well? We’ll see tomorrow