Advanced Decomposition Methods Part I: all is one, one is all
Antonio Frangioni
Dipartimento di Informatica, Università di Pisa
COST/MINO Ph.D. School on Advanced Optimization Methods
Roma — June 8, 2016
Outline
1 Block-Structured (Mixed-Integer NonLinear) Programs
2 Dual decomposition (Dantzig-Wolfe/Lagrangian/Column Generation)
3 Primal decomposition (Benders’/Resource)
4 The Integer Case
5 Example Applications (the Reformulation before the Reformulation)
6 Conclusions (for now)
Block-Structured Programs
Many applications of Mixed-Integer NonLinear Programming are large-scale: millions/billions of variables/constraints
Good news: (almost) all large-scale problems are block-structured
Usually several nested forms of structure, but two main ones:
[figure: constraint-matrix sketches of the block-diagonal and the staircase-structured case]
block-diagonal ≡ complicating constraints
staircase-structured ≡ complicating variables
Relaxing constraints / fixing variables yields independent subproblems
=⇒ much easier because of size and/or structure (integrality, . . . )
A. Frangioni (DI — UniPi) Advanced Decomposition Methods I Roma 2016 3 / 42
Example I: Two-stage Stochastic (Linear) Programs
Problems involving decisions over time and uncertainty
First-stage (here-and-now) decisions x, constraints E0 x ≤ b0
Set S of scenarios, realization known only after deciding x
Recourse decisions zs, different for each scenario s ∈ S, constraints E0s x + Es zs ≤ bs
Minimize here-and-now cost plus average cost of recourse actions
min { c0 x + Σ_{s∈S} πs cs zs : E0 x ≤ b0 , E0s x + Es zs ≤ bs  s ∈ S }
Extends to multi-stage (structure repeats “fractally” into each Es)
Often other structures inside Es, network a common one
Extends to nonlinear risk measures (CVaR, . . . ), integer variables, . . .
Many applications: energy [1], water, logistics, telecom, finance, . . .
[1] Tahanan, van Ackooij, F., Lacalandra “Large-scale Unit Commitment under uncertainty” 4OR 2015
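A minimal numerical sketch of the point above (a toy instance invented here, not from the slides): for a fixed first-stage decision x, the recourse problem separates by scenario, so the expected cost is c0 x plus a sum of |S| independent subproblem values. The one-variable recourse LP is solved in closed form.

```python
# Toy two-stage model (hypothetical data): buy x units now at unit cost
# c0; scenario s has demand d_s with probability pi_s; any shortage is
# covered by recourse at unit cost q.

def recourse_cost(x, d, q=3.0):
    # v_s(x) = min { q*z : z >= d - x , z >= 0 }  -- trivial 1-variable LP
    return q * max(d - x, 0.0)

def expected_cost(x, scenarios, c0=1.0):
    # c0*x + sum_s pi_s * v_s(x): one independent subproblem per scenario
    return c0 * x + sum(pi * recourse_cost(x, d) for pi, d in scenarios)

scenarios = [(0.5, 10.0), (0.3, 20.0), (0.2, 30.0)]   # (pi_s, d_s)
costs = {x: expected_cost(x, scenarios) for x in (0.0, 10.0, 20.0, 30.0)}
```

The per-scenario solves share no variables, which is exactly what makes the value function decompose.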
Example II: (Linear) Multicommodity Network Design
Graph G = (N, A), multicommodity network design model
min Σ_{k∈K} Σ_{(i,j)∈A} dk cijk xijk + Σ_{(i,j)∈A} fij zij   (1)
Σ_{(i,j)∈A} xijk − Σ_{(j,i)∈A} xjik = { 1 if i = sk , −1 if i = tk , 0 otherwise }   i ∈ N , k ∈ K   (2)
Σ_{k∈K} dk xijk ≤ uij zij   (i, j) ∈ A   (3)
xijk ∈ [0, 1]   (i, j) ∈ A , k ∈ K   (4)
zij ∈ {0, 1}   (i, j) ∈ A   (5)
K ≡ commodities ≡ (sk, tk, dk) (not completely generic)
Pervasive structure in most of combinatorial optimization
Many applications: logistics, transportation, telecom, energy, . . .
Nonlinear objective function/constraints (energy, delay [2], . . . )
[2] F., Galli, Scutellà “Delay-Constrained Shortest Paths: Approximation Algorithms and SOCP Models” JOTA, 2015
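A small check (a hypothetical helper written for illustration, not slide material) of the flow-conservation constraints (2) for one commodity: on a tiny digraph, the net outflow must be +1 at the source sk, −1 at the sink tk, and 0 elsewhere.

```python
# flow maps each arc (i, j) to the value x_ij^k in [0, 1]; flow_balance
# returns the left-hand side of constraint (2) at every node.

def flow_balance(nodes, flow):
    bal = {i: 0.0 for i in nodes}
    for (i, j), x in flow.items():
        bal[i] += x   # flow leaving node i
        bal[j] -= x   # flow entering node j
    return bal

# one unit routed along the path 1 -> 2 -> 3 (source 1, sink 3)
nodes = [1, 2, 3, 4]
flow = {(1, 2): 1.0, (2, 3): 1.0, (1, 4): 0.0, (4, 3): 0.0}
bal = flow_balance(nodes, flow)
```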
Dual decomposition, a.k.a.
Inner Approximation
Dantzig-Wolfe decomposition
Lagrangian Relaxation
Column Generation
Block-diagonal Convex (Linear) Program
Block-diagonal program: convex X, n “complicating” constraints
(Π)  max { cx : Ax = b , x ∈ X }
e.g., X = { x : Ex ≤ d } = ⨂_{k∈K} Xk = { xk : Ek xk ≤ dk }
(|K| large =⇒ (Π) very large), Ax = b linking constraints
We can efficiently optimize upon X, for different reasons:
a bunch of (many, much smaller) problems instead of a large one
X has (the Xk have) structure (shortest path, knapsack, . . . )
(much more so than solving the whole of (Π), anyway)
In other words, we could efficiently solve (Π) if the linking constraints were removed: how to exploit it?
Dantzig-Wolfe reformulation
Dantzig-Wolfe reformulation [3]: X convex =⇒ represent it by points
X = { x = Σ_{x̄∈X} x̄ θx̄ : Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X }
then reformulate (Π) in terms of the convex multipliers θ
(Π)  max  c ( Σ_{x̄∈X} x̄ θx̄ )
      A ( Σ_{x̄∈X} x̄ θx̄ ) = b
      Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X
only n + 1 rows (but how many columns?)
note that “x̄ ∈ X” is an index, not a constraint (θ is the variable)
A rather semi-infinite program, but “only” x̄ ∈ ext X needed
Not that this makes it any less infinite, unless
X is a polytope (compact polyhedron) =⇒ finite set of vertices
[3] Dantzig, Wolfe “The Decomposition Principle for Linear Programs” Operations Research 1960
Dantzig-Wolfe reformulation (cont.d)
Could this ever be a good idea? Actually, it could:
polyhedra may have few faces and many vertices . . . or vice-versa
n-cube: |xi| ≤ 1 ∀ i, 2n faces, 2^n vertices
n-co-cube: Σi |xi| ≤ 1, 2^n faces, 2n vertices
Except, most often the number of vertices is too large
[figure: the same feasible region written as Ax = b, Ex ≤ d and as AXθ = b, eθ = 1]
a (linear) program with (exponentially/infinitely) many columns
But, efficiently optimize over X =⇒ generate vertices (≡ columns)
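A quick check of the face/vertex trade-off above (illustrative code, not from the slides): the n-cube has 2^n vertices (all sign patterns), while the n-co-cube (cross-polytope) has only 2n of them (the signed unit vectors).

```python
from itertools import product

def cube_vertices(n):
    # every sign pattern in {-1, +1}^n is a vertex of the n-cube
    return [v for v in product((-1.0, 1.0), repeat=n)]

def cocube_vertices(n):
    # the vertices of the cross-polytope are the signed unit vectors
    verts = []
    for i in range(n):
        for s in (-1.0, 1.0):
            v = [0.0] * n
            v[i] = s
            verts.append(tuple(v))
    return verts

n = 3
cube = cube_vertices(n)      # 2^3 = 8 vertices
cocube = cocube_vertices(n)  # 2*3 = 6 vertices
```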
Dantzig-Wolfe decomposition ≡ Column Generation
B ⊂ X (small), solve the restriction of (Π) with X → B, i.e.,
(ΠB)  max  Σ_{x̄∈B} (c x̄) θx̄
      Σ_{x̄∈B} (A x̄) θx̄ = b
      Σ_{x̄∈B} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ B
“master problem” (B small, not too costly)
note how the parentheses have moved: linearity is needed (for now)
If B contains the “right” columns, x∗ = Σ_{x̄∈B} x̄ θ∗x̄ is optimal for (Π)
How do I tell if B contains the “right” columns? Use duality
(∆B)  min { yb + v : v ≥ c x̄ − y (A x̄)  x̄ ∈ B }
    = min fB(y) = max { c x̄ + y (b − A x̄) : x̄ ∈ B }
one constraint for each x̄ ∈ B
Dantzig-Wolfe decomposition ≡ Lagrangian relaxation
Dual of (Π): (∆) ≡ (∆X) (many constraints)
fB = lower approximation of the Lagrangian function
(Πy)  f (y) = max { cx + y (b − Ax) : x ∈ X }
Assumption: optimizing over X is “easy” for each objective =⇒ obtaining x̄ s.t. f (y) = c x̄ + y (b − A x̄) is “easy”
Important: (Πy) Lagrangian relaxation [4], f (y) ≥ v(Π) = v(∆) ∀y, provided (Πy) is solved exactly (or at least an f̄ ≥ f (y) is used)
Thus, (∆B) outer approximation of the Lagrangian dual
(∆)  min { f (y) = max { cx + y (b − Ax) : x ∈ X } }
[4] Geoffrion “Lagrangean relaxation for integer programming” Mathematical Programming Study 1974
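A tiny numerical illustration of f(y) ≥ v(Π) (a toy instance invented here, not slide material): with X = {0,1}^3 finite and one linking constraint ax = b, the Lagrangian function is computable by enumeration, and every multiplier y gives a valid upper bound.

```python
from itertools import product

c = (4.0, 3.0, 2.0)
a = (2.0, 1.0, 1.0)   # single linking constraint  a x = b
b = 2.0
X = list(product((0, 1), repeat=3))

def f(y):
    # Lagrangian function: relax  a x = b  with multiplier y
    return max(sum(ci * xi for ci, xi in zip(c, x))
               + y * (b - sum(ai * xi for ai, xi in zip(a, x)))
               for x in X)

# optimal value of (Pi): enumerate the feasible points of X
v_pi = max(sum(ci * xi for ci, xi in zip(c, x))
           for x in X if sum(ai * xi for ai, xi in zip(a, x)) == b)

bounds = [f(y) for y in (-2.0, 0.0, 1.0, 3.0, 10.0)]
```

Note f(0) just drops the linking constraint; other multipliers can give much tighter bounds.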
Lagrangian duality vs. Linear duality
Note about the LP case (X = { x : Ex ≤ d }):
(∆)  min yb + max { (c − yA)x : Ex ≤ d }
   ≡ min yb + min { wd : wE = c − yA , w ≥ 0 }
   ≡ min { yb + wd : wE + yA = c , w ≥ 0 }
   ≡ exactly the linear dual of (Π)
y “partial” duals: duals w of Ex ≤ d “hidden” in the subproblem
There is only one duality
Will repeatedly come in handy
Dantzig-Wolfe decomposition ≡ Dual row generation
Primal/dual optimal solutions x∗/(v∗, y∗) out of (ΠB)/(∆B)
x∗ feasible to (Π), so optimal ⇐⇒ (v∗, y∗) feasible to (∆)
⇐⇒ v∗ ≥ (c − y∗A)x  ∀x ∈ X
⇐⇒ v∗ ≥ max { (c − y∗A)x : x ∈ X }
In fact: v∗ ≥ (c − y∗A)x̄ ≡ y∗b + v∗ ≥ f (y∗) =⇒
v(Π) ≥ cx∗ = y∗b + v∗ ≥ f (y∗) ≥ v(∆) ≥ v(Π) =⇒ x∗/(v∗, y∗) optimal
Otherwise, B = B ∪ { x̄ }: add a new column to (ΠB) / row to (∆B), rinse, repeat
Clearly finite if ext X is, globally convergent anyway:
the cutting-plane algorithm for convex programs [5] (applied to (∆))
[5] Kelley “The Cutting-Plane Method for Solving Convex Programs” J. of the SIAM, 1960
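A minimal 1-D cutting-plane loop on the Lagrangian dual (an illustrative sketch under simplifying assumptions, not the slides' implementation): X is a small finite set, the oracle maximizes cx + y(b − ax) by enumeration, and the master minimizes the max of the collected affine cuts over an interval by checking the endpoints and all pairwise cut intersections.

```python
from itertools import product, combinations

c, a, b = (4.0, 3.0, 2.0), (2.0, 1.0, 1.0), 2.0
X = list(product((0, 1), repeat=3))

def oracle(y):
    # returns f(y) and the cut (intercept, slope) from the maximizer
    best = max(X, key=lambda x: sum(ci * xi for ci, xi in zip(c, x))
               + y * (b - sum(ai * xi for ai, xi in zip(a, x))))
    cx = sum(ci * xi for ci, xi in zip(c, best))
    slope = b - sum(ai * xi for ai, xi in zip(a, best))
    return cx + y * slope, (cx, slope)      # cut: f(y') >= cx + slope*y'

def solve_master(cuts, lo=-10.0, hi=10.0):
    # minimize the max of affine cuts over [lo, hi]: the optimum sits at
    # an endpoint or at the intersection of two cuts
    cand = [lo, hi]
    for (i1, s1), (i2, s2) in combinations(cuts, 2):
        if s1 != s2:
            y = (i2 - i1) / (s1 - s2)
            if lo <= y <= hi:
                cand.append(y)
    fB = lambda y: max(i + s * y for i, s in cuts)
    y_star = min(cand, key=fB)
    return y_star, fB(y_star)

cuts = [oracle(0.0)[1]]
for _ in range(20):
    y_star, lb = solve_master(cuts)   # model value fB(y*) <= f(y*)
    f_val, cut = oracle(y_star)
    if f_val <= lb + 1e-9:            # model matches f at y*: optimal
        break
    cuts.append(cut)
```

Each iteration either proves optimality or adds a violated cut, exactly the add-column/add-row loop above.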
Geometry of the Lagrangian dual
[figure: the cutting-plane model fB built from the cuts generated at x1, . . . , x6, with minimizer y∗ and value v∗]
v∗ = fB(y∗) = v(ΠB), lower bound on v(∆)
Optimal solution x̄ gives a separator between (v∗, y∗) and epi f
(c x̄ , A x̄) = new row in (∆B) (subgradient of f at y∗)
Dantzig-Wolfe decomposition ≡ Inner Approximation
“Abstract” view of (ΠB): conv(B) inner approximation of X
(ΠB)  max { cx : Ax = b , x ∈ conv(B) }
x∗ solves the Lagrangian relaxation of (ΠB) with y∗, i.e.,
x∗ ∈ argmax { (c − y∗A)x : x ∈ conv(B) }
=⇒ (c − y∗A)x ≤ (c − y∗A)x∗ for each x ∈ conv(B) ⊆ X
(c − y∗A)x̄ = max { (c − y∗A)x : x ∈ X } ≥ (c − y∗A)x∗
Column x̄ has positive reduced cost
(c − y∗A)(x̄ − x∗) = (c − y∗A)x̄ − cx∗ + y∗b = (c − y∗A)x̄ − v∗ > 0
=⇒ x̄ ∉ conv(B) =⇒ makes sense to add x̄ to B
Geometry of Dantzig-Wolfe/Column Generation
[figure: conv(B) (inner approximation of X) intersected with Ax = b, and the reduced-cost direction c − y∗A]
c − y∗A separates conv(B) ∩ { Ax = b } from all x ∈ X better than x∗
Thus, optimizing it allows finding new points (if any)
Issue: conv(B) ∩ { Ax = b } must be nonempty
Extension I: the Unbounded Case
X unbounded ⇐⇒ rec X ⊃ {0} =⇒ f (y) = v(Πy) = ∞ happens
X = conv( ext X = X0 ) + cone( ext rec X = X∞ )
B = ( B0 ⊂ X0 ) ∪ ( B∞ ⊂ X∞ ) = { points x̄ } ∪ { rays χ̄ } =⇒
(ΠB)  max  c ( Σ_{x̄∈B0} x̄ θx̄ + Σ_{χ̄∈B∞} χ̄ θχ̄ )
      A ( Σ_{x̄∈B0} x̄ θx̄ + Σ_{χ̄∈B∞} χ̄ θχ̄ ) = b
      Σ_{x̄∈B0} θx̄ = 1
      θx̄ ≥ 0  x̄ ∈ B0 , θχ̄ ≥ 0  χ̄ ∈ B∞
In (∆B), constraints y (A χ̄) ≥ c χ̄ (a.k.a. “feasibility cuts”)
(Πy∗) unbounded ⇐⇒ (c − y∗A) χ̄ > 0 for some χ̄ ∈ rec X (violated constraint) =⇒ B∞ = B∞ ∪ { χ̄ }
(∆) = min { f (y) : y ∈ Y }, (Πy∗) provides either subgradients of f (a.k.a. “optimality cuts”), or violated valid inequalities for Y [5]
Extension II: the Nonlinear Case
Nonlinear case: c(·) concave, A(·) component-wise convex
(Π)  max { c(x) : A(x) ≤ b , x ∈ X }
(∆)  min { f (y) = yb + max { c(x) − yA(x) : x ∈ X } : y ≥ 0 }
Any x̄ ∈ X still gives f (y) ≥ c(x̄) + y(b − A(x̄)), same (∆B) / (ΠB)
c( Σ_{x̄∈B} x̄ θx̄ ) ≥ Σ_{x̄∈B} c(x̄) θx̄ (c(·) concave)
A( Σ_{x̄∈B} x̄ θx̄ ) ≤ Σ_{x̄∈B} A(x̄) θx̄ ≤ b (A(·) convex) =⇒
(ΠB) safe inner approximation (v(ΠB) ≤ v(Π))
Basically everything keeps working, but you may need a constraint qualification [6] (usually easy to get)
[6] Lemaréchal, Hiriart-Urruty “Convex Analysis and Minimization Algorithms” Springer, 1993
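A quick numeric check of why (ΠB) stays a safe inner approximation (toy functions chosen here for illustration, not from the slides): for concave c(·) and convex A(·), Jensen's inequality makes any convex combination of points of X at least as good in the objective and still feasible.

```python
from math import sqrt

c_fun = lambda x: sqrt(x)   # concave on x >= 0
A_fun = lambda x: x * x     # convex

points = [0.0, 1.0, 4.0]    # some x_bar in B (here X is a subset of R+)
theta = [0.2, 0.3, 0.5]     # convex multipliers, summing to 1
x_mix = sum(t * p for t, p in zip(theta, points))
c_mix = sum(t * c_fun(p) for t, p in zip(theta, points))   # underestimates c(x_mix)
A_mix = sum(t * A_fun(p) for t, p in zip(theta, points))   # overestimates A(x_mix)
```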
Primal decomposition, a.k.a.
Outer Approximation
Benders’ decomposition
Resource decomposition
Staircase-structured Convex (Linear) Program
Staircase-structured program: convex X, “complicating” variables
(Π)  max { cx + ez : Dx + Ez ≤ d , x ∈ X }
e.g., Dx + Ez ≤ d ≡ Dk x + Ek zk ≤ dk  k ∈ K (|K| large) =⇒
Z(x) = { z : Ez ≤ d − Dx } = ⨂_{k∈K} Zk(x) = { zk : Ek zk ≤ dk − Dk x }
We can efficiently optimize upon Z(x), for different reasons:
a bunch of (many, much smaller) problems instead of a large one
Z(x) has (the Zk(x) have) structure (shortest path, knapsack, . . . )
(much more so than solving the whole of (Π), anyway)
In other words, we could efficiently solve (Π) if the linking variables were fixed: how to exploit it?
Benders’ reformulation
Benders’ reformulation: define the convex value function
(B)  max { cx + v(x) = max { ez : Ez ≤ d − Dx } : x ∈ X }
(note: clearly v(x) = −∞ happens)
Clever trick [7]: use duality to reformulate the inner problem
v(x) = min { w(d − Dx) : w ∈ W = { w : wE = e , w ≥ 0 } }
so that W does not depend on x
As usual, W = conv( ext W = W0 ) + cone( ext rec W = W∞ ) =⇒
(B)  max  cx + v
      v ≤ w̄(d − Dx)  w̄ ∈ W0
      0 ≤ ω̄(d − Dx)  ω̄ ∈ W∞
      x ∈ X
still very large, but we can generate w̄ / ω̄ by computing v(x)
[7] Benders “Partitioning Procedures for Solving Mixed-Variables Programming Problems” Numerische Mathematik, 1962
Benders’ decomposition
Select (small) B = ( B0 ⊂ W0 ) ∪ ( B∞ ⊂ W∞ ), solve the master problem
(BB)  max  cx + v
      v ≤ w̄(d − Dx)  w̄ ∈ B0
      0 ≤ ω̄(d − Dx)  ω̄ ∈ B∞
      x ∈ X
    = max { cx + vB(x) : x ∈ X ∩ VB } , where
vB(x) = min { w̄(d − Dx) : w̄ ∈ B0 } ≥ v(x) , VB ⊇ dom v
Find (primal) optimal solution x∗, compute v(x∗), get either w̄ or ω̄, update either B0 or B∞, rinse & repeat
Benders’ decomposition ≡ Cutting Plane approach to (B) [5]
Spookily similar to the Lagrangian dual, ain’t it?
Except, constraints are now attached to dual objects w̄ / ω̄
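The rinse-and-repeat loop above can be sketched on a toy instance (invented here, not from the slides): maximize −x + 2z subject to z ≤ 5 − x, z ≤ 3, z ≥ 0, with x ∈ {0, . . . , 5}. The inner LP's dual vertices are w = (2, 0) and w = (0, 2), giving the cuts v ≤ 2(5 − x) and v ≤ 6; the tiny master is solved by enumeration over x.

```python
def subproblem(x):
    # v(x) = max{ 2z : z <= 5-x , z <= 3 , z >= 0 } = min(2*(5-x), 6);
    # the tight dual vertex yields the corresponding Benders cut
    if 2 * (5 - x) <= 6:
        return 2.0 * (5 - x), (2.0, 0.0)   # cut: v <= 2*(5 - x)
    return 6.0, (0.0, 2.0)                 # cut: v <= 6

def master(cuts, M=1e6):
    # max{ -x + v : v <= w1*(5-x) + w2*3 for all cuts } over x in {0..5};
    # M caps v while no optimality cut is present
    best = None
    for x in range(6):
        v = min([M] + [w1 * (5 - x) + w2 * 3 for w1, w2 in cuts])
        if best is None or -x + v > best[2]:
            best = (x, v, -x + v)
    return best

cuts = []
for _ in range(10):
    x, v, ub = master(cuts)       # master value = upper bound
    v_true, cut = subproblem(x)
    if v_true >= v - 1e-9:        # no cut violated: x is optimal
        break
    cuts.append(cut)
```

The loop terminates once the master's estimate vB(x) matches the true v(x) at its own optimizer.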
Benders is Lagrange . . .
Block-diagonal case
(Π)  max { cx : Ax = b , Ex ≤ d }
(∆)  min { yb + wd : wE + yA = c , w ≥ 0 }
Think of y as complicating variables in (∆), you get
(∆)  min yb + min { wd : wE = c − yA , w ≥ 0 }
   = min yb + max { (c − yA)x : Ex ≤ d }
i.e., the Lagrangian dual of (Π)
The value function of (∆) is the Lagrangian function of (Π)
. . . Lagrange is Benders . . .
Dual of (Π) (linear case X = { x : Ax = b }):
(Π)  max { cx + ez : Dx + Ez ≤ d , Ax = b }
(∆)  min { yb + wd : yA + wD = c , wE = e , w ≥ 0 }
Lagrangian dual of the dual constraints yA + wD = c (multiplier x):
(∆)  max { min { yb + wd + (c − yA − wD)x : wE = e , w ≥ 0 } }
   = max cx + min { y (b − Ax) + w (d − Dx) : wE = e , w ≥ 0 }
   = max cx + min_y { y (b − Ax) } + min { w (d − Dx) : wE = e , w ≥ 0 }
   = max { cx + max { ez : Dx + Ez ≤ d } : Ax = b }
i.e., Benders’ reformulation of (Π)
The Lagrangian function of (∆) is the value function of (Π)
. . . and Both are the Cutting Plane Algorithm
Both Lagrange and Benders boil down to min { φ(λ) : λ ∈ Λ }
with Λ and φ convex, nondifferentiable, both only implicitly known by means of a (potentially costly) oracle that, given λ̄, provides:
either φ(λ̄) < ∞ and ḡ ∈ ∂φ(λ̄) ≡ φ(λ) ≥ φ(λ̄) + ḡ(λ − λ̄)
or φ(λ̄) = ∞ and a valid cut for Λ violated by λ̄
“Natural” algorithm: the Cutting Plane method [5] ≡ revised simplex method with mechanized pricing in the discrete case
Many other variants/algorithms possible (cf. Part II)
The Nonlinear Case
Each f (x, ·) and G(x, ·) concave, Z convex:
(Π)  max { f (x, z) : G(x, z) ≥ 0 , x ∈ X , z ∈ Z }
(B)  max { v(x) : x ∈ X }
where v(x) = max { f (x, z) : G(x, z) ≥ 0 , z ∈ Z }
(B) ≡ (Π) without assumptions on f (·, z), G(·, z) and X (hard)
Which duality would you use? Lagrangian [8], of course
v(x) = min { max { f (x, z) + λG(x, z) : z ∈ Z } : λ ≥ 0 }
Under an appropriate constraint qualification, two cases occur:
either ∃ λ̄ ≥ 0 , z̄ ∈ Z s.t. v(x∗) = f (x∗, z̄) + λ̄G(x∗, z̄) > −∞
or v(x∗) = −∞ =⇒ { z ∈ Z : G(x∗, z) ≥ 0 } = ∅ =⇒ ∃ ν̄ ≥ 0 , z̄ ∈ Z s.t. max { ν̄G(x∗, z) : z ∈ Z } = ν̄G(x∗, z̄) < 0
[8] Geoffrion “Generalized Benders Decomposition” JOTA, 1972
The Nonlinear Case (cont.d)
General form of the master problem
(B)  max  v
      v ≤ max { f (x, z) + λ̄G(x, z) : z ∈ Z }  λ̄ ∈ Λ0
      0 ≤ max { ν̄G(x, z) : z ∈ Z }  ν̄ ∈ Λ∞
      x ∈ X
Er . . . how on Earth do you manage those nasty “max”?
Must be that the “max” can be done independently of x! Example: f (zi) concave, univariate
max { Σi xi f (zi) : Σi xi zi ≤ c , zi ≥ 0 , Ax ≤ b , x ≥ 0 }
v(x) = minλ { Σi max { xi (f (zi) − λzi) : zi ≥ 0 } + λc }
v(x) ≤ Σi xi max { f (zi) − λ̄zi : zi ≥ 0 } + λ̄c
can optimize on the z independently from the x =⇒ “normal” linear cuts
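The univariate-concave example above with a concrete f chosen here for illustration, f(z) = sqrt(z): then max{ sqrt(z) − λz : z ≥ 0 } = 1/(4λ) for λ > 0, so every multiplier λ̄ yields the linear-in-x cut v(x) ≤ Σi xi/(4λ̄) + λ̄c, and minimizing over λ gives v(x) = sqrt(c Σi xi) exactly.

```python
from math import sqrt

def cut_value(x, lam, c=4.0):
    # value at x of the cut obtained for multiplier lam > 0
    return sum(xi / (4.0 * lam) for xi in x) + lam * c

def v(x, c=4.0):
    # closed-form inner optimum for f(z) = sqrt(z): v(x) = sqrt(c*sum(x))
    return sqrt(c * sum(x))

x = (1.0, 2.0, 1.0)
lam_star = sqrt(sum(x)) / (2.0 * sqrt(4.0))   # minimizing multiplier
vals = [cut_value(x, lam) for lam in (0.1, 0.3, lam_star, 1.0, 3.0)]
```

Every cut overestimates v(x), and the cut at λ* is tight: the nasty inner “max” collapsed into ordinary linear cuts.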
The Integer Case
Block-structured Integer Programs
What if X is combinatorial (e.g., X = { x ∈ Zn : Ex ≤ d })?
(Π)  max { cx : Ax = b , x ∈ X }
The Lagrangian dual is
(∆)  min yb + max { (c − yA)x : x ∈ X }
nothing changes if we can still efficiently optimize over X, e.g. due to size (decomposition) and/or structure (integrality)
. . . except we are solving a different problem:
(Π̄)  max  c ( Σ_{x̄∈X} x̄ θx̄ )
      A ( Σ_{x̄∈X} x̄ θx̄ ) = b
      Σ_{x̄∈X} θx̄ = 1 , θx̄ ≥ 0  x̄ ∈ X
    ≡ max { cx : Ax = b , x ∈ conv(X) }
i.e., a (potentially good) relaxation of (Π)
Block-structured Integer Programs (cont.d)
Good news: (Π̄) better (not worse) than the continuous relaxation (conv(X) ⊆ { x ∈ Rn : Ex ≤ d })
Bad news: if (Πy) “too easy” (conv(X) = { x ∈ Rn : Ex ≤ d }, a.k.a. the integrality property), then (Π̄) same as the continuous relaxation
Trade-off: (Πy) must be easy, but not too easy (no free lunch)
Anyway, at best gives good bounds =⇒ Branch & Bound with DW/Lagrangian/CG ≡ Branch & Price
Branching nontrivial: may destroy subproblem structure =⇒ branch on x (but (ΠB) is on θ)
Lamentably little support from off-the-shelf tools: the master problem gives a valid bound only at termination, although the subproblem always gives one (but not associated to a continuous feasible solution)
Digression: How to Choose your Lagrangian relaxation
There may be many choices
(Π)  max { cx : Ax = b , Ex ≤ d , x ∈ Zn }
(Π′y)  max { cx + y(b − Ax) : x ∈ X′ = { x ∈ Zn : Ex ≤ d } }
(Π″w)  max { cx + w(d − Ex) : x ∈ X″ = { x ∈ Zn : Ax = b } }
The best between (∆′) and (∆″) depends on the integrality of X′, X″:
if both have it, both (∆′) and (∆″) ≡ continuous relaxation
if only one has it, the one that does not
but if both don’t have it?
Here comes Lagrangian decomposition [9] (scale by 1/2)
(Π) ≡ max { cx′ + cx″ : x′ ∈ X′ , x″ ∈ X″ , x′ = x″ }
(Πλ)  max { (c + λ)x′ : x′ ∈ X′ } + max { (c − λ)x″ : x″ ∈ X″ }
(∆̄) ≡ (Π̄)  max { cx : x ∈ conv(X′) ∩ conv(X″) }
better than both (but need to solve two hard subproblems)
[9] Guignard, Kim “Lagrangean decomposition: a model yielding stronger lagrangean bounds” Math. Prog., 1987
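A brute-force look at the Lagrangian decomposition bound (a toy instance invented here, not from the slides): split x into copies x′, x″ with the objective split c/2 + c/2, relax x′ = x″ with multiplier λ, and evaluate the dual function on a grid. Weak duality gives d(λ) ≥ v(Π) for every λ, and on this instance the best grid value sits strictly above v(Π): the decomposition bound still has a gap, but it equals the optimum over conv(X′) ∩ conv(X″).

```python
X1 = [(0, 0), (1, 0), (0, 1)]   # first "easy" discrete set
X2 = [(0, 0), (1, 1), (1, 0)]   # second "easy" discrete set
c = (1.0, 2.0)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def d(lam):
    # d(lam) = max_{X1} (c/2 + lam)x' + max_{X2} (c/2 - lam)x''
    half = tuple(ci / 2 for ci in c)
    p = tuple(hi + li for hi, li in zip(half, lam))
    m = tuple(hi - li for hi, li in zip(half, lam))
    return max(dot(p, x) for x in X1) + max(dot(m, x) for x in X2)

v_pi = max(dot(c, x) for x in X1 if x in X2)   # exact integer optimum

grid = [k * 0.25 for k in range(-12, 13)]
dmin = min(d((l1, l2)) for l1 in grid for l2 in grid)
```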
Geometry of Lagrangian Decomposition
[figure: the red and blue relaxed regions and their grey intersection]
Intersection between red and blue ≡ grey ≡ continuous relaxation
Lagrangian relaxation of blue constraints shrinks the red (=⇒ grey) part
Lagrangian relaxation of red constraints shrinks the blue (=⇒ grey) part
Lagrangian decomposition (both red and blue constraints) shrinks both =⇒ the grey part more
But the intersection of convex hulls is larger (bad) than the convex hull of the intersection
Digression: Alternative Good Formulations for conv (X )
(Under mild assumptions) conv(X) is a polyhedron =⇒
conv(X) = { x ∈ Rn : Ẽx ≤ d̃ }
There are good formulations for the problem
Except, practically all good formulations are too large
[figure: the original formulation Ex ≤ d vs. the tighter Ẽx ≤ d̃, both intersected with Ax = b]
Very few exceptions (integrality property ≈ networks)
Good part: working in the natural variable space
But a few more variables do as a lot more constraints
Row generation/polyhedral approaches
The good news is: rows can be generated incrementally
[figure: cuts separating the current optimum from conv(X) ∩ { Ax = b }, added incrementally]
Relevant concept: separator
Branch & Cut
R = (small) subset of rows (indices), ERx ≤ dR reduced set
Solve the outer approximation to (Π̄)
(Π̄R)  max { cx : Ax = b , ERx ≤ dR }
feed the separator with the primal optimal solution x∗
Separator for (several sub-families of) facets of conv(X)
Several general approaches, countless specialized ones
Most often separators are hard combinatorial problems themselves (though using general-purpose MIP codes is an option [10])
May tail off, branching useful far before having solved (Π̄X)
[10] Fischetti, Lodi, Salvagnin “Just MIP it!” MATHEURISTICS, Ann. Inf. Syst., 2009
Branch & Cut vs. Branch & Price
Which is best?
Row generation naturally allows multiple separators
Very well integrated in general-purpose solvers (but harder to exploit “complex” structures)
Column generation naturally allows very unstructured separators
Simpler to exploit “complex” structures (but much less developed software tools)
Column generation is row generation in the dual
Then, of course, Branch & Cut & Price (nice, but software issues remain and possibly worsen)
Staircase-structured Integer Programs
What if X = { x ∈ Zn : Ex ≤ d } combinatorial?
(Π̄)  max { cx + ez : Ax + Bz ≤ b , x ∈ X }
Nothing changes . . . except (BB) now is combinatorial =⇒ hard
However, (BW) now is equivalent to (Π̄) =⇒ no branching needed (unless for solving (BB)) =⇒ no Branch & Benders’
Conversely, everything breaks down if z ∈ Zm: there is no (workable, exact) dual of an Integer Program
Can do with “approximated” duals (strong formulations, RLT [11], . . . ) but equivalence lost =⇒ branching again
[11] Sen, Sherali “Decomposition with branch-and-cut approaches for two-stage stochastic MIP” Math. Prog., 2006
Example Applications, a.k.a.
the Reformulation
before the Reformulation
(Very) Classical decomposition approaches for (2SILP)
Here-and-now decisions are naturally complicating variables
The (expected) value function decomposes by scenario
v(x) = c0 x + Σ_{s∈S} πs min { cs zs : Es zs ≤ bs − E0s x }
Alternative approach: split variables, introduce copy constraints
min  c0 x + Σ_{s∈S} πs cs zs
     E0 x ≤ b0
     E0s xs + Es zs ≤ bs , xs = x  s ∈ S
relax them in a Lagrangian fashion
The Lagrangian approach chooses all variables for all scenarios (no unfeasibility), tries to make here-and-now agree by changing prices
Difference more pronounced in multi-stage programs
Classical decomposition approaches for (MCND)
Design (z) variables are “naturally” linking/complicating
What remains is flows/paths: convex even if integer
Benders’ cuts are metric inequalities [12] defining multiflow feasibility
Resource decomposition [13]: add artificial linking variables
dk xijk ≤ ukij , Σ_{k∈K} ukij ≤ uij
Different possible linking constraints:
(3) =⇒ flow (shortest path) relaxation (integrality property ≡ “easy”)
(2) =⇒ knapsack relaxation (only one integer variable per problem)
different efficiency (algorithm-dependent [14,15]), others possible
[12] Costa, Cordeau, Gendron “Benders, metric and cutset inequalities for multicommodity capacitated network design” Comput. Optim. Appl., 2009
[13] Kennington, Shalaby “An Effective Subgradient Procedure for Minimal Cost Multicomm. Flow Problems” Man. Sci., 1977
[14] Crainic, F., Gendron “Bundle-based relaxation methods for multicommodity capacitated fixed charge network design” Disc. Appl. Math., 2001
[15] F., Gendron, Gorgone “On the computational efficiency of subgradient methods: a case study in combinatorial optimization” Math. Prog. Comp. (submitted), 2015
Conclusions (for now)
Conclusions (part I)
Structured (Integer) Programs are challenging, but structure can be exploited: main tools are reformulation + duality
Two different approaches, “primal” and “dual”
Different twists, different conditions to work:
who is complicating (constraints vs. variables), but tricks (≡ other reformulations) can be used to create the desired structure
who is reformulated (subproblem vs. master problem)
where integer/nonconvexity can be (subproblem vs. master problem)
where branching/cutting is done (subproblem vs. master problem)
where/which nonlinearities can be easily dealt with
(For linear programs) Lagrange is Benders’ in the dual, and vice-versa
Both boil down to the 50+-years-old Cutting Plane algorithm [5]
Has it aged well? We’ll see tomorrow