Primal stabilization in column generation

(1)

Università di Pisa

Corso di Laurea in Matematica

Tesi di Laurea Magistrale

Primal Stabilization

in Column Generation

Relatore:

Prof. Antonio Frangioni

Candidata: Angela Mastrocinque Controrelatore:

Prof.ssa Laura Galli

(2)

(3)

3

Introduction

Column generation techniques have proved to be very successful when dealing with industry scale instances requiring Integer Linear Programming. For such instances, a direct approach is infeasible, due to computational complexity and to the number of variables involved. Two typical problems whose solutions have shown significant improvements by the use of such technique are the Bin Packing Problem (BPP) and the Vehicle Scheduling Problem (VRP). In this thesis, we will focus on these two problems, and suggest a stabilization technique to improve the classical column generation algorithms.

The main problem of Column Generation is a slow convergence towards the optimum, caused by oscillations of the partial dual optima, which can get very far from their final best value. Dual stabilization has been proposed in various works, in order to prevent the dual optima to oscillate too much and to improve the convergence of the algorithm; this yielded positive results in various cases. In this thesis, we adopt a different, primal stabilization technique: criteria are added in order to generate as many “good columns” as possible, in the attempt of reaching the optimum in a more straightforward way. The concept of “good”, of course, is a key point: some heuristics have to be adopted to suggest the columns to be generated with higher priority. We will adopt a “greedy” approach for the Bin Packing, and a stabilization technique proposed by MAIOR for the VRP, which has given good results in real-world instances.

The problems we will study belong to the domain of Integer Linear Program-ming, and any theoretical improvement proposed must be supported by numerical experiments, for which we need a ILP solver. Many solvers available today, both commercial and open source, implement algorithms based on the most modern re-sults in this topic. The one we will use is SCIP, a solver developed in ZIB Berlin after a project of Tobias Achterberg, which is free for academical use. This will handle the integer programming part of the problem. As for the LP solver, we will use Cplex, distributed by IBM under academic licence.

SCIP is written in the C programming language, but it comes with C++ wrapper classes of which we will take advantage: some of the codes for reading the instances and calling the solvers will be written using the C++ programming language.

In Chapter 1 we recall the basic ideas of Column Generation and constraint de-composition. In particular, we will briefly explain the Dantzig-Wolfe Decomposition and its link with the Lagrangian Dual problem. We also mention the Branch&Price algorithms which are typically used when solving integer programs via column gen-eration.

In Chapter 2 we discuss about the stabilization problems of the Column Genera-tion algorithms. In particular, we formulate the idea of the “suggested items”, which is exploited in practice in the following chapters.

In Chapter 3 we explain in detail the Bin Packing Problem and our modified algorithm for its resolution. We do the same in Chapter 4 for the capacitated Vehicle Routing Problem.

(4)

In Chapter 5 we examine the basic features of the SCIP Optimization Suite, in particular related with its flexibility, which allows to directly modify its source code to achieve a user-defined behaviour. We will briefly outline the main functions of its interface, and the possibility of using it as a bash program.

In Chapter 6 we describe how to implement our enhanced algorithm in the SCIP Optimization Suite, both for the Bin Packing and for the Vehicle Routing problems. Finally, in Chapter 7 we collect the results of the numerical experiments conducted by the use of our algorithms, and draw the relative conclusions.

Acknowledgements

This thesis has been written in the A.Y. 2019/2020, for obtaining the Master’s Degree at the University of Pisa, under the supervision of Prof. Antonio Frangioni. I thank him for making this possible and for the patience he demonstrated, by constantly helping me advancing even when my working commitments were holding me back.

I would like to thank my Accenture colleagues, for all the support they gave me in the last year, for being a piece of my family now.

Particular acknowledgement must be given to my husband Simone who, besides always supporting me by staying by my side, also gave me a valuable practical help: without him, this thesis would have taken much more time to be written and would have never been such amazing! I would never be able to thank you as much as you deserve.

I also thank my mother and my father, for the support they gave me by encour-aging me never to abandon my goals, for beying the best parents that I could ever desire.

Lsst but not least, thank you to my awesome sisters, brothers and nephew: you are simply the best.

(5)

Column Generation,

Dantzig-Wolfe Reformulation,

Lagrangian Dual

Column Generation (CG) is a general technique that allows to efficiently solve linear programs with a very large (exponential) number of variables (columns in the con-straint matrix), provided that the corresponding pricing problem can be efficiently solved. This turns out to be very important for solving many practical problems, because many “tight” Integer Linear Programming (ILP) formulations for these prob-lems turn out to have a very large number of columns. This is in particular true for formulations deriving from the general Dantzig-Wolfe Reformulation technique, which applies to problems whose constraints can be separated into a set of “sim-ple” ones, upon which optimization is “easy”, and a set of “complicating” ones which cause the problem’s difficulty to rise significantly. In turn, the Dantzig-Wolfe Refor-mulation technique can be interpreted, from the dual viewpoint, as the solution of the Lagrangian Dual of the original formulation with respect to the “complicating” constraints. All these concepts will be introduced in this Chapter.

1.1 Column Generation

Column Generation is a general technique to solve Linear Programs of the form z∗:= min P

j∈Jcjλj (1.1.1)

s.t. Pj∈Jajλj > b (1.1.2)

λj > 0 j ∈ J (1.1.3)

were J is a very large (typically, exponential) set of variables/columns. Since solving the problem in one blow is impossible, CG is an iterative process that, at each step, considers a much smaller subset J0 _{⊆ J} _{of the variables and solves the Restricted}

(8)

Master Problem (RMP) z := min P

j∈J0c_jλ_j (1.1.4)

s.t. Pj∈J0a_jλ_j > b (1.1.5)

λj > 0 j ∈ J0 (1.1.6)

It is clear that the solution of the (1.1.4)–(1.1.6) is also feasible for the original problem (1.1.1)–(1.1.3). Moreover, we have the obvious bound z∗ _{≤ z}_{. Letting π}

i

for i = 1, . . . , n be the dual variables corresponding to the constraints (1.1.5), the dual of the RMP reads

π :=arg max πb (1.1.7)

s.t. πaj 6 cj j ∈ J0 (1.1.8)

π > 0 (1.1.9)

Since the constraints (1.1.8) only refer to the restricted set of variables J0_{, the dual}

optimal solution π will not, in general, be feasible for the dual of the original master problem. This happens precisely when there exists some j ∈ J\J0_{such that πa}

j > cj.

This motivates the definition of the reduced cost of a variable j ∈ J as cj := cj− πaj .

In particular, we will have cj > 0 for j ∈ J0, and a variable j ∈ J corresponds to a

violated constraint in the dual master problem if and only if cj < 0.

The idea is that a variable of J (i.e. a column) has negative reduced cost, it must be added (generated) to J0 _{in order to make the relative dual constraint feasible.}

In order to find a column with negative reduced cost, one must solve the pricing problem

j :=arg min{ cj : j ∈ J }

using a problem-specific algorithm generally called the pricer (or oracle). If cj < 0,

then the relative column is added to the RMP (J0 _{:= J}0 _{∪ { j }}_{), which is then}

re-solved. If cj ≥ 0, instead, then π is also feasible for the dual of (1.1.1)–(1.1.3): this

implies that an optimal solution of the problem has been found. Clearly, the process finitely terminates since J is finite. The practical efficiency depends on both the cost of the pricing problem, and on the total number of iterations performed (which impacts on the cumulative cost of all the RMP solved). Possibly with appropriate modifications (cf. Chapter 2), the approach can be effective.

1.2 The Dantzig-Wolfe Reformulation

One setting that produces “large” Linear Programs like those of the previous setting is that of the Dantzig-Wolfe Reformulation of a “structured” Integer Linear Program,

(9)

1.2. THE DANTZIG-WOLFE REFORMULATION 9 say of the form

z∗:= min cx (1.2.1)

s.t. Ex > e (1.2.2) Dx > d , x ∈ Zn (1.2.3) In this formulation, we consider (1.2.2) to be the “complicating” constraints; this means that if the constraints were not there, solving the optimization problem over the remaining constraints (1.2.3) would be much easier than solving (1.2.1)–(1.2.3) whole. One typical reason, present already in the original development by Dantzig and Wolfe [9], is that the matrix D is block-diagonal, which means that the problem decomposes into a (possibly, large) number of (thereby, much smaller) sub-problems. To connect the compact formulation (1.2.1)–(1.2.3) with the large-scale Linear Program (1.1.1)–(1.1.3) of the previous section, we develop its Dantzig-Wolfe re-formulation (DWR). This starts by defining the set of solutions of the subproblem X = { x ∈ Zn : Dx ≥ d }. For the sake of simplicity, we assume X to be a finite set, i.e., X = { xj}j∈J (the use of the same index set as in the previous section being

intended). This is always possible if Dx ≥ d is a bounded polyhedron, even if not all the variables x are integer, by taking appropriate vertices of the set. Also, the treatment can be extended to non-compact sets by making use of appropriate direc-tion, but we avoid delving further into it because all our applications have discrete feasible regions. The original problem can then be reformulated as

z∗ := minP j∈J(cxj)λj (1.2.4) s.t. Pj∈J(Exj)λj ≥ e (1.2.5) P j∈Jλj = 1 (1.2.6) λj > 0 j ∈ J (1.2.7) λj ∈ Z j ∈ J (1.2.8)

The continuous relaxation of (1.2.4)–(1.2.8), obtained by dropping the integrality constraints (1.2.8) (which force the variables λj to be binary), clearly has the form

required in the previous section, owing in particular to the fact that X typically has exponentially many points. The lower bound on z∗ _{computed by such continuous}

relaxation can be easily proven to be never worse, and in practice it is often sig-nificantly better, than that of the continuous relaxation of the compact formulation (1.2.1)–(1.2.3); this makes (1.2.4)–(1.2.8) a better formulation upon which designing exact and heuristic solution approaches, provided of course that the CG approach can be employed to efficiently solve its continuous relaxation. This is, however, clearly the case: in fact, for a fixed dual solution ( π , v ) (the last component being associated to the simplex constraint (1.2.6)), it is easy to see that the pricer now to solve

(10)

which has been assumed to be “easily solvable” from the start. Thus, the fact that (1.2.2) are “complicating” constraints precisely corresponds to the fact that CG can be used to solve the continuous relaxation of the DWR (1.2.4)–(1.2.8) of the original compact formulation (1.2.1)–(1.2.3). For future reference, let us record that given an optimal solution λ of the RMP, the solution

x =P

j∈Jλjxj (1.2.10)

is feasible for the continuous relaxation of (1.2.1)–(1.2.3). This makes it plain to see that the continuous relaxation of the DWR in fact corresponds to the problem

z∗:= min cx (1.2.11)

s.t. Ex > e (1.2.12) x ∈conv( y ∈ Zn _{: Dy > d )} (1.2.13) where the original “easy” constraints Dx > d are replaced by their convex hull; in turn, this immediately confirms that the corresponding bound cannot be worse than that of the compact formulation. At each iteration of the CG, the RMP provides both a dual optimal solution π and an optimal primal solution x, which at the end will be optimal for (1.2.11)–(1.2.13) (and its dual, to be analysed in the next section). It can be shown that by calling z the optimal value of the RMP and c the optimal value of the pricing problem, we have that z is a valid upper bound on the optimal value of (1.2.11)–(1.2.13) while z + c is a valid lower bound.

1.3 Lagrangian relaxation

The above development can be entirely re-interpreted from the dual viewpoint. In-deed, the pricing problem (1.2.9) is basically the Lagrangian relaxation of the original problem (1.2.1)–(1.2.3) w.r.t. the complicating constraints (1.2.2). Actually, the lat-ter is more properly defined as

L(π) := min{ cx + π(e − Ex) : Dx ≥ d , x ∈ Zn} (1.3.1) for a vector π ≥ 0 of Lagrangian multipliers. The term weighted by π in the objective penalizes the violation of the relaxed constraints Ex ≥ e and ensures that the value of the (concave, nondifferentiable) dual function L(π) always provides a lower bound on z∗ _{for all choices of π ≥ 0: L(π) 6 z}∗_{. This immediately suggest to the best such}

bound by solving the Lagrangian dual problem

L := max{ L(π) : π ≥ 0 } . (1.3.2) If such maximum is attained at π∗_{, and x}∗ _{∈ X} _{is one optimal solution to L(π}∗₎_,

then it can be shown that the pair ( x∗_{, π}∗₎ _{is the optimal solution if and only if}

(11)

1.4. BRANCH & PRICE 11 conditions π(e − Ex) = 0hold.

It is a classical result (e.g. [16]) that the optimal value of the Lagrangian dual (1.3.2) is equal to the one of convexified version (1.2.11)–(1.2.13), i.e., of the contin-uous relaxation of the DWR (1.2.1)–(1.2.3) of the original problem (1.2.4)–(1.2.8). Indeed, one can see that the solution of the DWR via CG is nothing else that the solution of (1.3.2) via Kelley’s Cutting Plane method [21].

The practical relevance of this correspondence lies in the fact that, in its by-the-book implementation, CG (equivalently, the Cutting Plane method) is known to suffer from serious computational issue. The main of those is the so-called instability of the dual iterates π. Since they are those that drive the process, most of the efforts to lessen the issue (i.e., to stabilize the CG) have focussed on the dual space. This is discussed in §2.1.

1.4 Branch & Price

Once the master problem has been solved, we have an optimal solution for the linear relaxation of the original ILP. One of the most common methods to obtain integer solutions is via Branch&Bound; which changes its name into Branch&Price (B&P) when at each node of the search tree the column generation algorithm described above is used. Assuming

x∗ := arg min cTx (1.4.1) subject to Ax > b, (1.4.2) x_{> 0,} (1.4.3) then the vector x∗ _{will not be, in general, an integer vector (if it is, the ILP would}

be solved): one then chooses a fractional entry xf of x∗ and branches over different

values of xf. The specific branching rule will be problem-specific.

Here is an example. Assume the ILP prescribes the entries of x to be binary, and that one has chosen in some way a fractional entry xf > 0of the optimal solution to

the master problem. Then two new problems are created by fixing xf = 0or xf = 1.

Both problems are then re-solved using the column generation algorithm, and the procedure repeats until no new subproblems have to be generated. At this point, a simple search through the leaf of the resulting tree gives the true optimum of the original ILP.

Assume we start with a minimization problem. Notice that if the optimal solu-tion at a given node is z∗_{, and the solutions to the relative children problems are}

z_A∗ and z_D∗, then both z_A∗ and z∗_D are larger than z∗; therefore, a lower bound is available at each node. If at some stage we have an integer solution z∗

I available (so

that a leaf of the tree has been found), this observation implies that we can also enforce an upper bound to the optimal solution at all nodes. This is implemented in practice via a cut-off condition for branches whose optimal solution is greater than z_I∗: the fathoming rule should then allow to improve the speed of convergence of

(12)

the algorithm. This approach is known as Branch&Cut&Price (B&C&P) (or some permutation of these three words): all algorithms we will use will have this type of branching rule implemented, although we will not examine it in detail.

We want to solve the Dantzig-Wolfe Reformulation (1.2.4)–(1.2.8) using this ap-proach. However, one must exercise care when branching in a column generation approach, since nothing assures that the price problem of the child node is simpler, or even as simple as, the one in the original node. The classical example, which is relevant in our case, is the shortest-path problem on a graph. Such problem is well-known to be solvable in polynomial time; however, if we branch on the arcs, the price problem of the child node will be a more complicated constrained shortest path problem, in which some of the arcs of the paths are obliged. This is a NP-hard problem, and a solution approach based on column generation becomes inconvenient. In order to avoid this issue when studying the Vehicle Routing Problem, we branch directly on the variable x by using the coupling equation (1.2.10); we are thus eliminating a node from the graph, a procedure which does not make the price problem of the child node more complicated. The subproblems appearing are then reformulated again using the Dantzig-Wolfe decomposition, and the entire process is repeated. In this way, the structures which make the subproblems tractable are preserved at each child node.

In the case of the Bin Packing Problem, we can directly branch on the variables λj, since this means fixing some objects to be already packed, leaving us with a

(13)

Chapter 2

Stabilization

It is well-known that, in many applications, the Column Generation algorithm may initially approaches a near optimal solution quickly enough, but then the rate of progress dramatically lessens. This may result in a very large number of columns continue to be generated without significantly improving the solution. This effect is often referred to as tailing off, a full theoretical explanation of which represents an active area of research.

One way of looking at the problem is that CG ends when “enough” columns that would be generated by the pricing problem at the (unknown) dual optimum π∗ _have

been collected. However, the columns produced at the actual iterates π in which (1.2.9) is called are of “bad quality”, which slows down the process. Thus, several (heuristics) techniques have been proposed to improve the quality of the columns generated at each step.

2.1 Dual stabilization

Most of the proposed techniques are done at the level of dual variables. For instance, it can be experimentally verified that, unlike the efficient algorithms for smooth optimization, the fact that one iterate π is very close to (or even coincident with) π∗

does not prevent the subsequent ones to be arbitrarily far from it [17]; in other words, the dual variables do not converge smoothly towards their optima, but oscillate chaotically. Thus, most approaches work by trying to stabilize the sequence of dual iterates, avoiding excessive oscillations. This is often done by forcing the solution to the dual master problem to lie in a region around a given point π. Of course, such value has to be chosen carefully in order to assure that the dual optimum does not lie too far from it: in practice one has to find a stability center (which may change dynamically during the CG process), and to add an explicit stabilizing term in the objective function of the dual problem, in order to penalize movements away from that center. One of the crucial, and less well-understood, parts of the process is the need to set and dynamically change a stabilization parameter which dictates how strong the stabilization effect need be. Depending on the choice of the explicit stabilizing term, various stabilizing methods acting on the dual variables exist: see

(14)

for example [6, 17] and the references therein.

While the approach has been repeatedly shown to be effective, one may argue that it obtains its aim only indirectly. Indeed, what one would want is to generate “good” columns; yet, what is done is to try to generate “good” dual iterates.

2.2 Primal stabilization

In this thesis, we adopt a different approach: we directly on the pricer, trying to force it to generate “good columns” at each iteration of the CG procedure, even for the same “unstable” sequence of dual iterates.

The general idea will be described for the pricing problem (1.2.9) (equivalently, (1.3.1)) of DWR, although it may be possible to apply the approach to different cases. The attempt is to use not only the dual information π from the RMP, but also the primal one x of (1.2.10), to define the following modified pricer

jπ,x:=arg min{ ( (c − πE )x) : Dx ≥ d , Eπ,xx ≥ dπ,x _{, x ∈ Z}n} (2.2.1) Note that we allow the dual point π that defines the reduced cost to be different from π. The idea is that the added constraints

Eπ,xx ≥ dπ,x (2.2.2) restrict the set of possible columns to be generated, hopefully avoiding “too bad” ones. The specific way to define (2.2.2) will depend on the particular problem, and it is for now left unspecified. However, since the extra constraints restrict the set of columns that can be generated, it is clear that (2.2.1) need be used carefully to avoid missing out on potentially useful columns. In the following we will consider a simple but general approach that still guarantees correctness of the CG process, corresponding to the Primal-Stabilised Column Ggeneration Algorithm (PSCGA):

1. Solve the RMP (1.1.4)–(1.1.6) for optimal primal and dual solutions x and π. 2. Call the modified pricer (2.2.1), let jπ,x

be the index of its optimal solution and cπ,x _{the corresponding minimal reduced cost.}

3. If cπ,x_{< 0} _{then J}0 _{:= J}0_{∪ { j}π,x_}_{and go to step 1.}

4. Call the standard pricer (1.2.9), let j be the index of its optimal solution and cthe corresponding minimal reduced cost.

5. If c < 0 then J0 _{:= J}0_{∪ { j }} _{and go to step 1.}

6. Stop and report π and x as primal and dual optimal solutions to (1.2.11)– (1.2.13).

(15)

2.3. THE SUGGESTED ELEMENTS APPROACH 15 Theorem 1. Let the initial J0 be such that the RMP has optimal solution(s); then, the PSCGA finitely terminates proviing as primal and dual optimal solutions to (1.2.11)–(1.2.13).

Proof. Since the algorithm can only stop at Step 6, when the solution of the dual RMP is also feasible for the full RMP, it is clear that the optimum at termination is the optimum of the master problem.

Since π is an optimal solution of the dual RMP, it is in particular dual feasible, which means that any column j ∈ J0 _{has nonnegative reduced cost. If the algorithm}

does not terminate, the new column added to J0 _{has negative reduced cost,}

irrespec-tively to if it has been obtained at Step 2 or Step 4. Indeed, although the constraints to (1.2.9) and (2.2.1) are different the objective is the same. Hence, even a column obtained by (2.2.1) (and, a fortiori, one obtained by (1.2.9)) that has negative re-duced cost makes π infeasible. This means that the dual solution of the RMP cannot be repeated, since columns are always added to J0_{, and never removed from it. Thus,}

at each iteration a new column is generated, and every column is generated at most once; therefore the algorithm must eventually reach the optimum, which must occur at Step 6.

As for the standard CP method, it might be possible to improve the algorithm by allowing careful elimination of columns from J0 _{without impairing convergence,}

see e.g. [16] for details. Also, dual-stabilised version of the CG approach may likely be used, e.g. along the lines of [2]. However, since this Thesis is, to the best of our knowledge, the first to propose the primal stabilization, we stick to the simplest possible version of CG, and therefore to the simple PSCGA.

Although the method is in principle independent on how the primal stabilising constraints (2.2.2) in (2.2.1) are defined, it is clear that for the approach to make computational sense these need be defined in a proper way. In the following we will propose a reasonably general scheme, based on suggested elements, that is appropriate for DWR of problems (1.2.1)–(1.2.3) where all variables are subject to integrality constraints. This is true in very many applications where the original problem is combinatorial in nature, among which the two ones tested in the thesis. Of course, other definitions might be possible that could be even better suitable for these and other types of problems.

2.3 The suggested elements approach

When X ⊆ Zn _{(in the original problem, and therefore in the pricing problem),}

each solution xj is “composed of n different elements, possibly in multiple copies.”

A simple way to define “promising solutions” (therefore “promising columns”) is to define the concept of “promising element”, and then to impose that the column is “mostly composed of promising elements”. The primal and dual information x and π can be used in different ways to define a particular element as “promising”.

(16)

2.3.1 Promising elements suggested by primal information

Since the primal continuous solution x from (1.2.10) has one element xi ∈ R for each

variable xi in the problem, the definition of “promising element” can be done in a

very simple way. In particular, one can, for instance:

• define “promising” each i such that xi ≥ γ for some fixed constant γ > 0,

and/or

• consider x ordered in non-increasing sense (largest values of xi first), with ties

broken arbitrarily, and define “promising” the first r indices i in the ordering for some fixed constant r ∈ N.

Of course, the choice of γ and/or r is crucial. In case the variables are actually binary γ might be chosen as a fixed number, otherwise γ might indicate a fraction of the maximum possible value that xi can have (assumed known). Similarly, r could

be expressed as a fraction of n (the total number of variables). The rationale is to construct columns that privilege the elements xi which have already been generated

to obtain new columns which only have few “new” elements. In this way one tries to make the convergence to the optimum better behaved, since relatively few new pieces appear at each iteration.

2.3.2 Promising elements suggested by dual information

Since the matrix E also has n columns, it is easy to compute the contribution of each variable x_i to the reduced cost. Indeed, the reduced cost vector is

c = c − πE = [ ci= ci− πEi]i=1,...,n

where Ei _{is the i-th column of E. We can therefore consider c}

i as the reduced cost of

the variable xi, and hence use it, similarly as before, to define “promising elements”:

• define “promising” each i such that ci ≤ γ for some fixed constant γ (likely

≤ 0), and/or

• consider c ordered in non-increasing sense (largest values of ci first), with ties

broken arbitrarily, and define “promising” the first r indices i in the ordering for some fixed constant r ∈ N.

As above, the choice of γ and/or r is crucial. It is easy to take r as a fraction of the total number of variables n, but other problem-specific ways could be considered (say, if the cardinality of the set of pieces to be used in an optimal solution was known or could be effectively estimated). As for γ, it could for instance be expressed as a fraction of the most-negative reduced cost found in the previous iteration. The rationale is to construct columns such that not only their reduced cost is negative, but also are made by summing low cost contributions.

(17)

2.3. THE SUGGESTED ELEMENTS APPROACH 17

2.3.3 Promising elements suggested by both information

Of course, nothing would prevent from combining primal and dual information, in different ways. For instance, one could use the ordering from one of the criteria, but use a threshold on the other γ to ensure that the selected elements are actually “interesting” from both viewpoints.

2.3.4 Overall suggested elements approach

Numerical experiments have confirmed the intuition that it is generally not conve-nient to impose to the pricer to only use suggested elements when looking for the new column. Instead, a reasonable approach is to require that “most” pieces are “interesting”, while leaving some freedom to introduce new ones at each iteration. In other words, the pricer is “suggested” which elements it should concentrate onto. That is, assume that I(x, π) is the set of “interesting” elements as defined with any of the criteria above, or any other suitable one: then, a simple way to define the primal stabilization constraints (2.2.2) is

P

i /∈I(x,π)xi ≤ Γ (2.3.1)

for some fixed constant Γ. For instance, if the variables xi are actually binary,

Γ counts the maximum number of non-suggested elements that can be used in a solution, forcing all the other ones to be chosen among suggested (interesting) ones. Again, of course the setting of Γ is going to be crucial: for instance, Γ = 0 corresponds to form columns which are only composed of suggested elements. At the other opposite, setting Γ = n reproduces the “standard” pricer (1.2.9) where we tolerate any number of non-suggested pieces. It can be expected that the interesting behaviour will be observed in between these two extrema.

Note that we are considering the crucial constants appearing in our approach (γ, r, Γ, . . . ) as fixed throughout all the PSCGA. While this is how we have dealt with them during our experiments, this need not necessarily be so, and dynamic rules for choosing them could be devised. Indeed, one can consider the standard pricer as a special case of the stabilised one (2.2.1) for Γ = n: hence, the difference between Step 2 and Step 4 in the algorithm could be construed to be only a difference in the parameter Γ, which is dynamically changed. Yet, imposing extra constraints on an optimization problem may significantly change its complexity, and the practical cost of this solution may be increased. While this is true in general, the cardinality-like constraint (2.3.1) is one of the “simplest” constraints that can be considered; for many optimization problems, cardinality constraints can be added with a relatively minor cost and/or change in the solution method of the pricer. This will be discussed in detail for each of our two applications, to be discussed next.

(18)

(19)

Chapter 3

The Bin Packing Problem

The (one-dimensional) Bin Packing Problem (BPP) is one of the most famous prob-lems in combinatorial optimization. We are given a set of objects, each one with its weight, and a set of bins with a given capacity; we assume for simplicity that the capacity is the same for all bins. The objective of the problem is to find the packing pattern of the objects in the bins, such that the number of bins needed is minimum. One of the first works appeared about this problem is the seminal paper of Kan-torovich [20], written in 1939. The historical importance of this problem relies, among the other things, on being one of the problems in which the column gen-eration algorithm has been applied with most success. Such algorithm applied to the BPP is due to Gilmore and Gomory in their two papers [19], following ideas of Dantzig and Wolfe in [9] and Ford and Fulkerson in [13].

Along with this, a closely related problem, the Cutting Stock Problem (CSP) is currently widely encountered in industry. It consists of finding the best way to cut a set of rods into sticks of given lengths, in order to minimize the number of rods needed and the overall cost of cutting. Assuming for simplicity that all cuttings have the same cost, such problem is theoretically equivalent to the BPP, although in typical real-life situations there is just a small number of different stick lengths, while many pieces are required by the clients. This brings to a BPP with many repetitions in the set of weights. See [5] and [11] for excellent treatments for this and related problems.

Since the BPP is NP-hard, an efficient algorithm and effective heuristics are of capital importance for its tractability. We will see how Gilmore-Gomory’s column generation approach dramatically lowers the time needed for its solution. We will see that this algorithm relies on the solution of another NP-hard problem, the Knapsack Problem (KP), for which many heuristics are available to find good putative solu-tions; applying one such heuristics to the Gilmore-Gomory approach, we get what we called the “Suggested Objects” algorithm, which leads to good admissible solutions even faster than the classical algorithm. The difference of the total speed of the two algorithms will be one of the numerical studies carried out in this thesis, and will be the subject of a following section.

(20)

3.1 The Kantorovich formulation

We assume we have an unlimited amount of bins labeled by integers i ∈ Z, all of capacity C ≥ 0, and we need to pack n objects with positive weights w1, . . . , wn. We

define binary variables yi and xij with i ∈ Z and j = 1, . . . , n such that

yi=

(

1 if bin i is used in the solution 0 otherwise

xij =

(

1 if object j is packed in bin i 0 otherwise. minX i yi (3.1.1) Subject to n X j=1 wjxij 6 Cyi, for all i (3.1.2) X i xij = 1for j = 1, . . . , n (3.1.3)

This formulation is known as the Kantorovich model for the BPP. The first constraint imposes that the capacity of each bin is not exceeded, while the second one states that each object is included in precisely one bin.

This program can then be passed to any solver of Integer Linear Programming to solve it directly, for example through a Branch&Bound algorithm. Although very simple to formulate and implement, the outcoming algorithm is found to be rather inefficient in dealing with large instances. This is a reason to introduce the Gilmore-Gomory formulation and column generation.

3.2 The Gilmore-Gomory formulation

We call a packing an admissible way to pack the objects into a bin. Therefore we can describe a packing as a binary array

p = (a1p, . . . , anp) such that n

X

i=1

aipwi ≤ C.

In this way, letting P be the set of all packings; we can define two sets of binary variables θp, aip for p ∈ P and i = 1, . . . , n, in the following way:

θp =

(

1 if packing p is used in the solution 0 otherwise.

(21)

3.2. THE GILMORE-GOMORY FORMULATION 21

aip=

(

1 if object i is included in the packing p 0 otherwise.

Then the BPP can then be reformulated as follows: minX p∈P θp (3.2.1) Subject to X p∈P aipθp > 1, i = 1, . . . , n. (3.2.2)

The constraint imposes that each object is used at least once, i.e. all objects are packed. Although this formulation has a very simple form, it is evident that the set P of all packings is enormous even for moderate-size instances, therefore this problem is untractable when defined in this way. However, we can get over this difficulty using the idea of column generation.

Assume we are given a smaller set P0 _{⊆ P} _{of feasible packings, found heuristically}

or using the trivial ones when only one object is put in each bin. We form the Master Problem (MP) minX p∈P0 θp (3.2.3) Subject to X p∈P0 aipθp> 1, i = 1, . . . , n (3.2.4) θp ≥ 0, for all p ∈ P0 (3.2.5)

Notice that we are not assuming anymore θp to be binary, but just real

non-negative. Thus, the MP is a linear relaxation of a restricted problem. Indeed, the column-generation procedure will just provide a solution to the linear relaxation of the original problem, while a final integer solution must be found by Branch&Bound; more precisely, when the column generation is used at every node of the search tree, the overall algorithm is called Branch&Price. We will not go into the details of this algorithm in this thesis, being mostly concentrated on column generation.

The MP is quickly solved by an LP solver, as long as its dual DMP: let πj be the

dual variable corresponding to the j-th constraint (1.7) at its optimal value. Since πj must satisfy the constraints of the dual problem, we have

cp:= 1 − n

X

j=1

ajpπj ≥ 0, for p ∈ P0.

(22)

p ∈ P. From this we immediately see that, if we find a packing p ∈ P such that cp < 0 (necessarily p /∈ P0), then the corresponding constraint in the dual of the

original problem is violated. Therefore, we add the packing p into P0 (in other words, we “generate a column”, from which the name of the algorithm), and repeat the procedure with a new restricted Master Problem.

In order to find a packing with negative reduced cost, we solve a maximization problem called the Slave Problem or Price Problem

max n X j=1 πjvj (3.2.6) Subject to n X j=1 wjvj ≤ C, (3.2.7)

vj ≥ 0, and integer, for all j = 1, . . . , n. (3.2.8)

This is an unbounded Knapsack Problem, in which we have a knapsack of capacity C and n objects of weigths wj and value πj: we want to put as much value as possible

into our knapsack without exceeding its capacity. Here vj is the number of times

object j is put into the knapsack, and the “unbounded” specification refers to the fact that we do not put any upper bound to vj.

This problem is widely encountered in financial markets, when we must optimize the yield of a portfolio being bound to some constraints related, for example, to total asset value, risk and volatility. We will treat such problem in more detail in the next section, since the “Suggested Objects” algorithm is way to speed up the solution of the Price Problem.

Now, if the solution to the Price Problem (3.2.6) has value greater than 1, the reduced cost of the corresponding packing is negative, thus we can add the column and restart at the Master Problem. Conversely, if the solution has value smaller than 1, then being a maximization problem we can infer that all packings have positive reduced costs. This means that all constraints in the dual of the original problem are satisfied, so that the restricted optimum we found is the optimum of the original problem as well. Since this optimum theoretically exists, the algorithm must stop at some point.

Once the linear relaxation is solved, as anticipated, a branching is created by fixing some of the solutions to be integers. Specifically, we use the Ryan-Foster branching rule in which the pair of objects whose values are closer to 0.5 are packed together or separately in the two branches; this has the advantage of creating a fairly balanced search tree. Of course, other branching rules are possible: see [4] for an extensive treatment.

The method of column generation is then applied to every node of the search tree, thus giving the justification for the name “Branch&Price”.

(23)

3.3. THE PRICE PROBLEM 23

3.3 The Price Problem

Let us examine more closely the Price Problem (1.9)-(1.11). We use the classical algorithm of dynamic programming to solve it. The idea is to explore all possible packings of the knapsack, “throwing away” the ones which are worse than the ones already found. Here is a sketch of the algorithm.

Recall we are given a set of n possible weights w1, . . . , wn and positive values

π1, . . . , πn; we also assume we have an unlimited amount of such objects. We want

to pack them into one single knapsack with capacity C, in order to maximize the value of the packing yet meeting the capacity constraint.

The goal is then to describe the set S of all admissible sums S =    r X j=1 wij ≤ C : r ≥ 0, 1 ≤ i1 < . . . < ir≤ m   

and to assign a predecessor label pi,σ for i = 1, . . . , m and σ ∈ S, which is a boolean

value such that pi,Pr j=1wij = (true if i ∈ {il}rl=1∧ Pr l=1πil = max{ Pt l=1πjl : Pt l=1ajl= Pr l=1ail} false otherwise.

In words, pi,σ is true if and only if the object labeled i lies in a packing of total

weight Pr

j=1wij, whose total value is maximal among the packings of the same

weight. Moreover, we want to take account only of best possible values when the sum is fixed. In order to do this, we want to define

πσ := max    r X j=1 πij : r X j=1 wij = σ   

In order to achieve this algorithmically, we proceed as follows in m steps: 1. We define S0= ∅.

2. For l ≥ 1 we define Sl = {Prj=1wij ≤ C : r ≥ 0, 1 ≤ i1 < . . . < ir ≤ l}. For

all σ ∈ Sl\ Sl−1, set pl,s =true. For σ ∈ Sl−1, choose a representative packing

σ =

r

X

i=1

wij, r ≥ 0, 1 ≤ i1 < . . . < ir ≤ l − 1

If there a new packing with the object of weight wl such that

σ =

s

X

j=1

(24)

then we set pl,σ = (false if Pr j=1πij ≥ Ps j=1πkj + πl true otherwise.

Moreover, if we have set pl,σ =true, we also update πσ, while if pl,σ =false,

the best possible πσ is already stored so we don’t update it.

3. Pass to l + 1 and loop until l = n.

It is easy to see that at the end of such loop, the predecessor labels are the correct ones, and since S = Sn=

Sn

l=1Sl, all possible sums are considered.

Finally, the optimal value is

π = max{πσ : σ ∈ S}

which is quickly found since the vector of πσ has already been stored. Letting

the optimal value be πσopt, the packing is defined as the set of labels l such that

pl,σopt = true. Of course, one can easily enumerate all admissible solutions just by

probing the vector of πσ’s and the corresponding predecessor labels.

3.4 The “Suggested Objects” algorithm

In practice, most of the execution time for a column generation algorithm is taken by the high number of price problems to be solved. We now implement the Suggested Objects approach of Section 2.3 to the case of the BPP. In particular, we will use the “dual information” approach, i.e. our objects will be suggested or not based on the dual values πi. We will then use a cardinality-like constraint as in (2.3.1), imposing

that only a maximum number Γ of non-suggested objects can be used in a packing. The parameter Γ is defined at the beginning of the program.

One of the most classical heuristics to solve the knapsack problem is Dantzig’s greedy algorithm. For each item of value πiand weight wi, we define the value per unit

weight (v.p.u.w.) as Vi = πi/wi. We rearrange the items so that V1 ≥ V2≥ . . . ≥ Vn.

The idea is that it is convenient to put in a knapsack an object with high Vi, since

it will have high value yet having small weight. Therefore, simply putting the first m objects, where m is the smallest index such that w1+ . . . + wm+1> C, will yield

a good solution, or at least a good lower bound to the optimum.

We need to implement this idea to the pricer. Actually it can be shown that just imposing the pricer to only skim through this solution brings to no significant improvement whatsoever. What we do instead is to give a set of “good” objects, called suggested objects, and impose that every packing cannot contain more than a certain number, which we will call the tolerance, of non-suggested objects. This means that the admissible packings will be fewer than in the original case, and if we chose the way to suggest the objects in the appropriate way we can have a good lower bound when this process ends.

(25)

3.4. THE “SUGGESTED OBJECTS” ALGORITHM 25 Definition 1. Given weights w1, . . . , wnand values π1, . . . , πn, we say that the object

with label l is suggested if

πl wl ≥ 1 n n X i=1 πi wi

In other words, if its v.p.u.w. is at least the arithmetic average of the v.p.u.w of the other objects. Notice that both the suggested objects and their number change at each column generation process, since the values πi form the optimal solution to

the dual master problem, which changes at each iteration. We say that this is a dynamic way to suggest the objects.

Notice that this is a rather arbitrary definition: we could have decided, for exam-ple, that the objects with the k highest v.p.u.p., with k a given integer, are suggested; the number k might then be static or dynamic. Many other ways to suggest the ob-jects are possible.

It is easy to modify the algorithm described in Section 1.3 to include the suggested objects. We define, for each label l, a counter sl∈ {0, 1}which is 1 if and only if the

object labeled l is not suggested. Let T be the tolerance. Then we define S =    r X j=1 wij ≤ C : r ≥ 0, 1 ≤ i1 < . . . < ir≤ m, r X j=1 sij ≤ T   

Then the algorithm proceeds just like the one in the classical case, except for the fact that the domination condition becomes

pl,σ = (false if Pr j=1πij ≥ Ps j=1πkj + πl and P r j=1sij ≤ Ps j=1skj+ sl true otherwise. Indeed, if we had Pr j=1sij > Ps

j=1skj+ sl, we cannot cutoff this packing, since

the packing with indices ij could reach the tolerance constraint before the one with

indices kj, thus eventually becoming worse than the latter.

As final remark, notice that nothing assures that the optimum value of the mod-ified algorithm is the same as the one found with the classical algorithm, since the latter might be reached with a packing which does not satisfy the tolerance con-straint. However, recall that it is not strictly necessary to find the optimum of the price problem: a successful call of the pricer consists of finding any column with negative reduced cost, so that it can be added to the master problem. Therefore, we can call the modified pricer as many times as we can, until no columns with negative cost is found. Only at this moment, since we still cannot be sure about the existence of some other column with negative cost, we call the classical pricer to find the last columns. If the modification is effective, only a relatively small amount of extra columns is needed to reach the optimum of the relaxed original problem. By capturing the times when the modified pricer stops, and by storing the relative optima, we will have a good measure of the effectiveness of this modified algorithm.

(26)

(27)

Chapter 4

The Vehicle Routing Problem

Suppose to be hired by a transport company operating on a city, the purpose is to optimize the service organization process, it means to minimize the total cost respecting all the constraints.

The company owns a finite number of vehicles, such as busses, trams..., stored in one or more depots. It is necessary to allocate an unknown number of them to run trips around the streets of the city by carrying passengers from one stop to the other. The trips must be organized such that all stops must be served, and the cost necessary to providing the service is minimized. This cost may include different factors: cost related to the vehicle, such as fuel consumption, engine wear, and all the variables connected to the machine; another part of the cost is related to the street such as tolls.

4.1 The Multicommodity Flow formulation

This situation can be mathematically represented using an oriented graph G = (N, A)where the nodes i ∈ N = {0, 1, 2, . . . , n} are the stops and the arcs (i, j) ∈ A are the routes between the stops. A node, labeled by 0 ∈ N, is fixed at the beginning of the problem, and each bus is required to start and end its trip at the node 0. We call such node the depot.

The company owns K buses stored at the depot 0. For an arc (i, j) ∈ A and a vehicle k ∈ {1, . . . , K}, we let xk

ij be a decisional variable which is equal to 1 or 0

depending on whether vehicle k passes through the arc (i, j) or not.

For each node i ∈ N we let dibe its demand, which can be seen as the net number

of passengers wanting to use the stop i; we have by convention di > 0 if there are

more passengers wanting to get on rather than off a bus at stop i. Each bus has capacity u > 0. Let cij ∈ R+ be the cost of passing through the arc (i, j) ∈ A.

Then the problem reads

min K X k=1 X (i,j)∈A cijxkij (4.1.1) 27

(28)

Subject to K X k=1 n X j=1 xk_ij = 1, for i = 1, . . . , n (4.1.2) K X k=1 n X i=1 xk_ij = 1, for j = 1, . . . , n (4.1.3) C := K X k=1 n X j=1 xk_0j = K X k=1 n X i=1 xk_i06 K (4.1.4) n X i=1 n X j=1 dixkij 6 u, for k = 1, . . . , C (4.1.5) X i∈Q X j∈Q xk_ij 6 |Q| − 1, for all Q ⊆ {1, . . . , n}, |Q| > 2, k = 1, . . . , K (4.1.6) K X k=1 xk_ij 6 1, for (i, j) ∈ A (4.1.7) xk_ij ∈ {0, 1}. (4.1.8)

Constraints (2.2) − (2.3) ensure that each node has exactly one entering and one exiting bus. Constraints (2.4) − (2.5) tell that all buses should depart from and end their trip at node 0 (the depot). Constraint (2.6) assures that the capacity of each bus k is never exceeded by the demand met in its trip. Constraint (2.7) ensures that the solution will not contain any cycle outside the depot or, which is the same, that all tours are connected to the depot. Finally, constraint (2.8) tells that at most one bus is allowed to pass through arc (i, j) ∈ A.

The formulation of the VRP made in this way is called a Multicommodity Flow (MF); historically, this has been the first formulation of problems of this type. Al-though such formulation has the advantage of being very simple and prone to an easy implementation on a computer, a direct approach to such problem using LP methods is not in general desirable. Indeed, the number of constraints in (2.6) is 2n− n₂ − n − 1_{, which is exponential in n. Because of this we prefer the Set} Par-titioning formulation, which also has the theoretical advantage of allowing a more flexible definition of the costs of routes and of being algebraically simpler.

4.2 The Set Partitioning formulation

We denote by P the set of all feasible paths of the problem; explicitly, an element of P will be a route which starts and finishes at the depot, visits at most once each stop, respects the capacity constraints and leaves each stop it visits.

We can describe any path as a vector

(29)

4.2. THE SET PARTITIONING FORMULATION 29 where bijpis a decisional variable which is 1 if and only if arc (i, j) is crossed by path

p. Only the paths which satisfy the conditions stated above are elements of the set P. In particular, we define

aip=

X

(i,j)∈A

bijp, i = 1, . . . , n

which is a binary variable, with aip= 1 if and only if node i is visited by path p.

For each p ∈ P we set a binary variable θp to be 1 if and only if path p is taken

by one of the buses, and we let cp> 0 be the cost of path p. Then the problem reads

min X p∈P cpθp (4.2.1) Subject to X p∈P aipθp = 1, for i = 1, . . . , n (4.2.2) X p∈P θp 6 K (4.2.3) θp ∈ {0, 1}, for p ∈ P. (4.2.4)

Constraint (2.10) assures that each node is visited by exactly one route, while constraint (2.11) tells that at most K buses enter the network.

Of course, we could define cp in an arc-oriented way, by taking it to be equal to

the sum of all costs of the arcs which p is made of: cp=

X

(i,j)∈p

cij.

This would then match the definitions made in the Multicommodity Flow formula-tion; also, this is what we will consider in our implementation. However, we mention here that more general definitions of cp can be made; for example, we can assign

a higher cost to an arc when it is visited after some other given arc, so that more flexible time constraints can be implemented. This observation makes the Set Parti-tioning formulation more versatile than the MF one; we mention here that modifying in this way the calculation of path costs would amount to only tiny modifications on the algorithm we will illustrate in the following section.

Notice that the number of variables involved in the Set Partitioning formulation is much bigger than the one in the MFP formulation. As a payoff, the constraints are considerably simpler, and the powerful method of column generation can be used to solve the problem formulated in this way. In this setting it consists in finding the solution of the linear relaxation (P) and of its dual (D):

min {X p∈P θp, X p∈P aipθp ≥ 1, θp ≥ 0, i = 1, . . . , r}, (P )

(30)

max{ r X i=1 πi, 1 − r X i=1 aiππi ≥ 0, π ∈ P }. (D)

and to try to transform it into an (integer) solution of the original problem. Letting (θ∗_{, π}∗₎ _{respectively the optimal solutions of (P}

B) and (DB), we define

the reduced cost

cp := 1 − n

X

i=1

aipπ∗i

of each path p ∈ P . Then the solution θ∗ _{is feasible for the original problem if and}

only if all reduced costs are non-negative. Therefore, the Price Problem consists in finding the minimum of the reduced costs cp over all p ∈ P : if the solution is

negative, then the corresponding column must be added to the Master Problem and the procedure must be repeated.

This is a form of the Shortest Path Problem, which is typically solved via a Dijkstra algorithm.

Eventually, when a positive minimal reduced cost is achieved, we will have the optimal solution of the linear relaxation (P ), therefore there is no guarantee such a solution will be integral. In order to extract the optimal integral solution, we have to continue in our Branch&Price algorithm as it was illustrated for the Bin Packing Problem. Typically, from the optimal solution found we fix heuristically some values of the variables, and then “branch” on the search tree to find the other variables. Each time, a new problem is defined which must be solved from the beginning, and the entire process is repeated until all variables are found.

4.3 Solving the price problem: the Dijkstra algorithm

One of the most used and long-known algorithms to solve the price problem is the Dijkstra algorithm. The algorithm consists of placing at each node a set of labels, which represent all paths checked so far to reach the node. In this way, we can easily create all feasible paths by passing from one node to the other, each time checking the data included in the relative labels. We describe this algorithm in the case of interest.

We let S = (N, A) be the graph of our problem, which we assume to be non-oriented. For each i ∈ N we let di ≥ 0 be the demand of the node, for each arc

(i, j) ∈ A we let cij ∈ R be the cost relative to the arc (for the sake of simplicity

of notations, we write cij instead of cij, but it is crucial to recall that the values we

use in the price problem will be the reduced costs, which can be negative). We call 0 ∈ N the distiniguished node representing the depot, and we label the other nodes by the integers from 1 to n, so that |N| = n + 1. For each k ∈ N, we define the set

(31)

4.3. SOLVING THE PRICE PROBLEM: THE DIJKSTRA ALGORITHM 31 of cumulative demands Dk =    l X j=1 dij : i1= 0, il= k, (i1, . . . , il) is a feasible path    ⊆ R, a cumulative cost function fk: Dk→ R such that

fn(d) = min    l−1 X j=1 cij,ij+1 : l X j=1 dij = d   

and a predecessor function pk: Dk→ {−1, . . . , n} such that

p0(0) = −1, pk(d) = il−1 where fk(d) = l−1

X

j=1

cij,ij+1.

Notice that this is not a good definition of the predecessor function, since there could be many tours arriving at node k with cumulative demand d. This will be a serious issue when enhancing the algorithm with the “suggested arcs” conditions described in the next subsection; at this point though, this is not a big deal, since two tours arriving at k with the same cumulative demand and cost, can be considered totally equivalent. Thus, the value of pk(d)in an algorithm will depend on its actual

computer implementation, but all such implementations will give the same correct results.

If fk(d1) ≤ fk(d2), we say that the path used to reach the node k with cumulative

demand d1is shorter than the respective one for cumulative demand d2. Thus, pk(d)

is the node we we have left when entering k by the shortest possible path which gives cumulative demand d at node k. The initial definition p0(0) = −1 just labels the

start of the search.

The sets Dk are of course enormous, however the idea of the algorithm is to

recursivel construct only a small subset of each of them, by ruling out paths whose cumulative demand or cumulative cost is “too big” to be useful. Therefore, in the computer implementation, we define Dk as dynamic arrays which are updated, be

adding and deleting elements, at each node checked. We will give more details about the implementation in Section 5, here we will just describe the algorithm in mathematical terms.

A key object in this algorithm is the priority queue Q, which is a set of pairs of the type (i, d) with i ∈ N and d ≥ 0, which determines the order we should visit the nodes and when to stop the algorithm. This is also a dynamic object, which is updated many times along the algorithm.

1. Start the algorithm at node 0 ∈ N with D0 = {0}, f0(0) = 0, Q = {(0, 0)},

and Di = ∅for all i 6= 0.

(32)

queue such that

d = min{e : (j, e) ∈ Qfor some j ∈ N},

and if there are more such elements (ij, d), choose the one with minimal fij(d).

If still there are more such nodes, we adopt a strategy of “first-in-first-out”: the less recent datum added to Q has priority. Store the values i and d, and delete the element (i, d) from Q. If i = 0 and f0(d) < 0, then go to point 4;

otherwise if i = 0 and d 6= 0, then repeat point 2. Otherwise pass to point 3, whose parameters will be i and d found at this point.

3. For all j such that (i, j) ∈ A, check the followning conditions: If d + dj > C, skip node j.

If ∃e ∈ Dj : e ≤ d + dj and fj(e) ≤ fi(d) + cij, skip node j.

If ∃e ∈ Dj : e ≥ d + dj and fj(e) ≥ fi(d) + cij, delete e from Dj.

Here by “skip node j” we simply mean that node j should not be ckecked at this stage of the algorithm. For all j not skipped, add d + dj to Dj and define

fj(d + dj) = fi(d) + cij, pj(d + dj) = i.

Finally, add the pair (j, d + dj) to Q and go to point 2.

4. Let l0 := min f0 =: f0(d0) and construct the tour T = (0, i1, . . . , ir, 0) such

that i1= p0(d0), and for all j = 2, . . . , r,

ij = pij−1(d0− di1− . . . − dij−1) 6= 0.

5. Store the tour T and return L = Pr

j=0cij,ij+1 (where we write i0 = ir+1 = 0).

Notice that the way we defined dynamically the Di’s assure that the tour T at

point 4 is well defined. Moreover, since points 2 and 3 form a loop, we notice that the algorithm gets to point 4 if and only if either at point 2 we are checking i = 0 with f0(d) < 0, or Q = ∅. In the former case, we have found a negative cost tour,

which is then stored at point 5 and its cost L is returned; in the latter case, we can deduce that there is no negative cost tour, thus the tour which is generated at point 4 is the global optimum of the price problem. If this is the case, the returned cost L will be positive and the price problem is solved.

4.4 The “Suggested Arcs” algorithm for the VRP

Similarly to what has been done with the BPP, we proppose here a primal sta-bilization for the VRP. This algorithm has been suggested by MAIOR, which is successfully using it in real-world situations. Just as in the BPP case, the idea is to

(33)

4.4. THE “SUGGESTED ARCS” ALGORITHM FOR THE VRP 33 identify, at each call of the pricer, a set of “good” arcs, the suggested arcs, and to impose the pricer to create tours which has at most a fixed amount (the tolerance) of non-suggested arcs. Indeed, in this case as well it is experimentally proved that just imposing that all arcs of each tour should be suggested (a method called tour aggregation) brings to low-quality solutions at the end of the suggested arcs routine; thus, the possibility of setting a positive tolerance is of key importance to achieve a decent improving.

We can still recognize here a cardinality-like way to implement the constraint, as in (2.3.1). Moreover, in this case also, the arcs will be suggested through the use of the dual information, i.e. via their reduced costs.

First of all, we need a rule to suggest the arcs. In this problem, following MAIOR’s idea, we fix a parameter m at the beginning of the problem, which is the number of suggested arcs exiting every node. For every node i ∈ N, we arrange the reduced costs of each arc (i, j) ∈ A in non-decreasing order:

ci,j1 ≤ ci,j2 ≤ . . . ≤ ci,jr.

In the case ci,jl = ci,jl+1 for some l, we assume jl < jl+1 to make this a total order.

We then state that the arcs ci,j1, . . . , ci,jm are suggested, while the others are not.

In the case m ≥ r, we state that all arcs exiting node i are suggested. For an arc (i, j) ∈ A, we define li,j = 0 if the arc is suggested, and li,j = 1 if it is not. Let

τ ∈ Z≥0 be the tolerance.

For any path (i1, . . . , ir), we define its level to be

l(i1, . . . , ir) = ]{non-suggested arcs in {(ij, ij+1)}r−1j=1} = r−1

X

j=1

lij,ij+1.

To modify correctly the Dijkstra algorithm in this case, we now have to deal with the issue of the ill definition of the predecessor function. Indeed, assuming we have two tours having the same cumulative demand and cost, nothing assures us that they also have the same level. Since the new condition will impose the pricer to only generate tours up to certain level, we cannot consider the two tours as equivalent. To implement this idea, we update the definition of the data Dk attached to the node:

now they will be subsets of R+× Z≥0 defined as

Dk=      r X j=1 dij, l(i1, . . . , ir)  : i1= 0, il= k, (i1, . . . , il) is a feasible path   

so that the level of a path is stored in the second component of (d, l) ∈ Dk. We

modify the cumulative cost function fk: Dk → R accordingly

fk(d, l) = min    r−1 X j=1 cij,ij+1 | r X j=1 dij = dand l(i1, . . . , ir) = l    .

(34)

In this way, we can define the predecessor function pk : Dk→ {−1, . . . , n} such that p0(0, 0) = −1, pk(d, l) = ir−1 where fk(d, l) = r−1 X j=1 cij,ij+1.

Of course, if two paths have same cumulative demand, cost and level, they will be considered equivalent, so as far as this algorithm is concerned, we can use any definition of the predecessor function which meets the conditions written above.

Reflecting the new definition of the data sets Dk, the new priority queue Q is a

set of triples (i, d, l) with i ∈ N, d ≥ 0 and l ∈ Z≥0.

1. Start the algorithm at node 0 ∈ N with D0 = {(0, 0)}, f0(0, 0) = 0, Q =

{(0, 0, 0)}, and Di = ∅ for all i 6= 0.

2. If Q = ∅, go to point 4. Otherwise, let (i, d, l) ∈ Q be the element of the priority queue such that

d = min{e : (j, e, l) ∈ Qfor some j ∈ N and l ∈ Z≥0};

if there are more such elements (ij, d, lj), choose the one with minimal fij(d, lj).

If still there are more such nodes, choose the one with lowest level l, and if there are still more we adopt a strategy of “first-in-first-out”: the less recent datum added to Q has priority. Store the values i, d and l, and delete the element (i, d, l)from Q. If i = 0 and f0(d, l) < 0, then go to point 4; otherwise if i = 0

and d 6= 0, then repeat point 2. Otherwise pass to point 3, whose parameters will be i, d and l found at this point.

3. For all j such that (i, j) ∈ A, check the followning conditions: If d + dj > C or l + li,j > τ, skip node j.

If ∃(e, k) ∈ Dj : e ≤ d + dj, fj(e, k) ≤ fi(d, l) + cij and k ≤ l, skip node j.

If ∃(e, k) ∈ Dj : e ≥ d+dj, fj(e, k) ≥ fi(d, l)+cij and k ≥ l, delete (e, k) from Dj.

Here by “skip node j” we simply mean that node j should not be ckecked at this stage of the algorithm. For all j not skipped, add (d + dj, l + li,j) to Dj.

Then define

fj(d + dj, l + li,j) = fi(d, l) + cij, pj(d + dj, l + li,j) = i.

Finally, add the pair (j, d + dj, l + li,j) to Q and go to point 2.

4. Let min f0 =: f0(d0, l0) and construct the tour T = (0, ir, . . . , i1, 0) such that

i1 = p0(d0, l), and for all j = 2, . . . , r,

(35)

4.4. THE “SUGGESTED ARCS” ALGORITHM FOR THE VRP 35 5. Store the tour T and return L = Pr

j=0cij,ij+1 (where we write i0 = ir+1 = 0).

A similar argument as in the classical case shows that this algorithm does ter-minate, either finding a negative-cost tour, or by finding the shortest tour T such that its level is l(T ) ≤ τ. In the former case, the column corresponding to T must be added to the master problem. In the latter case, however, we are not sure to ac-tually have found the optimal tour in the graph, which might have level larger than τ. Thus, analogously to what has been done in the BPP, we must run the classical algorithm until no more columns can be added. Notice again that, if we chose the number of suggested arcs m and the tolerance τ in a good way, a large number of columns will have already been generated at the first run of the classical algorithm, thus we can expect that relatively few columns will be missing at that point. All this should be considered when setting the parameters to be captured in the numerical experiments.

(36)

(37)

Chapter 5

The SCIP Optimization Suite

As already mentioned, we are using the software SCIP to implement and solve the two problems. The approaches for the BPP and VRP are different in the way SCIP is called. In particular, the latter is a straightforward console program, while the former opens the SCIP interface in the shell, and includes the relevant plugins at the moment the problem is read.

SCIP has many features which make it a valid tool for Operational Research, but we will mostly use it as a framework for Branch-and-Price algorithms and Column Generation.

5.1 The core structure of SCIP

The core structure of any SCIP program is the definition and solution of the problem. This means that the following tasks must be done, in order. We will not give too many details about the syntax and parameters of the functions mentioned here, and refer to the SCIP Doxygen documentation or to the comments written in the source code provided.

• Create the SCIP pointer via SCIPcreate. • Create the problem via SCIPcreateProb.

• Create the variables via SCIPcreateVar and add them to the problem via SCIPaddVar.

• Create the constraints via SCIPcreateCons and add them to the problem via SCIPPaddCons.

• Solve the problem via SCIPsolve.

• Release variables and constraints via SCIPreleaseVar and SCIPreleaseCons. • Deinitialize SCIP via SCIPfree.

(38)

We will refer to the previous steps as the core structure of SCIP.

Any other procedure or algorithm to solve a problem must be inserted into this basic framework. The way to do this in practice is to write functions which must be passed in SCIP via SCIPinclude. The precise type of the code to include depends on the specific tasks we want to execute. The functions included are called plugins, and the sentence we will take as a mantra is “in SCIP, everything which is not the core structure is a plugin” (since this is freely taken from [24], I will refer to it as Schwarz mantra). Actually, even if we want to just solve an LP problem using only the SCIP core structures, we cannot avoid to at least include the default plugins via the command SCIPincludeDefaultPlugins.

Thus, in order to create more refined programs which implement our algorithms, all we have to know is how to define plugins, how to smoothly include them in our core structure, and how to pass the relevant parameters to them.

SCIP itself is written in the C programming language, although it comes with wrapper classes which allow to program with it using C++. In our case, since we have wanted to modify the already available code for the BPP and the VRP, we have chosen to write the former in C and the latter in C++.

5.2 The SCIP shell

When SCIP is built and compiled, a binary file is created. When such file is executed in command line, the SCIP shell is opened up. This can be seen as a prompt in which some commands can be called: the list of such commands can be printed via the command help. Arguably the most important ones are read and optimize.

When the read command is called, a file must be prompted by the user, together with one of the supported extensions. This is an important point: when we say “supported extensions” we are hiding the fact that some plugins have been defined in order to deal with those files. Although at this point no user-defined plugin is included, we should keep in mind that all the necessary plugins have been included by the distributed version of the SCIP Optimization Suite, so that Schwarz mantra is respected.

After that, a file with a supported extension is read and the optimize command is prompted, SCIP uses the plugins corresponding to the given extension to solve the problem and eventually do all the side tasks.

When writing a program and user-defined plugins, we can open up a SCIP shell via SCIPprocessShellArguments(scip). Prior to this function, we include the plu-gins defined by us via SCIPinclude (see the following paragraphs for further details), so that the read and optimize commands will do what we expect.

5.3 SCIP as an LP solver

The SCIP Optimization Suite as of version 6.0.1 comes with the LP solver SoPlex, which stands for “Sequential object-oriented simPlex”. This is an optimization

Primal stabilization in column generation

Università di Pisa

Primal Stabilization

in Column Generation

Introduction

Acknowledgements

Contents

Chapter 1

Column Generation,

Dantzig-Wolfe Reformulation,

Lagrangian Dual

1.1

Column Generation

1.2

The Dantzig-Wolfe Reformulation

1.3

Lagrangian relaxation

1.4

Branch & Price

Chapter 2

Stabilization

2.1

Dual stabilization

2.2

Primal stabilization

2.3

The suggested elements approach

Chapter 3

The Bin Packing Problem

3.1

The Kantorovich formulation

3.2

The Gilmore-Gomory formulation

3.3

The Price Problem

3.4

The “Suggested Objects” algorithm

Chapter 4

The Vehicle Routing Problem

4.1

The Multicommodity Flow formulation

4.2

The Set Partitioning formulation

4.3

Solving the price problem: the Dijkstra algorithm

4.4

The “Suggested Arcs” algorithm for the VRP

Chapter 5

The SCIP Optimization Suite

5.1

The core structure of SCIP

5.2

The SCIP shell

5.3

SCIP as an LP solver