4 Physical Optimization

Uwe Oelfke, Simeon Nill, Jan J. Wilkens
Contents

4.1 Introduction
  4.1.1 What Is an ‘Optimal’ Treatment Plan?
  4.1.2 Physical Parameters and Indicators
  4.1.3 Clinical Compromises and Steerability
4.2 The “Standard Model” of IMRT Optimization
  4.2.1 What is Optimized?
  4.2.2 Dose Constraints and Quality Indicators
    Dose and DVH Constraints
    Dose Volume Constraints
    Quality Indicators for Physical Constraints
    Objective Functions
    Technical Criteria – Plan Degeneracy
  4.2.3 Optimization Algorithms
    Deterministic Approaches
    Stochastic Methods
    Summary
4.3 Optimization of Other Degrees of Freedom
  4.3.1 Optimization of Beam Directions
    Techniques for the Optimization of Beam Directions
  4.3.2 Aperture-based Optimization
    Contour-based Optimization
    Direct Aperture Optimization
References
4.1 Introduction
One of the two prerequisites for the clinical application of IMRT was the development of inverse planning strategies – simply because the available forward planning strategies could not be applied to the optimization of the enormous number of treatment parameters suddenly required for the efficient delivery of intensity modulated treatment fields. The concept of ‘physical optimization’ was the first strategy implemented in commercial inverse planning systems and is still the ‘workhorse’ of most available planning platforms. Even the modifications of the original concept often referred to as ‘biological optimization’, described below, basically keep the same logical structure of the optimization; only the mathematical formulation of the objectives of the optimization is modified. One common factor of both approaches is the selection of the energy fluence profiles for a pre-selected set of beam ports as the only treatment parameters to be optimized in the planning process. We consider the respective physical optimization as the current ‘standard model’ of IMRT optimization and review some of its detailed features later. The more recent extensions of this concept that attempt to include further physical degrees of freedom in the optimization process are described as well. The reader should be aware that the following brief discussion cannot aim to reflect all aspects of the physical optimization approach for IMRT. Far more complete reviews of inverse planning and IMRT optimization and their details can be found in the papers of T. Bortfeld [1, 2] and S. Webb [3, 4].
Before we start to review the methods mentioned above, we want to discuss briefly a few of the uncertainties or ‘weak’ conceptual points of current treatment plan optimization strategies, most of which can be traced back to our quite limited knowledge of the correlation between delivered dose patterns and their induced clinical impact. First, there arises the obvious difficulty of deciding what should be considered the optimal clinical treatment plan. IMRT allows for an enormous variety of achievable dose patterns whose merits have to be derived from a few more or less clearly defined clinical endpoints. This fact requires a reduction of the complete 3D dose distribution to a few ‘global’ indicators which are assumed to represent the quality of the related treatment plan. This aspect is addressed shortly. Finally, the lack of detailed knowledge of the correlation between dose and clinical response allows for, and indeed demands, a certain ‘subjectivity’ about what is the best clinical compromise achievable with IMRT, i.e., each treatment plan optimization has to introduce a certain ‘steerability’ for the physician to achieve his preferred plan. New strategies related to approaches like multi-criteria optimization [5–7] are briefly mentioned later.
4.1.1 What Is an ‘Optimal’ Treatment Plan?
The task of IMRT optimization, or more precisely the task to generate an optimal clinical treatment plan for an IMRT treatment of a specific patient, is a conceptually and to some extent also mathematically challenging problem. Ideally, the goal of the optimization effort should be uniquely defined and be quite clear before the development of respective optimization strategies.
Unfortunately, the definition of ‘clinically optimal’ is by no means clear at the detailed level of dose painting achievable with IMRT. As soon as one goes beyond the paradigm of 3D-conformal therapy – maximum dose concentration within the tumor combined with minimal doses in organs at risk – one has to face far more difficult decisions about the optimal clinical plan. Subtle decisions on clinical compromises between tumor dose distributions and doses in several partially irradiated organs at risk require additional knowledge about the importance of dose homogeneity in the target, dose volume effects in organs at risk, the radio-sensitivity of patient specific tissues and many more so called ‘biological’ parameters. The current lack of solid ‘a priori’ knowledge of the respective therapy relevant parameters is one of the intrinsic problems of IMRT optimization.
One key issue is the correlation of current clinical experience with the set of physical treatment parameters responsible for the observed clinical effect. Considering that the number of clinically relevant indicators for the assessment of success, failure or specific risks of a treatment is usually fairly small compared to the number of employed treatment parameters, this process naturally involves quite a reduction of the available physical information. For instance, sets of complete 3D-dose distributions have to be reduced to a few, mostly organ averaged, quantities which serve as indicators for the quality of an achieved treatment plan. One of the advantages of ‘physical optimization’ concepts is that they are based on quality indicators derived completely from verifiable physical quantities like dose levels and irradiated volumes, while the ‘biological approach’ assumes additional knowledge – mostly in terms of phenomenological parameters – to characterize an observed clinical effect. Both strategies are successfully implemented in clinical practice. While the ‘biological optimization’ approach can be considered as an extension of the purely ‘physical concept’, both of them naturally suffer from the fact that our detailed knowledge about the response of complexly organized tissues to radiation fields with complicated variations in space and time is quite limited.
4.1.2 Physical Parameters and Indicators
Let us now consider the basic physical parameters – quantities which can be determined by a well-defined measurement without any additional assumptions – and discuss their role in the optimization process. The primary physical quantity available for characterizing a radiation treatment is the three-dimensional dose distribution for a patient anatomy specified by its electron densities obtained from a CT scan. This simple physical representation of the anatomy is further reduced by specifying volumes of interest (VOI), whose definition, especially for tumor targets, is of crucial importance for the optimization. Currently employed routine diagnostic procedures usually cannot provide any further spatial discrimination of the VOIs with respect to their functionality or radio-sensitivity.
This lack of information seemed to justify the well-accepted reduction of complete 3D-dose distributions within VOIs to the respective dose volume histograms (DVHs), which form the basis of all currently employed optimization strategies. The loss of intrinsic spatial dose information induces a certain amount of ‘blindness’ in the optimization procedure, which might well be clinically relevant. As an example, consider the analysis of a correlation between minimal dose values in a target and the related tumor control. Not knowing whether a peripheral region of the target VOI, usually carrying a higher probability of not being tumor tissue, or an actual central tumor voxel receives that minimal dose almost prevents any meaningful analysis of this type. Furthermore, our current fixation on optimization procedures based on global DVHs may be seriously questioned in the future, as the increased application of biological imaging modalities in radiation oncology [8] may reveal the importance of a spatially correlated ‘fine structure’ within the traditional VOIs.
Global organ DVHs are the essential quantities from which all quality indicators of treatment plans are derived. They therefore form the basis of current IMRT optimization approaches employed for inverse planning. The definition of quality indicators constitutes a further reduction of the information about the initially available dose distributions. Physical indicators are simply determined in terms of doses and irradiated volumes, e.g., minimal and maximal dose, global dose averages like mean or median dose, or selected coordinate pairs of dose and irradiated volume to restrict the shape of a specific DVH. The definition of ‘biological’ indicators additionally requires further parameters, e.g., the assumed radiosensitivity of a tissue, which is necessary for the definition of the equivalent uniform dose (EUD) [9]. Common to both sets of indicators is their ‘global’ definition for an entire VOI, i.e., they do not allow the control of any local dose features within a certain region of interest.
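To make the role of such a ‘biological’ parameter concrete, the following is a minimal sketch of the generalized EUD in the phenomenological form popularized by Niemierko, assuming equal voxel volumes and a 1D array of voxel doses; the function name geud and the toy dose values are illustrative only, not a prescription taken from [9].

```python
import numpy as np

def geud(dose_voxels: np.ndarray, a: float) -> float:
    """Generalized equivalent uniform dose (gEUD), assuming equal voxel volumes.

    a > 1 emphasizes hot spots (serial organs at risk),
    a < 1 emphasizes cold spots (targets), a = 1 gives the mean dose.
    """
    return float(np.mean(dose_voxels ** a) ** (1.0 / a))

# Example: an OAR receiving a heterogeneous dose
doses = np.array([10.0, 20.0, 45.0, 60.0])  # Gy
print(geud(doses, a=8.0))   # approaches the maximum dose for large a
print(geud(doses, a=1.0))   # equals the mean dose (33.75 Gy)
```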
Finally, we would just like to note that one specific physical parameter has so far not been explicitly considered in IMRT optimization strategies: time. Neither global dose rate effects nor the influence of different fractionation schemes seemed to be a first order effect relevant for the optimization of IMRT. In comparison to the effects of spatial dose inhomogeneities, specifically addressed in almost every optimization strategy, temporal dose inhomogeneities are always assumed either to be non-existent or of minor relevance. Now that time adapted radiotherapy based on new linac-integrated imaging technologies might actually provide us with combined spatial and temporal dose information, the issue of time dependent effects may need to be addressed explicitly in the future.
4.1.3 Clinical Compromises and Steerability
After a set of global quality indicators for the various tissues is identified, there remains the open question of how these should be combined for the evaluation of a treatment plan. Moreover, the planner or physician usually needs a ‘steering wheel’ for the optimization process, which allows him to put more emphasis on one or several quality indicators for selected tissues of interest. This practical procedure of finding the best clinical compromise is one of the most difficult planning tasks, and currently available planning systems address this problem only insufficiently. There are basically three intrinsic problems of the current planning approach.
First, as discussed earlier, current optimization procedures rely on global dose indicators, which do not allow the control of any local dose feature within a volume of interest, i.e., the planner is currently forced to introduce ‘artificial’ VOIs in order to manipulate a specific local feature of the plan. Second, and more profound, the planning system does not provide any information about the physical limits which can be reached for a selected set of treatment parameters. Third, and probably of most practical consequence, is the fact that the planning system does not have ‘prior’ information about ‘efficient compromises’ between conflicting goals of the optimization. For instance, information like “how much more would the average dose in organ ‘y’ increase if the dose homogeneity in the target is improved by 5%?” can only be obtained by a lengthy manual trial and error planning process.
At least the last two problems mentioned above could be adequately addressed by new planning strategies based on multi-criteria optimization methods. A key feature of this approach is that the optimization process is separated into two steps. First, new mathematical strategies are employed to create an entire set of clinically relevant treatment plans for which the ‘efficient compromises’ between different conflicting objectives are well known. Second, a new navigation tool allows an easy and efficient search for the optimal treatment plan preferred by the individual planner. More details and first applications of these new planning strategies can be found in [5–7, 10].
4.2 The “Standard Model” of IMRT Optimization
While the potential benefit of intensity modulated treatment fields was first demonstrated by mathematical inversion of the relationship between energy fluence and its resulting dose patterns for idealized geometries [11], current practical approaches are all based on an iterative optimization scheme as schematically displayed in Fig. 1. The starting point of the optimization is a selected set of variable treatment parameters x whose values have to be adjusted to their optimal setting x_opt. First, the 3D-dose distribution D is calculated for the starting, non-optimal values of x. Next, this complete 3D-dose pattern is reduced to a single number via the objective function OF(x). The value of OF(x) represents the quality of the current plan and therefore allows a ranking of different plans, i.e., the optimization of the treatment plan corresponds in mathematical terms to a search for the minimal (in most cases preferred) or maximal value of OF. This is achieved with the help of the optimization algorithm, which calculates an updated set of x-values, labeled as x′ in Fig. 1, for the next iteration of the optimization process. Often, the iteration of this ‘optimization loop’ is stopped when a certain threshold value for the relative change of OF(x) between two subsequent iterations is no longer exceeded. The parameter set x found at that time is considered the result of the optimization process and is used for the final assessment of the plan quality.
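As an illustration of the loop in Fig. 1, the following is a minimal Python sketch of the iteration with the relative-change stopping rule described above; the names dose_of, objective and update are hypothetical placeholders for the dose engine, the objective function and the optimization algorithm.

```python
import numpy as np

def optimize(x0, dose_of, objective, update, tol=1e-4, max_iter=1000):
    """Schematic IMRT optimization loop (cf. Fig. 1).

    dose_of(x)   -> dose distribution for treatment parameters x
    objective(d) -> scalar plan quality OF to be minimized
    update(x, d) -> new parameter set x' proposed by the optimizer
    Stops when the relative change of OF between iterations falls below tol.
    """
    x = np.asarray(x0, dtype=float)
    of_old = objective(dose_of(x))
    for _ in range(max_iter):
        x = update(x, dose_of(x))              # optimizer suggests x'
        of_new = objective(dose_of(x))         # rank the new plan
        if abs(of_old - of_new) <= tol * max(abs(of_old), 1e-12):
            break                              # convergence: OF change below threshold
        of_old = of_new
    return x                                   # x_opt, used for final plan assessment
```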
In the following, we will describe in detail the main components of the optimization loop employed for the standard IMRT optimization that uses only physical quantities for the specification of the plan quality.
Fig. 1. The optimization loop for iterative IMRT optimization. From a set of initial treatment parameters x (usually the intensity amplitudes) a 3D-dose distribution is derived. Then, the respective treatment plan is evaluated and ranked by the objective function, which is based on current clinical experience. If the quality of the treatment plan is considered to be sufficient, the current value x is chosen as the optimum x_opt. If no convergence towards the optimal value is detected, new fluence amplitudes x′ are suggested by the optimization algorithm and a new iteration of the optimization is initiated.
4.2.1 What is Optimized?
In the standard approach of IMRT plan optimization [1, 12, 13] the 2D-fluence profiles for all beam ports are chosen as the only free treatment parameters. Other obvious choices, like the number of beams, the beam entry angles, the beam energy or even the radiation type, are preselected by the planner and remain fixed during the optimization process. The subdivision of each treatment field into small independent spatial sub-units, labeled by numbers from 1 to N_b and usually referred to as ‘bixels’, allows one to define a set of related fluence amplitudes x_i, i = 1...N_b, as optimization parameters. The number of these parameters naturally depends on the size of the lesion being irradiated and the choice of the spatial resolution of the fluence maps. In most cases the spatial extension of the bixel is chosen to correlate closely with the leaf width of the multi-leaf collimator (MLC) to be used for the actual treatment, e.g., for a standard MLC with 10 mm leaf width (projected at the plane of the isocenter) a bixel size of 10 mm × 10 mm is employed.
The use of finer bixel resolutions seems to be advantageous for the irradiation of complex shaped smaller lesions with micro- or mini-MLCs [14, 62]. The respective gain in spatial precision results in an increased number of treatment parameters and most likely also in extended overall treatment times. It has been shown that bixel resolutions in the range of 2–5 mm square can be beneficial for selected clinical cases, while a further increase of the bixel resolution fails to yield further improvements [63].
The choice of fluence amplitudes as the only optimization parameters facilitates the solution of the optimization problem considerably if a simple mathematical formulation of the optimization task in terms of physical parameters is chosen. One obvious reason is the linear relationship between fluence and dose. How this optimization problem is set up and which methods can be used for its solution will be described in the following paragraphs of this section. Approaches which extend the standard optimization to treatment parameters other than fluence amplitudes are discussed separately below.
The concept described so far aims to optimize the so called ‘ideal’ intensity maps, i.e., the optimization process is based on the chosen ‘bixel’ configuration without considering the next practical step for the delivery of IMRT treatment fields: the conversion of the ideal intensity map into patterns of leaf sequences. Naturally, this step involves a re-grouping of the elementary bixels into larger, practically deliverable treatment fields, which can only approximate the ideal intensity profiles and therefore usually leads to a reduced quality of the treatment plan. The size of these effects was demonstrated for instance in [15], and there are several approaches to include these effects in the original optimization scheme [15–17]. Another attempt to overcome the ‘sequencer problem’ in IMRT optimization is the method of direct aperture optimization, also discussed below.
4.2.2 Dose Constraints and Quality Indicators
Dose and DVH Constraints
The specification of dose constraints for the tissues of interest is the first step in the mathematical formulation of the optimization problem. Ideally, the constraints should be derived directly from clinical experience, i.e., they should rely on a direct correlation between clinical observation and characteristic dose values.
For targets, usually global thresholds for the tolerable minimal and maximal doses (D_min and D_max) are set for the entire volume, i.e., for a tumor VOI whose voxels are labeled with an index i ranging from 1...N_T, the respective doses D_i should all satisfy the constraints D_i > D_min and D_i < D_max. The doses D_min and D_max are usually chosen close to the prescribed dose D_pres, and the respective dose window allows for some flexibility if conflicting goals in organs at risk have to be satisfied simultaneously. The values of D_min and D_max in this approach represent the ‘clinical’ experience, i.e., they characterize the biological dose response although they are purely physical quantities. If both tolerance dose values are chosen close to each other and if both dose constraints are enforced with high priority, the use of D_min and D_max should also lead to sufficient dose homogeneity. However, dose homogeneity in the target can also be addressed explicitly by imposing a constraint on the dose variance for target structures.
Dose Volume Constraints
For the remaining k = 1...N_k organs at risk (OARs), composed of N_k voxels, the natural extension of the dose constraints described above is to employ a maximal tolerance dose D_max^k for all voxels. For OARs, however, one does not only want to include global constraints for the entire organ. Observed or predicted dose volume effects for certain OARs can be accounted for by imposing one or several maximal dose limits D_max^{k,l} which may be exceeded in at most a fraction v_l of the organ’s volume [18, 20]. These dose-volume histogram constraints, i.e., DVH(D_max^{k,l}) < v_l, can be visualized directly as shown in Fig. 2, where the two points (D_max^{k,l}; v_l), l = 1, 2, in the dose-volume plane should both lie above the actual DVH curve. The respective points identify ‘forbidden’ zones in the DVHs as indicated.
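A DVH constraint of this type is easy to evaluate for a given plan. The following is a minimal sketch, assuming equal voxel volumes so that the irradiated volume fraction is simply a fraction of voxels; the function names and toy dose values are illustrative.

```python
import numpy as np

def dvh_fraction_above(dose_voxels: np.ndarray, d_max: float) -> float:
    """Volume fraction receiving more than d_max (equal voxel volumes assumed)."""
    return float(np.mean(dose_voxels > d_max))

def violates_dvh_constraint(dose_voxels: np.ndarray, d_max: float, v_max: float) -> bool:
    """True if DVH(d_max) >= v_max, i.e., the DVH crosses the 'forbidden zone'."""
    return dvh_fraction_above(dose_voxels, d_max) >= v_max

oar_dose = np.random.default_rng(0).uniform(0, 60, size=10_000)  # toy OAR doses in Gy
print(violates_dvh_constraint(oar_dose, d_max=40.0, v_max=0.25))
```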
Quality Indicators for Physical Constraints
Next, the previously defined clinical dose constraints are employed to define a mathematical measure which indicates, for a given treatment plan, the degree to which each constraint is fulfilled. Usually, this is done by assigning a numerical value to a specific violation of the given constraint. These quality indicators refer to one individual constraint and tissue, e.g., the standard measure is the sum of the quadratic dose deviations found for all voxels of the considered organ.

Fig. 2. Dose volume histogram constraints. Two typical DVH constraints are indicated by the two data points (D_max^{k,1}, v_1) and (D_max^{k,2}, v_2), i.e., no volume greater than v_1 (v_2) should be irradiated with a dose higher than D_max^{k,1} (D_max^{k,2}). As a consequence, the DVH constraints mark the indicated regions in the volume-dose plane as ‘forbidden zones’, i.e., if a DVH crosses these areas the DVH constraint is violated.
In mathematical terms, the related function OF_T^(−) for the avoidance of an underdosage of the target takes the form

$$\mathrm{OF}_T^{(-)}(x) = \frac{1}{N_T} \sum_{i=1}^{N_T} \Big[ C_+\big( D_T^{\min} - D_i^T(x) \big) \Big]^2 . \tag{1}$$
The analogous term for the avoidance of global overdosage effects for either target or OARs reads

$$\mathrm{OF}_k^{(+)}(x) = \frac{1}{N_k} \sum_{i=1}^{N_k} \Big[ C_+\big( D_i^k(x) - D_k^{\max} \big) \Big]^2 . \tag{2}$$
The operator C_+(x), defined by C_+(x) = x for x ≥ 0 and C_+(x) = 0 for x < 0, ensures that only constraint violations contribute to the quality indicators OF^(+) and OF^(−). This most prominent type of quality indicator in objective functions – the measure of squared deviations from given dose constraints – introduces new parameters into the optimization problem, which are not based on clinical experience but which instead are required for the mathematical formulation of the optimization problem. The ‘square’ terms were mostly chosen because the resulting full objective functions are still mathematically convex [4], which allows the employment of very fast gradient techniques for the solution of the optimization problem (for more details see the section on optimization algorithms below).
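Eqs. (1) and (2) translate directly into code. A minimal numpy sketch, assuming the voxel doses of each VOI are collected in 1D arrays; np.maximum(·, 0) plays the role of the operator C_+.

```python
import numpy as np

def of_target_under(dose_t: np.ndarray, d_min: float) -> float:
    """Eq. (1): mean squared underdosage of the target voxels."""
    # np.maximum(., 0) implements the operator C+: only violations contribute
    return float(np.mean(np.maximum(d_min - dose_t, 0.0) ** 2))

def of_overdose(dose_k: np.ndarray, d_max: float) -> float:
    """Eq. (2): mean squared overdosage of a target or organ at risk."""
    return float(np.mean(np.maximum(dose_k - d_max, 0.0) ** 2))

target_dose = np.array([58.0, 60.5, 61.0, 59.0])  # toy voxel doses in Gy
print(of_target_under(target_dose, d_min=60.0))   # penalizes only the cold voxels
```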
The definition of quality indicators for the more general DVH constraints is done similarly and can for instance be found in [19]. Unfortunately, DVH constraints lead to non-convex objective functions [21, 22]. However, it was shown [23] that the resulting local minima in the overall objective function are of minor practical relevance.
Objective Functions
For the final mathematical formulation of the optimization problem, the individual quality indicators, which each represent the merit of a particular plan for one clinical constraint, have to be combined to yield a single valued quality measure of the complete treatment plan. Naturally, the given indicators for target structures OF_T and organs at risk OF_k refer to mutually conflicting goals of the optimization, i.e., the combination of these individual constraints is crucial for the clinical compromise achievable with that particular optimization scheme. Furthermore, and maybe even more importantly, the design of the overall objective function has to introduce ‘steering parameters’ such that the planner can efficiently derive the clinically acceptable plan of his choice. Simplicity of the objective function, combined with the request that the overall objective function remains convex, leads to the well known weighted sum of individual quality indicators, i.e.,
$$\mathrm{OF}(x) = w_T^{(+)} \mathrm{OF}_T^{(+)}(x) + w_T^{(-)} \mathrm{OF}_T^{(-)}(x) + \sum_k w_k\, \mathrm{OF}_k^{(+)}(x) . \tag{3}$$
With the introduced weighting factors w for each constraint the planner can now ‘steer’ the result of the optimization towards the optimal treatment plan of his preference. As already mentioned, the parameters w unfortunately do not have any intuitive meaning, and it is unknown how sensitively the outcome of an optimization is coupled to variations of the respective weighting factors. Both features make the manual determination of these parameters a cumbersome planning task, which is one of the reasons why there is increasing interest in the already mentioned approaches of multi-criteria optimization (see above). An extension of this introduction of global penalty factors with the aim to improve local dose features of a plan was introduced in [24].
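A minimal sketch of the weighted sum of Eq. (3), with the weights as the ‘steering parameters’ adjusted by the planner; the argument names and the representation of the OARs as parallel lists are illustrative choices, not the interface of any particular planning system.

```python
import numpy as np

def of_total(dose_t, dose_oars, d_min, d_max_t, d_max_oars,
             w_t_plus, w_t_minus, w_oars):
    """Eq. (3): weighted sum of the individual quality indicators.

    dose_oars, d_max_oars and w_oars are parallel sequences over the OARs;
    the weights w are the 'steering parameters' adjusted by the planner.
    """
    under = np.mean(np.maximum(d_min - dose_t, 0.0) ** 2)      # OF_T^(-), Eq. (1)
    over = np.mean(np.maximum(dose_t - d_max_t, 0.0) ** 2)     # OF_T^(+), Eq. (2)
    of = w_t_plus * over + w_t_minus * under
    for dose_k, dk_max, wk in zip(dose_oars, d_max_oars, w_oars):
        of += wk * np.mean(np.maximum(dose_k - dk_max, 0.0) ** 2)  # OF_k^(+)
    return float(of)
```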
Technical Criteria – Plan Degeneracy
As already mentioned, the standard model of inverse planning for IMRT optimizes an ideal fluence profile, independent of the subsequent translation into the MLC sequences required by the dose delivery system. Features of this important practical step directly prior to the dose delivery can be additionally optimized without significantly reducing the plan quality. This is due to the fact that the solution of the IMRT optimization problem is not unique and that numerous sets of fluence amplitudes yield treatment plans of comparable quality. This degeneracy of the optimal fluence profiles, discussed, e.g., in [4, 25, 26], allows one to account for further technical constraints that facilitate the practical dose delivery.
One of these additional technical criteria is the ‘smoothness’ of the intensity maps. Intensity maps that avoid patterns of clinically unmotivated high fluence gradients can be delivered faster and more safely. The smoothing of these discontinuities can be achieved either by applying a median filter (cf. [27]) to the fluence profiles or by adding a respective term to the objective function [28].
Another important technical aspect is the conversion of the derived continuous fluence modulations into a spectrum of discrete fluence values; e.g., this number of intensity levels determines the number of subfields used for the step-and-shoot technique. It has been shown that for most cases a moderate number (5–7) of intensity levels is adequate [29], which leads to a number of IMRT segments on the order of 100 that can be efficiently delivered with current IMRT technology.
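As a simple illustration of this discretization step, the following sketch maps a continuous fluence map onto a small number of equidistant intensity levels; the rounding scheme is one naive possibility, not the sequencer of any particular planning system.

```python
import numpy as np

def quantize_fluence(fluence: np.ndarray, n_levels: int = 5) -> np.ndarray:
    """Map a continuous (non-negative) fluence map onto n_levels equidistant levels."""
    step = fluence.max() / n_levels          # assumes fluence.max() > 0
    return np.round(fluence / step) * step

fluence = np.random.default_rng(3).uniform(0, 1, size=(10, 10))
print(np.unique(quantize_fluence(fluence, n_levels=5)))  # at most n_levels + 1 distinct values
```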
4.2.3 Optimization Algorithms
To calculate the beam weights for a given set of constraints and a selected objective function, an optimization algorithm is required. Not all optimization algorithms can be used for all objective functions, due to the mathematical properties of the objective functions. Over the last decades numerous optimization approaches for different problems have been published and applied to problems within the field of radiation therapy [3, 30]. In general, these algorithms can be divided into two categories. First, there are the deterministic algorithms like the gradient approach. These techniques are applied to optimization problems where the objective functions are convex and therefore only a global minimum and no local minima exist [1]. For such convex objective functions, like the standard quadratic objective function, the deterministic algorithms can calculate the optimal solution very fast and are therefore currently used in most commercially available IMRT treatment planning systems [3]. Second, there are the stochastic methods, like simulated annealing or genetic algorithms. They have the advantage that even for non-convex objective functions based on biological objectives or DVH constraints the global minimum can be found even if local minima exist.
In this section we only briefly discuss the rationale of the most frequently used algorithms. First, as examples of deterministic algorithms, simple gradient methods and the conjugate gradient approach are described. Second, the basic ideas of stochastic algorithms like simulated annealing and genetic algorithms are discussed. More detailed information about optimization algorithms used in radiotherapy can for instance be found in [31].
Deterministic Approaches
Steepest Descent This method is mostly used for finding the global minimum of a convex objective function OF(x), where x represents the set of variable treatment parameters which have to be adjusted to their optimal values. The objective function can be visualized as a multi-dimensional surface given in terms of the coordinates x. For a general non-convex objective function a one-dimensional example is shown in Fig. 3. A key role in the optimization is played by the first derivative of this function or its generalization to N dimensions – the gradient ∇OF(x), which determines the steepest direction along the surface of the objective function. Finding the minimum of OF(x) via an iterative method requires that the values of the intensities x are updated at each step of the iteration i. For the gradient approaches, the update of x while advancing from iteration i to i + 1 is given by the rule
$$x(i+1) = x(i) - \alpha \cdot \nabla \mathrm{OF}\big(x(i)\big) . \tag{4}$$
This iterative search can be visualized as a ball rolling downhill into the valley along the steepest direction (see Fig. 3) until the minimum of the valley is reached. The constant factor α, often referred to as the damping factor, determines the step size of the iterative process. One problem with the steepest descent method is that many small steps in the calculated direction are performed even if the valley is of perfect quadratic form. The different gradient approaches mostly differ by the determination of α and therefore the step size. For the steepest descent method the value of α is set to a fixed value independent of the position and the iteration step.
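A minimal sketch of the update rule of Eq. (4) with a fixed damping factor α; the projection onto non-negative values is an assumption added here, reflecting that fluence amplitudes cannot be negative, and is not part of Eq. (4) itself.

```python
import numpy as np

def steepest_descent(grad_of, x0, alpha=0.01, n_iter=500):
    """Eq. (4): x(i+1) = x(i) - alpha * grad OF(x(i)), with fixed alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad_of(x)
        x = np.maximum(x, 0.0)  # keep fluence amplitudes non-negative (added assumption)
    return x

# Toy convex objective OF(x) = |x - x*|^2 with known minimum x*
x_star = np.array([1.0, 2.0, 3.0])
print(steepest_descent(lambda x: 2.0 * (x - x_star), x0=np.zeros(3)))  # approx. x_star
```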
Newton’s Method The Newton methods take into account the second order derivatives of the objective function for the determination of the damping factor, which controls the speed and success of the optimization. Employing a Taylor expansion of OF(x(i)) up to the second order derivatives, one can show that a new damping factor for each iteration step is a promising alternative choice for α.
Fig. 3. The steepest descent algorithm starts from position x_0 and always walks with a predefined step size towards the global minimum. If the algorithm started on the left side, it would be trapped in the local minimum.
For the multi-dimensional optimization problem encountered in radiation therapy (several hundred fluence amplitudes have to be optimized simultaneously) the damping factor can be expressed in terms of the inverse Hessian H⁻¹ of the second derivatives of OF(x) [32], i.e.,

$$x(i+1) = x(i) - H^{-1}\big(x(i)\big)\, \nabla\mathrm{OF}\big(x(i)\big) = x(i) - \alpha_{\mathrm{Newton}}\, \nabla\mathrm{OF}\big(x(i)\big) . \tag{5}$$
One problem within the Newton approach is that for each step the complete inverse Hessian has to be calculated, which is a complex and time-consuming process.
One possible solution to this problem is not to recalculate the inverse Hessian at each iteration step but to use approximations of the inverse Hessian [30]. If such methods are applied, the optimization algorithms are called “Quasi Newton” approaches. For example, the treatment planning system KonRad uses a “Quasi Newton” algorithm for the optimization of the fluence patterns [19]. Other possible solutions and implementations for this problem can be found in [32]. The “steepest descent” method can be viewed as a special case of the “Quasi Newton” approach.
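For illustration, the following sketch optimizes a toy convex objective with the L-BFGS-B routine from scipy, a quasi-Newton method that maintains a low-memory approximation of the inverse Hessian instead of recalculating it at every step; the dose-influence matrix A and the prescription are invented toy data, and this is not the KonRad implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy convex objective: least-squares fit of the dose A @ x to a prescription d_pres
rng = np.random.default_rng(1)
A = rng.uniform(0, 1, size=(50, 10))      # hypothetical dose-influence matrix
d_pres = np.full(50, 2.0)                 # prescribed dose per voxel

def of(x):
    r = A @ x - d_pres
    return float(r @ r)

def grad_of(x):
    return 2.0 * A.T @ (A @ x - d_pres)

# L-BFGS-B builds a low-memory approximation of the inverse Hessian
res = minimize(of, x0=np.zeros(10), jac=grad_of, method="L-BFGS-B",
               bounds=[(0.0, None)] * 10)  # non-negative fluence amplitudes
print(res.x)
```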
Conjugate Gradient Approach The problem with the “steepest descent” or Newton methods is that the directions of the moves towards the unknown minimum in two successive iterations are not conjugate to each other, i.e., the gain of approaching the minimum value of OF(x) achieved in one step of the iteration might get partially lost in the next step [32]. This is not the case for the “conjugate gradient” approach. Mathematically, there are two different methods to determine the global minimum of the objective function with this approach.
The first approach calculates the Hessian at each step of the iteration, and it can be shown that this version of the “conjugate gradient” approach finds the global minimum after N iterations, where N is the number of optimization parameters x [32]. However, since the Hessian cannot always be calculated in a reasonable amount of time, an alternative approach is used more often for applications in radiation therapy [33].
Starting from an arbitrary point x(0), the objective function is evaluated at different positions along the line through the starting point x(0) in the direction of the encountered ‘steepest descent’, h(0) = −∇OF(x(0)). This is done until the position of the minimum of OF(x) along that line is found. At the position of the line minimum x_min the gradient g(1) = −∇OF(x)|_{x_min} is calculated and used for the determination of the next direction in which the global minimum will be approached. The new direction for this ‘line minimization’ approach is given by the iterative rule

$$h(i+1) = g(i+1) + \gamma(i)\, h(i) ,$$

where the factor γ(i) can be calculated according to Fletcher–Reeves or to Polak and Ribière [32].
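A compact sketch of this line-minimization variant with the Polak–Ribière factor γ(i); clamping γ at zero (the ‘PR+’ safeguard) and using scipy’s scalar minimizer for the line search are implementation choices made here for brevity, not part of the original formulation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(of, grad_of, x0, n_iter=50):
    """Conjugate gradient with numerical line minimization (Polak-Ribiere)."""
    x = np.asarray(x0, dtype=float)
    g = -grad_of(x)          # g(0): negative gradient at the start
    h = g.copy()             # h(0): first search direction (steepest descent)
    for _ in range(n_iter):
        t = minimize_scalar(lambda t: of(x + t * h)).x  # line minimization along h
        x = x + t * h
        g_new = -grad_of(x)
        gamma = max(g_new @ (g_new - g) / max(g @ g, 1e-30), 0.0)  # Polak-Ribiere ('PR+')
        h = g_new + gamma * h    # h(i+1) = g(i+1) + gamma(i) h(i)
        g = g_new
    return x

# Toy quadratic objective with its minimum at the origin
A = np.array([[3.0, 1.0], [1.0, 2.0]])
of = lambda x: 0.5 * x @ A @ x
grad_of = lambda x: A @ x
print(conjugate_gradient(of, grad_of, x0=np.array([1.0, 1.0])))  # approx. [0, 0]
```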
A detailed mathematical description and instructions on how the conjugate gradient approach is implemented effectively can for instance be found in [32]. One of the main problems with conjugate gradient algorithms can be the speed of the line minimization. An example of a successful application of the conjugate gradient method in inverse IMRT planning is the HELIOS (Varian) IMRT treatment planning system.
One potential concern with deterministic algorithms is that the iterative process may get trapped in a local minimum (see Fig. 3) such that the desired global minimum is never discovered. Local minima can be encountered, for example, if DVH constraints are added to the objective function. Several investigations by different groups have come to somewhat different conclusions about this problem. Deasy [21] argues that the IMRT algorithm should include methods to deal with the problem of local minima, whereas others like Wu and Mohan [23] or Llacer et al. [22] have shown that even in the presence of local minima acceptable solutions for a given IMRT problem can be found. A study concerning the speed and the convergence properties of different gradient algorithms used for the optimization of IMRT can be found in [21].
Stochastic Methods
Stochastic optimization algorithms offer the advantage that they can find the optimal treatment parameters even for complex objective functions with potential local minima. The price to pay for this nice feature is a significantly increased optimization time in comparison to the discussed deterministic algorithms. Historically, the method of simulated annealing was one of the first algorithms introduced into radiation therapy planning, by S. Webb [34]. More recently, even more complex optimization engines based on ‘genetic algorithms’ have been employed for treatment plan optimization. In the following we introduce the basics of these concepts without referring to mathematical details, which can be found in the given references.
Simulated Annealing There are basically two strategies by which the method of ‘simulated annealing’ escapes from the trap of local minima – ‘climbing uphill’ and ‘tunneling’. Both methods were illustrated by Webb [35] with a nice example.
Imagine a walker who is instructed to find a well in a hilly landscape. The well is assumed to be at the lowest point of the landscape and therefore coincides with the ‘global minimum’. Since the walker has no a priori knowledge of where to go, he starts his task by walking downhill, since he is aware that the mountains are higher than the well. Using the potential energy V as the objective function, it is clear that V_Well < V_Hill. His task is therefore to minimize |V − V_Well|. Consequently, he walks in the direction of the steepest descent until he encounters a valley. Unfortunately, the walker can see only his nearest surroundings due to some fog, and therefore he does not know whether the valley is the local or the global minimum. The only way to find out is to walk ‘uphill’ for some time and to further explore the whole landscape. In mathematical terms, the ‘simulated annealing’ algorithm provides some probability of searching in the ‘uphill direction’, so that the search for a global minimum continues even if a local one is encountered.
Alternatively, the walker could have enlarged his step size so enormously that he leaves the valley in a single step. This process is equivalent to a “tunneling” through the walls of that valley (see Fig. 4).
Mathematically, both described strategies involve the sampling of distributions. First, the step size Δx(i) after i iterations (i ≥ 1) is randomly chosen from a displacement distribution D(Δx(i)). The width of this distribution decreases dynamically, so that smaller steps are preferred as one approaches the optimal solution. The sampling of D(Δx(i)) allows the inclusion of the ‘tunneling’ strategy. Next, the decision whether that iteration step was a good move toward the optimal solution has to be made. This is done by random sampling of a probability distribution P(i). If the difference of the objective function ΔOF(i) = OF(x(i) + Δx(i)) − OF(x(i)) is negative, the new set of treatment parameters is accepted. However, in contrast to the gradient methods, the new position is also accepted with a probability P(i) if the difference is positive. The probability distribution P(i) for all displacement distributions is given as

$$P(i) = \exp\left( -\frac{\Delta \mathrm{OF}(i)}{k_B T(i)} \right) . \tag{6}$$
With the temperature T(i), the width of P(i) is dynamically adjusted to smaller values during the optimization process. How this ‘cooling’ of the ‘uphill’ climbing is best done also depends on the complexity of the objective function. Different combinations of P(i) and D(Δx(i)) define the various types of simulated annealing algorithms. The most prominent ones are discussed below.
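A minimal sketch combining both ingredients – random displacements and the acceptance probability of Eq. (6), with k_B absorbed into the temperature; the Gaussian proposal and the logarithmic cooling schedule are simple choices for illustration, not those of any particular planning system.

```python
import numpy as np

def simulated_annealing(of, x0, t0=1.0, n_iter=5000, seed=0):
    """Simulated annealing: random steps, Eq. (6) acceptance, slow cooling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best_x, best_of = x.copy(), of(x)
    for i in range(1, n_iter + 1):
        t = t0 / np.log(i + 1)                            # T(i): logarithmic cooling
        dx = rng.normal(scale=np.sqrt(t), size=x.shape)   # displacement from D(dx)
        d_of = of(x + dx) - of(x)
        # downhill moves are always accepted; uphill moves with probability P(i)
        if d_of < 0 or rng.random() < np.exp(-d_of / t):
            x = x + dx
            if of(x) < best_of:
                best_x, best_of = x.copy(), of(x)
    return best_x

# A non-convex 1D objective with a local minimum next to the global one
of = lambda x: float(x[0] ** 4 - 3 * x[0] ** 2 + x[0])
print(simulated_annealing(of, x0=np.array([2.0])))   # close to the global minimum
```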
Fig. 4. Simulated annealing: to avoid the local minimum the algorithm can either tunnel through (black arrow) or climb over (gray arrow) the barrier.
The concept of using a temperature to describe the simulated annealing process is taken from the area of solid state physics. If, for example, a metal is heated until a phase transition from solid to liquid occurs, the molecules move randomly. If the system is then slowly cooled down, a regular crystal structure is formed [36]. This procedure ensures that the internal energy of the solid is minimized.
Boltzmann Annealing
For the classic simulated annealing process the starting temperature T(0) is very large, which leads to a higher probability of accepting uphill steps. During the iteration process the temperature is reduced, and T(i) must satisfy the condition T(i) ≥ T(0)/log(i) [37]. For Boltzmann annealing the step size Δx(i) is sampled from a Gaussian distribution:

$$D_{\mathrm{Boltzmann}}\big(\Delta x(i)\big) = \big(2\pi T(i)\big)^{-N/2} \exp\left( -\frac{\Delta x(i)^2}{k_B T(i)} \right) . \tag{7}$$
Fast Simulated Annealing
Another method is the “fast simulated annealing” algorithm, which has been frequently used for the optimization of IMRT treatment plans [38, 39]. For “fast simulated annealing” the temperature change is given as T(i) = T(0)/i, and the displacement vectors are sampled from a Cauchy distribution [40]:

$$D_{\mathrm{Fast}}\big(\Delta x(i)\big) = \frac{T(i)}{\big( \Delta x(i)^2 + T(i)^2 \big)^{(N+1)/2}} . \tag{8}$$

Compared to the classical “Boltzmann annealing”, the temperature is changed faster for the “fast simulated annealing”, and the displacement vectors are chosen from a wider distribution. Fast simulated annealing is used, for example, in the Corvus treatment planning system [38].
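For illustration, displacement vectors for fast simulated annealing can be drawn component-wise from a Cauchy distribution scaled by the temperature T(i) = T(0)/i; note that independent component-wise sampling is only a simple approximation of the multivariate distribution of Eq. (8), whose heavy tails occasionally produce the large ‘tunneling’ steps.

```python
import numpy as np

def fast_sa_step(rng, t, n):
    """Cauchy-distributed displacement of width t (component-wise approximation of Eq. (8))."""
    return t * rng.standard_cauchy(size=n)

rng = np.random.default_rng(2)
t0, i = 10.0, 25
print(fast_sa_step(rng, t0 / i, n=5))  # T(i) = T(0)/i cooling
```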
Genetic Algorithms Genetic algorithms (see [64, 65]) have also been used recently for the optimization of IMRT plans based on highly complex objective functions [41, 42]. They emulate basic principles found in evolutionary biological systems to identify the ‘most likely survivor’ in a given pool of potential solutions. This is done by applying genetic principles such as inheritance, mutation, recombination and natural selection. Genetic algorithms are normally used if the search space is complex and large and traditional hill climbing algorithms like simulated annealing are likely to encounter difficulties.
In the following we will use a very simple example to illustrate a few of the basic features of genetic algorithms. Let us consider the problem of finding the minimum of the one-dimensional objective function OF(x) = x² with the help of a genetic algorithm, which can for example be divided into the following processes: Encoding, Evaluation, Crossover, Mutation and Decoding.
Encoding and Decoding
First, the original optimization problem has to be formulated in such a way that the processes responsible for the genetic evolution can be mathematically simulated. One of the most critical tasks is to find a good representation of the solution space, a step which is often referred to as ‘Encoding’. An inappropriate encoding could lead to a very slow, practically useless optimization algorithm. For our example, one possible solution is to use a binary representation of the values of x. Due to a priori knowledge we can for instance limit the solution space to the range 0–15 and therefore use a four-bit representation. Different candidates for a solution, e.g., A (x = 2) and B (x = 13), are then encoded as A = {0010} and B = {1101}. These strings can be compared to genetic strings where the bits (0/1) are the genes. Decoding simply denotes the process of translating the solutions back into the original representation.
Next, one has to create a “first generation pool” of solutions. These solutions are created either with the help of prior knowledge or by assuming a random distribution over a likely range of parameters. In the case of IMRT, a single solution is described by the weights of the beamlets used for the optimization or by MLC shapes.
Evaluation
The next step is to evaluate the current generation pool and rank the solutions according to their fitness with the help of the objective function. In our example, solution A has a higher rank than solution B, since we are searching for the global minimum of OF(x). This ranking determines a normalized probability distribution which specifies the likelihood that a solution will be part of the next generation pool of solutions. At this stage of the optimization process one also decides whether the best solution of the considered pool is of sufficient quality such that the optimization can be stopped. New solutions not available in the current genetic pool can be created by processes like ‘crossover’ and ‘mutation’.
Crossover
Using the probability distribution calculated from the evaluation of the gene pool, two arbitrary solutions are selected and a ‘crossover’ process is performed at an arbitrary position within the ‘chromosome’ of these solutions. For our example, a crossover of solutions A and B at position 2 would lead to two new solutions, C = {0001} (corresponding to x = 1) and D = {1110} (corresponding to x = 14), which are then added to the second generation pool. This process is repeated until a sufficient number of solutions is available in the second generation pool. Since the crossover process is done preferably between two high-rank solutions, the application of crossover processes alone bears the risk of being trapped within a local minimum. This can be avoided by additionally using the ‘mutation process’ as a second method of creating new solutions in the next genetic pool.
Mutation
The mutation process offers an analogue to the ‘hill climbing’ strategy applied in the ‘simulated annealing’ algorithm. The mutation procedure is done by changing one bit at a random position within a considered solution. For example, a mutation of C could lead to {1001} (corresponding to x = 9) if the first bit is changed. After the application of a well defined number of mutations and crossover processes, the newly created pool of solutions is evaluated again. When the desired convergence of the algorithm is reached, the final solution is decoded back into the space of treatment parameters.
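The toy problem described above – minimizing OF(x) = x² over a four-bit encoding – fits into a short script. The following is a minimal sketch; the pool size, the preference for high-rank parents and the fixed number of generations are illustrative choices, and a real IMRT implementation would encode beamlet weights or MLC shapes instead.

```python
import random

random.seed(0)
N_BITS, POOL = 4, 8

def decode(bits):            # four-bit string -> integer in 0..15
    return int("".join(map(str, bits)), 2)

def fitness(bits):           # higher is better; we minimize OF(x) = x^2
    return -decode(bits) ** 2

def crossover(a, b, pos):    # swap the tails of two 'chromosomes' at pos
    return a[:pos] + b[pos:], b[:pos] + a[pos:]

def mutate(bits):            # flip one bit at a random position
    i = random.randrange(N_BITS)
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

# first generation pool: random four-bit solutions
pool = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POOL)]
for generation in range(20):
    pool.sort(key=fitness, reverse=True)       # evaluation and ranking
    next_pool = pool[:2]                       # keep the two best (selection)
    while len(next_pool) < POOL:
        a, b = random.sample(pool[:4], 2)      # prefer high-rank parents
        c, d = crossover(a, b, random.randrange(1, N_BITS))
        next_pool += [mutate(c), mutate(d)]    # mutation avoids local minima
    pool = next_pool[:POOL]
print(decode(max(pool, key=fitness)))          # best solution found (0 is the global minimum)
```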
Summary
The advantage of deterministic approaches like steepest descent, Quasi Newton or the conjugate gradient approach over the stochastic approaches is the optimization speed. For a highly complex IMRT case with over 1,000 degrees of freedom, a typical optimization time for one solution is about 1 min on a standard PC, whereas the times for stochastic optimization algorithms are significantly higher. If complex non-convex objective functions with pronounced local minima are employed, there is no alternative to stochastic algorithms like simulated annealing or genetic algorithms.
There are other optimization algorithms applied to IMRT optimization that were not considered in this section. For example, one other promising optimization method is linear programming. Especially for multi-criteria optimization problems this technique seems to be advantageous [31]. A comprehensive review of optimization techniques applied in radiotherapy can be found in [43].
4.3 Optimization of Other Degrees of Freedom
The previous sections dealt mainly with the “standard approach” of IMRT, i.e., with the optimization of intensity maps for predefined beam configurations. This means that the planner specifies the general irradiation conditions (e.g., to use photons at 6 MV with five given beam directions using this specific hardware, etc.), and all that is left to the planning software is to compute “optimal” intensity maps for every beam. However, there are many other degrees of freedom that could in principle be included in the optimization process in order to improve the treatment plan. Generally speaking, every parameter that can modify the energy fluence in the patient is a potential candidate for optimization. In addition to the optimization of intensity maps, the following degrees of freedom could be exploited: the number of beams (or arcs) and their directions of incidence, usually given by gantry and couch angles; the