4 Physical Optimization

Uwe Oelfke, Simeon Nill, Jan J. Wilkens
Contents

4.1 Introduction
  4.1.1 What Is an ‘Optimal’ Treatment Plan?
  4.1.2 Physical Parameters and Indicators
  4.1.3 Clinical Compromises and Steerability
4.2 The “Standard Model” of IMRT Optimization
  4.2.1 What is Optimized?
  4.2.2 Dose Constraints and Quality Indicators
    Dose and DVH Constraints
    Dose Volume Constraints
    Quality Indicators for Physical Constraints
    Objective Functions
    Technical Criteria – Plan Degeneracy
  4.2.3 Optimization Algorithms
    Deterministic Approaches
    Stochastic Methods
    Summary
4.3 Optimization of Other Degrees of Freedom
  4.3.1 Optimization of Beam Directions
    Techniques for the Optimization of Beam Directions
  4.3.2 Aperture-based Optimization
    Contour-based Optimization
    Direct Aperture Optimization
References
4.1 Introduction
One of the two prerequisites for the clinical application of IMRT was the development of inverse planning strategies – simply because the available forward planning strategies could not be applied to the optimization of the enormous number of treatment parameters suddenly required for the efficient delivery of intensity modulated treatment fields. The concept of ‘physical optimization’ was the first strategy implemented in commercial inverse planning systems and is still the ‘workhorse’ of most available planning platforms. Even the modifications of the original concept often referred to as ‘biological optimization’, described below, basically keep the same logical structure of the optimization; only the mathematical formulation of the objectives of the optimization is modified. One common factor of both approaches is the selection of the energy fluence profiles for a pre-selected set of beam ports as the only treatment parameters to be optimized in the planning process. We consider the respective physical optimization as the current ‘standard model’ of IMRT optimization and review some of its detailed features later. The more recent extensions of this concept that attempt to include further physical degrees of freedom in the optimization process are described as well. The reader should be aware that the following brief discussion cannot aim to reflect all aspects of the physical optimization approach for IMRT. Far more complete reviews of inverse planning and IMRT optimization and their details can be found in the papers of T. Bortfeld [1, 2] and S. Webb [3, 4].
Before we start to review the methods mentioned above, we want to discuss briefly a few of the uncertainties or ‘weak’ conceptual points of current treatment plan optimization strategies, most of which can be traced back to our quite limited knowledge of the correlation between delivered dose patterns and their induced clinical impact. First, there arises the obvious difficulty of deciding what should be considered the optimal clinical treatment plan. IMRT allows for an enormous variety of achievable dose patterns whose merits have to be derived from a few more or less clearly defined clinical endpoints. This fact requires a reduction of the complete 3D dose distribution to a few ‘global’ indicators which are assumed to represent the quality of the related treatment plan. This aspect is addressed shortly. Finally, the lack of detailed knowledge of the correlation between dose and clinical response allows for, and indeed demands, a certain ‘subjectivity’ about what is the best clinical compromise achievable with IMRT, i.e., each treatment plan optimization has to introduce a certain ‘steerability’ for the physician to achieve his preferred plan. New strategies related to approaches like multi-criteria optimization [5–7] are briefly mentioned later.
4.1.1 What Is an ‘Optimal’ Treatment Plan?
The task of IMRT optimization, or more precisely the task to generate an optimal clinical treatment plan for an IMRT treatment of a specific patient, is a conceptually and to some extent also mathematically challenging problem. Ideally, the goal of the optimization effort should be uniquely defined and be quite clear before the development of respective optimization strategies.
Unfortunately, the definition of ‘clinically optimal’ is by no means clear at the detailed level of dose painting achievable with IMRT. As soon as one goes beyond the paradigm of 3D-conformal therapy – maximum dose concentration within the tumor combined with minimal doses in organs at risk – one has to face far more difficult decisions about the optimal clinical plan. Subtle decisions on clinical compromises between tumor dose distributions and doses in several partially irradiated organs at risk require additional knowledge about the importance of dose homogeneity in the target, dose volume effects in organs at risk, the radio-sensitivity of patient specific tissues and many more so called ‘biological’ parameters. The current lack of solid ‘a priori’ knowledge of the respective therapy relevant parameters is one of the intrinsic problems of IMRT optimization.
One key issue is the correlation of current clinical experience with the set of physical treatment parameters responsible for the observed clinical effect. Considering that the number of clinically relevant indicators for the assessment of success, failure or specific risks of a treatment is usually fairly small compared to the number of employed treatment parameters, this process naturally involves quite a reduction of the available physical information. For instance, sets of complete 3D-dose distributions have to be reduced to a few, mostly organ averaged, quantities which serve as indicators for the quality of an achieved treatment plan. One of the advantages of ‘physical optimization’ concepts is that they are based on quality indicators derived completely from verifiable physical quantities like dose levels and irradiated volumes, while the ‘biological approach’ assumes additional knowledge – mostly in terms of phenomenological parameters – to characterize an observed clinical effect. Both strategies are successfully implemented in clinical practice. While the ‘biological optimization’ approach can be considered as an extension of the purely ‘physical concept’, both of them naturally suffer from the fact that our detailed knowledge about the response of complexly organized tissues to radiation fields with complicated variations in space and time is quite limited.
4.1.2 Physical Parameters and Indicators
Let us now consider the basic physical parameters – quantities which can be determined by a well-defined measurement without any additional assumptions – and discuss their role in the optimization process. The primary physical quantity available for characterizing a radiation treatment is the three-dimensional dose distribution for a patient anatomy specified by its electron densities obtained from a CT scan. This simple physical representation of the anatomy is further reduced by specifying volumes of interest (VOI), whose definition, especially for tumor targets, is of crucial importance for the optimization. Currently employed routine diagnostic procedures usually cannot provide any further spatial discrimination of the VOIs with respect to their functionality or radio-sensitivity.
This lack of information seemed to justify the well-accepted reduction of complete 3D-dose distributions within VOIs to the respective dose volume histograms (DVHs), which form the basis of all currently employed optimization strategies. The loss of intrinsic spatial dose information induces a certain amount of ‘blindness’ in the optimization procedure, which might well be clinically relevant. As an example, consider the analysis of a correlation between minimal dose values in a target and the related tumor control. Not knowing whether a peripheral region of the target VOI, usually carrying a higher probability of not being tumor tissue, or an actual central tumor voxel receives that minimal dose almost prevents any meaningful analysis of this type. Furthermore, our current fixation on optimization procedures based on global DVHs may be seriously questioned in the future, as the increased application of biological imaging modalities in radiation oncology [8] may reveal the importance of a spatially correlated ‘fine structure’ within the traditional VOIs.
Global organ DVHs are the essential quantities from which all quality indicators of treatment plans are derived. They therefore form the basis of current IMRT optimization approaches employed for inverse planning. The definition of quality indicators constitutes a further reduction of the information about the initially available dose distributions. Physical indicators are simply determined in terms of doses and irradiated volumes, e.g., minimal and maximal dose, global dose averages like mean or median dose, or selected coordinate pairs of dose and irradiated volume to restrict the shape of a specific DVH. The definition of ‘biological’ indicators additionally requires further parameters, e.g., the assumed radiosensitivity of a tissue, which is necessary for the definition of the equivalent uniform dose (EUD) [9]. Common to both sets of indicators is their ‘global’ definition for an entire VOI, i.e., they do not allow the control of any local dose features within a certain region of interest.
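To make the role of such a ‘biological’ parameter concrete, the following is a minimal sketch of the generalized EUD in the phenomenological form popularized by Niemierko, assuming equal voxel volumes and a 1D array of voxel doses; the function name geud and the toy dose values are illustrative only, not a prescription taken from [9].

```python
import numpy as np

def geud(dose_voxels: np.ndarray, a: float) -> float:
    """Generalized equivalent uniform dose (gEUD), assuming equal voxel volumes.

    a > 1 emphasizes hot spots (serial organs at risk),
    a < 1 emphasizes cold spots (targets), a = 1 gives the mean dose.
    """
    return float(np.mean(dose_voxels ** a) ** (1.0 / a))

# Example: an OAR receiving a heterogeneous dose
doses = np.array([10.0, 20.0, 45.0, 60.0])  # Gy
print(geud(doses, a=8.0))   # approaches the maximum dose for large a
print(geud(doses, a=1.0))   # equals the mean dose (33.75 Gy)
```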
Finally, we would just like to note that one specific physical parameter has so far not been explicitly considered in IMRT optimization strategies: time. Neither global dose rate effects nor the influence of different fractionation schemes seemed to be a first order effect relevant for the optimization of IMRT. In comparison to the effects of spatial dose inhomogeneities, specifically addressed in almost every optimization strategy, temporal dose inhomogeneities are always assumed either to be non-existent or of minor relevance. Now that time adapted radiotherapy based on new linac-integrated imaging technologies might actually provide us with combined spatial and temporal dose information, the issue of time dependent effects may need to be addressed explicitly in the future.
4.1.3 Clinical Compromises and Steerability
After a set of global quality indicators for the various tissues is identified, there remains the open question of how these should be combined for the evaluation of a treatment plan. Moreover, the planner or physician usually needs a ‘steering wheel’ for the optimization process, which allows him to put more emphasis on one or several quality indicators for selected tissues of interest. This practical procedure of finding the best clinical compromise is one of the most difficult planning tasks, and currently available planning systems address this problem only insufficiently. There are basically three intrinsic problems of the current planning approach.
First, as discussed earlier, current optimization procedures rely on global dose indicators, which do not allow the control of any local dose feature within a volume of interest, i.e., the planner is currently forced to introduce ‘artificial’ VOIs in order to manipulate a specific local feature of the plan. Second, and more profound, the planning system does not provide any information about the physical limits which can be reached for a selected set of treatment parameters. Third, and probably of most practical consequence, is the fact that the planning system does not have ‘prior’ information about ‘efficient compromises’ between conflicting goals of the optimization. For instance, information like “how much more would the average dose in organ ‘y’ increase if the dose homogeneity in the target is improved by 5%?” can only be obtained by a lengthy manual trial and error planning process.
At least the last two problems mentioned above could be adequately addressed by new planning strategies based on multi-criteria optimization methods. A key feature of this approach is that the optimization process is separated into two steps. First, new mathematical strategies are employed to create an entire set of clinically relevant treatment plans for which the ‘efficient compromises’ between different conflicting objectives are well known. Second, a new navigation tool allows an easy and efficient search for the optimal treatment plan preferred by the individual planner. More details and first applications of these new planning strategies can be found in [5–7, 10].
4.2 The “Standard Model” of IMRT Optimization
While the potential benefit of intensity modulated treatment fields was first demonstrated by mathematical inversion of the relationship between energy fluence and its resulting dose patterns for idealized geometries [11], current practical approaches are all based on an iterative optimization scheme as schematically displayed in Fig. 1. The starting point of the optimization is a selected set of variable treatment parameters x whose values have to be adjusted to their optimal setting x_opt. First, the 3D-dose distribution D is calculated for the starting, non-optimal values of x. Next, this complete 3D-dose pattern is reduced to a single number via the objective function OF(x). The value of OF(x) represents the quality of the current plan and therefore allows a ranking of different plans, i.e., the optimization of the treatment plan corresponds in mathematical terms to a search for the minimal (in most cases preferred) or maximal value of OF. This is achieved with the help of the optimization algorithm, which calculates an updated set of x-values, labeled as x′ in Fig. 1, for the next iteration of the optimization process. Often, the iteration of this ‘optimization loop’ is stopped when a certain threshold value for the relative change of OF(x) between two subsequent iterations is no longer exceeded. The parameter set x found at that time is considered the result of the optimization process and is used for the final assessment of the plan quality.
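As an illustration of the loop in Fig. 1, the following is a minimal Python sketch of the iteration with the relative-change stopping rule described above; the names dose_of, objective and update are hypothetical placeholders for the dose engine, the objective function and the optimization algorithm.

```python
import numpy as np

def optimize(x0, dose_of, objective, update, tol=1e-4, max_iter=1000):
    """Schematic IMRT optimization loop (cf. Fig. 1).

    dose_of(x)   -> dose distribution for treatment parameters x
    objective(d) -> scalar plan quality OF to be minimized
    update(x, d) -> new parameter set x' proposed by the optimizer
    Stops when the relative change of OF between iterations falls below tol.
    """
    x = np.asarray(x0, dtype=float)
    of_old = objective(dose_of(x))
    for _ in range(max_iter):
        x = update(x, dose_of(x))              # optimizer suggests x'
        of_new = objective(dose_of(x))         # rank the new plan
        if abs(of_old - of_new) <= tol * max(abs(of_old), 1e-12):
            break                              # convergence: OF change below threshold
        of_old = of_new
    return x                                   # x_opt, used for final plan assessment
```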
In the following, we will describe in detail the main components of the optimization loop employed for the standard IMRT optimization that uses only physical quantities for the specification of the plan quality.
Fig. 1. The optimization loop for iterative IMRT optimization. From a set of initial treatment parameters x (usually the intensity amplitudes) a 3D-dose distribution is derived. Then, the respective treatment plan is evaluated and ranked by the objective function, which is based on current clinical experience. If the quality of the treatment plan is considered to be sufficient, the current value x is chosen as the optimum x_opt. If no convergence towards the optimal value is detected, new fluence amplitudes x′ are suggested by the optimization algorithm and a new iteration of the optimization is initiated.
4.2.1 What is Optimized?
In the standard approach of IMRT plan optimization [1, 12, 13] the 2D-fluence profiles for all beam ports are chosen as the only free treatment parameters. Other obvious choices, like the number of beams, the beam entry angles, the beam energy or even the radiation type, are preselected by the planner and remain fixed during the optimization process. The subdivision of each treatment field into small independent spatial sub-units, labeled by numbers from 1 to N_b and usually referred to as ‘bixels’, allows one to define a set of related fluence amplitudes x_i, i = 1...N_b, as optimization parameters. The number of these parameters naturally depends on the size of the lesion being irradiated and the choice of the spatial resolution of the fluence maps. In most cases the spatial extension of the bixel is chosen to correlate closely with the leaf width of the multi-leaf collimator (MLC) to be used for the actual treatment, e.g., for a standard MLC with 10 mm leaf width (projected at the plane of the isocenter) a bixel size of 10 mm × 10 mm is employed.
The use of finer bixel resolutions seems to be advantageous for the irradiation of complex shaped smaller lesions with micro- or mini-MLCs [14, 62]. The respective gain in spatial precision results in an increased number of treatment parameters and most likely also in extended overall treatment times. It has been shown that bixel resolutions in the range of 2–5 mm square can be beneficial for selected clinical cases, while a further increase of the bixel resolution fails to yield further improvements [63].
The choice of fluence amplitudes as the only optimization parameters facilitates the solution of the optimization problem considerably if a simple mathematical formulation of the optimization task in terms of physical parameters is chosen. One obvious reason is the linear relationship between fluence and dose. How this optimization problem is set up and which methods can be used for its solution will be described in the following paragraphs of this section. Approaches which extend the standard optimization to treatment parameters other than fluence amplitudes are discussed separately below.
The concept described so far aims to optimize the so called ‘ideal’ intensity maps, i.e., the optimization process is based on the chosen ‘bixel’ configuration without considering the next practical step for the delivery of IMRT treatment fields: the conversion of the ideal intensity map into patterns of leaf sequences. Naturally, this step involves a re-grouping of the elementary bixels into larger, practically deliverable treatment fields, which can only approximate the ideal intensity profiles and therefore usually leads to a reduced quality of the treatment plan. The size of these effects was demonstrated for instance in [15], and there are several approaches to include these effects in the original optimization scheme [15–17]. Another attempt to overcome the ‘sequencer problem’ in IMRT optimization is the method of direct aperture optimization, also discussed below.
4.2.2 Dose Constraints and Quality Indicators
Dose and DVH Constraints
The specification of dose constraints for the tissues of interest is the first step in the mathematical formulation of the optimization problem. Ideally, the constraints should be derived directly from clinical experience, i.e., they should rely on a direct correlation between clinical observation and characteristic dose values.
For targets, usually global thresholds for the tolerable minimal and maximal doses (D_min and D_max) are set for the entire volume, i.e., for a tumor VOI whose voxels are labeled with an index i ranging from 1...N_T, the respective doses D_i should all satisfy the constraints D_i > D_min and D_i < D_max. The doses D_min and D_max are usually chosen close to the prescribed dose D_pres, and the respective dose window allows for some flexibility if conflicting goals in organs at risk have to be satisfied simultaneously. The values of D_min and D_max in this approach represent the ‘clinical’ experience, i.e., they characterize the biological dose response although they are purely physical quantities. If both tolerance dose values are chosen close to each other and if both dose constraints are enforced with high priority, the use of D_min and D_max should also lead to sufficient dose homogeneity. However, dose homogeneity in the target can also be addressed explicitly by imposing a constraint on the dose variance for target structures.
Dose Volume Constraints
For the remaining k = 1...N_k organs at risk (OARs), composed of N_k voxels, the natural extension of the dose constraints described above is to employ a maximal tolerance dose D_max^k for all voxels. For OARs, however, one does not only want to include global constraints for the entire organ. Observed or predicted dose volume effects for certain OARs can be accounted for by imposing one or several maximal dose limits D_max^{k,l} which may be exceeded in at most a fraction v_l of the organ’s volume [18, 20]. These dose-volume histogram constraints, i.e., DVH(D_max^{k,l}) < v_l, can be visualized directly as shown in Fig. 2, where the two points (D_max^{k,l}; v_l), l = 1, 2, in the dose-volume plane should both lie above the actual DVH curve. The respective points identify ‘forbidden’ zones in the DVHs as indicated.
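A DVH constraint of this type is easy to evaluate for a given plan. The following is a minimal sketch, assuming equal voxel volumes so that the irradiated volume fraction is simply a fraction of voxels; the function names and toy dose values are illustrative.

```python
import numpy as np

def dvh_fraction_above(dose_voxels: np.ndarray, d_max: float) -> float:
    """Volume fraction receiving more than d_max (equal voxel volumes assumed)."""
    return float(np.mean(dose_voxels > d_max))

def violates_dvh_constraint(dose_voxels: np.ndarray, d_max: float, v_max: float) -> bool:
    """True if DVH(d_max) >= v_max, i.e., the DVH crosses the 'forbidden zone'."""
    return dvh_fraction_above(dose_voxels, d_max) >= v_max

oar_dose = np.random.default_rng(0).uniform(0, 60, size=10_000)  # toy OAR doses in Gy
print(violates_dvh_constraint(oar_dose, d_max=40.0, v_max=0.25))
```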
Quality Indicators for Physical Constraints
Next, the previously defined clinical dose constraints are employed to define a mathematical measure which indicates, for a given treatment plan, the degree to which each constraint is fulfilled. Usually, this is done by assigning a numerical value to a specific violation of the given constraint. These quality indicators refer to one individual constraint and tissue, e.g., the standard measure is the sum of the quadratic dose deviations found for all voxels of the considered organ.

Fig. 2. Dose volume histogram constraints. Two typical DVH constraints are indicated by the two data points (D_max^{k,1}, v_1) and (D_max^{k,2}, v_2), i.e., no volume greater than v_1 (v_2) should be irradiated with a dose higher than D_max^{k,1} (D_max^{k,2}). As a consequence, the DVH constraints mark the indicated regions in the volume-dose plane as ‘forbidden zones’, i.e., if a DVH crosses these areas the DVH constraint is violated.
In mathematical terms, the related function OF_T^(−) for the avoidance of an underdosage of the target takes the form

$$\mathrm{OF}_T^{(-)}(x) = \frac{1}{N_T} \sum_{i=1}^{N_T} \Big[ C_+\big( D_T^{\min} - D_i^T(x) \big) \Big]^2 . \tag{1}$$
The analogous term for the avoidance of global overdosage effects for either target or OARs reads

$$\mathrm{OF}_k^{(+)}(x) = \frac{1}{N_k} \sum_{i=1}^{N_k} \Big[ C_+\big( D_i^k(x) - D_k^{\max} \big) \Big]^2 . \tag{2}$$
The operator C_+(x), defined by C_+(x) = x for x ≥ 0 and C_+(x) = 0 for x < 0, ensures that only constraint violations contribute to the quality indicators OF^(+) and OF^(−). This most prominent type of quality indicator in objective functions – the measure of squared deviations from given dose constraints – introduces new parameters into the optimization problem, which are not based on clinical experience but which instead are required for the mathematical formulation of the optimization problem. The ‘square’ terms were mostly chosen because the resulting full objective functions are still mathematically convex [4], which allows the employment of very fast gradient techniques for the solution of the optimization problem (for more details see the section on optimization algorithms below).
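Eqs. (1) and (2) translate directly into code. A minimal numpy sketch, assuming the voxel doses of each VOI are collected in 1D arrays; np.maximum(·, 0) plays the role of the operator C_+.

```python
import numpy as np

def of_target_under(dose_t: np.ndarray, d_min: float) -> float:
    """Eq. (1): mean squared underdosage of the target voxels."""
    # np.maximum(., 0) implements the operator C+: only violations contribute
    return float(np.mean(np.maximum(d_min - dose_t, 0.0) ** 2))

def of_overdose(dose_k: np.ndarray, d_max: float) -> float:
    """Eq. (2): mean squared overdosage of a target or organ at risk."""
    return float(np.mean(np.maximum(dose_k - d_max, 0.0) ** 2))

target_dose = np.array([58.0, 60.5, 61.0, 59.0])  # toy voxel doses in Gy
print(of_target_under(target_dose, d_min=60.0))   # penalizes only the cold voxels
```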
The definition of quality indicators for the more general DVH constraints is done similarly and can for instance be found in [19]. Unfortunately, DVH constraints lead to non-convex objective functions [21, 22]. However, it was shown [23] that the resulting local minima in the overall objective function are of minor practical relevance.
Objective Functions
For the final mathematical formulation of the optimization problem, the individual quality indicators, which each represent the merit of a particular plan for one clinical constraint, have to be combined to yield a single valued quality measure of the complete treatment plan. Naturally, the given indicators for target structures OF_T and organs at risk OF_k refer to mutually conflicting goals of the optimization, i.e., the combination of these individual constraints is crucial for the clinical compromise achievable with that particular optimization scheme. Furthermore, and maybe even more importantly, the design of the overall objective function has to introduce ‘steering parameters’ such that the planner can efficiently derive the clinically acceptable plan of his choice. Simplicity of the objective function, combined with the request that the overall objective function remains convex, leads to the well known weighted sum of individual quality indicators, i.e.,
$$\mathrm{OF}(x) = w_T^{(+)} \mathrm{OF}_T^{(+)}(x) + w_T^{(-)} \mathrm{OF}_T^{(-)}(x) + \sum_k w_k\, \mathrm{OF}_k^{(+)}(x) . \tag{3}$$
With the introduced weighting factors w for each constraint the planner can now ‘steer’ the result of the optimization towards the optimal treatment plan of his preference. As already mentioned, the parameters w unfortunately do not have any intuitive meaning, and it is unknown how sensitively the outcome of an optimization is coupled to variations of the respective weighting factors. Both features make the manual determination of these parameters a cumbersome planning task, which is one of the reasons why there is increasing interest in the already mentioned approaches of multi-criteria optimization (see above). An extension of this introduction of global penalty factors with the aim to improve local dose features of a plan was introduced in [24].
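A minimal sketch of the weighted sum of Eq. (3), with the weights as the ‘steering parameters’ adjusted by the planner; the argument names and the representation of the OARs as parallel lists are illustrative choices, not the interface of any particular planning system.

```python
import numpy as np

def of_total(dose_t, dose_oars, d_min, d_max_t, d_max_oars,
             w_t_plus, w_t_minus, w_oars):
    """Eq. (3): weighted sum of the individual quality indicators.

    dose_oars, d_max_oars and w_oars are parallel sequences over the OARs;
    the weights w are the 'steering parameters' adjusted by the planner.
    """
    under = np.mean(np.maximum(d_min - dose_t, 0.0) ** 2)      # OF_T^(-), Eq. (1)
    over = np.mean(np.maximum(dose_t - d_max_t, 0.0) ** 2)     # OF_T^(+), Eq. (2)
    of = w_t_plus * over + w_t_minus * under
    for dose_k, dk_max, wk in zip(dose_oars, d_max_oars, w_oars):
        of += wk * np.mean(np.maximum(dose_k - dk_max, 0.0) ** 2)  # OF_k^(+)
    return float(of)
```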
Technical Criteria – Plan Degeneracy
As already mentioned, the standard model of inverse planning for IMRT optimizes an ideal fluence profile, independent of the subsequent translation into the MLC sequences required by the dose delivery system. Features of this important practical step directly prior to the dose delivery can be additionally optimized without significantly reducing the plan quality. This is due to the fact that the solution of the IMRT optimization problem is not unique and that numerous sets of fluence amplitudes yield treatment plans of comparable quality. This degeneracy of the optimal fluence profiles, discussed, e.g., in [4, 25, 26], allows one to account for further technical constraints that facilitate the practical dose delivery.
One of these additional technical criteria is the ‘smoothness’ of the intensity maps. Intensity maps that avoid patterns of clinically unmotivated high fluence gradients can be delivered faster and more safely. The smoothing of these discontinuities can be achieved either by applying a median filter (cf. [27]) to the fluence profiles or by adding a respective term to the objective function [28].
Another important technical aspect is the conversion of the derived continuous fluence modulations into a spectrum of discrete fluence values; e.g., this number of intensity levels determines the number of subfields used for the step-and-shoot technique. It has been shown that for most cases a moderate number (5–7) of intensity levels is adequate [29], which leads to a number of IMRT segments on the order of 100 that can be efficiently delivered with current IMRT technology.
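As a simple illustration of this discretization step, the following sketch maps a continuous fluence map onto a small number of equidistant intensity levels; the rounding scheme is one naive possibility, not the sequencer of any particular planning system.

```python
import numpy as np

def quantize_fluence(fluence: np.ndarray, n_levels: int = 5) -> np.ndarray:
    """Map a continuous (non-negative) fluence map onto n_levels equidistant levels."""
    step = fluence.max() / n_levels          # assumes fluence.max() > 0
    return np.round(fluence / step) * step

fluence = np.random.default_rng(3).uniform(0, 1, size=(10, 10))
print(np.unique(quantize_fluence(fluence, n_levels=5)))  # at most n_levels + 1 distinct values
```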
4.2.3 Optimization Algorithms
To calculate the beam weights for a given set of constraints and a selected objective function, an optimization algorithm is required. Not all optimization algorithms can be used for all objective functions, due to the mathematical properties of the objective functions. Over the last decades numerous optimization approaches for different problems have been published and applied to problems within the field of radiation therapy [3, 30]. In general, these algorithms can be divided into two categories. First, there are the deterministic algorithms like the gradient approach. These techniques are applied to optimization problems where the objective functions are convex and therefore only a global minimum and no local minima exist [1]. For such convex objective functions, like the standard quadratic objective function, the deterministic algorithms can calculate the optimal solution very fast and are therefore currently used in most commercially available IMRT treatment planning systems [3]. Second, there are the stochastic methods, like simulated annealing or genetic algorithms. They have the advantage that even for non-convex objective functions based on biological objectives or DVH constraints the global minimum can be found even if local minima exist.
In this section we only briefly discuss the rationale of the most frequently used algorithms. First, as examples of deterministic algorithms, simple gradient methods and the conjugate gradient approach are described. Second, the basic ideas of stochastic algorithms like simulated annealing and genetic algorithms are discussed. More detailed information about optimization algorithms used in radiotherapy can for instance be found in [31].
Deterministic Approaches
Steepest Descent This method is mostly used for finding the global minimum of a convex objective function OF(x), where x represents the set of variable treatment parameters which have to be adjusted to their optimal values. The objective function can be visualized as a multi-dimensional surface given in terms of the coordinates x. For a general non-convex objective function a one-dimensional example is shown in Fig. 3. A key role in the optimization is played by the first derivative of this function or its generalization to N dimensions – the gradient ∇OF(x), which determines the steepest direction along the surface of the objective function. Finding the minimum of OF(x) via an iterative method requires that the values of the intensities x are updated at each step of the iteration i. For the gradient approaches, the update of x while advancing from iteration i to i + 1 is given by the rule
$$x(i+1) = x(i) - \alpha \cdot \nabla \mathrm{OF}\big(x(i)\big) . \tag{4}$$
This iterative search can be visualized as a ball rolling downhill into the valley along the steepest direction (see Fig. 3) until the minimum of the valley is reached. The constant factor α, often referred to as the damping factor, determines the step size of the iterative process. One problem with the steepest descent method is that many small steps in the calculated direction are performed even if the valley is of perfect quadratic form. The different gradient approaches mostly differ by the determination of α and therefore the step size. For the steepest descent method the value of α is set to a fixed value independent of the position and the iteration step.
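A minimal sketch of the update rule of Eq. (4) with a fixed damping factor α; the projection onto non-negative values is an assumption added here, reflecting that fluence amplitudes cannot be negative, and is not part of Eq. (4) itself.

```python
import numpy as np

def steepest_descent(grad_of, x0, alpha=0.01, n_iter=500):
    """Eq. (4): x(i+1) = x(i) - alpha * grad OF(x(i)), with fixed alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad_of(x)
        x = np.maximum(x, 0.0)  # keep fluence amplitudes non-negative (added assumption)
    return x

# Toy convex objective OF(x) = |x - x*|^2 with known minimum x*
x_star = np.array([1.0, 2.0, 3.0])
print(steepest_descent(lambda x: 2.0 * (x - x_star), x0=np.zeros(3)))  # approx. x_star
```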
Newton’s Method The Newton methods take into account the second order derivatives of the objective function for the determination of the damping factor, which controls the speed and success of the optimization. Employing a Taylor expansion of OF(x(i)) up to the second order derivatives, one can show that a new damping factor for each iteration step is a promising alternative choice for α.
Fig. 3. The steepest descent algorithm starts from position x_0 and always walks with a predefined step size towards the global minimum. If the algorithm started on the left side, it would be trapped in the local minimum.
For the multi-dimensional optimization problem encountered in radiation therapy (several hundred fluence amplitudes have to be optimized simultaneously) the damping factor can be expressed in terms of the inverse Hessian H⁻¹ of the second derivatives of OF(x) [32], i.e.,

$$x(i+1) = x(i) - H^{-1}\big(x(i)\big)\, \nabla\mathrm{OF}\big(x(i)\big) = x(i) - \alpha_{\mathrm{Newton}}\, \nabla\mathrm{OF}\big(x(i)\big) . \tag{5}$$
One problem within the Newton approach is that for each step the complete inverse Hessian has to be calculated, which is a complex and time-consuming process.
One possible solution to this problem is not to recalculate the inverse Hessian at each iteration step but to use approximations of the inverse Hessian [30]. If such methods are applied, the optimization algorithms are called “Quasi Newton” approaches. For example, the treatment planning system KonRad uses a “Quasi Newton” algorithm for the optimization of the fluence patterns [19]. Other possible solutions and implementations for this problem can be found in [32]. The “steepest descent” method can be viewed as a special case of the “Quasi Newton” approach.
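For illustration, the following sketch optimizes a toy convex objective with the L-BFGS-B routine from scipy, a quasi-Newton method that maintains a low-memory approximation of the inverse Hessian instead of recalculating it at every step; the dose-influence matrix A and the prescription are invented toy data, and this is not the KonRad implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy convex objective: least-squares fit of the dose A @ x to a prescription d_pres
rng = np.random.default_rng(1)
A = rng.uniform(0, 1, size=(50, 10))      # hypothetical dose-influence matrix
d_pres = np.full(50, 2.0)                 # prescribed dose per voxel

def of(x):
    r = A @ x - d_pres
    return float(r @ r)

def grad_of(x):
    return 2.0 * A.T @ (A @ x - d_pres)

# L-BFGS-B builds a low-memory approximation of the inverse Hessian
res = minimize(of, x0=np.zeros(10), jac=grad_of, method="L-BFGS-B",
               bounds=[(0.0, None)] * 10)  # non-negative fluence amplitudes
print(res.x)
```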
Conjugate Gradient Approach The problem with the “steepest descent” or Newton methods is that the directions of the moves towards the unknown minimum in two successive iterations are not conjugate to each other, i.e., the gain of approaching the minimum value of OF(x) achieved in one step of the iteration might get partially lost in the next step [32]. This is not the case for the “conjugate gradient” approach. Mathematically, there are two different methods to determine the global minimum of the objective function with this approach.
The first approach calculates the Hessian at each step of the iteration, and it can be shown that this version of the “conjugate gradient” approach finds the global minimum after N iterations, where N is the number of optimization parameters x [32]. However, since the Hessian cannot always be calculated in a reasonable amount of time, an alternative approach is used more often for applications in radiation therapy [33].
Starting from an arbitrary point x(0), the objective function is evaluated at different positions along the line through the starting point x(0) in the direction of the encountered ‘steepest descent’, h(0) = −∇OF(x(0)). This is done until the position of the minimum of OF(x) along that line is found. At the position of the line minimum x_min the gradient g(1) = −∇OF(x)|_{x_min} is calculated and used for the determination of the next direction in which the global minimum will be approached. The new direction for this ‘line minimization’ approach is given by the iterative rule

$$h(i+1) = g(i+1) + \gamma(i)\, h(i) ,$$

where the factor γ(i) can be calculated according to Fletcher–Reeves or to Polak and Ribière [32].
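A compact sketch of this line-minimization variant with the Polak–Ribière factor γ(i); clamping γ at zero (the ‘PR+’ safeguard) and using scipy’s scalar minimizer for the line search are implementation choices made here for brevity, not part of the original formulation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(of, grad_of, x0, n_iter=50):
    """Conjugate gradient with numerical line minimization (Polak-Ribiere)."""
    x = np.asarray(x0, dtype=float)
    g = -grad_of(x)          # g(0): negative gradient at the start
    h = g.copy()             # h(0): first search direction (steepest descent)
    for _ in range(n_iter):
        t = minimize_scalar(lambda t: of(x + t * h)).x  # line minimization along h
        x = x + t * h
        g_new = -grad_of(x)
        gamma = max(g_new @ (g_new - g) / max(g @ g, 1e-30), 0.0)  # Polak-Ribiere ('PR+')
        h = g_new + gamma * h    # h(i+1) = g(i+1) + gamma(i) h(i)
        g = g_new
    return x

# Toy quadratic objective with its minimum at the origin
A = np.array([[3.0, 1.0], [1.0, 2.0]])
of = lambda x: 0.5 * x @ A @ x
grad_of = lambda x: A @ x
print(conjugate_gradient(of, grad_of, x0=np.array([1.0, 1.0])))  # approx. [0, 0]
```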
A detailed mathematical description and instructions on how the conjugate gradient approach is implemented effectively can for instance be found in [32]. One of the main problems with conjugate gradient algorithms can be the speed of the line minimization. An example of a successful application of the conjugate gradient method in inverse IMRT planning is the HELIOS (Varian) IMRT treatment planning system.
One potential concern with deterministic algorithms is that the iterative process may get trapped in a local minimum (see Fig. 3) such that the desired global minimum is never discovered. Local minima can be encountered, for example, if DVH constraints are added to the objective function. Several investigations by different groups have come to somewhat different conclusions about this problem. Deasy [21] argues that the IMRT algorithm should include methods to deal with the problem of local minima, whereas others like Wu and Mohan [23] or Llacer et al. [22] have shown that even in the presence of local minima acceptable solutions for a given IMRT problem can be found. A study concerning the speed and the convergence properties of different gradient algorithms used for the optimization of IMRT can be found in [21].
Stochastic Methods
Stochastic optimization algorithms offer the advantage that they can find the optimal treatment parameters even for complex objective functions with potential local minima. The price to pay for this nice feature is a significantly increased optimization time in comparison to the discussed deterministic algorithms. Historically, the method of simulated annealing was one of the first algorithms introduced into radiation therapy planning, by S. Webb [34]. More recently, even more complex optimization engines based on ‘genetic algorithms’ have been employed for treatment plan optimization. In the following we introduce the basics of these concepts without referring to mathematical details, which can be found in the given references.
Simulated Annealing There are basically two strategies by which the method of ‘simulated annealing’ escapes from the trap of local minima – ‘climbing uphill’ and ‘tunneling’. Both methods were illustrated by Webb [35] with a nice example.
Imagine a walker who is instructed to find a well in a hilly landscape. The well is assumed to be at the lowest point of the landscape and therefore coincides with the ‘global minimum’. Since the walker has no a priori knowledge of where to go, he starts his task by walking downhill, since he is aware that the mountains are higher than the well. Using the potential energy V as the objective function, it is clear that V_Well < V_Hill. His task is therefore to minimize |V − V_Well|. Consequently, he walks in the direction of the steepest descent until he encounters a valley. Unfortunately, the walker can see only his nearest surroundings due to some fog, and therefore he does not know whether the valley is the local or the global minimum. The only way to find out is to walk ‘uphill’ for some time and to further explore the whole landscape. In mathematical terms, the ‘simulated annealing’ algorithm provides some probability of searching in the ‘uphill direction’, so that the search for a global minimum continues even if a local one is encountered.
Alternatively, the walker could have enlarged his step size so enormously that he leaves the valley in a single step. This process is equivalent to a “tunneling” through the walls of that valley (see Fig. 4).
Mathematically, both described strategies involve the sampling of distributions. First, the step size Δx(i) after i iterations (i ≥ 1) is randomly chosen from a displacement distribution D(Δx(i)). The width of this distribution decreases dynamically, so that smaller steps are preferred as one approaches the optimal solution. The sampling of D(Δx(i)) allows the inclusion of the ‘tunneling’ strategy. Next, the decision whether that iteration step was a good move toward the optimal solution has to be made. This is done by random sampling of a probability distribution P(i). If the difference of the objective function ΔOF(i) = OF(x(i) + Δx(i)) − OF(x(i)) is negative, the new set of treatment parameters is accepted. However, in contrast to the gradient methods, the new position is also accepted with a probability P(i) if the difference is positive. The probability distribution P(i) for all displacement distributions is given as

$$P(i) = \exp\left( -\frac{\Delta \mathrm{OF}(i)}{k_B T(i)} \right) . \tag{6}$$
With the temperature T(i), the width of P(i) is dynamically adjusted to smaller values during the optimization process. How this ‘cooling’ of the ‘uphill’ climbing is best done also depends on the complexity of the objective function. Different combinations of P(i) and D(Δx(i)) define the various types of simulated annealing algorithms. The most prominent ones are discussed below.
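A minimal sketch combining both ingredients – random displacements and the acceptance probability of Eq. (6), with k_B absorbed into the temperature; the Gaussian proposal and the logarithmic cooling schedule are simple choices for illustration, not those of any particular planning system.

```python
import numpy as np

def simulated_annealing(of, x0, t0=1.0, n_iter=5000, seed=0):
    """Simulated annealing: random steps, Eq. (6) acceptance, slow cooling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best_x, best_of = x.copy(), of(x)
    for i in range(1, n_iter + 1):
        t = t0 / np.log(i + 1)                            # T(i): logarithmic cooling
        dx = rng.normal(scale=np.sqrt(t), size=x.shape)   # displacement from D(dx)
        d_of = of(x + dx) - of(x)
        # downhill moves are always accepted; uphill moves with probability P(i)
        if d_of < 0 or rng.random() < np.exp(-d_of / t):
            x = x + dx
            if of(x) < best_of:
                best_x, best_of = x.copy(), of(x)
    return best_x

# A non-convex 1D objective with a local minimum next to the global one
of = lambda x: float(x[0] ** 4 - 3 * x[0] ** 2 + x[0])
print(simulated_annealing(of, x0=np.array([2.0])))   # close to the global minimum
```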
Fig. 4. Simulated annealing: to avoid the local minimum the algorithm can either tunnel through (black arrow) or climb over (gray arrow) the barrier.
The concept of using a temperature to describe the simulated annealing process is taken from the area of solid state physics. If, for example, a metal is heated until a phase transition from solid to liquid occurs, the molecules move randomly. If the system is then slowly cooled down, a regular crystal structure is formed [36]. This procedure ensures that the internal energy of the solid is minimized.
Boltzmann Annealing
For the classic simulated annealing process the starting temperature T(0) is very large, which leads to a higher probability of accepting uphill steps. During the iteration process the temperature is reduced, and T(i) must satisfy the condition T(i) ≥ T(0)/log(i) [37]. For Boltzmann annealing the step size Δx(i) is sampled from a Gaussian distribution:

$$D_{\mathrm{Boltzmann}}\big(\Delta x(i)\big) = \big(2\pi T(i)\big)^{-N/2} \exp\left( -\frac{\Delta x(i)^2}{k_B T(i)} \right) . \tag{7}$$
Fast Simulated Annealing
Another method is the “fast simulated annealing” algorithm, which has been frequently used for the optimization of IMRT treatment plans [38, 39]. For “fast simulated annealing” the temperature change is given as T(i) = T(0)/i, and the displacement vectors are sampled from a Cauchy distribution [40]:

$$D_{\mathrm{Fast}}\big(\Delta x(i)\big) = \frac{T(i)}{\big( \Delta x(i)^2 + T(i)^2 \big)^{(N+1)/2}} . \tag{8}$$

Compared to the classical “Boltzmann annealing”, the temperature is changed faster for the “fast simulated annealing”, and the displacement vectors are chosen from a wider distribution. Fast simulated annealing is used, for example, in the Corvus treatment planning system [38].
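For illustration, displacement vectors for fast simulated annealing can be drawn component-wise from a Cauchy distribution scaled by the temperature T(i) = T(0)/i; note that independent component-wise sampling is only a simple approximation of the multivariate distribution of Eq. (8), whose heavy tails occasionally produce the large ‘tunneling’ steps.

```python
import numpy as np

def fast_sa_step(rng, t, n):
    """Cauchy-distributed displacement of width t (component-wise approximation of Eq. (8))."""
    return t * rng.standard_cauchy(size=n)

rng = np.random.default_rng(2)
t0, i = 10.0, 25
print(fast_sa_step(rng, t0 / i, n=5))  # T(i) = T(0)/i cooling
```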
Genetic Algorithms Genetic algorithms (see [64, 65]) have also been used recently for the optimization of IMRT plans based on highly complex objective functions [41, 42]. They emulate basic principles found in evolutionary biological systems to identify the ‘most likely survivor’ in a given pool of potential solutions. This is done by applying genetic principles such as inheritance, mutation, recombination and natural selection. Genetic algorithms are normally used if the search space is complex and large and traditional hill climbing algorithms like simulated annealing are likely to encounter difficulties.
In the following we will use a very simple example to illustrate a few of the basic features of genetic algorithms. Let us consider the problem of finding the minimum of the one-dimensional objective function OF(x) = x² with the help of a genetic algorithm, which can for example be divided into the following processes: Encoding, Evaluation, Crossover, Mutation and Decoding.
Encoding and Decoding
First, the original optimization problem has to be formulated in such a way that the processes responsible for the genetic evolution can be mathematically simulated. One of the most critical tasks is to find a good representation of the solution space, a step which is often referred to as ‘Encoding’. An inappropriate encoding could lead to a very slow, practically useless optimization algorithm. For our example, one possible solution is to use a binary representation of the values of x. Due to a priori knowledge we can for instance limit the solution space to the range 0–15 and therefore use a four-bit representation. Different candidates for a solution, e.g., A (x = 2) and B (x = 13), are then encoded as A = {0010} and B = {1101}. These strings can be compared to genetic strings where the bits (0/1) are the genes. Decoding simply denotes the process of translating the solutions back into the original representation.
Next, one has to create a “first generation pool” of solutions. These solutions are created either with the help of prior knowledge or by assuming a random distribution over a likely range of parameters. In the case of IMRT, a single solution is described by the weights of the beamlets used for the optimization or by MLC shapes.
Evaluation
The next step is to evaluate the current generation pool and rank the solutions according to their fitness with the help of the objective function. In our example, solution A has a higher rank than solution B, since we are searching for the global minimum of OF(x). This ranking determines a normalized probability distribution which specifies the likelihood that a solution will be part of the next generation pool of solutions. At this stage of the optimization process one also decides whether the best solution of the considered pool is of sufficient quality such that the optimization can be stopped. New solutions not available in the current genetic pool can be created by processes like ‘crossover’ and ‘mutation’.
Crossover
Using the probability distribution calculated from the evaluation of the gene pool, two arbitrary solutions are selected and a ‘crossover’ process is performed at an arbitrary position within the ‘chromosome’ of these solutions. For our example, a crossover of solutions A and B at position 2 would lead to two new solutions, C = {0001} (corresponding to x = 1) and D = {1110} (corresponding to x = 14), which are then added to the second generation pool. This process is repeated until a sufficient number of solutions is available in the second generation pool. Since the crossover process is done preferably between two high-rank solutions, the application of crossover processes alone bears the risk of being trapped within a local minimum. This can be avoided by additionally using the ‘mutation process’ as a second method of creating new solutions in the next genetic pool.
Mutation
The mutation process offers an analogue to the ‘hill climbing’ strategy applied in the ‘simulated annealing’ algorithm. The mutation procedure is done by changing one bit at a random position within a considered solution. For example, a mutation of C could lead to {1001} (corresponding to x = 9) if the first bit is changed. After the application of a well defined number of mutations and crossover processes, the newly created pool of solutions is evaluated again. When the desired convergence of the algorithm is reached, the final solution is decoded back into the space of treatment parameters.
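The toy problem described above – minimizing OF(x) = x² over a four-bit encoding – fits into a short script. The following is a minimal sketch; the pool size, the preference for high-rank parents and the fixed number of generations are illustrative choices, and a real IMRT implementation would encode beamlet weights or MLC shapes instead.

```python
import random

random.seed(0)
N_BITS, POOL = 4, 8

def decode(bits):            # four-bit string -> integer in 0..15
    return int("".join(map(str, bits)), 2)

def fitness(bits):           # higher is better; we minimize OF(x) = x^2
    return -decode(bits) ** 2

def crossover(a, b, pos):    # swap the tails of two 'chromosomes' at pos
    return a[:pos] + b[pos:], b[:pos] + a[pos:]

def mutate(bits):            # flip one bit at a random position
    i = random.randrange(N_BITS)
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

# first generation pool: random four-bit solutions
pool = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POOL)]
for generation in range(20):
    pool.sort(key=fitness, reverse=True)       # evaluation and ranking
    next_pool = pool[:2]                       # keep the two best (selection)
    while len(next_pool) < POOL:
        a, b = random.sample(pool[:4], 2)      # prefer high-rank parents
        c, d = crossover(a, b, random.randrange(1, N_BITS))
        next_pool += [mutate(c), mutate(d)]    # mutation avoids local minima
    pool = next_pool[:POOL]
print(decode(max(pool, key=fitness)))          # best solution found (0 is the global minimum)
```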
Summary
The advantage of deterministic approaches like steepest descent, Quasi Newton or the conjugate gradient approach over the stochastic approaches is the optimization speed. For a highly complex IMRT case with over 1,000 degrees of freedom, a typical optimization time for one solution is about 1 min on a standard PC, whereas the times for stochastic optimization algorithms are significantly higher. If complex non-convex objective functions with pronounced local minima are employed, there is no alternative to stochastic algorithms like simulated annealing or genetic algorithms.
There are other optimization algorithms applied to IMRT optimization that were not considered in this section. For example, one other promising optimization method is linear programming. Especially for multi-criteria optimization problems this technique seems to be advantageous [31]. A comprehensive review of optimization techniques applied in radiotherapy can be found in [43].
4.3 Optimization of Other Degrees of Freedom
The previous sections dealt mainly with the “standard approach” of IMRT, i.e., with the optimization of intensity maps for predefined beam configurations. This means that the planner specifies the general irradiation conditions (e.g., to use photons at 6 MV with five given beam directions using this specific hardware, etc.), and all that is left to the planning software is to compute “optimal” intensity maps for every beam. However, there are many other degrees of freedom that could in principle be included in the optimization process in order to improve the treatment plan. Generally speaking, every parameter that can modify the energy fluence in the patient is a potential candidate for optimization. In addition to the optimization of intensity maps, the following degrees of freedom could be exploited: the number of beams (or arcs) and their directions of incidence, usually given by gantry and couch angles; the