Analysis of possible implemen- tation strategies

(1)

Chapter 4 Analysis of possible implemen- tation strategies

4.1 Introduction

In the following paragraphs we will examine and compare algorithms on the hypotesis that we already know every Green’s function and overlap matrix needed by them. Also, taking into account M xM matrices, where in our case M is the number of modes we are considering, the cost of a multiplication is M², while the cost of an inversion is proportional to M³, which means we will define the speed of an algorithm based on the number of inversions it requires to find the solution.

We will assume the x axis along the direction between the two ends of the system and the y axis along the orthogonal transverse direction. Finally, P will be the number of slices affected by the tip, which will be supposed to be costant, since we are, for the moment, only interested in a comparison of speeds. For the same reason we will consider every slice to be of the same length.

4.2 Decreasing the size of the system

(2)

Before we start listing the possible algorithms, one last issue should be mentioned. In order to find the Green’s functions for the whole system, the recursive Green’s function method must be applied to all its slices, but since there is no need to do so in a specific order, it is possible to scan the system from left to right and vice versa, saving the resulting Green’s functions at every step, in order to compose them later with the Green’s function of the part of the system affected by the probe (one on the right and one on the left, unless the probe covers the very beginning or end of the system, in which case only one of them needs to be connected). A graphical explanation is presented in Fig. 4.1.

For the evaluation of the computational cost we will always consider the most common case, i.e. the one involving the connection with a region on the left and one on the right of the probe.

4.3 Simple Algorithm (SA)

The first algorithm to be analyzed is the simple one, since it is going to be used as a reference to measure the speed of the others. It is possible to apply this algorithm with any kind of probe potential profile.

The Simple Algorithm simply calculates everything everytime: for every position assumed by the probe, or rather for every single mesh point meant to be scanned, it starts a cycle that connects the first slice affected by the

(3)

000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000

111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000

111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111

Area affected by the probe

Left part, unaffected Right part, unaffected

+ +

N,1

G_q,p G_N,q+1

G G_p−1,1

Fig. 4.1 Obtaining the Green function of the system from the composition of the Green’s function computed during the initial scan and relative to the blocks on the right and to those on the left, togheter with the Green’s function of the part of the system affected by the probe.

probe with the next one, connecting the result to the next slice, again and again, until every slice has been taken care of.

Unlike in the case of the algorithms of the following paragraphs, with the Simple Algorithm it is possible to apply a slightly different strategy for the calculation of the Green’s functions for the whole system. We start by considering the Green’s function for the part of the system at the left of the

(4)

zone affected by the probe, if any, than we start using the Simple Algorithm until we connect the slice affected by the probe and all that is left is to add the Green’s function for the part of the system at the left of the zone of the probe, if any.

At a first look there is no difference at all, but actually it saves some calculations and, potentially, some space in memory, since when we know that we will not need to connect something to the right, for our calculations we only need the Green’s functions from the beginning back to the end, and from the end back to itself.

Let us write again the formula we use to connect two slices togheter, equation (3.5):

Gd,a= G⁰_d,cVc,bI − G⁰b,bVb,cG⁰_c,cVc,b

−1

G⁰_b,a

We can see that if we connect slices on the left, so that the block of the slices already connected together is always the one on the right, ranging from a to b, then the only Green’s function we need are G⁰_b,a and G⁰_b,b. This is still true if we consider (3.15) (or rather if we consider the Green’s functions needed to calculate Gd,d).

There is no need to evaluate Ga,a since we never use G⁰_a,a during our calculations.

(5)

Instead, if we want to connect slices to the right of the block of slices we have already connected togheter, the Green’s function in equations (3.5) and (3.15) which represents the blok are those with indexes c and d. Equation (3.5) requires G⁰_d,c and G⁰_c,c, and equation (3.15) force us to know G⁰_d,d too.

In other words, in order to be able to iterate the procedure of connecting a slice to the right of our system we need to calculate three Green’s functions instead of two, as in the previous case.

Since in this case all the calculations we can skip are multiplications, this approach does not affect the algorithmical cost for the Simple Algorithm.

Altough the speed of the program will not be substantially affected, it is still something to consider because applying this strategy can provide benefits when we consider a more complex version of the program. There will be an example of this later in this chapter.

Back to the cost evaluation, with a total of P slices affected we need to repeat the composition procedure P − 1 times in order to find the Green’s function we want. Than, of course, we need to compose the Green’s functions for the left and the right part of the system, those unaffected by the probe.

This brings the count to a total number of inversions per step at:

Cost (ST ) = P + 1 (4.1)

(6)

4.4 Block Algorithm (BA)

Let us suppose that we have calculated the Green’s function for the probe in a certain point. Than we move the probe one step to the left and check the area covered by the perturbation caused by the probe. If we compare the two cases, we see that only the leftmost and rightmost part of the area affected by the probe differ in the two cases. If only we could take the system of the first case, delete the effect of the rightmost part of the probe (the part that affects a region of the system which is not affected in the second case), and add the effect of the probe on the leftmost part of the region affected by it, we would have moved from case one to case two.

This is exactly what the Block Algorithm does. The first time it uses the Simple Algorithm to calculate the Green’s functions for the part of the system affected by the probe, than it removes one slice on the right and includes one on the left.

It is important to note that this algorithm works only for hard wall sys- tems (no condition on the shape of the potential of the probe, though). Refer to Fig. 4.2 for a graphical example of the idea behind the Block Algorithm.

Disregarding the cost for the application of the Simple Algorithm, since it is paid only once per every iteration over the y axis, at every iteration over the xaxis we need to do a slice composition and a slice removal. The composition

(7)

0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000 0000000000000000000000

1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 1111111111111111111111 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000

111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111

000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000 000000000000000000000

111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111 111111111111111111111

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 00

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 11

Area affected by the probe

Step h+1 Step h

Add one slice Unchanged Remove one slice

Fig. 4.2 Graphical explaination of the Block Algorithm.

cost is well known to be one inversion, but we need to do some math to find the removal cost.

As we did back in chapter 2, we examine two slices and their combination. The slice on the left goes from d to c, the one on the right goes from b to a, but this time we know the Green’s functions for the right slice and the global system, and we need to find the Green’s functions for the left slice.

First of all we remember the final formula found in chapter 3 (equations (3.8) and (3.15) respectively):

Gd,a= α∆G⁰_b,a (4.2)

(8)

Gd,d = G⁰_d,d+ α∆G⁰_b,bα^T (4.3)

Recalling that, by definition, α = G⁰_d,cVc,b, (4.2) means that, if we know

∆, with an inversion we also know α, and thus G⁰_d,c. And by means of (4.3) knowing α also means we know G⁰_d,d without the use of any additional inversion.

It is also important to remember that ∆ is a function only of Vb,c, its transpose, G⁰_b,b and G⁰_c,c, and computing it costs one inversion. Thus now we just need a way to find G⁰_c,c and with just two more inversions we can reach our goal.

We can start by trying to use the only data we still have not used, Ga,a

and G⁰_a,a. By applying Dyson’s equation:











Ga,a= G⁰_a,a+ G⁰_a,bVb,cGc,a

Gc,a = G⁰_c,cVc,bGb,a

Gb,a = G⁰_b,a+ G⁰_b,bVb,cGc,a

The second and third equations can be combined to obtain Gc,a as a function of G⁰_c,c and known matrices only:

Gc,a =I − G⁰c,cVc,bG⁰_b,bVb,c

−1

G⁰_c,cVc,bG⁰_b,a

By combining this last equation with the first one from the system, after some manipulation, we obtain:

G⁰_a,bVb,c

−1G_a,a− G⁰a,a V_c,bG⁰_b,a−1

=I − G⁰c,cVc,bG⁰_b,bVb,c

−1

G⁰_c,c

(9)

It must be noted that G⁰_a,bVb,cis the transpose of Vc,bG⁰_b,a, and since inversion and transposition can be applied in any order we want, we just need to calculate one inversion, and by a simple (and costless) transposition we obtain the left operand of the last equation.

With just a bit more manipulation we finally find what we wanted:

Φ ≡G⁰_a,bVb,c

−1G_a,a− G⁰a,a V_c,bG⁰_b,a−1

G⁰_c,c = ΦI + Vc,bG⁰_b,bVb,cΦ−1

(4.4)

To sum this all up, we find G⁰_c,c using two inversions. With a third inversion we find ∆, and using (4.3) and (4.2), with only one more inversion, we finally have all the Green’s functions we were looking for.

We now know that a slice removal costs four inversions, and since the Block Algorithm needs one combination and one remmoval to find the Green’s functions for the current step, starting from the Green’s functions for the previous step, considering we also need to add the Green’s function for the left and right part of the system, the result is:

Cost (BA) = 7 (4.5)

When there is a change in the transverse potential between the slices affected by the probe, we need to modify the procedure. For an easier explanation we

(10)

will use an example: we have a probe made by ten slices and during the last iteration the first five slices were over a region of the system characterized by a transverse potential V1, while the other five were over a region of the system characterized by a transverse potential V2. Let us examine the current iteration.

We can use the block made by the first four slices, since it does not matter a lot that the probe moved: we have the four rightmost slices affecting a region characterized by the potential V1. For the very same reason we can use again the five leftmost slices, which are all affecting a region characterized by the potential V₂.

We need to cut the old block into three parts: the leftmost (made up of five slices), with which we will combine a new slice on the right, the rightmost (made up of four slices), which we will combine with the new block of six slices we just combined, and the block made by the fifth slice alone, which will be discarded.

It is obvious that the cost of this algorithm is higher if we consider this modification, but since the actual increase in cost depends on the average number of longitudinal discontinuities of the transverse potential, we will only compare the ideal cost.

4.5 Memory Algorithm (MA)

(11)

All the other algorithms we have discussed and that we will discuss do not store any computed Green’s function from one iteration (of the probe position) to the following, except for the Block Algorithm which stores “one”.

The Memory Algorithm is the only one that really requiresa large amount of memory in order to work.

From a temporal memory point of view, or rather from a “step” point of view, the Block Algorithm has a memory only one step deep, the Memory Algorithm instead has a depth dependent on the number of slices affected by the probe.

Back to the “one” of a few lines ago, actually the Green’s functions saved by the Block Algorithm are three, since we need three to describe a slice, but in this paragraph, and this paragraph only, we will refer to those three as

“one”, since it could become quite confusing otherways, and to store it we will need one memory “slot”. For the very same reason we will also adopt a different notation for the Green’s functions: if we are considering N slices, numbered from 1 to N , then the Green’s function for the first slice will be G₁, for the second G₂, and so on. With Gq,p, instead, we will be referring to the Green’s function for the block of slices from p to q, so, in example, G_4,1 will be the Green’s function for the block of the slices from 1 to 4.

(12)

This algorithm is, actually, a sort of “reborn” Block Algorithm, since it is based on the idea to reuse the Green’s function of a block of slices calculated during previous steps, too. We start from the last slice affected by the probe and use the Simple Algorithm procedure until we find the Green’s function for the entire block of slices affected by it, but we save the result for each intermediate step. Exactly in the same way as when computing the Green’s function for the left part of the system, when applying the idea described in paragraph 4.2.

Of course, since the Memory Algorithm is based on the same idea as the Block Algorithm, it suffers from the same litimation too: it can only be applied to a system characterized by a hard wall potential (but without limitations on the probe potential).

The best way to enter into the details of this algorithm is by using an easy example. If we assume the probe covers four slices, we will first of all calculate G_4,3 and save it, than G_4,2, and save it too, and at last we will calculate G_4,1.

First iteration: the cost so far is three combinations. We need two memory slots since we need to store G_4,3 and G_4,2.

For the next iteration the probe will move one slice to the left, so our target is to find G_5,2. This is actually a fast iteration since we already know

(13)

G_4,2, thus we just need to add G₅ to the left of it and we have what we are looking for.

Second iteration: the cost is one combination, thus the global cost is now four (since one combination costs one inversion). Since all the following probe positions will be too far for them to affect the second slice, G4,2 is of no use anymore, on the other hand we need to store nothing new, thus the memory need is still two slots, one of which contains G4,3.

The next iteration is required to find G6,3. We have G4,3 stored, and we know G6 and G5. We will proceed to compute G6,5 and store it, then we will combine it with G_4,3 in order to find G_6,3.

Third iteration: two combinations, with a global cost of six. The memory needs are unchanged, only G_6,5 is stored now.

At the following iteration, first we combine G₇ and G_6,5, finding and storing G_7,5, then we combine G₄ with it, and we find the Green’s function required by this step: G_7,4.

Fourth iteration: two additions again, global cost is eight. We are now storing only G_7,5, so our memory needs are unchanged.

The next, and “last”, iteration is for G_8,5, which is pretty straightforward to find, thanks to our stored G_7,5.

(14)

Fifth iteration: one combination, the global cost is nine. The memory is now free.

After the fifth iteration we have nothing useful in our memory, thus we are in the same situation as we were for the first iteration. From the next probe position on, all that is needed is to repeat the steps we just saw, which means we will calculate G9,8 and G9,7 and store them, and then calculate G_9,6 which is the Green’s function needed for the sixth iteration along the x axis.

None of the Green’s function of the sixth iteration can be obtained with the use of the ones we calculated in previous steps, even if we already worked with slices 6, 7 and 8. The reason is we need the Green’s function ending at slice 9 and our procedure starts from the end of the slice block. This is a really important point: the reason why we start from the left end and go backward is that, since we move the probe from right to left, blocks of slices with higher indexes are going to be used more times.

To understand this we just need to consider the result of a forward use of this procedure in the example just analyzed. During the first iteration we calculate G_2,1, then G_3,1 and finally G_4,1. However none of these is ever going to be used again, since they all start at slice 1, which will never be affected by the probe again in the following iterations. It is the same

(15)

reasoning applied during the fourth iteration. Back then we had a choice:

we could have calculated G_6,4 instead of G_7,5, but the first has no use during the fifth iteration since it can not be used to find G8,5 by the only mean of adding more slices. When the probe dimension grows, other strategies come to mind, but the “best Green’s function to store is the one relative to the block with the highest numerical index on the left” rule still provides the best results.

By applying this algorithm to bigger probes, we can see that a sequence arises. So, for a probe of length N our algorithm restarts every N + 1 iterations. During the first iteration N − 1 additions are required, the second iteration always costs only one addition, like the last iteration. Every other iteration costs two additions. This means that for every N + 1 iterations the cost is:

(N − 1) + 1 ∗ 2 + 2 ∗ (N + 1 − 3) = 3N − 3

And thus the average cost per iteration is:

Cost (M A) = 3N − 1

N + 1 + 2, (4.6)

where we added the cost for the composition of the blocks of slices unaffected by the probe, and wrote the formula in a way that lets us immediatly see that for N approaching infinity Cost (M A) approaches the value of five.

(16)

It is self-evident that this algorithm can not be used for probes shorter than three slices, and the three-slice length is peculiar, since the algorithm resets every two iteration and requires only one memory slot. As far as the cost goes, the first iteration requires two combinations, the second one only one, for a total cost of three, which leads to Cost (M A) = 3.5.

Anyway, this algorithm is quite powerful, but lacks a bit in flexibility since it can not easily handle probes of length lower than four, for which a dedicated code is required.

Dedicated code is also needed to handle the case of a longitudinal discontinuity of the transverse potential, like we did for the Block Algorithm.

However, since for the comparison we will use the cost provided in equation (4.6), we will refrain from explaining in detail the solution to the discontinuity problem. Essentially we would need to restart the Memory Algorithm on the left of the discontinuity, while using the Green’s function stored before encountering the discontinuity.

4.5.1 Storage needs of the Memory Algorithm

As far as memory space requirements are concerned, with a probe of length N , N − 2 slots are needed. However from this point of view the Memory Algorithm is very flexible. If we inspect the memory needs over a

(17)

single algorithm cycle, which means N + 1 iterations in our case of a probe of length N , we see that after the first iteration they decrease.

Memorywise we can split the algorithm into two. An ascending part, where the memory need increases, and a descending part, where it decreases.

It also happens that the ascending part perfectly coincides with the first iteration, while all the other iterations coincide with the descending part. As a reference, in Fig. 4.3 it is possible to see a scheme that helps understand this algorithm’s memory usage.

On the left of Fig. 4.3 (do not mind the line, for the moment) we can clearly see the ascent and descent part of the memory usage. We can see that every slot on the right of the last slot where we stored data contains no information, since no data has already been stored there during the ascension, and the data in those slots have no more use during the descent.

What happens if the memory needed is not available is quite easy to see just by looking again at the left part of Fig. 4.3 and comparing it with the right one: for steps that work with data in memory slots on the left of the line nothing changes, but for those steps that work with data in slots on the right we no longer have the data stored. By carefully inspecting the same image we can also deduce that every object we store in memory is directly used exactly twice, and in two consecutive steps. But they are used, directly or

(18)

2 3 4 5

6

7

8

9

10

1 1 2 3 4

5

6

9 8

7

10

Buffer of correct capacity Buffer short of 1 slot

Fig. 4.3 On the left an efficient, and easy to implement, way of handling the buffer to store the data needed by the Memory Algorithm, useful to understand different aspects of the storage needs. On the right, an illutration of what happens if the buffer is smaller than needed.

indirectly, in all the following steps until the descend cycle goes back enough to be able to save data on the buffer. The math in the end is not difficult, if we can only use N −2−M memory slots, the number of additional steps over the cycle of N + 1 iterations is 2M . And obviously for M equal to N − 2, or

(19)

rather for no memory storage at all, this algorithm essentially morphs into the Simple Algorithm.

It should be noted, by the way, that these calculations assume we fill the buffer exactly as we would do in case we had all the space we needed.

However in these cases the “best Green’s function to store is the one relative to the block with the highest numerical index on the left” rule is not enough to decide among all the possible Green’s function we can store. There are better rules than “save what you would save if you had all the space you needed” to decide what data to store, so the cost modification we just found is actually an upper bound.

Even if the strategy employed in the example is not optimal, it would be of little interest to further discuss about the optimal strategy to apply in the case of limited space, since at the present time the problem is marginal.

4.6 Metalidis-Bruno Algorithm

As we have already seen, in their paper the authors described an algorithm that works on single columns of mesh points (or rather slices of length zero), using an ideal probe that only affects one mesh point at a time. Thanks to this ideality they were able to reduce the inversion needed to find G_N,1

(20)

from the Gq,p functions to a trivial one, thus making the really time con- suming part of the problem, iterating calculations for every probe position, essentially costless.

Sadly, we can not just use the algorithm as it is, since we work in a different framework. They use coordinate representation for both x and y axis, while we use coordinate representation along x and mode representation along y. Which means that our probe is never going to be represented by a matrix with only one element different from zero, not even in the case of an ideal probe.

We can nevertheless use the idea behind the paper, to apply Dyson’s equation to the whole system, and this leads to the creation of two algorithms:

one which works on slices, and one which works on columns. Also, in case of hard wall potential, it is possible to combine them with the strategies behind the Block and the Memory Algorithms.

4.6.1 Metalidis-Bruno Algorithm by Column (MBAC)

A system made up to a single column of mesh points is characterized by only one Green’s function, which saves some multiplication but does not affect the Cost function when we calculate the addition of a column to the block of columns we have already created. Though, working on a column by

(21)

column basis obviously means a lot more work than working on a slice by slice basis, which is undoubtly a drawback of this algorithm.

Anyway, after the first steps, which essentially do the same thing ex- plained in paragraph 4.2, we need to calculate GN,1 and GN,N, and this is going to cost us one inversion (counts for GN,1 are exactly the same as in paragraph 3.3, those for GN,N follows from them). Since, as previously said, our framework is transverse eigenmodes instead of coordinates and we know that our perturbation matrix is going to be a full matrix, regardless of how many points along y are affected by the probe.

Of course we need to repeat this procedure if the probe affects more than one column, in this case the calculations from the paper need to be extended, since we need to know all the Green’s functions that will be required in the following step. From this point of view, the superscript on the Green’s functions will identify the step they refer to, where 0 means that the system is unaffected by the probe, 1 means that only one column of the system is affected by it, 2 means that two columns of the system are affected, and so on.

This is easy to understand if we look at the formula for the Green’s function of the generic step h , G^h_N,1 and G^h_N,N.

G^h_N,1 = G^h−1_N,1 + G^h−1_N,nV_n,n^tip^h

I − G^h−1n,nV_n,n^tip^h−1

G^h−1_n,₁

(22)

G^h_N,N = G⁽h − 1)N,N + G^h−1_N,nV_n,n^tip^h

I − G^h−1n,n V_n,n^tip^h−1

G^h−1_n,N

These tell us that in order to find what we want we need to know G^h−1_n,1 , G^h−1_n,n , G^h−1_n,N, G^h−1_N,₁ and G^h−1_N,N. V_n,n^tip^h is the perturbation matrix due to the effect of the probe on the isolated column h.

We have two equations and five variables, which calls for three more equations for the system to be uniquely soluble. Using the Dyson equation, and remembering that our system is a whole, completely unified, system, we find:

G^h_n,N = G^h−1_n,N + G^h−1_n,n V^tip^hG^h_n,N =

I − G^h−1n,nV^tip^h−1

G^h−1_n,N

G^h_n,n = G^h−1_n,n + G^h−1_n,n V^tip^hG^h_n,n =

I − G^h−1n,n V^tip^h−1

G^h−1_n,n

G^h_n,1 = G^h−1_n,1 + G^h−1_n,n V^tip^hG^h_n,1 =

I − G^h−1n,n V^tip^h−1

G^h−1_n,1

Only one inversion is needed to solve all of these five equations, so the cost per step is not changed.

In order to avoid confusion, we will define Pmthe number of mesh points along x affected by the probe, since we always used P to refer to the number of slices affected by the probe. Since Pm is the number of columns modified by the probe, it means it takes Pm steps to apply the entire probe to the system:

Cost(M BAC) = Pm (4.7)

(23)

We did not add the usual “+2”, due to the sum of the parts of the system unaffected by the probe, since we are not working just on the part of the system affected by it, but on the whole system.

If we want to merge this algorithm with the Memory Algorithm, in order to speed up calculations in the case of a probe with a costant potential along x, we could have troubles with the total memory needed, thus we could think of applying a Memory Algorithm strategy with reduced memory capability (lower than N − 2).

4.6.2 Metalidis-Bruno Algorithm by Slice (MBAS)

Working on slices saves a lot of calculations, so it is worth to see how the Metalidis-Bruno approach will perform when modified to hanlde slices.

This algorithm is the same as the previous, with the only difference that it operates on a different element: previously we applied the probe one column at a time, now we will apply it one slice at a time.

First thing to do is to find the equations that define the compose procedure between two slices. Thus, adopting the last paragraph notation, from Dyson’s equation:

G¹_d,a= G⁰_d,a+X

q,p

G⁰_d,qVq,pG¹_p,a

(24)

In our case d = N and a = 1, since we are applying the probe to the whole system. Our current probe slice ranges from j to k, so V q, p = 0 if the values of of p and q are different from them.

G¹_d,a = G⁰_d,a+ G⁰_d,jVj,kG¹_k,a+ G⁰_d,kVk,jG¹_j,a (4.8)

Normally at this point one or two of the terms of the right operand are zero, but this time it is not the case. Indeed G⁰_d,a is the Green’s function for the whole system not affected by the probe, so it surely is not zero, and the two other terms are not zero because we can always define a Green’s function between two points within the same system, and in this case the superscript 0 tells us that we are considering the whole device.

We need to find expressions for both G¹_k,a and G¹_j,a, thus we need to apply Dyson’s equation at least two more times:

G¹_k,a= G⁰_k,a+ G⁰_k,jVj,kG¹_k,a+ G⁰_k,kVk,jG¹_j,a (4.9)

G¹_j,a= G⁰_j,a+ G⁰_j,jVj,kG¹_k,a+ G⁰_j,kVk,jG¹_j,a (4.10)

Equations (4.8), (4.9) and (4.10) have a total of three variables, G¹_d,a, G¹_k,a and G¹_j,a, so the system is soluble. We will proceed to apply (4.10) to (4.9), and after some manipulation we find:

G¹_k,a = ΓaG⁰_k,a+ G⁰_k,kVk,jβaG⁰_j,a

(4.11)

(25)

where we defined:

Γa≡I − G⁰k,jVj,k− G⁰k,kVk,jβG⁰_j,jVj,k

−1

(4.12)

βa ≡I − G⁰j,kVk,j

−1

(4.13)

As we can see from (4.12) and (4.13), we needed to compute two inversion just to find G¹_k,a, luckily we need no more inversions to compute G¹_j,a and G¹_d,a. By straightforward substitution:

G¹_j,a = βaG⁰_j,a+ G⁰_j,jVj,kΓaG⁰_k,a+ G⁰_k,kVk,jβaG⁰_j,a

(4.14) G¹_d,a= G⁰_d,a+ G⁰_d,jVj,kΓaG⁰_k,a+ G⁰_k,kVk,jβaG⁰_j,a

+ G⁰_d,kVk,jβaG⁰_j,a+ G⁰_j,jVj,kΓaG⁰_k,a+ G⁰_k,kVk,jβaG⁰_j,a

(4.15)

We should check what Green’s functions of superscript 0 we used, since we will need to use their corresponding functions of superscript 1 as the known data in the calculations for the following step.

We used G⁰_d,a, G⁰_d,k, G⁰_d,j, G⁰_k,a, G⁰_k,k, G⁰_k,j, G⁰_j,a, G⁰_j,j. We used eight Green’s functions to compute three of them for the next step, with a cost of two inversions. And we still have to account for a ninth Green’s function, G⁰_d,d, which we well know is needed for our purpose of computing the transmission and reflection probabilities.

Let us apply Dyson’s equation again, to G⁰_d,d this time:

G¹_d,d = G⁰_d,d+ G⁰_d,jVj,kG¹_k,d+ G⁰_d,kVk,jG¹_j,d (4.16)

(26)

As we can see, we now need one new Green’s function of the step 0, G⁰_d,d. We also need, in order to find an explicit expression for G¹_d,d, to compute two other Green’s functions but they do not further increase the computational burden because the involved computational cost was included in that for G¹_d,a.

G¹_k,d = G⁰_k,d+ G⁰_k,jVj,kG¹_k,d+ G⁰_k,kVk,jG¹_j,d (4.17)

G¹_j,d = G⁰_j,d+ G⁰_j,jVj,kG¹_k,d+ G⁰_j,kVk,jG¹_j,d (4.18)

These last two equations do not require us to compute new Green’s function for step 0 or step 1.

From a formal view point, equations (4.16), (4.17) and (4.18) are exactly the same as (4.8), (4.9) and (4.10). Thus, even if the actual calculations are different, the formal calculation are identical and we can immediatly write the explicit expressions for them, by using respectively equations (4.15), (4.11) and (4.14). Since the equations are the same, they just relate different variables, it is strightforward we will need to introduce βd and Γd (we essentially switched the right subscript from a to d to pass from one set of equations to the other), but due to their particular form, we can immediatly see that

βa = βd ≡ β, Γa= Γd ≡ Γ

(27)

What is left is to compute G⁰_k,k, G⁰_k,j and G⁰_j,j, altough we will choose to compute directly G⁰_j,k instead of G⁰_k,j, obtaining the second one by transposition. By doing so, when computing G⁰_j,k and G⁰_k,k we will be able to use the same β and Γ once again. We will than be able to find G⁰_j,j by direct substitution and by using β^T (thanks to the fact that the order in which we apply transposition and inversion is irrelevant to the solution).

We are now able to perform this procedure recursively, at the cost of two inversions per step (and storage for nine Green’s functions). Since a probe N slice long requires N steps:

Cost (M BAS) = 2N (4.19)

4.7 Exploiting repetitions

First thing to do in order to write a clean, understandable, and optimized program is to organize the data storage. Of course, which data need to be stored, and how, depends on how the algorithm works, and from trade-offs between execution cost (time) and memory needed for data storage.

Since space is much less of an issue than time, the usual approach should be to sacrifice space in order to get shorter execution times. Without for- getting to keep an eye on how to prevent the memory requirements from running wild.

(28)

Regardless of the algorithm chosen, the program works by analyzing the system by slices (which we called columns, in case of unitary length), and adding the new one to those previously examined. In order to do so, the program needs to find eigenvectors and eigenvalues for the slice, and the only input needed for such computations is the slice potential profile. These calculations are quite expensive, and their size depends on the number of mesh points along y rather than the number of modes to be accounted for (which is the size of the matrices we work within the algorithms analyzed in this chapter, and which determines the computational cost). Usually there are far more points along y than modes active in a system, because a good approximation to the continuoum energy spectrum is desired.

Storing the result for every examined potential profile can save a good amount of calculations, depending on the system and the probe. For example, an ideal quantum point contact can be defined with two types of transverse potential profiles (the one for the leads and the one for the constriction), and if we examine the system with a symmetrical probe with a width of 21 slices, we can see how the program only needs to find eigenvalues and eigenvectors 22 times for every mesh point along y. One for every possible combination of probe and system slice, and two times in total for the case of a system unaffected by the probe. Without storing these data such calculations need

(29)

to be performed a total of 21 times for every position of the probe, plus three times to find eigenvalues and eigenvectors for the system when it is unaffected by the probe.

We can use an example to understand how much time saving this can result into: if we want to apply the probe on 100 different mesh points along the x axis, by storing the result of 22 calculations we are going to decrease the number of times we need to perform such calculations by two orders of magnitude. Increasing the number of mesh points along y makes the time spent on such calculations grow fast, while the time spent on Green’s function calculations remains costant. This means that the time of an iteration, in such cases, is defined by the time needed to calculate eigenvalues and eigenvectors. By reducing it by a factor which is of the order of 100 or 1000, we can see how the speed of our program will be barely affected by them.

Indeed if the Green’s function calculations per iteration cost 2 seconds and an iteration over y involves 200 iterations over x, we will go from 12 seconds per iteration to an average of 2.05 seconds per iteration.

Even with a non symmetrical probe the results are still good. As long as we only use hard wall potentials to describe the device (without instead specific requirements for the description of the probe), we can safely use this method and have significant gains. The amount of memory to store the data

(30)

needed is directly proportional to the number of slices and the square of the number of modes, which means that, with the RAM in every computer and graphic card being of the order of gigabytes, it is not going to be a problem at all.

Problem is that, if we consider a non ideal system with softwall potentials, there are hardly two slices whith the same potential profile.