Università degli Studi di Pisa
Dipartimento di Fisica
Corso di Laurea Magistrale in Fisica

JAGP and JLGVB: two ansatzes for the study of the electronic wave function of strongly correlated systems

Candidate

Antonella Meninno

Supervisors

Claudio Amovilli

Sandro Sorella

A.A. 2017-2018


Contents

Introduction

1 Stochastic processes and Quantum Monte Carlo methods
  1.1 Stochastic processes
    1.1.1 Direct sampling and evaluation of integrals
    1.1.2 Markov chains
    1.1.3 Converging to a solution: the Metropolis algorithm
  1.2 Quantum Monte Carlo methods
    1.2.1 Variational Monte Carlo
      Quantum averages and statistical samplings
      Optimization of variational wave functions
    1.2.2 Fixed-node diffusion Monte Carlo
      Diffusion Monte Carlo: Green functions
      Fixed node approximation
  1.3 Estimation of errors

2 Post Hartree-Fock methods
  2.1 Hartree-Fock method
    2.1.1 The Hartree-Fock equation
    2.1.2 Interpretation of the solutions: Koopmans' theorem
  2.2 Beyond Hartree-Fock
    2.2.1 Configuration interaction method
      Excited determinants
      Matrix elements between determinants
    2.2.2 Density functional theory method
    2.2.3 Other methods
      Coupled Cluster (CC)
      Perturbative methods: Møller-Plesset theory

3 The trial wave function
  3.1 Basis choice
    3.1.1 Atomic orbitals
    3.1.2 Molecular orbitals
    3.1.3 Many electron basis set: configuration state functions
  3.2 AGP and JAGP wave functions
    3.2.1 The antisymmetric wave function
    3.2.2 Cusp conditions
    3.2.4 Size consistency
  3.3 GVB and LGVB wave functions
    3.3.1 Generalized Valence Bond
    3.3.2 Linear scaling GVB
    3.3.3 The JLGVB Jastrow factor

4 H4 and chains of hydrogen atoms
  4.1 Software packages
    4.1.1 TurboRVB
    4.1.2 CHAMP
    4.1.3 Gamess
    4.1.4 Gaussian and Molpro
  4.2 H4 molecule
    4.2.1 The system
    4.2.2 Literature
    4.2.3 Calculations
      Single determinant
      Multi determinant
    4.2.4 Results
  4.3 Linear chains of hydrogen atoms
    4.3.1 The system
    4.3.3 Calculations
    4.3.4 Results

Conclusion

Units

A Restricted Hartree-Fock: solutions
  A.1 Roothaan's equation
  A.2 Preliminaries for the solution
  A.3 Iterative solution: self-consistent-field method


Introduction

The object of this thesis is the study of electronic wave functions in conditions of strong electron interaction. The accurate calculation of the electronic energy of any molecular system depends strongly on the form of the wave function, and this is still an open problem. The main challenge is to achieve linear scaling of the computational time with the size of the molecular system, without any loss of accuracy. We are currently observing the development of a growing number of new computational techniques, based on the state of the art of technological resources (as a leading example, the advent of computing clusters and parallel programming).

One more point to remark is that, nowadays, the experimental, computational and theoretical aspects proceed almost in step, influencing each other. This is especially true in the study of systems of chemical and physical interest (as in this thesis) and in the development of new materials and advanced electronics.

We have decided to apply our study of wave functions to model systems that have already been studied in the literature with different methods, and that can therefore give us a good estimate of the accuracy of our calculations. Testing methodologies on benchmark systems is a widespread practice in the scientific community, both for validating new methodologies and for confirming results already obtained. Among the most used computational methods are coupled cluster (CC) and density functional theory (DFT); for small systems, CC is still the most advantageous in terms of the ratio between accuracy and computational cost. Unfortunately, methods like CC, which are accurate for small electronic systems, rapidly become computationally prohibitive as the number of electrons grows. In this thesis we have decided to use quantum Monte Carlo (QMC) methods. While for small systems they give results that are comparable to the other methods, for systems with a large number of electrons they have a great advantage in both computational cost and accuracy. This could lead to remarkable achievements in the study of large molecules, both from a theoretical point of view and from the point of view of practical applications to real systems, directly helping the development of technology.

Monte Carlo methods also offer great versatility. These techniques are appropriate for describing lattices, open systems, large molecules and molecular clusters. With these methods, we are able to compute a significant part of the correlation energy as a function of the interatomic distances.

In particular, in this thesis we have decided to use two kinds of trial wave functions for quantum Monte Carlo. First, we use a wave function that can be described by a single pairing function (JAGP), studied for instance in [Casula and Sorella, 2003], and then a multideterminantal one (JLGVB), developed in [Fracchia et al., 2012], with the Jastrow factor from [Filippi and Umrigar, 1996]. The Jastrow factor is an important focus of our analysis, as we have studied its effect and importance (especially for single determinant wave functions, where the Jastrow factor lets us give good estimates of the correlation energy even with a small basis set). Whenever possible, we compare the results of the two different approaches: in principle, multideterminantal approaches should be computationally heavier than single determinant approaches, but should give more accurate results. We will see how the Jastrow factor allows us to give good energy estimates with both wave functions.

We will study two different physical systems, a planar H4 molecule and linear chains of hydrogen atoms. We will compare the results of our analysis of the H4 molecule with those of [Gasperich et al., 2017], and the energies that we obtain for the linear chains with the analysis given in [Motta et al., 2017].

The results of our study are promising: we obtain very realistic estimates of the correlation energy, which confirm (and in some cases improve on) the results published in the literature. Owing to the great flexibility of our approaches, we believe that great progress can be made in the application to larger systems.


1. In the first chapter, we describe the QMC methods. We will discuss stochastic processes in general and the problem of sampling, and then we will introduce the two Monte Carlo methods that we use, Variational Monte Carlo (VMC) and (fixed-node) Diffusion Monte Carlo (DMC). We will compare these two methods and describe their main features, concluding with the estimation of errors.

2. In the second chapter, we discuss the Hartree-Fock method, which is central in describing systems of many electrons. We will discuss the main features of the method, and then we will look at the methods that go beyond the Hartree-Fock approximation by considering excited determinants and their linear combinations. In this chapter we also introduce Density Functional Theory (DFT) in the Kohn-Sham approach. We mainly focus on the methods that we use in this thesis.

3. In the third chapter, we discuss the construction of the trial wave functions. This is a very important chapter, in which we lay the foundations of the present work. One section is dedicated to the JAGP wave function, and the other to the JLGVB wave function. In both sections, we describe the appropriate Jastrow factors that we use to impose the cusp conditions.

4. In the fourth chapter, we present the results of our analysis. We first give a brief presentation of the programs that we use, and then present the two physical systems that we study. For both, we describe the physical system, give a literature overview, discuss the methods that we apply and then outline the results (that are further discussed in the conclusions).

5. The appendices cover important topics that are not fundamental to the core of the discussion. In one appendix we discuss pseudopotentials and their advantages in improving calculations by not considering the dynamics of the core electrons and substituting them with an effective potential. In the other, a method for explicitly solving the Hartree-Fock problem is presented, through Roothaan's equations.


Chapter 1

Stochastic processes and Quantum Monte Carlo methods

Quantum Monte Carlo (QMC) is an approach that is naturally suited to the new generations of computing architectures, thanks to its parallel algorithms based on a large number of independent processes used in the stochastic evaluation of many-electron integrals [Attaccalite, 2005].

The focus of this work is on QMC methods whose purpose is to describe the many-body wave function by means of a suitable trial wave function. By using the variational principle, these methods aim to find the best approximate solution of the Schrödinger equation, using a trial wave function as an ansatz for the true Hamiltonian ground state, in systems in which the ground state is non-degenerate. The wave function for a system of N electrons is evaluated in the full 3N-dimensional configuration space generated by the electronic coordinates [Attaccalite, 2005].

In order to properly introduce the QMC methods, we must first discuss stochastic processes, which are used to stochastically evaluate correlation functions defined over our chosen trial function.

This chapter is structured as follows: after an introduction to sampling techniques, we discuss the two techniques that we used in this thesis, namely variational and diffusion Monte Carlo, and then we conclude with a discussion of methods to estimate the statistical errors within the QMC approach.

1.1 Stochastic processes

QMC methods are based on the stochastic sampling of an (in principle) infinite dimensional Hilbert space, by means of a large number of configurations, described for a system of N particles by a 3N-dimensional vector $\vec{x}$. In this section, by way of example, we will see how to generate configurations for a system of one particle in one dimension, with the configuration described by a real number x. The processes that we describe are iterative, so we will often use an index, as in $x_i$, to indicate the configuration at the i-th iteration. We will still refer to 3N-dimensional vectors using an arrow, as $\vec{x}$.

Monte Carlo methods are based on the iterative generation of a large number of configurations $x_i$, to be used to obtain numerical approximations of integrals. The idea is to generate random samples at which we evaluate the integrand, approximating the exact result. As we need to generate a large set of points, this sampling is only feasible thanks to the improvements in computational power.

Computers cannot produce truly random numbers: they can only produce pseudo-random numbers, because they use a deterministic (even if very complicated) algorithm.

We give a first, very basic example of sampling [Becca and Sorella, 2017] that produces a sequence $r_i$ of numbers in [0, 1). The idea is to choose positive integers a and b and generate the first element $x_1$ randomly using a given seed. Then the chain is built as
$$x_{i+1} = (a x_i + b) \bmod M,$$
where $M = 2^{31} - 1$ for a 32-bit generator and a and b are coprime. A real number in [0, 1) is then obtained by taking $r_i = x_i / M$. Starting from a given seed, the full chain of pseudo-random numbers is iteratively generated. The main problem is that the sequence can have a short period compared to M.
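As an illustration, a minimal sketch of such a linear congruential generator in Python (the constants a and b below are arbitrary illustrative choices, not those of any production generator):

```python
def lcg(seed, a=1103515245, b=12345, M=2**31 - 1):
    """Linear congruential generator: yields pseudo-random floats in [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % M   # deterministic update of the integer state
        yield x / M           # map to [0, 1)

gen = lcg(seed=42)
print([next(gen) for _ in range(5)])
```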


1.1.1 Direct sampling and evaluation of integrals

Let us give an example in which the direct sampling method can be useful. Suppose that we want to compute the integral
$$\int_a^b F(x)\,dx, \tag{1.1.1}$$
where F(x) is a function peaked around a given point in [a, b]. We define a probability density P(x) (i.e. $P(x) \geq 0$ and $\int_a^b P(x)\,dx = 1$) on [a, b] close to F(x)*, so we have:
$$I = \int_a^b \frac{F(x)}{P(x)}\,P(x)\,dx. \tag{1.1.2}$$
We say that we can sample P(x) directly if we have a procedure to extract a sequence of numbers $x_i$ distributed according to P(x). Using them, we can evaluate the integral as (M is the number of sampled points)
$$I \approx \frac{1}{M}\sum_{i=1}^{M} \frac{F(x_i)}{P(x_i)} \pm \frac{\sigma}{\sqrt{M}}, \tag{1.1.3}$$
where σ can be estimated by
$$\sigma^2 = \frac{1}{M}\sum_{i=1}^{M} \left(\frac{F(x_i)}{P(x_i)}\right)^2 - \left(\frac{1}{M}\sum_{i=1}^{M} \frac{F(x_i)}{P(x_i)}\right)^2. \tag{1.1.4}$$

Let us point out some crucial points:

• P(x) must be chosen close enough to F(x) to reduce the statistical fluctuations;

• $\sigma^2 = 0$ when $P(x) \propto F(x)$: this means that there are no statistical fluctuations.

Direct sampling is unfortunately not possible in most practical applications, as we need an explicit procedure to extract numbers distributed according to the chosen probability density.

* The meaning of "close to" will be clear when we compute the variance: for now, it suffices to choose P(x) with a shape similar to that of F(x).
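As an illustration, a minimal sketch of this estimator in Python, assuming for simplicity an integrand peaked at the origin and a Gaussian P(x) that we can sample exactly (here the integral runs over the whole real line rather than [a, b]):

```python
import numpy as np

rng = np.random.default_rng(0)

def direct_sampling_integral(F, P, sample_P, M=100_000):
    """Estimate I = integral of F by averaging F(x_i)/P(x_i) over x_i ~ P, eq. (1.1.3)."""
    x = sample_P(M)
    w = F(x) / P(x)
    return w.mean(), w.std() / np.sqrt(M)   # estimate and statistical error

# F is peaked at the origin; P is a standard normal density we can sample exactly.
F = lambda x: np.exp(-x**2)                            # exact integral: sqrt(pi)
P = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
sample_P = lambda M: rng.normal(size=M)

I, err = direct_sampling_integral(F, P, sample_P)
print(I, "+/-", err)   # compare with np.sqrt(np.pi) ~ 1.7725
```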


1.1.2 Markov chains

A Markov chain is an example of a (pseudo-)random process that we can use to sample a probability distribution. It is a chain of real numbers $x_n$ generated using a function G and a set of pseudo-random numbers indicated by $\xi_n$: we further assume that the configuration at time n+1 is entirely determined by $x_n$, $\xi_n$ and G only, so that we can write
$$x_{n+1} = G(x_n, \xi_n). \tag{1.1.5}$$
A typical example of a Markov chain is the famous random walk. In this example, $\xi_n$ assumes the values ±1, and the position $x_{n+1}$ is given by $x_{n+1} = G(x_n, \xi_n) = x_n + \xi_n$. Important quantities in this process are the probability distributions of the various $x_n$, called $P_n$, and the conditional probability $\omega(x|x')$ of obtaining x when starting from x'. We want the sequence of probabilities $P_n$ to converge to the desired distribution $P_{eq}$: in order to do that, we do not automatically accept every move, but we must control the outcome. A method for doing this is provided by the Metropolis algorithm.

The Metropolis algorithm starts from a configuration and makes a trial move to a new configuration chosen from some probability density function. The trial move is then accepted with a certain probability (that we will specify shortly). If the trial move is accepted, the proposed point becomes the next point in the sequence; if the trial move is rejected, the old point becomes the next point in the sequence. The details of this algorithm are explained in the next subsection.

1.1.3 Converging to a solution: the Metropolis algorithm

The main question is whether $P_n(x)$ converges to the equilibrium distribution $P_{eq}(x)$ or not. Following [Becca and Sorella, 2017], we describe a way to know if equilibrium will be reached and a practical algorithm for building the acceptance in such a way that convergence is obtained.


A sufficient condition for convergence is that the transition probability satisfies the so-called detailed balance condition:

$$\omega(x'|x)\,P_{eq}(x) = \omega(x|x')\,P_{eq}(x'). \tag{1.1.6}$$

Through this condition, if at some time we have $P_n(x) = P_{eq}(x)$, at the next time we will have (using the normalization condition $\sum_{x'} \omega(x'|x) = 1$)
$$P_{n+1}(x) = \sum_{x'} \omega(x|x')\,P_{eq}(x') = P_{eq}(x) \sum_{x'} \omega(x'|x) = P_{eq}(x). \tag{1.1.7}$$

We now turn to the problem of building an $\omega(x'|x)$ satisfying detailed balance and letting us converge to a specified $P_{eq}(x)$ by specifying the acceptance.

As a transition from x to x' happens when the move is proposed and accepted, we can write the transition probability as
$$\omega(x'|x) = T(x'|x)\,A(x'|x), \tag{1.1.8}$$
where $T(x'|x)$ is a trial probability, representing the proposal of the move from x to x', and $A(x'|x)$ is the acceptance probability. Here, we assume $T(x'|x)$ symmetric, i.e. $T(x'|x) = T(x|x')$.

In the Metropolis algorithm, we define $A(x'|x)$ as
$$A(x'|x) = \min\left\{1, \frac{P_{eq}(x')\,T(x|x')}{P_{eq}(x)\,T(x'|x)}\right\} = \min\left\{1, \frac{P_{eq}(x')}{P_{eq}(x)}\right\}. \tag{1.1.9}$$

With this acceptance, we can show that we reach the equilibrium probability. In fact, if $P_{eq}(x') > P_{eq}(x)$, then $A(x'|x) = 1$ and $A(x|x') = P_{eq}(x)/P_{eq}(x')$, so
$$\omega(x|x')\,P_{eq}(x') = T(x|x')\,A(x|x')\,P_{eq}(x') = T(x'|x)\,A(x'|x)\,P_{eq}(x) = \omega(x'|x)\,P_{eq}(x). \tag{1.1.10}$$
This means that detailed balance holds, so after many iterations we generate numbers distributed according to $P_{eq}(x)$.

Thus, the Markov iteration is defined as follows:

• a move is proposed by generating a configuration x' according to the transition probability $T(x'|x_n)$;

• the move is accepted, and the new configuration $x_{n+1}$ is taken to be equal to x', if a random number η, uniformly distributed in [0, 1), is such that $\eta < A(x'|x_n)$; otherwise the move is rejected and one keeps $x_{n+1} = x_n$.
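As an illustration, a minimal sketch of this iteration in Python, assuming a symmetric Gaussian trial move and a target density known only up to normalization:

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis(p_eq, x0, n_steps, step=1.0):
    """Sample p_eq (known up to normalization) using symmetric Gaussian moves."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)          # proposal T(x'|x), symmetric
        if rng.uniform() < min(1.0, p_eq(x_new) / p_eq(x)):
            x = x_new                               # accept with probability A(x'|x)
        chain.append(x)                             # on rejection we keep the old x
    return np.array(chain)

# Example: sample an unnormalized Gaussian and check its moments.
chain = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_steps=20_000)
print(chain.mean(), chain.var())    # ~0 and ~1 after equilibration
```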


Figure 1.1: A flowchart describing the variational Monte Carlo method. In the last arrow, the result is printed after a certain number of repetitions of the scheme (fixed at the start of the algorithm) is performed.

1.2 Quantum Monte Carlo methods

1.2.1 Variational Monte Carlo

We now describe the Quantum Monte Carlo methods that we use to evaluate the integrals appearing in the calculation of physical expectation values.


We consider the integral over the electronic coordinates (we denote the variables collectively as dR)† that gives the average of the energy,
$$\langle H \rangle = \frac{\int \psi^*(R)\,H\,\psi(R)\,dR}{\int \psi^*(R)\,\psi(R)\,dR}. \tag{1.2.1}$$

We can define a probability density
$$P(R) = \frac{|\psi(R)|^2}{\int |\psi(R)|^2\,dR}, \tag{1.2.2}$$
and use the Metropolis algorithm described in the last section to sample a set of points from P(R).

Variational Monte Carlo is based on the variational principle, which is applicable to systems in which the energy levels are bounded from below: this is always the case in condensed matter physics.

The expectation value of the energy on a trial wave function must be greater than or equal to the true ground state energy. This is shown by expanding a normalized trial wave function $\psi_T$ in terms of the exact eigenstates of the Hamiltonian, called $\psi_i$,
$$\psi_T = \sum_i c_i \psi_i, \tag{1.2.3}$$
where the complex coefficients $c_i$ are normalized, i.e.
$$\sum_i |c_i|^2 = 1. \tag{1.2.4}$$
We can evaluate the expectation value of the energy: calling $E_i$ the eigenvalues of the energy, with $E_0$ the ground state (so $E_i \geq E_0$), we have
$$\langle\psi_T| H |\psi_T\rangle = \sum_i \sum_j c_j^* c_i \langle\psi_j| H |\psi_i\rangle = \sum_i |c_i|^2 E_i \geq E_0. \tag{1.2.5}$$

This means that when we use a trial wave function for computing the energy expectation value we always overestimate the ground state energy.

† We will neglect the spin coordinates in this chapter, but we will reinclude them in the next chapter. Spin products are usually trivial, and they are executed by hand before evaluating the spatial integrals; they are not relevant quantities for the Monte Carlo analysis. Unfortunately, one of the Jastrow factors we use is not symmetric, and in that case the spin integrals are not so trivial.


We choose to calculate this average with quantum Monte Carlo methods, in particular variational Monte Carlo (VMC). Through parallel computation, this method is convenient for computing the binding energy of a molecule to chemical accuracy (i.e. 1 kcal per mole, which is about 0.04 eV) [Foulkes et al., 2001].

We want to calculate the expectation values integrating over a 3N-dimensional space. We indicate with R all the spatial coordinates of our N electrons. VMC is based on a trial wave function $\psi_T$ that must be a good approximation of the true ground state wave function. In order to use the VMC method, $\psi_T$ and $\nabla\psi_T$ must be continuous, the integrals $\int \psi_T^* \psi_T$ and $\int \psi_T^* H \psi_T$ must exist, and the variance must be finite. We want to compute
$$E_0^{VMC} = \frac{\int \psi_T^*(R)\,H\,\psi_T(R)\,dR}{\int \psi_T^*(R)\,\psi_T(R)\,dR} = \frac{\int |\psi_T(R)|^2\,[\psi_T^{-1}(R)\,H\,\psi_T(R)]\,dR}{\int |\psi_T(R)|^2\,dR} \geq E_0. \tag{1.2.6}$$
We define
$$P(R) = \frac{|\psi_T(R)|^2}{\int |\psi_T(R)|^2\,dR}: \tag{1.2.7}$$

this is the probability density that we use in the Metropolis algorithm to sample a set of points $R_m$. It is fundamental that $P(R) \geq 0$ and $\int P(R)\,dR = 1$: we will have to ensure that those two properties hold. Then we evaluate the so-called local energy
$$E_L(R) = \psi_T^{-1}(R)\,H\,\psi_T(R) \tag{1.2.8}$$
at every sampled point and calculate the average value as
$$\langle E_L \rangle \approx \frac{1}{M} \sum_{m=1}^{M} E_L(R_m), \tag{1.2.9}$$

where M is the number of sampled points and the $R_m$ are called walkers. The value of the integral is then
$$E_0^{VMC} = \lim_{M \to +\infty} \frac{1}{M} \sum_{m=1}^{M} E_L(R_m), \tag{1.2.10}$$


and the variance is
$$\sigma_M^2 \approx \frac{1}{M(M-1)} \sum_{m=1}^{M} (E_L(R_m) - \langle E_L \rangle)^2. \tag{1.2.11}$$

We sample the probability P(R) using the Markov chain defined by the Metropolis algorithm.

We work under the hypotheses that each transition R → R' is reversible, that the stationary distribution exists, and that ergodicity is guaranteed. If we call T(R' ← R) the probability density for the walker to move to the new position, then the acceptance is
$$A(R' \leftarrow R) = \min\left\{1, \frac{T(R \leftarrow R')\,P(R')}{T(R' \leftarrow R)\,P(R)}\right\}. \tag{1.2.12}$$

Let $P_{eq}(R)$ be the equilibrium walker density. We must balance the move R → R' against the move R' → R: referring to section 1.1.3, this reversibility implies the detailed balance condition
$$A(R' \leftarrow R)\,T(R' \leftarrow R)\,P_{eq}(R) = A(R \leftarrow R')\,T(R \leftarrow R')\,P_{eq}(R'), \tag{1.2.13}$$
and the equilibrium distribution $P_{eq}(R)$ satisfies
$$\frac{P_{eq}(R)}{P_{eq}(R')} = \frac{A(R \leftarrow R')\,T(R \leftarrow R')}{A(R' \leftarrow R)\,T(R' \leftarrow R)}. \tag{1.2.14}$$
We consider
$$\frac{A(R \leftarrow R')}{A(R' \leftarrow R)} = \frac{T(R' \leftarrow R)\,P(R)}{T(R \leftarrow R')\,P(R')}, \tag{1.2.15}$$
and so, using (1.2.14), we have
$$\frac{P_{eq}(R)}{P_{eq}(R')} = \frac{P(R)}{P(R')}. \tag{1.2.16}$$
This means that $P_{eq}(R)$ is proportional to P(R), as required.

In this way, we can use the acceptance-rejection algorithm to generate configurations distributed according to $P_{eq}(R)$. To this purpose, we generate a random number in [0, 1): if the number is less than or equal to the acceptance ratio in (1.2.12), we accept the move; otherwise the move is rejected. In this way we generate a Markov chain that samples our probability distribution.
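As a minimal sketch, a VMC estimate for a single particle in one dimension, assuming a harmonic oscillator $H = -\frac{1}{2}\frac{d^2}{dx^2} + \frac{1}{2}x^2$ and the trial function $\psi_T(x) = e^{-\alpha x^2}$, for which the local energy (1.2.8) works out to $E_L(x) = \alpha + x^2(1/2 - 2\alpha^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)

def vmc_energy(alpha, n_steps=50_000, step=1.0):
    """VMC estimate of <E_L> for psi_T(x) = exp(-alpha x^2) and
    H = -1/2 d^2/dx^2 + 1/2 x^2 (1D harmonic oscillator)."""
    log_p = lambda x: -2.0 * alpha * x**2                    # log |psi_T(x)|^2
    e_loc = lambda x: alpha + x**2 * (0.5 - 2.0 * alpha**2)  # local energy, eq. (1.2.8)
    x, energies = 0.0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)
        if rng.uniform() < np.exp(min(0.0, log_p(x_new) - log_p(x))):
            x = x_new                                        # Metropolis step
        energies.append(e_loc(x))
    e = np.array(energies)
    return e.mean(), e.std() / np.sqrt(len(e))   # naive error bar; see Sec. 1.3

print(vmc_energy(0.4))   # above the exact ground state energy 0.5
print(vmc_energy(0.5))   # exact trial function: E_L is constant, zero variance
```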


Quantum averages and statistical samplings

The main limitation of this technique is that it heavily relies on an ansatz, as we have to choose the form of the wave function and express it in terms of a set of parameters. With a good ansatz, we can approach a vast class of problems. We now describe how to use the Variational Monte Carlo approach to obtain the optimal set of parameters (and thus the wave function of the ground state).

The first step is to fix a basis set $\{|R\rangle\}$ such that
$$\sum_R |R\rangle\langle R| = 1, \tag{1.2.17}$$
so the quantum states can be written as
$$|\psi\rangle = \sum_R |R\rangle\langle R|\psi\rangle = \sum_R \psi(R)\,|R\rangle. \tag{1.2.18}$$

Then, if we have an operator O, we can write
$$\langle O \rangle = \frac{\langle\psi| O |\psi\rangle}{\langle\psi|\psi\rangle} = \frac{\sum_R \langle\psi|R\rangle\langle R| O |\psi\rangle}{\sum_R \langle\psi|R\rangle\langle R|\psi\rangle}. \tag{1.2.19}$$
It is impossible to perform an exact enumeration of the configurations $|R\rangle$ to compute $\langle O \rangle$ exactly, so we define a probability P(R) and write
$$\langle O \rangle = \frac{\sum_R |\langle\psi|R\rangle|^2\,\frac{\langle R|O|\psi\rangle}{\langle R|\psi\rangle}}{\sum_R |\langle\psi|R\rangle|^2}, \tag{1.2.20}$$
$$P(R) = \frac{|\psi(R)|^2}{\sum_R |\psi(R)|^2}. \tag{1.2.21}$$
P(R) has the properties
$$P(R) \geq 0, \qquad \sum_R P(R) = 1. \tag{1.2.22}$$

We can define a local value of the operator O,
$$O_L(R) = \frac{\langle R| O |\psi\rangle}{\langle R|\psi\rangle}, \tag{1.2.23}$$


and, through a Markov process, a new set $\{R_n\}$ is generated by sampling from P(R). Now, we can rewrite the expectation value of the operator O as
$$\langle O \rangle \approx \frac{1}{N} \sum_{n=1}^{N} O_L(R_n), \tag{1.2.24}$$
where $O_L$ (defined in (1.2.23)) is a generalization of the local energy $E_L$.

The main point of the variational Monte Carlo approach is the computation of $\langle R|\psi\rangle$, which is the value of the variational state on a generic element of the basis set.

Optimization of variational wave functions

As we discussed before, the wave function plays a crucial role in the VMC method. Our aim is to optimize this wave function using a set of parameters $\alpha_k$ (that we collectively denote as {α}). We have two possibilities:

1. we can minimize the energy
$$E_0^{VMC}(\{\alpha\}) = \frac{\int \psi_T^2(R, \{\alpha\})\,E_L(R, \{\alpha\})\,dR}{\int \psi_T^2(R, \{\alpha\})\,dR}, \tag{1.2.25}$$

2. or we can minimize the energy variance
$$\sigma^2(\{\alpha\}) = \frac{\int \psi_T^2(R, \{\alpha\})\,[E_L(R, \{\alpha\}) - E_0^{VMC}(\{\alpha\})]^2\,dR}{\int \psi_T^2(R, \{\alpha\})\,dR}. \tag{1.2.26}$$

A first proposal was based on so-called correlated sampling. We can in general choose to optimize the energy or the variance of the energy. For the variance we have a lower bound: it is zero if $\psi_T(\{\alpha\})$ is an exact eigenstate of the Hamiltonian. The technique of using the variational parameters {α} to minimize the variance and obtain the trial wave function was introduced in [Umrigar et al., 1988], and explored in [Umrigar et al., 2007] and [Filippi and Umrigar, 1996]: in this approach, energy optimization is less stable than variance optimization. We describe the method following [Becca and Sorella, 2017]. Generally, we can write $\sigma^2(\{\alpha\})$ by averaging over a reference wave function at given variational parameters $\{\alpha_0\}$:
$$\sigma^2(\{\alpha\}) = \frac{\int \psi_T^2(R, \{\alpha_0\})\,w(\{\alpha\})\,[E_L(R, \{\alpha\}) - E_0^{VMC}(\{\alpha\})]^2\,dR}{\int \psi_T^2(R, \{\alpha_0\})\,w(\{\alpha\})\,dR}, \tag{1.2.27}$$
where
$$w(\{\alpha\}) = \frac{\psi_T^2(R, \{\alpha\})}{\psi_T^2(R, \{\alpha_0\})}. \tag{1.2.28}$$
In order to minimize $\sigma^2(\{\alpha\})$, we average it over a set of M configurations $R_m$ sampled from $\psi_T^2(R, \{\alpha_0\})$ and arrive at
$$\sigma^2(\{\alpha\}) \approx \frac{\sum_m^M w(R_m, \{\alpha\})\,[E_L(R_m, \{\alpha\}) - E_0^{VMC}(\{\alpha\})]^2}{\sum_m^M w(R_m, \{\alpha\})}. \tag{1.2.29}$$

When the variational parameters are such that $\psi_T(R, \{\alpha\})$ is an eigenstate of the Hamiltonian, then $\sigma^2(\{\alpha\}) = 0$ [Foulkes et al., 2001]. The aim is then to find the values of the variational parameters that give the lowest possible variance: we can find them using any numerical minimization method. With these values, we obtain the optimized wave function and, from it, all observables of interest.
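A sketch of the reweighted estimate (1.2.29), assuming samples drawn from $\psi_T^2$ at the reference parameters and routines for $\psi_T^2$ and $E_L$ (named psi2 and e_loc below; both hypothetical placeholders):

```python
import numpy as np

def reweighted_variance(x, psi2, e_loc, alpha, alpha0):
    """Estimate sigma^2(alpha) from samples x drawn from psi_T^2(., alpha0),
    using the weights w = psi_T^2(x, alpha) / psi_T^2(x, alpha0), eq. (1.2.29)."""
    w = psi2(x, alpha) / psi2(x, alpha0)
    e = e_loc(x, alpha)
    e_mean = np.sum(w * e) / np.sum(w)             # reweighted energy average
    return np.sum(w * (e - e_mean) ** 2) / np.sum(w)

# Example with the harmonic-oscillator trial function of the previous sketch:
psi2 = lambda x, a: np.exp(-2.0 * a * x**2)
e_loc = lambda x, a: a + x**2 * (0.5 - 2.0 * a**2)
rng = np.random.default_rng(3)
x = rng.normal(scale=np.sqrt(1.0 / (4 * 0.4)), size=100_000)  # exact sampling at alpha0 = 0.4
print(reweighted_variance(x, psi2, e_loc, alpha=0.5, alpha0=0.4))   # ~0: exact eigenstate
```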

Another way to optimize the wave function is by means of energy derivatives: let
$$f_k = -\frac{\partial E_\alpha}{\partial \alpha_k} = -\frac{\partial}{\partial \alpha_k} \frac{\langle\psi_\alpha| H |\psi_\alpha\rangle}{\langle\psi_\alpha|\psi_\alpha\rangle}. \tag{1.2.30}$$
Here we vary only the parameter $\alpha_k$ in the set of parameters collectively denoted as α. The aim is to look for the parameters {α} that make these derivatives zero, giving a minimum of the energy.

We perform the derivative with respect to $\alpha_k$ by varying the wave function as
$$\psi_{\alpha+\delta\alpha_k}(R) = \psi_\alpha(R) + \delta\alpha_k \frac{\partial \psi_\alpha(R)}{\partial \alpha_k} + o((\delta\alpha_k)^2). \tag{1.2.31}$$
As we are dealing with non-degenerate ground states, we can assume that all involved quantities are real: the wave function is a real function of the coordinates with real parameters.

We now compute $f_k$ explicitly and give indications on how to compute it with Monte Carlo. Let
$$O_k(R) = \frac{1}{\psi_\alpha(R)} \frac{\partial \psi_\alpha(R)}{\partial \alpha_k}. \tag{1.2.32}$$


Furthermore, let
$$|v_{0,\alpha}\rangle = \frac{|\psi_\alpha\rangle}{\|\psi_\alpha\|}, \qquad \overline{O}_k = \langle v_{0,\alpha}| O_k |v_{0,\alpha}\rangle, \qquad |v_{k,\alpha}\rangle = (O_k - \overline{O}_k)\,|v_{0,\alpha}\rangle. \tag{1.2.33}$$
All the states $|v_{k,\alpha}\rangle$ are orthogonal to $|v_{0,\alpha}\rangle$, but in general they do not constitute an orthonormal set.

Computing $|v_{0,\alpha+\delta\alpha_k}\rangle$ explicitly is a matter of performing the expansion. We do it step by step, and all equalities here are to be intended up to $o((\delta\alpha_k)^2)$. The variation of the ket can be expressed in terms of $O_k$ as (using (1.2.31))
$$|\psi_{\alpha+\delta\alpha_k}\rangle = (1 + \delta\alpha_k O_k)\,|\psi_\alpha\rangle. \tag{1.2.34}$$
We need to work out the variation of the denominator. We start with
$$\|\psi_{\alpha+\delta\alpha_k}\|^2 = \langle\psi_\alpha| (1 + \delta\alpha_k O_k)(1 + \delta\alpha_k O_k) |\psi_\alpha\rangle = \|\psi_\alpha\|^2 + 2\,\langle\psi_\alpha| \delta\alpha_k O_k |\psi_\alpha\rangle = \|\psi_\alpha\|^2 (1 + 2\,\delta\alpha_k \overline{O}_k). \tag{1.2.35}$$
Using the standard Taylor expansions
$$\sqrt{1 + 2ax} = 1 + ax + o(x^2), \qquad \frac{1}{1 + bx} = 1 - bx + o(x^2), \tag{1.2.36}$$
we get
$$\frac{1}{\|\psi_{\alpha+\delta\alpha_k}\|} = \frac{1}{\|\psi_\alpha\|}\,(1 - \delta\alpha_k \overline{O}_k). \tag{1.2.37}$$
We can now combine (1.2.34) and (1.2.37) to obtain
$$|v_{0,\alpha+\delta\alpha_k}\rangle = (1 + \delta\alpha_k O_k - \delta\alpha_k \overline{O}_k)\,|v_{0,\alpha}\rangle = |v_{0,\alpha}\rangle + \delta\alpha_k |v_{k,\alpha}\rangle + o((\delta\alpha_k)^2). \tag{1.2.38}$$
We can compute the variation of $E_\alpha$ using the definition of derivative:
$$\frac{\partial E_\alpha}{\partial \alpha_k} = \lim_{\delta\alpha_k \to 0} \frac{\langle v_{0,\alpha+\delta\alpha_k}| H |v_{0,\alpha+\delta\alpha_k}\rangle - \langle v_{0,\alpha}| H |v_{0,\alpha}\rangle}{\delta\alpha_k} = 2\,\frac{\langle\psi_\alpha| H (O_k - \overline{O}_k) |\psi_\alpha\rangle}{\langle\psi_\alpha|\psi_\alpha\rangle}. \tag{1.2.39}$$


We insert a completeness relation in $f_k$ (also using the definition of the local energy (1.2.23)):
$$f_k = -2\,\frac{\sum_R \langle\psi_\alpha| H |R\rangle\langle R| (O_k - \overline{O}_k) |\psi_\alpha\rangle}{\sum_R \langle\psi_\alpha|R\rangle\langle R|\psi_\alpha\rangle} = -2\,\frac{\sum_R E_{L,\alpha}(R)\,(O_k(R) - \overline{O}_k)\,|\psi_\alpha(R)|^2}{\sum_R |\psi_\alpha(R)|^2}. \tag{1.2.40}$$
In the Monte Carlo algorithm, this object is evaluated by sampling N configurations $R_i$ as before: with the same set of $R_i$, we then obtain the energy derivative as
$$f_k = -\frac{2}{N} \sum_{i=1}^{N} E_{L,\alpha}(R_i)\,(O_k(R_i) - \overline{O}_k), \qquad \overline{O}_k = \frac{1}{N} \sum_{i=1}^{N} O_k(R_i). \tag{1.2.41}$$

Having computed the energy derivatives with respect to the parameters, we can check whether a parameter configuration locally minimizes the energy by verifying whether $f_k = 0$: as $f_k$ is the derivative of the energy with respect to the variational parameter $\alpha_k$, configurations with $f_k = 0$ for every k are local extrema of the energy.

The simplest method based on the energy derivatives to find a minimizing configuration {α} is the method of steepest descent [Press et al., 2007]. Intuitively, this method varies the parameters along the derivative, according to
$$\delta\alpha_k = \Delta f_k, \tag{1.2.42}$$
where Δ is a positive constant, to be taken small. This way, the energy difference between the two configurations can be written as
$$E_{\alpha+\delta\alpha} - E_\alpha = \sum_k \frac{\partial E_\alpha}{\partial \alpha_k}\,\delta\alpha_k + o((\delta\alpha)^2) = -\Delta \sum_k f_k^2 + o(\Delta^2) \leq 0: \tag{1.2.43}$$
the energy after the variation is thus always smaller than the energy before the variation. We can check this rule in a simple example: consider a wave function that is a linear combination of orthonormal Slater determinants $|k\rangle$, $|\Psi_\alpha\rangle = \sum_k \alpha_k |k\rangle$. As we want to find the direction for minimization, we look for the minimizing direction along lines with length
$$\delta s^2 = \sum_k \delta\alpha_k^2, \tag{1.2.44}$$
and minimize the quantity (using a Lagrange multiplier μ)
$$\Delta E + \mu\,\delta s^2 = \sum_k (-f_k\,\delta\alpha_k + \mu\,\delta\alpha_k^2) \implies \delta\alpha_k = \frac{f_k}{2\mu}; \tag{1.2.45}$$
we then obtain Δ = 1/2μ, and this is the same law as (1.2.42), so we confirm the intuitive approach.

Though intuitive, this method is not enough in real applications, where the variation of one parameter can have a noticeable effect on the wave function, while the variation of another can have a negligible effect. In order to account for this, we can weight the variations by computing how much they change the wave function. We use
$$\delta s^2 = \|v_{0,\alpha+\delta\alpha} - v_{0,\alpha}\|^2: \tag{1.2.46}$$

that is, we perform the variations keeping the change in the wave function constant. As we can express
$$|v_{0,\alpha+\delta\alpha}\rangle = |v_{0,\alpha}\rangle + \sum_k \delta\alpha_k |v_{k,\alpha}\rangle + o(\delta\alpha^2), \tag{1.2.47}$$
we obtain, after symmetrization,
$$\delta s^2 = \sum_{k,k'} \left(\frac{\langle v_{k,\alpha}|v_{k',\alpha}\rangle + \langle v_{k',\alpha}|v_{k,\alpha}\rangle}{2}\right)\delta\alpha_k\,\delta\alpha_{k'} + o(\delta\alpha^3) \simeq \sum_{k,k'} S_{k,k'}\,\delta\alpha_k\,\delta\alpha_{k'}. \tag{1.2.48}$$
We can now use the approach of stochastic reconfiguration [Sorella, 2001]. We fix the value of $\delta s^2$ and minimize the quantity (with μ a Lagrange multiplier as before)
$$\Delta E + \mu\,\delta s^2 = -\sum_k f_k\,\delta\alpha_k + \mu \sum_{k,k'} S_{k,k'}\,\delta\alpha_k\,\delta\alpha_{k'}: \tag{1.2.49}$$
minimization with respect to the variations gives
$$\sum_{k'} S_{k,k'}\,\delta\alpha_{k'} = \frac{f_k}{2\mu}. \tag{1.2.50}$$

As in the paper, the matrix S is strictly positive definite, so it can be inverted, giving the $\delta\alpha_k$ in terms of the $f_k$, which we know how to compute, and of the Lagrange multiplier μ, which we can fix to a constant (by the identification Δ = 1/2μ). In the Monte Carlo approach, the matrix S is computed from N samplings $R_i$ as
$$S_{k,k'} \simeq \frac{1}{N} \sum_{i=1}^{N} (O_k(R_i) - \overline{O}_k)(O_{k'}(R_i) - \overline{O}_{k'}). \tag{1.2.51}$$

We then summarize the procedure for using the derivative method to find the optimal parameters {α}:


1. starting with a wave function $\psi_\alpha$ that depends on a set of parameters, we compute the $f_k$ in the Monte Carlo approach using (1.2.41);

2. if the forces are compatible with zero we are already at a minimum; otherwise we continue the algorithm;

3. we choose a small step Δ;

4. using the sampling, we compute the matrix S from (1.2.51);

5. we invert the matrix S and use (1.2.50) to compute the δα, also using Δ = 1/2μ;

6. we recalculate the wave function ψ at parameters α + δα, and compute the $f_k$ with the new parameters. If the $f_k$ are compatible with zero we stop; otherwise we restart from point 3.
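A schematic single step of this procedure, assuming hypothetical helpers sample_configs, local_energy and log_derivs that return the sampled configurations, the local energies $E_L(R_i)$ and the logarithmic derivatives $O_k(R_i)$, with params a NumPy array:

```python
import numpy as np

def sr_step(params, sample_configs, local_energy, log_derivs, delta=0.02):
    """One stochastic-reconfiguration update: delta_alpha = Delta * S^{-1} f,
    with f from eq. (1.2.41) and S from eq. (1.2.51)."""
    R = sample_configs(params)           # N configurations sampled from psi_alpha^2
    e_l = local_energy(R, params)        # E_L(R_i), shape (N,)
    o = log_derivs(R, params)            # O_k(R_i), shape (N, n_params)
    do = o - o.mean(axis=0)              # O_k minus its average
    f = -2.0 * (do * e_l[:, None]).mean(axis=0)    # forces, eq. (1.2.41)
    S = do.T @ do / len(e_l)                       # covariance matrix, eq. (1.2.51)
    # Small diagonal shift keeps S invertible with finite sampling (our choice).
    d_alpha = delta * np.linalg.solve(S + 1e-8 * np.eye(len(params)), f)
    return params + d_alpha
```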

1.2.2 Fixed-node diffusion Monte Carlo

Diffusion Monte Carlo is a projection method whose main purpose is to find the “exact” ground state energy. It is very important because it connects the projection method to the statistical approach [Ceperley and Alder, 1980, Foulkes et al., 2001].

Diffusion Monte Carlo uses Green functions, and we will use its fixed node approximation. Following [Foulkes et al., 2001], we first introduce the general method, and then the approximation scheme that we are going to use.

Diffusion Monte Carlo: Green functions

We aim to solve the time-dependent Schrödinger equation in imaginary time,
$$-\partial_t \Phi(R, t) = (H - E_T)\,\Phi(R, t), \tag{1.2.52}$$
where R represents the coordinates of the system and $E_T$ is an energy offset that will be fixed later. This equation can be solved in closed form:
$$\Phi(R, t + \tau) = \int G(R \leftarrow R', \tau)\,\Phi(R', t)\,dR', \tag{1.2.53}$$

where G is the Green function of the system, given by
$$G(R \leftarrow R', \tau) = \langle R|\exp(-\tau(H - E_T))|R'\rangle. \tag{1.2.54}$$

G solves the same differential equation as Φ, (1.2.52) (with t replaced by τ), with initial condition $G(R \leftarrow R', 0) = \delta(R' - R)$ (with this initial condition, no diffusion is possible in zero time). If the $|\Psi_i\rangle$ are eigenstates of the Hamiltonian with energies $E_i$, then we can decompose
$$\exp(-\tau H) = \sum_i |\Psi_i\rangle \exp(-\tau E_i) \langle\Psi_i|, \tag{1.2.55}$$
so we can write the Green function as
$$G(R \leftarrow R', \tau) = \sum_i \Psi_i(R)\,e^{-\tau(E_i - E_T)}\,\Psi_i^*(R'). \tag{1.2.56}$$

We now order the eigenstates and eigenvalues in such a way that $E_0$ is the ground state of the system, and choose $E_T = E_0$. In the limit τ → ∞, the exponential tends to zero for every $E_i$, with the exception of $E_0$, for which the exponential is $e^{-\tau \cdot 0} = 1$, so it remains 1 in this limit. The situation is schematically illustrated in figure 1.2. This situation is similar to statistical mechanics, where the partition function $e^{-\beta H}$ (for particles with no exchange symmetry), in the limit of zero temperature β → ∞, puts all particles in the ground state. In the case of N free particles (3N coordinates), we have
$$\partial_t \Phi(R, t) = \frac{1}{2} \sum_{i=1}^{N} \nabla_i^2 \Phi(R, t), \tag{1.2.57}$$
and the Green function is a Gaussian,
$$G_F(R \leftarrow R', \tau) = (2\pi\tau)^{-3N/2} \exp\left(-\frac{|R - R'|^2}{2\tau}\right). \tag{1.2.58}$$

The Green function point of view allows us to interpret the imaginary time evolution of this quantum system as a Brownian motion. This works by substituting the wavefunction Φ with a sum of Dirac deltas, centered around given positions $R_k$, that represent sampling points (or walkers) in a Brownian motion:
$$\Phi(R, t) \to \sum_{k=1}^{N_s} \delta(R - R_k) \implies \Phi(R, t + \tau) \to \sum_{k=1}^{N_s} G(R \leftarrow R_k, \tau), \tag{1.2.59}$$
where $N_s$ is the number of samplings.


Figure 1.2: The limit β → ∞ turns the operator $e^{-\beta \Delta E}$ into a projector onto the ground state of the theory. We have here used β instead of τ as a bridge between the imaginary-time Schrödinger equation and the statistical mechanics description.

In such a Brownian motion, we start from some fixed positions that we randomly sample from some distribution (like an initial wave function); in the next step we sample the new distribution (the right-hand side of (1.2.59)) after a time τ, obtaining a new set of starting positions for a new time evolution. The algorithm succeeds if we reach stabilization: in this case the distribution of points provides us with an approximation of the ground state wave function.

In the presence of a potential V we can approximate the Green function using perturbation theory, in the limit τ → 0. The details are not important for our discussion: from the same reference, we obtain the correction
$$G(R \leftarrow R', \tau) = G_F(R \leftarrow R', \tau)\,\underbrace{\exp(-\tau [V(R) + V(R') - 2E_T]/2)}_{P}. \tag{1.2.60}$$

The extra factor P incorporates the effect of the potential. We can use P to avoid unlikely situations by following an algorithm from [Reynolds et al., 1982]: after having performed our sampling using the free Green function, we evaluate P for every walker. If P ≤ 1, the position may be unlikely, so we remove the walker with probability 1 − P. If P > 1, we let the walker continue, and create a new walker at the same position with probability P − 1. We can use the offset $E_T$ to regulate the total number of walkers.
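A sketch of this birth/death (branching) step for a population of walkers, assuming the factor P of (1.2.60) has already been evaluated for each walker; the compact rule int(P + u), with u uniform in [0, 1), reproduces the kill and clone probabilities described above:

```python
import numpy as np

rng = np.random.default_rng(4)

def branch(walkers, P):
    """Kill or replicate walkers according to the factor P of eq. (1.2.60).
    Each walker survives as int(P + u) copies, u ~ U[0, 1): for P <= 1 it is
    removed with probability 1 - P; for P > 1 an extra copy is created with
    probability P - 1, as described in the text."""
    copies = (P + rng.uniform(size=len(walkers))).astype(int)
    return np.repeat(walkers, copies, axis=0)

# Example: 5 walkers in a 6-dimensional configuration space with given weights.
walkers = rng.normal(size=(5, 6))
print(branch(walkers, P=np.array([0.3, 0.9, 1.0, 1.4, 2.1])).shape)
```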


Figure 1.3: Projection of the state $|\psi_0\rangle$ after repeated application of $e^{-\tau H}$: at each iteration the projected state becomes a better approximation of the ground state.

In our procedure, we have used the function Φ (which is a real function, as it solves the real equation (1.2.52)) as a probability density, since we use it to sample sets of starting conditions for the diffusion algorithm. This is not the general situation: solutions of (1.2.52) can in general have nodes, switching sign. We will use an approximation of this procedure, the fixed node approximation, which deals with this problem.

Fixed node approximation

The fixed node approximation is able to deal with the sign problem that makes the direct application of diffusion Monte Carlo difficult.

The main idea of the fixed node approximation is to make an ansatz for the location of the nodes of the solution of (1.2.52). If we locate the nodes and keep them fixed, then we can divide the space into regions in which Φ is positive and regions in which Φ is negative‡, and use the standard algorithm in those regions. Furthermore, we erase a walker when it goes from a region of positive sign to a region of negative sign or vice versa, and we regulate the offset $E_T$ in such a way that the total number of walkers remains approximately constant. Due to the antisymmetry of the electron wave function, we know that the wave function vanishes whenever it is evaluated at coinciding positions: this is in general not sufficient to determine all the possible zeroes of the function (the nodal surface), and we often use trial nodal surfaces from trial wave functions. We will call pockets the various disconnected regions in configuration space.

‡ A negative Φ is not a problem for the algorithm, as it is sufficient to use −Φ as the probability density in that region.

Figure 1.4: Evolution of the walkers in the DMC method for the harmonic oscillator. In the various vertical steps we perform a step of our diffusion and sampling algorithm. At convergence, we obtain a good approximation of the ground state wave function of the harmonic oscillator. The image is taken from [Foulkes et al., 2001].

Suppose that we use a trial wave function $\Psi_T(R)$ to define the nodal surface. In each pocket $v_\alpha$ we run the simulation and obtain a wave function $\Psi_0^\alpha(R)$ for the pocket: this is a function that is non-zero in the pocket $v_\alpha$, zero in all other pockets, and solves
$$H\,\Psi_0^\alpha(R) = E_0^\alpha\,\Psi_0^\alpha(R) + \delta_\alpha. \tag{1.2.61}$$
The additional constant $\delta_\alpha$ has to be introduced because the pocket wave function does not go to zero smoothly at the nodal surface, so discontinuities can be expected. According to [Reynolds et al., 1982, Moskowitz et al., 1982], the energy $E_0^\alpha$ is higher than the true ground state energy $E_0$, so in each region we get an upper bound; only by using the exact nodal surface as an ansatz can we get an energy estimate equal to the actual value: the choice of the trial wave function is then very important.

This simple algorithm has a problem: the quantity P from (1.2.60), which is fundamental in accounting for the potential and in controlling the population of walkers, fluctuates wildly. To solve this problem, we use a method called importance sampling [Reynolds et al., 1982, Ceperley and Kalos, 1979, Grimm and Storer, 1971]. This method relies on the choice of a trial wave function $\Psi_T(R)$, by which both sides of equation (1.2.52) are multiplied. In terms of $f(R, t) = \Phi(R, t)\,\Psi_T(R)$, the equation reads
$$-\partial_t f(R, t) = -\frac{1}{2}\nabla^2 f(R, t) + \nabla \cdot [v_D(R)\,f(R, t)] + [E_L(R) - E_T]\,f(R, t). \tag{1.2.62}$$
In this equation, ∇ is the gradient with respect to the coordinates R, and we have defined
$$v_D(R) = \Psi_T(R)^{-1}\,\nabla\Psi_T(R), \qquad E_L(R) = \Psi_T(R)^{-1}\,H\,\Psi_T(R). \tag{1.2.63}$$

The first object is usually referred to as the drift velocity, while the second is the usual local energy. The Green function for this system, $\tilde{G}$, is expressed in terms of the Green function (1.2.54) as $\tilde{G}(R \leftarrow R', t) = \Psi_T(R)\,G(R \leftarrow R', t)\,\Psi_T^{-1}(R')$. This function can be approximated by (when τ → 0) [Foulkes et al., 2001]
$$\tilde{G}(R \leftarrow R', t) \simeq G_d(R \leftarrow R', t)\,G_b(R \leftarrow R', t), \tag{1.2.64}$$
with
$$G_d(R \leftarrow R', t) = (2\pi t)^{-3N/2} \exp\left(-\frac{|R - R' - t\,v_D(R')|^2}{2t}\right), \qquad G_b(R \leftarrow R', t) = \exp(-t(E_L(R) + E_L(R') - 2E_T)/2). \tag{1.2.65}$$

With this approximation, we obtain an enhancement of the density of walkers in the regions where $|\Psi_T|$ is increasing, due to the drift velocity. Furthermore, the local energy replaces the potential energy, and it is subject to much smaller fluctuations (if we choose the trial wave function in a clever way). Last, the drift velocity makes it much more difficult for a walker to cross the nodal surface. The approximation has a downside, as it requires a small time step t to be correct, and this makes computations heavier. A further improvement [Reynolds et al., 1982, Ceperley et al., 1981] involves an acceptance-rejection mechanism as in the VMC method: the Green function satisfies the detailed balance condition
$$\tilde{G}(R \leftarrow R', t)\,\Psi_T(R')^2 = \tilde{G}(R' \leftarrow R, t)\,\Psi_T(R)^2. \tag{1.2.66}$$

Furthermore, $G_b$ is symmetric under the exchange R' ↔ R, so we can write an acceptance probability as
$$p_{\text{acceptance}}(R \leftarrow R') = \min\left\{1, \frac{G_d(R' \leftarrow R, t)\,\Psi_T(R)^2}{G_d(R \leftarrow R', t)\,\Psi_T(R')^2}\right\}. \tag{1.2.67}$$

This procedure significantly reduces calculation time and improves convergence. The computer time required to calculate the energy of a system to some given accuracy using the fermionic VMC and DMC methods scales as $N^3$ [Ceperley and Alder, 1980, Foulkes et al., 2001].

1.3 Estimation of errors

To every measurement we must associate an error; here we will see how to compute estimates of the variance $\sigma^2$. Let us consider a random variable x, with average $\mu_x$, that we estimate from N measurements $x_i$. An unbiased estimate of the average $\mu_x$ is given by
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = \langle x \rangle. \tag{1.3.1}$$
This estimate is unbiased (in the sense that it tends to $\mu_x$ for N → ∞) both for uncorrelated and for correlated measurements.

In the case of uncorrelated variables, we can obtain the variance of our measurement, $\sigma_x^2 = \langle (x - \mu_x)^2 \rangle$, using the estimator
$$s^2 = \frac{1}{N} \sum_i x_i^2 - \frac{1}{N^2} \sum_{i,j} x_i x_j, \tag{1.3.2}$$
which provides an unbiased estimator of $\sigma_x^2$ when the variables are not correlated,


but we cannot assume that the samples generated along a Markov chain will in general be uncorrelated. We then need an alternative method that also accounts for correlations. We take this method, called the binning technique, from [Becca and Sorella, 2017].

We divide the data set $\{x_i\}$ derived from a long Markov chain into $N_{bin}$ segments, each of length $L_{bin} = N / N_{bin}$. On each bin j, with $j = 1, \ldots, N_{bin}$, we define the partial average
$$x^j = \frac{1}{L_{bin}} \sum_{i=(j-1)L_{bin}+1}^{j L_{bin}} x_i. \tag{1.3.3}$$

The upper index j goes from 1 to $N_{bin}$, and we can compute the average $\bar{x}$ by summing all the $x^j$ and dividing by $N_{bin}$: this procedure yields the same average as before. The binning procedure is useful for estimating the variance: if we take $L_{bin}$ to be sufficiently larger than the correlation time τ, then the different bin averages $x^j$ can reasonably be considered independent random variables, and the variance can be easily estimated as
$$s_{bin}^2 = \frac{1}{N_{bin}} \sum_{j=1}^{N_{bin}} (x^j - \bar{x})^2. \tag{1.3.4}$$

Then the variance of the average value is given by
$$s_{\bar{x}}^2 = \frac{s_{bin}^2}{N_{bin}}. \tag{1.3.5}$$

We will use this technique to estimate the variance and obtain error bars for our measurements.
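A compact sketch of the binning estimate, assuming the bin length exceeds the correlation time:

```python
import numpy as np

def binning_error(x, l_bin):
    """Mean and error bar of correlated samples x via the binning technique.
    Assumes l_bin exceeds the correlation time; discards leftover samples."""
    x = np.asarray(x)
    n_bin = len(x) // l_bin
    bin_means = x[:n_bin * l_bin].reshape(n_bin, l_bin).mean(axis=1)  # eq. (1.3.3)
    s2_bin = bin_means.var()                                          # eq. (1.3.4)
    return x.mean(), np.sqrt(s2_bin / n_bin)                          # eq. (1.3.5)
```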


Chapter 2

Post Hartree-Fock methods

In this chapter we give an overview of the major computational methods for strongly correlated systems. Though we use these methods only to generate the Monte Carlo input, they have a marked importance in the study of these systems. We start from the most important mean-field theory, the Hartree-Fock method, and afterwards we illustrate the most important post Hartree-Fock methods, both variational and not. For the first two sections of this chapter, we mainly follow the exposition and notation (with some slight differences) of [Szabo and Ostlund, 1996, G. Grosso, 2013].

The chapter is entirely focused on the construction of initial wave functions by using various methods. The organization of the chapter is as follows:

• In the first section we describe the Hartree-Fock method, showing the theoretical derivation of the Hartree-Fock energy and presenting the Fock equation (and the interpretation of its solutions). For the interested reader, a way of solving the Fock equation iteratively is presented in Appendix A.

• In the second section we go beyond Hartree-Fock, using the solutions of the Fock equation to overcome the main limitation of the method: the computation of the correlation energy. We will examine the configuration interaction method and a method that is not strictly a post Hartree-Fock method, density functional theory. Lastly, we will discuss methods that are widely present in the literature, which we use to benchmark our results.


2.1 Hartree-Fock method

Hartree-Fock theory is fundamental in electronic structure theory. It is based on the Born-Oppenheimer approximation for the Hamiltonian of the system, in which we fix the nuclei, so we can work with the electronic Hamiltonian only. In this approximation we can assume that the motions of atomic nuclei and electrons can be separated, because the mass of an electron $m_e$ is small compared to that of a nucleus $M_N$ ($m_e/M_N \simeq 10^{-4}$ for the hydrogen nucleus, which is the lightest one).

2.1.1 The Hartree-Fock equation

We consider only the electronic Schrödinger equation, which contains information about the equilibrium geometry of the nuclei (through the minimum of the energy) and about molecular properties such as dipole and multipole moments and polarizability (through the form of the wave function). To study this system we choose a basis set of molecular orbitals (MO), i.e. a basis formed by single particle functions, and we impose antisymmetry under the exchange of fermion coordinates, as electrons obey Fermi statistics. So, with {φ(r)} the orbital wave functions and ω the spin coordinate (assuming the values up and down), with χ the spin wave function (which can be written in the basis {α(ω), β(ω)}, with α the spin-up wave function and β the spin-down wave function), we can write the single electron function, with x = {r, ω}:

ψ(x) = φ(r)χ(ω), (2.1.1)

where χ(ω) is a spin function. The antisymmetric wave function for a system of N particles is then expressed in terms of the Slater determinant as

$$\psi_{HF} = \frac{1}{\sqrt{N!}}\,\hat{A}\{\psi_1(x_1)\ldots\psi_N(x_N)\} = \frac{1}{\sqrt{N!}} \det \begin{pmatrix} \psi_1(x_1) & \ldots & \psi_N(x_1) \\ \vdots & \ddots & \vdots \\ \psi_1(x_N) & \ldots & \psi_N(x_N) \end{pmatrix}. \tag{2.1.2}$$

Exchanging two coordinates amounts to exchanging two rows of the matrix, and this procedure changes the sign of the determinant. Then $\psi_{HF}$ is manifestly antisymmetric.
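As an illustration, a sketch of how the value of such a determinant can be evaluated numerically from the matrix of orbital values $\psi_j(x_i)$ (the one-dimensional orbitals below are hypothetical examples):

```python
import numpy as np
from math import factorial

def slater_determinant(orbitals, coords):
    """Evaluate (1/sqrt(N!)) det[psi_j(x_i)] as in eq. (2.1.2)."""
    mat = np.array([[phi(x) for phi in orbitals] for x in coords])
    return np.linalg.det(mat) / np.sqrt(factorial(len(coords)))

# Hypothetical 1D orbitals (two harmonic-oscillator-like functions):
orbs = [lambda x: np.exp(-x**2 / 2), lambda x: x * np.exp(-x**2 / 2)]
print(slater_determinant(orbs, [0.1, 0.9]))
print(slater_determinant(orbs, [0.9, 0.1]))   # opposite sign: antisymmetry
```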


The limitation of the Hartree-Fock method is that we consider only a single Slater determinant; its advantage is that it incorporates Fermi statistics in an immediate way.

Now we want to manipulate the electronic Hamiltonian in order to obtain a simple equation. We can write the electronic Hamiltonian as (here the $R_I$ are the positions of the nuclei, which are fixed, the $Z_I$ are their charges, lowercase indices label the N electrons and uppercase indices label the nuclei)
$$H_{el} = \sum_i \frac{p_i^2}{2} + \sum_i V_{nucl}(r_i) + \frac{1}{2} \sum_{i \ne j} \frac{1}{|r_i - r_j|} + \frac{1}{2} \sum_{I \ne J} \frac{Z_I Z_J}{|R_I - R_J|}, \tag{2.1.3}$$
where the first term is the kinetic energy of the electrons, the interaction between electrons and nuclei is given by
$$V_{nucl}(r_i) = -\sum_I \frac{Z_I}{|R_I - r_i|}, \tag{2.1.4}$$
the third term describes the interaction between electrons, and the last term describes the interaction of the nuclei among themselves: in this approximation it is constant, i.e. an irrelevant energy shift. By defining the operator (here $r_{ij} = |r_i - r_j|$)
$$h_i = -\frac{1}{2}\nabla_i^2 - \sum_I \frac{Z_I}{r_{iI}}, \tag{2.1.5}$$
we can rewrite the electronic Hamiltonian as
$$H_{el} = \sum_i h_i + \frac{1}{2} \sum_{i \ne j} \frac{1}{r_{ij}} = G_1 + G_2. \tag{2.1.6}$$

We now follow [G. Grosso, 2013] to average (2.1.6) over the Slater determinant (2.1.2). The operators in (2.1.6) are of two kinds: the one-electron operators $h_i$ and the two-electron operators $v_{ij} = 1/r_{ij}$. We start with the evaluation of $G_1$:
$$\langle\psi_{HF}| G_1 |\psi_{HF}\rangle = \frac{1}{N!} \left\langle \hat{A}\{\psi_1(x_1)\ldots\psi_N(x_N)\} \middle| G_1 \middle| \hat{A}\{\psi_1(x_1)\ldots\psi_N(x_N)\} \right\rangle. \tag{2.1.7}$$
We can eliminate the antisymmetrizations in the following way: we first eliminate one of them by expanding it as a sum of N! terms, and recognizing that in each term we have a matrix element between a product state and an antisymmetrized state. Then we expand the second antisymmetrized state and recognize that the only surviving term is the one in which the order of orbitals and coordinates is the same in the bra and in the ket. There are N! such contributions, so the N! in the denominator is canceled. Furthermore, in each term of the sum, when the matrix element of $h_i$ is computed, the orbitals that are not evaluated at the coordinate i give factors of 1, and the product of spin variables is trivial. Reusing some standard notation, we then obtain the first part of the average as (h is defined as $h_i$ with every instance of $r_i$ replaced by r)
$$\langle\psi_{HF}| G_1 |\psi_{HF}\rangle = \sum_i \langle\psi_i(x)| h |\psi_i(x)\rangle. \tag{2.1.8}$$

To average the two-particle operator $G_2$, we follow a similar procedure: we compute
$$\langle\psi_{HF}| G_2 |\psi_{HF}\rangle = \frac{1}{N!} \left\langle \hat{A}\{\psi_1(x_1)\ldots\psi_N(x_N)\} \middle| G_2 \middle| \hat{A}\{\psi_1(x_1)\ldots\psi_N(x_N)\} \right\rangle. \tag{2.1.9}$$
We follow the same procedure as before to remove the antisymmetrization: in this case, we will have matrix elements involving two spin orbitals instead of one, and we have to consider the different possible and inequivalent ways of assigning the coordinates to the orbitals. After similar computations, we get
$$\langle\psi_{HF}| G_2 |\psi_{HF}\rangle = \frac{1}{2} \sum_{i \ne j} \left( \langle\psi_i(x_1)\psi_j(x_2)| \frac{1}{r_{12}} |\psi_i(x_1)\psi_j(x_2)\rangle - \langle\psi_i(x_1)\psi_j(x_2)| \frac{1}{r_{12}} |\psi_j(x_1)\psi_i(x_2)\rangle \right). \tag{2.1.10}$$

Using the standard convention of suppressing the variables $x_i$ (and assigning them according to the order of the orbitals in the bras and kets, a convention that we will use from now on: as an example, $|\psi_a\psi_b\psi_c\rangle = |\psi_a(x_1)\psi_b(x_2)\psi_c(x_3)\rangle$), we finally get the Hartree-Fock energy
$$E_{HF} = \sum_i \langle\psi_i| h |\psi_i\rangle + \frac{1}{2} \sum_{i \ne j} \left( \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_i\psi_j\rangle - \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_j\psi_i\rangle \right), \tag{2.1.11}$$
where the two-electron integral is
$$\langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_k\psi_l\rangle = \int dx_1\,dx_2\,\psi_i^*(x_1)\psi_j^*(x_2)\,\frac{1}{r_{12}}\,\psi_k(x_1)\psi_l(x_2). \tag{2.1.12}$$

We have chosen to express everything in terms of the $x_i$ instead of the $r_i$: it is understood that in the differential $dx_i$ we integrate over $r_i$ and perform the spin sums.


The Hartree-Fock method determines the set of spin orbitals which minimize the energy and give us the best single determinant, i.e. we minimize the energy with respect to the variation $\psi_i \to \psi_i + \delta\psi_i$, under the constraint that the varied spin orbitals remain orthonormal [G. Grosso, 2013]. In the $E_{HF}$ equation, the first two-body term is commonly called the Coulomb term, while the second is called the exchange term. The variational step proceeds as usual and, in terms of the Lagrange multipliers $\epsilon_i$, the final eigenvalue equation that determines the eigenvectors $\psi_i$ and the eigenvalues $\epsilon_i$ is (the Hartree-Fock equation)
$$\left[-\frac{1}{2}\nabla_r^2 + V_{nucl}(r) + V_{coul}(r) + V_{exch}(r)\right] \psi_i = F\,\psi_i = \epsilon_i\,\psi_i, \tag{2.1.13}$$

where the definitions of the operators $V_{coul}(r)$ and $V_{exch}(r)$ are (the sum is restricted to the orbitals in the Hartree-Fock determinant, and the notation for the differential means that both space integration and spin product must be performed: the spin product selects ψ functions with parallel spin in the integral)
$$V_{coul}\,\psi_i(r, \sigma) = \psi_i(r, \sigma) \sum_j^{(occ)} \int \psi_j^*(r', \sigma')\,\frac{1}{|r - r'|}\,\psi_j(r', \sigma')\,d(r', \sigma'), \tag{2.1.14}$$
$$V_{exch}\,\psi_i(r, \sigma) = -\sum_j^{(occ)} \psi_j(r, \sigma) \int \psi_j^*(r', \sigma')\,\frac{1}{|r - r'|}\,\psi_i(r', \sigma')\,d(r', \sigma'). \tag{2.1.15}$$

The Hartree-Fock equations are complicated, as they have to be solved self-consistently: the solutions $\psi_i$ enter the definitions of the operators composing the Hartree-Fock operator F. As they are composed of a spin and an orbital part and depend on the configuration of the nuclei, the $\psi_i$ are called molecular spin orbitals.

Depending on the choice of the spin parts of the wave functions, the Hartree-Fock method is used in the corresponding restricted and unrestricted formulations. We call the method restricted if we consider a closed shell of fully occupied orbitals and we assume that the orbital part of the wave function is the same for spin α and for spin β. We have instead unrestricted Hartree-Fock if the orbital part of the wave function is different for α and β spins, i.e. if there is spin contamination.


2.1.2 Interpretation of the solutions: Koopmans' theorem

The Hartree-Fock equation (2.1.13) gives us the functions $\psi_i$, eigenfunctions of the Fock operator, and the respective eigenvalues $\epsilon_i$ (called the orbital energies). For a system of N electrons, the Hartree-Fock ground state is the determinant built using the N lowest eigenvectors $\psi_i$ of F. It is important to note that the Hartree-Fock energy $E_{HF}$ defined in (2.1.11) is not the sum of the first N eigenvalues $\epsilon_i$. Equation (2.1.13) can be written as an equation between kets in an obvious way as
$$F\,|\psi_i\rangle = \epsilon_i\,|\psi_i\rangle. \tag{2.1.16}$$

By taking the scalar product with $\langle\psi_i|$, we obtain
$$\epsilon_i = \langle\psi_i| h |\psi_i\rangle + \sum_j^{(occ)} \left( \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_i\psi_j\rangle - \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_j\psi_i\rangle \right). \tag{2.1.17}$$
Comparing this expression with $E_{HF}$, we get
$$E_{HF} = \sum_i \epsilon_i - \frac{1}{2} \sum_{i \ne j}^{(occ)} \left( \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_i\psi_j\rangle - \langle\psi_i\psi_j| \frac{1}{r_{12}} |\psi_j\psi_i\rangle \right). \tag{2.1.18}$$

The difference between the sum of the orbital energies and the Hartree-Fock energy is due to the different factors of 1/2 in the definitions of $\epsilon_i$ and $E_{HF}$.

The Hartree-Fock method does not account for the correlation energy $E_{corr}$, which is defined as the difference between $E_{HF}$ and the theoretical exact ground state energy $E_0^{exact}$. Since Hartree-Fock is a variational method, the correlation energy is always positive.

There is a very simple interpretation of the $\epsilon_i$, given by Koopmans' theorem [Koopmans, 1934]. This theorem is obtained by considering the difference between the Hartree-Fock energy for N particles and the Hartree-Fock energy for N − 1 particles (in which we have only removed an electron from the orbital $\psi_N$): this difference can be interpreted as the opposite of the ionization energy. It is simply obtained by computing
$$E_{HF}(N) - E_{HF}(N-1) = \langle\psi_N| h |\psi_N\rangle + \sum_j^{(occ)} \left( \langle\psi_N\psi_j| \frac{1}{r_{12}} |\psi_N\psi_j\rangle - \langle\psi_N\psi_j| \frac{1}{r_{12}} |\psi_j\psi_N\rangle \right) = \epsilon_N.$$

We then see that we can interpret the $\epsilon_i$ as the opposites of ionization energies: as the $\epsilon_i$ are all negative, the ionization energy always turns out to be positive.

2.2 Beyond Hartree-Fock

The Hartree-Fock method builds an antisymmetric wave function out of atomic orbitals. As stated, this is a first approximation to the ground state of the system. Its main limitation lies in the fact that it uses a single determinant state, so it has low variational adaptability (relying on a restrictive ansatz). We then have to extend the Hartree-Fock method to consider more general forms of the ground state wave function. In this section, we will explore how to go beyond the Hartree-Fock approximation by describing some methods that are based on the Hartree-Fock ground state. The main goal of those methods is to compute the correlation energy $E_{corr}$, defined as the difference between the Hartree-Fock estimate $E_{HF}$ and the energy of the ground state of the Hamiltonian (2.1.3), in order to obtain a better approximation of the true ground state wave function (and its associated energy).

2.2.1 Configuration interaction method

Excited determinants

A convenient way of expanding the space of wave functions is to consider the Hartree-Fock ground state as a reference for building other determinantal states, which can be taken to represent approximate excited states of the system and can be used in linear combination with the Hartree-Fock ground state for a more accurate description. Intuitively, we are performing excitations of the Hartree-Fock ground state, and discarding any determinants that involve orbitals of high energy (which should have a low weight in the ground state wave function). We can write a singly excited state as the state in which one electron is moved from its original state to an identified excited one. As an example, for a system of N electrons (recall that we always consider N even) in which


we consider 2m molecular spin orbitals (built as in the restricted Hartree-Fock method, choosing m molecular orbitals and multiplying them by spin-up or spin-down wave functions), we order the solutions of the Fock equation by increasing eigenvalue, using indices a, b, ... to label the first N eigenstates and indices r, s, ... to label the additional eigenstates of higher energy, which go from N + 1 to 2m. Consider the standard determinant

|\psi\rangle = \frac{1}{\sqrt{N!}} \hat{A} |\psi_1 ... \psi_a ... \psi_b ... \psi_N\rangle, \qquad (2.2.1)

where the \psi_i are spin orbitals. We can define another determinant through a single substitution (single move),

|\psi_a^r\rangle = \frac{1}{\sqrt{N!}} \hat{A} |\psi_1 ... \psi_r ... \psi_b ... \psi_N\rangle, \qquad (2.2.2)

and we can also move two electrons at once (double move): as an example, substituting one from a to r and the other from b to s,

|\psi_{ab}^{rs}\rangle = \frac{1}{\sqrt{N!}} \hat{A} |\psi_1 ... \psi_r ... \psi_s ... \psi_N\rangle, \qquad (2.2.3)

and so on. We illustrate the substitution process graphically in Figure 2.1.

In the end, for complex coefficients c_{ab...}^{rs...}, we obtain the configuration interaction wave function

|\psi_{CI}\rangle = c_0 |\psi_{HF}\rangle + \sum_{r,a} c_a^r |\psi_a^r\rangle + \sum_{a<b} \sum_{r<s} c_{ab}^{rs} |\psi_{ab}^{rs}\rangle + ... \qquad (2.2.4)

where the sums are limited to the orbitals chosen for the basis set: in practical calculations, we choose a limited number of orbitals for substitution. In particular, the number of possible determinants in the sum (2.2.4) is given by the number of ways to pick N objects out of 2m, without repetition and regardless of ordering, namely

N_{det} = \binom{2m}{N}. \qquad (2.2.5)
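A quick sketch of how fast (2.2.5) grows (the values of N and m below are arbitrary choices), which is why the expansion must be truncated in practice:

```python
from math import comb

# Growth of the number of determinants in Eq. (2.2.5): choosing N electrons
# among 2m spin orbitals.
N = 10
for m in (10, 20, 40, 80):
    print(f"m = {m:3d}  ->  N_det = {comb(2 * m, N):,}")
```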

The exact energies (for an infinite basis set) of the ground and excited states of the system are the eigenvalues of the Hamiltonian matrix \langle\psi_i| H |\psi_j\rangle formed from the set \{|\psi_i\rangle\} = \{|\psi_{HF}\rangle, |\psi_a^r\rangle, |\psi_{ab}^{rs}\rangle, ...\} (this set becomes complete when all possible substitutions are included and m \to \infty).

Figure 2.1: The substitution process used to obtain different determinants. In both panels, at the bottom are the orbitals forming the Hartree-Fock ground state and, at the top (separated by the thick line), the additional orbitals available for substitution. Each arrow starts on the orbital to insert and ends on the orbital to remove. On the left hand side, the single move \psi_a \to \psi_r; on the right hand side, the double move \psi_a \to \psi_r, \psi_b \to \psi_s.

Matrix elements between determinants

The procedure is now (theoretically) straightforward: having a complete set \{|\psi_i\rangle\}, we compute the Hamiltonian matrix explicitly and diagonalize it. This procedure would give the exact result in the limit m \to \infty with all possible substitutions included, but in practical calculations we always truncate the basis.

There are some useful general rules that reduce the number of integrals to be calculated. They concern the matrix elements between different determinants, and show that, in this basis, the Hamiltonian matrix is sparse.
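Before turning to those rules, here is what the final step looks like in practice: a minimal NumPy sketch diagonalizing a mock Hamiltonian matrix in a truncated determinant basis. The matrix values below are randomly generated placeholders, not real integrals, which would instead be filled in using the rules that follow.

```python
import numpy as np

# Mock CI step: given a real symmetric matrix <psi_i|H|psi_j> in a truncated
# determinant basis, the approximate energies are its eigenvalues.
rng = np.random.default_rng(1)
n_det = 50
H = rng.standard_normal((n_det, n_det)) * 0.01
H = 0.5 * (H + H.T)                       # make it symmetric (Hermitian)
H[np.diag_indices(n_det)] -= 1.0          # shift diagonal to mimic bound states

energies, coeffs = np.linalg.eigh(H)      # eigh exploits the symmetry
e_ground = energies[0]                    # variational ground state energy
c_ground = coeffs[:, 0]                   # CI coefficients c_0, c_a^r, ...
print("ground state energy:", e_ground)
```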


Matrix elements of the Hamiltonian H between determinants are computed in the same way as the average on the ground state: it suffices to replace the sums over the lowest eigenvectors of the Fock equation with sums over the molecular orbitals actually present in the states between which H is evaluated. Following the notation of (2.1.6), the matrix element of G_1 between a determinant and the same determinant with one substitution is given by

\langle\psi_i^r| G_1 |\psi_{HF}\rangle = \langle\psi_r| h |\psi_i\rangle. \qquad (2.2.6)

Here we have written the matrix element between the ground state and a singly substituted state, but the same result holds for any pair of determinants differing by one substitution, with the appropriate changes (including, possibly, a sign coming from the reordering of the spin orbitals, if needed). Matrix elements of G_1 between determinants differing by more than one substitution vanish. Regarding G_2, the matrix element between determinants differing by one substitution is

\langle\psi_i^r| G_2 |\psi_{HF}\rangle = \sum_j \left[ \langle\psi_r\psi_j| \tfrac{1}{r_{12}} |\psi_i\psi_j\rangle - \langle\psi_r\psi_j| \tfrac{1}{r_{12}} |\psi_j\psi_i\rangle \right]. \qquad (2.2.7)

For two substitutions, we have

\langle\psi_{ij}^{rs}| G_2 |\psi_{HF}\rangle = \langle\psi_r\psi_s| \tfrac{1}{r_{12}} |\psi_i\psi_j\rangle - \langle\psi_r\psi_s| \tfrac{1}{r_{12}} |\psi_j\psi_i\rangle. \qquad (2.2.8)

As always, the result is easily generalized to arbitrary pairs of determinants differing by two spin orbitals. To conclude this part on the computation of matrix elements, we take some remarks from [Szabo and Ostlund, 1996]:

1. The first important result is Brillouin's theorem: the matrix element of the Hamiltonian between the Hartree-Fock ground state and a singly substituted state is zero. This is a non-trivial result.

2. Matrix elements between determinants that differ by more than two spin orbitals automatically vanish. In particular, all matrix elements between the ground state and triply substituted states vanish, and many zeroes appear in the higher blocks as well. As an example, the matrix element between \psi_{ijk}^{abc} and \psi_l^d can be non-zero only if one of the indexes i, j, k is equal to l and one of the indexes a, b, c is equal to d.

3. Even if some states do not mix with the ground state directly, they can mix with states that in turn mix with the ground state. As an example, single excitations are not as important as double excitations. The situation changes for excited states, but in this work we only consider the ground state.

This method is conceptually very simple, as it just consists in a clever choice of basis that makes the Hamiltonian matrix sparse.
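The vanishing rule of remark 2 is easy to express in code. Below is a minimal sketch, with determinants encoded as tuples of occupied spin-orbital indices (an illustrative representation, not the storage scheme of any particular program):

```python
# Sparsity rule for the CI Hamiltonian: <det1|H|det2> can be non-zero only
# if the two determinants differ by at most two spin orbitals.
def excitation_degree(det1, det2):
    """Number of spin orbitals occupied in det1 but not in det2."""
    return len(set(det1) - set(det2))

def may_couple(det1, det2):
    """True if the Slater-Condon rules allow a non-zero matrix element."""
    return excitation_degree(det1, det2) <= 2

hf     = (0, 1, 2, 3)        # Hartree-Fock reference, N = 4
single = (0, 1, 2, 7)        # one substitution
triple = (3, 4, 5, 6)        # three substitutions

print(may_couple(hf, single))   # True (but zero anyway, by Brillouin's theorem)
print(may_couple(hf, triple))   # False: differs by three spin orbitals
```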

In the literature [Cramer, 2005], writing the wave function as a linear combination of determinantal states obeying symmetry relations that depend on the system (which we will explore in the next chapters) is called the Multi Configurational Self Consistent Field (MCSCF) method. This method is very useful for situations in which the ground state is quasi-degenerate, or for bond breaking.

In this method, it is very important to choose the orbitals that the electrons are allowed to occupy in such a way that the computation speeds up: core orbitals need not be included, and some orbitals can be ruled out using the symmetries of the system, so it is important to recognize such orbitals and leave them out of the calculation. On the other hand, to gain flexibility and accuracy in the result, it is also important to recognize which orbitals must be included. In the ideal case, we should allow every possible way of distributing the N electrons among the 2m orbitals (Complete Active Space Self Consistent Field method, CASSCF). The computational cost can be very high, as the number of determinants that can be built is [Cramer, 2005]

\frac{(2m)!\,(2m+1)!}{\left(\frac{N}{2}\right)! \left(\frac{N}{2}+1\right)! \left(2m-\frac{N}{2}\right)! \left(2m-\frac{N}{2}+1\right)!}. \qquad (2.2.9)

This cost is often prohibitive, so in practical applications the number of determinants has to be cut down (at the cost of some accuracy in the results).
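A small sketch evaluating (2.2.9) exactly as written above (the values of N and m are arbitrary choices; the integer division is exact because (2.2.9) always yields an integer):

```python
from math import factorial

# Number of CASSCF configurations from Eq. (2.2.9), to illustrate how fast
# a complete active space grows with the number of active orbitals.
def n_casscf(n_electrons, m_orbitals):
    n, m = n_electrons, m_orbitals
    return (factorial(2 * m) * factorial(2 * m + 1)) // (
        factorial(n // 2) * factorial(n // 2 + 1)
        * factorial(2 * m - n // 2) * factorial(2 * m - n // 2 + 1)
    )

for m in (4, 6, 8, 10):
    print(f"N = 8, m = {m:2d}  ->  {n_casscf(8, m):,}")
```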

2.2.2 Density functional theory method

Another method aimed at computing the correlation energy is Density Functional Theory (DFT). Strictly speaking, DFT is not a post Hartree-Fock


method, as it is not based on the HF scheme but on a different approach. The power of this method lies in replacing the wave function of the system (which depends on 3N spatial coordinates for N electrons, plus the spin coordinates) with a density function n(r), depending only on r. This density is normalized according to \int n(r)\, dr = N; the energy of the system and all its molecular properties are then computed as functionals of the density n. We assume that the non-degenerate ground state properties are described by the electron density alone. In describing the DFT method, we follow [Parr and Weitao, 1994].

The HF method includes the exchange effect but neglects the electronic correlation. The correlation energy is a very small fraction of the total energy, but it is essential for obtaining accurate ground state properties. With post-HF methods we have a linear combination of determinants, but its convergence is slow and the number of determinants increases rapidly with the size of the system. DFT is instead based on the idea of minimizing the energy of the system written as a functional of the electron density, E[n(r)]. The main idea is to introduce an auxiliary non-interacting system that has the same electron density as the real one.

As we have seen in equation (2.1.3), we can write the electronic energy as the sum of the electronic kinetic energy (T), one body (V) and two body (U) energy terms, plus a constant (the nuclear repulsion energy, which within the Born-Oppenheimer approximation is a constant that can be dropped), i.e.

E[n] = T[n] + V[n] + U[n]. \qquad (2.2.10)

We do not specify for now the functional forms of T, V, U: deciding how to describe them is an integral part of the particular DFT approach that we take.

A result due to Hohenberg and Kohn [Hohenberg and Kohn, 1964] states that the ground state properties of a system are uniquely determined by the density function n(r), in terms of which we can write the energy. This result is expressed by two fundamental theorems, which we now state:

• The first theorem is a general theorem regarding a system of electrons moving in an external potential: it states that the external potential is determined by the density, up to a constant. This means that different potentials (not simply differing by a constant) give different ground state densities.

(47)

• The second theorem states that the ground state energy corresponds to the minimum value of E[n(r)] with respect to the electron density: for any trial density n'(r) we have E[n'(r)] \geq E_0, where E_0 is the true ground state energy, and equality is reached if and only if n' is equal to the ground state density.

The density is defined in terms of the wave function of the system by

n(r) = N \int d^3r_2 \cdots d^3r_N\, \psi^*(r, r_2, ..., r_N)\, \psi(r, r_2, ..., r_N). \qquad (2.2.11)
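A toy one-dimensional illustration of (2.2.11) for N = 2; the Gaussian pair function below is an arbitrary normalized choice, not the eigenstate of any particular Hamiltonian:

```python
import numpy as np

# Density from a two-electron wave function on a 1D grid: integrate |psi|^2
# over the coordinate of the second electron, then multiply by N = 2.
x = np.linspace(-6.0, 6.0, 401)
dx = x[1] - x[0]
x1, x2 = np.meshgrid(x, x, indexing="ij")
psi = np.exp(-0.5 * (x1**2 + x2**2)) * (1.0 + 0.2 * x1 * x2)
psi /= np.sqrt((psi**2).sum() * dx * dx)      # normalize <psi|psi> = 1

n = 2.0 * (psi**2).sum(axis=1) * dx           # n(x) = N * integral over x2
print("integral of n(x):", n.sum() * dx)      # close to N = 2
```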

In particular, if we consider the ground state,

\psi_0 = \psi_0[n_0], \qquad (2.2.12)

the expectation value of an observable O is

\langle O \rangle[n_0] = \langle\psi_0[n_0]| O |\psi_0[n_0]\rangle. \qquad (2.2.13)

We want to minimize the ground state energy

E_0 = E[n_0] = \langle\psi_0[n_0]| T + V + U |\psi_0[n_0]\rangle. \qquad (2.2.14)

To minimize this energy,

E[n_0] = T[n_0] + V[n_0] + U[n_0], \qquad (2.2.15)

we use the usual variational approach. We denote

V[n] = \int n(r)\, v(r)\, dr, \qquad F[n] = T[n] + U[n], \qquad (2.2.16)

where v(r) is the external potential, defined as the potential acting on the electrons that does not include the many body electron-electron interaction (for a molecule, v is the Coulomb attraction to the nuclei), and F is often called the Hohenberg-Kohn functional. The minimization equation is

\mu = v(r) + \frac{\delta F[n]}{\delta n(r)}, \qquad (2.2.17)

and the chemical potential \mu is the Lagrange multiplier used to enforce the constraint \int n(r)\, dr = N.
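To make (2.2.17) concrete, here is a toy one-dimensional sketch under strong, explicitly stated assumptions: F[n] is replaced by a local Thomas-Fermi-like kinetic functional C \int n^{5/3} dx, the interaction U is ignored, and v(x) is a harmonic well. Then \delta F/\delta n = (5/3) C n^{2/3}, the Euler equation can be inverted for n(x), and \mu is fixed by the normalization constraint:

```python
import numpy as np

# Toy 1D Euler equation mu = v(x) + (5/3) C n(x)^(2/3), inverted for n(x);
# the chemical potential mu is tuned by bisection so that integral n = N.
C, N = 1.0, 2.0
x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]
v = 0.5 * x**2                         # harmonic external potential

def density(mu):
    # n(x) = [3 (mu - v) / (5 C)]^(3/2) where mu > v, zero elsewhere
    return np.clip(3.0 * (mu - v) / (5.0 * C), 0.0, None) ** 1.5

lo, hi = 0.0, 50.0
for _ in range(60):                    # bisection on the chemical potential
    mu = 0.5 * (lo + hi)
    if density(mu).sum() * dx < N:
        lo = mu
    else:
        hi = mu

n = density(mu)
print("mu =", mu, " integral of n =", n.sum() * dx)   # close to N
```

The bisection on \mu plays the role of tuning the Lagrange multiplier until the normalization constraint is satisfied.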
