
Multiscale simulations of beta2-Microglobulin


Academic year: 2021




Università degli Studi di Pisa


Master's Degree Course in Physics (Corso di Laurea Magistrale in Fisica)

Master's thesis

Multiscale modeling of β




Anna Bochicchio


Dott. Valentina Tozzini


CONTENTS

Introduction

I SYSTEM AND METHODS

1 PROTEINS: STRUCTURE, FUNCTION AND THEIR RELATIONSHIP
1.1 Proteins, fundamentals
1.1.1 Basic biochemistry of proteins
1.1.2 Conformational Hierarchy
1.2 Protein folding, misfolding and aggregation
1.3 β2-Microglobulin
1.3.1 Structural Aspects
1.3.2 The role of β2m in amyloid disease
1.3.3 Role of the cis-trans isomerization
1.3.4 Prion like conversion
1.3.5 Summary

2 THEORY AND METHODS
2.1 Molecular Dynamics
2.1.1 Atomistic simulations of β2m
2.2 Coarse grained models
2.2.1 One bead models
2.2.2 Models with two or more beads
2.2.3 Simulation methodologies
2.3 Methods for the analysis of the MD simulations
2.3.1 Cross-correlation matrices
2.3.2 Principal Component Analysis
2.4 Enhanced sampling techniques
2.4.1 Metastability and dimensional reduction
2.4.2 Biased potentials method
2.4.3 Metadynamics

II RESULTS AND DISCUSSION

3 EQUILIBRIUM DYNAMICS SIMULATIONS
3.1 Introduction
3.2 Choice of the reference structures
3.3 Atomistic simulations of state A and B
3.4 Coarse grained simulations
3.4.1 Minimalist force fields for reference states
3.4.2 Parameters of the coarse grained force fields
3.4.3 Analysis of equilibrium properties
3.4.4 Dynamical Correlation
3.5 Summary

4 COARSE GRAINED MINIMALIST BI-STABLE MODEL
4.1 Bi-stable force field
4.1.1 Testing the implementation
4.2 Free energy landscape
4.2.1 Principal Component Analysis
4.2.2 Well-tempered metadynamics
4.3 Transition simulation with the coarse grained force field
4.4 Summary of this chapter

5 CONCLUSIONS

A AMINO ACIDS

B ALGORITHMIC DETAILS
B.1 Constraints
B.2 Periodic boundary conditions
B.3 Neighbours List
B.4 Nosé-Hoover thermostat

C DL_POLY

D PLUMED
D.1 Well-tempered metadynamics
D.2 Typical output

BIBLIOGRAPHY



This Thesis reports a classical molecular dynamics study of β2-Microglobulin, performed at two different resolutions: the atomistic one and a minimalist coarse grained model (one bead per amino acid).

β2-Microglobulin (β2m) is a globular protein that self-associates into insoluble fibrillar amyloid deposits under specific environmental conditions (presence of copper ions, low pH, mutations and others). The systemic deposition of β2m fibrils is associated with dialysis-related amyloidosis (DRA), a disease arising in individuals with chronic renal failure following long-term hemodialysis. The study of β2m aggregation and fibrillogenesis in vitro keeps providing novel clues and challenging suggestions for understanding some general features of amyloid formation and deposition mechanisms, which are at the origin of numerous diseases, including Alzheimer's and Parkinson's. Therefore, the number of investigations on β2m aggregation has been steadily increasing over the last years, not only because of the relevance of β2m to amyloidosis, but also because of its value as a prototype for protein folding and misfolding studies.

Computational simulations can be used as an analytical tool to study protein aggregation. They can provide insights into, and explanations of, the aggregation mechanism, and constitute a discovery tool that complements experimental studies. However, because of the breadth of time scales (from ns for the formation of early oligomers¹ to days, months or even years for the formation of mature fibrils) and length scales (from a nm-sized protein to aggregates several hundred nm long) involved in aggregation, the computational study of this process lends itself to the use of a hierarchy of models.

1 An oligomer is a molecular complex that consists of a few monomer units.

Different levels of resolution allow probing different elements of the aggregation process. Fully atomistic models can provide invaluable information at a level of detail not accessible to experiment, but they still cannot reach biologically interesting time and length scales because of limitations in computational resources. These limitations have drawn the interest of researchers toward the development of Coarse Grained (CG) models, which use simplified representations of the polypeptide chains to reduce the number of degrees of freedom and thus to extend the spatial and temporal domains accessible to simulation. In fact, by condensing the degrees of freedom of groups of atoms into single interaction centers (or beads), it is typically possible to reduce the number of degrees of freedom of the system by several orders of magnitude, depending on the level of coarse graining. The elimination of the internal degrees of freedom, associated with the highest frequencies of the system, allows the biologically relevant time and size scales to be reached.
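The reduction in degrees of freedom can be illustrated with a minimal sketch of a one-bead-per-residue mapping. Placing the bead on the α-carbon is an assumption made here for illustration; the input format, the function names and the toy coordinates are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def coarse_grain_to_calpha(atoms):
    """Keep one bead per residue, placed on the alpha carbon (assumed mapping).

    `atoms` is a list of (residue_index, atom_name, (x, y, z)) tuples,
    a hypothetical in-memory format chosen for this sketch.
    """
    beads = [xyz for _, name, xyz in atoms if name == "CA"]
    return np.asarray(beads, dtype=float)

def degrees_of_freedom(n_particles):
    """Cartesian degrees of freedom of n point particles (no constraints)."""
    return 3 * n_particles

# Toy two-residue backbone fragment; coordinates are made up for illustration.
toy_atoms = [
    (1, "N",  (0.0, 0.0, 0.0)),
    (1, "CA", (1.5, 0.0, 0.0)),
    (1, "C",  (2.0, 1.4, 0.0)),
    (1, "O",  (1.3, 2.4, 0.0)),
    (2, "N",  (3.3, 1.5, 0.0)),
    (2, "CA", (4.0, 2.8, 0.0)),
    (2, "C",  (5.5, 2.6, 0.0)),
    (2, "O",  (6.0, 1.5, 0.0)),
]

beads = coarse_grain_to_calpha(toy_atoms)
dof_atomistic = degrees_of_freedom(len(toy_atoms))  # all listed atoms
dof_cg = degrees_of_freedom(len(beads))             # one bead per residue
```

For a 99-residue chain such as β2m, the same counting gives 3 × 99 = 297 Cartesian degrees of freedom for the one-bead representation. The larger savings quoted in the text also depend on how the solvent is treated.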

However, in CG models, especially the minimalist ones, it is often difficult to reach a compromise between accuracy, predictive power and simplicity. For this reason, in this study the atomistic representation is also used, with two specific aims: complementing the CG representation in specific cases and for limited time scales, and building a set of high resolution dynamical data which are used, together with the experimental ones, to optimise the minimalist model parameterization.

A further gain is achieved by describing the solvent implicitly. The CG model adopts this description, which saves additional orders of magnitude in computational cost.

The aggregation mechanism of β2m involves at least three stages, or levels. The first level pertains to the molecular features of the mechanism and concerns the structure of the protein in response to environmental conditions. The second level addresses the evolution of the interacting molecules towards the establishment of an assembly that may or may not exert a pathological role. Finally, the third level considers the deposition and the pathological role of the deposits. The present work focuses on the first stage: the deviation from the native state and the ensemble of intermediate states (prone to aggregation) can substantially affect protein aggregation properties. The aim is to construct a simplified model able to describe the conformational changes leading the protein from a soluble state to an amyloid-prone one.

The thesis is organized as follows:

CHAPTER 1 presents the fundamental biological background of the problem of protein folding and misfolding. It starts with a comprehensive description of the properties of amino acids and proteins. Then the available experimental structures of β2m are described in detail, focusing on the structural properties that would favour the transition of the native state of the protein toward a state more prone to aggregation. Finally, the reasons are given for the choice of the experimental structures used to parameterize the force field of the CG model and on which the simulations were conducted.

CHAPTER 2 illustrates the main concepts of classical molecular dynamics and of empirical force fields, concepts that apply both to atomistic simulations and to simulations using CG models. Afterwards, the characteristics of the two approaches (all atom and CG) are discussed, and the different coarse grained models available in the literature are classified according to the parameterization strategies adopted and to the level of coarsening. The level of coarse graining chosen for the model developed here is a single interaction center per amino acid. The one-bead level is the highest level of simplification that still allows for a description of the secondary structure of the protein. It also corresponds to the level of accuracy of low-resolution experimental techniques such as cryo-electron microscopy, allowing a direct comparison between theory and experiment. These elements combine to make the resulting force field easy to implement and the simulation results more immediate to interpret. On the other hand, such a high level of simplification comes at the price of additional difficulties in parameterization. As anticipated, these difficulties are addressed here by combining experimental data and atomistic simulations in the parameterization of the minimalist model.

CHAPTER 3 presents the first set of original results of this work: atomistic molecular dynamics simulations of the pre-fibrillar and fibrillogenic states, which constitute the high resolution stage of the multi-scale approach, and the parameterization and optimization of the CG force field, suited for exploring the equilibrium dynamics of the two states. Starting from an existing parameterization, based on the statistical distributions of the internal variables of a set of proteins, individual potentials have been optimized on the basis of the atomistic simulations. Consequently, the resulting CG model is fully compatible with the atomistic representation and constitutes the low resolution stage of the multi-scale simulation.

CHAPTER 4 illustrates the second set of original results of this Thesis: the building and parameterization of a bistable coarse grained force field, allowing the description of the conformational changes involved in a transition between two metastable states of the protein. For this purpose, it was necessary to implement new functional forms in the software used for the simulations. The parameterization has been optimized with the inclusion of physico-chemical and thermodynamic data. The free energy surface was then explored with advanced sampling techniques (well-tempered metadynamics), in order to obtain an accurate characterization of the thermodynamics of the transition.

Finally, in Chapter 5 we summarize the results and outline some possible extensions of the present work and its applications, in the direction of suggesting new strategies against DRA and other amyloidosis-related diseases.



A number of abbreviations and typographic conventions have been adopted in order to maintain consistency throughout. We summarize them here:

MHC-I class I major histocompatibility complex

β2m β2-microglobulin

MD molecular dynamics

FF force field

AA atomistic simulations

CG coarse grained simulations

RMSD root mean square deviation

RMSF root mean square fluctuation

RG radius of gyration

CV collective variables

FES free energy surface

PCA principal component analysis

WTMETAD well-tempered metadynamics

Cα alpha carbon


The term in vivo is used to describe biological phenomena reproduced in a cell culture.

The term in vitro is used to describe biological phenomena reproduced in a test tube.

The term in silico is used to indicate chemical and biological phenomena reproduced in a computational simulation.




All the cartoon representations of proteins have been produced with the VMD software [45].


Part I



PROTEINS: STRUCTURE, FUNCTION AND THEIR RELATIONSHIP

Life is the mode of existence of proteins, and this mode of existence essentially consists in the constant self-renewal of the chemical constituents of these substances.

—Friedrich Engels

1.1 PROTEINS, FUNDAMENTALS

Proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscles, which are in charge of the motion of cells and organisms. Other proteins are important for transporting materials, cell signaling, immune response, and several other functions. Proteins are the main building blocks of life.

1.1.1 Basic biochemistry of proteins

A protein is a polymer chain of amino acids whose sequence is defined in a gene: three consecutive nucleotides¹ in the DNA sequence specify one of the 20 natural amino acids. All amino acids possess common structural features: they have an α-carbon (Cα) to which an amino group (NH₃⁺), a carboxyl group (COO⁻), a hydrogen atom and a variable side chain (R) are attached (Figure 1). The simplest amino acid is glycine, where R is a hydrogen atom. The tetrahedral arrangement around Cα makes two mirror images of the molecule possible, but only the "left-handed" isomer (L-isomer) is a constituent of proteins on Earth. The side chains of the standard amino acids have a great variety of chemical structures and properties (see Appendix A). The combined effect of all the amino acids in the sequence and their interactions with the environment determines the stable structure of the protein (or native state). In fact, each amino acid has a unique combination of properties (size, polarity, cyclic constituents, sulfur constituents, etc.) that critically affects the interactions forming and stabilizing the three-dimensional structure of proteins. These interactions originate from electrostatic, van der Waals, hydrophobic, and hydrogen bonding forces, in addition to specific covalent (disulphide bonds) and ionic bonds (salt bridges).

Figure 1.: A skeletal model of the generalized amino acid showing the amino (blue), carboxyl (red) and R groups attached to a central α carbon.

1 Nucleotides are the main building blocks of nucleic acids (DNA or RNA), composed of a nitrogenous base, a five-carbon sugar and a phosphate group.

A polypeptide is formed when amino acids are linked together by peptide bonds between the carboxyl and amino groups of adjacent residues, with the release of a molecule of water (H₂O) for each bond formed (Figure 2). The process is a dehydration synthesis reaction (also known as a condensation reaction).

Figure 2.: Formation of a dipeptide by joining two amino acids.

The geometry of the peptide group is planar and rigid, while there is considerable flexibility about each of the single bonds along the backbone, N–Cα and Cα–C. The two dihedral angles φ and ψ are used to define rotations about the bond between the nitrogen and Cα of the main chain and between Cα and the carbonyl carbon, respectively (Figure 3). The dihedral angle ω defines the rotation about the peptide bond, specifically for the sequence Cα1–C–N–Cα2, where Cα1 and Cα2 are the α-carbons of two adjacent amino acids.

Figure 3.: Rotational flexibility in polypeptides: definition of the φ, ψ and ω dihedral angles.

Because of the partial double-bond character of the peptide bond and the steric interactions between adjacent side chains, ω is typically (99.95% probability) in the trans configuration, i.e. ω = 180°. In this orientation, all four atoms lie in the same plane. One exception is found in peptide bonds where the following residue is proline. Proline, unusual in having a cyclic side chain that bonds to the backbone amide nitrogen, has less repulsion between side chain atoms. This increases the stability of the cis peptide bond relative to the trans state. The possible combinations of the φ and ψ angles are limited by steric hindrance: only certain combinations are typically observed, with some dependence on residue size and shape. Thus in a plot of φ versus ψ (Ramachandran plot) there are sterically forbidden regions, fully allowed regions with no steric hindrance, and unfavourable regions that can be reached by slight bending of bonds.
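The dihedral angles φ, ψ and ω are all instances of the same geometric quantity: the signed angle between the planes defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed from Cartesian coordinates. The function name and the test geometries are illustrative choices, not part of the thesis. For ω, the four points would be Cα1, C, N and Cα2.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    The angle is measured about the central bond p1-p2, between the
    projections of the bonds p1->p0 and p2->p3 onto the plane
    perpendicular to that central bond.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # component of b0 perpendicular to b1
    w = b2 - np.dot(b2, b1) * b1          # component of b2 perpendicular to b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Planar arrangements: trans (outer atoms on opposite sides) and cis (same side).
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
quarter = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

In this sketch the trans geometry gives an angle of magnitude 180° and the cis geometry gives 0°, which correspond to the two ω configurations discussed above.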

1.1.2 Conformational Hierarchy

Every protein is defined by a unique sequence of residues, and all subsequent levels of organization (secondary, supersecondary, tertiary and quaternary) rely on this primary level of structure. Some proteins are related to one another, leading to varying degrees of similarity in primary sequence: the great variety of proteins that can be observed today has arisen from a much smaller number of ancestors during evolution. However, amino acids linked together in a flat "two-dimensional" representation fail to convey the three-dimensional arrangement of proteins. It is the assembly of regular secondary structure into complicated folding patterns that leads to the characteristic functional properties of proteins. Hence, primary structure leads to secondary structure, the regularly repeating structures stabilized by hydrogen bonds. In globular proteins the three basic units of secondary structure are the α helix, the β strand and turns; all other structures represent variations of one of these basic themes. The right-handed α helix (Figure 4) is the most common and most readily identifiable unit of secondary structure: a hydrogen bonding network connects each backbone carbonyl (C=O) oxygen of residue i to the backbone hydrogen of the NH group of residue i + 4. This hydrogen bonding provides substantial stabilization energy.

Figure 4.: An α helix. Only heavy atoms (C, N and O) are shown and the side chains are omitted for clarity.

The regular spiral network of the α helix is ubiquitous in proteins. It is associated with a (φ, ψ) pair of about (−60°, −50°). The resulting helix has 3.6 residues per turn and a translation distance per residue of 1.5 Å. Hydrogen bonds have a directionality that reflects the intrinsic polarization of the hydrogen bond due to the electronegative oxygen atom. In a similar fashion the peptide bond also has polarity, and the combined effect of these two properties gives α helices pronounced dipole moments: on average the amino end of the α helix is positive whilst the carboxyl end is negative. There are other variants of the α helix motif that are typically not stable in solution but can play a part in overall protein structure. These include the tighter 3₁₀ helix, with three residues per turn and hydrogen bonds between residues i and i + 3 (there are 10 atoms in the ring closed by each hydrogen bond, hence the name 3₁₀). The more loosely coiled π helix has hydrogen bonds between residues i and i + 5 of the polypeptide.
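The helix parameters quoted above fix the helical pitch and the rotation per residue, and they are enough to build idealized Cα positions. The sketch below does this under one explicit assumption that is not in the text: a Cα helix radius of 2.3 Å, used here as a typical value. All names are illustrative.

```python
import numpy as np

RESIDUES_PER_TURN = 3.6   # from the text
RISE_PER_RESIDUE = 1.5    # angstrom, from the text
CA_RADIUS = 2.3           # angstrom, ASSUMED typical value (not from the text)

def helix_pitch():
    """Advance along the helix axis for one full turn (angstrom)."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def rotation_per_residue_deg():
    """Rotation about the helix axis between consecutive residues (degrees)."""
    return 360.0 / RESIDUES_PER_TURN

def ideal_helix_ca(n_residues):
    """C-alpha coordinates of an idealized right-handed helix along z."""
    i = np.arange(n_residues)
    theta = np.radians(rotation_per_residue_deg()) * i
    return np.column_stack(
        (CA_RADIUS * np.cos(theta), CA_RADIUS * np.sin(theta), RISE_PER_RESIDUE * i)
    )

coords = ideal_helix_ca(10)
# Distance between consecutive C-alpha beads in the idealized helix.
ca_ca_distance = np.linalg.norm(coords[1] - coords[0])
```

With these numbers the pitch is 3.6 × 1.5 Å = 5.4 Å and each residue rotates the chain by 100° about the axis. Under the assumed radius, the distance between consecutive Cα atoms comes out close to 3.8 Å.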


Another common motif is the β sheet (Figure 5). These sheet regions form by aggregation of amino acid strands, called β strands, via hydrogen bonds. Typical lengths of β strands are 5–10 residues. The strands can associate in a parallel or antiparallel orientation, each with a distinct hydrogen bonding pattern. The hydrogen bond cross-links between strands (alternating C=O···H–N and N–H···O=C) are such that the sheet has a pleated appearance. Thus, in comparison to α helices, β sheets require interactions between residues that are much farther apart along the sequence. For parallel β sheets, φ ≈ −120° and ψ ≈ +115°; for antiparallel ones, φ ≈ −140° and ψ ≈ +135°.

Figure 5.: A β sheet. Only heavy atoms (C in cyan, N in blue and O in red) are shown and the side chains are omitted for clarity.
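The characteristic (φ, ψ) values quoted in this section for the α helix and for parallel and antiparallel β sheets can be used for a rough labelling of backbone conformations. The sketch below assigns a (φ, ψ) pair to the nearest of these three reference points, taking the periodicity of angles into account. It is a toy illustration only: practical secondary-structure assignment normally also relies on hydrogen-bond patterns, and the function and dictionary names are hypothetical.

```python
import math

# Reference (phi, psi) values in degrees, taken from the text.
REFERENCE_ANGLES = {
    "alpha helix": (-60.0, -50.0),
    "parallel beta": (-120.0, 115.0),
    "antiparallel beta": (-140.0, 135.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def nearest_motif(phi, psi):
    """Label of the reference point closest to (phi, psi) on the periodic map."""
    def distance(ref):
        return math.hypot(angular_difference(phi, ref[0]),
                          angular_difference(psi, ref[1]))
    return min(REFERENCE_ANGLES, key=lambda name: distance(REFERENCE_ANGLES[name]))

label_helix = nearest_motif(-57.0, -47.0)
label_anti = nearest_motif(-139.0, 135.0)
label_par = nearest_motif(-119.0, 113.0)
```

Because the parallel and antiparallel reference points lie only about 28° apart on the Ramachandran map, this kind of nearest-point labelling cannot reliably separate the two sheet types; strand orientation has to be determined from the pairing between strands.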

Other common structural motifs are turns and loops. Turns occur in regions of sharp reversal of orientation, such as the junction of two antiparallel β strands. Loops often occur in short (five residues or less) regions connecting various motifs; in particular, loop regions that connect two adjacent antiparallel β strands are known as hairpin loops. The majority of turns and loops lie on the protein surface, since they are often hydrophilic. They are important elements that allow, and possibly drive, protein compaction, and they often participate in interactions between proteins and other molecules. Since protein core regions are more stable than the short elements connecting helices and strands, evolutionary differences among homologous proteins (proteins derived from a common "ancestor") are often localized to loop and turn regions.

The secondary structural elements described above often combine into simple motifs that occur frequently in protein structures. Such motifs (or folds) are also called supersecondary structure. Supersecondary and tertiary structure can be described by the specific topological arrangement of the secondary structural motifs. Proteins can be monomeric or multimeric, with subunits that fold in a dependent or independent manner with respect to other domains. The different polypeptide domains can be connected by disulphide bonds, hydrogen bonds, or weaker van der Waals interactions. Tertiary structure is also affected by the environment. Hydrogen bonding with solvent water molecules can stabilize the native conformation, and the salt concentration can affect the compact arrangement of the folded chain.

X-ray crystallography and multi-dimensional NMR spectroscopy (and recently also cryo-electron microscopy) yield detailed pictures of protein structure at the atomic level. The 3-D structural data are collected in a database, the Protein Data Bank (PDB), which now contains thousands of resolved structures.

Based on the known protein structures at atomic resolution, some major classes can be used to describe the arrangement in space of the various domains of polypeptides:

• α-proteins– proteins which form compact aggregates by packing mainly α-helices;

• β-proteins– proteins which pack together mainly β-sheets, with adjacent strands linked by turns and loops and various hydrogen bonding networks formed among the individual strands, often resulting in layered or barrel structures;

• α/β-proteins– proteins that are folded with alternating α helices and β strands, often forming layered or barrel-like structures.

Proteins in the β-class display a flexible and rich array of folds. Various connectivity topologies can exist within networks of parallel, antiparallel, or mixed β-sheets that twist, coil and bend in different ways. As an example, a β sandwich forms normally via interaction of strands at an angle and connected to each other via short loops. The layers of the sandwich can be aligned with respect to each other or arranged orthogonally.

Finally, quaternary structure describes the arrangement of multiple polypeptide chains, each independently folded, possibly interacting with other molecules (nucleic acids, lipids, ions, etc.). The interactions are stabilized, again, by hydrogen bonds, salt bridges, and various other complex intermolecular and intramolecular associations in space.

Once again, it should be noted that discovering the tertiary structure of a protein, or the quaternary structure of its complexes, can provide important clues about how the protein performs its function, with which other proteins it interacts, and information about the biological mechanisms in which it is involved.


1.2 PROTEIN FOLDING, MISFOLDING AND AGGREGATION

The search for protein folding pathways and the principles that guide them has proven to be one of the most difficult problems in all of structural biology. Protein folding is in itself not a trivial task, since the number of possible interactions between amino acid side chains far exceeds the total number of protein molecules within the cell, and establishing the correct interactions is vital if the protein is to fold correctly. In addition, the protein must fold within the crowded environment of the cell, so the chance of making inappropriate contacts with other proteins is very high. The driving force that pushes a protein to attain its lowest free energy state (i.e. its native conformation in the majority of cases) ensures that most proteins fold spontaneously and rapidly (on the order of microseconds to milliseconds) and, more often than not, folding occurs without problems. There is strong evidence that the native state of a protein corresponds, except in very rare circumstances, to the structure that is the most stable under physiological conditions. Sometimes, however, subtle changes in the balance of forces lead to misfolded and aggregated proteins. Surprisingly, small changes may produce remarkably different outcomes. The failure of a protein to fold correctly can have serious consequences. It is now recognized that protein misfolding lies at the origin of a number of debilitating diseases. In fact, during the past decade or two the role of protein misfolding and aggregation in at least 40 diseases has been recognized. A case can be made for a causative role of protein aggregation in diseases including Alzheimer's disease (β-amyloid peptide), Parkinson's disease (α-synuclein), amyotrophic lateral sclerosis (superoxide dismutase), Huntington's disease (huntingtin with expanded polyglutamine), spongiform encephalopathies (prion proteins), and dialysis-related amyloidosis (β2-Microglobulin). Although there are no sequence similarities among these proteins, the diseases share a common characteristic: these proteins aggregate into insoluble, usually fibrillar, β-sheet-rich deposits, commonly referred to as "amyloid". The factors that trigger aggregation and the role of aggregation in cellular degeneration are still not understood.

It was widely assumed until recently that the ability to form amyloid fibrils was limited to a relatively small number of proteins, largely those seen in disease states, and that these proteins must possess some specific sequence motifs encoding this apparently aberrant structure. Studies have now shown, however, that the ability of polypeptide chains to form such structures is common, and indeed may be considered a generic feature of polypeptide chains [18]. In particular, it has been shown that fibrils can be formed by many proteins that are not associated with disease once they are placed under conditions that destabilize the native structures, including such well-known proteins as myoglobin [20].

One of the primary features of a generic model of amyloid formation is that the ability to form fibrils is common, but the relative propensity to form fibrils varies substantially with the polypeptide sequence [86]. There is considerable evidence supporting this assumption. Indeed, the mutation of a single amino acid in a protein can change the rate at which aggregation occurs from its denatured state by an order of magnitude or more. Furthermore, the change in aggregation rate caused by such mutations can be correlated with the predicted changes in properties such as charge, secondary structure propensity, and hydrophobicity. It is well established that aggregation, like crystallization or indeed protein folding, is a nucleated process. The ability to understand some features of the aggregation process that results in amyloid fibrils has prompted investigation of the mechanism by which they are assembled from the precursor species, and it has been shown that the conditions favouring formation of amyloid fibrils are the same ones that stimulate at least partial unfolding, for example low pH or elevated temperature. It appears from studies carried out so far that there are many common features in the mechanism of formation of amyloid fibrils by different peptides and proteins. The first phase of the aggregation process involves the formation of oligomeric species as a result of relatively nonspecific interactions, although in some cases specific structural transitions may be involved if such processes increase the rate of aggregation. These early prefibrillar aggregates then appear to transform into species with more distinctive morphologies, sometimes described as protofibrils or protofilaments. These latter structures are commonly short, thin, sometimes curly, fibrillar species that are thought, in some cases at least, to self-assemble into mature fibrils, perhaps by lateral association accompanied by some degree of structural reorganization.

In the next sections, the pathological aggregation of β2-microglobulin (β2m) is examined starting from the relevance of some structural aspects of the protein.

1.3 β2-MICROGLOBULIN

The study of β2-Microglobulin (β2m) aggregation and fibrillogenesis in vitro provides interesting insights into some general features of amyloid formation. The number of investigations on β2m aggregation has been steadily increasing over the last years, also because of its value as a model system for research on protein folding and misfolding.

Here the importance of conformational dynamics for the initiation and development of β2m amyloid formation starting from the natively folded state will be highlighted. Detailed analyses of the folding, stability and amyloidogenicity of a number of different proteins have revealed that a polypeptide chain can adopt a diversity of structures spanning a multidimensional energy landscape, the thermodynamics and kinetics of which depend on the protein sequence and on the environmental conditions.

1.3.1 Structural Aspects

β2-microglobulin is the non-covalently bound light chain of the major histocompatibility complex class I (MHC-I), an essential molecular complex of the adaptive immune system, wherein the protein plays an essential role in chaperoning assembly of the complex for antigen presentation (Fig. 6) [46].

Figure 6.: a) Cartoon representation of human MHC-I showing the heavy chain (α1, α2, α3 in red) and the light chain (β2m in blue). Highlighted are the residues Pro5, Pro14, Pro32, Pro72 and Pro90 (in green sticks and spheres) and the disulphide bond between residues Cys25 and Cys80 (in yellow sticks). b) Cartoon representation of the structure of monomeric native wild-type β2m in solution (PDB code 2XKS [32]) showing β-strands A (6-11), B (21-28), C (36-41), C' (44-45), D (50-51), E (64-70), F (79-83) and G (91-94). Highlighted are residues Pro5, Pro14, Pro32, Pro72 and Pro90 (in sticks and spheres) and the disulphide bond between residues Cys25 and Cys80 (in sticks). N, N-terminus. C, C-terminus.

Wild-type β2m contains 99 amino acids and has a classical β-sandwich fold comprising seven antiparallel β-strands stabilized by a single inter-strand disulphide bridge between β-strands B and F (Fig. 6). The structures of monomeric native β2m from humans and of several of its variants have been solved at high resolution by solution NMR [32] and X-ray crystallography [44]. β2m contains five prolines, one of which, Pro32, forms a thermodynamically unfavoured cis peptide bond in the native state (Fig. 6). Another interesting feature of monomeric native β2m is the conformational dynamics of the D-strand and of the loop that connects the D- and E-strands (DE-loop) (Fig. 7). This region forms contacts with the MHC-I heavy chain, but shows dynamics on a microsecond to millisecond time scale when the monomer is in solution (Fig. 7a) and variability among different crystal structures (Fig. 7b). This rationalizes the results of hydrogen-deuterium exchange studies on monomeric β2m, which show that the DE-loop region exhibits enhanced backbone dynamics compared with the MHC-I bound state. Notably, a link has been proposed between the dynamic properties of monomeric native β2m, particularly in the D-strand and in the DE-loop region, and its potential to assemble into amyloid fibrils.

Figure 7.: a) Structures displaying a β-bulge and an attached AB-loop: wild-type β2m (PDB code 1JNJ) in red, H31Y (PDB code 1PY4) in green, W60G (PDB code 2V85) in blue. b) Structures displaying a straight β-strand D: wild-type β2m (PDB code 1LDS) in red, H13F (PDB code 3CIQ) in green, PDB code 2XKS in blue.

1.3.2 The role of β2m in amyloid disease

As discussed previously, the precise mechanism of amyloid fibril formation in DRA is unknown, although numerous factors have been suggested to influence the advent and impact of β2m amyloid in the disease. Biochemical and biophysical investigations in vitro have been used to shed light on the mechanism of β2m fibril formation in vivo, and also as a model system that may yield general insights into amyloid fibril formation applicable to other pathologies. In vitro investigation has been undertaken since the discovery of β2m as an amyloid-forming protein in 1986 [89], and has shown that in the absence of denaturant, and at physiological temperature and pH, β2m does not spontaneously form amyloid fibrils even when incubated for long periods of time at high protein concentration [64]. As a consequence of these findings, factors have been sought that could facilitate aggregation of β2m in vivo, including post-translational modifications of full-length β2m and collision between β2m and molecules abundant in osteoarticular tissues or encountered during dialysis. As a result, a multitude of factors have been shown to enhance the aggregation of β2m in vitro and are implicated in vivo, including copper ions (Cu2+), local inflammation and pH lowering, glycosaminoglycans and collagen. Most of the research effort has been focused on the molecular mechanism of aggregation and, therefore, on the structure of the protein in response to environmental conditions. The aim was to identify a structural target through which aggregation could be prevented or reverted, by acting on the monomeric protein, the oligomeric species, or the stabilized aggregate.

1.3.3 Role of the cis-trans isomerization

Seminal work by Chiti and coworkers [64] showed that wild-type β2m folds via two structurally distinct intermediates, known as I1 and I2, toward the globular native state. The first intermediate along the folding reaction coordinate, I1, is populated in milliseconds. The second folding intermediate, I2, forms within milliseconds of population of I1 and displays native-like secondary structure. Further folding of I2 occurs on a timescale of seconds to minutes, suggesting substantial energetic barriers to the attainment of the globular native fold. Building on these observations, more detailed studies of the folding and unfolding mechanism of wild-type β2m, combined with mutagenesis of the sequence, demonstrated that the transition between the slow folding intermediate I2 and the native fold is rate limited by trans to cis isomerization of the His31-Pro32 peptide bond, which leads to a kinetically trapped intermediate, IT. Consistent with these findings, folding studies of a variant of β2m in which Pro32 is replaced with Val [82] revealed that the slow folding step is abolished, trapping β2m in a species presumably with a trans His31-Val32 peptide bond. Pro32 is highly conserved in β2m in different organisms and may be responsible for the slow refolding commonly found in other immunoglobulin domains. Interestingly, however, the variant P32V is not able to elongate amyloid fibrillar seeds in vitro or to nucleate fibril formation, suggesting that a trans His31-X (where X represents an amino acid) peptide bond is necessary, but not sufficient, to endow β2m with its amyloidogenic properties.

To gain a more detailed understanding of the kinetic folding mechanism of β2m and the role of different partially folded species in linking the folding and aggregation energy landscapes, the folding and unfolding kinetics of β2m have been analyzed under an array of conditions, including analysis of the folding mechanism of the variant P32G [80]. The authors of the study proposed a five-state model for the folding mechanism of wild-type β2m involving parallel folding pathways initiated from cis or trans His31-Pro32 in the unfolded state. The five-state model (and also a later, simpler four-state model [90]) suggests that the population of IT under physiological conditions is low but significant, consistent with the poor ability of wild-type β2m to elongate fibrillar seeds at neutral pH in vitro. Replacement of Pro32 with glycine (P32G) resulted in a simple three-state folding mechanism in which an intermediate, presumably with a trans His31-Gly32 peptide bond similar to IT, accumulates during folding. Importantly, the authors were able to demonstrate that the population of IT correlates with the rate of fibril elongation in vitro, suggesting that IT is a key link between the folding and aggregation energy landscapes for this protein. Exploration of the conformational properties of P32G using NMR suggested large conformational changes involving residues in the BC- and FG-loops, the D-strand and the N-terminal region of the protein that presumably arise from the isomerization of Pro32 and subsequent partial unfolding of the protein (Fig. 8). In this frame, the isomerization of the histidyl-prolyl bond appears to be just one ingredient of the recipe. DE-loop mutations that introduce loop strain, such as D59P, show a decreased folding free energy compared with the wild-type protein and an enhanced potential to aggregate, whereas a release of loop strain, such as in W60G, leads to stable variants with reduced amyloidogenic features [58]. However, DE-loop cleavage variants have been demonstrated to be highly aggregation prone. Together these studies are indicative of a fragile and delicate amino acid network required for the stabilization of the cis isomer at His31-Pro32, which is needed both for binding to the MHC-I heavy chain and for maintaining a soluble native structure of the monomeric protein.

Assembly mechanism at atomic resolution

Clinical studies have shown that dialysis patients treated with Cu2+-free filter membranes have a reduced incidence of DRA, suggesting that Cu2+ ions may play a role in initiating or enhancing aggregation of wild-type β2m in DRA. Indeed, Cu2+ has been shown to bind to native human β2m with moderate affinity and specificity. Binding involves coordination to the imidazole ring of His31 [32]. Non-native states of wild-type β2m also bind Cu2+ ions; in this case the other three histidines (His13, His51, His84) coordinate Cu2+. As a consequence, binding of Cu2+ ions increases the concentration of non-native (so-called activated) forms of monomeric β2m, which triggers the formation of dimeric, tetrameric and hexameric species believed to be on-pathway to amyloid-like fibrils [10]. Cu2+ binding is required for the conformational changes leading to these activated states and to the formation of early oligomeric species. However, once these oligomeric species and the subsequent fibrillar aggregates are formed, Cu2+ is not essential for their stability. By creating two variants, P32A and H13F, Miranker and co-workers were able to crystallize dimeric and hexameric forms of β2m (after Cu2+-induced oligomerization). These studies revealed that dimeric P32A and hexameric H13F contain a trans His31-Ala32 and a trans His31-Pro32 peptide bond, respectively. Each oligomer is composed of monomers that retain a native-like fold, yet display significant alterations in the organization of aromatic side chains within the hydrophobic core, most notably Phe30, Phe62 and Trp60 (Fig. 8), which the authors speculate to be important determinants of amyloid assembly. How these structures relate to the transient intermediates formed during folding or populated during aggregation, however, remains unclear. Importantly in this regard, P32A and H13F lack an enhanced ability to assemble into amyloid fibrils compared with wild-type β2m. Despite containing a trans His31-X32 peptide bond, these species lack structural and/or dynamical properties critical for amyloid formation.

Of particular interest is the ∆N6 variant, since this species is found as a significant component in ex vivo amyloid deposits and exhibits an increased affinity for collagen compared with the wild-type protein, suggesting a role for this protein in the development of DRA. Pioneering work by Esposito and colleagues showed that ∆N6 experiences a global decrease in conformational stability compared with wild-type β2m and, using molecular dynamics simulations, the authors proposed that the D-strand facilitates intermolecular interactions to form oligomeric assemblies prior to the development of long straight amyloid fibrils at neutral pH. In addition, most recently the difficulties in determining the conformational properties of IT have been overcome by using the β2m truncation variant ∆N6 as a structural mimic of this species (Fig. 9B). High resolution NMR studies directly comparing spectra of ∆N6 and IT revealed that the major species populated by ∆N6 in solution closely resembles the transient folding intermediate IT. Using ∆N6 as a structural model for IT, full resonance assignment and structural elucidation were possible,



Figure 8: Description of the IT state. The ribbon overlay shows one monomer of the hexameric crystal structure of H13F (PDB code 3CIQ, in blue) and the lowest energy structure of ∆N6 (PDB code 2XKU, in red). The residues Phe30, Pro32, Trp60, Phe62 and His84 are highlighted in sticks.

β2m. The results showed that under the conditions employed ∆N6 retains a native fold but undergoes a major re-packing of several side chains within the hydrophobic core to accommodate the non-native trans conformation of the His31-Pro32 peptide bond (Fig. 8). The side chains involved map predominantly to the same residues that undergo structural reorganization in the presence of Cu2+ ions, although the precise repacking of residues remains different in many cases. Despite adopting a thermodynamically stable native-like topology, ∆N6 is a highly dynamic entity. The data suggest that the increased conformational dynamics of ∆N6 correlate with an increase in amyloidogenic properties, presumably by enabling the formation of one or more rarely populated conformers that have an enhanced potential to assemble into amyloid fibrils.

One of the key events in this amyloid switch is protonation of His84, which experiences a large pKa shift upon isomerization of the His31-Pro32 peptide bond (Fig. 8). The involvement of His84 in the initiation of β2m amyloid fibril formation has been proposed previously using computational methods. Oligomeric structures which become available after peptidyl-prolyl isomerization and exploration of conformational space upon His84 protonation have been proposed previously in association with Cu2+ binding [8, 9] or with binding of nanobodies [12]. Interestingly, the last two conditions result in the formation of oligomers that are domain swapped, as proposed hitherto for β2m assembly under native conditions using computational methods or Cu2+ treatment. Whether domain swapping occurs in DRA remains to be elucidated.



1.3.4 Prion like conversion

Despite the finding that ∆N6 comprises ∼26% of β2m in amyloid deposits in patients with DRA, this species is not found in the serum of people with renal dysfunction. As a consequence of these findings, formation of ∆N6 has been proposed to occur as a post-assembly event (Fig. 9B). Most recently, however, it has been demonstrated that ∆N6 is not only able to nucleate fibrillogenesis efficiently in vitro at physiological pH as discussed above (Fig. 9B) but, as a persistent trans-Pro32 state, ∆N6 is also able to convert wild-type β2m into an aggregation-competent conformer by bimolecular collision between the two monomers (Fig. 9B) [77]. Detailed interrogation of the bimolecular collision between native wild-type β2m and ∆N6 using NMR revealed the molecular mechanism by which this prion-like conversion might occur. First, ∆N6 binds specifically, but transiently, to native wild-type β2m, possibly involving residues of β-strands A, B and D and the DE-loop. This interaction changes the native configuration of Pro14 within the AB-loop, which is highly dynamic as indicated by molecular dynamics simulations [29, 30] and X-ray crystallography. Pro14 dynamics have been shown hitherto to be responsible for an alternative β2m conformation in which the hydrogen bonding between β-strands A and B is severely impaired. Inter-strand hydrogen bonding between those two strands (Fig. 9A, orange sticks), together with the correct attachment of the N-terminal hexapeptide, has been demonstrated to be crucial in maintaining a low concentration of IT at equilibrium [78]. Binding of ∆N6 to wild-type β2m therefore leads to the disruption of important interactions between the N-terminal hexapeptide and the BC-loop, leading to accelerated relaxation kinetics towards the amyloidogenic trans His31-Pro32 isomeric state (Fig. 9A, right). The truncation variant ∆N6 is thus capable of driving the innocuous native wild-type protein into aggregation-competent entities, reminiscent of the action of prions. Such an observation rationalizes the lack of circulating ∆N6 in the serum and, given the natural affinity of this species for collagen (which is enhanced relative to wild-type), explains why assembly of fibrils occurs most readily in collagen-rich joints. Rather than being an innocuous post-assembly event, therefore, proteolytic cleavage of β2m to create one or more species truncated at the N-terminus could be a key initiating event in DRA, enabling the formation of a species that is not only able to assemble into amyloid fibrils but can also enhance fibrillogenesis of wild-type β2m. The latter is accomplished by initiating the ability of the wild-type protein to nucleate its own assembly, or by cross-seeding fibril elongation of ∆N6 seeds with wild-type monomers (Fig. 9). Identifying the proteases responsible for the production of ∆N6, or using the high resolution structure of ∆N6 as a target for the design of small molecules able to intervene in assembly, may provide new approaches for therapeutic intervention in DRA.

Figure 9: Prion-like conversion during amyloid formation. (A) Structures of wild-type β2m (PDB code 2XKS) and a model of IT. Above, keys for these conformational states. Native wild-type β2m is shown above as a circle with cis His31-Pro32 and the N-terminal region (residues 1-6, blue arrow). Backbone atoms of residues which establish strong hydrogen bonding between β-strands A and B in the native state are shown in sticks. Upon dissociation of the N-terminal region, the His31-Pro32 peptide bond is free to relax into the trans conformation, causing further conformational changes that lead to the formation of the non-native IT conformer (shown as a circle above a model of its structure). Protonation of His84 (which lies near Pro32) under acidic conditions enhances the amyloid potential of IT. Oligomerization of these aggregation-prone species leads to the formation of β2m amyloid fibrils. (B) Consequences of cleavage of the N-terminal hexapeptide of β2m, which generates ∆N6 as a persistent IT state (PDB code 2XKU). Once formed, ∆N6 is able to nucleate and elongate its own fibrils and also to cross-seed elongation of its fibrillar seeds with the wild-type protein, leading to the development of long straight amyloid-like fibrils. Furthermore, ∆N6 can transform the native state of β2m via bimolecular collision. Figure adapted from

1.3.5 Summary

Figure 10: Unified model of β2m amyloid formation. The native state exists as a well-folded monomer with a characteristic cis conformation of Pro32. The isomerization of Pro32 may act as a trigger event toward aggregation, leading to oligomer formation. The path from oligomer to mature aggregates is not known.

The large body of experimental observations reported above can be roughly summarized in the following schematic points.

Empirically, a certain number of thermodynamic states of the protein can be distinguished:

• a monomeric crystallographic state
• an MHC-I bound state
• a monomeric solution state
• an activated fibrillogenic (intermediate) state
• an oligomeric state (in small oligomers)
• a fibrillar state



While the first three states are well identified, the last three are postulated, and the activated fibrillogenic state might be very elusive. The different thermodynamic states should correspond to different structural states. The assignment thermodynamic state ⇐⇒ structure can be very hypothetical; in particular, several different structures have been claimed to be the activated aggregation intermediate. The assignment of the oligomeric state must be taken with care, since the crystallographic oligomer might differ from the one in solution. Finally, almost nothing is known about the fibrillar state.

Figure 10 summarizes the set of structural changes common to all β2m aggregation pathways: under physiological conditions, β2m exists as a stable, well-folded monomer characterized in part by a conserved cis proline at residue 32. Upon exposure to a variety of amyloidogenic triggers the native structure is perturbed and amyloid formation commences. This transition involves cis-trans isomerization of Pro32. In acidic solvent conditions the transition may involve rotation of Phe30 out of the hydrophobic core. One can postulate that all pathways converge on a state resembling the activated monomer, in which broad rearrangements occur to compensate for the cavity left by the movement of Pro32 (and Phe30). These rearrangements precede oligomerization, which terminates in a hexamer. The path from oligomer to mature aggregates is not known; however, as the hexamer represents a closed state, aggregation requires a ring-breaking event.

The work conducted in this Thesis aims to give some clues on the relevance of the trans geometry of the His31-Pro32 peptide bond and of the IT intermediate. As illustrated above, the development of ideas on the β2m fibrillogenic transition has essentially focused on the importance of these issues. In particular, it is repeatedly reported in the literature that the cis to trans isomerization step seems to be a necessary but not sufficient condition for fibrillogenesis. However, a complete interpretation of this subject is still missing.

We use computational simulations, which have proven to be invaluable tools for studying biological phenomena, both as a discovery instrument and as a complement to experimental studies. In the next Chapter, all the methods applied are described in detail.




[. . . ] it is that everything that living things do can be understood in terms of the jiggling and wiggling of atoms. (Richard Feynman)

The conformational dynamics of protein molecules is encoded in their structures and is a critical element of their function. Therefore, understanding how proteins work requires an understanding of the connection between three-dimensional structure and dynamics. Molecular dynamics (MD) simulations provide links between structure and dynamics by enabling the exploration of the conformational energy landscape accessible to protein molecules. The first molecular dynamics simulation of a protein was reported in 1977 and consisted of a 9.2 ps trajectory for a small protein in vacuum [62]. Eleven years later, a 210 ps simulation of the same protein in water was reported [49]. Thanks to the phenomenal increase in computing power since then, it is now routine to run simulations of much larger proteins, surrounded by water, that are 1,000–10,000 times as long as the original simulation (10–100 ns). The increase in the number of studies using MD to simulate the properties of biological macromolecules has been fuelled by the general availability of programs and of the computing power required for meaningful studies.

However, MD simulations of very large systems may require such large computer resources that they cannot be easily studied by traditional all-atom methods. Similarly, simulations of processes on long timescales (beyond about 1 microsecond) are prohibitively expensive. In these cases, one can sometimes tackle the problem by using reduced representations, which are also called coarse-grained models. Instead of explicitly representing every atom of the system, one uses pseudo-atoms or beads to represent groups of atoms.

This chapter introduces the methodological framework applied in this thesis. The results have been derived from molecular dynamics (MD) and coarse grained (CG) simulations, which are briefly discussed in the following sections.



2.1 Molecular Dynamics

Conventional atomistic MD simulations are based on three approximations: (i) the decoupling of the motion of nuclei and electrons (Born-Oppenheimer approximation), (ii) the assumption that the nuclear motion can be described classically, and (iii) the assumption that the effect of the electrons can be implicitly described by empirical force fields [37].

(i) Born-Oppenheimer approximation

The time evolution of a molecular system is described by the time-dependent Schrödinger equation

$$ H\psi = i\hbar \frac{\partial \psi}{\partial t} \tag{1} $$

where $H$ denotes the Hamiltonian of the system, $\psi$ the wave function and $\hbar$ the reduced Planck constant. The wave function $\psi$ is a function of the positions of both the nuclei and the electrons, i.e., $\psi = \psi(\vec{R}, \vec{r}, t)$. Here, $\vec{R}$ denotes the positions of the $k$ nuclei, $\vec{R} = \{\vec{R}_1, \dots, \vec{R}_k\}$, $\vec{r}$ the positions of the $m$ electrons, $\vec{r} = \{\vec{r}_1, \dots, \vec{r}_m\}$, and $t$ the time. The idea behind the Born-Oppenheimer approximation is to decouple the fast electronic motion from the slow nuclear degrees of freedom. The approximation is to assume that the electronic structure adapts instantaneously to given nuclear positions. Within this framework the electrons move in a field of fixed nuclei. The wave function can be separated by the ansatz

$$ \psi(\vec{R}, \vec{r}, t) = \psi_n(\vec{R}, t)\, \psi_e(\vec{r}; \vec{R}) \tag{2} $$

with a wave function $\psi_n(\vec{R}, t)$ of the nuclei and the electronic wave function $\psi_e(\vec{r}; \vec{R})$, the latter being only parametrically dependent on the nuclear positions. Given fixed nuclear positions, the electronic wave function may be determined by solving a time-independent Schrödinger equation that contains the nuclear positions $\vec{R}$ only parametrically,

$$ H_e(\vec{R})\, \psi_e(\vec{r}; \vec{R}) = E_e(\vec{R})\, \psi_e(\vec{r}; \vec{R}) \tag{3} $$

Here, the electronic Hamiltonian is $H_e = H - T_n$, with the kinetic energy of the nuclei $T_n$ being subtracted from the Hamiltonian of the complete system $H$. Hence, $H_e(\vec{R})$ contains only derivatives with respect to the positions of the electrons $\{\vec{r}_i\}$. The eigenvalues of eq. 3, $E_e(\vec{R})$, are termed the potential energy surface and are the potential that the nuclei "feel" upon motion. Inserting eqs. 2 and 3 into eq. 1 yields the time evolution of the nuclei as a time-dependent Schrödinger equation

$$ \left(T_n + E_e(\vec{R})\right) \psi_n(\vec{R}, t) = i\hbar \frac{\partial \psi_n(\vec{R}, t)}{\partial t} \tag{4} $$

The Born-Oppenheimer approximation holds as long as the eigenvalues of eq. 3 differ significantly, i.e., the potential energy surfaces of distinct electronic states do not approach each other. For molecules in the ground state this is usually the case.

(ii) Classical motion of nuclei

As a second approximation, MD simulations replace the nuclei by classical particles that evolve in time according to Newton’s second law

$$ m_i \frac{\partial^2 \vec{R}_i}{\partial t^2} = -\nabla_{\vec{R}_i} V(\vec{R}), \qquad m_i \vec{a}_i = \vec{F}_i \tag{5} $$

Here, $m_i$ denotes the mass of atom $i$ and $V(\vec{R}) = E_e^{(0)}(\vec{R})$ the potential energy surface of the ground state of eq. 3. This second approximation can be justified by the fact that the de Broglie wavelength of the nuclei is small compared to the average interatomic distances.
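In practice, eq. 5 is integrated numerically in discrete time steps. The following is a minimal sketch, assuming a single particle in one dimension, a harmonic potential and the velocity Verlet scheme; these are illustrative choices that the text above does not specify, and the function names and parameter values are hypothetical, not those of any MD package.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) for one particle in one dimension.

    Returns the list of (position, velocity) pairs, including the initial state.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update with the old force
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Illustrative (assumed) system: V(x) = 0.5 * k * x**2, hence F(x) = -dV/dx = -k * x
k = 1.0
mass = 1.0


def harmonic_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=mass, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
```

With these parameters the exact solution is x(t) = cos(t), so the final position can be compared with cos(10), and the total energy should stay close to its initial value; a drifting energy is usually a sign that the time step is too large for the stiffest motion in the system.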

(iii) Force Fields

Figure 11: Force field bonded interactions: r governs bond stretching; θ represents the bond angle; φ gives the dihedral angle; the out-of-plane angle α may be controlled by an improper dihedral ϕ.

Due to the large number of electrons in the system, solving the time-independent Schrödinger equation is prohibitively expensive. Therefore, the potential energy surface $E_e(\vec{R})$ is approximated by a sum of empirical expressions that yield a sufficiently accurate approximation to $E_e(\vec{R})$ but are also computationally cheap to evaluate. The empirical approximation to $E_e(\vec{R})$ is referred to as a force field. Within the force field, the



A typical expression for the potential energy reads:

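$$
\begin{aligned}
U_{\mathrm{tot}} &= U_{\mathrm{bonds}} + U_{\mathrm{angles}} + U_{\mathrm{dihedrals}} + U_{\mathrm{nonbond\ pairs}} \\
U_{\mathrm{bonds}} &= \sum_{\mathrm{bonds}} \frac{1}{2} k_r (r - r_{eq})^2 \\
U_{\mathrm{angles}} &= \sum_{\mathrm{angles}} \frac{1}{2} k_\theta (\theta - \theta_{eq})^2 \\
U_{\mathrm{dihedrals}} &= \sum_{\mathrm{dihedrals}} \frac{1}{2} V_n \left[ 1 + \cos(n\phi - \gamma) \right] \\
U_{\mathrm{nonbond\ pairs}} &= \sum_{i<j}^{\mathrm{atoms}} \left[ \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} + \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
\end{aligned}
\tag{6}
$$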

The first three summations are over bonds, angles, and torsions. These are illustrated in Fig. 11. The torsion term can also include so-called “improper” torsions, where the four atoms defining the angle are not all connected by covalent bonds. The final sum (over pairs of atoms $i$ and $j$) excludes 1-2 and 1-3 interactions and often uses separate parameters for 1-4 interactions as compared with those used for atoms separated by more than three covalent bonds. It describes electrostatics through partial charges $q_i$ on each atom that interact via Coulomb’s law. The combination of dispersion and exchange repulsion forces is represented by a Lennard-Jones 6-12 potential; this is often called the “van der Waals” term.
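The individual terms of eq. 6 can be sketched as plain functions. This is only an illustration of the functional forms, assuming the standard dihedral form with the phase γ inside the cosine, angles in radians, and energies in kcal/mol with distances in Å (so that 1/(4πε0) is approximately 332.06 kcal Å mol⁻¹ e⁻²). All function names and numerical parameters are hypothetical and are not taken from OPLS or any other published force field.

```python
import math

# 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); approximate value, unit system assumed
COULOMB_CONSTANT = 332.06


def bond_energy(k_r, r, r_eq):
    """Harmonic bond stretching term, 0.5 * k_r * (r - r_eq)^2."""
    return 0.5 * k_r * (r - r_eq) ** 2


def angle_energy(k_theta, theta, theta_eq):
    """Harmonic bond angle term, 0.5 * k_theta * (theta - theta_eq)^2 (radians)."""
    return 0.5 * k_theta * (theta - theta_eq) ** 2


def dihedral_energy(v_n, n, phi, gamma):
    """Periodic torsion term, 0.5 * V_n * [1 + cos(n*phi - gamma)] (radians)."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(q_i, q_j, r_ij, a_ij, b_ij):
    """Coulomb plus Lennard-Jones 12-6 energy for one atom pair."""
    coulomb = COULOMB_CONSTANT * q_i * q_j / r_ij
    lennard_jones = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    return coulomb + lennard_jones


def lj_coefficients(epsilon, r_min):
    """A and B for a Lennard-Jones well of depth epsilon with its minimum at r_min."""
    return epsilon * r_min ** 12, 2.0 * epsilon * r_min ** 6


# Arbitrary illustrative values, not real force field parameters
a_ij, b_ij = lj_coefficients(epsilon=0.1, r_min=3.5)
e_bond = bond_energy(k_r=100.0, r=1.6, r_eq=1.5)
e_dihedral = dihedral_energy(v_n=2.0, n=3, phi=math.pi / 3, gamma=0.0)
e_pair = nonbonded_energy(q_i=0.0, q_j=0.0, r_ij=3.5, a_ij=a_ij, b_ij=b_ij)
```

For two uncharged atoms placed at the Lennard-Jones minimum, the pair energy reduces to minus the well depth, which is a quick sanity check on the A and B coefficients. The total energy of eq. 6 is then the sum of these terms over all bonds, angles, dihedrals and non-excluded atom pairs.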

The parameters used in this kind of potential are typically obtained from quantum chemical calculations and experimental data (e.g. crystallographic data, spectroscopic data, etc.). Among the popular sets of parameters (force fields) for MD simulations of proteins we can cite, for example, AMBER [2], GROMOS [33], CHARMM [52] and OPLS [40]. They all use the potential function given above for all the atoms of the simulated system, except for the GROMOS and CHARMM19 force fields, in which a united-atom description is used for non-polar hydrogens.

In MD simulations the description of the solvent (water for most of the biologically interesting systems) can be explicit or implicit. In the first case, solvent molecules with a full atomistic force field description are added to the simulation box, e.g. so as to reproduce the experimental density. The implicit solvent treatment is clearly a more approximate description, but it is computationally much more efficient, since in many practical cases the solvent constitutes the majority of the atoms.



In this Thesis, the atomistic simulations were performed with the OPLS force field and an explicit description of the solvent given by the SPC [83] water model.

Force fields allow interactions to be computed cheaply while describing the system in sufficient atomistic detail to investigate, for example, biological macromolecules. However, for any application the validity and accuracy of the force field should be checked, for example against experimental data. It should be mentioned that biological processes may include chemical reactions, charge transfer processes, or molecules in excited electronic states. Such processes cannot be described by force fields of the form of eq. 6. A quantum mechanical description of (parts of) the system is required in those cases.

2.1.1 Atomistic simulations of β2m

Computational tools, and especially atomistic MD simulations, are increasingly being applied to the protein aggregation problem, providing insight into amyloid structures and aggregation mechanisms.

Simulations of various peptide fragments derived from β2m [72] constitute a large portion of the computational studies conducted so far. An important feature of these works is the interplay of experiment and computation in studies of protein aggregation, which demonstrates the power of simulations as analytical tools to obtain information and to use it to make useful predictions.

In order to elucidate the effect of ion ligation at atomic detail, Deng et al. [68] carried out a series of MD simulations on apo- and Cu2+-β2m systems in explicit aqueous solution, with varying numbers of bound ions. Analysis of the MD trajectories suggests that the changes in the hydrophobic environment near the copper-binding sites lower the barrier of the conformational transition and stabilize the more disordered conformation. The results also indicate that the binding of Cu2+ at His13 has little effect on the conformational stability, whereas the copper-binding site His31 is primarily responsible for the observed changes in the protein conformation and dynamics.

To examine the conformational response of β2m to a change in the ambient pH, Park et al. [42] simulated the dynamics of the protein at different pH values. The structure of monomeric human β2m solved at pH 5.7 shows a different backbone arrangement compared to structures solved at a higher pH. The study shows a pH-dependent modulation of the local backbone of strand D between the straight and the bulged conformations.

More recently, Fogolari and coworkers [31] used atomistic MD simulations to address the role of low pH in triggering amyloid formation. They also conducted adaptive biasing force [1] MD simulations to force cis-trans isomerization at Pro32 and to calculate the relative free energy in the folded and unfolded state. The native-like trans conformer was simulated for 10 ns, detailing the possible link between cis-trans isomerization and conformational disorder.

Furthermore, in the same study, through molecular dynamics simulation of highly concentrated doxycycline (a molecule able to suppress fibril formation) in the presence of β2m, they provide details of the binding modes of the drug and a rationale for its effect.

2.2 Coarse grained models

2.2.1 One bead models

As already mentioned, the key motivation for coarse grained (CG) molecular modeling and simulations primarily derives from the need to bridge the atomistic and mesoscopic scales. Typically, there are two to three orders of magnitude in length and time separating these regimes. At the mesoscopic scale, one sees the emergence of critically important phenomena (e.g., self-assembly in biomolecular systems).

The reduction of the number of variables used in the description of the system brings a saving in computational cost, and the consequent possibility to simulate large systems for a longer time. CG simulations can, therefore, play a crucial role in the exploration of mesoscopic phenomena and, in turn, of the behavior of real biomolecular (and materials) systems [5, 75].

CG can be done at many different levels: the coarser the description, the larger the saving in computational cost. The simulation speeds up not only because the degrees of freedom are fewer, but also because the highest vibrational frequencies of the system are eliminated, allowing a coarser sampling of the trajectory in the time domain and a more efficient and faster phase space sampling¹.

However, the elimination of internal degrees of freedom implies that their effect must be taken into account implicitly in the effective forces acting among the explicit degrees of freedom. This task becomes harder as the level of coarse graining increases: parameterizing force fields that are both accurate and transferable (that is, capable of describing the general dynamics of systems with different compositions and different configurations) becomes increasingly difficult.

¹ The elimination of degrees of freedom typically causes the potential energy surface of the system to be smoothed, so the simulated dynamics is unphysically accelerated.



Different recipes have been proposed to address these problems, and a large variety of CG models is available, differing in the level of coarse graining and in the philosophy used to parameterize the force fields. We refer here to bead models, i.e. models based on a united-atom representation of the amino acid, which represent different compromises between accuracy and transferability [5].

In the so-called one bead models for proteins, the whole amino acid is represented by a single interaction center. One-bead models have several advantages. First, they are the coarsest models for which the internal conformational variables needed to describe changes in the secondary structure can still be defined. This is very important, because many important biological processes involve a transition in the secondary structure as a triggering step. Additionally, the one bead CG level corresponds to the resolution of the structural data obtained with electron cryomicroscopy.

The force field expression for a generic one bead model can be written as

$$U = U_b + U_{\theta\alpha} + U_{nb} \tag{7}$$

that is, the sum of a pseudo bond term, a conformational term and a non-bonded interaction term. One bead models differ from one another in the functional form of these terms and in the strategy used for their parameterization.
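As an illustration of equation (7), the following minimal Python sketch evaluates the energy of a one bead chain. The functional forms (harmonic pseudo bonds, harmonic bond angles θ, a cosine term for the dihedrals α, a 12-6 Lennard-Jones term) and all parameter values are assumptions chosen for illustration only, in arbitrary units; they are not those of any specific model discussed here.

```python
import numpy as np

def internal_coordinates(x):
    """Pseudo-bond lengths, bond angles and dihedrals of a bead chain x, shape (N, 3)."""
    b = x[1:] - x[:-1]                              # N-1 pseudo-bond vectors
    r = np.linalg.norm(b, axis=1)
    u = b / r[:, None]                              # unit bond vectors
    # angle at bead i+1 between the directions to bead i and to bead i+2
    cos_theta = -np.sum(u[:-1] * u[1:], axis=1)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    n = np.cross(b[:-1], b[1:])                     # normals to consecutive bond pairs
    m = np.cross(n[:-1], u[1:-1])
    alpha = np.arctan2(np.sum(m * n[1:], axis=1),
                       np.sum(n[:-1] * n[1:], axis=1))
    return r, theta, alpha

def one_bead_energy(x, r0=3.8, kb=50.0,
                    theta0=np.deg2rad(95.0), ktheta=10.0,
                    alpha0=np.deg2rad(50.0), kalpha=1.0,
                    eps=0.5, sigma=4.5, min_sep=3):
    """U = U_b + U_theta_alpha + U_nb with hypothetical functional forms."""
    r, theta, alpha = internal_coordinates(x)
    u_b = 0.5 * kb * np.sum((r - r0) ** 2)
    u_conf = (0.5 * ktheta * np.sum((theta - theta0) ** 2)
              + kalpha * np.sum(1.0 - np.cos(alpha - alpha0)))
    # non-bonded pairs: beads separated by at least min_sep positions along the chain
    i, j = np.triu_indices(len(x), k=min_sep)
    d = np.linalg.norm(x[i] - x[j], axis=1)
    u_nb = np.sum(4.0 * eps * ((sigma / d) ** 12 - (sigma / d) ** 6))
    return u_b + u_conf + u_nb
```

Keeping the three contributions in separate variables mirrors the decomposition of equation (7) and makes it easy to swap one functional form without touching the others.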

The biased model

The simplest idea to fix structural parameters is to completely bias them toward a reference structure, usually experimental.

In elastic network models (ENMs) [63], the system is represented by a network of beads connected by elastic springs, usually with one bead per amino acid centred on the Cα. The equilibrium positions are exactly those of the reference structure; the elastic constant is the same for all the springs and is a parameter to be adjusted in order to reproduce the experimental fluctuations. In spite of its simplicity, an ENM correctly includes the topology of the system and is able to reproduce the principal modes (i.e. the modes with the largest amplitude), which are usually the most relevant to protein function.
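The construction of an ENM can be sketched in a few lines of Python. The version below is the anisotropic variant, in which each pair of beads closer than a cutoff in the reference structure is joined by a spring of constant `k`; the cutoff value, the function names and the units are assumptions made for this example, not parameters taken from the text.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, k=1.0):
    """Hessian of an elastic network built on the reference coordinates (N, 3)."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue                      # no spring beyond the cutoff
            block = -k * np.outer(d, d) / r2  # off-diagonal 3x3 block
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

def bead_fluctuations(hessian, kT=1.0, n_zero=6):
    """Mean-square fluctuation of each bead from the non-zero normal modes."""
    w, v = np.linalg.eigh(hessian)
    w, v = w[n_zero:], v[:, n_zero:]          # drop rigid-body modes
    msf = kT * np.sum(v ** 2 / w, axis=1)     # one value per Cartesian coordinate
    return msf.reshape(-1, 3).sum(axis=1)
```

Since the fluctuations scale as 1/`k`, the single elastic constant can be fixed by rescaling the computed fluctuations onto the experimental ones, while the lowest non-zero eigenvectors give the large-amplitude principal modes.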

Within the class of biased models, the Gō model [65] was originally proposed for the simulation of protein folding. The protein is represented as a chain of one-bead amino acids whose structure is biased toward the native configuration by means of simple attractive or repulsive interactions between beads.

The extreme simplicity of the model is indisputably an advantage for force field parameterization and computational efficiency, but it is also a limitation, as the model fails to describe intermediate metastable folding states. The bias of ENM and Gō models towards a reference configuration makes them only weakly transferable to general dynamics studies.
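A Gō-like non-bonded term can be sketched as follows. This is one common way of implementing the bias, assumed here for illustration: pairs in contact in the native structure attract through a 12-10 potential with its minimum at the native distance, and all other pairs only repel. The contact cutoff, the energy scale `eps` and the repulsive radius `sigma_rep` are hypothetical values.

```python
import numpy as np

def go_nonbonded_energy(x, x_native, cutoff=8.0, eps=1.0, sigma_rep=4.0, min_sep=3):
    """Non-bonded energy of configuration x, biased toward x_native (both (N, 3))."""
    # pairs separated by at least min_sep positions along the chain
    i, j = np.triu_indices(len(x), k=min_sep)
    d_nat = np.linalg.norm(x_native[i] - x_native[j], axis=1)
    d = np.linalg.norm(x[i] - x[j], axis=1)
    native = d_nat < cutoff                   # native contact map
    ratio = d_nat[native] / d[native]
    # attractive 12-10 wells: each term equals -eps at the native distance
    u_att = eps * np.sum(5.0 * ratio ** 12 - 6.0 * ratio ** 10)
    # purely repulsive term for non-native pairs
    u_rep = eps * np.sum((sigma_rep / d[~native]) ** 12)
    return u_att + u_rep
```

Because every attractive well has its minimum exactly at the native distance, the native structure is the global minimum of the attractive part by construction, which is the source of both the efficiency and the limited transferability discussed above.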



Most of the currently available one bead models are an evolution of Gō models: they include more sophisticated potentials, but still retain a partial bias towards a reference configuration. This is due to the difficulty of including, in only a few parameters, the generic effects of amino acid size, geometry and conformation.

Unbiased model

The model developed by Sorenson and Head-Gordon [75] is a step towards greater transferability: a priori knowledge of a reference secondary structure is required for the parameterization of the angle and dihedral terms, whereas only the sequence specificity is included in the non-bonded terms describing hydrophobicity/hydrophilicity.

This model has, in principle, a high degree of transferability and predictivity, since no knowledge of the protein structure is necessary. However, the predicted structures are not expected to be very accurate, owing to the poor description of the local structure.

The model is nevertheless able to discriminate between the different folding kinetics and dynamics of proteins with the same native topology.

A possible way to improve the accuracy while preserving predictivity and simplicity is to statistically optimize the few parameters present in a one bead potential. This could involve considering an amino acid specific parameterization, choosing appropriate functional forms for the potential terms, and fitting them based on some objective criterion instead of using empirical rules. The statistical information included in a data set, such as an experimental set of protein structures, can be extracted and used by means of the distribution functions of the internal variables $Q = (Q_1, \dots, Q_I, \dots, Q_N)$. These are assumed to be a good approximation of the probability distribution functions $P(Q_I)$. For any given internal variable $Q_I$, its probability distribution is related to the potential of mean force

$$W(Q_I) = -KT \ln\left[\frac{P(Q_I)}{P_0(Q_I)}\right] \tag{8}$$

where $P_0(Q_I)$ is the probability distribution of a reference system, usually a system of non-interacting particles. This formula is also called Boltzmann inversion.
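In practice, equation (8) is applied to histograms. The sketch below is a minimal Python version: `samples` holds the values of one internal variable collected from the data set, and `reference` holds values of the same variable sampled from the reference system. The function name, the binning and the use of `kT = 1` as energy unit are assumptions made for the example.

```python
import numpy as np

def boltzmann_inversion(samples, reference, bins=30, value_range=None, kT=1.0):
    """Potential of mean force W = -kT ln(P / P0) on a histogram grid."""
    p, edges = np.histogram(samples, bins=bins, range=value_range, density=True)
    p0, _ = np.histogram(reference, bins=edges, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    w = np.full_like(p, np.nan)          # bins with no data stay undefined
    ok = (p > 0) & (p0 > 0)
    w[ok] = -kT * np.log(p[ok] / p0[ok])
    return centers, w
```

For instance, samples drawn from a Gaussian of unit variance, with a uniform reference distribution, should return a parabola of curvature `kT` up to an additive constant and statistical noise. Bins that are empty in either histogram are left as `nan`, since the logarithm is undefined there.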

It should be noted that the single-variable $W(Q_I)$ coincides with the force field term $U_I(Q_I)$ only if the total energy of the system can be exactly decomposed as a sum of uncorrelated terms, i.e.

$$U(Q) = \sum_I U_I(Q_I) \tag{9}$$

where $U_I = U_b, U_\theta$, etc. and $Q_I = r_{i,i+1}, \alpha_i$, etc. In that case, the probability distribution of a single internal variable is

$$P(Q_I) \propto \int dQ_1 \dots dQ_{I-1}\, dQ_{I+1} \dots dQ_N \exp\left[-\frac{U(Q)}{KT}\right] \propto \exp\left[-\frac{U_I(Q_I)}{KT}\right] \tag{10}$$

This condition is never exactly satisfied, especially in the case of one-bead force fields: several authors have shown that $W_I(Q_I)$ is generally more structured because of the influence of the correlated degrees of freedom. Thus, in general, $U_I(Q_I)$ can be evaluated with an empirical, physically driven approach, i.e. an iterative procedure that consists in numerically evaluating $W_I(Q_I)$ on a statistical set of configurations of the system, fitting the parameters of $U_I(Q_I)$ on it, re-evaluating $P_I(Q_I)$ and $W_I(Q_I)$ on a simulation performed with $U_I(Q_I)$, and repeating the iteration.
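One explicit and widely used variant of such an iteration is iterative Boltzmann inversion, which replaces the parameter fit by a direct correction of a tabulated potential. It is given here only as an illustration of the idea and is not necessarily the scheme adopted in this work. The symbols $P_I^{\mathrm{target}}$ (the distribution measured on the statistical set) and $P_I^{(n)}$ (the distribution obtained from a simulation with the $n$-th potential) are introduced for this example:

```latex
U_I^{(0)}(Q_I) = -KT \ln\left[\frac{P_I^{\mathrm{target}}(Q_I)}{P_0(Q_I)}\right],
\qquad
U_I^{(n+1)}(Q_I) = U_I^{(n)}(Q_I) + KT \ln\left[\frac{P_I^{(n)}(Q_I)}{P_I^{\mathrm{target}}(Q_I)}\right]
```

The correction vanishes where the simulated distribution matches the target one, so a fixed point of the iteration reproduces the target distribution even when the internal variables are correlated.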

The accuracy of the parameterization derived with this method is affected by the statistical set chosen, in particular by its statistical relevance and its composition. For instance, if this is a set of experimental structures, the parameters might be affected by crystal-packing artefacts, and the set might not explore all the possible conformations of the molecules.

An alternative way to derive a coarse-grained potential is the force-matching method. It consists in matching the forces of the CG model to those obtained from all-atom simulation trajectories with a least-squares fitting procedure. This method ensures, by construction, high accuracy and the mechanical consistency of the CG model, and it can clearly be used in a multiscale approach. The transferability/predictivity of potentials derived with this method critically depends on which conformations are sampled during the all-atom simulation.
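The least-squares step can be illustrated on the simplest possible case, a single harmonic pseudo bond. In the Python sketch below, `r` and `f_ref` are hypothetical arrays holding, for each frame, the bead-bead distance and the reference force projected along the bond (as would be obtained by mapping all-atom forces onto the beads). The harmonic form and the function name are assumptions made for the example.

```python
import numpy as np

def fit_harmonic_bond(r, f_ref):
    """Least-squares fit of the CG force f(r) = -k (r - r0) to reference forces."""
    r = np.asarray(r, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    # f = -k*r + k*r0 is linear in the coefficients c = (k, k*r0)
    design = np.column_stack([-r, np.ones_like(r)])
    c, *_ = np.linalg.lstsq(design, f_ref, rcond=None)
    k = c[0]
    r0 = c[1] / c[0]
    return k, r0
```

The same structure carries over to richer force fields: as long as the CG force is linear in its parameters (for example, a spline or tabulated force), each basis function contributes one column of the design matrix and the fit remains a linear least-squares problem.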

Whatever method is used to obtain the effective potential, the parameterization of the non-bonded interactions remains the most critical issue for one-bead models. The reason is that the non-bonded interaction potential of a one-bead model must include several effects of different nature. In particular, this term must account for short-range non-bonded interactions, that is, hydrogen bonds and Van der Waals packing effects. It is important to reproduce this part accurately, because it determines the secondary structure of the protein, but it is extremely difficult to parameterize it with an isotropic potential because of the intrinsic directionality of the hydrogen bond. This term must also include hydrophobicity and electrostatics. This double nature of the non-bonded interactions in CG models can be handled by separating the term into a local and a non-local part with different parameterizations.


Figure 1.: A skeletal model of the generalized amino acid showing the amino (blue), carboxyl (red) and R groups attached to a central α carbon.
Figure 3.: Rotational flexibility in polypeptides: definition of the φ, ψ and ω dihedral angles.
Figure 4.: An α helix. Only heavy atoms (C, N and O) are shown and the side chains are omitted for clarity.
Figure 5.: A β sheet. Only heavy atoms (C in cyan, N in blue and O in red) are shown and the side chains are omitted for clarity.

