
Protocol standardization to purify Octarellin V mutants and biophysical characterization


Academic year: 2021


Master's Degree Course in
MOLECULAR AND INDUSTRIAL BIOTECHNOLOGY

Protocol standardization to purify Octarellin V mutants and biophysical characterization

            SUPERVISORS              CO-SUPERVISORS
External    Prof. Andrè Matagne      Prof. Vittoria Raffa
Internal    Prof. Del Corso          Prof. Mario Cappiello

CANDIDATE
Alessandro Genco


Index

1. Abstract
2. Introduction
2.1. Protein-folding problem
2.2. Protein design
2.3. TIM barrels and Octarellin series
2.4. Octarellin V, Octarellin V.I and directed evolution
2.5. Octarellin V.I Δ4P and Octarellin V 4P and PCR site-directed mutagenesis
2.6. Role of prolines and their influence on structural features
3. Materials and methods
3.1. Reagents and instrumentation
3.2. Methods
3.2.1. Competent cells
3.2.2. Protein-expressing plasmid
3.2.3. Protein expression
3.2.4. Agarose gel electrophoresis
3.2.5. Protein purification
3.2.5.1. Ion exchange chromatography
3.2.5.2. Size exclusion chromatography
3.2.5.3. Protein visualization and quantification
3.3. Biophysical characterization
3.3.1. DLS
3.3.2. Circular dichroism
3.3.3. Thermal stability
3.4. Molecular modelling
4. Results
5. Discussion and conclusions


1. Abstract

The main goal of this work is to optimize a new purification protocol and to analyse the quality (secondary structure, thermal stability) of two artificial proteins, Octarellin V 4P and Octarellin V.I Δ4P, in order to allow further detailed structural analysis by crystallography or NMR and to better understand how prolines at certain positions in the primary structure affect folding stability and solubility. The expression vectors for these two proteins were already available from previous work by Maximiliano Figueroa and Cristina Elisa Martina (ULg, GIGA research center). The proteins were produced as inclusion bodies in bacterial cultures grown from competent cells, and a protocol for their extraction, solubilization and purification was then optimized. Once the proteins had been purified, after several attempts, a DLS analysis was carried out to check the quality of the samples in terms of dispersity before starting the biophysical characterization. The biophysical experiments served to check whether the secondary structure was well defined and to test folding stability on exposure to high temperature: far-UV circular dichroism spectra were used to define the secondary structure, and thermal melting curves to evaluate folding stability. Furthermore, computational models of the proteins were built to study and analyse all the interactions in which the prolines are involved. These experiments give some clues about the structure and about the role of the prolines, the subject of study in this work.

2. Introduction

2.1. Protein folding

One of the most complex problems in biochemistry is the protein-folding problem, first raised about half a century ago. It posed, for the first time, questions about the relationship between an amino acid sequence, a one-dimensional (1D) string of amino acid monomers, and the three-dimensional (3D) native structure that it defines[1]. The final 3D structure of a protein, and its complexity, result from several forces that interact and influence correct folding. A second set of questions concerned how the folding process can be so fast. Most studies of protein folding have focused on developing specialised software to predict the structure of a protein from its amino acid sequence. In the early 1960s, biochemists at the U.S. National Institutes of Health (NIH) recognized that each protein folds itself into an intrinsic shape. If a protein is exposed to heat, its 3D structure will generally unravel, and it will be restored on subsequent cooling. This suggested that the structure stems from the interactions between different amino acids, rather than from some independent molecular folding machine inside the cell. If it were possible to quantify the strength of all these internal interactions, it would be possible to trace how any amino acid sequence reaches its final shape. The alternative is to characterise the structure by X-ray crystallography or nuclear magnetic resonance (NMR), but both are expensive and slow: the Protein Data Bank holds only about 130 000 protein structures, out of the hundreds of millions or more thought to exist. So, instead of waiting for the experimentalists, computer modellers have tackled the folding problem with computer models.

At first, such software relied only on homology modelling, which reconstructs the 3D model by comparing the amino acid sequence of a target protein with that of a template, a protein of similar and already well-known structure. As an alternative, ab initio techniques emerged, which consider and predict all the interactions of a folding protein from the primary structure alone[3]. One of the first ab initio folding programs was Rosetta, conceived and developed by David Baker and Kim Simons, which calculates the push and pull between neighbouring amino acids to predict a structure. Further research revealed that some amino acids occupy precise positions in the backbone because they participate in crucial interactions with one another. With the development and spread of new genome-sequencing technologies, it became possible to compare the DNA sequences coding for similar proteins, with results that represented a breakthrough: in the transcripts of these sequences, some conserved codons were found to have co-evolved in pairs[4].

Since Anfinsen's original demonstration of spontaneous protein refolding, many experimental studies of the folding of natural proteins[5] have been conducted, together with complementary analytical and computational studies aimed at better understanding the properties of folding free-energy landscapes[6]. The general path that the polymer chain takes through space, its topology, can be similar between proteins. According to three independent lines of investigation, protein-folding rates and mechanisms are largely determined by a protein's topology rather than by its interatomic interactions[7]. The main goal of all these investigations is to study the folding of small proteins, of fewer than 100 residues, using real experimental data, in order to develop new models able to predict the results of experimental measurements on real proteins. This strategy is much more convenient, considering that the number of conformations accessible to a polypeptide chain, and with it the complexity of their analysis, grows exponentially with chain length. Experimental data on such small proteins have accumulated over the last twenty years, providing a new view of the protein-folding process that allows for a much more heterogeneous transition state than the earlier view, which concentrated on a single, well-defined folding pathway[8]. The primary measurements on these highly cooperative folding reactions are: the folding rate; the distribution of structures in the transition-state ensemble, in the hope that snapshots of the chain caught in the act of folding will give insights into folding "mechanisms", the rules by which nature performs conformational searching; and the structure of the native state.

Firstly, large changes in amino acid sequence, whether experimental[9] or evolutionary[10], that do not alter the overall topology of a protein have only a small effect on the folding rate, a clear clue that evolution has not optimized protein sequences for rapid folding. Secondly, using the consequences of mutations on folding kinetics to probe the transition states of proteins with similar structures but different sequences has shown that the structures of these transition states are relatively insensitive to large-scale changes in sequence[11]. Thirdly, the folding rates of small proteins correlate with the average sequence separation between residues that make contacts in the 3D structure (Figure 1). Proteins with a large fraction of their contacts between residues close in sequence ("low" contact order) tend to fold faster than proteins with more non-local contacts ("high" contact order)[12]. The correlation between contact order and folding rate is evident both within each structural subclass and within sets of proteins with similar overall folds (e.g. the proteins structurally similar to the α/β protein acylphosphatase in Figure 1).
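The notion of contact order used above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes one coordinate per residue (for example the Cα atom), an arbitrary 8 Å contact cutoff, and two crude toy chains; none of these choices comes from the cited work, and the function name is hypothetical.

```python
import math

def relative_contact_order(coords, cutoff=8.0):
    """Average sequence separation of contacting residues, divided by chain length.

    coords: one (x, y, z) tuple per residue (assumed here to be C-alpha positions).
    cutoff: contact distance in angstroms (an assumed, illustrative value).
    """
    n = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_contacts * n)

# Toy chains of 10 residues: an extended strand (local contacts only)
# and a crude two-strand hairpin (adds contacts between distant residues).
straight = [(3.8 * i, 0.0, 0.0) for i in range(10)]
hairpin = ([(3.8 * i, 0.0, 0.0) for i in range(5)]
           + [(3.8 * (4 - k), 5.0, 0.0) for k in range(5)])

print(relative_contact_order(straight))
print(relative_contact_order(hairpin))
```

The hairpin scores higher than the extended strand because its cross-strand contacts join residues that are far apart in sequence, which is the "high contact order" situation that the text associates with slower folding.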


The important role of native-state topology can be understood by considering the relatively large entropic cost of forming non-local interactions early in folding. The formation of contacts between residues that are distant along the sequence is entropically expensive, because it greatly restricts the number of conformations available to the intervening segment. So, for a given topology, local interactions are more likely to form early in folding than non-local interactions. As proteins' sequences determine their three-dimensional structures, both protein stability and protein-folding mechanisms are ultimately determined by the amino acid sequence. But whereas stability is sensitive to the details of the interatomic interactions (removal of several buried carbon atoms can completely destabilize a protein), folding mechanisms appear to depend more on the low-resolution geometrical properties of the native state. The results described above indicate that simple models based on the structure of the native state should be able to predict the coarse-grained features of protein-folding reactions. Several such models have recently been developed, and show considerable promise for predicting folding rates and folding transition-state structures. Three approaches[13] [14] [15] have attempted to model the trade-off between the formation of attractive native interactions and the loss of configurational entropy during folding. Each of them assumes that the only favourable interactions possible are those formed in the native state. All three models use a binary representation of the polypeptide chain in which each residue is either fully ordered, as in the native state, or completely disordered. To limit the number of possible configurations, all ordered residues are required to form a small number of segments that are continuous in sequence. The entropic cost of ordering is a function of the number of residues ordered and of the length of the loops between the ordered segments. Folding kinetics are modelled by allowing only one residue to become ordered (or disordered) at a time. As the number of ordered residues increases, the free energy first increases, owing to the entropic cost of chain ordering, and then decreases, as large numbers of attractive native interactions are formed.
Such simple models can potentially be used to predict experimentally measurable quantities such as the folding rate, which depends on the height of the free-energy barrier, and the effects of mutations on the folding rate, which depend on the region(s) of the protein ordered near the top of the barrier. Predictions of both folding rates and folding transition-state structures using these simple models are quite encouraging. These results showed that the fundamental physics underlying folding may be simpler than previously thought and that the folding process is surprisingly robust. The topology of a protein's native state appears to determine the major features of its folding free-energy landscape. Both protein structures and protein-folding mechanisms can be predicted, to some extent, using models based on simplified representations of the polypeptide chain. The challenge ahead is to improve these models to the point where they can contribute to the interpretation of genome sequence information.

Figure 1 – a. Schematic representation of a low- and a high-contact-order structure for a four-strand sheet. b. The correlation between contact order and folding rate (kf). Red circles represent all-helical proteins, green squares sheet proteins, orange diamonds proteins comprising both helix and sheet structures, and blue triangles proteins structurally similar to the α/β protein acylphosphatase.

Each protein carries in its sequence a historical record of millions of evolutionary experiments. The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of evolution under these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. It is possible to infer evolutionary constraints from a set of sequence homologs of a protein, distinguishing true co-evolution couplings from the noisy set of observed correlations, by using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well distributed to define the 3D protein fold with remarkable accuracy.
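To make the idea of "observed correlations" in an alignment concrete, the sketch below computes the mutual information between two alignment columns. This is deliberately not the maximum entropy model described above: mutual information is the simpler, noisier raw statistic that such models are meant to improve on. The toy alignment and the function name are invented for illustration.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    alignment: list of equal-length sequences (strings).
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Hypothetical toy alignment: columns 0 and 2 vary together, column 1 is conserved.
toy_alignment = ["AKE", "AKE", "GKD", "GKD"]

print(mutual_information(toy_alignment, 0, 2))  # covarying pair
print(mutual_information(toy_alignment, 0, 1))  # conserved column carries no signal
```

A high score for a column pair only shows that the two positions vary together. As the following paragraphs explain, such a correlation may be indirect or caused by something other than spatial proximity, which is why a global model of the whole alignment is needed to separate direct couplings from the rest.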

The beauty of this evolutionary record, reported in protein family databases such as PFAM[16], is the balance between sequence exploration and constraints: conservation of function within a protein family imposes strong boundaries on sequence variation and generally ensures similarity of 3D structure among all family members[17] (Figure 2). In particular, to maintain energetically favourable interactions, residues in spatial proximity may co-evolve across a protein family[18]. This suggests that residue correlations could provide information about amino acid residues that are close in structure[19]. However, correlated residue pairs within a protein are not necessarily close in 3D space. Confounding residue correlations may reflect constraints that are not due to residue proximity but are nevertheless true biological evolutionary constraints, or they could simply reflect correlations arising from the limitations of our insight and from technical noise. Solving this inverse problem would enable novel insight into the evolutionary dynamics of sequence variation and into the role of evolutionarily constrained interactions in protein folding.

Determination of protein structure, by experiment or theory, provides one essential window into protein function, evolution and design. However, our knowledge of protein structure remains incomplete and is far from saturation. In spite of significant progress in the field of structural genomics over the last decade[20], only about half of all well-characterized protein families (PFAM-A, 12,000 families) have a 3D structure for any of their members. At the same time, the current upper limit on the total number of protein families (~200,000; PFAM-B) is an order of magnitude larger, and continues to grow with no clear limit in sight. Although the computational sequence-to-structure problem remains unsolved, methods that use fragment libraries[21] or other strategies to search conformational space[22], followed by sophisticated energy optimization or molecular dynamics refinement, have been successful at predicting the 3D structures of smaller proteins (~80 residues)[23].

Figure 2 - Correlated mutations carry information about distance relationships in protein structure. The sequence of the protein for which the 3D structure is to be predicted (each circle is an amino acid residue; typical sequence length is 50–250 residues) is part of an evolutionarily related family of sequences (amino acid residue types in standard one-letter code) that are presumed to have essentially the same fold (iso-structural family). Evolutionary variation in the sequences is constrained by a number of requirements, including the maintenance of favourable interactions in direct residue-residue contacts (red line, right). The inverse problem of protein fold prediction from sequence addressed here exploits pair correlations in the multiple sequence alignment (left) to deduce which residue pairs are likely to be close to each other in the three-dimensional structure (right). A subset of the predicted residue contact pairs is subsequently used to fold up any protein in the family into an approximate predicted 3D shape ("fold"), which is then refined using standard molecular physics techniques, yielding a predicted all-atom 3D structure of the protein of interest.

2.2. Protein design

The term protein design refers to a group of methodologies that use computational techniques to design and create new protein molecules that fold to a target structure. The end goal is to create novel function and behaviour. In its early days, the discipline was based solely on sequence composition, without taking into account interactions between side chains at the atomic level[24]. Since then there have been several improvements in computational tools, such as force fields (models of potential energy used in computer simulations), protein design algorithms and libraries of amino acid conformations, which have made it possible to perform complex calculations and to search vast configuration spaces while analysing energy and flexibility. De novo protein design (also referred to as the inverse protein-folding problem) is an attractive way to assess current knowledge of the relationship between a protein's amino acid sequence and the three-dimensional structure that the polypeptide chain finally adopts after folding in solution. Calculating the stability of all possible conformations is difficult and time-consuming, because proteins are very complex macromolecules comprising thousands of atoms. The task becomes more tractable when the conformational space of a protein is divided into two distinct conformational spaces of similar complexity[25]: one associated with the backbone (main-chain) conformation, which defines the target structure (the fold or topology), and one associated with side-chain conformation.

All naturally occurring proteins can be regarded essentially as "accidents of evolution", the result of selective pressure lasting hundreds of millions of years. Selective pressure operated incrementally and randomly to give rise to variants of primordial proteins and to make them functional. Considering the number of distinct sequences possible for a typical protein length of 200 amino acids, 20^200, and the number of distinct proteins produced in nature, of the order of 10^12, it can be assumed that evolution is not very efficient at exploring sequence space. And because evolution proceeds by incremental mutation and selection, naturally occurring proteins are not spread uniformly across the full sequence space; instead, they are clustered tightly into families (Figure 3). Evolution has probably explored only a tiny region of sequence space, which represents all possible sequences accessible to proteins, whereas de novo design can start from scratch, on the basis of our understanding of biophysical principles, to generate new proteins and make exploration of the whole sequence space virtually possible[26].
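The scale of the gap between possible and observed sequences can be checked with a short calculation. The sketch below uses only the two figures quoted above (20 residue types over 200 positions, and roughly 10^12 natural proteins); the variable names are illustrative.

```python
import math

RESIDUE_TYPES = 20     # standard amino acids
CHAIN_LENGTH = 200     # typical protein length quoted in the text
LOG10_NATURAL = 12     # about 10^12 distinct natural proteins, as quoted in the text

# Work in log10 to keep the numbers readable: log10(20^200) = 200 * log10(20)
log10_sequences = CHAIN_LENGTH * math.log10(RESIDUE_TYPES)

# Fraction of sequence space sampled by nature, expressed as a power of ten
log10_fraction = LOG10_NATURAL - log10_sequences

print(f"possible sequences ~ 10^{log10_sequences:.0f}")
print(f"fraction sampled   ~ 10^{log10_fraction:.0f}")
```

Under these figures there are about 10^260 possible sequences, so natural proteins cover a fraction of the order of 10^-248 of the space.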

Figure 3 - Methods for de novo protein design. a, A schematic of the protein sequence space. Evolution has sampled only a tiny fraction of the total possible sequence space (blue), and the incremental nature of evolution results in tightly clustered families of native proteins (beige), which are analogous to archipelagoes in a vast sea of unexplored territory. Directed evolution is restricted to the region of sequence space that surrounds native proteins, whereas de novo protein design can explore the whole space.

In the early days, in the 1970s and 1980s, the first engineered proteins were optimized manually, in a process that involved modifying a protein structure on the basis of an analysis of other known proteins, sequence composition, amino acid charges and the geometry of the desired structure. More recently, progress in protein design has led to outstanding results that were unimaginable until about 10 years ago. First, more powerful computers have improved the accuracy of methods for sampling the space of possible protein structures and sequences, as well as of methods for calculating the energy of a protein chain. Second, the falling cost of synthetic DNA manufacture has increased the number of computational designs that can be tested experimentally, by producing the synthetic, non-natural genes that encode the designed amino acid sequences. Both structure prediction and design are based on the hypothesis that proteins fold into the lowest-energy states accessible to their amino acid sequences[27]. The driving force for protein folding is the burial of hydrophobic residues in the core, followed by the formation of intra-protein hydrogen bonds by polar groups that interact with the solvent in the unfolded state and become buried upon folding[28].

Predicting and designing protein structures requires methods for sampling alternative backbone and side-chain conformations and for identifying which structures and sequences have the lowest free energy. These methods use an energy function describing the interactions between the atoms of the protein, and they fall into two main groups depending on the target: protein structure prediction and de novo protein design. In protein structure prediction, since the amino acid sequence is known, the side-chain problem reduces to a discrete combinatorial optimization in which the search covers the discrete rotameric states of each side chain. The main problem of this approach is that the prediction usually has to start from a well-known homologous protein structure[29], in order to lighten the computation needed to find the energy minimum, while also exploiting extra sources of information such as co-evolution-based distance constraints[30]. In protein design, by contrast, the amino acid sequence is unknown, and the search involves both backbone and side-chain sampling. Backbone sampling often frames the initial stages of the search and can be done by assembling short peptide fragments[31] or by using algebraic equations to specify the geometry parametrically[32]. Each designed backbone is then subjected to combinatorial sequence optimization to identify the lowest-energy sequence for the structure. In later stages of refinement, other methods are used to fine-tune the packing, the electrostatic interactions and the hydrogen bonds of the structure. Only if structure-prediction calculations starting from the designed sequence converge on the designed structure is it worth characterizing the de novo designed protein experimentally. A limitation of de novo protein design is that only a fraction of protein designs adopt stable folded structures when produced in E. coli. The most frequent reasons for failure are insolubility and the formation of unintended oligomeric states (polydispersity).
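The discrete combinatorial optimization over rotameric states mentioned above can be illustrated with a deliberately tiny example. Everything in the sketch below is hypothetical: three positions, made-up self and pair energies in arbitrary units, and an exhaustive search that is only feasible because the problem is so small. Real design software uses physical energy functions and far more efficient search strategies.

```python
from itertools import product

# Hypothetical self energies: SELF_ENERGY[position][rotamer], arbitrary units.
SELF_ENERGY = [
    [0.0, 1.0],       # position 0 has two rotamers
    [0.5, 0.2, 0.9],  # position 1 has three rotamers
    [0.3, 0.0],       # position 2 has two rotamers
]

# Hypothetical pair energies: (pos_a, rot_a, pos_b, rot_b) -> energy.
PAIR_ENERGY = {
    (0, 0, 1, 1): 2.0,   # steric clash
    (0, 1, 1, 1): -1.5,  # favourable packing
    (1, 0, 2, 1): -0.5,
}

def total_energy(assignment):
    """Energy of one rotamer choice per position (self terms plus pair terms)."""
    energy = sum(SELF_ENERGY[pos][rot] for pos, rot in enumerate(assignment))
    for (pa, ra, pb, rb), value in PAIR_ENERGY.items():
        if assignment[pa] == ra and assignment[pb] == rb:
            energy += value
    return energy

def best_assignment():
    """Exhaustively enumerate every rotamer combination and keep the lowest energy."""
    choices = [range(len(rotamers)) for rotamers in SELF_ENERGY]
    return min(product(*choices), key=total_energy)

print(best_assignment(), total_energy(best_assignment()))
```

In this toy case, choosing the best rotamer at each position independently gives a clashing combination, whereas the global search finds a lower-energy one. This is why the problem is treated as combinatorial; the number of combinations grows exponentially with the number of positions.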

One of the most successful achievements has been the construction of de novo proteins with an ideal backbone arrangement designed to have internal symmetry, in which a single idealized unit is repeated numerous times[33] [34]. Internal symmetry reduces the size of the sequence space that must be searched and allows a relatively small unit with a known sequence–structure combination to be reused repeatedly to build larger proteins. Symmetry is the main feature of particular folds such as the α-helical toroids and the TIM barrel, both characterized by a closed structure in which the final repeat unit is juxtaposed with the first.

2.3. TIM barrels and Octarellin series

The TIM barrel, or (βα)8-barrel, is one of the most common functional protein structures and has been an alluring target for protein engineering. It is estimated that 10% of all naturally occurring enzymes adopt this fold[35], because its topology is highly suited to catalysis: the active-site residues lie on loops spanning concentric rings of alternating α-helical and β-sheet structural elements. Furthermore, a cavity at one end of the barrel provides an ideal receptacle for substrate binding. All these features make it an ideal scaffold on which to engineer new artificial protein catalysts. Using the ROSETTA protein design software, a series of studies introduced sequence modifications into the loops of natural (βα)8-barrels, transforming them into efficient catalysts able to react with non-biological substrates[36]. Since the high-resolution structure of triose phosphate isomerase (TIM) was solved 40 years ago, a series of attempts to design (βα)8-barrels from scratch using repeated sequence elements has produced designs that show mixed α/β structure but also lack of solubility or partial disorder. These problems have been the main obstacle to high-resolution structure determination. Recently, a thermostable four-fold-symmetric (βα)8-barrel was computationally designed and its structure determined[37], an important step forward in addressing this longstanding design challenge. Before this considerable result, many attempts to design the body of the barrel itself can be listed, generally unsuccessful, which potentially limited the breadth of catalytic processes accessible to this fold. Nevertheless, the previous work undoubtedly provided a basis for defining the rules, parameters and constraints that govern the design of this particular fold.

Designing a TIM barrel requires deriving design principles. The structure must be formed by an inner eight-stranded parallel β-barrel (n = 8), surrounded by eight α-helices on the periphery, with a shift of 8 residues required to return to the same starting point. At first it was considered an eight-fold symmetric structure, but it was soon demonstrated that this is not feasible because of the alternating pleat of paired β-strands. The highest symmetry attainable is a four-fold repeat of βαβα units. De novo protein design is used as a means of critically testing the principles believed to govern protein folding and structure. Some rules on strand register, loop stabilisation and hydrogen-bonding interactions were extrapolated and used to reach the best results: no strand register shift within a unit, with the first residue of each β-strand pointing into the barrel; a shift of 2 residues between units; an average angle of 36° between the β-strand axes and the barrel axis; helices and loops of appropriate length, more precisely two unique helices and four unique loops in each unit; satisfied backbone hydrogen-bonding groups in the loops; and a sheet-facing side of the α-helices that is not all alanine. Important improvements are: first and second helices positioned at notably different angles relative to the overall barrel axis; first and second β-strands in unique structural contexts, emphasizing that the barrel is best considered as four (α/β)2 units; and optimized connections between the α-helices and the β-barrel.
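The strand number, the 8-residue shift and the roughly 36° tilt quoted above are not independent. In the standard description of an idealized β-barrel they are linked by two geometric relations, which this text does not state explicitly: tan α = S·a / (n·b) and R = b / (2 sin(π/n) cos α). The sketch below evaluates them, reading the "shift of 8 residues" as the shear number S, and assuming the commonly used values a ≈ 3.3 Å (Cα rise along a strand) and b ≈ 4.4 Å (distance between adjacent strands). The function name and default values are illustrative assumptions.

```python
import math

def barrel_geometry(n_strands, shear, rise=3.3, spacing=4.4):
    """Strand tilt (degrees) and radius (angstroms) of an idealized beta-barrel.

    n_strands: number of strands n
    shear:     shear number S (residue shift after one turn around the barrel)
    rise:      C-alpha spacing along a strand, a (assumed 3.3 A)
    spacing:   distance between adjacent strands, b (assumed 4.4 A)
    """
    # tan(alpha) = S * a / (n * b)
    alpha = math.atan((shear * rise) / (n_strands * spacing))
    # R = b / (2 * sin(pi / n) * cos(alpha))
    radius = spacing / (2 * math.sin(math.pi / n_strands) * math.cos(alpha))
    return math.degrees(alpha), radius

tilt_deg, radius = barrel_geometry(n_strands=8, shear=8)
print(f"tilt = {tilt_deg:.1f} deg, radius = {radius:.1f} A")
```

With n = 8 and S = 8 the tilt comes out close to 37°, in line with the average angle of about 36° given in the design rules, and the barrel radius is about 7 Å under the assumed spacings.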

Octarellins

Over the last 20 years, several improvements in protein design have made it possible to achieve previously unimaginable goals. Although a longstanding dogma in structural biology states that the tertiary structure of a protein is largely determined by its primary structure, it has also been demonstrated that divergent sequences can fold into similar tertiary structures, understood as arrangements of secondary-structure elements. For a long time this was a serious problem for the computer prediction of structure from sequence, but over the years there have fortunately been several improvements in algorithms and computational performance.

In the current state of the art there are many examples of large artificial proteins and enzymes with catalytic functions[38], both single- and multiple-domain proteins[39]. Until recently it was not possible to produce a well-structured artificial single-domain protein larger than 200 amino acids, a threshold that was exceeded with the production of Octarellin V and of its improved version in terms of stability and solubility, Octarellin V.1. The Octarellin series has reached a well-defined and stable secondary structure, although without any proof of a tertiary structure consistent with the original design.

The Octarellin series started in the 1990s as a project bringing together all the knowledge of the time on protein design. It began with a first test sequence encoding an eight-fold repetition of a single structural unit, intended to fold into an eight-stranded parallel β-barrel structure. This sequence was deduced from a series of parameters: the lengths of secondary structures; α/β packing; the fit to average Garnier profiles[40], which express interactions between distant residues; and deductions from the analysis of three natural α/β-barrel proteins. This eight-fold unit polypeptide forms a compact structure that shows a cooperative and rapid two-state folding transition involving long-range molecular interactions[41]. To define an artificial α/β unit sequence, common features in the structures and sequences of three reference enzymes were analysed, giving a final set of 24 α/β structural units. The computed lengths showed that turns 2 and α-helices are more variable, whereas turns 1 and β-strands are relatively well conserved. The structure deriving from this 30-residue α/β unit showed a lack of solubility and packing, but its high secondary-structure content and α-helix content were consistent with the TIM models.

The second generation of Octarellins was produced after a more detailed analysis of the geometric principles governing β-barrel organization in natural α/β proteins[42]. In these new de novo designed protein families, special attention was given to charge distribution and β-strand composition. Charges were carefully distributed by suppressing the doublet of positively charged residues at the top of the barrel in Octarellin III. This change may partially explain their higher solubility. β-strand residues were examined according to a "layer model" proposed by Lesk et al. (1989), which takes into account the volume, charge distribution and hydrophobicity of their side chains. A statistical analysis of the residues in and around the β-strands, within a set of natural protein structures and sequences, showed that β-residues pointing towards the following α-helix are in general bigger and more hydrophobic than β-residues pointing towards the barrel centre. By developing two alternatives for β-strand sequence design, two different artificial protein series were produced: Octarellin II and III. Octarellin II keeps an eight-fold β-strand symmetry, in which residues were selected in best agreement with both the "central" and the "α/β" criteria. Octarellin III has a modified, four-fold symmetry at the β-strand level, which is important for overall protein stability and reflects more accurately the architecture of natural α/β barrels. The basic structural units were made 4 amino acids shorter than those of Octarellin I for two reasons: to enhance residue contacts and to increase the compactness of the polypeptide. The resulting protein behaves like a molten globule, and no structure has yet been solved.

2.4. Octarellin V, Octarellin V.I and directed evolution

Octarellin V

The next work on the Octarellin series, published in 2003, reported the production of a brand new artificial protein family, known as Octarellin V, built from idealised parametrised backbones. There have been many attempts to design and produce larger de novo designed proteins with a repetitive structure, mainly the parallel (α/β)8-barrel (TIM barrel), but none of them has yielded a structure with all the properties of natural native proteins[43]. In the best cases, the resulting protein behaves like a molten globule, and no structure has yet been solved[44]. Octarellin V was designed as an idealized, 216-residue α/β-barrel backbone, using geometric parameters such as distances and angles between secondary structures to describe the barrel's topology, which is characterized by 4-fold symmetry. The design was elaborated in two steps. First,

Figure 4 - Parameters used to build the β-sheet scaffold. (a) Top view of the β-sheet scaffold (i.e. the Cα trace). The scaffold consists of eight strands. R is the radius of the barrel. θβodd and θβeven are the rotation angles of odd and even strands about axis 1 and axis 2, respectively. (b) Side-view of the β-sheet scaffold. Tz is the angle between the z-axis and the β-strand axis.


the idealized backbone was defined with geometric parameters representing our target fold: a central eight-stranded parallel β-sheet surrounded by eight parallel α-helices, connected by short structural turns on both sides of the barrel. Second, an automated sequence selection algorithm was used to find the optimal amino acid sequence fitting the target structure. Hence, instead of calculating the stability of all possible conformations, the protein topology, or target structure, must first be defined and optimized. This step is called backbone selection or backbone design. Then, during a step called side-chain or sequence design, the lowest-energy sequence fitting and stabilizing the defined tertiary structure must be found. The concept of rotamers, defined as statistically significant amino-acid side-chain conformations[45], was used to represent side-chain flexibility. The side-chain rotamers were recorded in a rotamer library, and simple exclusion volume criteria were used to enumerate the allowed sequences for a given template. More recently, a search algorithm based on the dead-end elimination (DEE) theorem was created to solve the combinatorial problem of optimizing side-chain conformations[46]. This method finds the most favourable combination of side-chain rotamers in their optimal conformation for the target structure. Sequences are ranked with an appropriate potential energy function depending on the location of each side chain in the protein: core, surface, or boundary position[47]. The relative positioning of secondary structures was done with the help of short, conserved structural motifs[48]. The biophysical characterization of this protein demonstrated a stable tertiary structure. The results strongly suggested that it was crucial to take side-chain packing specificity into account in a parallel (α/β)8-barrel "idealized" backbone conformation defined from first principles.

Five geometric parameters were used to model a β-sheet with 4-fold symmetry, forming the idealized artificial backbone that represents the α/β-barrel topology (Figure 4). The system was subjected to 300 steps of conjugate-gradient energy minimization. The final sequence is shown in Figure 6 with the secondary structures and the energy profile. The barrel of eight α-helices surrounding the central β-sheet was built with five additional geometric parameters (Figure 5). Some of these parameters were adjusted with the help of short structural motifs called αβ1 and αβ3 turns. In the second step, the side-chain sequence and conformations were chosen with the help of an automated selection algorithm. The residues were classified as occupying core, surface, or boundary positions according to the

Figure 5 - Description of the parameters used to construct the α-helix scaffold around the β-sheet barrel. (a) θαodd and θαeven are the rotation angles of odd and even helices about their axes. The offset radius H is the distance between the α-helix axis and the β-sheet plane at the equatorial plane. Angle y is the offset angle defining rotation of the α-helix scaffold about the β-sheet barrel, or the shear between the β-strand barrel and the α-helix barrel. A zero value for offset angle y means that the helix axis and the strand axis are colinear with the barrel center. (b) T is the helix axis shift defining the relative displacement of the α-helix scaffold relative to the β-sheet barrel.


distances of their Cα and Cβ atoms with respect to a surface calculated from the Cα atom positions in the backbone. The optimal combination of side chains was designed by testing the conformations compatible with the idealized backbone. Core positions were assigned with a classification based on the distance along the Cα-Cβ vector: a residue is classified as core when the distance from its Cα atom to the surface along this vector (d1) is greater than 5 Å and the distance from its Cβ atom to the nearest surface point (d2) is greater than 2 Å. In total, 92 core residues were selected, 14 pointing into the barrel and 78 elsewhere, while avoiding any decrease in the number of aromatic residues. Surface positions were assigned by distance discrimination: the sum of d1 and d2 must be less than 2.7 Å for a residue to be classified as surface. In total, 83 polar and charged surface residues were selected, excluding cysteine, to avoid disulphide bridges, and proline, to avoid cis-trans isomerization that could slow the folding process. All residues belonging to neither of these two categories were classified as boundary positions. In total, 39 such residues were selected, with both hydrophobic side chains and charged and uncharged polar side chains (Figure 6).
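The distance-based classification described above can be sketched in a few lines of Python. This is only an illustration of the three thresholds quoted in the text (5 Å, 2 Å and 2.7 Å); the function name and the example distances are hypothetical, and computing d1 and d2 from a real backbone is outside the scope of the sketch.

```python
def classify_position(d1: float, d2: float) -> str:
    """Classify a residue position from two distances given in angstroms.

    d1 and d2 are the two surface distances defined in the text.
    The thresholds are the ones quoted there.
    """
    if d1 > 5.0 and d2 > 2.0:
        return "core"
    if d1 + d2 < 2.7:
        return "surface"
    # Everything that is neither core nor surface is a boundary position.
    return "boundary"


# Hypothetical distances, chosen only to exercise each branch.
examples = {"buried": (6.1, 3.0), "exposed": (1.2, 0.9), "intermediate": (4.0, 1.5)}
for label, (d1, d2) in examples.items():
    print(label, classify_position(d1, d2))
```

Note that the core test is checked first, so a position is only considered for the surface class when it fails at least one of the two core thresholds.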

Many experiments showed the resulting protein to be stable (if only marginally), non-aggregated and well packed in solution, with an α-helix content similar to that of the model (around 50%), and soluble enough for biophysical characterization but not for crystallization or 3D NMR. The sequence was apparently not sufficiently optimized to avoid self-aggregation; in fact, the

Figure 6 - Amino acid sequence, secondary structure, solvent-accessibility of residues(c, core residue; b, boundary residue; and s, solvent-exposed residue), and energy profile of the artificial protein. The designed sequence is displayed as an alignment of four subunits (1-54, 55-108,109-162 and 163-216). Each subunit is composed of two β-strands (in red), two α-helices (in blue), two β/α loops, one αβ1 loop, and one αβ3 loop. The energy profile of the modeled structure is shown. The eight deep minima corresponding to the β-strand region of the structure are typical of the (α/β)8-barrel topology. High-energy regions are located in helices and loops and are solvent-exposed in the model.


following work was carried out with the intention of solving the solubility and stability problem by directed evolution.

Directed evolution mutagenesis using error-prone PCR and PCR site-directed mutagenesis

Directed evolution is very useful for optimizing the folding stability and solubility of a de novo designed protein. The most successful method is error-prone PCR (epPCR), an easy and inexpensive technique for creating a combinatorial library from a single gene: the DNA sequence coding for the designed protein is amplified over several cycles under conditions that enhance random mutagenesis, using a low-fidelity Taq DNA polymerase. The mutated PCR products are then cloned into an expression vector; the resulting mutant library is the product of a simple iterative Darwinian optimization process and can be screened for changes in function. This technique has emerged as the best way to improve features of de novo designed proteins, such as catalytic activity[49], stability[50] and solubility[51]. This random mutagenesis process produced Octarellin V.I from the sequence of Octarellin V by directed evolution of surface positions, using a folding reporter protein (GFP fluorescence) to select the more soluble variants from a library, which were then tested for improved stability.
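The principle of building a mutant library by error-prone amplification can be illustrated with a toy simulation. This is a minimal sketch, not a model of the actual experiment: the template sequence, the per-base error rate and the library size are all hypothetical, and a real polymerase has a biased mutation spectrum that is ignored here.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA string, replacing each base by a different one with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def build_library(template: str, error_rate: float, cycles: int, size: int, seed: int = 0) -> list:
    """Generate `size` variants, each obtained by `cycles` successive error-prone copies."""
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        variant = template
        for _ in range(cycles):
            variant = error_prone_copy(variant, error_rate, rng)
        library.append(variant)
    return library


def count_mutations(parent: str, variant: str) -> int:
    """Number of positions at which the two equal-length sequences differ."""
    return sum(a != b for a, b in zip(parent, variant))


template = "ATGGCTTTCCTGATCGTTGAAGGTCTGTCT"  # hypothetical 30-nt template
library = build_library(template, error_rate=0.01, cycles=8, size=5)
for variant in library:
    print(variant, count_mutations(template, variant))
```

In a real workflow the screening step (here, GFP fluorescence as a folding reporter) would follow library construction; the simulation stops at generating the variants.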

Octarellin V.I was selected among all the mutants first for its solubility and then for its folding stability, tested by exposure to heat and chemicals. Eight rounds of directed evolution based on error-prone PCR were performed to build a library of proteins selectable for improved solubility. One variant was chosen from this library; it displays 16 mutations located mainly in the N- and C-terminal regions, 4 of which are proline residues (L9P, Q16P, L26P, L210P) that random mutagenesis was not expected to produce in those positions or in such number. These Pro residues may play a crucial role in both solubility and protein stability. This hypothesis is supported by studies in which proline residues affect both the solubility and the stability of natural proteins[52] [53].

Octarellin V.I

The main goal of the Octarellin project was to design and characterize a well-structured single-domain artificial protein of more than 200 amino-acid residues folded as a (β/α)8 barrel. The previous attempt in the project, Octarellin V, appeared successful on the basis of its biophysical characterization, but there was no evidence to support the presence of a well folded

Table 1 – Sequence alignment with Clustal O (1.2.1) between Octarellin V and Octarellin V.I, with the 16 mutations highlighted: proline residues in yellow and the others in green.


tertiary structure (no NMR or X-ray crystallographic data). Directed evolution was used to optimize the previous structure of Octarellin V, and a new artificial protein, called Octarellin V.I, was selected for its better solubility. After production and isolation, the protein was crystallized with the help of different crystallization chaperones and its tertiary structure was determined. It is still the largest de novo designed protein whose structure has been experimentally characterized so far. Since the protein structure could have been altered by the directed evolution process, Octarellin V.I was biophysically characterized and compared with Octarellin V. Far-UV CD spectra and fluorescence spectroscopy suggested a well-defined secondary structure. Thermal and chemical stability assays demonstrated a high resistance to heat-induced denaturation and a simple two-state reversible transition, involving only the native and the denatured state populations. Two crystals of the protein were obtained with two different crystallization chaperones, nanobodies and α-Rep proteins, to minimize the risk of altering the tertiary structure of Octarellin V.I. The chaperone-Octarellin V.I complexes were structurally determined by X-ray diffraction: in both cases the structure of the Octarellin was incomplete, with information missing in some segments, but the segments not observed with the nanobody are observed with the αRep and vice versa, so the two structures are complementary. The final model is based on the information obtained from the nanobody structure and the αRep structure, with modelling done with the Modeller software (Figure 7). The X-ray model of the protein showed a structure very different from the expected one: a 3-layer (αβα) sandwich architecture with a Rossmann-like fold topology, rather than an (α/β)8 TIM-barrel fold. Whereas the archetypal Rossmann topology consists of two domains (S6||S5||S4||S1||S2||S3) in which all β-strands make parallel contacts (||), in Octarellin V.I the central β-sheet has an architecture of (S8||S7||S3||S4||S5xS1xS6), with two antiparallel β-strand contacts (x) (Figure 8).
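The strand-order notation used above can be read mechanically: strands are listed in their spatial order across the sheet, "||" marks a parallel contact between neighbours and "x" an antiparallel one. The following sketch, whose function name is an illustrative assumption, parses the two strings quoted in the text and counts each kind of contact.

```python
import re


def parse_sheet(order: str):
    """Split a strand-order string into strand labels and the contacts between neighbours."""
    tokens = re.findall(r"S\d+|\|\||x", order)
    strands = tokens[0::2]   # S6, S5, ...
    contacts = tokens[1::2]  # "||" (parallel) or "x" (antiparallel)
    return strands, contacts


rossmann = "S6||S5||S4||S1||S2||S3"
octarellin_v1 = "S8||S7||S3||S4||S5xS1xS6"

for name, order in [("Rossmann", rossmann), ("Octarellin V.I", octarellin_v1)]:
    strands, contacts = parse_sheet(order)
    print(name, len(strands), "strands,",
          contacts.count("||"), "parallel,",
          contacts.count("x"), "antiparallel")
```

The count for Octarellin V.I gives seven strands in the sheet, which is consistent with one designed strand being absent from the experimental structure.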

Although roughly 74% of the designed secondary structure features were shown to be present in the final structure, along with other features suggesting that the original design was reliable at the secondary structure level, the protein lacked the interactions among secondary structure elements required to reach the intended tertiary structure. As a result, the fold obtained differed from the in silico model.

Figure 7 – Final model constructed from the complementary structural data from the nanobody (red), the αRep (blue) and modelling (white).


2.5. Octarellin V.I Δ4P and Octarellin V 4P and PCR site-directed mutagenesis

It is likely that the 4 proline residues confer the solubility and folding stability features, as some works in the literature indicate. Starting from these two proteins, mutants were generated by PCR site-directed mutagenesis, a highly versatile technique that can be used to introduce specific nucleotide substitutions (or deletions) in a tailored manner. In brief, point mutations can be introduced into plasmids using primers carrying the desired mutation in a PCR protocol that amplifies the entire plasmid template. The Octarellin V.I Δ4P sequence was generated by reverting the 4 prolines to the original Octarellin V amino acids at those positions; the Octarellin V 4P sequence was generated by introducing these 4 prolines into the Octarellin V amino acid sequence at the same positions.

The sequences of the original proteins and of the mutants generated by PCR site-directed mutagenesis are given below:

>Octarellin V
MAFLIVEGLSEKELKQAVQIANEQGLRAIAFLKQFARNHEKAERFFELLVREGVEAIIIARGVSEREIEQAAKLAREKGF
EALAFLAEYERRDRQFDDIIEYFERYGFKAVIVATGLDEKELKQAAQKIEEKGFKALAFLGRIDQENRNINDIFELLQRQ
GLRAIIAATGLSERELSWALRAARQYGLDIIFAYGQFDEQDNQFKHFLELIRRLGAA

>Octarellin V 4P
MAFLIVEGPSEKELKPAVQIANEQGPRAIAFLKQFARNHEKAERFFELLVREGVEAIIIARGVSEREIEQAAKLAREKGF
EALAFLAEYERRDRQFDDIIEYFERYGFKAVIVATGLDEKELKQAAQKIEEKGFKALAFLGRIDQENRNINDIFELLQRQ
GLRAIIAATGLSERELSWALRAARQYGLDIIFAYGQFDEQDNQFKHFLEPIRRLGAA

Figure 8 - Comparison of the de novo design with the actual structure of Octarellin V.I. The different folds are shown, together with the expected secondary structure elements; note that the expected strand 2 (S2), helix 1 (H1), and the majority of helix 8 (H8) are missing in the actual structure.


>Octarellin V.I
MAFLIVKGPSEKDLNPAVQIANEQDPSAIAFLKQFARNHEKAERFFELLVREGVEAIIIARGVSEREIEQAAKLAREKGF
EALAFLAEYERRDRQFDDIIEYFERYGFKAVIVATGLDEKELKQAAQKIEEKGFKALAFSGRIDQENHNINDIFELLQRQ
GLRAIIAATGLSERELSWAQRAAQQYGLDIIFANGQFDEQDNRFKHFLEPIRRQGAA

>Octarellin V.I Δ4P
MAFLIVKGLSEKDLNQAVQIANEQDLSAIAFLKQFARNHEKAERFFELLVREGVEAIIIARGVSEREIEQAAKLAREKGF
EALAFLAEYERRDRQFDDIIEYFERYGFKAVIVATGLDEKELKQAAQKIEEKGFKALAFSGRIDQENHNINDIFELLQRQ
GLRAIIAATGLSERELSWAQRAAQQYGLDIIFANGQFDEQDNRFKHFLELIRRQGAA
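The substitutions that distinguish a mutant from its parent can be listed by comparing the two sequences position by position. The sketch below is illustrative only: the function name is an assumption, and for brevity it uses just the first 30 residues of Octarellin V and Octarellin V 4P as listed above, so it recovers the three N-terminal proline substitutions but not the one at position 210.

```python
def list_substitutions(parent: str, mutant: str) -> list:
    """Return substitutions in the usual notation (parent residue, 1-based position, new residue)."""
    if len(parent) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        f"{p}{position}{m}"
        for position, (p, m) in enumerate(zip(parent, mutant), start=1)
        if p != m
    ]


# First 30 residues of the sequences listed above.
octarellin_v_nterm = "MAFLIVEGLSEKELKQAVQIANEQGLRAIA"
octarellin_v_4p_nterm = "MAFLIVEGPSEKELKPAVQIANEQGPRAIA"

print(list_substitutions(octarellin_v_nterm, octarellin_v_4p_nterm))
# → ['L9P', 'Q16P', 'L26P']
```

Applied to the full-length sequences, the same function would also report the fourth substitution near the C-terminus; a simple positional comparison is sufficient here because the mutants contain no insertions or deletions.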

Octa V.I ∆4P
Number of amino acids: 217
Molecular weight: 24912.2 Da
Theoretical pI: 5.33

Octa V 4P
Number of amino acids: 217
Molecular weight: 24937.4 Da
Theoretical pI: 5.51
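A theoretical molecular weight of this kind is obtained by summing the average masses of the residues and adding one water molecule for the free termini. The sketch below assumes the standard average residue masses; it is not the tool that produced the values above, and dedicated programs may use slightly different constants, so small discrepancies are to be expected.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water molecule accounts for the N- and C-terminal groups


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Mass change caused by a single Leu -> Pro substitution.
shift = RESIDUE_MASS["P"] - RESIDUE_MASS["L"]
print(f"Leu -> Pro changes the mass by {shift:.2f} Da")

# Hypothetical short peptide, used only to show the call.
print(f"{molecular_weight('MAFLIVEG'):.1f} Da")
```

Passing one of the full-length sequences from this section to `molecular_weight` gives a value that can be compared with the molecular weights reported above.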


2.6. Role of prolines and their influence on structural features

Proline is a very unusual amino acid, in that its side chain is cyclized back onto the backbone amide position. This characteristic has three consequences. First, the backbone conformation of proline itself is very restricted: the available backbone dihedral angles are limited to a small range around φ = -65°. Similar restrictions do not apply to ψ, which can populate either the α-helical region (ψ ≈ -40°) or the β-sheet region (ψ ≈ +150°). Thanks to its distinctive cyclic structure, the side chain of proline gives it exceptional conformational rigidity compared with other amino acids. Second, the α-helix conformation is disfavored by the bulkiness of the N-CH2 group, which restricts the conformation of the residue preceding proline. Third, when proline is bound as an amide in a peptide bond, its nitrogen is not bound to any hydrogen, so it cannot act as a hydrogen bond donor, although it can be a hydrogen bond acceptor. Proline acts as a structural disruptor in the middle of regular secondary structure elements such as α-helices and β-sheets; however, it is commonly found as the first residue of an α-helix and in the edge strands of β-sheets. It is also commonly found in turns (another kind of secondary structure) and aids in the formation of β-turns. This may account for the curious fact that proline is usually solvent-exposed despite having a completely aliphatic side chain, whereas hydrophobic residues usually tend to adopt positions in the interior of a protein.

Multiple prolines and/or hydroxyprolines in a row can create a polyproline helix, the predominant secondary structure in collagen. Some works in the literature give examples of the propensity of proline residues to enhance stability[54]. The exceptional conformational rigidity of proline affects the secondary structure of proteins near a proline residue and may account for the higher prevalence of proline in the proteins of thermophilic organisms. The correlation between the presence of prolines and thermostability was tested in Bacillus neutral proteases (NP) from such extremophile organisms. On the basis of the observation that the protein is highly sensitive to mutations at positions 65 and 69, two single substitutions, Ser-65→Pro and Ala-69→Pro, were generated in two different mutants to assess their effect. The underlying hypothesis is that, at high temperatures, the protease is irreversibly inactivated by autolysis preceded by local unfolding of flexible surface-located regions. Proline substitutions were chosen with the aim of stabilizing this region of NP.

Figure 9 – “Representative thermostability curves of wild-type B stearothermophilus neutral protease (+) and Ser-65→Pro (o) and Ala-69→Pro (●) mutants. The curves show the relative residual proteolytic activity after a 30-min incubation period as function of the incubation temperature.”(F. Hardy et al. – 1993)


The result was an increase in thermostability, raising by 5°C the maximum temperature at which residual activity is retained[55] (Figure 9).

The effect on secondary structure was demonstrated in four different E.coli proteins. The mutants showed increased structural stability without any alteration of the structure: the far-UV CD spectra of the mutants are very similar to those of the corresponding WT (Figure 10). The spectroscopic data therefore suggest that in all cases the Pro residue has been accommodated without appreciable changes in the secondary and tertiary structure of the protein. Other experiments in the literature show that prolines do not always have stabilizing effects and can also be destabilizing[56].

The studies described above show that prolines within protein structures play a stabilizing role in some cases and a destabilizing one in others. This raises questions about the correlation between the presence of a proline at a specific position in the structure and its influence on stability and solubility.

The insertion of a proline in a particularly flexible zone probably has a much more profound effect, but it is still impossible to predict precisely whether substituting a residue with a proline will have a positive or a negative effect. Octarellin V.I, with its 16 mutations, showed a substantial improvement in solubility and stability compared with the previous version. Among these substitutions, 4 prolines appear at positions where they were not expected according to statistical calculations on mutations generated by error-prone PCR. Is the improvement due to a global effect of all the substituted amino acids or just to the prolines? Do the prolines alone have a positive or a negative effect? Preliminary unpublished results (Maximiliano Figueroa, Universidad de Concepción, formerly Ulg) indicate that the protein carrying the substitutions at the 4 proline positions cannot be expressed in soluble form and is recovered in inclusion bodies. To study the effect of the prolines alone, two mutants were created, which are the subject of this work. Octarellin V 4P was created to test whether introducing only the 4 prolines into the original WT sequence produces the enhancing effect on the features of interest, whereas Octarellin V.I Δ4P was created to test whether reverting the 4 prolines to the original WT amino acids cancels these positive features.

Figure 10 – Far-UV CD spectra for WT and Pro mutants of LIVBP, MBP, RBP, and Trx at pH 7 and 298 K


3. Materials and methods

3.1. Materials - Culture media, solutions and buffers

List of solutions according to the methods. Concentrations are given in molarity (M). For simplicity, exact weights and volumes are specified for some of the reagents.

MgCl2-CaCl2 solution, CaCl2 solution, E.coli BL21 (DE3) One Shot®, NucleoBond® Xtra Midi kit (MACHEREY-NAGEL GmbH & Co. KG), Luria-Bertani (LB) growth medium, IPTG (aqueous), SOC medium, ampicillin, cOmplete™ Protease Inhibitor Cocktail (Sigma Aldrich), APS (aqueous), SDS, acrylamide mix, Tris (pH 8.8), TEMED, isopropanol, Tris-HCl (pH 6.8), glycerol, β-mercaptoethanol, bromophenol blue, Precision Plus Protein™ unstained Standard (Bio-Rad Laboratories), glycine, InstantBlue™ (Expedeon), TBE buffer, SYBR Green, dialysis membranes and Amicon ultrafiltration filter tubes (Millipore), HiTrap Q HP 5 ml anion exchange column (GE Healthcare), HiLoad 16/600 Superdex 75 PG prepacked XK gel filtration column (GE Healthcare).

• Buffer 1 – Tris HCl 50 mM pH 8.5

• Buffer 2 – Tris HCl 50 mM pH 8.5 Urea 8 M (denaturing buffer for IEX)

• Buffer 3 – Tris HCl 50 mM pH 8.5 NaCl 1 M (elution buffer for IEX)

• Buffer 4 – Na Phosphate 50 mM pH 8 NaCl 150 mM (elution buffer for SEC)

- All reagents were of analytical purity.

- All buffers were prepared in ultrapure MilliQ water (ddH2O).
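The amounts to weigh for Buffers 1 to 3 follow directly from the stated molarities (mass = molarity × molar mass × volume). The sketch below is a worked example under stated assumptions: Tris is assumed to be weighed as Tris base and adjusted to pH 8.5 with HCl, the molar masses are standard values, and the function and dictionary names are illustrative. Buffer 4 is left out because the text does not specify which sodium phosphate salts were mixed.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"Tris base": 121.14, "urea": 60.06, "NaCl": 58.44}

# Final molarities (mol/l) taken from the buffer list above.
BUFFERS = {
    "Buffer 1": {"Tris base": 0.050},
    "Buffer 2": {"Tris base": 0.050, "urea": 8.0},
    "Buffer 3": {"Tris base": 0.050, "NaCl": 1.0},
}


def grams_needed(molarity: float, molar_mass: float, volume_l: float) -> float:
    """Mass of solute (g) required for the given molarity and final volume."""
    return molarity * molar_mass * volume_l


def recipe(name: str, volume_l: float) -> dict:
    """Grams of each component for `volume_l` litres of the named buffer."""
    return {
        component: round(grams_needed(molarity, MOLAR_MASS[component], volume_l), 2)
        for component, molarity in BUFFERS[name].items()
    }


for name in BUFFERS:
    print(name, recipe(name, volume_l=1.0))
```

For the 8 M urea buffer the solute occupies a large part of the final volume, so in practice the components are dissolved in less water and the solution is brought to the final volume afterwards.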

Instruments: centrifuges (Beckman Coulter Allegra X-15R, Beckman Coulter Avanti J-E); spectrophotometer (Libra S32, Biochrom), NanoDrop 2000 spectrophotometer (Thermo Scientific); Mini-PROTEAN® Tetra Vertical Electrophoresis Cell (Bio-Rad), Accuris myGel™ Mini Agarose Gel Electrophoresis Apparatus (LabREPCO), pH meter (Denver Instrument Company); French press (EmulsiFlex-C3, Avestin), sonicator (Bandelin Sonopuls), Zetasizer Nano S research-grade dynamic light scattering system (Malvern Instruments), ÄKTA Purifier 10 purification system with UNICORN 6.4 software (GE Healthcare), Jasco J-810 spectropolarimeter.

3.2. Methods

3.2.1. Competent cells

E.coli BL21 (DE3) One Shot® cells from -80°C storage were streaked on an LB Petri dish and grown overnight at 37°C.

An isolated colony was picked and transferred to 4 ml of LB media in a Falcon tube and grown overnight at 37°C at 250 rpm.

400 µl of the overnight culture was transferred to 400 ml of LB medium and the cells were grown at 37°C until the OD600 reached 0.4. The culture was then centrifuged in 2 centrifuge tubes (200 ml each) at 4°C for 10 minutes at 2800 g.


The supernatant was discarded and each recovered pellet was resuspended in 120 ml of MgCl2-CaCl2 solution. The mixture was centrifuged at 4°C for 10 minutes at 2800 g. The pellet was again recovered and resuspended in 4 ml of CaCl2 solution.

Aliquots of 50 µl were prepared and stored at -80°C.

Bacterial strain description and growth conditions

The E.coli BL21 (DE3) One Shot® strain was used to produce the proteins of interest. The genotype of this strain is F- ompT hsdSB (rB-mB-) gal dcm (DE3), and it has the following features: it does not carry the F plasmid (F-); it is deficient in the outer membrane protease OmpT and, like other E.coli B strains, in the Lon protease, and the lack of these two key proteases reduces the degradation of heterologous proteins expressed in the strain; it is deficient in the endogenous restriction endonuclease (hsdSB); it cannot utilize galactose (gal); it lacks endogenous cytosine methylation at CCWGG sequences (dcm), which matters because the modification state of plasmid DNA can affect the frequency of transformation in special situations; it contains the DE3 lysogen, which carries the gene for T7 RNA polymerase under the control of the lacUV5 promoter, so IPTG is required to induce expression of the T7 RNA polymerase. With the pTRC system (see the next paragraph), the (DE3) element is not exploited, because the trc promoter does not need T7 RNA polymerase.

This strain was stored in 1.5 ml Eppendorf tubes at -60°C. After thawing, the cells were transformed by heat shock, and selection was carried out on plates with ampicillin (100 µg/ml). Bacteria were grown in liquid LB medium in 250 ml flasks, each filled with 100 ml of medium, for the pre-culture, and in 2-liter flasks, each filled with 1 liter of LB medium, for the production culture.

3.2.2. Protein-expressing plasmid

The plasmid pBJ122 (Fig. ), a derivative of pNS3785, has been selected to express the DNA insert coding for the Octarellin V and V.I mutants under the control of Trc Promoter. High levels of expression are possible using the trc (trp-lac) promoter (Egon et al., 1983) and the rrnB anti-termination region (Li et al., 1984). The trc promoter contains the -35 region of the trp promoter together with the -10 region of the lac promoter (Brosius et al., 1985; Egon et al., 1983; Mulligan et al., 1985). Transcription of the ribosomal RNA operons (rrn) in Escherichia coli is subject to an anti-termination mechanism whereby RNA polymerase is modified to a termination-resistant form during transit through the rrn leader region (Friedman et al., 1973; Adhya et al., 1974; Albreehtsen et al., 1990). Translation is enhanced by the presence of a minicistron that provides highly efficient translational restart into the open reading frame (ORF) of the multiple cloning site (MCS).

Additional plasmid-encoded lacI genes are often not enough to control basal expression, so many systems instead supply the lacIq gene, whose mutated promoter increases LacI repressor expression ten-fold. The T1-T2 terminator cluster found at the end of the rrnB operon (Young, 1979; Schmidt & Chamberlin, 1987) can effectively stop transcription modified by the rrn leader region. It is conceivable that these terminators encode signals that disrupt the elongation mode of RNA polymerase irrespective of whether it has been modified to an anti-termination-proficient state.


According to this model, T1-T2 terminators would be resistant to any active anti-termination mechanism.

Amplification of the plasmid:

The ordered plasmids carrying the synthetic genes encoding Octarellin V 4P and Octarellin V.I Δ4P were amplified to ensure a reserve stock and a sufficient amount to carry out the experiments. 100 µl of E.coli BL21 (DE3) One Shot® chemically competent cells were mixed with 1 µl of plasmid and incubated on ice for 30 minutes.

The cells were then heat shocked in a water bath for 30 seconds at 42° C (without shaking) and immediately placed on ice. 250 µl of SOC media was added and incubated for 1 hour at 37°C at 250 rpm.

2 LB plates with ampicillin were taken. One plate was plated with 50 µl of cells and another was plated with 200 µl of cells.

The plates were left growing overnight at 37° C.

The day after 2 colonies from each plate were chosen and transferred to 5 ml of LB media with ampicillin (100 µg/ml). The cells were grown overnight at 37° C in agitation at 250 rpm.

Purification of the plasmid:

Plasmid DNA purification was carried out using the NucleoBond® Xtra Midi kit (MACHEREY-NAGEL GmbH & Co. KG), following the low-copy protocol.

The overnight grown cells were harvested by centrifuging at 6000 x g for 15 minutes at 4°C in a table top centrifuge. The pellet was recovered.

The pellet was resuspended in 16 ml of Resuspension Buffer RES by pipetting the cells up and down, and 16 ml of Buffer LYS was added. The mixture was incubated at room temperature (18–25 °C) for 5 min.

The NucleoBond® Xtra Column was equilibrated together with the inserted column filter with 12 ml of Equilibration Buffer EQU. The buffer was applied onto the rim of the column filter, making sure to wet the entire filter, and the column was allowed to empty by gravity flow. Neutralization Buffer NEU was added to the suspension, and the lysate was immediately mixed gently by inverting the tube until the blue sample turned completely colourless. The lysate was loaded onto the NucleoBond® Xtra Column Filter.

First, 5 ml of Equilibration Buffer EQU was applied to the funnel rim of the filter to wash all residual lysate out of the filter onto the column; the column filter was then pulled out and discarded. The NucleoBond® Xtra Column, without the filter, was then washed a second time with 8 ml of Buffer WASH.

5 ml of Buffer ELU was added, followed by 3.5 ml of isopropanol. The mixture was vortexed and centrifuged for 30 minutes at 13000 x g at 4°C. 2 ml of room-temperature 70 % ethanol was added to the


pellet. After one more centrifugation at ≥ 4,500 x g, preferably ≥ 15,000 x g, for 5 min at room temperature (18–25 °C), the ethanol was carefully removed from the tube with a pipette tip. The pellet was left to dry at room temperature (18–25 °C) for 10-15 minutes.

In a final step, the DNA pellet was dissolved in an appropriate volume of Buffer TE or sterile H2O by gently pipetting up and down. Plasmid yield was determined by UV spectrophotometry. The plasmid purification procedure follows the MACHEREY-NAGEL protocol (source: http://www.mn-net.com/Portals/8/attachments/Redakteure_Bio/Protocols/Plasmid%20DNA%20Purification/UM_pDNA_NuBoXtra.pdf).

Determination of DNA concentration: The concentration of the DNA was measured by NANODROP 2000 spectrophotometer at 260 nm wavelength. 1µl of elution buffer (EB) was used as blank. 1µl of the plasmid sample was used for the measurement.
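The conversion from an absorbance reading at 260 nm to a DNA concentration can be sketched as follows. It relies on the usual convention that an A260 of 1.0 (1 cm path length) corresponds to about 50 ng/µl of double-stranded DNA; the absorbance and volume in the example are hypothetical, and the function names are illustrative.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor for double-stranded DNA, 1 cm path


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration in ng/µl from a blank-corrected A260 reading normalized to 1 cm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def total_yield_ug(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA (µg) contained in the given sample volume."""
    return concentration_ng_per_ul * volume_ul / 1000.0


# Hypothetical reading: A260 = 0.5 for an undiluted sample of 200 µl.
concentration = dsdna_concentration(0.5)
print(concentration, "ng/µl;", total_yield_ug(concentration, 200.0), "µg in total")
```

Instruments of this type usually report the concentration directly; the calculation is shown only to make the underlying relation explicit.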

Sequencing of the purified plasmids:

The sequence of the purified plasmid was verified by sequencing. Two tubes were sent for sequencing: one with the forward primer and another with the reverse primer.

3.2.3. Protein expression

Competent cells transformation and induction:

LB medium was used to grow the competent E.coli BL21 (DE3) cells. The bacterial strain was grown on both solid and liquid medium; solid medium was prepared by adding 1% agar to the liquid one. Growth was carried out at 37°C. Ampicillin was used as antibiotic at a final concentration of 100 μg/mL. Protein production was induced with IPTG at a final concentration of 1 mM (stock solution 1 M).
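The volumes of stock solution needed to reach these final concentrations follow from the dilution relation C1·V1 = C2·V2. In the sketch below the IPTG figures come from the text (1 M stock, 1 mM final), whereas the ampicillin stock concentration of 100 mg/ml is an assumption made for the example; the function name is illustrative.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock to add (same unit as final_volume); both concentrations share one unit."""
    return final_conc * final_volume / stock_conc


# IPTG: 1 mM final in a 1000 ml culture, from a 1000 mM (1 M) stock.
iptg_ml = stock_volume(final_conc=1.0, final_volume=1000.0, stock_conc=1000.0)
print("IPTG stock per liter of culture:", iptg_ml, "ml")  # → 1.0 ml

# Ampicillin: 100 µg/ml final in 1000 ml, from an assumed 100000 µg/ml (100 mg/ml) stock.
amp_ml = stock_volume(final_conc=100.0, final_volume=1000.0, stock_conc=100000.0)
print("Ampicillin stock per liter of culture:", amp_ml, "ml")  # → 1.0 ml
```

The volume added is neglected with respect to the culture volume, which is a reasonable approximation for a 1:1000 dilution.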

All the flasks for the production were prepared first: 4 liters of LB (4 flasks of 2 liters, each filled with 1 liter of LB medium) and 200 ml of LB medium (2 flasks of 250 ml, each filled with 100 ml of LB medium); all of them were sterilized by autoclaving.

In parallel, the competent cells were transformed with the plasmid: “E.coli BL21 (DE3) One Shot” competent cells were transformed with plasmid pBJ122 carrying the Octarellin V.I ∆4P insert in one case and the Octarellin V 4P insert in the other (all steps were carried out next to a flame):

Eppendorf tubes with 100 µl of competent cells (BL21), stored in aliquots at -80°C, were put on ice for 5-10 minutes to thaw.


The cells were heat shocked for 30 seconds at 42°C without shaking and immediately transferred back to ice for 2 minutes.

200 µl of SOC (Super Optimal Broth with Catabolite Repression) medium was added to all the tubes.

The content of each tube was put in different petri dishes (LB/agar with ampicillin at 100 μg/mL) in incubation overnight at 37°C.

Transformed colonies were picked with an inoculation loop and used to inoculate separate 250 ml flasks filled with 100 ml of LB medium and 100 µl of ampicillin (PRE-CULTURE flasks), one for Octarellin V.I ∆4P and one for Octarellin V 4P, which were shaken at 37°C overnight. Each pre-culture was then split into two halves, and each half was transferred to a CULTURE flask (a 2 liter flask filled with 1 liter of LB medium containing ampicillin at 100 μg/mL). The culture flasks were shaken at 37°C until mid-log phase.

When the OD600 reached 0.6, IPTG was added (1 mM final concentration) to start the synthesis of the proteins of interest. The culture flasks were incubated for 4 hours at 37°C with shaking.
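The volume of IPTG stock needed for the induction follows from the dilution relation C1·V1 = C2·V2. The sketch below is only illustrative: the helper name is hypothetical, and it neglects the small increase in volume caused by adding the stock to the culture.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (ml) needed to reach final_conc in final_volume_ml.

    Applies C1 * V1 = C2 * V2; both concentrations must share the same unit.
    The volume added by the stock itself is neglected.
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock")
    return final_conc * final_volume_ml / stock_conc


# 1 M IPTG stock (1000 mM), 1 mM final concentration, 1 liter of culture.
iptg_ml = stock_volume_ml(final_conc=1.0, stock_conc=1000.0, final_volume_ml=1000.0)
print(f"IPTG stock per culture flask: {iptg_ml:.2f} ml")
```

With the values given in the text, this corresponds to 1 ml of 1 M IPTG per 1 liter culture flask.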

After that, the 4 liters of culture were centrifuged at 4300 rpm for 20 minutes with slow deceleration (decel. value: 2) in a Beckman Coulter Allegra X-15R centrifuge. Finally, the supernatant was discarded and the pellet was stored at -20°C.
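Centrifugation speeds are given here in rpm, which depend on the rotor. A minimal sketch of the conversion to relative centrifugal force (RCF) is shown below, using the standard relation RCF = 1.118 × 10⁻⁵ · r · rpm², with r in cm. The rotor radius is not stated in the text, so the value used in the example is a hypothetical placeholder that should be replaced with the figure from the rotor manual.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) from rotor speed and rotor radius."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be >= 0 and radius must be > 0")
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius must be > 0")
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


# HYPOTHETICAL radius: the text does not give the rotor geometry.
ASSUMED_RADIUS_CM = 20.0
print(f"4300 rpm -> {rcf_from_rpm(4300, ASSUMED_RADIUS_CM):.0f} x g")
```

Reporting the force in × g alongside the rpm value makes the step reproducible on a different rotor.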

Protein extraction and solubilisation:

Each of the two pellets was re-suspended in 100 ml of buffer 1. The cells were lysed on ice with an EmulsiFlex-C3 (Avestin) homogenizer (3 cycles) at 2000 bar. Each lysate was centrifuged at 12000 rpm for 45 minutes at 4°C in a Beckman Coulter Avanti J-E centrifuge. The pellet was re-suspended in 100 ml of buffer 1 and subjected to a first sonication treatment on ice, 8-10 cycles at 30 s intervals and 50% power (Bandelin Sonopuls sonicator). The treated samples were centrifuged for 10 min at 16000 g and the supernatant was discarded.

The pellet was re-suspended in 100 ml of buffer 1 and both samples were then checked for nucleic acid contamination by OD260.

The sonication treatment was repeated once more, followed by a final centrifugation for 45 minutes at 12000 rpm and 4°C.

The pellet was re-suspended in 200 ml of denaturing buffer 2 under gentle agitation, using a rotating mixer or slow rocking at 4°C, until a homogeneous solution was obtained. The solution was centrifuged for 45 minutes at 12000 rpm, after which the supernatant was recovered and the pellet discarded.

The solution was filtered through a 0.45 µm cut-off filter paper to remove any fine particles and stored at 4°C; a small aliquot (input sample) was set aside for SDS-PAGE analysis.


3.2.4. Agarose gel electrophoresis

Agarose gel electrophoresis was carried out to check for nucleic acid contamination in the protein samples after treatment with Benzonase and sonication. A 1% agarose gel was prepared by mixing 1 gram of agarose in 100 ml of 0.5X TBE buffer. The mixture was heated in a microwave oven to dissolve the agarose. 10 µl of SYBR® Green (BioRad) nucleic acid dye was added for visualization.

The samples for agarose gel electrophoresis were prepared by mixing 5 µl of protein sample with 1 µl of 6X gel loading dye. 6 µl of 1 kb DNA ladder was used as the DNA size marker.

The agarose gel was placed into the gel box (electrophoresis unit), which was filled with 1X TBE until the gel was covered. The samples and the DNA ladder were carefully loaded into the wells of the gel, which was run for 45 minutes at 100 V.

The bands were visualized under UV light.

3.2.5. Protein purification

The protein purification comprises two chromatographic steps, which yield a final sample that is as pure as possible. All purification operations were carried out at 4°C.

3.2.5.1. Ion Exchange Chromatography (IEX)

Proteins are often characterized by their isoelectric point (pI), the pH value at which they carry no net charge. It is assumed that proteins will not bind to the chromatography media at their pI, but will be retained by strong anion exchange columns at a pH above their pI, where they carry a net negative charge. Considering the pI of the proteins (∆4P pI 5.33 and 4P pI 5.51) and the buffer pH of 8.5, the experiment was carried out using a strong quaternary ammonium (Q) Sepharose High Performance anion exchanger, HiTrap Q HP (GE Healthcare). In all experimental setups, two columns in series were first equilibrated with running buffer 1 for 5 CV (column volumes, the unit in which the length of each step and gradient is expressed) and then with buffer 2 for 5 CV at a flow rate of 5 ml/min, using an ÄKTA Purifier 10 purification system and the Prime View software. This step was followed by an automated sample load of 100 mL of protein mixture. The columns were first flushed with 2 CV of buffer 2 to remove unbound proteins, and then a 10 CV gradient from 0 to 100% of buffer 1 was applied to remove urea (on-column renaturation step). This step is necessary to allow the proteins bound to the resin to recover their native fold by switching from a denaturing environment to a physiological one. Next, an elution gradient from 0% to 100% NaCl was applied by adding buffer 3 (Tris HCl 50 mM pH 8.5 + NaCl 1 M) in two steps: in the first, the gradient was run over 3 CV up to 15% NaCl; in the second, it was run over 20 CV from 15% to 100%. Both steps were monitored by measuring the absorbance of each fraction at 280 nm for proteins, 260 nm for nucleic acids and 215 nm for peptide bonds.

3.2.5.2. Size Exclusion chromatography (SEC)

The enriched fractions from the IEX step were pooled and concentrated by centrifugation in a 10 kDa cut-off Amicon tube (Merck) for 10 minutes at 4000 × g and 4°C in a swinging bucket rotor, before being loaded onto a size exclusion chromatography column. In size exclusion chromatography, or gel filtration, the bed is packed with a porous gel that separates the compounds of a mixture according to their differences in molecular mass and shape. The larger compounds elute first since they cannot enter the pores. Smaller molecules permeate the pores and move through the column more slowly. The separation was carried out on a Superdex 75 16/60 column (GE Healthcare). The column was first equilibrated with 2 CV of buffer 1, and then the SEC was performed with buffer 4 (sodium phosphate 50 mM, NaCl 150 mM, pH 8), carrying out the elution for 1.2 CV. Fractions of 2 ml were collected at a flow rate of 0.5 ml/min at RT.
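The SEC run parameters can be turned into an approximate run plan (elution volume, run time and number of fractions). The sketch below assumes a bed volume of 120 ml for the 16/60 column, a value that is not stated in the text and should be checked against the column data sheet. It also assumes that the 2 ml figure refers to the fraction size. The function name is hypothetical.

```python
import math


def sec_run_plan(column_volume_ml, elution_cv, flow_ml_min, fraction_ml):
    """Return elution volume (ml), run time (min) and number of fractions."""
    if min(column_volume_ml, elution_cv, flow_ml_min, fraction_ml) <= 0:
        raise ValueError("all parameters must be positive")
    elution_volume_ml = column_volume_ml * elution_cv
    run_time_min = elution_volume_ml / flow_ml_min
    # Rounding guards against floating-point noise before taking the ceiling.
    fractions = math.ceil(round(elution_volume_ml / fraction_ml, 9))
    return {
        "elution_volume_ml": elution_volume_ml,
        "run_time_min": run_time_min,
        "fractions": fractions,
    }


# ASSUMED bed volume of 120 ml; 1.2 CV, 0.5 ml/min and 2 ml come from the text.
plan = sec_run_plan(column_volume_ml=120.0, elution_cv=1.2,
                    flow_ml_min=0.5, fraction_ml=2.0)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under these assumptions the elution takes several hours, which is useful to know when planning the fraction collector and the following SDS-PAGE analysis.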

Preparation of samples before running SDS gels:

20 µl of each sample was mixed with 5 µl of Loading buffer 5%. The samples were heated for 15 minutes at 95°C and centrifuged for 12 seconds before being loaded onto the SDS-PAGE gels. 10 µl of Precision Plus Protein Unstained Standard (Bio-Rad) was loaded in the last well as a molecular weight marker (ladder) to identify the protein of interest by MW. The gels were stained and visualized with Instant Blue. The stacking gel concentration was 4% and the resolving gel concentration was 12%.

3.2.5.3. Protein visualization and quantification

SDS-PAGE:

Before carrying out SDS-PAGE, all the buffers and gels were prepared. The gel consists of two parts:

• 12% SDS Resolving gel (24 ml for 4 gels):
- 9.5 ml of ddH2O,
- 8 ml of 30% acrylamide mix,
- 6 ml of 1.5 M Tris (pH 8.8),
- 0.36 ml of 10% SDS,
- 0.12 ml of 10% APS,
- 0.040 ml of TEMED.

All the ingredients were mixed, taking care to add APS and TEMED last. An appropriate amount of resolving gel solution was pipetted into the gap between the glass plates.

To level the top of the resolving gel, isopropanol was added into the gap until it overflowed, and the gel was left to polymerize for 20-30 min.
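When a different number of gels is cast, the resolving gel recipe above can be scaled linearly. The following sketch copies the volumes listed for 4 gels; the dictionary and function names are hypothetical and serve only to illustrate the scaling.

```python
# Volumes in ml for 4 gels, copied from the resolving gel recipe in the text.
RESOLVING_GEL_4_GELS_ML = {
    "ddH2O": 9.5,
    "30% acrylamide mix": 8.0,
    "1.5 M Tris (pH 8.8)": 6.0,
    "10% SDS": 0.36,
    "10% APS": 0.12,
    "TEMED": 0.040,
}


def scale_recipe(recipe_ml, recipe_gels, wanted_gels):
    """Scale every component volume linearly to the wanted number of gels."""
    if recipe_gels <= 0 or wanted_gels <= 0:
        raise ValueError("gel counts must be positive")
    factor = wanted_gels / recipe_gels
    return {name: volume * factor for name, volume in recipe_ml.items()}


# Example: volumes for 2 gels instead of 4.
two_gels = scale_recipe(RESOLVING_GEL_4_GELS_ML, recipe_gels=4, wanted_gels=2)
for name, volume in two_gels.items():
    print(f"{name}: {volume:.3f} ml")
```

As in the protocol, APS and TEMED are still added last, whatever the batch size.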

• 4% Stacking gel (10 ml for 4 gels):
- 5.81 ml of ddH2O,
- 1.33 ml of 30% acrylamide mix,
- 2.5 ml of 0.5 M Tris (pH 6.8),
- 0.1 ml of 10% SDS,
