15 May 2015
Neri Niccolai
Dipartimento di Biotecnologie, Chimica e Farmacia
Università degli Studi di Siena
Post-genomic Revolution
in Life Sciences
α-haemolysin
from Staphylococcus aureus
20,324 atoms, 2,051 amino acids
Protein structure and dynamics…
Something good for Engineers!
A journey around Life Sciences?
http://it.wikipedia.org/wiki/Legge_di_Moore
100100101101101001000100100101001101100100101101101001000100 101001000100101110001010100101001101100100101101101001000100
101110001010100101001101100101001101100100101101101001000100 100101001101100100101101101001000100100100101101101001000101
A journey around Life Sciences?
Meeting point in genome data banks!
UniProtKB/TrEMBL: the database of amino acid sequences of proteins identified from genomic data
On 1 April 2015, UniProtKB contained the sequences of 46,714,516 different proteins, comprising 15,391,501,940 amino acids (average length 329 aa)
From protein sequences to 3D structures
3 experimental techniques to solve molecular structures
X-ray diffraction
First structure: 1957; deposited in PDB in 1972
Cryo-electron microscopy
First structure: 1996; deposited in PDB in 1996
Nuclear Magnetic Resonance (NMR) spectroscopy
First structure: 1985; deposited in PDB in 1988
Protein Data Bank: the only Bank you can rely on
PDB files: a structural treasure
to be parsed!
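As a minimal sketch of what parsing a PDB file involves, the Python below reads the fixed-width ATOM and HETATM records line by line. The function name `parse_atoms` and the dictionary keys are illustrative choices, not part of any library; the column positions follow the published PDB file format. The demo record is hypothetical and is assembled in code only so that the column widths stay explicit. Records shorter than 66 characters are assumed not to occur.

```python
def parse_atoms(lines):
    """Yield one dict per ATOM/HETATM record, using fixed PDB columns."""
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, HEADER, TER and every other record type
        yield {
            "serial": int(line[6:11]),        # columns 7-11
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-54, in angstroms
                    float(line[38:46]),
                    float(line[46:54])),
            "b_factor": float(line[60:66]),   # columns 61-66
        }


# Hypothetical record, built field by field so the widths are visible.
demo = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67)

for atom in parse_atoms([demo]):
    print(atom["res_name"], atom["res_seq"], atom["name"], atom["xyz"])
```

For a real file the same function can be fed an open file handle, since it only iterates over lines.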
Databanks + New tools = New insights?
A database of protein singles
In the PDB, as of Tuesday May 7, 2013 at 5 PM PDT, there are 90,424 structures
Experimental method: X-ray (79,770)
Chain type: protein (74,456)
Only 1 chain in asymmetric unit (28,803)
Oligomeric state: 1 (21,193)
Number of entities: 1 (3,517)
Homologue removal at 95% identity (2,410)
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues
[Figure: DOOPS bar chart; axis 0 to 12]
A database of protein singles
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues
Swiss-Prot: 540,052 proteins in the dataset (191 Maa)
[Figure: two charts, one per dataset; axes 0 to 12 and 0 to 2000]
Databanks + New tools = New insights?
protein folding Birth of the Earth
Digging inside objects to discover their origins
3D atom depth analysis
Depth index analysis
Atom depth analysis
Atom depth* = distance of an atom from the molecular surface.
* Chakravarty, S. and Varadarajan, R. (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold. Des., 7, 723–732.
3D atom depth analysis
Depth index defined as:
$$D_{i,r} = \frac{2\,V_{i,r}}{V_{0,r}}$$
where $V_{i,r}$ is the exposed volume of a sphere of radius $r$ centred on atom $i$, and $V_{0,r}$ is the volume of that sphere.
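To make the ratio of exposed volume to sphere volume concrete, here is a rough numerical sketch in Python. It is not the SADIC implementation. It assumes that every atom can be modelled as a sphere of the same radius, that "exposed" means lying outside all of those spheres, and that a coarse regular grid is an acceptable volume estimate. The function name `depth_index` and the default values of `r`, `atom_radius` and `step` are arbitrary illustrative choices.

```python
import math


def depth_index(center, atoms, r=4.0, atom_radius=1.8, step=1.0):
    """Grid estimate of D = 2 * V_exposed / V_sphere for one probe sphere.

    Assumption: a grid point counts as exposed when it lies outside every
    atom, each atom being a sphere of radius `atom_radius`.
    """
    cx, cy, cz = center
    n = int(math.ceil(r / step))
    in_sphere = 0
    exposed = 0
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            for iz in range(-n, n + 1):
                dx, dy, dz = ix * step, iy * step, iz * step
                if dx * dx + dy * dy + dz * dz > r * r:
                    continue  # grid point outside the probe sphere
                in_sphere += 1
                p = (cx + dx, cy + dy, cz + dz)
                if all(math.dist(p, a) > atom_radius for a in atoms):
                    exposed += 1
    return 2.0 * exposed / in_sphere


# An isolated atom: almost the whole probe sphere is exposed.
lone = depth_index((0.0, 0.0, 0.0), [(0.0, 0.0, 0.0)])

# An atom in the middle of a dense, made-up cubic lattice: nothing is exposed.
lattice = [(1.5 * i, 1.5 * j, 1.5 * k)
           for i in range(-4, 5) for j in range(-4, 5) for k in range(-4, 5)]
buried = depth_index((0.0, 0.0, 0.0), lattice)
print(lone, buried)
```

Under these assumptions the index falls from just below 2 for an isolated atom to 0 for an atom whose probe sphere is completely filled by neighbours.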
3D atom depth analysis
[Figure: 1UBQ PDB_SADIC file (Di) alongside 1UBQ PDB file (B)]
3D atom depth analysis
Di values per atom, from PDB ID 1UBQ (Dimax marked for each residue):
K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29
E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24
L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00
http://www.sbl.unisi.it/prococoa/
Dimax analysis of protein singles
Layer  Di range    % atoms  Colour
L0     < 0.2       17.9     violet
L1     0.2 - 0.4   17.1     indigo
L2     0.4 - 0.6   17.7     blue
L3     0.6 - 0.8   17.4     green
L4     0.8 - 1.0   14.4     yellow
L5     1.0 - 1.2    9.4     orange
L6     > 1.2        5.7     red
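The layer scheme can be expressed as a small lookup, sketched here in Python. The function name `layer_of` is hypothetical. The treatment of values that fall exactly on a boundary is an assumption: the table only fixes that L0 is strictly below 0.2 and L6 strictly above 1.2, so inner boundaries are assigned to the upper layer and 1.2 itself to L5.

```python
import bisect

# Boundaries between L0|L1, L1|L2, ... L4|L5; L6 starts strictly above 1.2.
INNER_EDGES = [0.2, 0.4, 0.6, 0.8, 1.0]
L6_THRESHOLD = 1.2
LAYER_COLOURS = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]


def layer_of(di):
    """Return (layer name, colour) for a depth index value."""
    if di > L6_THRESHOLD:
        index = 6
    else:
        # bisect_right sends a value equal to an edge to the upper layer
        index = bisect.bisect_right(INNER_EDGES, di)
    return "L%d" % index, LAYER_COLOURS[index]


# Three of the 1UBQ atom values quoted in these slides.
for label, di in [("K63 NZ", 1.29), ("K63 CB", 0.50), ("L43 CD2", 0.00)]:
    print(label, layer_of(di))
```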
Dimax analysis of protein singles
[Figure: histogram, DOOPS: all structural layers from 3,515 PDB files; axis in %, 0 to 12]
Databanks + New tools = New insights?
Dimax analysis of protein singles
L0: protein folding
L6: protein-protein interactions
Post-genomic Era and Systems Biology
Brownian Dynamics simulation of protein motion inside an E. coli cell (1,109 molecules selected from 51 different kinds of proteins and RNA)
Hunting for pockets on the protein surface:
a complex search...
[Figure: time axis, 0 to 30 ns]
Discovering disruptors of protein-protein interactions
A screenshot of the PSTP-Finder results window. Histograms refer to the surface pockets found during a 50 ns MD simulation started from the 1QG7 PDB file.
Codon codes assigned by chance?
Release 2015_04 of 01-Apr-15 of UniProtKB/Swiss-Prot contains 548,208 sequence entries, comprising 195,282,524 amino acids.
Amino acid composition (%):
Ala (A) 8.26   Gln (Q) 3.93   Leu (L) 9.66   Ser (S) 6.58
Arg (R) 5.53   Glu (E) 6.74   Lys (K) 5.83   Thr (T) 5.34
Asn (N) 4.05   Gly (G) 7.08   Met (M) 2.41   Trp (W) 1.09
Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92
Cys (C) 1.37   Ile (I) 5.94   Pro (P) 4.71   Val (V) 6.87
[Figure: bars labelled with values from 1 to 6, one per amino acid]
The frequencies of DNA bases in the human genome (NT 46,742,881) are:
A: 24.54 %; T: 26.39 %; C: 24.27 %; G: 24.79 %
The expected frequency of a particular codon can then be calculated by multiplying the frequencies of the three DNA bases that make up the codon. The expected frequency of an amino acid is the sum of the expected frequencies of all codons that code for it.
As an example, the RNA codons for tyrosine are UAU and UAC (U takes the frequency of T), so the random expectation for its frequency is (26.39)(24.54)(26.39) + (26.39)(24.54)(24.27) = 32,807.952996 in units of %³, that is, about 3.28 %.
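The calculation described above can be sketched in a few lines of Python. The base frequencies are the ones quoted on this slide, written as fractions. The codon table is the standard genetic code, which is an assumption about the table the author used, and the names `CODON_TABLE` and `expected_aa_frequencies` are illustrative.

```python
from itertools import product

# Base frequencies quoted on the slide, as fractions; U in RNA uses the T value.
BASE_FREQ = {"A": 0.2454, "T": 0.2639, "C": 0.2427, "G": 0.2479}

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_aa_frequencies(base_freq=BASE_FREQ):
    """Sum, per amino acid, the products of the base frequencies of its codons."""
    totals = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for base in codon:
            p *= base_freq[base]
        totals[aa] = totals.get(aa, 0.0) + p
    return totals


expected = expected_aa_frequencies()
print("Tyr: %.2f %%" % (100 * expected["Y"]))
```

The totals are deliberately not renormalised over the 61 sense codons, so the tyrosine value matches the hand calculation on the slide once the percentages are converted to fractions.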
Expected codon frequency vs Swiss-Prot amino acid frequency
[Figure: scatter plot with axes labelled Swiss-Prot and Codons (Codoni), both 0 to 12; labelled points S, L, R, E, K, C, A, D, P; line showing the expected trend*]
* $(0.25)^3 \times (64 - 3)/20 \approx 4.77 \times 10^{-2}$
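The footnote value can be unpacked as follows. This assumes equal base frequencies of 0.25 and an even spread of the 61 sense codons over the 20 amino acids; both are simplifications that serve only to draw the trend line.

```latex
\[
\underbrace{(0.25)^3}_{\text{one codon}} \times \frac{64 - 3}{20}
  = \frac{1}{64} \times \frac{61}{20}
  = \frac{61}{1280}
  = 0.04765625
\]
```

In other words, an amino acid encoded by an average number of codons (61/20 = 3.05) would be expected at a frequency just under 5 %.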
[Figure: amino acid frequency at the protein-DNA interface (from 209 PDB files); axis 0 to 12]
[Figure: amino acid frequency in structural layers with Dimax > 1.2; axis 0 to 12]
2.3 Å
2.5 Å