Nozioni di base 3

(1)

Protein Structure

Prediction and Display

Goal

Take primary structure (sequence) and, using rules derived from known

structures, predict the secondary structure that is most likely to be adopted by each residue

(2)

Secondary structure

Alpha helix

Secondary structure

(3)

Secondary structure

Random coil

Structural Propensities

Due to the size, shape and charge of its side chain, each amino acid may “fit” better in one type of secondary

structure than another

Classic example: The rigidity and side chain angle of proline cannot be

(4)

Structural Propensities

Two ways to view the significance of this preference (or propensity)

z It may control or affect the folding of the

protein in its immediate vicinity (amino acid determines structure)

z It may constitute selective pressure to use

particular amino acids in regions that must have a particular structure (structure

determines amino acid)

Secondary structure

prediction

In either case, amino acid propensities should be useful for predicting

secondary structure

Two classical methods that use previously determined propensities:

z Chou-Fasman

(5)

Chou-Fasman method

Uses table of conformational

parameters (propensities) determined primarily from measurements of

secondary structure by CD spectroscopy

Table consists of one “likelihood” for each structure for each amino acid

Chou-Fasman propensities

(partial table)

Amino Acid P_α P_β P_t Glu 1.51 0.37 0.74 Met 1.45 1.05 0.60 Ala 1.42 0.83 0.66 Val 1.06 1.70 0.50 Ile 1.08 1.60 0.50 Tyr 0.69 1.47 1.14 Pro 0.57 0.55 1.52 Gly 0.57 0.75 1.56

(6)

Chou-Fasman method

A prediction is made for each type of structure for each amino acid

z Can result in ambiguity if a region has high

propensities for both helix and sheet (higher value usually chosen, with exceptions)

Chou-Fasman method

Calculation rules are somewhat ad hoc Example: Method for helix

z Search for nucleating region where 4 out of

6 a.a. have P_α > 1.03

z Extend until 4 consecutive a.a. have an

average P_α < 1.00

z If region is at least 6 a.a. long, has an

average P_α > 1.03, and average P_α > average P_β, consider region to be helix

(7)

Garnier-Osguthorpe-Robson

Uses table of propensities calculated primarily from structures determined by X-ray crystallography

Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid

window

Garnier-Osguthorpe-Robson

Analogous to searching for “features” with a 17 amino acid wide frequency matrix

One matrix for each “feature”

z α-helix z β-sheet

z turn

z coil

Highest scoring “feature” is found at each location

(8)

Accuracy of predictions

Both methods are only about 55-65% accurate

A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence - the type of protein

z The same amino acids may adopt a different

configuration in a cytoplasmic protein than in a membrane protein

“Adaptive” methods

Neural network methods - train network using sets of known proteins then use to predict for query sequence

z nnpredict

Homology-based methods - predict structure using rules derived only from proteins

homologous to query sequence

z SOPM

(9)

Neural Network

methods

A neural network with multiple layers is presented with known sequences and structures - network is trained until it can predict those structures given those sequences

Allows network to adapt as needed (it can consider neighboring residues like GOR)

Neural Network

methods

Different networks can be created for different types of proteins

(10)

A C D E F G H I K L M N P Q R S T V W Y . H E L D (L) R (E) Q (E) G (E) F (E) V (E) P (E) A (H) A (H) Y (H) V (E) K (E) K (E)

Neural Network for

secondary structure

Homology-based

modeling

Principle: From the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence

Develop rules (e.g., train a network) for just this subset

Use these rules to make prediction for the query sequence

(11)

Structural homology

It is useful for new proteins whose 3D structure is not known to be able to find proteins whose 3D structure is known

that are expected to have a similar structure to the unknown

It is also useful for proteins whose 3D structure is known to be able to find

other proteins with similar structures

Finding proteins with known

structures based on sequence

homology

If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, can use BLAST web page and choose the PDB database

This is not the PDB database of structures, rather a database of amino acid sequences for those proteins in the structure database

(12)

Finding proteins with similar

structures to a known protein

For literature and sequence databases,

Entrez allows neighbors to be found for a

selected entry based on “homology” in terms (MEDline database) or sequence (protein and nucleic acid sequence databases)

An experimental feature allows neighbors to be chosen for entries in the structure

database

Finding proteins with similar

structures to a known protein

Proteins with similar structures are termed “VAST Neighbors” by Entrez (VAST refers to the method used to evaluate similarity of structure)

VAST or structure neighbors may or may not have sequence homology to each other

(13)

PHD se c H L E 4+1 " " " " " " 20 4 4 4 output layer input layer hidden layer 20 4 4 4 21+3 " " " " " " H L E 0.5 0.1 0.4

percentage of each amino acid in protein length of protein (Š60, Š120, Š240, >240) distance: centre, N-term (Š40,Š30,Š20,Š10) distance: centre, C-term (Š40,Š30,Š20,Š10)

input global in sequence input local in sequence

local align-ment 13 adjacent residues ::: AAA AA. LLL LII AAG CCS GVV ::: global statist. whole protein %AA Length ² N-term ² C-term

A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.42 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.74 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.74 0 0 0 33 0 0 66 0 0 0.48 first level structure second level structure HTM non HTM output layer input layer hidden layer 20 4 4 4 21+3 " " " " " "

local align-ment 13 adjacent residues ::: AAA AA. LLL LII AAG CCS GVV ::: global statist. whole protein %AA Length ² N-term ² C-term

A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.42 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.74 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.74 0 0 0 33 0 0 66 0 0 0.48 HTM non HTM 3+1 " " " " " " 20 4 4 4 first level

sequence-to- second level

(14)

hidden layer 0 1 2 9 16 25 36 49 64 81 output layer input layer 20 4 4 4 21+3 " " " " " "

A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.42 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.74 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.74 0 0 0 33 0 0 66 0 0 0.48 local align-ment 13 adjacent residues ::: AAA AA. LLL LII AAG CCS GVV ::: global statist. whole protein %AA Length ² N-term ² C-term

first level only

PHD a c c

Retrieving 3D structures

Protein Data Bank (PDB)

z using web browser

zhome page = http://www.pdb.bnl.gov/ z using anonymous FTP

Entrez

z using web browser

BLAST

(15)

Displaying Structures

with RasMol

The GIF image of Ribonuclease A is static - we cannot rotate the molecule or recolor portions of it to aid visualization For this we can use RasMol, a public domain program available for wide range of computers, including MacOS, Windows and Unix

Displaying Structures

with RasMol

Drs. David Hackney and Will McClure have developed an online tutorial for RasMol - a link may be found on the 03-310, 03-311 and 03-510 web pages

(16)

PDB files

In order to optimally display, rotate and color the 3D structure, we need to

download a copy of the coordinates for each atom in the molecule to our local computer

The most common format for storage and exchange of atomic coordinates for biological molecules is PDB file format

PDB files

PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people

We can request either the header only or the entire file; the next screen

(17)

http://www.pdb.bnl.gov/pdb-bin/opdbshort

(18)