Basic Notions 1


Academic year: 2021


Computational Biology: Basics & Interesting Problems

Summary

Sources of information

Biological concepts: structure & terminology

Sequencing

Gene finding


Sources of information

Too many sources! Some selected lectures:

• Course on computational biology
  http://www.math.tau.ac.il/~rshamir/algmb.html
• Human Genome project
  http://genome.ucsc.edu/
• Artificial Intelligence and Molecular Biology
  http://www.aaai.org/Library/Books/Hunter/hunter.html
• Another course on Molecular Biology
  http://cmgm.stanford.edu/biochem218/
• Follow links in these sites

The Cell

Example: Tissues in Stomach

DNA Components

Four nucleotide types: Adenine, Guanine, Cytosine, Thymine

Hydrogen bonds: A-T, C-G
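
The pairing rule (A with T, C with G) means that one strand of the double helix determines the other. A minimal Python sketch, assuming strands are written as plain strings over A, C, G, T; the function names are illustrative and do not come from the slides:

```python
# Base pairing as stated on the slide: A-T and C-G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The reversal reflects the fact that the two strands of the helix run in opposite directions, so the partner strand is conventionally written reversed.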

The Double Helix

Source: Alberts et al.

DNA Duplication

Source: Mathews & van Holde

DNA Organization

Source: Alberts et al.

Genome Sizes

E. coli (bacteria): 4.6 x 10^6 bases
Yeast (simple fungi): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases
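
To get a feel for these magnitudes: with four nucleotide types, each base can be encoded in 2 bits. The sketch below is a back-of-the-envelope calculation under that assumption only (no compression, no annotations); the dictionary restates the genome sizes listed on this slide, and the helper name is illustrative.

```python
# Genome sizes in bases, as listed on the slide.
GENOME_SIZES = {
    "E. coli": 4.6e6,
    "Yeast": 15e6,
    "Smallest human chromosome": 50e6,
    "Entire human genome": 3e9,
}

BITS_PER_BASE = 2  # assumption: 4 nucleotide types -> log2(4) = 2 bits each

def megabytes(n_bases: float) -> float:
    """Storage for n_bases at 2 bits per base, in megabytes (10^6 bytes)."""
    return n_bases * BITS_PER_BASE / 8 / 1e6

for name, size in GENOME_SIZES.items():
    print(f"{name}: {megabytes(size):.2f} MB")
```

Under this encoding the entire human genome fits in about 750 MB.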

Genes

The DNA strings include:

Coding regions (“genes”)
• E. coli has ~4,000 genes
• Yeast has ~6,000 genes
• C. elegans has ~13,000 genes
• Humans have ~32,000 genes

Control regions
• These typically are adjacent to the genes
• They determine when a gene should be expressed

Transcription

Coding sequences can be transcribed to RNA

RNA nucleotides:

• Similar to DNA, slightly different backbone
• Uracil (U) instead of Thymine (T)
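
At the sequence level, the transcript of a coding region reads like the DNA with U in place of T. A minimal sketch, assuming the input string is the coding (sense) strand; in the cell the RNA is actually synthesized against the complementary template strand, which this toy function does not model. The function name is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Rewrite a coding-strand DNA sequence in the RNA alphabet (T -> U)."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```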

Source: Mathews & van Holde

RNA roles

Messenger RNA (mRNA)
• Encodes protein sequences

Transfer RNA (tRNA)
• Adaptor between mRNA molecules and amino-acids (protein building blocks)

Ribosomal RNA (rRNA)
• Part of the ribosome, a machine for translating mRNA to proteins

...

Transfer RNA

Anticodon: matches a codon (a triplet of mRNA nucleotides)
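
The codon-anticodon match can be sketched as complementary pairing over the RNA alphabet. This is a simplified illustration that assumes strict A-U and C-G pairing and ignores wobble pairing at the third position; the function name is hypothetical.

```python
# Strict RNA base pairing (assumption: no wobble pairs).
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

def anticodon_for(codon: str) -> str:
    """Return the anticodon, written 5'->3', that pairs with an mRNA codon."""
    if len(codon) != 3:
        raise ValueError("a codon is a triplet of nucleotides")
    # The two RNA stretches pair in opposite directions, so reverse first.
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))

print(anticodon_for("AUG"))  # CAU
```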


Translation

Translation is mediated by the ribosome

The ribosome is a complex of protein & rRNA molecules

The ribosome attaches to the mRNA at a translation initiation site

The ribosome then moves along the mRNA sequence and in the process constructs a poly-peptide

When the ribosome encounters a stop signal, it releases the mRNA. The constructed poly-peptide is released and folds into a protein.
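
The steps above can be sketched as a loop over codons. This is a minimal illustration, assuming the standard genetic code and treating the first AUG in the string as the translation initiation site (real initiation is more involved); the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with each
# position cycling through U, C, A, G; '*' marks a stop signal.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino-acid codes."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # assumption: first AUG is the initiation site
    if start == -1:
        return ""
    peptide = []
    # Move along the mRNA one codon (three nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: release the poly-peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGCUUUCUAAGG"))  # MAF
```

Here the reading starts at AUG (M), continues through GCU (A) and UUC (F), and ends at the stop codon UAA.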

Translation

[Figures: translation. Source: Alberts et al.]

The Amino Acids

Genetic Code

Protein Structure

Proteins are poly-peptides of 70-3000 amino-acids

This structure is (mostly) determined by the sequence of amino-acids that make up the protein

Protein Structure

Evolution

Related organisms have similar DNA
• Similarity in sequences of proteins
• Similarity in organization of genes along the chromosomes

Evolution plays a major role in biology
• Many mechanisms are shared across a wide range of organisms

Evolution

Evolution of new organisms is driven by

Diversity
• Different individuals carry different variants of the same basic blueprint

Mutations
• The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.

Selection bias
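
The kinds of mutation named above can be illustrated as simple string edits. This is only a toy model, assuming 0-based positions in a Python string; the helper names are hypothetical.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Single base change at position pos."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, segment: str) -> str:
    """Insertion of a DNA segment before position pos."""
    return seq[:pos] + segment + seq[pos:]

def delete(seq: str, pos: int, length: int) -> str:
    """Deletion of length bases starting at position pos."""
    return seq[:pos] + seq[pos + length:]

original = "ATGGCT"
print(substitute(original, 3, "A"))  # ATGACT
print(insert(original, 3, "TT"))     # ATGTTGCT
print(delete(original, 3, 2))        # ATGT
```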

Four Aspects

Biological
• What is the task?

Algorithmic
• How to perform the task at hand efficiently?

Learning
• How to adapt parameters of the task from examples

Statistics

Example: Sequence Comparison

Biological
• Evolution preserves sequences, thus similar genes might have similar function

Algorithmic
• Consider all ways to “align” one sequence against another

Learning
• How do we define “similar” sequences? Use examples to define similarity

Statistics
• When we compare to ~10^6 sequences, what is a random match and what is a true one?
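
Considering all alignments does not require enumerating them one by one: dynamic programming finds the best score in time proportional to the product of the two lengths. The sketch below is a Needleman-Wunsch-style global alignment score, chosen here for illustration rather than taken from the slides; the scoring parameters (match +1, mismatch -1, gap -1) are assumptions, which is exactly what the "Learning" aspect would tune from examples.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a against b (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 2
```

In the second call the best alignment matches A, G and T and pays for one gap, giving 3 - 1 = 2.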

Topics I

Dealing with DNA/Protein sequences:

Genome projects and how sequences are found

Finding similar sequences

Models of sequences: Hidden Markov Models

Transcription regulation

Topics II

Gene Expression:

Genome-wide expression patterns

Data organization: clustering

Reconstructing transcription regulation

Recognizing and classifying cancers

Topics III

Models of genetic change:

Long term: evolutionary changes among species

Reconstructing evolutionary trees from current day sequences

Short term: genetic variations in a population

Finding genes by linkage and association

Topics IV

Protein World:

How proteins fold - secondary & tertiary structure

How to predict protein folds from sequence data alone

How to analyze protein changes from raw experimental measurements (MassSpec)

2D gels

A Computational Biology Project

From DNA Chip data: identify expressed genes

Collect DNA sequences of expressed genes

Extract promoter regions of expressed genes from sequence

Characterize common regulatory signals in the promoter regions
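
The last step of the project can be sketched very crudely as a search for short words (k-mers) shared by the promoter regions. This is only one simple way to look for a common signal, not necessarily the approach the project has in mind; the function name, the `min_fraction` parameter and the toy sequences are all made up for illustration.

```python
from collections import Counter

def shared_kmers(promoters, k, min_fraction=1.0):
    """Return k-mers present in at least min_fraction of the promoters."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        # Use a set so each k-mer counts once per promoter.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    needed = min_fraction * len(promoters)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)

# Toy promoter sequences, invented for this example.
promoters = ["GGTATAAACC", "CTATAAAGGT", "TTTATAAAGC"]
print(shared_kmers(promoters, 6))  # ['TATAAA']
```

Real regulatory signals are rarely exact words, so a practical analysis would need to tolerate mismatches and assess statistical significance.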
