Computational Biology: Basics & Interesting Problems
Summary
Sources of information
Biological concepts: structure & terminology
Sequencing
Gene finding
Sources of information
Too many sources! Some selected resources:
• Course on computational biology
  http://www.math.tau.ac.il/~rshamir/algmb.html
• Human Genome project
  http://genome.ucsc.edu/
• Artificial Intelligence and Molecular Biology
  http://www.aaai.org/Library/Books/Hunter/hunter.html
• Another course on Molecular Biology
  http://cmgm.stanford.edu/biochem218/
• Follow links in these sites
The Cell
Example: Tissues in the Stomach
DNA Components
Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)
Hydrogen bonds: A-T, C-G
The Double Helix
Source: Alberts et al.
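The pairing rules above (A-T, C-G) mean that one strand of the helix determines the other. A minimal Python sketch of that idea follows; the function names are illustrative, and the reversal assumes the usual convention that the two strands run in opposite directions.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("ATGCCG"))          # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying `reverse_complement` twice gives back the original strand, which is a quick sanity check on the pairing table.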
DNA Duplication
Source: Mathews & van Holde
DNA Organization
Source: Alberts et al.
Genome Sizes
E. coli (bacterium): 4.6 x 10^6 bases
Yeast (simple fungus): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases
Genes
The DNA strings include:
Coding regions (“genes”)
• E. coli has ~4,000 genes
• Yeast has ~6,000 genes
• C. elegans has ~13,000 genes
• Humans have ~32,000 genes
Control regions
• These are typically adjacent to the genes
• They determine when a gene should be expressed
Transcription
Coding sequences can be transcribed to RNA
RNA nucleotides:
• Similar to DNA, with a slightly different backbone
• Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
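At the level of letters, the T-to-U substitution described above can be sketched in a line of Python. This assumes the input is the coding (sense) strand, so the RNA has the same sequence with U in place of T; `transcribe` is an illustrative name, not a library function.

```python
def transcribe(coding_strand: str) -> str:
    """RNA copy of a DNA coding strand: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```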
RNA roles
Messenger RNA (mRNA)
• Encodes protein sequences
Transfer RNA (tRNA)
• Adaptor between mRNA molecules and amino acids (protein building blocks)
Ribosomal RNA (rRNA)
• Part of the ribosome, a machine for translating mRNA to proteins
...
Transfer RNA
Anticodon: pairs with a complementary codon (a triplet of mRNA nucleotides)
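Codon-anticodon matching can be sketched as RNA base pairing between two triplets. The sketch below assumes strict Watson-Crick pairing (A-U, C-G) and writes both triplets 5'->3'; it ignores the wobble pairing that real tRNAs allow at the third position, and the function names are illustrative.

```python
# RNA base pairing: A pairs with U, C pairs with G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

def anticodon_for(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs with the given mRNA codon."""
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))

def matches(codon: str, anticodon: str) -> bool:
    """True if the anticodon pairs exactly with the codon (no wobble)."""
    return anticodon_for(codon) == anticodon.upper()

print(anticodon_for("AUG"))   # CAU
print(matches("AUG", "CAU"))  # True
```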
Translation
Translation is mediated by the ribosome
The ribosome is a complex of protein & rRNA molecules
The ribosome attaches to the mRNA at a translation initiation site
The ribosome then moves along the mRNA sequence and, in the process, constructs a polypeptide
When the ribosome encounters a stop signal, it releases the mRNA. The constructed polypeptide is released and folds into a protein.
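The steps above (attach at an initiation site, read along the mRNA, stop at a stop signal) can be sketched in Python. This is a simplification under stated assumptions: the first AUG is taken as the initiation site, the standard genetic code is used, and amino acids are written as one-letter codes. `translate` and `CODON_TABLE` are names chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # assumed initiation site
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # step one codon at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: release the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGCCUGGUAAAA"))  # MAW
```

Real initiation depends on more sequence context than a bare AUG, so this sketch only illustrates the reading-frame logic.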
Translation (figures)
Source: Alberts et al.
The Amino Acids
Genetic Code
Protein Structure
Proteins are polypeptides of 70-3000 amino acids
A protein's structure is (mostly) determined by the sequence of amino acids that make it up
Evolution
Related organisms have similar DNA
• Similarity in sequences of proteins
• Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
• Many mechanisms are shared across a wide range of organisms
Evolution
Evolution of new organisms is driven by:
Diversity
• Different individuals carry different variants of the same basic blueprint
Mutations
• The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Selection bias
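The mutation types listed above can be illustrated as simple string edits. The helper names and the example sequence below are hypothetical; positions are 0-based.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Single-base change: replace the base at position pos."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, segment: str) -> str:
    """Insertion: add a DNA segment before position pos."""
    return seq[:pos] + segment + seq[pos:]

def delete(seq: str, pos: int, length: int) -> str:
    """Deletion: remove `length` bases starting at position pos."""
    return seq[:pos] + seq[pos + length:]

seq = "ATGGCC"
print(substitute(seq, 2, "C"))  # ATCGCC
print(insert(seq, 3, "TT"))     # ATGTTGCC
print(delete(seq, 1, 2))        # AGCC
```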
Four Aspects
Biological
• What is the task?
Algorithmic
• How to perform the task at hand efficiently?
Learning
• How to adapt parameters of the task from examples?
Statistics
Example: Sequence Comparison
Biological
• Evolution preserves sequences, thus similar genes might have similar function
Algorithmic
• Consider all ways to “align” one sequence against another
Learning
• How do we define “similar” sequences? Use examples to define similarity
Statistics
• When we compare against ~10^6 sequences, what is a random match and what is a true one?
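One common way to "consider all alignments" without enumerating them is dynamic programming. The sketch below computes a global alignment score in the style of Needleman-Wunsch; the slides do not name a specific algorithm, and the scoring values (match +1, mismatch -1, gap -1) are assumptions picked for illustration. In practice such parameters are what the "Learning" aspect would fit from examples.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all global alignments of a and b (row-by-row DP)."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag,                # align a[i-1] with b[j-1]
                            prev[j] + gap,       # a[i-1] against a gap
                            curr[j - 1] + gap))  # b[j-1] against a gap
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "ACT"))         # 2
```

The table has one cell per pair of prefixes, so the cost grows with the product of the two sequence lengths rather than with the number of possible alignments.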
Topics I
Dealing with DNA/Protein sequences:
Genome projects and how sequences are found
Finding similar sequences
Models of sequences: Hidden Markov Models
Transcription regulation
Topics II
Gene Expression:
Genome-wide expression patterns
Data organization: clustering
Reconstructing transcription regulation
Recognizing and classifying cancers
Topics III
Models of genetic change:
Long term: evolutionary changes among species
Reconstructing evolutionary trees from present-day sequences
Short term: genetic variations in a population
Finding genes by linkage and association
Topics IV
Protein World:
How proteins fold - secondary & tertiary structure
How to predict protein folds from sequence data alone
How to analyze protein changes from raw experimental measurements (MassSpec, 2D gels)
A Computational Biology Project
From DNA chip data: identify expressed genes
Collect DNA sequences of expressed genes
Extract promoter regions of expressed genes from sequence
Characterize common regulatory signals in the promoter regions
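The last step, characterizing common signals in promoter regions, can be approximated very roughly by looking for short words (k-mers) that occur in most of the promoters. The sketch below is one simple approach under that assumption, not the method the project prescribes; the promoter sequences are made up, and `shared_kmers`, `k`, and `min_fraction` are hypothetical names.

```python
from collections import Counter

def shared_kmers(promoters, k=6, min_fraction=0.8):
    """Return k-mers that occur in at least min_fraction of the sequences."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        # Use a set so each k-mer counts at most once per promoter.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    threshold = min_fraction * len(promoters)
    return sorted(kmer for kmer, n in counts.items() if n >= threshold)

# Made-up promoter fragments for illustration only.
promoters = ["GGTATAAAGC", "CCTATAAATT", "ATATATAAAG"]
print(shared_kmers(promoters, k=6, min_fraction=1.0))  # ['TATAAA']
```

Exact k-mer matching misses signals that tolerate mismatches, so a realistic analysis would also need a background model to judge which shared words are surprising.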