• Non ci sono risultati.

The bias in the usage of the synonymous codons correlates with the abundance of the corresponding tRNAs

N/A
N/A
Protected

Academic year: 2021

Condividi "The bias in the usage of the synonymous codons correlates with the abundance of the corresponding tRNAs"

Copied!
1
0
0

Testo completo

(1)

Final exam Computational Biology – BIA – winter 2012

Unequal usage of codons in the coding regions appears to be a universal feature of the genomes across the phylogenetic spectra. This bias results mainly in the uneven usage of the amino acids in the existing proteins and the uneven usage of synonymous codons.

The bias in the usage of the synonymous codons correlates with the abundance of the corresponding tRNAs.

By comparing the frequency of codons in a region of a species' genome, read in a given frame with the typical frequency of codons in the species genes, it is possible to estimate a likelihood of the region to be coding for a protein in such a frame. Regions in which codons are used with frequencies similar to the typical species codon frequencies are likely to code for proteins (or part of proteins) while regions which codons distribution is uniform -- 1/64 (0.015625) -- could be considered as non-coding regions (introns, intergenic or non protein coding exons like exons corresponding to untranslated regions of the transcript).

Let S be a sequence of DNA, the higher the number of codons in S matching the more frequent codons in the table, the higher the coding potential of the sequence S (the probability to be encoding a protein). This value can be computed as:

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

where C1, C2, C3, … , Cm are the m codons. This expression is known as likelihood ratio and it is usually computed in logarithmics terms to reduce the magnitude of the numbers (log-likelihood ratio).

Write a Perl script able to test the coding potential of a single sequence stored into a FASTA formatted file. You will be given 3 files:

Fasta1.txt, Fasta2.txt and HCUT.txt . The HCUT.txt file contains the human codon usage table. Fasta1.txt and Fasta2.txt are files containing ‘unknown’ human sequences.

- The objective of this exam is to decide if a protein coding region is contained in Fasta1.txt or Fasta2.txt using a Perl script and to find evidences supporting your conclusions. (I suggest the use of the blat aligner available in the UCSC genome browser: http://genome.ucsc.edu/cgi-bin/hgBlat?command=start ).

- The script should be able to compute the coding potential for each of all the possible reading frames. Why?

- Once identified the (eventually present) protein coding region use the fasta for a BLAT or BLAST search against the human genome and find the IDENTIFIER and the EXON(S) spanned by the alignment.

- Write a very short report containing the pseudocode (or the equivalent flowchart) of the proposed solution, the output produced by the script using the fasta files as input, your interpretation of the produced results. In the case a putative protein coding region is found try to blat the fasta sequence in order to verify if it can be aligned to a protein coding gene (and put in the report the identifier of this gene).

The files are available at:

http://homes.di.unimi.it/~re/Corsi/CB13mat/FinalExam_Erasmus/

Riferimenti

Documenti correlati

In the fourth chapter, the behavior of entanglement in open quantum systems is described: in particular, a two-qubit system is studied, analyzing both entanglement generation

Not all smooth closed manifolds are acted upon by Anosov diffeomorphisms or expanding maps: one class of manifolds which have been investigated with reference to this problem, see

Solution proposed by Roberto Tauraso, Dipartimento di Matematica, Universit`a di Roma “Tor Vergata”, via della Ricerca Scientifica, 00133 Roma, Italy.. Given a quotient

The breast epithelial cells and the stroma of the lobules type 1 found in the breast tissue of postmenopausal parous women has a genomic signature different from similar

 Design and implement a Kalman filter providing an estimate of the population in each class, by using the input data u(t) and the measurements y(t)..  Plot the obtained estimates

Il saggio analizza i percorsi politici di Clara Sereni e Luca Zevi, due ebrei romani nati nella seconda metà degli anni Quaranta.. Nelle loro storie familiari, la lotta antifascista

The purpose of this thesis is the study and analysis of a specic industrial drying system for the production of tissue paper using the software Matlab.. First, equations which

Those activities which work SHOULD SEE THEIR ROLE AS SUPPORTING THE FUNDAMENTAL PURPOSE AND AIMS OF THE MUSEUM and should respect the limitations this may put on