Chapter 1

Academic year: 2021

Introduction

In the “post-genomic era”, which began with the completion of the sequencing of the human genome, the huge amount of information describing the gene sequences of organisms has to be analyzed and interpreted in order to unravel the mechanisms underlying many complex cellular behaviours.

This very challenging task will certainly involve multidisciplinary investigations and will benefit from the new technologies and laboratory techniques that are now emerging. Following the flow of genomic information, the first logical step has been to shift from the genomic level to the proteomic level [25]; subsequently the focus will probably move towards other fields such as metabolomics (i.e. the study of the metabolites within an organism). Indeed, proteins can be seen as the principal “tools” that cells employ to perform all the operations needed to survive, such as detecting changes in the environment, producing energy to sustain life and organizing single entities into more complex structures like tissues and organs [12].

As stated in [25], “proteomics is not only the systematic separation, cataloguing and study of all of the proteins produced in an organism, it is also the study of how proteins change structure, interact with other proteins, and ultimately give rise to disease or health in an organism”.

Comprehensive proteomics studies will eventually highlight the crucial role played by proteins in cellular dynamics. Given a particular sample under investigation, the first step in a proteomics study is to identify its protein content. This task can be accomplished by various techniques; one of the most widely used nowadays is mass spectrometry [2].

The idea is to infer the identity of proteins from the mass-to-charge ratio of fragments of the peptides normally obtained from proteins by enzymatic cleavage (for example, with trypsin). Using mass spectrometers, a characteristic profile can therefore be obtained for each peptide present in the original sample; these profiles are afterwards combined to obtain evidence of the proteins forming the initial mixture. This approach is usually known as Peptide Mass Fingerprinting (PMF). Two mass spectrometers can be coupled in so-called Tandem Mass Spectrometry, whereby more specific profiles (spectra) can be obtained for the proteins within the sample. Although very expensive, the tandem mass spectrometers available today can process very complex samples and produce a huge number of spectra that need to be analyzed and identified afterwards. This requires bioinformatics tools that process all the spectra produced by the instrument in order to obtain the list of proteins present in the initial sample (together with some confidence measure).
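As a rough illustration of the enzymatic cleavage step described above, the following minimal sketch performs an in silico tryptic digestion and computes peptide masses. It assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P, ignores missed cleavages and modifications, and uses standard monoisotopic residue masses; the function names and the example sequence are hypothetical and are not taken from GAPP.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of a proton, used for charged ions


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except before P.

    Assumption: complete digestion, i.e. no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(peptide, charge=1):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for pep in tryptic_digest("AKPGRLKDEW"):
        print(pep, round(peptide_mass(pep), 4), round(mz(pep, 2), 4))
```

The list of masses produced this way is the kind of theoretical profile that can be compared with what the instrument measures.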

The most widely used method consists of comparing the observed spectra with theoretical spectra obtained from in silico generated peptides derived from particular databases (such as those provided at www.ensembl.org). Due to the large amount of data involved, this process is usually carried out by automatic pipelines such as the Genome Annotating Proteomics Pipeline (GAPP) [2], developed by the Bioinformatics group at Cranfield University.
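The comparison between observed and theoretical data can be sketched in a deliberately simplified form. The example below matches observed peptide masses against theoretical peptide masses within a tolerance in parts per million and ranks candidate proteins by the fraction of their theoretical peptides that were matched. This is a stand-in for illustration only: real search engines score fragment spectra with far more elaborate models, and the function names, the tolerance and all mass values here are hypothetical.

```python
def count_matches(observed, theoretical, tol_ppm=20.0):
    """Count theoretical masses matched by at least one observed mass."""
    matched = 0
    for t in theoretical:
        # Relative error in parts per million between observed and theoretical.
        if any(abs(o - t) / t * 1e6 <= tol_ppm for o in observed):
            matched += 1
    return matched


def rank_proteins(observed, database, tol_ppm=20.0):
    """Rank proteins by the fraction of theoretical peptides matched.

    `database` maps a protein name to its list of theoretical peptide masses.
    """
    scores = {
        name: count_matches(observed, masses, tol_ppm) / len(masses)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical masses in daltons, invented for this example.
    database = {
        "protein_A": [1000.50, 1500.75, 2000.90],
        "protein_B": [900.40, 1500.75, 2500.10],
    }
    observed = [1000.51, 1500.76, 2000.91, 3000.00]
    for name, score in rank_proteins(observed, database):
        print(name, round(score, 2))
```

The matched fraction plays the role of a crude confidence measure; a pipeline would replace it with a statistically grounded score.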

These systems can deliver a large number of protein identifications in a reasonable time. This raises the problem of how to present this kind of information to the end user, who typically wants some evidence for the results displayed by the computer. This aspect is very important because it is at this stage that all the information collected by means of mass spectrometry is turned into knowledge for the end user.

This thesis focuses on this aspect: its main purpose is to deliver the results of the GAPP proteomics pipeline developed at Cranfield University to users in an easy-to-use and accessible way. First, it required collaboration with experts in the field of proteomics to identify the most convenient way to show the information. The technical work then involved the development of extensive data mining algorithms to retrieve the information from the GAPP database and present it in a useful way.

This thesis is organized in the following main chapters:

• Chapter 2: Proteomics data analysis. Mass spectrometry is discussed as a technique for carrying out proteomics analysis of biological samples. The principal instruments used nowadays are presented together with their operating principles.

• Chapter 3: Pipelines and GAPP. The state of the art in bioinformatics proteomics pipelines is reviewed in this chapter. Particular attention is devoted to the system used as the data source for the work presented in this thesis, the Genome Annotating Proteomics Pipeline (GAPP).

• Chapter 4: Novel GAPP views. The novel work carried out during this thesis is presented. In particular, the three data views identified by experts as necessary to improve the usability of the GAPP system are described: the Experiment view, the Protein view and the Differential view. Respectively, they present information on the experiments performed in GAPP, present the proteins identified, and allow differential comparisons between experiments stored in the system. Their main aim is to provide proteomics information in an immediate and easy-to-understand way; for this reason several graphic elements have been used. The most challenging view was the Differential view, since it can involve a large number of proteins. The idea was therefore to make the view more specific by using Gene Ontology annotations to filter proteins by their features.

• Chapter 5: Conclusions. The conclusions of the work are presented and some possible future extensions are suggested.
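The Gene Ontology filtering mentioned for the Differential view in Chapter 4 can be illustrated with a minimal sketch. It assumes that each experiment is reduced to a set of identified protein identifiers and that annotations are available as a mapping from protein to GO term identifiers; the function name, the protein identifiers and the data layout are hypothetical and do not reflect the actual GAPP database schema.

```python
def differential_view(exp_a, exp_b, go_annotations, go_term=None):
    """Compare two experiments, optionally restricted to one GO term.

    exp_a, exp_b: iterables of protein identifiers found in each experiment.
    go_annotations: dict mapping protein identifier -> set of GO term ids.
    go_term: if given, only proteins annotated with this term are compared.
    """
    def keep(protein):
        return go_term is None or go_term in go_annotations.get(protein, set())

    a = {p for p in exp_a if keep(p)}
    b = {p for p in exp_b if keep(p)}
    return {
        "only_a": sorted(a - b),   # identified only in the first experiment
        "only_b": sorted(b - a),   # identified only in the second experiment
        "shared": sorted(a & b),   # identified in both experiments
    }


if __name__ == "__main__":
    # Hypothetical protein identifiers; the GO ids are real terms
    # (kinase activity, ATP binding, DNA binding).
    annotations = {
        "P1": {"GO:0016301", "GO:0005524"},
        "P2": {"GO:0003677"},
        "P3": {"GO:0016301"},
        "P4": {"GO:0005524"},
    }
    experiment_1 = ["P1", "P2", "P3"]
    experiment_2 = ["P1", "P4"]
    print(differential_view(experiment_1, experiment_2, annotations))
    print(differential_view(experiment_1, experiment_2, annotations,
                            go_term="GO:0016301"))
```

Filtering before comparing keeps the result small enough to display when the two experiments together contain many proteins.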
