• Non ci sono risultati.

Stability selection for metabolomics data

N/A
N/A
Protected

Academic year: 2021

Condividi "Stability selection for metabolomics data"

Copied!
2
0
0

Testo completo

(1)

Challenges for integrated analysis

of omics datasets

April 27, 2013

Leiden University Medical Centre

(2)

Stability Selection for Metabolomics Data

Ron Wehrens

Computational Biology Fondazione Edmund Mach Italy

Abstract

In the field of metabolomics, one aims to obtain a holistic view of all small molecules (metabolites), hopefully providing information on relevant biological processes in the system under study. Data are typically measured by

hyphenated mass-spectrometry based detection, leading to thousands of variables for every sample. In the typical patient-control context, this means that we are performing many tests to find relevant differences between the two classes, with the unavoidable risk of false positives, or, alternatively, very little power.

Stability selection (Meinshausen and Buehlmann, 2010), using the lasso as a primary variable selection method, provides a way to avoid many of the false positives in biomarker identification by repeatedly subsampling the data, and only considering those variables as putative biomarkers that consistently show up as important. In our own work, we have shown that also selecting the largest coefficients in non-sparse regression models such as PLS works well, when combined with the stability selection framework (Wehrens et al. 2011). We support these claims with the analysis of several experimental and

simulated data sets.

In particular, the BioMark package for R (Wehrens and Franceschi, 2012), implementing stability selection as well as Higher Criticism thresholding, contains an experimental spike-in data set from the area of metabolomics, which can aid in further algorithm testing and development. From these analyses, it follows that stability selection is a very general and robust framework for variable selection.

Riferimenti

Documenti correlati

As a result, the Genetic Algorithm of Zhang et al., which gives the best solution to the Flexible Job Shop Problem, is transformed and adapted to a real time manu-

Some university staff, librarians and research support officers were pushed to embrace Open Science courses for accomplishing funders requirements (Open Access to scientific

Ševčenko nella versione che accompagna l’editio princeps dell’opuscolo (Ševčenko 1960, 222); e sulla medesima linea si sono invariabilmente mossi i traduttori

Amiata, Siena and in quiescent and active volcanic areas: Distribution in air and comparison with other atmospheric pollutants CAMPAGNOLA S.: Large-scale plinian eruptions of the

All’interno di uno strumento per valutare l’inclusione in corso di elaborazione, costituito da 3 dimensioni (culture, politiche, pratiche inclusive) il Repertorio degli Impegni