Meta-statistics for biomarker selection in the omics sciences


(1)

As Different as Chalk and Cheese:

Meta-statistics for Biomarker Detection

Ron Wehrens, Pietro Franceschi

Biostatistics and Data Management

http://cri.fmach.eu/BDM

StatSeq Workshop, Verona, April 18–19, 2012

(3)

Biomarker identification (2-class situation)

[Figure: samples from two groups, G1 and G2]

(5)

Common approaches

Context        Tools                     Challenges
Univariate     statistical tests         multiple testing, correlations
Multivariate   PCLDA, PLSDA, VIP, ...    model validation
               variable selection        optimization criterion, local optima
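
The multiple-testing challenge listed for univariate tests can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it assumes that no variable differs between the groups, so every p-value is uniform on [0, 1], and it uses the Bonferroni correction purely as a familiar example of a stricter threshold. The function name, the number of tests and the seed are hypothetical.

```python
import random


def count_false_positives(n_tests=1000, alpha=0.05, seed=42):
    """Count 'significant' variables when no real difference exists.

    Under the null hypothesis p-values are uniform on [0, 1], so they are
    drawn directly instead of simulating data and running a test.
    """
    rng = random.Random(seed)
    pvalues = [rng.random() for _ in range(n_tests)]
    # Uncorrected threshold: roughly alpha * n_tests false positives expected.
    naive = sum(p < alpha for p in pvalues)
    # Bonferroni threshold (illustrative assumption): alpha / n_tests.
    bonferroni = sum(p < alpha / n_tests for p in pvalues)
    return naive, bonferroni


naive, bonferroni = count_false_positives()
print("uncorrected:", naive, "Bonferroni:", bonferroni)
```

With 1000 null variables, the uncorrected threshold is expected to flag about 50 of them, which is the problem a fixed α = 0.05 runs into on omics-sized data.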

(6)

Spike-in apple data

10 control apples

10 treated apples, spiked with 9 chemical compounds

3 sets of concentrations

ESI+ and ESI−

In each: 22 “true” biomarkers

Total: six comparisons between treated and control

(7)

One apple sample – the result

(8)
(9)

Higher criticism

“God loves 0.05 just as much as 0.06” – selecting optimal thresholds

Second level of significance testing (Tukey, 1976)

“Real differences are rare and weak”

Here: extension to multivariate methods

Compare to “standard” thresholds: α = 0.05, VIP = 1

Donoho and Jin, PNAS (2008)

Wehrens and Franceschi, submitted

(10)

Higher Criticism: theory

$$\mathrm{HC} = \max_i \frac{\sqrt{N}\,\left(i/N - p_i\right)}{\sqrt{(i/N)\,(1 - i/N)}}$$

[Figure: HC thresholding, alpha = 0.1; HC_i plotted against ordered scores]
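
The HC statistic can be computed directly from a list of p-values. The sketch below is an illustration under stated assumptions, not the BioMark implementation: it takes p_i to be the i-th smallest of the N p-values, restricts the search to the smallest fraction `alpha` of them (as the "alpha = 0.1" in the figure title suggests), and selects every variable up to the rank where HC_i peaks. The function name and the toy p-values are hypothetical.

```python
import math


def higher_criticism(pvalues, alpha=0.1):
    """Return (max HC score, sorted indices of the selected variables)."""
    n = len(pvalues)
    # Variable indices ordered by increasing p-value.
    order = sorted(range(n), key=lambda j: pvalues[j])
    # Only the smallest fraction alpha of p-values is searched (assumption).
    n_candidates = max(1, round(alpha * n))
    best_hc, best_rank = -math.inf, 0
    for rank in range(1, n_candidates + 1):
        frac = rank / n
        if frac >= 1.0:  # denominator would be zero
            break
        p = pvalues[order[rank - 1]]
        hc = math.sqrt(n) * (frac - p) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_hc, best_rank = hc, rank
    return best_hc, sorted(order[:best_rank])


# Toy data: 5 very small p-values followed by 95 evenly spread ones.
toy_pvalues = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6] + [(k + 1) / 96 for k in range(95)]
hc_max, selected = higher_criticism(toy_pvalues, alpha=0.1)
print(round(hc_max, 3), selected)
```

In this toy case HC_i rises over the five small p-values and falls afterwards, so the threshold lands exactly after the fifth variable instead of at a fixed α.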

(11)

Results for apple data: selection efficiency

[Figure: selection efficiency (in %) for negative and positive ionization, Groups 1 to 3. Methods compared: PCLDA, PLSDA and VIP (each with 2, 4 and 10 LVs) and the t test.]

(12)

Stability-based biomarker selection

Central idea: good biomarkers are stable

Perturb data by subsampling (samples as well as variables)

Fit many models on many perturbed data sets

Which variables come up consistently at the top?

Parameters: defining subsampling, consistency

Wehrens et al., Anal. Chim. Acta (2011)
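
The stability idea can be sketched in a few lines. The code below is a simplified illustration, not the published SBBS procedure: it subsamples samples only (not variables), ranks variables by the absolute Welch t statistic, and keeps those that land in the top `n_top` in at least a fraction `min_freq` of the perturbed data sets. All function names, parameter names and default values are hypothetical.

```python
import random
import statistics


def t_statistic(a, b):
    """Welch t statistic for two lists of measurements."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def stable_variables(control, treated, n_iter=200, n_leave_out=2,
                     n_top=2, min_freq=0.8, seed=1):
    """control, treated: lists of samples, each sample a list of variables."""
    rng = random.Random(seed)
    n_vars = len(control[0])
    counts = [0] * n_vars
    for _ in range(n_iter):
        # Perturb the data: leave out a few samples from each class.
        sub_c = rng.sample(control, len(control) - n_leave_out)
        sub_t = rng.sample(treated, len(treated) - n_leave_out)
        scores = [abs(t_statistic([s[j] for s in sub_c], [s[j] for s in sub_t]))
                  for j in range(n_vars)]
        top = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:n_top]
        for j in top:
            counts[j] += 1
    # Consistency criterion: selected in at least min_freq of the runs.
    return [j for j in range(n_vars) if counts[j] / n_iter >= min_freq]


# Toy data: 10 control and 10 treated samples, 6 variables, variables 0 and 1 shifted.
data_rng = random.Random(0)
control = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(10)]
treated = [[data_rng.gauss(8 if j < 2 else 0, 1) for j in range(6)] for _ in range(10)]
print(stable_variables(control, treated))
```

The two settings `n_leave_out` and `min_freq` correspond to the "defining subsampling" and "consistency" parameters named on the slide.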

(13)

SBBS results

Simulations based on real apple data

[Figure: TPR versus FPR for Scenario E, two panels. Panel 1: PCLDA (4 PCs) and t-test, each with CBBS and SBBS. Panel 2: PLSDA (4 LVs) and t-test, each with CBBS and SBBS.]
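
The TPR and FPR values on the axes follow from comparing a selection with the known markers. The sketch below uses the standard definitions; the function name, the total of 1000 variables and the example selection are hypothetical, and only the count of 22 true biomarkers comes from the spike-in description.

```python
def tpr_fpr(selected, true_markers, n_variables):
    """True and false positive rates of a biomarker selection."""
    selected, true_markers = set(selected), set(true_markers)
    true_pos = len(selected & true_markers)
    false_pos = len(selected - true_markers)
    n_negatives = n_variables - len(true_markers)
    tpr = true_pos / len(true_markers) if true_markers else 0.0
    fpr = false_pos / n_negatives if n_negatives else 0.0
    return tpr, fpr


# 22 true markers among a hypothetical 1000 variables;
# the selection finds 11 of them plus 100 non-markers.
true_markers = range(22)
selected = list(range(11)) + list(range(100, 200))
print(tpr_fpr(selected, true_markers, n_variables=1000))
```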

(14)

Biomarker selection – conclusions

Both SBBS and HC form viable alternatives

Much better than “standard” thresholds

HC: very few parameters

... but time-consuming

SBBS: faster

... but more settings

... and you need > 10 replicates

Both implemented in R package BioMark, available from http://cran.r-project.org

(15)

Acknowledgements

BDM: Pietro Franceschi, Matthias Scholz, Nir Shahaf, Yonghui Dong, Nikola Dordevic

FQAN: Fulvio Mattivi, Urska Vrhovsek, Domenico Masuero
