
Machine learning applied to vaccine research


(1)

Siena, April 2016


Alessandro Brozzi, PhD, Exploratory Data Analytics

Data Science & Clinical Systems

(2)

GSK in a nutshell

Ethical medicines

Main areas:

•  Respiratory system

•  Cardiovascular – metabolic

•  Immunology and infectious diseases

•  Central nervous system

•  Dermatology

•  Urology

•  HIV

•  Rare diseases

Vaccines

World leader in disease prevention, in childhood and adulthood.

Over 30 vaccines, including:

•  diphtheria, tetanus, pertussis, hepatitis A and B, polio, Haemophilus influenzae, mumps, meningitis, rotavirus, HPV and flu

Consumer Healthcare

Main areas:

•  dermatology (e.g. Physiogel)

•  oral hygiene (e.g. Iodosan, Aquafresh)

•  nutrition (e.g. Horlicks)

•  OTC (e.g. Zovirax, NiQuitin)

4 billion packs in 2014

860 million vaccine doses in 2014

18 billion packs in 2014

(3)

Our global presence

Sites: Hamilton, Marietta, Ste Foy, Rixensart, Wavre, Marburg, Rosia/Siena, Singapore, Shanghai Tian Yuan (JV), Nashik, Ankleshwar, Gödöllő, Saint-Amand-Les-Eaux, Dresden, Moscow, Rockville

Map legend: R&D hubs, manufacturing facilities

(4)

Research and Development

Siena R&D Center

•  75 clinical studies in 2014

•  Generated some of the most innovative vaccines, including MenB

•  €1.3 billion of R&D investment managed from Siena between 2006 and 2015

•  182 people in Research and 163 in Development

(5)

Introduction

(6)

What is in scope and out of scope of this presentation

In scope:

-  present the biological problem to be addressed by ML

-  present the results of a case study

Out of scope:

-  the mathematical and theoretical aspects behind ML

-  formal comparisons between ML models

(7)

Essential bibliography


(8)

What is a vaccine?

A vaccine convinces our immune system to treat a harmless substance as if it were an invading pathogen.

(9)

Who is an invading pathogen?

Microorganisms: bacteria and viruses

(10)

How can such a small organism cause harm?

Penetration and multiplication

Staphylococcus aureus

tonsillitis

(11)

How does our organism defend itself?

Immune system cells and antibodies

(12)

Four most common types of vaccines

•  attenuated microorganisms

•  killed microorganisms

•  fractions of microorganisms

•  subunit vaccines

Diagram scale: from pathogen to harmless

(13)

Yes, OK, but which subunit?

(14)

Car metaphor


mechanical pieces = proteins

> 2000 subunits

(15)

Experimental procedure to select subunit candidates

Time: 5-15 years

other assays

(16)

Standard experimental procedure

Diagram labels: candidate, experimental result, etc…

Main issues:

•  Time-consuming

•  High costs

•  Pathogen-specific

(17)

The advent of genomics

(18)

DNA sequence and protein information


(19)

In-silico pipeline

Protein sequence → bioinformatic programs → protein features

ATFLPRYNDIRQQFYHNFRGKW WCFCQNDMVQMEYRALIKSVAD YDMGLRSFKKTRGMHPMKQYYG LMEVMQQAYDAIECTSPSRDFG GFDICVRFAWEYKADAYMYAPK TEQIVLPTFN

Features: length, number of structures, localization, hydrophobicity, other features
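The slide does not name the bioinformatic programs used. As a hedged illustration of how a raw sequence becomes numerical features, the sketch below computes two of them for the example sequence: its length and a hydrophobicity score, here assumed to be the GRAVY index (mean Kyte-Doolittle hydropathy). The helper `protein_features` is hypothetical; localization and structure prediction need dedicated tools and are not sketched.

```python
# Kyte-Doolittle hydropathy value for each of the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def protein_features(sequence: str) -> dict:
    """Return length and GRAVY hydrophobicity for one protein (hypothetical helper)."""
    # Remove whitespace so sequences printed in blocks can be passed as-is.
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # GRAVY = average hydropathy over all residues; positive means hydrophobic overall.
    gravy = sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)
    return {"length": len(residues), "gravy": round(gravy, 3)}


# The example sequence from the slide, in its original 22-residue blocks.
example = ("ATFLPRYNDIRQQFYHNFRGKW WCFCQNDMVQMEYRALIKSVAD YDMGLRSFKKTRGMHPMKQYYG "
           "LMEVMQQAYDAIECTSPSRDFG GFDICVRFAWEYKADAYMYAPK TEQIVLPTFN")
print(protein_features(example))
```

Each protein processed this way contributes one row of numerical features to the data matrix.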

(20)

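Data matrix

                      protein 1   protein 2   protein 3   protein 4
length                100         150         30          20
# helices             3           0           1           3
localization          membrane    membrane    nuclear     nuclear
experimental outcome

The feature rows (length, # helices, localization) are the independent variables; the experimental outcome is the variable to predict.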
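The matrix mixes numerical features (length, # helices) with a categorical one (localization), and most ML algorithms need numbers only. A minimal sketch, assuming one-hot encoding is used for the categorical column (one common choice, not stated in the slides); `encode_rows` is a hypothetical helper and the values are the four toy proteins above.

```python
def encode_rows(rows, categorical_key="localization"):
    """Turn mixed numerical/categorical protein records into a numeric matrix.

    Numerical features are copied as floats; the categorical feature is
    one-hot encoded. Hypothetical helper, not taken from the presentation.
    """
    numeric_keys = [k for k in rows[0] if k != categorical_key]
    # A sorted category list fixes the column order for every protein.
    categories = sorted({row[categorical_key] for row in rows})
    header = numeric_keys + [f"{categorical_key}={c}" for c in categories]
    matrix = []
    for row in rows:
        numeric = [float(row[k]) for k in numeric_keys]
        one_hot = [1.0 if row[categorical_key] == c else 0.0 for c in categories]
        matrix.append(numeric + one_hot)
    return header, matrix


# The four toy proteins from the data-matrix slide, one dict per protein.
proteins = [
    {"length": 100, "helices": 3, "localization": "membrane"},
    {"length": 150, "helices": 0, "localization": "membrane"},
    {"length": 30, "helices": 1, "localization": "nuclear"},
    {"length": 20, "helices": 3, "localization": "nuclear"},
]
header, X = encode_rows(proteins)
print(header)
for row in X:
    print(row)
```

The sketch stores one row per protein, the transpose of the slide's layout, which is the usual convention for ML libraries.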

(21)

Machine learning

Breiman, 2001

X (independent variables) → nature → Y (outcome)

X → unknown → Y

Algorithms that estimate the unknown mapping f(x) from data: neural networks, SVM, random forests, naïve Bayes.
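One common way to formalise this picture, given here as a standard textbook framing rather than the speaker's own notation: with n training proteins whose feature vectors are x_i and whose experimental outcomes are y_i, a learning algorithm picks from its model family F (for example all neural networks of a given architecture) the function that minimises an average loss L on the training data, then applies it to new proteins. Not every listed method is fitted exactly this way (naïve Bayes, for instance, estimates class-conditional probabilities), so read this as a sketch.

```latex
\hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr),
\qquad \hat{y}_{\text{new}} = \hat{f}(x_{\text{new}})
```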

(22)


Case study

(23)

Study of in-silico vaccine candidates

University of Technology, Sydney

(24)

Dataset: general characteristics

•  organisms from 4 different species

•  923 proteins with known experimental results, used to train the models

•  140 proteins with known experimental results, used to test the models

•  7 protein features

(25)

Data matrix

923 proteins × 7 features

(26)

Results

(27)

Single rule

Only numerical features

Figure labels: feature, exp., protein
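The slides only say that the single rule uses numerical features. A minimal sketch of what such a rule could look like, assuming it thresholds one feature and predicts a positive outcome when the value is at or above the threshold, with the threshold chosen to maximise training accuracy. The direction of the inequality, the accuracy criterion, the helper names and the toy data are all assumptions.

```python
def fit_single_rule(values, labels):
    """Pick the threshold t on one numerical feature that maximises training accuracy.

    The rule predicts 1 (positive experimental outcome) when value >= t, else 0.
    Hypothetical helper: the slides do not describe how their rule was built.
    """
    best_t, best_acc = None, -1.0
    # Every observed value is a candidate threshold.
    for t in sorted(set(values)):
        preds = [1 if v >= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def apply_single_rule(value, t):
    """Apply a fitted threshold to a new protein's feature value."""
    return 1 if value >= t else 0


# Toy data (made up): one numerical feature per protein and its 0/1 outcome.
values = [1.2, 0.4, 2.5, 3.1, 0.9, 2.8]
labels = [0, 0, 1, 1, 0, 1]
t, acc = fit_single_rule(values, labels)
print(t, acc)
```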

(28)

Double rule

(29)

Machine learning algorithms
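Of the algorithms compared, k-nearest neighbor is the easiest to write from scratch, so it is sketched here to make "learning f(x) from data" concrete. This is a generic textbook version, not the implementation used in the case study: Euclidean distance and a majority vote among the k closest training proteins. It assumes the features were already put on comparable scales (e.g. standardized), since otherwise a large-valued feature such as length would dominate the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours.

    Uses Euclidean distance; features are assumed to be on comparable scales.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training label with its distance to the query, closest first.
    neighbours = sorted(
        zip((math.dist(x, query) for x in train_X), train_y),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy, already-scaled feature vectors (made up) with 0/1 experimental outcomes.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [0.5, 0.5]))
print(knn_predict(train_X, train_y, [5.5, 5.5]))
```

An odd k avoids tied votes when there are only two classes.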

(30)

Results

method               sensitivity   specificity
single rule          0.96          0.73
double rule          0.43          0.97

method               sensitivity   specificity
neural networks      0.97          0.97
naïve Bayes          1             0.98
k-nearest neighbor   0.92          0.97
random forest        1             0.99
adaptive boosting    1             0.98
decision tree        1             0.97
SVM                  0.93          0.98
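Sensitivity and specificity follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The slides do not define the classes; here "positive" is assumed to mean a protein with a positive experimental outcome. Read that way, the double rule's sensitivity of 0.43 means it flags only 43% of the positive proteins, while its specificity of 0.97 means it rarely flags a negative one. A minimal sketch of the computation on made-up predictions:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels, with 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: share of true positives recovered; specificity: share of true negatives recovered.
    return tp / (tp + fn), tn / (tn + fp)


# Made-up test-set outcomes and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```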

(31)

General overview


(32)

Conclusions

•  Strong need to use information gathered in the past to guide future experiments

•  Need for a general procedure applicable to every pathogen

•  ML performs better than the basic single-rule and double-rule analyses

•  A pool of algorithms might increase efficiency

Issues:

•  Often dealing with very rectangular matrices (p >> n: many more features than samples)

•  Heterogeneous input data: categorical and numerical

•  Noise effects and feature selection
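The talk does not say how a pool of algorithms would be combined. One simple option, assumed here for illustration, is a majority vote over the 0/1 predictions of several classifiers. The three lambdas below are hypothetical toy rules on [length, helices] standing in for trained models such as those in the results table.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Combine several 0/1 classifiers by simple majority vote.

    Use an odd number of classifiers to avoid ties in the binary case.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy stand-ins for trained models (hypothetical rules on [length, helices]).
pool = [
    lambda x: int(x[0] >= 100),
    lambda x: int(x[1] >= 2),
    lambda x: int(x[0] + 10 * x[1] >= 120),
]
print(majority_vote(pool, [100, 3]))
print(majority_vote(pool, [30, 1]))
```

Weighted voting or stacking are common alternatives when some members of the pool are known to be more reliable than others.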


(33)

In the future

•  Make the programs that infer protein features more precise

•  Unify all available experimental data in a single repository
