• Non ci sono risultati.

Exploiting Randomness in Neural Networks

N/A
N/A
Protected

Academic year: 2022

Condividi "Exploiting Randomness in Neural Networks"

Copied!
25
0
0

Testo completo

(1)

Exploiting Randomness in Neural Networks

Daniele Di Sarli

Mauriana Pesaresi seminars - 2020

(2)

Recurrent Neural

Network

(3)

Backpropagation Through Time

error = (output – expected)2

∂error

∂W

(4)

…, 3, 2, 1.5, 0.75, 1, -2.3, 4, … PATTERNS, INTERACTIONS

PREDICTION

(5)

Reservoir Readout

(6)

Reservoir Readout

(7)

Reservoir Readout

Echo State Network

(8)

(a) (b)

(c) (d)

(9)

Cover’s theorem

(10)

Echo State Property

(11)

Echo State Network starter pack

1. Randomly initialize the weights (sparse)

2. Rescale the weights to guarantee contractivity of the state transition function (=> ESP)

3. Feed data, collect states

4. Compute optimal linear regression parameters

(12)

«RC […] provides explanations of why biological brains can carry out accurate computations with an “inaccurate” and noisy physical substrate»

In the primary visual cortex, «computations are performed by complex dynamical systems while information about results of these computations is read out by simple linear classifiers.»

— Nikolić et al.

— Lukoševičius et al.

(13)

My work

(14)

From Strubell, E., Ganesh, A., McCallum, A.:

Energy and Policy Considerations for Deep Learning in NLP

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

0 100000 200000 300000 400000 500000 600000 700000 Transformer w/ neural arch. search

Car, avg incl. fuel, 1 lifetime

CO2 emissions (lbs)

LSTM GRU Transformer BERT

Natural Language Processing

(15)

‘My input sentence’ RNN

-0.76, 0.35, -0.02

sentence embedding linear classifier input sequence

(word embeddings)

Text Classification pipeline

TRAINING

(16)

‘My input sentence’ ESN

-0.76, 0.35, -0.02

sentence embedding linear classifier input sequence

(word embeddings)

Text Classification pipeline

TRAINING

(17)

Question Classification

What was the name of the first Russian astronaut to do a spacewalk?

HUMAN

What's the tallest building in New York City?

LOCATION

… also ABBREVIATION, ENTITY, DESCRIPTION, and NUMERIC VALUE

(18)

Improvements are needed

Bidirectional

• Attention

• Multi-ring

What's the tallest building in New York City?

(19)

Improvements are needed

• Bidirectional

Attention

• Multi-ring

What's the tallest building in New York City?

(20)

Improvements are needed

• Bidirectional

• Attention

Multi-ring

(21)

Results

88 90 92 94 96 98 100

Ada-CNN

Bi-LSTM

Paragraph Vector

Transformer + CNN

Bi-GRU

Bi-ESN

Bi-ESN (ensemble)

Bi-ESN-Att

Accuracy

ours

< 1.6M params 200M+ params,

heavy transfer learning

(22)

Results

0 100 200 300 400 500 600

Bi-GRU Bi-ESN Bi-ESN

(ensemble) Bi-ESN-Att

Training time

6 sec 7.5 min

88 90 92 94 96 98 100

Ada-CNN

Bi-LSTM

Paragraph Vector

Transformer + CNN

Bi-GRU

Bi-ESN

Bi-ESN (ensemble)

Bi-ESN-Att

Accuracy

ours

< 1.6M params 200M+ params,

heavy transfer learning

(23)

 How  old  was  the  youngest  president  of  the  United  States  ?   When  was  Ulysses  S.  Grant  born  ? 

 Who  invented  the  instant  Polaroid  camera  ?   What  is  nepotism  ? 

 Where  is  the  Mason/Dixon  line  ?   What  is  the  capital  of  Zimbabwe  ?   What  are  Canada  's  two  territories  ? 

 What  was  the  name  of  the  plane  Lindbergh  flew  solo  across  the  Atlantic  ?   What  is  genocide  ? 

 What  continent  is  Argentina  on  ? 

 What  monastery  was  raided  by  Vikings  in  the  late  eighth  century  ?   What  is  an  earthquake  ? 

 Where  is  the  tallest  roller  coaster  located  ?   What  are  enzymes  ? 

 Who  discovered  oxygen  ? 

 What  is  bangers  and  mash  ? 

 What  is  the  name  given  to  the  Tiger  at  Louisiana  State  University  ?   Where  are  the  British  crown  jewels  kept  ? 

 Who  was  the  first  person  to  reach  the  North  Pole  ?   What  is  an  ulcer  ? 

 What  is  vertigo  ? 

 What  is  the  spirometer  test  ? 

 When  is  the  official  first  day  of  summer  ?   What  does  the  abbreviation  SOS  mean  ?   What  is  the  smallest  bird  in  Britain  ? 

 Who  invented  Trivial  Pursuit  ? 

 What  gasses  are  in  the  troposphere  ? 

 Which  country  has  the  most  water  pollution  ?   What  is  the  scientific  name  for  elephant  ? 

 Who  is  the  actress  known  for  her  role  in  the  movie  ``  Gypsy  ''  ?   What  breed  of  hunting  dog  did  the  Beverly  Hillbillies  own  ? 

 What  is  the  rainiest  place  on  Earth  ? 

 Who  was  the  first  African  American  to  win  the  Nobel  Prize  in  literature  ?   When  is  St.  Patrick  's  Day  ? 

 What  was  FDR  's  dog  's  name  ? 

 What  colors  need  to  be  mixed  to  get  the  color  pink  ?   What  is  the  most  popular  sport  in  Japan  ? 

 What  is  the  active  ingredient  in  baking  soda  ?   When  was  Thomas  Jefferson  born  ? 

 How  cold  should  a  refrigerator  be  ?   When  was  the  telephone  invented  ?   What  is  the  most  common  eye  color  ? 

 Where  was  the  first  golf  course  in  the  United  States  ?   What  is  schizophrenia  ? 

 What  is  angiotensin  ? 

file:///Users/daniele/mnt/tesla2/Research/scripts/qc_attention/...

1 di 2 11/06/19, 16:59

(24)

Wrap up

• A path towards efficient, effective ML models must be taken

• Heavier understanding/exploitation of the architectural properties of RNN models can help towards that goal

• Analysis is preliminary, but WIP results are encouraging

(25)

References

1. Di Sarli, D., Gallicchio, C., & Micheli, A. (2019, November). Question Classification with Untrained Recurrent Embeddings. In International Conference of the Italian Association for Artificial Intelligence.

2. Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science.

3. Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review.

4. Nikolić, D., Haeusler, S., Singer, W., & Maass, W. (2007). Temporal

dynamics of information content carried by neurons in the primary

visual cortex. In Advances in neural information processing systems.

Riferimenti

Documenti correlati

Nanofilled composite resins combined with the use of an etch-and-rinse technique and three-step adhesives posi- tively influence the longevity of direct composite restor- rr

One study investigated the anti-adhesive efficacy of polyethylene glycol amine plus dextran aldehyde polymers and reported a statistically significant decrease in the total score

emergencia. Lo vivido, al contrario, demuestra graves carencias endémicas y que lo que ha hecho y sigue haciendo falta en Italia es un sistema fundado sobre

A total of 598 (341 TGCA, 30 OSU, and 227 ICGC) patients were included in the final multivariable Cox model and the mutation status of genes in the TP53 pathways, the LIHC

neuroni del RMTg identificati in base alla loro risposta eccitatoria, sia alla stimolazione della LHb che al pinzettamento della zampa (Lecca et al., 2011); ii)

For what concerns the Tikhonov algorithm, the main difficulty is the com- putation of the regularization parameter: if the gcv function computes a good regularization parameter,

Consequently, the research activities will be more and more focused on the following topics: smart grids and polygeneration microgrids, renewable power plants and storage