Introduction to Deep Learning and Tensorflow
Day 2
CINECA Roma - SCAI Department
Marco Rorro, Stefano Tagliaventi, Luca Ferraro, Francesco Salvadore, Sergio Orlandini, Isabella Baccarelli
Rome, 7th-9th Nov 2018
Outline
1 Deep Learning
   Deep Learning: a review
   Representation Learning methods
   DL Applications
   Layers and Features
   Convolutional Networks
   Binary Classification Example
2 Deep topics
   Regularization: Norm Penalties, Early stopping, Dropout, Batch Normalization
   RNN
Deep learning
• Representation learning: extract high-level abstract features from raw data (e.g. the pixels composing an image).
• Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations.
• Deep learning allows the computer to build complex concepts out of simpler concepts.
• Deep learning tries to address the curse of dimensionality:
  • in very high-dimensional spaces, traditional machine learning algorithms fail to generalize;
  • they fail to learn complicated functions;
  • they incur high computational costs.
Deep Learning
A recently published review¹ helps summarize the main aspects of deep learning:
1 Models are composed of multiple processing layers:
  • multiple layers of abstraction to learn data representations.
2 Improved state of the art in:
  • speech recognition, object recognition, object detection;
  • drug discovery, genomics.
3 Discovers complex patterns in large datasets:
  • backpropagation to change layer parameters;
  • the representation in each layer is based on the previous layer's results.
4 Specialized networks for different kinds of data:
  • deep convolutional networks: images, video, speech;
  • recurrent networks: sequential data (text, speech).
¹ Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep Learning, Nature, 2015
Limits of conventional ML techniques
LeCun, Bengio, and Hinton stress that:
• conventional machine-learning techniques were limited in their ability to process natural data in their raw form;
• feature extraction is a necessary step for transforming raw data into an internal representation;
• considerable domain expertise is needed to pick a representation suitable to the task.
On the other side, they consider deep learning methods as representation-learning methods.
"Representation learning is a set of methods that allows a machine to be
fed with raw data and to automatically discover the representation needed
for detection or classification"
Neural networks as representation learning methods
Data flow:
• input: raw data;
• output: detection/classification probability distributions;
• in between: each layer is fed with the data representation learned by the previous layer.
Key aspects:
• no a-priori design of features;
• features are learned from data using a general-purpose procedure.
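A minimal sketch of this data flow in tf.keras (the layer sizes, the 784-dimensional input and the 10-class output are illustrative assumptions, not taken from the slides): each Dense layer consumes the representation produced by the previous one, and the final softmax outputs class probabilities.
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),  # raw pixels in
    tf.keras.layers.Dense(64, activation='relu'),                       # higher-level representation
    tf.keras.layers.Dense(10, activation='softmax'),                    # class probabilities out
])
model.summary()
```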
Image Example
[Figure: learned feature hierarchy for images: lower layers detect edges, intermediate layers detect motifs, deeper layers detect familiar objects]
Deep learning main results (I)
• Good at discovering intricate structures in high-dimensional data.
• Exhibits superior performance (compared to other ML techniques) in:
  • image and speech recognition;
  • prediction of the activity of potential drug molecules;
  • analyzing particle accelerator data;
  • reconstructing brain circuits;
  • predicting the effects of mutations in non-coding DNA on gene expression and disease.
• Shows promising results in natural language processing (NLP):
• topic classification, sentiment analysis, question answering and language translation.
Deep learning main results (II)
Accuracy on Speech Recognition
Source: Huang, Baker, Reddy, A Historical Perspective of Speech Recognition, Communications of the ACM, January 2014
GMM: Gaussian Mixture Models, HMM: Hidden Markov Models, DNN: Deep Neural Networks
How deep is deep learning?
Number of layers in ILSVRC (ImageNet Large Scale Visual Recognition Competition) winners.
How does deep learning work?
In the following, we will see:
• the effect of adding a fully connected layer to an existing classifier;
• the effect of describing our data in a “wider” hyperspace.
Idea from a blog post: Olah, Neural Networks, Manifolds, and Topology:
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
2d example (I)
Define a simple network:
[Diagram: inputs #1 and #2 plus a constant bias unit connected directly to a single output, with weights w1, w2 and bias weight w3]

$o_i = \langle\, [x_i \ y_i],\ [w_1 \ w_2] \,\rangle + w_3$

Labeled observations: $\forall i \ (x_i, y_i) \rightarrow l_i$

Optimize: $w = \arg\min_w \sum_i (l_i - o_i)^2$
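A minimal NumPy sketch of this linear model (the dataset, learning rate and number of iterations are made-up illustrative choices): it fits w1, w2 and w3 by plain gradient descent on the squared error above.
```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 2)                            # 200 observations (x_i, y_i)
labels = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels l_i

w = np.zeros(2)   # w1, w2
w3 = 0.0          # bias weight
lr = 0.01
for _ in range(500):
    o = X @ w + w3                               # model output o_i
    err = o - labels
    w  -= lr * 2 * X.T @ err / len(X)            # gradient of the mean squared error
    w3 -= lr * 2 * err.mean()
print("learned weights:", w, w3)
```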
2d example (II)
Add a hidden layer:
[Diagram: inputs #1 and #2 plus bias units, a two-unit hidden layer with weights w1,1, w2,1, w1,2, w2,2 and biases b1, b2, connected to a single output with weights w1, w2 and bias weight w3]

$o_i = \Big\langle\, f\Big( [x_i \ y_i] \begin{pmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{pmatrix} + [b_1 \ b_2] \Big)^{T},\ [w_1 \ w_2] \,\Big\rangle + w_3$
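A NumPy sketch of the same model with the two-unit hidden layer, assuming tanh as the nonlinearity f (the weights are random here, only to show the forward pass; training would proceed by gradient descent and backpropagation as before).
```python
import numpy as np

rng = np.random.RandomState(1)
W_h = rng.randn(2, 2)      # hidden-layer weights w_{i,j}
b_h = rng.randn(2)         # hidden-layer biases b_1, b_2
w   = rng.randn(2)         # output weights w_1, w_2
w3  = rng.randn()          # output bias w_3

def forward(X):
    h = np.tanh(X @ W_h + b_h)   # hidden representation f(x W + b)
    return h @ w + w3            # o_i = <h_i, [w_1 w_2]> + w_3

X = rng.randn(5, 2)
print(forward(X))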
Hidden layer: evaluated features
[Diagram: the two-layer network above, with hidden-layer weights w1,1, w2,1, w1,2, w2,2, biases b1, b2 and output weights w1, w2, w3; the outputs of the hidden units are the evaluated features fed to the output layer]
Increase the dimensionality
• It is impossible for a neural network to classify this dataset without having a layer with 3 or more hidden units, regardless of depth.
• Even so, such a network can still reach about 80% classification accuracy.
Neural Network Internals: classification problem
• Given a single input, a trained neural network is able to predict a probability distribution:

  $\hat{y} = f(x;\, W)$, with $x$ the input, $W$ the NN parameters and $\hat{y}$ the prediction.

• NN parameters are chosen in order to minimize the average error on a given training set $\{x^{(i)}, y^{(i)}\}_{i=1,\dots,N}$:

  $J(W) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(x^{(i)};\, W),\, y^{(i)}\big)$

• A Stochastic Gradient Descent (SGD) algorithm step is:

  $W^{(k+1)} = W^{(k)} - \epsilon_k\, \hat{g}^{(k)}$, where $\hat{g}^{(k)} = \frac{1}{n} \nabla_W \sum_{l \in \mathrm{batch}} L\big(f(x^{(l)};\, W^{(k)}),\, y^{(l)}\big)$ and $n = \#\mathrm{batch}$.
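A minimal NumPy sketch of the SGD update above applied to a logistic-regression classifier; the dataset, batch size n and the constant learning rate are illustrative assumptions, not taken from the slides.
```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)
y = (X[:, 0] - X[:, 1] > 0).astype(float)

W = np.zeros(2)
lr, n = 0.1, 32                                      # step size eps_k (kept constant) and batch size
for step in range(200):
    idx = rng.choice(len(X), n, replace=False)       # sample a mini-batch
    p = 1.0 / (1.0 + np.exp(-X[idx] @ W))            # f(x; W): predicted probability
    g_hat = X[idx].T @ (p - y[idx]) / n              # gradient of the cross-entropy loss on the batch
    W = W - lr * g_hat                               # SGD step
print("learned W:", W)
```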
Backpropagation
Image from Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep Learning, Nature 2015.
See also:
https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/
Convolutional layer (I)
Convolutional layer (II)
• Convolution layer output: $(H - K + 1) \times (W - K + 1)$ with stride = 1 and without padding
• For each filter we have K × K × 3 weights
• Each filter convolves over all spatial locations, producing a scalar for each location
• An activation function is finally applied
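A small helper (an assumption of ours, not from the slides) that computes the spatial output size for a generic stride and padding; with stride 1 and no padding it reduces to (H − K + 1) × (W − K + 1) as stated above.
```python
def conv_output_size(h, w, k, stride=1, pad=0):
    # standard formula: floor((size + 2*pad - k) / stride) + 1
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    return out_h, out_w

print(conv_output_size(32, 32, 5))   # -> (28, 28)
```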
Pooling layer
• Makes the representations smaller and more manageable
• Operates over each activation map independently
• Example: max pooling computes the maximum over each K × K window
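A NumPy sketch of max pooling over a single activation map, assuming the spatial size is divisible by the window size K.
```python
import numpy as np

def max_pool(x, k=2):
    h, w = x.shape
    # group the map into k x k blocks and take the maximum of each block
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool(a))   # 4x4 map -> 2x2 map
```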
Dropout layer
• In each forward pass, randomly set some neurons to zero.
• Probability of dropping is a hyperparameter; 0.5 is common
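A sketch of "inverted" dropout on an activation vector, assuming the common convention of rescaling the surviving units by 1/(1 − p) at training time so that nothing needs to change at test time.
```python
import numpy as np

def dropout(a, p=0.5, training=True):
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p)   # zero each unit with probability p
    return a * mask / (1.0 - p)              # rescale the survivors

h = np.ones(10)
print(dropout(h, p=0.5))
```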
Convolutional network
• LeNet²: the first successful convolutional neural network
• Designed to identify hand-written digits in the MNIST dataset
• LeNet-5 takes a single-channel 2D input
• Performs 6 convolutions (5 × 5), then subsamples by max-pooling (2 × 2)
• The convolution-pooling layer sequence occurs again
• Finally, 2 fully connected layers followed by a fully connected softmax layer are applied
² Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 1998
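A LeNet-5-like model sketched with tf.keras following the description above; the tanh activations and the 16/120/84 layer sizes are common choices for this architecture, taken here as assumptions rather than from the slide.
```python
import tensorflow as tf

lenet = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),  # 6 filters, 5x5
    tf.keras.layers.MaxPooling2D(2),                                            # 2x2 subsampling
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),                           # second conv stage
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),                            # 10 digit classes
])
lenet.summary()
```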
CNN comparison
[Figures from Canziani et al.: Figure 1, Top-1 accuracy vs. network (single-crop top-1 validation accuracies for top-scoring single-model architectures); Figure 2, Top-1 accuracy vs. operations and model size (top-1 one-crop accuracy versus the number of operations for a single forward pass, with blob size proportional to the number of network parameters).]
Figure from: Canziani, Alfredo; Paszke, Adam; Culurciello, Eugenio; An Analysis of Deep Neural Network Models for Practical Applications, 2017
• Inception-v4: ResNet + Inception
• VGG: high memory footprint and many operations
• GoogLeNet: very efficient
• AlexNet: few operations but high memory usage and low accuracy
• ResNet: moderate efficiency and high accuracy
Links and credits
• Fei-Fei Li, Justin Johnson, Serena Yeung - CS231n
• https://developers.google.com/machine-learning/crash-course/
• https://eu.udacity.com/course/deep-learning--ud730
• https://www.kaggle.com/competitions
Binary Classification Example
The goal of the example is to use a feedforward network to solve a binary classification problem.
• Classification is the process in which ideas and objects are recognized and grouped
• If only 2 groups are involved, the classification is a Binary Classification
• The imaginary lines that separate the groups are called Decision Boundaries
• The aim of classification is to learn the decision function from a set of labeled samples.
This set of samples is called Training Data.
Binary Classification Example
Linearly separable
• Data sets are linearly separable if there exists at least one line that puts all the points of one type on one side and all the points of the other type on the other side.
A linear boundary (e.g. a straight line) is enough to separate the data into 2 groups.
Non-linearly separable
• Data sets that are not linearly separable.
The decision boundary is not a linear boundary but, for example, a polygonal or circular line.
Binary Classification Example
• Input layer: supplies the input data or features to the network
• Output layer: gives out the predictions
• Hidden layers: apply a series of functions to the input
NB: with multiple hidden layers we can compute complex functions by cascading simpler functions.
Why use hidden layers in Binary Classification?
If we try to solve the binary classification problem without hidden layers, we are using a single neuron, which is able to learn only a linear decision boundary.
In the case of linearly separable data a single neuron is therefore able to learn. In the case of non-linearly separable data we can instead resort to feature transformations.
Exercise reminder:
• A single neuron can only learn a linear decision boundary
• For non-linear datasets, feature transformations (such as products of features or their squares) have to be involved (see the sketch below).
This approach is not general and can be tricky for data that are not easy to visualize.
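A minimal sketch of the feature-transformation trick recalled above: augment the two raw features with their product and squares so that a single linear neuron can separate a dataset that is not linearly separable in the original space (the data here are just random placeholders).
```python
import numpy as np

def transform(X):
    # hand-designed features: x1, x2, x1*x2, x1^2, x2^2
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2], axis=1)

X = np.random.randn(100, 2)
print(transform(X).shape)   # (100, 5): five engineered features per sample
```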
Why use hidden layers in Binary Classification?
A neural network with a single hidden layer with nonlinear activation functions is a "Universal Function Approximator" (i.e. capable of learning any function).
The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$ (under mild assumptions on the activation function).
A feedforward network is a powerful deep learning model: as a universal function approximator it is able to model any complex function.
Why use hidden layers in Binary Classification?
By performing non-linear transformations layer by layer, the input space is projected into a new vector space: a higher-dimensional space in which the data become linearly separable.
This is equivalent to learning a complex decision boundary in the original input space (a minimal sketch follows).
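A tf.keras sketch of such a classifier with one hidden layer; the concentric-ring dataset (generated here with NumPy), the layer sizes and the training settings are illustrative assumptions, not taken from the slides.
```python
import numpy as np
import tensorflow as tf

rng = np.random.RandomState(0)
n = 500
r = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])  # radii of two rings
t = rng.uniform(0.0, 2 * np.pi, 2 * n)                                    # angles
X = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)                      # not linearly separable in 2D
y = np.concatenate([np.zeros(n), np.ones(n)])                             # inner vs outer ring

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),  # non-linear projection
    tf.keras.layers.Dense(1, activation='sigmoid'),                 # linear boundary in the new space
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=50, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```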
Regularization
• Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.
• Many strategies: restrictions on the parameters, on the training algorithm, extra terms in the objective function, and more.
• In practical complex scenarios, such as image or waveform analysis, the search space is huge.
• Very often the optimal choice is not to find the smallest model space that matches the data-generating process.
• Optimal solution in practice: a bigger space combined with regularization (a sketch follows below).
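A tf.keras sketch (ours, not from the slides) combining three regularization strategies from this section: an L2 norm penalty on the weights, a dropout layer, and early stopping on a validation split; x_train and y_train are hypothetical placeholders.
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,),
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # norm penalty
    tf.keras.layers.Dropout(0.5),                                              # dropout
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                              restore_best_weights=True)       # early stopping
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```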