Here you will learn about:
Perceptron
Neural networks
Training neural networks
Optimization
Backpropagation
Motivation: hand-engineered features are time- and energy-consuming and do not scale in practice. Can we learn the underlying features directly from data?
Why now? Big data & hardware (GPUs) & software (PyTorch)
Idea: a perceptron takes input data, forms a weighted sum, and outputs a prediction.
Why activation function?
To introduce non-linearity and learn more complex dependencies.
Recall: a linear combination of linear combinations is still just a linear combination :(
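A minimal sketch of a single perceptron in NumPy (all weights, inputs, and the choice of sigmoid activation are illustrative): a weighted sum of the inputs plus a bias, squashed through a non-linearity:

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation: squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Weighted sum of inputs plus bias, passed through the activation.
    return sigmoid(np.dot(w, x) + b)

# Illustrative values: 3 inputs, arbitrary weights and bias.
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.3, -0.1, 0.8])
b = 0.5
print(perceptron(x, w, b))
```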
Like a single perceptron, but extended to two outputs:
Next, we can add more intermediate neurons
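A hedged sketch of this in PyTorch (layer sizes are arbitrary choices for the example): a hidden layer of intermediate neurons, a ReLU non-linearity between the two linear layers, and two outputs:

```python
import torch
import torch.nn as nn

# Minimal sketch: 3 inputs -> 4 hidden neurons -> 2 outputs.
model = nn.Sequential(
    nn.Linear(3, 4),   # first linear combination
    nn.ReLU(),         # non-linearity (see "Why activation function?" above)
    nn.Linear(4, 2),   # second linear combination, two outputs
)

x = torch.randn(1, 3)   # one example with 3 features
print(model(x))         # raw scores for the 2 outputs
```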
The goal of training: find the optimal weights
Define the optimization problem: find the argmin of the loss function
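In symbols, one standard formulation (notation assumed here: W the weights, f the network, L the per-example loss over n training pairs):

```latex
W^{*} = \arg\min_{W} \; \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(f(x^{(i)}; W),\, y^{(i)}\big)
```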
Gradient descent
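A toy illustration of the update rule (the quadratic loss and learning rate below are assumed just for this example):

```python
# Gradient descent on a 1-D toy loss L(w) = (w - 3)^2.
def grad(w):
    return 2 * (w - 3)   # dL/dw, derived by hand for this toy loss

w, lr = 0.0, 0.1         # lr is the learning rate: too large diverges, too small crawls
for step in range(50):
    w -= lr * grad(w)    # step downhill along the negative gradient
print(w)                 # close to the minimizer w = 3
```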
Computing the gradient: chain rule and backpropagation
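One way to see the chain rule at work is PyTorch's autograd, which performs backpropagation automatically (the one-weight model below is purely illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
y = torch.tensor(1.0)

pred = torch.sigmoid(w * x)   # forward pass
loss = (pred - y) ** 2        # squared-error loss
loss.backward()               # backward pass: chain rule from loss back to w
print(w.grad)                 # dloss/dw = 2*(pred - y) * sigmoid'(w*x) * x
```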
Loss landscape and learning rate
Stochastic gradient descent, a.k.a. mini-batches
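A minimal mini-batch SGD loop in PyTorch (the random data, batch size, and learning rate are all placeholder choices): each step estimates the gradient from one small batch instead of the full dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(100, 3), torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)

model = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for xb, yb in loader:                     # one mini-batch per iteration
    loss = ((model(xb) - yb) ** 2).mean() # mean squared error on the batch
    opt.zero_grad()
    loss.backward()                       # gradient from this mini-batch only
    opt.step()
```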
Overfitting
Dropout and early stopping
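Both regularizers in one hedged sketch (the dropout rate, patience, and per-epoch validation losses are made up for illustration):

```python
import torch.nn as nn

# Dropout: randomly zero hidden activations during training
# to discourage co-adaptation of neurons.
model = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # active in model.train(), disabled in model.eval()
    nn.Linear(16, 1),
)

# Early stopping: stop when validation loss stops improving.
best, patience, bad = float("inf"), 3, 0
for val_loss in [0.9, 0.7, 0.6, 0.65, 0.66, 0.7]:  # illustrative per-epoch values
    if val_loss < best:
        best, bad = val_loss, 0
    else:
        bad += 1
        if bad >= patience:
            break   # no improvement for `patience` epochs
```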