Optimization - aims at reducing the loss and producing the most accurate results possible
- Weights - initialized with an initialization strategy and updated after each epoch/iteration according to the optimizer's update rule
https://cs231n.github.io/neural-networks-3/#hyper
https://dataaspirant.com/optimization-algorithms-deep-learning/
https://medium.com/@minions.k/optimization-techniques-popularly-used-in-deep-learning-3c219ec8e0cc
(Batch) Gradient Descent
- Aims to find the global minimum of the loss
- The entire dataset is used to compute each update
- May get stuck at a local minimum
- → computationally intensive (see the sketch below)
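A minimal NumPy sketch of batch gradient descent on a toy least-squares problem; the data, learning rate, and epoch count are illustrative assumptions, not from these notes.

```python
import numpy as np

# Toy linear-regression data (illustrative assumption)
X = np.random.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(200)

w = np.zeros(3)   # weight initialization
lr = 0.1          # learning rate (eta)

for epoch in range(100):
    # Gradient of the mean squared error computed on the ENTIRE dataset
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad   # w := w - eta * dL/dw

print(w)  # approaches true_w
```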
Stochastic Gradient Descent
- Computes the gradient on one sample at a time
- Updates are noisy → slow to converge to the minimum (sketch below)
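A sketch of stochastic gradient descent on the same kind of toy problem: the gradient comes from a single example per step, so updates are cheap but noisy. Data and hyperparameters are assumptions.

```python
import numpy as np

X = np.random.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(200)

w = np.zeros(3)
lr = 0.01

for epoch in range(20):
    for i in np.random.permutation(len(y)):
        xi, yi = X[i], y[i]
        # Gradient from a SINGLE example -> cheap but noisy update
        grad = 2 * xi * (xi @ w - yi)
        w -= lr * grad

print(w)
```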
Mini-Batch SGD
- Computes each update on a small batch of samples → compromise between batch GD and SGD: less noisy than SGD, cheaper per step than full-batch (sketch below)
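A mini-batch SGD sketch under the same assumed toy problem; the batch size of 32 and other hyperparameters are illustrative choices.

```python
import numpy as np

X = np.random.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(200)

w = np.zeros(3)
lr = 0.05
batch_size = 32

for epoch in range(50):
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        # Gradient averaged over a small batch: less noisy than pure SGD,
        # cheaper per step than full-batch gradient descent
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)
        w -= lr * grad

print(w)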
Adaptive Optimization techniques
Use statistics (e.g. accumulated gradients) from previous iterations to speed up convergence
Momentum-based optimizer
- Uses an exponentially weighted average of gradients over previous iterations to stabilize convergence → faster optimization
- Update rule: v_t = γ · v_{t-1} + η · ∇_θ J(θ), then θ = θ - v_t, where γ is the fraction of the previous iteration's update that is carried over
- The momentum term grows when successive gradients point in the same direction and shrinks when gradients fluctuate in direction (sketch below)
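A sketch of the momentum update on the assumed toy problem; gamma = 0.9 and the other hyperparameters are illustrative, not from these notes.

```python
import numpy as np

X = np.random.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(200)

w = np.zeros(3)
v = np.zeros(3)   # velocity: exponentially weighted gradient history
lr = 0.05         # eta
gamma = 0.9       # momentum coefficient

for epoch in range(100):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    # v_t = gamma * v_{t-1} + eta * grad ; w := w - v_t
    v = gamma * v + lr * grad
    w -= v

print(w)
```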