Adamax builds upon the well-known Adam optimizer, but swaps out the ℓ2 norm for an ℓ∞ norm in the gradient scaling factor. Let’s explore the definition and an implementation!
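As a quick preview, here is a minimal NumPy sketch of a single Adamax step (the function name `adamax_update` and the toy usage below are illustrative; the default hyperparameters follow the values suggested in the Adam paper):

```python
import numpy as np

def adamax_update(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax step: Adam's second-moment (l2-style) scaling is replaced
    by an exponentially weighted infinity norm of past gradients."""
    m = beta1 * m + (1 - beta1) * grad           # biased first-moment estimate
    u = np.maximum(beta2 * u, np.abs(grad))      # infinity-norm based scaling factor
    theta = theta - (lr / (1 - beta1 ** t)) * m / (u + eps)  # eps guards against division by zero
    return theta, m, u

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta = np.array([5.0])
m = np.zeros_like(theta)
u = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta                             # gradient of x^2
    theta, m, u = adamax_update(theta, grad, m, u, t)
print(theta)                                     # close to 0 after 200 steps
```

Note that, unlike Adam, the ℓ∞-based scaling factor `u` needs no bias correction; only the first moment `m` is corrected by the `1 - beta1**t` term.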
The full code can be found here: Learning from Scratch in Python/Gradient Descent Optimization
## Credit
Check out this cool blog post if you want to learn more about optimization based on stochastic gradient descent: http