Machine learning algorithms usually require a large amount of numerical computation. This typically refers to algorithms that solve mathematical problems by iteratively updating estimates of the solution, rather than by analytically deriving a formula that provides a symbolic expression for the correct solution. Two common operations are optimization (finding the value of an argument that minimizes or maximizes a function) and solving systems of linear equations. Even evaluating a mathematical function on a digital computer can be difficult when the function involves real numbers, which cannot be represented precisely using a finite amount of memory.
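As a minimal illustration of the iterative style described above, the sketch below solves a small linear system with the Jacobi method, a classical iterative solver chosen here only for brevity (the text does not prescribe it). The function name `jacobi`, its tolerance, and the example system are hypothetical; the method converges for strictly diagonally dominant matrices such as the one used here.

```python
def jacobi(A, b, tol=1e-12, max_iter=1000):
    """Hypothetical helper: iteratively refine an estimate of x in A x = b."""
    n = len(b)
    x = [0.0] * n  # initial estimate of the solution
    for _ in range(max_iter):
        # Each new component is computed only from the previous estimate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once successive estimates barely change.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x

# 4x + y = 1 and 2x + 3y = 2 have the exact solution x = 0.1, y = 0.6.
A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
solution = jacobi(A, b)
print(solution)
```

The result is only approximately equal to the symbolic answer: the loop stops when the update falls below `tol`, not when it reaches the exact solution.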
Performing continuous math on a digital computer is fundamentally difficult because we need to represent infinitely many real numbers with a finite number of bit patterns. This means that for almost all real numbers, we incur some approximation error when we represent the number in the computer. In many cases, this is just rounding error. Rounding error is problematic, especially when it compounds across many operations, and it can cause algorithms that work in theory to fail in practice if they are not designed to minimize the accumulation of rounding error.
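A small sketch of rounding error accumulating across operations, using Python's built-in IEEE 754 double-precision floats. The value 0.1 has no exact binary representation, and each addition rounds again, so ten additions drift away from 1.0. `math.fsum`, from Python's standard library, avoids this loss by tracking exact intermediate partial sums; it is shown here as one example of an algorithm designed to limit accumulated rounding error.

```python
import math

values = [0.1] * 10

naive = 0.0
for v in values:
    naive += v  # each addition rounds the running total to the nearest double

print(naive)               # → 0.9999999999999999
print(naive == 1.0)        # → False
print(math.fsum(values))   # → 1.0
```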
Underflow
Underflow is a form of rounding error that can be particularly devastating. Underflow occurs when numbers near zero are rounded to zero. Many functions behave qualitatively differently when their argument is zero rather than a small positive number. For example, we usually want to avoid division by zero (some software environments will raise an exception when this occurs, while others will return a result with a placeholder not-a-number [NaN] value) or taking the logarithm of zero (this is usually treated as $-\infty$, which then becomes NaN if it is used in many further arithmetic operations).
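The sketch below reproduces underflow with ordinary Python floats. The product of two tiny but representable numbers is smaller than the smallest positive double (about 4.9e-324), so it rounds to exactly 0.0; dividing by it or taking its logarithm then fails. Python's `math` module is one of the environments that raises exceptions here rather than returning infinities or NaN. The function name `underflow_demo` is hypothetical, and the final log-space computation is an added illustration of a common remedy (working with logarithms instead of raw products), not something the text above prescribes.

```python
import math

def underflow_demo():
    tiny = 1e-200             # representable as a double
    product = tiny * tiny     # true value 1e-400 underflows to 0.0
    results = {"product": product}

    # Division by the underflowed value: Python raises ZeroDivisionError.
    try:
        results["reciprocal"] = 1.0 / product
    except ZeroDivisionError:
        results["reciprocal"] = "ZeroDivisionError"

    # Logarithm of the underflowed value: math.log raises ValueError.
    try:
        results["log"] = math.log(product)
    except ValueError:
        results["log"] = "ValueError"

    # Added illustration: adding logs keeps the information the product lost.
    results["log_space"] = math.log(tiny) + math.log(tiny)
    return results

print(underflow_demo())
```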
Overflow
Overflow is another highly damaging form of numerical error. Overflow occurs when numbers with large magnitude are approximated as $-\infty$ or $\infty$. Further arithmetic will usually change these infinite values into NaN values. One example of a function that must be stabilized against both underflow and overflow is the softmax function, which is often used to predict the probabilities associated with a multinomial distribution. The softmax function is defined as

$$\operatorname{softmax}(\boldsymbol{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}.$$
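A minimal sketch of both effects in plain Python. Multiplying floats past the largest double (about 1.8e308) yields `inf`, and subtracting infinities yields `nan`. The hypothetical `naive_softmax` transcribes the formula directly; it behaves well for moderate inputs, but `math.exp` raises `OverflowError` for large arguments instead of returning `inf`.

```python
import math

def naive_softmax(x):
    # Direct transcription of the formula: exponentiate, then normalize.
    exps = [math.exp(xi) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

# Float multiplication overflows to inf; further arithmetic turns it into NaN.
big = 1e308 * 10
print(big, big - big)  # → inf nan

print(naive_softmax([1.0, 2.0, 3.0]))  # fine for moderate inputs

try:
    naive_softmax([1000.0, 1000.0])
except OverflowError as err:
    # math.exp(1000.0) is out of range for a double.
    print("OverflowError:", err)
```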
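Consider what happens when all of the $x_i$ are equal to some constant $c$. Analytically, every output should equal $1/n$. Numerically, this may not occur when $c$ has large magnitude. If $c$ is very negative, then $\exp(c)$ underflows to zero, so the denominator of the softmax becomes 0 and the final result is undefined. If $c$ is very large and positive, $\exp(c)$ overflows, and the expression as a whole is again undefined. Both difficulties can be resolved by instead evaluating $\operatorname{softmax}(\boldsymbol{z})$, where $\boldsymbol{z} = \boldsymbol{x} - \max_i x_i$. Subtracting a scalar $m$ from every input does not change the value of the softmax analytically, because the common factor $\exp(-m)$ cancels:

$$\frac{\exp(x_i - m)}{\sum_{j} \exp(x_j - m)} = \frac{\exp(-m)\exp(x_i)}{\exp(-m)\sum_{j} \exp(x_j)} = \operatorname{softmax}(\boldsymbol{x})_i.$$

Subtracting $\max_i x_i$ makes the largest argument to $\exp$ equal to 0, which rules out overflow. Likewise, at least one term in the denominator equals 1, which rules out underflow in the denominator leading to a division by zero.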
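A minimal Python sketch of the stabilized softmax: shift the inputs by their maximum before exponentiating. With all inputs equal to a constant $c$, every shifted input is 0, each exponential is exactly 1.0, and the output is exactly $1/n$ whether $c$ is hugely negative, zero, or hugely positive. The function name `softmax` here is our own; it is not a specific library's API.

```python
import math

def softmax(x):
    # Shift so the largest argument to exp is 0: exp(0) = 1, so no overflow,
    # and the denominator is at least 1, so it cannot underflow to 0.
    m = max(x)
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

for c in (-1000.0, 0.0, 1000.0):
    print(softmax([c, c, c, c]))  # → [0.25, 0.25, 0.25, 0.25] each time
```

A direct transcription of the formula would fail on the first and last inputs: `math.exp(-1000.0)` underflows to 0.0, giving a zero denominator, and `math.exp(1000.0)` overflows.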
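There is still one minor issue. Underflow in the numerator can still cause the expression as a whole to evaluate to zero. This means that if we implement $\log \operatorname{softmax}(\boldsymbol{x})$ by first running the softmax subroutine and then passing the result to the log function, we could erroneously obtain $-\infty$. Instead, we must implement a separate function that calculates log softmax in a numerically stable way. This can be done with the same trick of subtracting $\max_i x_i$ from every input: with $\boldsymbol{z} = \boldsymbol{x} - \max_i x_i$, we compute $\log \operatorname{softmax}(\boldsymbol{x})_i = z_i - \log \sum_j \exp(z_j)$.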
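A minimal sketch contrasting the two approaches for log softmax. With input `[0.0, 1000.0]`, even the stabilized softmax underflows in the numerator of the first entry and returns exactly 0.0, so taking its logarithm fails. In Python, `math.log(0.0)` raises `ValueError`; environments that follow IEEE 754 semantics return $-\infty$ instead. The separate `log_softmax` stays finite because it never exponentiates and then logs the small entry. Both function names are illustrative, not a particular library's API.

```python
import math

def softmax(x):
    m = max(x)
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(x):
    # log softmax(x)_i = z_i - log(sum_j exp(z_j)), with z = x - max(x).
    m = max(x)
    log_total = math.log(sum(math.exp(xi - m) for xi in x))
    return [(xi - m) - log_total for xi in x]

x = [0.0, 1000.0]
print(softmax(x))      # → [0.0, 1.0]  (exp(-1000.0) underflowed to 0.0)
print(log_softmax(x))  # → [-1000.0, 0.0]

try:
    [math.log(p) for p in softmax(x)]  # log composed with softmax
except ValueError as err:
    print("ValueError:", err)  # math.log(0.0) is rejected
```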
For the most part, we do not explicitly detail all of the numerical considerations involved in implementing the various algorithms described in this book. Developers of low-level libraries should keep numerical issues in mind when implementing deep learning algorithms. Most readers of this book can simply rely on low-level libraries that provide numerically stable implementations. In some cases, it is possible to implement a new algorithm and have the new implementation automatically stabilized.