Implementation of Basic Gradient Descent
Now that you know how the basic gradient descent works, you can implement it in Python. You’ll use only plain Python and NumPy, which enables you to write concise code when working with arrays (or vectors) and gain a performance boost.
This is a basic implementation of the algorithm that starts with an arbitrary point, start, iteratively moves it toward the minimum, and returns a point that is hopefully at or near the minimum:
def gradient_descent(gradient, start, learn_rate, n_iter):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        vector += diff
    return vector
gradient_descent() takes four arguments:
gradient is the function or any Python callable object that takes a vector and returns the gradient of the function you’re trying to minimize.
start is the point where the algorithm starts its search, given as a sequence (tuple, list, NumPy array, and so on) or scalar (in the case of a one-dimensional problem).
learn_rate is the learning rate that controls the magnitude of the vector update.
n_iter is the number of iterations.
This function does exactly what’s described above: it takes a starting point, iteratively updates it according to the learning rate and the value of the gradient, and finally returns the last position found.
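Because the update is just arithmetic on whatever gradient() returns, the same implementation also works on NumPy arrays, so it can minimize multivariable functions. Here’s a minimal sketch; the two-variable function C = v₁² + v₂² is an illustrative assumption, not part of the example below:

import numpy as np

# The gradient of C = v1**2 + v2**2 is the vector 2 * v,
# so a NumPy array works as the starting point
result = gradient_descent(
    gradient=lambda v: 2 * v, start=np.array([4.0, -2.0]),
    learn_rate=0.2, n_iter=50
)
# result is an array with both components close to zero,
# the minimum at the origin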
Before you apply gradient_descent(), you can add another termination criterion:
import numpy as np

def gradient_descent(
    gradient, start, learn_rate, n_iter=50, tolerance=1e-06):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        if np.all(np.abs(diff) <= tolerance):
            break
        vector += diff
    return vector
You now have the additional parameter tolerance, which specifies the minimal allowed movement in each iteration. You’ve also defined default values for tolerance and n_iter, so you don’t have to specify them each time you call gradient_descent().
The tolerance check enables gradient_descent() to stop iterating and return the result before n_iter is reached if the vector update in the current iteration is less than or equal to tolerance. This often happens near the minimum, where gradients are usually very small. Unfortunately, it can also happen near a local minimum or a saddle point.
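To see that last point, consider a nonconvex sketch; this function and the values of its minima are illustrative assumptions, not part of the example below. C = v⁴ - 5v² - 3v has a global minimum near v = 1.7 and a local minimum near v = -1.4, and the starting point decides which one the algorithm finds:

def grad(v):
    # Gradient of C = v**4 - 5 * v**2 - 3 * v
    return 4 * v**3 - 10 * v - 3

# Starting on the left settles into the local minimum near v = -1.4
gradient_descent(gradient=grad, start=-2.0, learn_rate=0.01)

# Starting on the right reaches the global minimum near v = 1.7
gradient_descent(gradient=grad, start=2.0, learn_rate=0.01)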
The early stopping condition uses the convenient NumPy functions numpy.all() and numpy.abs() to compare the absolute values of diff and tolerance in a single statement. That’s why you import numpy at the top.
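In isolation, the vectorized check looks like this; diff and tolerance here are just sample values:

import numpy as np

diff = np.array([2e-7, -5e-8])
tolerance = 1e-06

# True only if every component of diff is within tolerance
np.all(np.abs(diff) <= tolerance)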
Now that you have the first version of gradient_descent(), it’s time to test your function. You’ll start with a small example and find the minimum of the function C = v².
This function has only one independent variable (v), and its gradient is the derivative 2v. It’s a differentiable convex function, and the analytical way to find its minimum is straightforward. However, in practice, analytical differentiation can be difficult or even impossible and is often approximated with numerical methods.
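As a taste of the numerical approach, here’s a minimal sketch based on central differences; approx_gradient() is a hypothetical helper, not something you’ll need later:

def approx_gradient(f, v, h=1e-6):
    # Central-difference approximation of the derivative of f at v
    return (f(v + h) - f(v - h)) / (2 * h)

# Minimize C = v**2 without writing down the derivative 2 * v
gradient_descent(
    gradient=lambda v: approx_gradient(lambda u: u**2, v),
    start=10.0, learn_rate=0.2
)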
You need only one statement to test your gradient descent implementation:
>>> gradient_descent(
... gradient=lambda v: 2 * v, start=10.0, learn_rate=0.2)
2.210739197207331e-06
You use the lambda function lambda v: 2 * v to provide the gradient of ?². You start from the value 10.0 and set the learning rate to 0.2. You get a result that’s very close to zero, which is the correct minimum.
The figure below shows the movement of the solution through the iterations:
[Figure: gradient descent iterations for C = v², moving from the start at v = 10 toward the minimum at v = 0]
You start from the rightmost green dot (v = 10) and move toward the minimum (v = 0). The updates are larger at first because the value of the gradient (and the slope) is higher. As you approach the minimum, the updates become smaller.
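The learning rate matters here, too. As a quick sketch, try a much smaller value with the same setup:

# Same problem, much smaller learning rate
gradient_descent(gradient=lambda v: 2 * v, start=10.0, learn_rate=0.005)

Each update now multiplies the current value by 1 - 2 × 0.005 = 0.99, so after the default fifty iterations the result is 10 × 0.99⁵⁰ ≈ 6.05, still far from the minimum at zero.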
Improvement of the Code
You can make gradient_descent() more robust, comprehensive, and better-looking without modifying its core functionality:
import numpy as np

def gradient_descent(
    gradient, x, y, start, learn_rate=0.1, n_iter=50, tolerance=1e-06,
    dtype="float64"):
    # Checking if the gradient is callable
    if not callable(gradient):
        raise TypeError("'gradient' must be callable")

    # Setting up the data type for NumPy arrays
    dtype_ = np.dtype(dtype)

    # Converting x and y to NumPy arrays
    x, y = np.array(x, dtype=dtype_), np.array(y, dtype=dtype_)
    if x.shape[0] != y.shape[0]:
        raise ValueError("'x' and 'y' lengths do not match")

    # Initializing the values of the variables
    vector = np.array(start, dtype=dtype_)

    # Setting up and checking the learning rate
    learn_rate = np.array(learn_rate, dtype=dtype_)
    if np.any(learn_rate <= 0):
        raise ValueError("'learn_rate' must be greater than zero")

    # Setting up and checking the maximal number of iterations
    n_iter = int(n_iter)
    if n_iter <= 0:
        raise ValueError("'n_iter' must be greater than zero")

    # Setting up and checking the tolerance
    tolerance = np.array(tolerance, dtype=dtype_)
    if np.any(tolerance <= 0):
        raise ValueError("'tolerance' must be greater than zero")

    # Performing the gradient descent loop
    for _ in range(n_iter):
        # Recalculating the difference
        diff = -learn_rate * np.array(gradient(x, y, vector), dtype=dtype_)

        # Checking if the absolute difference is small enough
        if np.all(np.abs(diff) <= tolerance):
            break

        # Updating the values of the variables
        vector += diff

    return vector if vector.shape else vector.item()
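As a quick usage sketch, here’s one way to call the improved version. The data and the ssr_gradient() helper below are illustrative assumptions, not part of the implementation: the helper returns the gradient of the cost (one half of the mean squared residual) of the straight line b[0] + b[1] * x with respect to b:

import numpy as np

def ssr_gradient(x, y, b):
    # Residuals of the line b[0] + b[1] * x
    res = b[0] + b[1] * x - y
    # Partial derivatives of mean(res**2) / 2
    # with respect to b[0] and b[1]
    return res.mean(), (res * x).mean()

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

gradient_descent(
    ssr_gradient, x, y, start=[0.5, 0.5], learn_rate=0.0008,
    n_iter=100_000
)

The returned array holds the intercept and the slope of the line that approximately minimizes the cost. Because start has two components, vector.shape is truthy and the result stays an array instead of being unwrapped with .item().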