Introduction
Neural Networks are exciting tools for building powerful AIs, and they are fun to learn and implement. While there are plenty of libraries available for creating and training Neural Networks in just a few lines, building your own Neural Network from scratch gives you a much deeper understanding of the underlying process and inner workings. In this article, we are going to build an entire Neural Network from scratch using only the NumPy library, to classify not the classic handwritten digits but the Fashion MNIST dataset. Alright, without further delay, let's get straight into it.
The Fashion MNIST dataset
First, let's discuss the dataset we are using. The Fashion MNIST dataset is popular in computer vision and machine learning and consists of 70,000 grayscale images of clothing and accessories, divided into 10 classes. Each image is a 28x28 pixel square, which is small compared to other image datasets, yet correctly classifying the images is still a challenging task for machine learning models because of the subtle variations that distinguish the classes from one another.
Image Source: TensorFlow
The Architecture
Now let's see how we are going to build our Neural Network. Here is our plan:
Here, with each image in the Fashion-MNIST dataset containing 28x28 pixels, the input layer of our neural network must consist of 784 neurons. For the hidden layer, I have chosen 128 neurons, which is more than enough for detecting patterns within the images. Since there are ten distinct classes of clothing and accessories represented in the dataset, our output layer must contain ten neurons in order to classify each image. Altogether, the Neural Network we are going to build has three layers.
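In equations, the forward computation through these three layers looks like the following, where x is a flattened 784-pixel image, h the 128 hidden activations, y the 10 outputs, and sigma the activation function we define later; the W and b symbols correspond to the wih, bih, who, and bho variables in the code further below:

\[\mathbf{h} = \sigma(W_{ih}\,\mathbf{x} + b_{ih}), \qquad \mathbf{y} = \sigma(W_{ho}\,\mathbf{h} + b_{ho})\]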
Data pre-processing
Proper pre-processing of data is essential before training the neural network. If you are using Jupyter Notebook, you can easily copy the dataset into the working directory. However, if you are using Google Colab instead, you must first upload the dataset to the Colab notebook and obtain the file path before proceeding with pre-processing and training. After the dataset is ready, you can load it to the notebook using the following code:
import numpy as np

with open('fashion_mnist_train.npy', 'rb') as train_data:
    X_train = np.load(train_data)
    y_train = np.load(train_data)

with open('fashion_mnist_test.npy', 'rb') as test_data:
    X_test = np.load(test_data)
    y_test = np.load(test_data)
Note that I saved the training and testing data with NumPy, so they are in the NumPy (.npy) file format. Loading the data and splitting it into training and testing arrays with their corresponding labels is then really simple.
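If you don't have these .npy files at hand, an optional alternative (not the approach used in the rest of this article) is to pull the same dataset through Keras, which we use later for one-hot encoding anyway; the array shapes come out identical to the ones printed below:

# Optional alternative: load Fashion MNIST directly through Keras
from keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()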
print(X_train.shape)
print(X_test.shape)
-----
(60000, 28, 28)
(10000, 28, 28)
On the training set, we have 60,000 samples and on the testing set, we have 10,000 samples, each arranged in a 28 by 28 shape.
Now, let's slice the dataset,
X_train = X_train[:5000]
y_train = y_train[:5000]
X_test = X_test[:5000]
y_test = y_test[:5000]
Slicing the dataset makes sense here because we are building this Neural Network only to understand it better, not for production use. So I'm taking 5,000 samples each from the training and testing data.
Reshaping the dataset
After slicing, the shapes of the training and testing sets are (5000, 28, 28) and (5000, 28, 28). Each image is stored as a 2-dimensional 28x28 matrix rather than as a one-dimensional flattened vector of length 784. So to feed the data into the input layer containing 784 neurons, we need to reshape the training set to (5000, 784) and the testing set to (5000, 784). Here is how you can do it.
X_train = X_train.reshape(X_train.shape[0], -1) / 255.0
X_test = X_test.reshape(X_test.shape[0], -1) / 255.0
One more thing to note here is that we divided both the training and testing sets by 255. This normalizes the pixel values of the image data to the range 0 to 1, a common preprocessing step for image data because it improves the numerical stability of the model and helps training converge more smoothly.
print(X_train.shape)
print(X_test.shape)
------
(5000, 784)
(5000, 784)
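As a quick optional sanity check, you can also confirm that the pixel values now lie in the 0-1 range:

# The raw 0-255 pixel values should now span 0.0 to 1.0
print(X_train.min(), X_train.max())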
Vectorizing (one-hot encoding) the labels
One more pre-processing step remains: vectorizing the labels, or target values. This is necessary because our output layer has ten neurons, one per class, so each label needs to be represented as a vector of length ten rather than as a single number. It is really simple to do with the Keras library.
from keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
What really happens is that the labels, which contain numbers from 0 to 9 representing each class of clothing and accessories in the Fashion MNIST dataset, are converted into their corresponding one-hot vectors of length 10. For instance, if the label is 3, the one-hot vector will be [0,0,0,1,0,0,0,0,0,0].
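If you want to stay strictly NumPy-only, the same encoding can be done without Keras. Here is a minimal sketch, assuming the labels are 1-dimensional integer NumPy arrays as loaded earlier (the one_hot helper is just an illustration, not part of the article's code):

def one_hot(labels, num_classes=10):
    # Build a (num_samples, num_classes) matrix of zeros and set the
    # column matching each integer label to 1
    encoded = np.zeros((labels.shape[0], num_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

# y_train = one_hot(y_train)
# y_test = one_hot(y_test)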
Creating the Neural Network Class
Now let's get into the interesting part. If you are not familiar with, or need a recap on, the workings of Neural Networks, backpropagation, and Gradient Descent, we have a whole article about it, and I recommend you read it.
Alright, first let's initialize the necessary instance variables, such as the layer sizes, learning rate, and epochs:
class NN:
    def __init__(self, input_neurons, hidden_neurons, output_neurons, learning_rate, epochs):
        # Initializing the instance variables
        self.input_neurons = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.epochs = epochs
        # Links of weights from the input layer to the hidden layer
        self.wih = np.random.normal(0.0, pow(self.input_neurons, -0.5), (self.hidden_neurons, self.input_neurons))
        self.bih = 0
        # Links of weights from the hidden layer to the output layer
        self.who = np.random.normal(0.0, pow(self.hidden_neurons, -0.5), (self.output_neurons, self.hidden_neurons))
        self.bho = 0
        self.lr = learning_rate  # Learning rate
Simple! The thing to note here is the weights and biases: we initialized random weights with NumPy for each connection between the input layer and the hidden layer (wih) and between the hidden layer and the output layer (who), along with the biases. The weights are drawn from a normal distribution with mean 0 and a standard deviation of 1 over the square root of the number of incoming connections, a common heuristic that keeps the initial signals small enough not to saturate the activation function.
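If you want to verify the weight matrix shapes, here is a quick optional check with a throwaway instance of the class defined above:

# Optional: check the weight matrix shapes with a throwaway instance
tmp = NN(input_neurons=784, hidden_neurons=128, output_neurons=10, learning_rate=0.01, epochs=1)
print(tmp.wih.shape)  # (128, 784) -- hidden x input
print(tmp.who.shape)  # (10, 128)  -- output x hidden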
Activation Function
A really important concept when it comes to Neural Networks is the activation function. An activation function is what brings non-linearity into our Neural Network. Complex data like the images we are dealing with contain non-linear relationships, and without an activation function the network would simply be a linear model, unable to capture those relationships.
Here I'm using the Sigmoid (Logistic) activation function. You can choose any activation function, but I found the sigmoid more suitable here since it squashes each output neuron's value into the range 0 to 1, which we can read as a score for each class.
\[\sigma(x) = \frac{1}{1 +e^{-x}}\]
We also need the derivative of the sigmoid activation function. We have done this entire derivation in the backpropagation and Gradient Descent article; go there if you want to see the full derivation.
\[\sigma'(x) = \sigma(x)\,(1-\sigma(x))\]
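For reference, the derivative falls out in one line from the quotient rule applied to the sigmoid definition above:

\[\frac{d}{dx}\,\sigma(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1-\sigma(x))\]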
Here is how you can implement these equations in code:
def activation(self, z):
    """Returns the sigmoid of z"""
    z = np.clip(z, -500, 500)  # Avoid overflow errors
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(self, z):
    """Returns the derivative of the sigmoid of z"""
    return self.activation(z) * (1 - self.activation(z))
Forward Propagation
Forward propagation, as we know, is the process of passing the inputs through the network to produce an output. It involves multiplying the input values by the network's weights, adding a bias term, and applying an activation function to produce an output for each neuron in the network. Here is the implementation.
# Forward propagation
def forward(self, input_list):
    inputs = np.array(input_list, ndmin=2).T
    # Passing inputs to the hidden layer
    hidden_inputs = np.dot(self.wih, inputs) + self.bih
    # Getting outputs from the hidden layer
    hidden_outputs = self.activation(hidden_inputs)
    # Passing inputs from the hidden layer to the output layer
    final_inputs = np.dot(self.who, hidden_outputs) + self.bho
    # Getting output from the output layer
    yj = self.activation(final_inputs)
    return yj
That's it! We are simply passing the inputs from the input layer to the hidden layer and finally to the output layer and returning the result.
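As an optional sanity check, once you have added the methods above to the NN class (and before any training), a freshly initialized, throwaway instance should already map one flattened image to ten sigmoid outputs; a small sketch:

# A freshly initialized network maps one flattened 784-pixel image
# to a column of 10 sigmoid outputs
tmp = NN(input_neurons=784, hidden_neurons=128, output_neurons=10, learning_rate=0.01, epochs=1)
out = tmp.forward(X_train[0])
print(out.shape)  # (10, 1): one output per class, each between 0 and 1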
Backpropagation and Gradient Descent
Now comes the most important part: backpropagation, which is what actually trains our Neural Network. Backpropagation is the idea of propagating the errors made by the network backward, all the way to the input layer, and using them to adjust the weights and biases so that they better fit the training data. Again, I recommend reading the article on backpropagation if you're not familiar with it.
# Back propagation
def backprop(self, inputs_list, targets_list):
    inputs = np.array(inputs_list, ndmin=2).T
    tj = np.array(targets_list, ndmin=2).T  # Targets
    # Passing inputs to the hidden layer
    hidden_inputs = np.dot(self.wih, inputs) + self.bih
    # Getting outputs from the hidden layer
    hidden_outputs = self.activation(hidden_inputs)
    # Passing inputs from the hidden layer to the output layer
    final_inputs = np.dot(self.who, hidden_outputs) + self.bho
    # Getting output from the output layer
    yj = self.activation(final_inputs)
    # Finding the errors from the output layer
    output_errors = -(tj - yj)
    # Finding the error in the hidden layer
    hidden_errors = np.dot(self.who.T, output_errors)
    # Updating the weights using the Gradient Descent update rule
    self.who -= self.lr * np.dot((output_errors * self.sigmoid_derivative(yj)), np.transpose(hidden_outputs))
    self.wih -= self.lr * np.dot((hidden_errors * self.sigmoid_derivative(hidden_outputs)), np.transpose(inputs))
    # Updating the biases
    self.bho -= self.lr * (output_errors * self.sigmoid_derivative(yj))
    self.bih -= self.lr * (hidden_errors * self.sigmoid_derivative(hidden_outputs))
The backprop method takes two parameters: the inputs list, which is the training data, and the targets list, which holds the corresponding labels.
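For reference, the four update lines in the code follow the standard gradient descent rule, where E denotes the network's error and the learning rate (self.lr in the code) is written as eta:

\[W \leftarrow W - \eta\,\frac{\partial E}{\partial W}, \qquad b \leftarrow b - \eta\,\frac{\partial E}{\partial b}\]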
The fit method
This method trains the network over a number of epochs (iterations). In each epoch, we perform the backpropagation and gradient descent defined above on the entire training batch at once.
# Performing Gradient Descent optimization using backpropagation
def fit(self, inputs_list, targets_list):
    for epoch in range(self.epochs):
        self.backprop(inputs_list, targets_list)
        print(f"Epoch {epoch}/{self.epochs} completed.")
The predict method
The final method of our Neural Network class is the predict method, which is of course used to perform the prediction using the updated weights and biases.
def predict(self, X):
    outputs = self.forward(X).T
    return outputs
The predict method simply takes the test data as an argument and performs the forward propagation to produce the result.
That's it! We have coded an entire Neural Network class from scratch.
Training and Testing the Network
Alright! Now let's put everything into action. To train our Neural Network, we create an object of the NN class and call its fit method.
nn = NN(input_neurons=784, hidden_neurons=128, output_neurons=10, learning_rate=0.01, epochs=1000)
nn.fit(X_train, y_train)
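Once fit finishes, the trained network can be queried with the predict method. As a quick optional check, predicting on the whole test set should give one row of 10 class scores per image:

# Each of the 5000 test images gets a row of 10 class scores
print(nn.predict(X_test).shape)  # (5000, 10)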
Testing the Network
# Predicting probabilities
probs = nn.predict(X_test)

# Converting probabilities to one-hot vector format
predictions = []
for prob in probs:
    max_idx = np.argmax(prob)
    prediction = np.zeros_like(prob)
    prediction[max_idx] = 1
    predictions.append(prediction)
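As a side note, the same conversion can be written in a fully vectorized way; this equivalent sketch produces the same (5000, 10) one-hot array:

# Vectorized equivalent of the loop above
pred_indices = np.argmax(probs, axis=1)
predictions = np.eye(10)[pred_indices]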
Now let's evaluate the network's performance on the testing data.
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

print("Accuracy:", accuracy_score(predictions, y_test))
print("CR:", classification_report(predictions, y_test))
---------
Accuracy: 0.7644
CR:               precision    recall  f1-score   support

           0           0.76      0.78      0.77       498
           1           0.95      0.96      0.96       479
           2           0.71      0.60      0.65       616
           3           0.84      0.73      0.78       576
           4           0.71      0.60      0.65       613
           5           0.81      0.79      0.80       499
           6           0.26      0.65      0.37       195
           7           0.87      0.79      0.83       554
           8           0.91      0.89      0.90       536
           9           0.81      0.89      0.85       434

   micro avg           0.76      0.76      0.76      5000
   macro avg           0.76      0.77      0.76      5000
weighted avg           0.79      0.76      0.77      5000
 samples avg           0.76      0.76      0.76      5000
The accuracy is decent, though not impressive. Still, for a network built entirely from scratch, 76% accuracy is actually quite good.
Plotting some images with the corresponding predictions
To get a feel for the results, let's plot some images along with the predictions the network made for them. Here is how you can do it.
import matplotlib.pyplot as plt

# Fashion MNIST class names, indexed by label (0-9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

fig, axes = plt.subplots(2, 4, figsize=(10, 6))
for i, ax in enumerate(axes.flat):
    img_data = X_test[i].reshape((28, 28))
    # Display image
    ax.imshow(img_data, cmap='gray')
    ax.set_xticks([])
    ax.set_yticks([])
    index = np.where(predictions[i] == 1)[0][0]
    label = class_names[index]
    true_label = class_names[np.argmax(y_test[i])]
    if label != true_label:  # Writing the prediction label in red if it is wrong
        ax.set_xlabel(label, color='r')
    else:
        ax.set_xlabel(label)
plt.show()
The plot will look like this:
The full version of the code
class NN:
    def __init__(self, input_neurons, hidden_neurons, output_neurons, learning_rate, epochs):
        # Initializing the instance variables
        self.input_neurons = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.epochs = epochs
        # Links of weights from the input layer to the hidden layer
        self.wih = np.random.normal(0.0, pow(self.input_neurons, -0.5), (self.hidden_neurons, self.input_neurons))
        self.bih = 0
        # Links of weights from the hidden layer to the output layer
        self.who = np.random.normal(0.0, pow(self.hidden_neurons, -0.5), (self.output_neurons, self.hidden_neurons))
        self.bho = 0
        self.lr = learning_rate  # Learning rate

    def activation(self, z):
        """Returns the sigmoid of z"""
        z = np.clip(z, -500, 500)  # Avoid overflow errors
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, z):
        """Returns the derivative of the sigmoid of z"""
        return self.activation(z) * (1 - self.activation(z))

    # Forward propagation
    def forward(self, input_list):
        inputs = np.array(input_list, ndmin=2).T
        # Passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih
        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)
        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
        # Getting output from the output layer
        yj = self.activation(final_inputs)
        return yj

    # Back propagation
    def backprop(self, inputs_list, targets_list):
        inputs = np.array(inputs_list, ndmin=2).T
        tj = np.array(targets_list, ndmin=2).T  # Targets
        # Passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih
        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)
        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
        # Getting output from the output layer
        yj = self.activation(final_inputs)
        # Finding the errors from the output layer
        output_errors = -(tj - yj)
        # Finding the error in the hidden layer
        hidden_errors = np.dot(self.who.T, output_errors)
        # Updating the weights using the Gradient Descent update rule
        self.who -= self.lr * np.dot((output_errors * self.sigmoid_derivative(yj)), np.transpose(hidden_outputs))
        self.wih -= self.lr * np.dot((hidden_errors * self.sigmoid_derivative(hidden_outputs)), np.transpose(inputs))
        # Updating the biases
        self.bho -= self.lr * (output_errors * self.sigmoid_derivative(yj))
        self.bih -= self.lr * (hidden_errors * self.sigmoid_derivative(hidden_outputs))

    # Performing Gradient Descent optimization using backpropagation
    def fit(self, inputs_list, targets_list):
        for epoch in range(self.epochs):
            self.backprop(inputs_list, targets_list)
            print(f"Epoch {epoch}/{self.epochs} completed.")

    def predict(self, X):
        outputs = self.forward(X).T
        return outputs
Thanks for reading!
If you have any questions or queries feel free to ask in the comment box.