Introduction
In this article, we will walk through the process of creating a neural network using TensorFlow and Keras to classify the Fashion MNIST dataset. We will start with a brief overview of the dataset and then go through the steps involved in building and training a simple neural network model. By the end of this article, you will have a solid understanding of how to use TensorFlow and Keras to create neural networks for classification tasks. For this tutorial, I recommend using Google Colab since all the necessary tools are already installed for you.
The Fashion MNIST Dataset
First, let's discuss the dataset we are using. The Fashion MNIST dataset is a popular benchmark in computer vision and machine learning. It consists of 70,000 grayscale images of clothing and accessories, divided into 10 classes. Each image is a 28x28-pixel square, which is tiny compared to other large image datasets, yet correctly classifying the images is still a challenging task for machine learning models because of the subtle variations that differentiate one class from another.
Since each image is 28 x 28 pixels, flattening it gives 784 values in total, and these 784 values form the input layer of our neural network.
The Architecture
Let's see how we are going to classify the Fashion MNIST using Neural Networks.
The Fashion MNIST dataset consists of images that are 28 x 28 pixels, resulting in 784 input values that will serve as the input layer for our neural network. Our network will have 2 hidden layers with 128 and 64 neurons, respectively. Since there are ten distinct classes of clothing and accessories represented in the dataset, the output layer must contain ten neurons, one per class. Altogether, the neural network we are going to build has 4 layers.
Data Loading and Pre-Processing
Data pre-processing is one of the most important steps before training any machine learning algorithm, neural networks included. It lets us shape the training data so the algorithm is easier to train. Before pre-processing and training our model, we need to load the dataset, and conveniently it ships with Keras' built-in datasets. Here is how you can load it.
from tensorflow.keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
The dataset is loaded and also converted to train and test sets. The entire dataset is split into train and test sets containing 60,000 and 10,000 samples respectively.
print(X_train.shape)
print(X_test.shape)
------
(60000, 28, 28)
(10000, 28, 28)
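Before pre-processing, it helps to see what we are working with. Here is a minimal sketch that plots the first few training images with matplotlib; the class_names list below contains the standard Fashion MNIST label names in index order.

import matplotlib.pyplot as plt

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Plot the first 5 training images with their labels
fig, axes = plt.subplots(1, 5, figsize=(10, 3))
for ax, image, label in zip(axes, X_train[:5], y_train[:5]):
    ax.imshow(image, cmap='gray')
    ax.set_title(class_names[label])
    ax.axis('off')
plt.show()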
Slicing and Normalizing the Training Set
As we are not building a neural network for production purposes, it makes sense to slice the training data rather than use all of it for training. Here I'm going to set aside the first 5,000 samples as a validation set and train on the remaining 55,000.
X_valid, X_train = X_train[:5000] / 255.0, X_train[5000:] / 255.0
y_valid, y_train = y_train[:5000], y_train[5000:]
To ensure that our neural network is not overfitting to the training data, we have created a validation set in addition to the train and test sets; that is what X_valid and y_valid hold. During training, we use the validation set to assess how well the model generalizes to new, unseen data.
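A quick sanity check on the split sizes:

print(X_train.shape, y_train.shape)   # (55000, 28, 28) (55000,)
print(X_valid.shape, y_valid.shape)   # (5000, 28, 28) (5000,)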
One more thing to note is that we have divided the pixel values by 255. Since the raw pixels are integers from 0 to 255, this normalizes them to the range 0 to 1. This is a common preprocessing step for image data because gradient-based training tends to be faster and more numerically stable when the inputs are small values on a consistent scale.
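One caveat: the test set must be scaled the same way, otherwise the model will see inputs in a different range at evaluation time than it saw during training.

X_test = X_test / 255.0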
Creating the Neural Network Using Keras Sequential API
Once we have completed the pre-processing stage, we can proceed to the main part: building a neural network using TensorFlow and Keras. It's important to note that Keras is a high-level library that offers an API for building, training, and evaluating neural networks. Since TensorFlow has adopted Keras as its official high-level API, we will be using the TensorFlow version of Keras. The advantage of doing so is that it offers additional features that are not available in standalone Keras.
Here we are using the Sequential API available in the TensorFlow Keras Framework which helps us to build Deep Learning models layer-by-layer and modify them as needed.
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=[28, 28]))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
Simple and straightforward: we create a neural network model using the Sequential API, add a Flatten layer that turns each 28 x 28 image into the 784-value input layer, then add the two hidden layers and the output layer containing 128, 64, and 10 neurons respectively. On the hidden layers, we use the ReLU (Rectified Linear Unit) activation function, and on the output layer, we use the softmax function, which turns the 10 outputs into class probabilities.
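In case these two activation functions are new to you, here is what they compute, as a minimal NumPy sketch (not how Keras implements them internally):

import numpy as np

def relu(x):
    # ReLU passes positive values through and zeroes out negatives
    return np.maximum(0, x)

def softmax(x):
    # Softmax turns a vector of scores into probabilities that sum to 1
    # (subtracting the max first for numerical stability)
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()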
Next, we need to compile the model, which involves providing important information about the training process: the optimizer, the loss function, and the metrics to track. We use sparse_categorical_crossentropy because our labels are plain integers from 0 to 9 rather than one-hot encoded vectors.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
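The string 'adam' uses the optimizer's default settings. If you later want control over the learning rate, you can pass an optimizer instance instead; the 0.001 here is just Adam's default, shown explicitly:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])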
If you want to know about the model you built, just type model.summary()
model.summary()
---------------
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_1 (Flatten)          (None, 784)               0
dense_3 (Dense)              (None, 128)               100480
dense_4 (Dense)              (None, 64)                8256
dense_5 (Dense)              (None, 10)                650
=================================================================
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
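If you are wondering where the parameter counts come from: each Dense layer has (inputs x neurons) weights plus one bias per neuron. So the first hidden layer has 784 x 128 + 128 = 100,480 parameters, the second has 128 x 64 + 64 = 8,256, and the output layer has 64 x 10 + 10 = 650, for a total of 109,386 trainable parameters.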
Training the Model
Everything is set for training, so let's train the model:
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))
After 30 epochs, the model achieved a validation accuracy of 90%, which can be considered quite good. While it may be possible to improve the accuracy with additional epochs, we chose 30 epochs for this example.
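If you would rather not hand-pick the number of epochs, Keras can stop training automatically once the validation loss stops improving. A sketch using the EarlyStopping callback (the patience value of 5 is just an illustrative choice):

early_stop = tf.keras.callbacks.EarlyStopping(patience=5,
                                              restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stop])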
Plotting the learning curve
If you want to see how the model performed at each stage of training, you can plot the metrics recorded in the history object. Here is how you can plot the learning curves:
import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()
As you can see from the learning curve plot, the training loss and validation loss decrease with each epoch, while the training accuracy and validation accuracy increase.
Testing the model
Now let's test our model with the test data:
import numpy as np

probs = model.predict(X_test)    # predicting class probabilities
pred = np.argmax(probs, axis=1)  # converting probabilities to class labels
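Keras can also report the test loss and accuracy directly, without going through the predictions yourself:

test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)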
Here is the classification report:
from sklearn.metrics import classification_report

print(classification_report(y_test, pred))
-------------
              precision    recall  f1-score   support

           0       0.80      0.87      0.83      1000
           1       0.95      0.99      0.97      1000
           2       0.83      0.71      0.77      1000
           3       0.94      0.81      0.87      1000
           4       0.67      0.92      0.78      1000
           5       0.97      0.95      0.96      1000
           6       0.78      0.60      0.68      1000
           7       0.98      0.88      0.93      1000
           8       0.95      0.97      0.96      1000
           9       0.88      0.99      0.93      1000

    accuracy                           0.87     10000
   macro avg       0.88      0.87      0.87     10000
weighted avg       0.88      0.87      0.87     10000
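To see which classes get mistaken for which, a confusion matrix is often more informative than the report alone. A minimal sketch with scikit-learn:

from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, pred)
print(cm)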
Alright, the model got around 87% accuracy on the test set, which is actually great. Now, if you want to know how many values the model predicted wrong, you can simply compare the predicted values with the actual values and count the mismatches:
error_count = 0
# print("Actual\t\t\t\tPredicted Wrong\n")
for i in range(len(pred)):
    if y_test[i] != pred[i]:
        error_count += 1
        # predicted = class_names[pred[i]]
        # actual = class_names[y_test[i]]
        # print("{}\t\t\t{}".format(
        #     ("\033[32m" + class_names[y_test[i]] + "\033[0m"),
        #     ('\033[91m' + class_names[pred[i]] + '\033[0m')
        # ))
print("\nTOTAL ERROR COUNT:", error_count)
------
TOTAL ERROR COUNT: 1404
If you want to see something interesting, just remove the comments from the code above; it prints each misclassified pair using the class_names list we defined earlier, with the actual label in green and the wrong prediction in red.
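Another way to inspect the mistakes is to plot a few of the misclassified images. A minimal sketch, reusing the class_names list from earlier:

# Indices where the prediction disagrees with the true label
wrong = np.where(pred != y_test)[0]

fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for ax, i in zip(axes, wrong[:5]):
    ax.imshow(X_test[i], cmap='gray')
    ax.set_title('true: {}\npred: {}'.format(class_names[y_test[i]],
                                             class_names[pred[i]]))
    ax.axis('off')
plt.show()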
That's it! We have classified the Fashion MNIST dataset using neural networks.
Thanks for reading! If you have any queries, make sure to post them in the comment box.
Articles to read: Neural Network from Scratch