This project was done while I was pursuing the Deep Learning Specialization certificate taught by Andrew Ng on Coursera; it was a programming assignment from the first course in the specialization. The goal is to train a neural network that can tell whether an image is a photo of a cat. It is a simple application of logistic regression, and it prepared me for building more advanced and complex AI.
I was given a dataset (“data.h5”) containing the following:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3), where 3 is the number of color channels (RGB). Thus, each image is square (height = num_px and width = num_px).
Step 0: Load and process the dataset:
We added “_orig” at the end of the image dataset names (train and test) because we are going to preprocess them. After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y don’t need any preprocessing). Each row of train_set_x_orig and test_set_x_orig is an array representing an image.
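As a minimal sketch of this loading step (assuming h5py and key names like "train_set_x" following the assignment's convention; a tiny synthetic file stands in for the real "data.h5", whose contents are not reproduced here):

```python
import numpy as np
import h5py

# Create a tiny synthetic "data.h5" as a stand-in for the real file;
# the key names below follow the assignment's convention (an assumption).
with h5py.File("data.h5", "w") as f:
    f.create_dataset("train_set_x", data=np.random.randint(0, 256, (6, 64, 64, 3), dtype=np.uint8))
    f.create_dataset("train_set_y", data=np.array([1, 0, 1, 0, 1, 0]))
    f.create_dataset("test_set_x", data=np.random.randint(0, 256, (2, 64, 64, 3), dtype=np.uint8))
    f.create_dataset("test_set_y", data=np.array([1, 0]))

# Load the datasets; each row of the "_orig" arrays is one image.
with h5py.File("data.h5", "r") as f:
    train_set_x_orig = np.array(f["train_set_x"])  # shape (m_train, num_px, num_px, 3)
    train_set_y = np.array(f["train_set_y"]).reshape(1, -1)  # labels as a (1, m) row vector
    test_set_x_orig = np.array(f["test_set_x"])
    test_set_y = np.array(f["test_set_y"]).reshape(1, -1)

print(train_set_x_orig.shape)  # (6, 64, 64, 3)
```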
Reshape the training and test data sets so that images of size (num_px, num_px, 3) are flattened into single vectors of shape (num_px * num_px * 3, 1).
To flatten a matrix X of shape (a, b, c, d) into a matrix X_flatten of shape (b*c*d, a), the easy way is:
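The reshape trick, shown on a small random array standing in for an image batch:

```python
import numpy as np

# A matrix of shape (a, b, c, d) -- e.g. 4 images of 8x8 RGB pixels.
X = np.random.rand(4, 8, 8, 3)

# reshape(X.shape[0], -1) flattens each image into one row; .T then turns
# each image into one column, giving shape (b*c*d, a) = (192, 4).
X_flatten = X.reshape(X.shape[0], -1).T

print(X_flatten.shape)  # (192, 4)
```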
For picture datasets, it is simpler and more convenient, and works almost as well, to just divide every row of the dataset by 255 (the maximum value of a pixel channel) instead of fully standardizing the data.
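That is, with a random integer array standing in for the flattened pixel data:

```python
import numpy as np

# Pixel values range from 0 to 255; dividing by 255 scales every
# feature into [0, 1], a cheap stand-in for full standardization.
train_set_x_flatten = np.random.randint(0, 256, (12288, 5))
train_set_x = train_set_x_flatten / 255.0

print(train_set_x.min(), train_set_x.max())
```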
Step 1: Build parts for the algorithm:
First, I built the sigmoid function, because a lot of the later computation uses it. With the NumPy package, it is just one line of code. After that, we need to think about the data structure of our parameters. The initialization function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
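A sketch of these two helpers (the function names mirror the assignment's convention):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1) -- one line with NumPy.
    return 1 / (1 + np.exp(-z))

def initialize_with_zeros(dim):
    # w: column vector of zeros of shape (dim, 1); b: scalar zero.
    # Zero initialization is fine for logistic regression.
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b

print(sigmoid(0))  # 0.5
```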
So much for setup; now we consider propagation. Forward propagation computes the cost, and backward propagation computes the gradients.
The squared-error loss is not used because, for logistic regression, it makes the cost function non-convex, so optimization can get stuck in local minima; the cross-entropy loss avoids this problem. After that, I built the optimize function, which is the core of the “learning”.
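The propagation and optimization steps can be sketched as follows (function names mirror the assignment; the cross-entropy cost and its gradients are the standard ones for logistic regression):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward pass: cost. Backward pass: gradients dw, db."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                       # activations, shape (1, m)
    A_safe = np.clip(A, 1e-15, 1 - 1e-15)          # avoid log(0) when A saturates
    cost = -np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))
    dw = (X @ (A - Y).T) / m                       # dJ/dw, same shape as w
    db = np.mean(A - Y)                            # dJ/db, scalar
    return {"dw": dw, "db": db}, cost

def optimize(w, b, X, Y, num_iterations=2000, learning_rate=0.005):
    """Plain gradient descent: step against the gradient each iteration."""
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)                     # record cost every 100 steps
    return w, b, costs
```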
Now our neural network can be trained on the cat images, but we still need its predictions. Predictions are made by binary classification: if the output of the network is less than or equal to 0.5, meaning the network thinks the image is unlikely to be a cat, the example is classified as “0”; otherwise it is classified as “1”.
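The thresholding step, as a small self-contained sketch (the hand-picked w, b, and X here are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(w, b, X):
    # Threshold the sigmoid output at 0.5: probabilities above 0.5
    # are classified as "cat" (1), everything else as "non-cat" (0).
    A = sigmoid(w.T @ X + b)
    return (A > 0.5).astype(int)

w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, -1.0],
              [0.0, 3.0]])   # two examples, one per column
print(predict(w, b, X))      # [[1 0]]
```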
Step 2: Modeling and evaluating:
We build the model by putting together all the parts we have built; to train and evaluate it, we call the model function.
The model method comprises the initialization method, the optimization method, and the prediction method. I built and trained the neural network with train_set_x, and we get the predictions made by the trained model on test_set_x. By comparing test_set_y with our predictions, I can calculate the accuracy, which serves as the performance measure of the model. The training accuracy is about 99.04%, and the test accuracy is 70%. High accuracy on the training set implies that the model is working and has enough capacity to fit the training data. Moderate accuracy on the test set is acceptable considering the simplicity of the model and the size of the training dataset.
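A condensed sketch of the whole pipeline, with a synthetic two-blob dataset standing in for the flattened cat images (the real data and the exact accuracies above come from the assignment, not from this toy example):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005):
    """Initialize, optimize, predict -- the three parts wired together."""
    m = X_train.shape[1]
    w, b = np.zeros((X_train.shape[0], 1)), 0.0    # initialization
    for _ in range(num_iterations):                # gradient-descent loop
        A = sigmoid(w.T @ X_train + b)             # forward pass
        w -= learning_rate * (X_train @ (A - Y_train).T) / m
        b -= learning_rate * np.mean(A - Y_train)
    pred_train = (sigmoid(w.T @ X_train + b) > 0.5).astype(int)
    pred_test = (sigmoid(w.T @ X_test + b) > 0.5).astype(int)
    return {"train_accuracy": 100 * np.mean(pred_train == Y_train),
            "test_accuracy": 100 * np.mean(pred_test == Y_test)}

# Synthetic stand-in for the flattened cat dataset: two Gaussian blobs.
rng = np.random.default_rng(1)
X_train = np.hstack([rng.normal(-2, 1, (4, 50)), rng.normal(2, 1, (4, 50))])
Y_train = np.hstack([np.zeros((1, 50)), np.ones((1, 50))])
X_test = np.hstack([rng.normal(-2, 1, (4, 10)), rng.normal(2, 1, (4, 10))])
Y_test = np.hstack([np.zeros((1, 10)), np.ones((1, 10))])

d = model(X_train, Y_train, X_test, Y_test, num_iterations=1000, learning_rate=0.1)
print(d)
```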
Step 3: Further analysis:
Let’s plot the learning curve:
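A minimal sketch of the plotting step, assuming matplotlib is available; a quick training run on synthetic data produces the `costs` list, which in the assignment comes out of the optimize/model step:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")               # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Quick training run on synthetic data to produce a costs list.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))
Y = (X[0:1] + X[1:2] > 0).astype(float)
w, b, costs = np.zeros((4, 1)), 0.0, []
for i in range(1000):
    A = np.clip(sigmoid(w.T @ X + b), 1e-15, 1 - 1e-15)
    if i % 100 == 0:                # record the cost every 100 iterations
        costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    w -= 0.1 * (X @ (A - Y).T) / X.shape[1]
    b -= 0.1 * np.mean(A - Y)

# Cost (y-axis) versus iterations (x-axis): the learning curve.
plt.plot(np.arange(len(costs)) * 100, costs)
plt.xlabel("iterations")
plt.ylabel("cost")
plt.title("Learning curve")
plt.savefig("learning_curve.png")
```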
The decreasing cost, which measures the error, tells us that the model is learning to fit the training data. However, more iterations are not always better, because too many iterations can cause overfitting. When we overfit the training data, accuracy on the test data is compromised. In most situations, what we really care about is accurately predicting unseen, unknown, or unlabeled data, so we must take overfitting into consideration when we build machine learning models.
Besides the number of iterations, the learning rate is also a crucial hyperparameter when we build the model. If we try learning rates of [0.01, 0.001, 0.0001], we get the following plot:
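The comparison can be sketched numerically as well (synthetic data stands in for the cat images; on this easy convex problem the larger rates simply converge faster, while the assignment's plot shows the full curves):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, learning_rate, num_iterations=1000):
    """Gradient descent for logistic regression; returns the cost history."""
    w, b, costs = np.zeros((X.shape[0], 1)), 0.0, []
    for i in range(num_iterations):
        A = np.clip(sigmoid(w.T @ X + b), 1e-15, 1 - 1e-15)
        if i % 100 == 0:
            costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
        w -= learning_rate * (X @ (A - Y).T) / X.shape[1]
        b -= learning_rate * np.mean(A - Y)
    return costs

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))
Y = (X[0:1] > 0).astype(float)

# Same iteration budget, three learning rates: compare the final costs.
for lr in [0.01, 0.001, 0.0001]:
    costs = train(X, Y, lr)
    print(f"learning rate {lr}: final cost {costs[-1]:.4f}")
```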
If the learning rate is too big, the cost may oscillate up and down, and it may even diverge. A lower cost does not always mean a better model (it may indicate overfitting). When choosing a learning rate, we should pick one that minimizes the cost well, and use other techniques to reduce overfitting.
(Image source: Coursera)
A current master’s student at WUSTL, Department of Electrical and Systems Engineering.