Gif: Warner Bros. In The Matrix, Neo and company learn awesome skills almost instantly through "transferring"

Transfer Learning: A VGG16-based custom CNN for CIFAR-10 image classification

Rolando Quiroz
8 min read · Sep 29, 2020


Abstract

This blog post showcases transfer learning: a custom convolutional neural network for CIFAR-10 image classification built on top of a VGG16 architecture pre-trained on the ImageNet dataset. The custom network was implemented in TensorFlow and Keras and trained in Google Colab. The training process reaches validation accuracy above 92%.

Introduction

Deep Neural Networks is the name given to new neural network architectures and to the new algorithms used to learn with them. Modern deep learning provides very powerful tools for supervised learning, and in recent years the number of publications in this field has kept rising. A notable case is the ImageNet Visual Recognition Challenge, which featured network architectures that changed the direction of artificial intelligence: new structures and algorithms made it possible to deepen and diversify the layers of convolutional neural networks, giving image recognition models greater flexibility and effectiveness.

Nowadays most learning applications employ the transfer learning approach, which literally builds on previous advances made by the artificial intelligence community through a process of fine-tuning a previously trained model. Everything starts with an existing network, such as AlexNet or GoogLeNet from the ImageNet Visual Recognition Challenge, which is then fed new data containing previously unknown classes. After a few adjustments, the network can perform a new task, e.g. categorizing only dogs or cats instead of 1,000 different objects. This also has the advantage of requiring much less data (thousands of images are processed instead of millions), so computation time drops to hours or minutes.

And that is the route drawn up for this blog: solve CIFAR-10 image classification by training a smaller new model on top of a pre-trained one, specifically a VGG16 that already performs well on ImageNet, and evaluate it on the CIFAR-10 dataset.

Materials and Methods

  • CIFAR-10 Dataset
A set of images from the CIFAR-10 Dataset

In this experiment the CIFAR-10 dataset is used. The Canadian Institute for Advanced Research (CIFAR) developed this standard dataset for computer vision and deep learning. CIFAR-10 consists of 60,000 photos divided into 10 classes (hence the name CIFAR-10). The classes include common objects such as airplanes, cars, birds, cats, etc. The dataset is split in a standard way: 50,000 images are used to train a model and the remaining 10,000 to evaluate its performance. The images have 3 channels (red, green and blue) and are small squares measuring 32 x 32 pixels.
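As a quick illustration (this snippet is not from the original notebook; the variable names are mine), the standard split can be loaded straight from tf.keras.datasets:

```python
import tensorflow as tf

# Standard CIFAR-10 split: 50,000 training and 10,000 test images,
# each a 32 x 32 RGB picture labeled with one of 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)

# One-hot encode the labels for a 10-way softmax classifier.
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
```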

  • VGG16 Model

Researchers from the Oxford Visual Geometry Group, or VGG for short, also participated in the ImageNet Visual Recognition Challenge, and in 2014 the convolutional neural network (CNN) models developed by the VGG achieved top results in the image classification task. Several VGG architectures were proposed, and this work focuses on the one called VGG16. The 16 in VGG16 refers to its 16 layers that have weights. It is a fairly large network, with roughly 138 million parameters.

VGG-16 Model Architecture in detail

One strategy for dealing with deep neural networks is to use networks pre-trained on large datasets and adapt them to the problem of interest.

For this purpose the pre-trained network must have been trained to solve a more general problem, of which our problem can be considered a particular case. For example, to classify the 10 classes and 60,000 images of CIFAR-10 we can use a network trained to classify many more classes, such as VGG16, which handled that task outstandingly on the ImageNet dataset of 1,000 categories and 1.2 million images.

The main reasons why VGG16 was used are the following:

1- It has an architecture that is easy to understand.

2- The model achieves 92.7% top-5 test accuracy in the ImageNet competition (ILSVRC-2014).

3- Of course, the network (model and trained weights) is available in Keras Applications; a sketch of how it can be loaded follows.
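As a hedged sketch (the exact call used in the notebook is not shown in this post), the pre-trained model and its ImageNet weights can be pulled from keras.applications:

```python
from tensorflow.keras.applications import VGG16

# Full VGG16 with its ImageNet classification head: about 138 million parameters.
full_vgg = VGG16(weights='imagenet', include_top=True)
full_vgg.summary()

# For transfer learning only the convolutional base is needed; CIFAR-10's
# 32 x 32 inputs are the smallest size this base accepts in Keras.
base_model = VGG16(weights='imagenet', include_top=False,
                   input_shape=(32, 32, 3))
```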

  • Google Colab
Google Colab logo

Google's own definition of Google Colaboratory, or Google Colab as it is more commonly called, is:

"Google Colab is a free Jupyter Notebook environment that requires no configuration and runs entirely in the cloud"

Google Colab is Google's cloud tool for running Python code and creating machine learning models in the Google cloud, with the ability to use its GPUs.

The main advantage this tool offers is that it frees our machine from work that is too expensive in time and power, or even makes that work possible when our machine lacks sufficient resources. And all for free.

  • What do you do when you have a larger training set like CIFAR-10?

We already know that CIFAR-10 provides 50,000 images to train a model and the remaining 10,000 to evaluate its performance. Given that, it is reasonable to assume the training set is big enough to take the approach Andrew Ng suggests for transfer learning in this case as our starting point.

Andrew Ng: Transfer learning in the larger-training-set scenario

Andrew establishes that some of the beginning layers can be frozen and only the new custom top layers trained; these last few new layers form the task-specific architecture, built from new hidden units plus a final softmax output that matches the requirements of the image classification task at hand. A minimal sketch of that idea is shown below.
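Assuming base_model is the VGG16 convolutional base loaded in the previous snippet, freezing it means its pre-trained layers keep their ImageNet weights and only the new top layers receive gradient updates:

```python
# Freeze every layer of the pre-trained VGG16 base so its ImageNet weights
# stay fixed during training; only the new custom top layers will learn.
for layer in base_model.layers:
    layer.trainable = False
```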

Results

All of the above was implemented with TensorFlow 1.12 and Keras: the VGG16 model without its top classification layer was loaded and frozen, then a simple custom classifier was added, made of a GlobalAveragePooling2D layer, a BatchNormalization operation with default parameters, a Dense layer with 256 units and ReLU activation, a Dropout layer with a rate of 6%, and a final softmax layer for the 10 classes to be classified.

The two models were stacked and built with the Keras Functional API. The model summary can be seen in the next image, and a rough code sketch follows it:

Final model summary
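Putting the pieces above together, a sketch of the build might look like the following (the 256-unit Dense layer and the 6% dropout rate are taken from the description above; base_model is the frozen VGG16 base from the earlier snippets and the variable names are illustrative):

```python
from tensorflow.keras import Input, Model, layers

# Stack the frozen VGG16 base with the custom classifier using the Functional API.
inputs = Input(shape=(32, 32, 3))
x = base_model(inputs)                               # frozen VGG16 feature extractor
x = layers.GlobalAveragePooling2D()(x)               # pool the feature maps
x = layers.BatchNormalization()(x)                   # default-parameterized BatchNorm
x = layers.Dense(256, activation='relu')(x)          # 256 new hidden units
x = layers.Dropout(0.06)(x)                          # dropout rate of 6%
outputs = layers.Dense(10, activation='softmax')(x)  # softmax over the 10 classes

model = Model(inputs, outputs)
model.summary()
```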

The training setup is based on a cross-entropy loss function, the Adam optimizer, and the batch size and epoch hyperparameters that govern the training procedure.

For classification tasks, the cross-entropy loss function is commonly preferred and widely used. This dataset has 10 classes, so cross-entropy is clearly the proper loss to choose.

There are numerous options for the optimizer, e.g. Stochastic Gradient Descent, Adam, RMSProp and others. In this work the Adam optimizer was used with a constant learning rate, for the two main reasons Jason Brownlee notes on his website: the speed at which it trains models, and how easy it is to configure, since the default parameters do well on most problems.

Finally, the hyperparameters batch size and epochs are set to 128 and 20 respectively, following a Google Colab recommendation that eases migration to a TPU runtime environment.
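A hedged sketch of that training setup, reusing the model and the CIFAR-10 arrays from the earlier snippets (the exact preprocessing used in the notebook is not shown here):

```python
from tensorflow.keras.optimizers import Adam

# Cross-entropy loss for the 10-class problem and Adam with its default,
# constant learning rate; batch size 128 and 20 epochs as stated above.
model.compile(optimizer=Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=20,
                    validation_data=(x_test, y_test))

# Final check on the held-out images.
val_loss, val_acc = model.evaluate(x_test, y_test, batch_size=128)
```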

The training results are shown below:

Custom model training process results.
Custom neural net based on VGG16: performance

With the application of the VGG16 architecture, the addition of the new top hidden layers and softmax output, and of course the knowledge transferred from the ImageNet experience to the CIFAR-10 task through the frozen beginning layers, it was possible to classify the dataset with a validation accuracy above 92%. On the other hand, the classification error came down to around 12% on the set of validation images.

Discussion

By building a model on top of a VGG16 trained for ILSVRC-2014, the possibility of applying the knowledge transfer process to achieve results in line with the state of the art was demonstrated. The results presented are above 92% accuracy for the training and validation sets of images.

The model was effective at reaching the target validation accuracy from the first epoch. There is a well-known trend for VGG16-based models that load knowledge from previously saved weight files, especially ImageNet weights, to reach a validation accuracy of 93.56% or more on CIFAR-10 classification, a trend that was happily maintained in this experience.

Considering that strategies such as learning rate decay and data augmentation, which tend to improve results in almost all cases, have not yet been applied to the model, and that simple variations to the top architecture are still possible, the experimental margin is all the more impressive.

The point is that the strategy suggested by Andrew Ng was undoubtedly effective, and it was only enhanced by the high accuracy of VGG16. Using networks that have already learned generalized feature detectors is the key to the exponential future growth of deep learning, as Andrew Ng himself mentioned at NIPS 2016:

“After supervised learning — Transfer Learning will be the next driver of ML commercial success”
