Classifying Caltech 101 categories of images

This is a short article about how to build and train VGG19 on an image classification task. The main purpose of this mini project is to walk through all the verbose parts of an image classification task; it also sets a baseline for any future improvements. The training time is capped at 1 hour on a GPU-equipped machine.

To make the experiment closer to a real-life setting, I opted out of the CIFAR10 dataset, as it has already done some of the data preparation that you do not get in a real image classification task. Also, the image size in CIFAR10 (32x32) is too small for many algorithms. [Caltech 101](http://www.vision.caltech.edu/Image_Datasets/Caltech101/) is a better choice: the dataset has 10 times more image categories, most images are above 200px in each dimension, and they are not retouched.

Keras is a brilliant wrapper library that removes the verbosity of TensorFlow. To make the classification work, you will be doing the following: splitting the dataset into train and validation sets, one-hot encoding the labels, pre-processing the images, and training and evaluating the network.

Caveats

The dataset has a category called BACKGROUND_google that contains all kinds of images; they do not belong to any single category. Please remove this directory manually before training, or it’s your loss ;)

Train/Validation split

Caltech 101 does not come with a train/validation split, so I used the script below to split the dataset into two parts.

The file hierarchy should look like this:

101_ObjectCategories/
   train/
        rabbit/
            image..1
            image..2
            image..3
        dog/
            image..1
            image..2
   val/
        rabbit/
            image..1
            image..2
            image..3
        dog/
            image..1
            image..2

import os
import shutil
from itertools import compress

import numpy as np

base_path = os.getcwd()
data_path = os.path.join(base_path, "data/101_ObjectCategories/train")
categories = os.listdir(data_path)
test_path = os.path.join(base_path, "data/101_ObjectCategories/val")
for cat in categories:
    image_files = os.listdir(os.path.join(data_path, cat))
    # Move roughly 15% of each category into the validation set
    choices = np.random.choice([0, 1], size=(len(image_files),), p=[.85, .15])
    files_to_move = compress(image_files, choices)

    for _f in files_to_move:
        origin_path = os.path.join(data_path, cat, _f)
        dest_dir = os.path.join(test_path, cat)
        dest_path = os.path.join(dest_dir, _f)
        os.makedirs(dest_dir, exist_ok=True)
        shutil.move(origin_path, dest_path)
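To sanity-check the split logic without touching the real dataset, the same loop can be run against a throwaway directory tree of empty files (a self-contained sketch; the category names and file counts are made up):

```python
import os
import shutil
import tempfile
from itertools import compress

import numpy as np

base = tempfile.mkdtemp()
data_path = os.path.join(base, "train")
test_path = os.path.join(base, "val")

# Build a fake dataset: two categories with 100 empty files each
for cat in ["rabbit", "dog"]:
    os.makedirs(os.path.join(data_path, cat))
    for i in range(100):
        open(os.path.join(data_path, cat, "img_%d.jpg" % i), "w").close()

np.random.seed(0)  # make the split reproducible
for cat in os.listdir(data_path):
    image_files = os.listdir(os.path.join(data_path, cat))
    choices = np.random.choice([0, 1], size=(len(image_files),), p=[.85, .15])
    for _f in compress(image_files, choices):
        dest_dir = os.path.join(test_path, cat)
        os.makedirs(dest_dir, exist_ok=True)
        shutil.move(os.path.join(data_path, cat, _f), os.path.join(dest_dir, _f))

total_train = sum(len(os.listdir(os.path.join(data_path, c))) for c in ["rabbit", "dog"])
total_val = sum(len(os.listdir(os.path.join(test_path, c))) for c in ["rabbit", "dog"])
print(total_train, total_val)  # every file ends up in exactly one split
```

No file is duplicated or lost: the two counts always sum to 200, with roughly 15% landing in `val/`.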
        

Load the labels

The labels need to be one-hot encoded for the categorical cross-entropy cost function to work. Scikit-learn has LabelBinarizer, which does the job for us.

from sklearn.preprocessing import LabelBinarizer

categories = os.listdir('data/101_ObjectCategories/train')
categories_one_hot = LabelBinarizer().fit_transform(categories)
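On a small made-up label list, you can see what LabelBinarizer produces: one column per class, ordered alphabetically, with a single 1 per row:

```python
from sklearn.preprocessing import LabelBinarizer

labels = ["rabbit", "dog", "ant", "rabbit"]
one_hot = LabelBinarizer().fit_transform(labels)
print(one_hot.shape)  # (4, 3): four samples, three classes (ant, dog, rabbit)
print(one_hot[2])     # [1 0 0] -> "ant" is the first class alphabetically
```

Note that with only two distinct classes LabelBinarizer collapses to a single 0/1 column, so this shape only holds for three or more categories (which is always the case with Caltech 101).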

Image pre-processing

There are three typical pre-processing techniques: mean subtraction, normalisation, and data whitening.

Both mean subtraction and normalisation are used to centre the data around zero mean. This typically helps the network learn faster, since the gradients act uniformly for each channel.

Mean subtraction can be implemented as

X -= np.mean(X)
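For images it is common to subtract a separate mean per colour channel instead of one global mean; a minimal sketch, assuming a hypothetical batch of RGB images in NHWC layout:

```python
import numpy as np

# Hypothetical batch: 8 RGB images, 224x224, channels last (NHWC)
X = np.random.rand(8, 224, 224, 3).astype("float32")
X -= X.mean(axis=(0, 1, 2))  # subtract each colour channel's own mean
```

After this, each of the three channels is centred around zero independently.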

Normalisation scales the data so every dimension sits at approximately the same level. The VGG19 network used here includes multiple normalisation layers, so it is not required in the pre-processing step.

Results

The full source code is pushed to this repository

After 50 minutes of training, I got the following result.

>>> print('Total {0} incorrectly classified images out of {1}'.format(len(incorrect_indexes), len(v19_results)))
>>> print('Accuracy: {0}'.format(accuracy_score(actual, v19_predicted)))

Total 304 incorrectly classified images out of 1263
Accuracy: 0.7593032462391133
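The two numbers are consistent: the accuracy is simply the fraction of validation images classified correctly.

```python
total, incorrect = 1263, 304
accuracy = (total - incorrect) / total
print(round(accuracy, 4))  # 0.7593
```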

[Plots: training loss, training accuracy, and validation accuracy]

Improvements

Let’s have a look at which images were wrongly classified.

import os
from PIL import Image
from IPython.display import display

def plot_incorrect_images(predicted, actual):
    # Show every validation image whose prediction disagrees with its true label
    for i, (_actual, _predicted) in enumerate(zip(actual, predicted)):
        if _actual != _predicted:
            image_file_name = val_generator.filenames[i]
            im = Image.open(os.path.join(val_data_dir, image_file_name))
            im.thumbnail((128, 128), Image.ANTIALIAS)
            display(im)
            print('Predicted: {0}\nActual   : {1}'.format(key_to_label[_predicted], key_to_label[_actual]))
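`val_generator`, `val_data_dir`, and `key_to_label` are assumed to come from the training notebook. With a Keras generator, `key_to_label` can be built by inverting its `class_indices` mapping; a sketch with illustrative values:

```python
# Hypothetical subset of val_generator.class_indices (label -> integer index)
class_indices = {"Faces": 0, "Faces_easy": 1, "Leopards": 2}

# Invert it so a predicted class index maps back to a readable label
key_to_label = {v: k for k, v in class_indices.items()}
print(key_to_label[1])  # Faces_easy
```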

Predicted: Faces_easy
Actual   : Faces

Predicted: Faces_easy
Actual   : Faces

Predicted: wild_cat
Actual   : Leopards

Predicted: pigeon
Actual   : Motorbikes

Predicted: chandelier
Actual   : airplanes

Predicted: helicopter
Actual   : airplanes

Predicted: llama
Actual   : barrel

Predicted: pigeon
Actual   : barrel

Predicted: okapi
Actual   : barrel

Predicted: water_lilly
Actual   : bass

Predicted: cannon
Actual   : bass

Predicted: hawksbill
Actual   : bass

Predicted: electric_guitar
Actual   : bass

Predicted: crocodile_head
Actual   : bass

Predicted: platypus
Actual   : beaver

Predicted: hedgehog
Actual   : beaver

Predicted: Faces
Actual   : beaver

Predicted: bass
Actual   : bonsai

Predicted: nautilus
Actual   : brain

Predicted: hedgehog
Actual   : brain

Predicted: pizza
Actual   : brain

Predicted: cougar_body
Actual   : brain

Predicted: lobster
Actual   : brain

The end