# Classifying Caltech 101 categories of images

This is a short article about building and training VGG19 on an image classification task. The main purpose of this mini project is to walk through all the verbose parts of an image classification task; it also sets a baseline for any future improvements on image classification. The training time is capped at 1 hour on a GPU-equipped machine.

To make the experiment closer to a real-life setting, I opted against the CIFAR10 dataset, as it has already had some amount of data preparation done that you do not get in a real image classification task. Also, the image size in CIFAR10 (32x32) is too small for many algorithms. [Caltech 101](http://www.vision.caltech.edu/Image_Datasets/Caltech101/) is a better choice. The dataset has 10 times more image categories, most images are above 200px in each dimension, and they are not retouched.

Keras is a brilliant wrapper library that removes the verbosity of TensorFlow. To make the classification work, you will be doing the following:

• Prepare the train/validation data split
• Image preprocessing
  • Scale the pixel values down to the 0 to 1 range
  • Convert the images to BGR format
  • Apply some sort of image augmentation
• Build the VGG19 model
• Train the model
• Analyse the model performance
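The two image-preprocessing steps above can be sketched with plain NumPy (a sketch, assuming images arrive as RGB uint8 arrays):

```python
import numpy as np

def preprocess(image):
    """Scale uint8 pixels to the [0, 1] range and reorder RGB channels to BGR."""
    x = image.astype("float32") / 255.0  # 0-255 -> 0-1
    return x[..., ::-1]                  # RGB -> BGR channel flip

rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)  # a single red pixel
bgr = preprocess(rgb)
# the red value now sits in the last channel
```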

### Caveats

The dataset has a category called BACKGROUND_google that contains all kinds of images; they do not belong to a single category. Please remove this directory manually, or it’s your loss ;)

### Train/Validation split

Caltech 101 does not come pre-split, so I used the script below to split the dataset into two parts.

The file hierarchy should look like the one below.

```
101_ObjectCategories/
    train/
        rabbit/
            image..1
            image..2
            image..3
        dog/
            image..1
            image..2
    val/
        rabbit/
            image..1
            image..2
            image..3
        dog/
            image..1
            image..2
```


```python
import os
import shutil
from itertools import compress

import numpy as np

base_path = os.getcwd()
data_path = os.path.join(base_path, "data/101_ObjectCategories/train")
test_path = os.path.join(base_path, "data/101_ObjectCategories/val")
categories = os.listdir(data_path)

for cat in categories:
    image_files = os.listdir(os.path.join(data_path, cat))
    # move roughly 15% of each category into the validation set
    choices = np.random.choice([0, 1], size=(len(image_files),), p=[.85, .15])
    files_to_move = compress(image_files, choices)

    for _f in files_to_move:
        origin_path = os.path.join(data_path, cat, _f)
        dest_dir = os.path.join(test_path, cat)
        dest_path = os.path.join(dest_dir, _f)
        if not os.path.isdir(dest_dir):
            os.makedirs(dest_dir)
        shutil.move(origin_path, dest_path)
```



The labels need to be one-hot encoded for the categorical cross-entropy cost function to work. Scikit-learn has LabelBinarizer, which does the job for us.

```python
from sklearn.preprocessing import LabelBinarizer

categories = os.listdir('data/101_ObjectCategories/train')
categories_one_hot = LabelBinarizer().fit_transform(categories)
```
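If you prefer to avoid the scikit-learn dependency, the same encoding is a few lines of NumPy (a sketch, assuming the category names are unique strings):

```python
import numpy as np

def one_hot(labels):
    """One-hot encode a list of unique category names, sorted alphabetically."""
    index = {label: i for i, label in enumerate(sorted(labels))}
    encoded = np.zeros((len(labels), len(labels)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1
    return encoded

codes = one_hot(['dog', 'rabbit', 'bass'])
# each row has exactly one 1, in the column assigned to that label
```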


### Image pre-processing

There are typically three pre-processing techniques: mean subtraction, normalisation and data whitening.

Both mean subtraction and normalisation are used to centre the data around a zero mean. This typically helps the network learn faster since gradients act uniformly for each channel.

Mean subtraction can be implemented as

```python
X -= np.mean(X)
```
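Subtracting a single global mean works, but it is also common to subtract a per-channel mean so each colour channel is centred independently. A minimal sketch, assuming X is a batch of images with shape (N, H, W, 3):

```python
import numpy as np

# a toy batch: 4 images, 8x8 pixels, 3 channels
X = np.random.rand(4, 8, 8, 3).astype("float32")

# mean over batch, height and width -> one mean per channel
channel_mean = X.mean(axis=(0, 1, 2), keepdims=True)
X -= channel_mean
# each channel is now centred around zero
```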

Normalisation scales the data so each dimension is on approximately the same level. The VGG19 network uses multiple normalisation layers, so it is not required in the pre-processing step.

### Results

The full source code is pushed to this repository.

After 50 minutes of training, I got the following result.

```python
>>> print('Total {0} incorrectly classified images out of {1}'.format(len(incorrect_indexes), len(v19_results)))
Total 304 incorrectly classified images out of 1263
>>> print('Accuracy: {0}'.format(accuracy_score(actual, v19_predicted)))
Accuracy: 0.7593032462391133
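As a sanity check, the reported accuracy is consistent with the error count:

```python
# 304 errors out of 1263 validation images
correct = 1263 - 304
accuracy = correct / 1263
# agrees with the accuracy_score output, about 0.7593
```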



### Improvements

• More training data. Caltech 256 has over 30k images, which should help achieve a much higher score, but it also means it may take days to train.
• More image augmentation. This is to combat a defect in the CNN architecture: a CNN does not model the spatial relationship between features, and MaxPooling only conveys the presence of a feature. More intense image augmentation helps mitigate some of the issue by generating the images at more view angles.
• Test with other models (Inception, ResNet etc.)

Let’s have a look at which images are wrongly classified.

```python
import os

from IPython.display import display
from PIL import Image

def plot_incorrect_images(predicted, actual):
    for i, (_actual, _predicted) in enumerate(zip(actual, predicted)):
        if not _actual == _predicted:
            image_file_name = val_generator.filenames[i]
            im = Image.open(os.path.join(val_data_dir, image_file_name))
            im.thumbnail((128, 128), Image.ANTIALIAS)
            display(im)
            print('Predicted: {0}\nActual   : {1}'.format(key_to_label[_predicted], key_to_label[_actual]))
```


The thumbnails are omitted here; the predicted vs. actual labels were:

| Predicted       | Actual     |
| --------------- | ---------- |
| Faces_easy      | Faces      |
| Faces_easy      | Faces      |
| wild_cat        | Leopards   |
| pigeon          | Motorbikes |
| chandelier      | airplanes  |
| helicopter      | airplanes  |
| llama           | barrel     |
| pigeon          | barrel     |
| okapi           | barrel     |
| water_lilly     | bass       |
| cannon          | bass       |
| hawksbill       | bass       |
| electric_guitar | bass       |
| crocodile_head  | bass       |
| platypus        | beaver     |
| hedgehog        | beaver     |
| Faces           | beaver     |
| bass            | bonsai     |
| nautilus        | brain      |
| hedgehog        | brain      |
| pizza           | brain      |
| cougar_body     | brain      |
| lobster         | brain      |


The end