This project is designed to test your current knowledge by applying a CNN to the natural images dataset on Kaggle. This dataset contains 6,899 images across 8 distinct classes: airplane, car, cat, dog, flower, fruit, motorbike, and person.
Your goal is to develop a CNN model that accurately classifies new images. Using only the knowledge you’ve gained thus far, and repurposing code from previous modules, you should be able to obtain an accuracy of approximately 90% or higher.
Good luck!
Depending on your approach, you may need to load additional libraries.
library(keras)
library(ggplot2)
library(glue)
We have already downloaded and organized the images into train, validation, and test directories.
# define the directories:
image_dir <- here::here("materials", "data", "natural_images")
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
As previously mentioned, there are 8 total classes, each with a fairly proportional number of train, validation, and test images:
classes <- list.files(train_dir)
total_train <- 0
total_valid <- 0
total_test <- 0
for (class in classes) {
  # how many images in each class
  n_train <- length(list.files(file.path(train_dir, class)))
  n_valid <- length(list.files(file.path(valid_dir, class)))
  n_test <- length(list.files(file.path(test_dir, class)))
  cat(toupper(class), ": ",
      "train (", n_train, "), ",
      "valid (", n_valid, "), ",
      "test (", n_test, ")", "\n", sep = "")
  # tally up totals
  total_train <- total_train + n_train
  total_valid <- total_valid + n_valid
  total_test <- total_test + n_test
}
AIRPLANE: train (436), valid (145), test (146)
CAR: train (580), valid (193), test (195)
CAT: train (531), valid (177), test (177)
DOG: train (421), valid (140), test (141)
FLOWER: train (505), valid (168), test (170)
FRUIT: train (600), valid (200), test (200)
MOTORBIKE: train (472), valid (157), test (159)
PERSON: train (591), valid (197), test (198)
cat("\n", "total training images: ", total_train, "\n",
"total validation images: ", total_valid, "\n",
"total test images: ", total_test, sep = "")
total training images: 4136
total validation images: 1377
total test images: 1386
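To confirm that the splits really are proportional, here is a quick sketch that prints each class’s train/validation/test percentages using the same directory-listing approach as above; every class should land near 60/20/20:
# per-class split percentages (roughly 60/20/20 across the board)
for (class in classes) {
  n <- c(
    length(list.files(file.path(train_dir, class))),
    length(list.files(file.path(valid_dir, class))),
    length(list.files(file.path(test_dir, class)))
  )
  cat(toupper(class), ": ", paste0(round(100 * n / sum(n)), "%", collapse = " / "), "\n", sep = "")
}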
Let’s check out the first image from each class:
op <- par(mfrow = c(2, 4), mar = c(0.5, 0.2, 1, 0.2))
for (class in classes) {
  # plot the first training image in this class
  image_path <- list.files(file.path(train_dir, class), full.names = TRUE)[[1]]
  plot(as.raster(jpeg::readJPEG(image_path)))
  title(main = class)
}
par(op)
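Note that the raw images come in a variety of sizes, which is why we resize everything to 150 x 150 when importing the data further down. You can spot-check this with a sketch like the following:
# sketch: pixel dimensions (height, width, channels) of two raw images
paths <- list.files(file.path(train_dir, "cat"), full.names = TRUE)[1:2]
lapply(paths, function(p) dim(jpeg::readJPEG(p)))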
There are two approaches you could take to model these data: train a CNN end-to-end, or leverage transfer learning. Both are demonstrated below.
To train a CNN end-to-end, you could use the exact same architecture we applied in the Cats vs. Dogs notebook, which would get you about 89-90% accuracy. Here, I use a larger-capacity model, which will take a while to train (~2 hours without a GPU). Also note that we need to use a softmax activation function and the categorical crossentropy loss function, since we are dealing with a multi-class classification problem.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 512, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = length(classes), activation = "softmax")
summary(model)
Model: "sequential_4"
________________________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================================
conv2d_16 (Conv2D) (None, 148, 148, 64) 1792
________________________________________________________________________________________________
max_pooling2d_16 (MaxPooling2D) (None, 74, 74, 64) 0
________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 72, 72, 128) 73856
________________________________________________________________________________________________
max_pooling2d_17 (MaxPooling2D) (None, 36, 36, 128) 0
________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 34, 34, 256) 295168
________________________________________________________________________________________________
max_pooling2d_18 (MaxPooling2D) (None, 17, 17, 256) 0
________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 15, 15, 512) 1180160
________________________________________________________________________________________________
max_pooling2d_19 (MaxPooling2D) (None, 7, 7, 512) 0
________________________________________________________________________________________________
flatten_4 (Flatten) (None, 25088) 0
________________________________________________________________________________________________
dropout_4 (Dropout) (None, 25088) 0
________________________________________________________________________________________________
dense_8 (Dense) (None, 512) 12845568
________________________________________________________________________________________________
dense_9 (Dense) (None, 8) 4104
================================================================================================
Total params: 14,400,648
Trainable params: 14,400,648
Non-trainable params: 0
________________________________________________________________________________________________
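As a sanity check on the parameter counts above, a conv layer has kernel height x kernel width x input channels x filters weights, plus one bias per filter. For the first layer:
# 3 x 3 kernels over 3 input channels, with 64 filters (plus 64 biases)
3 * 3 * 3 * 64 + 64  # = 1792, matching conv2d_16 above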
When compiling the model, the default, or a slightly lower, learning rate is sufficient. In this example I use the default, but in the model training step I also apply a callback to reduce the learning rate once our loss has plateaued.
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = "accuracy"
)
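If you wanted to try the slightly lower learning rate mentioned above, you could pass the optimizer explicitly rather than by name. A sketch (keras’s RMSprop default learning rate is 0.001, so the value below is just an illustrative choice):
# alternative: compile with an explicitly lowered learning rate
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 5e-4),
  metrics = "accuracy"
)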
Next, I need to use image_data_generator and flow_images_from_directory to import and transform our images into tensors. In this example I set class_mode = "categorical", since we are working with a multi-class problem, and I apply data augmentation to the training data only.
# only augment training data
train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE
)
# do not augment test and validation data
test_datagen <- image_data_generator(rescale = 1/255)
# generate batches of data from training directory
train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "categorical"
)
# generate batches of data from validation directory
validation_generator <- flow_images_from_directory(
  valid_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "categorical"
)
Now we can train our model; 50 epochs should be plenty. We also need to supply steps_per_epoch and validation_steps, which are just the sizes of the training and validation data divided by the batch size. Lastly, I add a callback to reduce the learning rate after 3 epochs of no improvement in the loss, and another to stop training altogether after 7 epochs of no improvement.
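For reference, here is what those step counts work out to with our image totals and a batch size of 32:
# steps = ceiling(number of images / batch size)
ceiling(total_train / 32)  # 4136 / 32 -> 130 training steps per epoch
ceiling(total_valid / 32)  # 1377 / 32 -> 44 validation steps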
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = ceiling(total_train / 32),
  epochs = 50,
  validation_data = validation_generator,
  validation_steps = ceiling(total_valid / 32),
  callbacks = list(
    callback_reduce_lr_on_plateau(patience = 3),
    callback_early_stopping(patience = 7)
  )
)
Our loss is optimized after 21 epochs, at which point the model achieves 94% validation accuracy!
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
Our optimal loss is 0.157 with an accuracy of 0.943
plot(history) +
  scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
An alternative approach is to use transfer learning, as we did in the transfer learning notebook. In this example, we will use the feature extraction approach to transfer learning, with the VGG16 model as our convolutional base.
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)
summary(conv_base)
Model: "vgg16"
________________________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================================
input_1 (InputLayer) [(None, 150, 150, 3)] 0
________________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 150, 150, 64) 1792
________________________________________________________________________________________________
block1_conv2 (Conv2D) (None, 150, 150, 64) 36928
________________________________________________________________________________________________
block1_pool (MaxPooling2D) (None, 75, 75, 64) 0
________________________________________________________________________________________________
block2_conv1 (Conv2D) (None, 75, 75, 128) 73856
________________________________________________________________________________________________
block2_conv2 (Conv2D) (None, 75, 75, 128) 147584
________________________________________________________________________________________________
block2_pool (MaxPooling2D) (None, 37, 37, 128) 0
________________________________________________________________________________________________
block3_conv1 (Conv2D) (None, 37, 37, 256) 295168
________________________________________________________________________________________________
block3_conv2 (Conv2D) (None, 37, 37, 256) 590080
________________________________________________________________________________________________
block3_conv3 (Conv2D) (None, 37, 37, 256) 590080
________________________________________________________________________________________________
block3_pool (MaxPooling2D) (None, 18, 18, 256) 0
________________________________________________________________________________________________
block4_conv1 (Conv2D) (None, 18, 18, 512) 1180160
________________________________________________________________________________________________
block4_conv2 (Conv2D) (None, 18, 18, 512) 2359808
________________________________________________________________________________________________
block4_conv3 (Conv2D) (None, 18, 18, 512) 2359808
________________________________________________________________________________________________
block4_pool (MaxPooling2D) (None, 9, 9, 512) 0
________________________________________________________________________________________________
block5_conv1 (Conv2D) (None, 9, 9, 512) 2359808
________________________________________________________________________________________________
block5_conv2 (Conv2D) (None, 9, 9, 512) 2359808
________________________________________________________________________________________________
block5_conv3 (Conv2D) (None, 9, 9, 512) 2359808
________________________________________________________________________________________________
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
================================================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
________________________________________________________________________________________________
Next, we use the exact same code to extract the features as we did in the transfer learning notebook, with one exception: note how I add a shuffle parameter to the extract_features() function. I leave this set to TRUE for the training and validation sets, but I do not shuffle the test set. This will allow me to visualize the misclassified images later on.
datagen <- image_data_generator(rescale = 1/255)
batch_size <- 32
extract_features <- function(directory, sample_count, shuffle = TRUE) {
  # pre-allocate arrays for the VGG16 feature maps and one-hot labels
  features <- array(0, dim = c(sample_count, 4, 4, 512))
  labels <- array(0, dim = c(sample_count, length(classes)))
  generator <- flow_images_from_directory(
    directory = directory,
    generator = datagen,
    target_size = c(150, 150),
    batch_size = batch_size,
    class_mode = "categorical",
    shuffle = shuffle
  )
  i <- 0
  while (TRUE) {
    cat("Processing batch", i + 1, "of", ceiling(sample_count / batch_size), "\n")
    batch <- generator_next(generator)
    inputs_batch <- batch[[1]]
    labels_batch <- batch[[2]]
    # run the batch through the convolutional base to extract its features
    features_batch <- conv_base %>% predict(inputs_batch)
    index_range <- ((i * batch_size) + 1):((i + 1) * batch_size)
    features[index_range,,,] <- features_batch
    labels[index_range, ] <- labels_batch
    i <- i + 1
    # stop once sample_count images have been processed
    if (i * batch_size >= sample_count) break
  }
  list(
    features = features,
    labels = labels
  )
}
# sample counts are whole multiples of the batch size so every batch is full;
# e.g., 32 * 129 = 4128 of the 4136 training images are used
train <- extract_features(train_dir, 32*129)
validation <- extract_features(valid_dir, 32*43)
test <- extract_features(test_dir, 32*43, shuffle = FALSE)
The extracted features form a 4D tensor of shape (samples, 4, 4, 512), which we can see in the last layer of our conv_base model above (block5_pool (MaxPooling2D)). Consequently, we need to reshape (flatten) them into a 2D tensor of shape (samples, 4 * 4 * 512 = 8192) to feed into a densely connected classifier.
reshape_features <- function(features) {
  array_reshape(features, dim = c(nrow(features), 4 * 4 * 512))
}
train$features <- reshape_features(train$features)
validation$features <- reshape_features(validation$features)
test$features <- reshape_features(test$features)
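As a quick sanity check, the reshaped training features should now be a matrix with 32 * 129 = 4128 rows and 4 * 4 * 512 = 8192 columns:
dim(train$features)
# [1] 4128 8192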
Now we can build our classifier model. Again, we use the same code as we applied in the transfer learning notebook; however, since we have a multi-class problem we need to change the number of units and the activation function in the last layer.
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", input_shape = ncol(train$features)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 8, activation = "softmax")
summary(model)
Model: "sequential_1"
________________________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================================
dense_2 (Dense) (None, 256) 2097408
________________________________________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
________________________________________________________________________________________________
dense_3 (Dense) (None, 8) 2056
================================================================================================
Total params: 2,099,464
Trainable params: 2,099,464
Non-trainable params: 0
________________________________________________________________________________________________
We can now compile and train:
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 0.0001),
  metrics = "accuracy"
)
history_pretrained <- model %>% fit(
  train$features, train$labels,
  epochs = 50,
  batch_size = 32,
  validation_data = list(validation$features, validation$labels),
  callbacks = list(
    callback_reduce_lr_on_plateau(patience = 3),
    callback_early_stopping(patience = 7)
  )
)
Our model trains quickly; our optimal loss is roughly 70% lower than that of the end-to-end CNN model, and our accuracy increases by 4 percentage points to 98.3%!
best_epoch <- which.min(history_pretrained$metrics$val_loss)
best_loss <- history_pretrained$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history_pretrained$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
Our optimal loss is 0.046 with an accuracy of 0.983
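A quick check of that comparison against the end-to-end results reported earlier:
(0.157 - 0.046) / 0.157  # ~0.71, i.e., roughly 70% lower loss
0.983 - 0.943            # 0.04, i.e., 4 percentage points more accuracy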
plot(history_pretrained) +
  scale_x_continuous(limits = c(0, length(history_pretrained$metrics$val_loss)))
Let’s see how well our pretrained model performs on the test set; the following shows nearly 98% accuracy.
model %>% evaluate(test$features, test$labels, verbose = FALSE)
$loss
[1] 0.06186189
$accuracy
[1] 0.9774709
The following code identifies the misclassified predictions and the images associated with them.
# predict_classes() returns 0-based class indices, so add 1 to align with
# R's 1-based indexing
predictions <- model %>% predict_classes(test$features, verbose = FALSE) + 1
# recover the actual class index from the one-hot labels
actuals <- max.col(test$labels)
misclassified <- which(predictions != actuals)
actual_class <- list.files(test_dir)[actuals[misclassified]]
predicted_class <- list.files(test_dir)[predictions[misclassified]]
# with shuffle = FALSE, file order matches the generator's batch order
misclassified_img <- list.files(test_dir, recursive = TRUE, full.names = TRUE)[misclassified]
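As a quick consistency check (a sketch using the objects created above), the misclassification count should agree with the accuracy reported by evaluate():
1 - length(misclassified) / nrow(test$features)  # ~0.977 on the 1376 extracted test samples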
Now we can look at the images that were misclassified. Note that most of the misclassified images were cats and dogs.
table(actual_class)
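# For a fuller picture than the one-way count above, a sketch: cross-tabulate
# predicted vs. actual class for just the misclassified images
table(Predicted = predicted_class, Actual = actual_class)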
op <- par(
  mfrow = c(ceiling(length(misclassified) / 3), 3),
  mar = c(2, 0.2, 2.3, 0.2),
  pty = "s"
)
for (i in seq_along(misclassified_img)) {
  img <- misclassified_img[i]
  plot(as.raster(jpeg::readJPEG(img)))
  title(main = glue("Predicted: {predicted_class[i]}\n Actual: {actual_class[i]}"))
}
par(op)