In this example, we are going to apply a CNN to classify images of dogs vs. cats. This will walk you through the fundamentals of importing images, applying image augmentation, and performing classification on them.
Learning objectives:
- Import and preprocess image data for deep learning
- Apply image augmentation to expand a small training set
- Build and train a CNN to perform binary image classification
library(keras)
library(glue)
library(tidyverse)
We are going to use the Dogs vs. Cats Kaggle competition data set (https://www.kaggle.com/c/dogs-vs-cats/data). However, due to size and runtime limitations, we are going to use only a subset of the data. We have already set up the directories, which look like:
data
└── dogs-vs-cats
    ├── train
    │   ├── cats
    │   │   ├── cat.1.jpg
    │   │   ├── cat.2.jpg
    │   │   └── ...
    │   └── dogs
    │       ├── dog.1.jpg
    │       ├── dog.2.jpg
    │       └── ...
    ├── validation
    │   ├── cats
    │   └── dogs
    └── test
        ├── cats
        └── dogs
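If you need to recreate this layout yourself, the following is a minimal sketch. It assumes the full Kaggle archive has been extracted to a hypothetical original_dir folder; the split sizes mirror the subset used below (1,000 training, 500 validation, and 500 test images per class).
# a sketch for building the subset; `original_dir` is a hypothetical
# path to the extracted Kaggle training images
original_dir <- "~/Downloads/dogs-vs-cats/train"
base_dir <- here::here("materials", "data", "dogs-vs-cats")
# helper: create <split>/<class> and copy a range of images into it
copy_subset <- function(split, class, idx) {
  dest <- file.path(base_dir, split, paste0(class, "s"))
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  fnames <- paste0(class, ".", idx, ".jpg")
  file.copy(file.path(original_dir, fnames), file.path(dest, fnames))
}
for (class in c("cat", "dog")) {
  copy_subset("train",      class, 1:1000)
  copy_subset("validation", class, 1001:1500)
  copy_subset("test",       class, 1501:2000)
}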
# define the directories:
if (stringr::str_detect(here::here(), "conf-2020-user")) {
image_dir <- "/home/conf-2020-user/data/dogs-vs-cats"
} else {
image_dir <- here::here("materials", "data", "dogs-vs-cats")
}
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
# create train, validation, and test file paths for cat images
train_cats_dir <- file.path(train_dir, "cats")
valid_cats_dir <- file.path(valid_dir, "cats")
test_cats_dir <- file.path(test_dir, "cats")
# create train, validation, and test file paths for dog images
train_dogs_dir <- file.path(train_dir, "dogs")
valid_dogs_dir <- file.path(valid_dir, "dogs")
test_dogs_dir <- file.path(test_dir, "dogs")
Although there are 25,000 images in this data set, we are going to use a very small subset, which includes:
glue("Cat images:",
" - total training cat images: {length(list.files(train_cats_dir))}",
" - total training cat images: {length(list.files(train_cats_dir))}",
" - total test cat images: {length(list.files(test_cats_dir))}",
"\n",
"Dog images:",
" - total training dog images: {length(list.files(train_dogs_dir))}",
" - total validation dog images: {length(list.files(valid_dogs_dir))}",
" - total test dog images: {length(list.files(test_dogs_dir))}",
.sep = "\n"
)
Cat images:
- total training cat images: 1000
- total validation cat images: 500
- total test cat images: 500
Dog images:
- total training dog images: 1000
- total validation dog images: 500
- total test dog images: 500
Let’s check out the first 10 cat and dog images:
op <- par(mfrow = c(4, 5), pty = "s", mar = c(0.1, 0.1, 0.1, 0.1))
for (i in 1:10) {
plot(as.raster(jpeg::readJPEG(paste0(train_cats_dir, "/cat.", i, ".jpg"))))
plot(as.raster(jpeg::readJPEG(paste0(train_dogs_dir, "/dog.", i, ".jpg"))))
}
par(op)
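Note that the raw JPEGs come in a variety of sizes, which is why we resize everything to a common 150x150 when we import the images later. A quick check (a sketch; the exact dimensions will vary by image):
# raw images vary in size; inspect the pixel dimensions of a few
for (i in 1:3) {
  img <- jpeg::readJPEG(paste0(train_cats_dir, "/cat.", i, ".jpg"))
  print(dim(img))  # height x width x channels
}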
We’re going to set up a simple CNN model that contains steps you saw in the previous module. This CNN includes:
- four convolutional layers (layer_conv_2d()) with ReLU activations, each followed by max pooling (layer_max_pooling_2d()),
- a flattening layer to convert the 3D feature maps to a 1D vector, and
- two dense layers, ending in a single sigmoid unit for binary classification.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
input_shape = c(150, 150, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_flatten() %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
summary(model)
Model: "sequential_2"
___________________________________________________________________________________________________
Layer (type) Output Shape Param #
===================================================================================================
conv2d_8 (Conv2D) (None, 148, 148, 32) 896
___________________________________________________________________________________________________
max_pooling2d_8 (MaxPooling2D) (None, 74, 74, 32) 0
___________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 72, 72, 64) 18496
___________________________________________________________________________________________________
max_pooling2d_9 (MaxPooling2D) (None, 36, 36, 64) 0
___________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 34, 34, 128) 73856
___________________________________________________________________________________________________
max_pooling2d_10 (MaxPooling2D) (None, 17, 17, 128) 0
___________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 15, 15, 128) 147584
___________________________________________________________________________________________________
max_pooling2d_11 (MaxPooling2D) (None, 7, 7, 128) 0
___________________________________________________________________________________________________
flatten_2 (Flatten) (None, 6272) 0
___________________________________________________________________________________________________
dense_4 (Dense) (None, 512) 3211776
___________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 513
===================================================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
___________________________________________________________________________________________________
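The output shapes above follow directly from the layer arithmetic: a 3x3 convolution with no padding removes two pixels from each spatial dimension, and each 2x2 max pooling halves (with floor division) the remaining size. We can reproduce the progression ourselves; this is just a sketch of the arithmetic, not anything the model needs:
# trace the spatial dimension through the network: 3x3 "valid"
# convolutions subtract 2; 2x2 max pooling floor-divides by 2
size <- 150
for (block in 1:4) {
  size <- size - 2     # conv2d with kernel_size = 3, no padding
  size <- size %/% 2   # max pooling with pool_size = 2
  cat("after block", block, ":", size, "x", size, "\n")
}
# after block 4 : 7 x 7  -> flattened to 7 * 7 * 128 = 6272 units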
Compile the model:
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 0.0001),
metrics = "accuracy"
)
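Since the final layer is a single sigmoid unit, binary crossentropy is the natural loss: it measures how far the predicted probability is from the 0/1 label. A quick sketch of the computation for a single observation:
# binary crossentropy for one observation: -(y*log(p) + (1-y)*log(1-p))
bce <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))
bce(y = 1, p = 0.9)  # confident & correct -> small loss (~0.105)
bce(y = 1, p = 0.1)  # confident & wrong   -> large loss (~2.303)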
Next, we need a process that imports our images and transforms them to tensors that our model can process. We’ll use two functions to perform this process.

image_data_generator() will:
- read the images,
- decode the pixel values,
- convert them to tensors, and
- rescale them.

image_data_generator() provides other capabilities (such as image augmentation) that we’ll look at shortly.

flow_images_from_directory() will:
- import batches of our images,
- apply the image_data_generator(),
- resize the images, and
- infer the training labels from the directory structure.
train_datagen <- image_data_generator(rescale = 1/255)
valid_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(
train_dir,
train_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
validation_generator <- flow_images_from_directory(
valid_dir,
valid_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
If we grab the first batch from the generator, we see that it yields 20 images of 150x150 pixels with three channels (20, 150, 150, 3), along with their binary labels (0, 1).
batch <- generator_next(train_generator)
str(batch)
List of 2
$ : num [1:20, 1:150, 1:150, 1:3] 0.3765 0.4863 0.0196 0.2902 0.0471 ...
$ : num [1:20(1d)] 0 0 0 1 1 0 0 1 1 0 ...
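Which class maps to 0 and which to 1? flow_images_from_directory() assigns class indices alphabetically by folder name, and the generator exposes the mapping:
# class indices are assigned alphabetically by directory name,
# so here cats map to 0 and dogs map to 1
train_generator$class_indices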
To train our model we’ll use fit_generator(), which is the equivalent of fit() for data generators. We provide it our generators for the training and validation data. Plus, we need to specify:

- steps_per_epoch: how many batches to draw from the training generator before declaring an epoch over. Our generator supplies batches of 20 and we have 2,000 training images, so we need 100 steps.
- validation_steps: how many batches to draw from the validation generator. Our generator supplies batches of 20 and we have 1,000 validation images, so we need 50 steps.

Note: we also supply an early stopping callback so training halts once the validation loss stops improving for five consecutive epochs.
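Rather than hard-coding these values, we could derive them from the generators themselves; this is a small convenience sketch that assumes the n and batch_size fields exposed by the underlying Keras iterator:
# derive the step counts from the generators themselves
steps_per_epoch  <- ceiling(train_generator$n / train_generator$batch_size)
validation_steps <- ceiling(validation_generator$n / validation_generator$batch_size)
steps_per_epoch    # 2000 / 20 = 100
validation_steps   # 1000 / 20 = 50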
history <- model %>% fit_generator(
train_generator,
steps_per_epoch = 100,
epochs = 30,
validation_data = validation_generator,
validation_steps = 50,
callbacks = callback_early_stopping(patience = 5)
)
Our first model’s performance is not that bad but definitely has room for improvement.
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
Our optimal loss is 0.559 with an accuracy of 0.721
plot(history) +
scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
One approach to improving performance is to collect more data. Unfortunately, this is not always an option. An alternative is to use image augmentation, which expands the training set by generating randomly transformed versions of the existing images.
datagen <- image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
The following helps to visualize the idea of image augmentation by:
- grabbing the first cat image from the training directory,
- resizing and reshaping it to the (1, 150, 150, 3) tensor shape the generator expects,
- feeding it through the augmentation generator one image at a time, and
- plotting 10 randomly augmented versions of that same image.
# get the first cat image
fnames <- list.files(train_cats_dir, full.names = TRUE)
img_path <- fnames[[1]]
# resize & reshape
img <- image_load(img_path, target_size = c(150, 150))
img_array <- image_to_array(img)
img_array <- array_reshape(img_array, c(1, 150, 150, 3))
# create a generator that yields one augmented image at a time
augmentation_generator <- flow_images_from_data(
img_array,
generator = datagen,
batch_size = 1
)
# plot 10 augmented images of the first cat image
op <- par(mfrow = c(2, 5), pty = "s", mar = c(0, 0.1, 0, 0.1))
for (i in 1:10) {
batch <- generator_next(augmentation_generator)
plot(as.raster(batch[1,,,]))
}
par(op)
Let’s create a new model that includes image augmentation, and we’ll apply the dropout regularization method. The following creates a CNN architecture with:
- the same four convolutional + max pooling blocks as before,
- a dropout layer (rate = 0.5) after flattening to help combat overfitting, and
- the same dense classifier ending in a single sigmoid unit.

All of which you are familiar with by now.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu", input_shape = c(150, 150, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_flatten() %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 0.0001),
metrics = "accuracy"
)
Now we can add image augmentation to our image_data_generator(). The rest of the inputs remain the same.

Note: we only augment the training data; the validation and test data are only rescaled.
# only augment training data
train_datagen <- image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE
)
# do not augment test and validation data
test_datagen <- image_data_generator(rescale = 1/255)
# generate batches of data from training directory
train_generator <- flow_images_from_directory(
train_dir,
train_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
# generate batches of data from validation directory
validation_generator <- flow_images_from_directory(
valid_dir,
test_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
# train model
history <- model %>%
fit_generator(
train_generator,
steps_per_epoch = 100,
epochs = 100,
validation_data = validation_generator,
validation_steps = 50,
callbacks = callback_early_stopping(patience = 10)
)
As you can see, using image augmentation helps to improve our model’s performance. In fact, with a little more patience in our early stopping callback we might even be able to nudge out a little more loss reduction.
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
Our optimal loss is 0.439 with an accuracy of 0.795
plot(history) +
scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
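Before saving the model, we can also gauge how it generalizes to the held-out test set. Here is a sketch using the same rescaling-only generator from above; evaluate_generator() returns the loss and accuracy:
# evaluate on the held-out test images (1,000 total, batches of 20)
test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)
model %>% evaluate_generator(test_generator, steps = 50)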
We can always save our models as HDF5 (.h5) files. Let’s save this model, as we will use it in one of the “extras” notebooks to illustrate how we can visualize what CNNs learn (see this notebook: https://rstudio-conf-2020.github.io/dl-keras-tf/notebooks/visualizing-what-cnns-learn.nb.html).
model %>% save_model_hdf5("cats_and_dogs_small_1.h5")
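Reloading it later is a one-liner; the returned model includes the architecture, weights, and compile configuration:
# restore the saved model (architecture + weights + compile config)
model <- load_model_hdf5("cats_and_dogs_small_1.h5")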
However, we still have room for improvement because we are only using a small subset of the available data. We have two options to improve our model:

1. Use more data. We are only using 2,000 of the 25,000 available images. However, this would have a significant impact on compute time.
2. Use transfer learning. This is much quicker than the first option, so in the next module I demonstrate how to use transfer learning for CNNs.