In this module, we are going to use a pretrained CNN model to perform image classification on our dogs vs. cats images.
Learning objectives:
- Why using pretrained models can be efficient and effective.
- How to perform feature extraction with pretrained models.
- How to fine-tune a pretrained model and train it end-to-end.
Requirements
library(keras)
Data
We are working with the same dogs and cats images as before.
# define the directories:
if (stringr::str_detect(here::here(), "conf-2020-user")) {
  image_dir <- "/home/conf-2020-user/data/dogs-vs-cats"
} else {
  image_dir <- here::here("materials", "data", "dogs-vs-cats")
}
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
Transfer Learning
There are two main ways to apply a pretrained CNN to a new image classification problem:
- Feature extraction: Use the convolutional base to do feature engineering on our images and then feed the results into a new densely connected classifier.
  - Most efficient
  - Does not require GPUs
  - Does not “personalize” feature extraction to the problem at hand
  - Likely leaves room for improvement
- Fine-tune a pretrained model and run end-to-end: Build a full sequential model with the convolutional base and a new densely connected classifier, then train the entire model with some or all of the convolutional base layers frozen.
  - Computationally demanding
  - Often requires GPUs
  - Tweaks pretrained convolution layers to extract problem-specific features
  - Maximizes performance
Transfer Learning: End-to-End
⚠️⚠️ ONLY RUN ON GPU!! ⚠️⚠️
The feature extraction approach performed pretty well. However, we can see that we are still overfitting, which may be reducing model performance. An alternative approach is to run a pretrained model end-to-end. This approach is much slower and more computationally intensive; however, it offers greater flexibility in using and adjusting the pretrained model because it lets you:
- use data augmentation to decrease overfitting (and usually increase model performance).
- fine tune parts of the pretrained model.
The following approach simply plugs the pretrained convolutional base into a sequential model and freezes the base's weights.
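The chunks that follow assume a conv_base object is already in the workspace from the earlier feature-extraction step. As a minimal sketch, assuming the base is VGG16 pretrained on ImageNet (an assumption suggested by the block3_conv1 layer name used later) and 150 x 150 RGB inputs, it could be created as shown here. build_conv_base() is a hypothetical helper name, not something defined in this module.

```r
# Hypothetical helper: builds a pretrained convolutional base.
# Assumption: VGG16 with ImageNet weights; the first call downloads the weights.
build_conv_base <- function(input_shape = c(150, 150, 3)) {
  keras::application_vgg16(
    weights = "imagenet",      # reuse weights learned on ImageNet
    include_top = FALSE,       # drop the original 1000-class classifier
    input_shape = input_shape  # height, width, channels of the input images
  )
}

# Usage (requires keras and a working backend):
# conv_base <- build_conv_base()
```

The input shape should agree with the target_size passed to the image generators, which is 150 x 150 in this module.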
Combining a densely-connected neural network with the convolutional base
In this case we can literally plug our conv_base into our model architecture.
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
model
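To get a feel for the size of the new classifier, its parameter counts can be worked out by hand. The sketch below assumes conv_base is VGG16 with 150 x 150 inputs, so that its output is a 4 x 4 x 512 feature map (an assumption, since conv_base is defined elsewhere). The variable and function names are illustrative only.

```r
# Assumed output shape of the convolutional base (VGG16, 150 x 150 input):
# five 2 x 2 max-pooling steps take 150 -> 75 -> 37 -> 18 -> 9 -> 4.
conv_output_shape <- c(4, 4, 512)

# layer_flatten() turns the feature map into one long vector.
flat_units <- prod(conv_output_shape)

# A dense layer has (inputs * units) weights plus one bias per unit.
dense_params <- function(inputs, units) inputs * units + units

hidden_params <- dense_params(flat_units, 256)
output_params <- dense_params(256, 1)

cat("Flattened features:", flat_units, "\n")
cat("Hidden dense layer parameters:", hidden_params, "\n")
cat("Output layer parameters:", output_params, "\n")
```

Under these assumptions the new classifier adds roughly 2.1 million parameters, almost all of them in the first dense layer.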
Freezing layers
Before you compile and train the model, it’s important to freeze the convolutional base weights. This prevents the weights from being updated during training. If you don’t do this then the representations found in the pretrained model will be modified and, potentially, completely destroyed.
cat(length(model$trainable_weights), "trainable weight tensors before freezing.\n")
freeze_weights(conv_base)
cat(length(model$trainable_weights), "trainable weight tensors after freezing.\n")
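The two counts printed above can be sanity-checked by hand. This is a rough sketch, assuming conv_base is VGG16 (13 convolutional layers) and that every convolutional and dense layer holds two weight tensors, a kernel and a bias. Pooling and flatten layers carry no weights.

```r
weights_per_layer <- 2      # kernel + bias
conv_layers_in_base <- 13   # assumption: VGG16
dense_layers_on_top <- 2    # the two layer_dense() calls in the model

# Before freezing, every layer with weights is trainable.
before_freezing <- (conv_layers_in_base + dense_layers_on_top) * weights_per_layer

# After freezing the base, only the new dense layers remain trainable.
after_freezing <- dense_layers_on_top * weights_per_layer

cat("Expected before freezing:", before_freezing, "\n")
cat("Expected after freezing:", after_freezing, "\n")
```

If the printed counts differ from these, the base is probably not VGG16 or was already partly frozen.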
Training the model end-to-end with a frozen convolutional base
The following trains the model end-to-end using all CNN logic that you have seen before:
- data augmentation
- image generator
- compile our model
- train our model
train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
test_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)
# valid_dir is the validation directory defined in the Data section
validation_generator <- flow_images_from_directory(
  valid_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)
history2 <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = 50
)
plot(history2)
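The step settings determine how many images the generators supply in each epoch. Here is a quick check of the arithmetic using the values from the training chunk. steps_for_full_pass() is a hypothetical helper for picking a step count when the number of images is different.

```r
batch_size <- 20
steps_per_epoch <- 100
validation_steps <- 50

# Images drawn from each generator per epoch.
train_images_per_epoch <- batch_size * steps_per_epoch
valid_images_per_epoch <- batch_size * validation_steps

# Hypothetical helper: steps needed to see every image once per epoch.
steps_for_full_pass <- function(n_images, batch_size) {
  ceiling(n_images / batch_size)
}

cat("Training images per epoch:", train_images_per_epoch, "\n")
cat("Validation images per epoch:", valid_images_per_epoch, "\n")
```

With these settings each epoch draws 2,000 augmented training images and 1,000 validation images.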
Transfer Learning: Fine Tune
Another widely used technique for using pretrained models is to unfreeze a few of the convolutional base layers and allow their weights to be updated. Recall that the early layers in a CNN identify detailed edges and shapes. Later layers put these edges and shapes together to make higher-order parts of the images we are trying to classify (e.g. cat ears, dog tails).
The more our images deviate from the images used to create the pretrained model, the more likely it is that you will want to retrain the last few layers, which makes the features built from those edges and shapes more relevant to your problem.
To fine-tune a pretrained model you:
- Add your custom network on top of an already-trained base network (executed in the CNN-base-and-classifier code chunk).
- Freeze the base network (executed in the freeze-parameters code chunk).
- Train the part you added (executed in the train-end-to-end code chunk).
- Unfreeze some layers in the base network.
- Jointly train both these layers and the part you added.
We already did steps 1-3. The following executes steps 4 and 5.
Unfreeze some layers in the base network
unfreeze_weights(conv_base, from = "block3_conv1")
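To see what from = "block3_conv1" covers, the sketch below lists layer names following the VGG16 naming scheme (an assumption about conv_base; the input layer is omitted) and marks every layer from block3_conv1 to the end as trainable. It is plain R that mimics the selection, not a call into keras.

```r
# Assumed layer names of the convolutional base (VGG16 naming scheme).
vgg16_layers <- c(
  "block1_conv1", "block1_conv2", "block1_pool",
  "block2_conv1", "block2_conv2", "block2_pool",
  "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
  "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
  "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool"
)

from <- "block3_conv1"

# Every layer at or after `from` is treated as unfrozen.
trainable <- seq_along(vgg16_layers) >= match(from, vgg16_layers)

# Only the convolutional layers carry weights; pooling layers have none.
is_conv <- grepl("conv", vgg16_layers)
unfrozen_conv <- vgg16_layers[trainable & is_conv]

print(unfrozen_conv)
cat(length(unfrozen_conv), "of", sum(is_conv),
    "convolutional layers become trainable.\n")
```

Under this assumption, 9 of the 13 convolutional layers become trainable, while blocks 1 and 2 stay frozen.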
Jointly train both these layers and the part you added
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)
history2 <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 100,
  validation_data = validation_generator,
  validation_steps = 50
)
plot(history2)
