This project is designed to test your current knowledge of applying a CNN to the natural images dataset on Kaggle (https://www.kaggle.com/prasunroy/natural-images). This dataset contains 6,899 images from 8 distinct classes: airplane, car, cat, dog, flower, fruit, motorbike, and person.
Your goal is to develop a CNN model to accurately classify new images. Using only the knowledge you’ve gained thus far, and repurposing code from previous modules, you should be able to obtain an accuracy of approximately 90% or higher.
Good luck!
Package Requirements
Depending on your approach, you may need to load additional libraries.
library(keras)
Part 1: Data Preparation
We have already downloaded and organized the images into train, validation, and test directories.
# define the directories:
image_dir <- here::here("materials", "data", "natural_images")
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
As previously mentioned, there are 8 total classes, each with a fairly proportional number of train, validation, and test images:
classes <- list.files(train_dir)
total_train <- 0
total_valid <- 0
total_test <- 0
for (class in classes) {
# how many images in each class
n_train <- length(list.files(file.path(train_dir, class)))
n_valid <- length(list.files(file.path(valid_dir, class)))
n_test <- length(list.files(file.path(test_dir, class)))
cat(crayon::underline(crayon::red(class)), ": ",
"train (", n_train, "), ",
"valid (", n_valid, "), ",
"test (", n_test, ")", "\n", sep = "")
# tally up totals
total_train <- total_train + n_train
total_valid <- total_valid + n_valid
total_test <- total_test + n_test
}
airplane: train (436), valid (145), test (146)
car: train (580), valid (193), test (195)
cat: train (531), valid (177), test (177)
dog: train (421), valid (140), test (141)
flower: train (505), valid (168), test (170)
fruit: train (600), valid (200), test (200)
motorbike: train (472), valid (157), test (159)
person: train (591), valid (197), test (198)
cat("\n", "total training images: ", total_train, "\n",
"total validation images: ", total_valid, "\n",
"total test images: ", total_test, sep = "")
total training images: 4136
total validation images: 1377
total test images: 1386
Let’s check out the first image from each class:
op <- par(mfrow = c(2, 4), mar = c(0.5, 0.2, 1, 0.2))
for (class in classes) {
image_path <- list.files(file.path(train_dir, class), full.names = TRUE)[[1]]
plot(as.raster(jpeg::readJPEG(image_path)))
title(main = class)
}
par(op)
Part 2: Modeling
There are two approaches you could take to model this data:
- End-to-end trained CNN with your own custom convolutional layers (reference the 04-computer-vision-CNNs/02-cats-vs-dogs.Rmd file). This solution requires less code but takes 1-2 hours to train without GPUs. Using the same model structure as that notebook should net you about 89-90% accuracy.
- Apply a pre-trained model (reference the 04-computer-vision-CNNs/03-transfer-learning.Rmd file). This solution is much quicker to train but requires more code. Using the feature extraction model structure from that notebook should net you about 98% accuracy. A minimal sketch of this feature extraction structure follows.
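To make the second option concrete, here is a minimal, untested sketch of a feature extraction setup. The VGG16 base, the 256-unit dense layer, and the rmsprop learning rate are illustrative assumptions, not the solution notebook's settings.

# sketch only: the VGG16 base and layer sizes are assumptions, not the official solution
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)

# freeze the pre-trained base so only the new classifier layers are trained
freeze_weights(conv_base)

# stack a small classifier on top; 8 softmax units for the 8 classes
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 8, activation = "softmax")

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-4),  # newer keras versions name this learning_rate
  metrics = "accuracy"
)

You would still need image data generators and a fitting loop; the sketch at the end of this section applies equally to this model.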
Leverage your neighbors' knowledge. Here are some things to think about (a rough sketch pulling these pieces together follows this list):
- Architecture & compile steps:
   - This is a multi-class classification problem, so be sure to pick the activation and loss function that align with this type of problem.
   - You will likely not have time to test out multiple models and tune hyperparameters, so you can assume that the default learning rate, or one just slightly smaller, will work sufficiently (lr = 1e-3 or 1e-4).
- Image data generators
   - The images are traditional RGB images with pixel values of 0-255.
   - Resizing the images to 150x150 is sufficient.
   - Be sure to set class_mode = "categorical" in the flow_images_from_directory function.
   - A batch size of 32 is sufficient.
   - Use the train_dir, valid_dir, and test_dir directories defined above.
- Model fitting
   - 50 epochs will be more than enough.
   - Be sure to use an early stopping callback to speed up training time.
   - Using a callback to automatically reduce the learning rate will improve performance.
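Pulling these hints together, here is a rough end-to-end sketch of the first (custom CNN) approach. The filter counts, dense layer size, patience values, and learning rate are illustrative assumptions, and fit_generator()/evaluate_generator() reflect the keras version current at the time of this workshop (newer versions use fit()/evaluate()).

# generators: rescale 0-255 pixel values to 0-1 and stream 150x150 batches
datagen <- image_data_generator(rescale = 1/255)

train_generator <- flow_images_from_directory(
  train_dir, datagen,
  target_size = c(150, 150), batch_size = 32, class_mode = "categorical"
)
valid_generator <- flow_images_from_directory(
  valid_dir, datagen,
  target_size = c(150, 150), batch_size = 32, class_mode = "categorical"
)
test_generator <- flow_images_from_directory(
  test_dir, datagen,
  target_size = c(150, 150), batch_size = 32, class_mode = "categorical"
)

# a small end-to-end CNN; layer sizes are illustrative only
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 8, activation = "softmax")  # 8 classes -> softmax output

# multi-class problem -> categorical crossentropy with a slightly reduced learning rate
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-4),
  metrics = "accuracy"
)

# fit with early stopping and automatic learning-rate reduction
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = ceiling(total_train / 32),
  epochs = 50,
  validation_data = valid_generator,
  validation_steps = ceiling(total_valid / 32),
  callbacks = list(
    callback_early_stopping(patience = 5, restore_best_weights = TRUE),
    callback_reduce_lr_on_plateau(factor = 0.1, patience = 3)
  )
)

# final check on the held-out test set
model %>% evaluate_generator(test_generator, steps = ceiling(total_test / 32))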
Do the best you can by leveraging code from the Cats vs. Dogs and Transfer Learning notebooks. Much of this code will transfer over one-for-one. If you run into problems, it is OK to peek at the solution notebook, but try to make that your last resort.