This project is designed to test your current knowledge of applying a CNN to the natural images dataset on Kaggle. This dataset contains 6,899 images from 8 distinct classes: airplane, car, cat, dog, flower, fruit, motorbike, and person.

Your goal is to develop a CNN model to accurately classify new images. Using only the knowledge you’ve gained thus far, and repurposing code from previous modules, you should be able to obtain an accuracy of approximately 90% or higher.

Good luck!

Package Requirements

Depending on your approach, you may need to load additional libraries.

library(keras)

Part 1: Data Preparation

We have already downloaded and organized the images into train, validation, and test directories.

# define the directories:
image_dir <- here::here("materials", "data", "natural_images")
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")

As previously mentioned, there are 8 total classes, each split across train, validation, and test images in roughly the same proportions (about 60/20/20):

classes <- list.files(train_dir)
total_train <- 0
total_valid <- 0
total_test <- 0

for (class in classes) {
  # how many images in each class
  n_train <- length(list.files(file.path(train_dir, class)))
  n_valid <- length(list.files(file.path(valid_dir, class)))
  n_test <- length(list.files(file.path(test_dir, class)))
  
  cat(crayon::underline(crayon::red(class)), ": ", 
      "train (", n_train, "), ", 
      "valid (", n_valid, "), ", 
      "test (", n_test, ")", "\n", sep = "")
  
  # tally up totals
  total_train <- total_train + n_train
  total_valid <- total_valid + n_valid
  total_test <- total_test + n_test
}
airplane: train (436), valid (145), test (146)
car: train (580), valid (193), test (195)
cat: train (531), valid (177), test (177)
dog: train (421), valid (140), test (141)
flower: train (505), valid (168), test (170)
fruit: train (600), valid (200), test (200)
motorbike: train (472), valid (157), test (159)
person: train (591), valid (197), test (198)
cat("\n", "total training images: ", total_train, "\n",
    "total validation images: ", total_valid, "\n",
    "total test images: ", total_test, sep = "")

total training images: 4136
total validation images: 1377
total test images: 1386
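
To see how evenly each class is split, the counts printed above can be converted into per-class proportions. This is a small base R sketch: the counts are copied by hand from the output above, and the object names (counts, shares) are illustrative only.

```r
# Per-class image counts, copied from the output printed above
counts <- matrix(
  c(436, 145, 146,
    580, 193, 195,
    531, 177, 177,
    421, 140, 141,
    505, 168, 170,
    600, 200, 200,
    472, 157, 159,
    591, 197, 198),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("airplane", "car", "cat", "dog", "flower", "fruit", "motorbike", "person"),
    c("train", "valid", "test")
  )
)

# Share of each class's images that falls into each split (rows sum to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))

# Column totals, for comparison with the totals reported above
print(colSums(counts))
```

Each class lands at roughly 60% train, 20% validation, and 20% test, so no class is over- or under-represented in any one split.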

Let’s check out the first image from each class:

op <- par(mfrow = c(2, 4), mar = c(0.5, 0.2, 1, 0.2))
for (class in classes) {
  image_path <- list.files(file.path(train_dir, class), full.names = TRUE)[[1]]
  plot(as.raster(jpeg::readJPEG(image_path)))
  title(main = class)
}
       
par(op)

Part 2: Modeling

There are two approaches you could take to model this data:

  1. Train a CNN end to end with your own custom convolutional layers (reference the 04-computer-vision-CNNs/02-cats-vs-dogs.Rmd file). This solution requires less code but takes 1-2 hours without GPUs. Using the same model structure as the Cats vs. Dogs notebook should net you about 89-90% accuracy.
  2. Apply a pre-trained model (reference the 04-computer-vision-CNNs/03-transfer-learning.Rmd file). This solution is much quicker to train but requires more code. Using the feature extraction model structure from the Transfer Learning notebook should net you about 98% accuracy.
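
As a rough orientation, the two approaches might be set up along the following lines. This is only a sketch under stated assumptions, not the structure used in either referenced notebook: the helper names (build_cnn, build_transfer_model), the layer sizes, the 150x150 input shape, the rmsprop optimizer at its default learning rate, and the choice of VGG16 as the pre-trained base are all illustrative. The parts that follow directly from the problem are the 8-unit softmax output and the categorical cross-entropy loss, since there are 8 mutually exclusive classes.

```r
library(keras)

# Option 1 (sketch): a small CNN trained end to end.
# Filter counts and dense size are assumptions, not the notebook's architecture.
build_cnn <- function(input_shape = c(150, 150, 3), n_classes = 8) {
  model <- keras_model_sequential() %>%
    layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                  input_shape = input_shape) %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_flatten() %>%
    layer_dense(units = 512, activation = "relu") %>%
    # One output unit per class; softmax turns them into class probabilities
    layer_dense(units = n_classes, activation = "softmax")

  model %>% compile(
    optimizer = "rmsprop",              # default learning rate (assumption)
    loss = "categorical_crossentropy",  # multi-class, one-hot labels
    metrics = "accuracy"
  )
  model
}

# Option 2 (sketch): a frozen pre-trained convolutional base with a new classifier.
# VGG16 with ImageNet weights is an assumed choice of base model.
build_transfer_model <- function(input_shape = c(150, 150, 3), n_classes = 8) {
  conv_base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                                 input_shape = input_shape)
  freeze_weights(conv_base)  # keep the pre-trained filters fixed

  model <- keras_model_sequential() %>%
    conv_base %>%
    layer_flatten() %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_dense(units = n_classes, activation = "softmax")

  model %>% compile(
    optimizer = "rmsprop",
    loss = "categorical_crossentropy",
    metrics = "accuracy"
  )
  model
}
```

Calling build_cnn() returns a compiled model ready for fitting. build_transfer_model() downloads the ImageNet weights the first time it is called.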

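Leverage your neighbors’ knowledge. Here are some things to think about:

  - Architecture and compile steps:
      - This is a multi-class classification problem, so be sure to pick the activation and loss function that align with this type of problem.
      - You will likely not have time to test multiple models and tune hyperparameters, so you can assume that the default learning rate, or one slightly smaller, will work well enough (lr = 1e-3 or 1e-4).
  - Image data generators:
      - The images are traditional RGB images with pixel values of 0-255.
      - Resizing the images to 150x150 is sufficient.
      - Be sure to set class_mode = "categorical" in the flow_images_from_directory function.
      - A batch size of 32 is sufficient.
      - Use the train_dir, valid_dir, and test_dir directories above.
  - Model fitting:
      - 50 epochs will be more than enough and...
      - Be sure to use an early stopping callback to speed up training time.
      - Using a callback to automatically reduce the learning rate will improve performance.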
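
For the data pipeline and training callbacks, a minimal sketch might look like the following. The helper names (make_generator, training_callbacks), the 150x150 target size, the batch size of 32, and the patience and factor values are assumptions chosen for illustration. The rescaling step assumes ordinary RGB images with pixel values of 0-255.

```r
library(keras)

# Assumed settings: 150x150 inputs and batches of 32 images
img_size <- c(150, 150)
batch_size <- 32

# Hypothetical helper: stream images from one of the split directories.
# Pixel values are rescaled from 0-255 to 0-1, and class_mode = "categorical"
# yields one-hot labels, one column per class subdirectory.
make_generator <- function(dir, shuffle = TRUE) {
  flow_images_from_directory(
    directory = dir,
    generator = image_data_generator(rescale = 1 / 255),
    target_size = img_size,
    batch_size = batch_size,
    class_mode = "categorical",
    shuffle = shuffle
  )
}

# Hypothetical helper: stop training once validation loss stops improving,
# and shrink the learning rate when it plateaus. Values are illustrative.
training_callbacks <- function() {
  list(
    callback_early_stopping(monitor = "val_loss", patience = 5,
                            restore_best_weights = TRUE),
    callback_reduce_lr_on_plateau(monitor = "val_loss", factor = 0.2,
                                  patience = 3)
  )
}
```

For example, make_generator(train_dir) and make_generator(valid_dir) would supply the training and validation batches, and training_callbacks() would go to the callbacks argument of whichever fitting call the Cats vs. Dogs notebook uses. With 4,136 training images and a batch size of 32, one pass over the training data takes ceiling(4136 / 32) = 130 steps.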

Do the best you can, leveraging code from the Cats vs. Dogs and Transfer Learning notebooks. Much of this code will transfer over one-for-one. If you run into problems, it is OK to peek at the solution notebook, but try to make this your last resort.
