class: center, middle, inverse, title-slide

# A Couple Case Studies to Get Started
### Brad Boehmke
### 2020-01-27

---
class: clear, center, middle
background-image: url(images/home4sale.jpg)
background-size: cover

---
# Vectorization & standardization

.font120.bold[_All inputs and response values in a neural network must be tensors of either floating-point or integer data._]

<img src="images/vectorization.png" width="960" style="display: block; margin: auto;" />

---
# Vectorization & standardization

.font120.bold[_Moreover, our feature values should not be relatively large compared to the randomized initial weights <u>and</u> all our features should take values in roughly the same range._]

.pull-left[

- Values should not be significantly larger than the initial weights
- Large values trigger large gradient updates that will prevent the network from converging

]

--

.pull-right[

- Option 1:
    - rescale all values to between 0 and 1
    - easy when working with images since all features align to the same range
- Option 2:
    - standardize each feature to have a mean of 0
    - standardize each feature to have a standard deviation of 1
    - common when working with features that have different ranges

.center[_A minimal sketch of both options follows on the next slide._]

]
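---
# Vectorization & standardization

A minimal sketch of both options in base R (illustrative only; `x` is a hypothetical numeric feature):

```r
x <- c(10, 200, 3000, 45000)  # hypothetical feature values

# Option 1: rescale to the 0-1 range (min-max scaling)
x_scaled <- (x - min(x)) / (max(x) - min(x))

# Option 2: standardize to mean 0 and standard deviation 1
x_std <- (x - mean(x)) / sd(x)

# scale() performs the same z-score standardization in one call
x_std_alt <- scale(x)
```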
---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

]

.pull-right[

]

---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

* [_Feature Engineering and Selection: A Practical Approach for Predictive Models_](http://www.feat.engineering/) by Max Kuhn & Kjell Johnson

]

.pull-right[

<img src="https://images.tandf.co.uk/common/jackets/amazon/978113807/9781138079229.jpg" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

* [_Feature Engineering and Selection: A Practical Approach for Predictive Models_](http://www.feat.engineering/) by Max Kuhn & Kjell Johnson

* [_Hands-On Machine Learning with R_](https://bradleyboehmke.github.io/HOML/) by Bradley Boehmke & Brandon Greenwell

]

.pull-right[

<img src="https://bradleyboehmke.github.io/HOML/images/homl-cover.jpg" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
*  step_nzv(all_nominal()) %>%
*  step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* remove any zero-variance (constant) or near-zero-variance categorical features

* collapse any categorical levels that appear in 1% or fewer of the observations into a single "other" level

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
*  step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Vectorization]: convert features that represent ordered quality metrics to numeric values

   - `Overall_Qual` has 10 levels: Very_Poor, Poor, Fair, Below_Average, Average, ..., Very_Excellent
   - Converted to: 1, 2, 3, 4, ..., 10

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
*  step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
*  step_center(all_numeric(), -all_outcomes()) %>%
*  step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Standardizes numeric values]

* Yeo-Johnson normalizes value distributions, minimizing the impact of outliers by shrinking large extreme values

* Centering standardizes features to have a mean of zero

* Scaling standardizes features to have a standard deviation of one

<img src="03-case-studies_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
*  step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Vectorize remaining categorical features]

* One-hot encoding

<img src="https://bradleyboehmke.github.io/HOML/images/ohe-vs-dummy.png" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

<img src="images/recipes.png" width="50%" height="50%" style="display: block; margin: auto;" />

.center[[https://tidymodels.github.io/recipes](https://tidymodels.github.io/recipes/)]

]
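---
# Ames Example

The blueprint above only defines the preprocessing. A minimal sketch of estimating and applying it (assumes `recipes` and `dplyr` are loaded and `ames_train` is available; object names are illustrative):

```r
# estimate each step (e.g., Yeo-Johnson lambdas, means, SDs) from the training data
prepared <- prep(blueprint, training = ames_train)

# apply the estimated transformations to produce the model-ready data
baked_train <- bake(prepared, new_data = ames_train)

# keras expects a numeric feature matrix plus a response vector
x_train <- as.matrix(dplyr::select(baked_train, -Sale_Price))
y_train <- baked_train$Sale_Price
```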
---
# Callbacks

.pull-left[

Training a model can be like flying a paper airplane...

<br><br>

...once you let go you have little control over its trajectory!

]

.pull-right[

<img src="https://media2.giphy.com/media/zMS612WWVzQPu/source.gif" width="80%" height="80%" style="display: block; margin: auto;" />

]

---
# Callbacks

.pull-left.font90[

When training a model, sometimes we want to:

<br>

- automatically stop a model once performance has stopped improving
- dynamically adjust values of certain parameters (e.g., the learning rate)
- log model information to use or visualize later on
- continually save the model during training and save the model with the best performance

.center[_These tasks and others can help control the trajectory of our model._]

]

---
# Callbacks

.pull-left.font90[

When training a model, sometimes we want to:

<br>

- .blue[automatically stop a model once performance has stopped improving]
- .red[dynamically adjust values of certain parameters (e.g., the learning rate)]
- .grey[log model information to use or visualize later on]
- .purple[continually save the model during training and save the model with the best performance]

.center[_These tasks and others can help control the trajectory of our model._]

]

.pull-right.font90[

Callbacks provide a way to control and monitor our model during training (see the sketch on the next slide):

<br>

- .blue[`callback_early_stopping()`]
- .red[`callback_reduce_lr_on_plateau()`]
- .red[`callback_learning_rate_scheduler()`]
- .grey[`callback_csv_logger()`]
- .purple[`callback_model_checkpoint()`]
- and others (`keras::callback_xxx`)

]
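---
# Callbacks

A minimal sketch of supplying callbacks to `fit()` (assumes `model`, `x_train`, and `y_train` already exist; the file path and parameter values are illustrative):

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 100,
  batch_size = 32,
  validation_split = 0.2,
  callbacks = list(
    # stop once validation loss fails to improve for 10 straight epochs
    callback_early_stopping(monitor = "val_loss", patience = 10),
    # cut the learning rate when validation loss plateaus
    callback_reduce_lr_on_plateau(factor = 0.1, patience = 5),
    # keep a copy of the best-performing model seen so far
    callback_model_checkpoint("best_model.h5", save_best_only = TRUE)
  )
)
```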
---
# Controlling the size of your network

.pull-left[

___Model capacity___ is controlled by:

* number of layers (_depth_)
* number of nodes (_width_)

<br><br><br>

.center.bold[Typically we see better performance (accuracy & compute efficiency) by increasing the number of layers more so than the number of nodes]

.center.font80[Rule of <i class="fas fa-thumbs-up"></i> only!]

]

.pull-right[

<img src="images/model_capacity_depth.png" width="90%" height="90%" style="display: block; margin: auto;" /><img src="images/model_capacity_width.png" width="90%" height="90%" style="display: block; margin: auto;" />

]
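---
# Controlling the size of your network

A minimal sketch of how depth and width show up in keras code (assumes `n_features` holds the number of input features; layer sizes are illustrative):

```r
# wider: a single hidden layer with many nodes
wide_model <- keras_model_sequential() %>%
  layer_dense(units = 1024, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 1)

# deeper: several hidden layers with fewer nodes each
deep_model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 1)
```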
---
# Best practice for layers & nodes .red[(for predictive models)]

.pull-left[

Layers are typically:

- Tunnel shaped
- Funnel shaped

For best performance:

- Nodes are powers of 2 (16, 32, 64, 128, etc.)
- Relative to number of inputs (remember, layers condense our features)
- Consistent number of nodes per layer makes tuning easier
- Last hidden layer should always have more nodes than the output layer

]

.pull-right[

<img src="images/model_capacity_depth.png" width="90%" height="90%" style="display: block; margin: auto;" /><img src="images/model_capacity_funnel.png" width="90%" height="90%" style="display: block; margin: auto;" />

]

---
# Ames Housing

.pull-left[

* Single hidden layer with varying # of neurons

```r
## # A tibble: 9 x 3
##   neurons min_loss train_time
##     <dbl>    <dbl>      <dbl>
## 1       4   30.5         4.74
## 2       8   37.5         4.61
## 3      16   18.5         4.66
## 4      32   13.9         4.77
## 5      64    8.77        5.06
## 6     128    6.33        5.20
## 7     256    2.80        5.77
## 8     512    1.11        7.25
*## 9    1024    0.139       8.75
```

]

.pull-right[

* Varying # of hidden layers with 128 neurons per layer

```r
## # A tibble: 8 x 3
##   nlayers min_loss train_time
##     <int>    <dbl>      <dbl>
## 1       1    5.959       6.88
## 2       2    0.824       4.32
## 3       3    0.609       4.32
## 4       4    0.750       4.54
## 5       5    0.665       4.36
## 6       6    0.998       4.43
## 7       7    0.802       4.45
## 8       8    0.603       4.78
```

]

---
class: clear, center, middle
background-image: url(https://www.elitereaders.com/wp-content/uploads/2016/04/worst-movie-reviews-featured.jpg)
background-size: cover

---
# IMDB data set

.pull-left[

<img src="https://www.wikihow.com/images/thumb/4/46/Prepare-a-Review-on-IMDb-Step-6-Version-2.jpg/aid2512841-v4-728px-Prepare-a-Review-on-IMDb-Step-6-Version-2.jpg" style="display: block; margin: auto;" />

]

.pull-right[

* A collection of 50,000 reviews from IMDB, collected on the condition that no movie has more than 30 reviews.

* The numbers of positive and negative reviews are equal.
   - .red[Negative] reviews: score `\(\leq\)` 4 out of 10
   - .green[Positive] reviews: score `\(\geq\)` 7 out of 10
   - Neutral reviews are not included.

* The 50,000 reviews are divided evenly into the training and test set.

]

---
# Vectorizing text

.pull-left[

.bold[Before vectorization] `\(\rightarrow\)` list of integers representing words (decoded back to words here for display; `"?"` marks the pad/start/unknown tokens)

```r
str(train_data)
## List of 25000
##  $ : chr [1:218] "?" "this" "film" "was" ...
##  $ : chr [1:189] "?" "big" "hair" "big" ...
##  $ : chr [1:141] "?" "this" "has" "to" ...
##  $ : chr [1:550] "?" "the" "?" "?" ...
##  $ : chr [1:147] "?" "worst" "mistake" "of" ...
##  $ : chr [1:43] "?" "begins" "better" "than" ...
##  $ : chr [1:123] "?" "lavish" "production" "values" ...
##  $ : chr [1:562] "?" "the" "?" "tells" ...
##  $ : chr [1:233] "?" "just" "got" "out" ...
##  $ : chr [1:130] "?" "this" "movie" "has" ...
##  $ : chr [1:450] "?" "french" "horror" "cinema" ...
##  $ : chr [1:99] "?" "when" "i" "rented" ...
##  $ : chr [1:117] "?" "i" "love" "cheesy" ...
##  $ : chr [1:238] "?" "anyone" "who" "could" ...
##  $ : chr [1:109] "?" "b" "movie" "at" ...
##  $ : chr [1:129] "?" "a" "total" "waste" ...
##  $ : chr [1:163] "?" "laputa" "castle" "in" ...
##  $ : chr [1:752] "?" "at" "the" "height" ...
##  $ : chr [1:212] "?" "i" "have" "only" ...
##  $ : chr [1:177] "?" "chances" "are" "is" ...
##  $ : chr [1:129] "?" "shown" "in" "australia" ...
##  $ : chr [1:140] "?" "despite" "some" "occasionally" ...
##  $ : chr [1:256] "?" "i" "hate" "reading" ...
##  $ : chr [1:888] "?" "the" "problems" "with" ...
##  $ : chr [1:93] "?" "the" "original" "demille" ...
##  $ : chr [1:142] "?" "this" "is" "a" ...
##  $ : chr [1:220] "?" "the" "dvd" "version" ...
##  $ : chr [1:193] "?" "we" "had" "to" ...
##  $ : chr [1:171] "?" "bela" "lugosi" "appeared" ...
##  $ : chr [1:221] "?" "i" "have" "read" ...
##  $ : chr [1:174] "?" "i" "caught" "this" ...
##  $ : chr [1:647] "?" "a" "strong" "woman" ...
##  $ : chr [1:233] "?" "the" "first" "episode" ...
##  $ : chr [1:162] "?" "i" "don't" "know" ...
##  $ : chr [1:597] "?" "many" "have" "stated" ...
##  $ : chr [1:234] "?" "detective" "tony" "rome" ...
##  $ : chr [1:51] "?" "sorry" "i" "just" ...
##  $ : chr [1:336] "?" "on" "24" "october" ...
##  $ : chr [1:139] "?" "as" "with" "many" ...
##  $ : chr [1:231] "?" "this" "so" "called" ...
##  $ : chr [1:704] "?" "this" "was" "the" ...
##  $ : chr [1:142] "?" "this" "film" "was" ...
##  $ : chr [1:861] "?" "warning" "this" "review" ...
##  $ : chr [1:132] "?" "the" "script" "for" ...
##  $ : chr [1:122] "?" "hamlet" "is" "by" ...
##  $ : chr [1:570] "?" "in" "this" "swimming" ...
##  $ : chr [1:55] "?" "police" "story" "is" ...
##  $ : chr [1:214] "?" "sorry" "but" "i" ...
##  $ : chr [1:103] "?" "when" "philo" "vance" ...
##  $ : chr [1:186] "?" "i" "am" "rarely" ...
##  $ : chr [1:113] "?" "i" "actually" "saw" ...
##  $ : chr [1:169] "?" "before" "i" "watched" ...
##  $ : chr [1:469] "?" "this" "low" "grade" ...
##  $ : chr [1:138] "?" "hey" "look" "deal" ...
##  $ : chr [1:302] "?" "any" "movie" "in" ...
##  $ : chr [1:766] "?" "kubrick" "meets" "king" ...
##  $ : chr [1:351] "?" "wow" "i" "can't" ...
##  $ : chr [1:146] "?" "i" "admit" "i" ...
##  $ : chr [1:59] "?" "i" "watched" "the" ...
##  $ : chr [1:206] "?" "without" "question" "this" ...
##  $ : chr [1:107] "?" "i" "saw" "this" ...
##  $ : chr [1:152] "?" "please" "don't" "rent" ...
##  $ : chr [1:186] "?" "i" "found" "this" ...
##  $ : chr [1:431] "?" "the" "sea" "is" ...
##  $ : chr [1:147] "?" "i'm" "probably" "one" ...
##  $ : chr [1:684] "?" "dressed" "to" "kill" ...
##  $ : chr [1:383] "?" "spoilers" "in" "first" ...
##  $ : chr [1:324] "?" "?" "clutter" "wife" ...
##  $ : chr [1:252] "?" "i" "wasn't" "expecting" ...
##  $ : chr [1:263] "?" "if" "you're" "a" ...
##  $ : chr [1:787] "?" "a" "stunning" "realization" ...
##  $ : chr [1:211] "?" "this" "movie" "isn't" ...
##  $ : chr [1:314] "?" "some" "not" "so" ...
##  $ : chr [1:118] "?" "amazing" "effects" "for" ...
##  $ : chr [1:390] "?" "robert" "altman" "shouldn't" ...
##  $ : chr [1:132] "?" "tim" "robbins" "is" ...
##  $ : chr [1:710] "?" "'the" "adventures" "of" ...
##  $ : chr [1:306] "?" "as" "always" "this" ...
##  $ : chr [1:167] "?" "whoever" "wrote" "the" ...
##  $ : chr [1:115] "?" "in" "the" "early" ...
##  $ : chr [1:95] "?" "went" "to" "see" ...
##  $ : chr [1:158] "?" "one" "of" "my" ...
##  $ : chr [1:156] "?" "really" "truly" "?" ...
##  $ : chr [1:82] "?" "the" "biggest" "reason" ...
##  $ : chr [1:502] "?" "this" "film" "screened" ...
##  $ : chr [1:314] "?" "i" "should" "explain" ...
##  $ : chr [1:190] "?" "i" "don't" "often" ...
##  $ : chr [1:174] "?" "my" "comments" "may" ...
##  $ : chr [1:60] "?" "for" "a" "movie" ...
##  $ : chr [1:145] "?" "i" "saw" "this" ...
##  $ : chr [1:214] "?" "in" "fact" "marc" ...
##  $ : chr [1:659] "?" "blood" "legacy" "starts" ...
##  $ : chr [1:408] "?" "i'd" "really" "really" ...
##  $ : chr [1:515] "?" "tashan" "the" "title" ...
##  $ : chr [1:461] "?" "where" "do" "i" ...
##  $ : chr [1:202] "?" "i" "really" "must" ...
"i" "really" "must" ... ## $ : chr [1:238] "?" "tom" "?" "?" ... ## $ : chr [1:170] "?" "not" "that" "much" ... ## $ : chr [1:107] "?" "this" "movie" "is" ... ## [list output truncated] ``` ] .pull-right[ .bold[After vectorization] `\(\rightarrow\)` 2D Tensor of one-hot encoded words ```r train_data_vec[, 1:10] ## pad start unknown the and a of to is br ## [1,] 1 1 0 1 1 1 1 1 1 0 ## [2,] 1 1 0 1 1 1 1 1 1 0 ## [3,] 1 1 0 1 0 1 1 1 1 0 ## [4,] 1 1 0 1 1 1 1 1 1 1 ## [5,] 1 1 0 1 1 1 1 1 0 1 ## [6,] 1 1 0 1 0 0 0 1 0 1 ## [7,] 1 1 0 1 1 1 1 1 1 0 ## [8,] 1 1 0 1 0 1 1 1 1 1 ## [9,] 1 1 0 1 1 1 1 1 1 0 ## [10,] 1 1 0 1 1 1 1 1 1 1 ## [11,] 1 1 0 1 1 1 1 1 1 1 ## [12,] 1 0 0 1 1 1 1 0 1 0 ## [13,] 1 1 0 1 1 1 1 1 1 1 ## [14,] 1 1 0 1 1 1 1 1 1 0 ## [15,] 1 1 0 1 1 1 0 1 1 1 ## [16,] 1 1 0 1 1 1 1 1 0 0 ## [17,] 1 1 0 1 1 1 1 1 1 0 ## [18,] 1 1 0 1 1 1 1 1 1 1 ## [19,] 1 1 0 1 1 1 1 1 1 0 ## [20,] 1 1 0 1 1 1 1 1 1 0 ## [21,] 1 1 0 1 1 1 1 1 1 1 ## [22,] 1 1 0 1 1 1 1 1 1 1 ## [23,] 1 1 0 1 1 1 1 1 1 1 ## [24,] 1 1 0 1 1 1 1 1 1 1 ## [25,] 1 1 0 1 0 1 1 1 1 0 ## [26,] 1 1 0 1 1 1 1 1 1 0 ## [27,] 1 1 0 1 1 1 1 1 1 1 ## [28,] 1 1 0 1 1 1 1 1 1 1 ## [29,] 1 1 0 1 1 1 1 1 1 1 ## [30,] 1 1 0 1 1 1 1 1 1 0 ## [31,] 1 1 0 1 1 1 1 1 1 1 ## [32,] 1 1 0 1 1 1 1 1 1 1 ## [33,] 1 1 0 1 1 1 1 1 1 0 ## [34,] 1 1 0 1 1 1 1 1 1 0 ## [35,] 1 1 0 1 1 1 1 1 1 1 ## [36,] 1 1 0 1 1 1 1 1 1 0 ## [37,] 1 0 0 1 1 1 0 0 0 0 ## [38,] 1 1 0 1 1 1 1 1 1 1 ## [39,] 1 1 0 1 1 1 1 1 1 1 ## [40,] 1 1 0 1 1 1 1 1 1 0 ## [41,] 1 1 0 1 1 1 1 1 1 1 ## [42,] 1 1 0 1 1 1 1 1 1 1 ## [43,] 1 1 0 1 1 1 1 1 1 1 ## [44,] 1 1 0 1 1 1 1 1 1 0 ## [45,] 1 1 0 1 1 1 1 1 1 0 ## [46,] 1 1 0 1 1 1 1 1 1 1 ## [47,] 1 1 0 1 1 0 1 1 1 0 ## [48,] 1 1 0 1 1 1 1 1 1 1 ## [49,] 1 1 0 1 1 1 1 1 1 1 ## [50,] 1 1 0 1 1 1 1 1 1 0 ## [51,] 1 1 0 1 1 1 0 1 0 0 ## [52,] 1 1 0 1 1 1 1 1 1 1 ## [53,] 1 1 0 1 1 1 1 1 1 0 ## [54,] 1 1 0 1 1 1 1 1 1 1 ## [55,] 1 1 0 1 1 1 1 1 1 0 ## [56,] 1 1 0 1 1 1 1 1 1 1 ## [57,] 1 1 0 1 1 1 1 1 1 1 ## [58,] 1 1 0 1 1 1 0 1 1 1 ## [59,] 1 0 0 1 1 1 1 1 1 0 ## [60,] 1 1 0 1 1 0 1 1 1 1 ## [61,] 1 1 0 1 1 1 1 1 0 1 ## [62,] 1 1 0 1 1 1 1 1 1 0 ## [63,] 1 1 0 1 1 1 1 1 1 0 ## [64,] 1 1 0 1 1 1 1 1 1 1 ## [65,] 1 0 0 1 1 1 1 1 1 0 ## [66,] 1 1 0 1 1 1 1 1 1 1 ## [67,] 1 1 0 1 1 1 1 1 1 1 ## [68,] 1 1 0 1 1 1 1 1 1 1 ## [69,] 1 1 0 1 1 1 1 1 1 1 ## [70,] 1 1 0 1 1 1 1 1 1 1 ## [71,] 1 1 0 1 1 1 1 1 1 1 ## [72,] 1 1 0 1 1 1 1 1 1 0 ## [73,] 1 1 0 1 1 1 1 1 1 1 ## [74,] 1 1 0 1 1 1 1 1 1 1 ## [75,] 1 1 0 1 1 1 1 1 1 1 ## [76,] 1 1 0 1 1 1 1 1 1 0 ## [77,] 1 1 0 1 1 1 1 1 1 1 ## [78,] 1 1 0 1 1 1 1 1 1 0 ## [79,] 1 1 0 1 1 1 1 1 1 0 ## [80,] 1 1 0 1 1 1 1 1 1 1 ## [81,] 1 1 0 1 1 1 1 1 1 0 ## [82,] 1 1 0 1 1 1 1 0 1 0 ## [83,] 1 1 0 1 1 1 1 1 0 1 ## [84,] 1 1 0 1 1 1 0 1 0 0 ## [85,] 1 1 0 1 1 1 1 1 1 1 ## [86,] 1 1 0 1 1 1 1 1 1 1 ## [87,] 1 1 0 1 1 1 1 1 1 0 ## [88,] 1 1 0 1 1 1 1 1 1 1 ## [89,] 1 1 0 1 1 1 0 1 0 0 ## [90,] 1 1 0 1 1 1 1 1 1 0 ## [91,] 1 1 0 1 1 1 1 1 1 1 ## [92,] 1 1 0 1 0 1 1 1 1 1 ## [93,] 1 1 0 1 1 1 1 1 1 1 ## [94,] 1 1 0 1 1 1 1 1 1 1 ## [95,] 1 1 0 1 1 1 1 1 1 1 ## [96,] 1 1 0 1 1 1 1 1 1 0 ## [97,] 1 1 0 1 1 1 1 1 1 1 ## [98,] 1 1 0 1 1 1 1 1 1 1 ## [99,] 1 1 0 1 1 1 1 0 1 1 ## [100,] 1 1 0 1 1 1 1 1 1 0 ## [ reached getOption("max.print") -- omitted 24900 rows ] ``` ] --- # Weight regularization * Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)` * Weight regularization: `\(\text{minimize} \left( SSE + P \right)\)` * `\(\text{L}^2\)` norm (aka 
---
# Weight regularization

* Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)`

* Weight regularization: `\(\text{minimize} \left( MSE + P \right)\)`

* `\(\text{L}^2\)` norm (aka _weight decay_): `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} w_j^2 \right)\)`

<br>

<img src="03-case-studies_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---
# Weight regularization

* Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)`

* Weight regularization: `\(\text{minimize} \left( MSE + P \right)\)`

* `\(\text{L}^2\)` norm (aka _weight decay_): `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} w_j^2 \right)\)`

* `\(\text{L}^1\)` norm: `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} | w_j | \right)\)`

<img src="03-case-studies_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---
# Dropout

.pull-left[

* During each training run, each neuron has probability *p* of being dropped (set to 0)

* Forces more nodes to identify and relay the signal in the data

* Helps prevent the model from latching onto happenstance patterns (noise) that are not significant

]

.pull-right[

<img src="images/dropout_illustration.png" width="960" style="display: block; margin: auto;" />

]
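---
# Weight regularization & dropout

A minimal sketch of both techniques in a keras model (assumes `n_inputs` holds the number of input features; the penalty and dropout rates are illustrative):

```r
model <- keras_model_sequential() %>%
  # L2 weight decay penalizes large weights in this layer
  layer_dense(units = 16, activation = "relu", input_shape = n_inputs,
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  # randomly zero out 50% of this layer's outputs during training
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 16, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")
```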
---
# Back home

<br><br><br><br>

[.center[
<i class="fas fa-home fa-10x"></i>
]](https://github.com/rstudio-conf-2020/dl-keras-tf)

.center[https://github.com/rstudio-conf-2020/dl-keras-tf]