class: center, middle, inverse, title-slide

# A Couple Case Studies to Get Started
### Brad Boehmke
### 2020-01-27

---
class: clear, center, middle
background-image: url(images/home4sale.jpg)
background-size: cover

---
# Vectorization & standardization

.font120.bold[_All inputs and response values in a neural network must be tensors of either floating-point or integer data._]

<img src="images/vectorization.png" width="960" style="display: block; margin: auto;" />

---
# Vectorization & standardization

.font120.bold[_Moreover, our feature values should not be relatively large compared to the randomized initial weights <u>and</u> all our features should take values in roughly the same range._]

.pull-left[

- Values should not be significantly larger than the initial weights
- Large values trigger large gradient updates that will prevent the network from converging

]

--

.pull-right[

- Option 1:
    - rescale all values to between 0 and 1
    - easy when working with images since all features align to the same range
- Option 2:
    - standardize each feature to have a mean of 0
    - standardize each feature to have a standard deviation of 1
    - common when working with features that have different ranges

.center[_A minimal sketch of both options follows on the next slide._]

]
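---
# Vectorization & standardization

A minimal sketch of both options in base R (illustrative only; `x` is a hypothetical numeric feature):

```r
x <- c(10, 200, 3000, 45000)  # hypothetical feature values

# Option 1: rescale to the 0-1 range (min-max scaling)
x_scaled <- (x - min(x)) / (max(x) - min(x))

# Option 2: standardize to mean 0 and standard deviation 1
x_std <- (x - mean(x)) / sd(x)

# scale() performs the same z-score standardization in one call
x_std_alt <- scale(x)
```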
---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

]

.pull-right[

]

---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

* [_Feature Engineering and Selection: A Practical Approach for Predictive Models_](http://www.feat.engineering/) by Max Kuhn & Kjell Johnson

]

.pull-right[

<img src="https://images.tandf.co.uk/common/jackets/amazon/978113807/9781138079229.jpg" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
# Feature engineering

.pull-left[

* Many different feature engineering techniques exist to vectorize and standardize our data, plus other useful transformations

* It is a misperception that neural nets do not require feature engineering

* [_Feature Engineering and Selection: A Practical Approach for Predictive Models_](http://www.feat.engineering/) by Max Kuhn & Kjell Johnson

* [_Hands-On Machine Learning with R_](https://bradleyboehmke.github.io/HOML/) by Bradley Boehmke & Brandon Greenwell

]

.pull-right[

<img src="https://bradleyboehmke.github.io/HOML/images/homl-cover.jpg" width="60%" height="60%" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
*  step_nzv(all_nominal()) %>%
*  step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* remove any zero-variance (constant) or near-zero-variance categorical features

* collapse any categorical levels that appear in 1% or fewer of the observations into a single "other" level

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
*  step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Vectorization]: convert features that represent ordered quality metrics to numeric values

   - `Overall_Qual` has 10 levels: Very_Poor, Poor, Fair, Below_Average, Average, ..., Very_Excellent
   - Converted to: 1, 2, 3, 4, ..., 10

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
*  step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
*  step_center(all_numeric(), -all_outcomes()) %>%
*  step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Standardizes numeric values]

* Yeo-Johnson normalizes value distributions, minimizing the impact of outliers by shrinking large extreme values

* Centering standardizes features to have a mean of zero

* Scaling standardizes features to have a standard deviation of one

<img src="03-case-studies_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
*  step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

* .bold[Vectorize remaining categorical features]

* One-hot encoding

<img src="https://bradleyboehmke.github.io/HOML/images/ohe-vs-dummy.png" style="display: block; margin: auto;" />

]

---
# Ames Example

.pull-left.code70[

```r
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
   step_nzv(all_nominal()) %>%
   step_other(all_nominal(), threshold = .01, other = "other") %>%
   step_integer(matches("(Qual|Cond|QC|Qu)$")) %>%
   step_YeoJohnson(all_numeric(), -all_outcomes()) %>%
   step_center(all_numeric(), -all_outcomes()) %>%
   step_scale(all_numeric(), -all_outcomes()) %>%
   step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)
```

]

.pull-right[

<img src="images/recipes.png" width="50%" height="50%" style="display: block; margin: auto;" />

.center[[https://tidymodels.github.io/recipes](https://tidymodels.github.io/recipes/)]

]
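---
# Ames Example

The blueprint above only defines the preprocessing. A minimal sketch of estimating and applying it (assumes `recipes` and `dplyr` are loaded and `ames_train` is available; object names are illustrative):

```r
# estimate each step (e.g., Yeo-Johnson lambdas, means, SDs) from the training data
prepared <- prep(blueprint, training = ames_train)

# apply the estimated transformations to produce the model-ready data
baked_train <- bake(prepared, new_data = ames_train)

# keras expects a numeric feature matrix plus a response vector
x_train <- as.matrix(dplyr::select(baked_train, -Sale_Price))
y_train <- baked_train$Sale_Price
```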
---
# Callbacks

.pull-left[

Training a model can be like flying a paper airplane...

<br><br>

...once you let go you have little control over its trajectory!

]

.pull-right[

<img src="https://media2.giphy.com/media/zMS612WWVzQPu/source.gif" width="80%" height="80%" style="display: block; margin: auto;" />

]

---
# Callbacks

.pull-left.font90[

When training a model, sometimes we want to:

<br>

- automatically stop a model once performance has stopped improving
- dynamically adjust values of certain parameters (e.g., the learning rate)
- log model information to use or visualize later on
- continually save the model during training and save the model with the best performance

.center[_These tasks and others can help control the trajectory of our model._]

]

---
# Callbacks

.pull-left.font90[

When training a model, sometimes we want to:

<br>

- .blue[automatically stop a model once performance has stopped improving]
- .red[dynamically adjust values of certain parameters (e.g., the learning rate)]
- .grey[log model information to use or visualize later on]
- .purple[continually save the model during training and save the model with the best performance]

.center[_These tasks and others can help control the trajectory of our model._]

]

.pull-right.font90[

Callbacks provide a way to control and monitor our model during training (see the sketch on the next slide):

<br>

- .blue[`callback_early_stopping()`]
- .red[`callback_reduce_lr_on_plateau()`]
- .red[`callback_learning_rate_scheduler()`]
- .grey[`callback_csv_logger()`]
- .purple[`callback_model_checkpoint()`]
- and others (`keras::callback_xxx`)

]
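---
# Callbacks

A minimal sketch of supplying callbacks to `fit()` (assumes `model`, `x_train`, and `y_train` already exist; the file path and parameter values are illustrative):

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 100,
  batch_size = 32,
  validation_split = 0.2,
  callbacks = list(
    # stop once validation loss fails to improve for 10 straight epochs
    callback_early_stopping(monitor = "val_loss", patience = 10),
    # cut the learning rate when validation loss plateaus
    callback_reduce_lr_on_plateau(factor = 0.1, patience = 5),
    # keep a copy of the best-performing model seen so far
    callback_model_checkpoint("best_model.h5", save_best_only = TRUE)
  )
)
```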
---
# Controlling the size of your network

.pull-left[

___Model capacity___ is controlled by:

* number of layers (_depth_)
* number of nodes (_width_)

<br><br><br>

.center.bold[Typically we see better performance (accuracy & compute efficiency) by increasing the number of layers more so than the number of nodes]

.center.font80[Rule of <i class="fas fa-thumbs-up"></i> only!]

]

.pull-right[

<img src="images/model_capacity_depth.png" width="90%" height="90%" style="display: block; margin: auto;" /><img src="images/model_capacity_width.png" width="90%" height="90%" style="display: block; margin: auto;" />

]
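---
# Controlling the size of your network

A minimal sketch of how depth and width show up in keras code (assumes `n_features` holds the number of input features; layer sizes are illustrative):

```r
# wider: a single hidden layer with many nodes
wide_model <- keras_model_sequential() %>%
  layer_dense(units = 1024, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 1)

# deeper: several hidden layers with fewer nodes each
deep_model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 1)
```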
---
# Best practice for layers & nodes .red[(for predictive models)]

.pull-left[

Layers are typically:

- Tunnel shaped
- Funnel shaped

For best performance:

- Nodes are powers of 2 (16, 32, 64, 128, etc.)
- Relative to number of inputs (remember, layers condense our features)
- Consistent number of nodes per layer makes tuning easier
- Last hidden layer should always have more nodes than the output layer

]

.pull-right[

<img src="images/model_capacity_depth.png" width="90%" height="90%" style="display: block; margin: auto;" /><img src="images/model_capacity_funnel.png" width="90%" height="90%" style="display: block; margin: auto;" />

]

---
# Ames Housing

.pull-left[

* Single hidden layer with varying # of neurons

```r
## # A tibble: 9 x 3
##   neurons min_loss train_time
##     <dbl>    <dbl>      <dbl>
## 1       4   30.5         4.74
## 2       8   37.5         4.61
## 3      16   18.5         4.66
## 4      32   13.9         4.77
## 5      64    8.77        5.06
## 6     128    6.33        5.20
## 7     256    2.80        5.77
## 8     512    1.11        7.25
*## 9    1024    0.139       8.75
```

]

.pull-right[

* Varying # of hidden layers with 128 neurons per layer

```r
## # A tibble: 8 x 3
##   nlayers min_loss train_time
##     <int>    <dbl>      <dbl>
## 1       1    5.959       6.88
## 2       2    0.824       4.32
## 3       3    0.609       4.32
## 4       4    0.750       4.54
## 5       5    0.665       4.36
## 6       6    0.998       4.43
## 7       7    0.802       4.45
## 8       8    0.603       4.78
```

]

---
class: clear, center, middle
background-image: url(https://www.elitereaders.com/wp-content/uploads/2016/04/worst-movie-reviews-featured.jpg)
background-size: cover

---
# IMDB data set

.pull-left[

<img src="https://www.wikihow.com/images/thumb/4/46/Prepare-a-Review-on-IMDb-Step-6-Version-2.jpg/aid2512841-v4-728px-Prepare-a-Review-on-IMDb-Step-6-Version-2.jpg" style="display: block; margin: auto;" />

]

.pull-right[

* A collection of 50,000 reviews from IMDB, collected on the condition that no movie has more than 30 reviews.

* The numbers of positive and negative reviews are equal.
   - .red[Negative] reviews: score `\(\leq\)` 4 out of 10
   - .green[Positive] reviews: score `\(\geq\)` 7 out of 10
   - Neutral reviews are not included.

* The 50,000 reviews are divided evenly into the training and test set.

]

---
# Vectorizing text

.pull-left[

.bold[Before vectorization] `\(\rightarrow\)` list of integers representing words (decoded back to words here for display; `"?"` marks the pad/start/unknown tokens)

```r
str(train_data)
## List of 25000
##  $ : chr [1:218] "?" "this" "film" "was" ...
##  $ : chr [1:189] "?" "big" "hair" "big" ...
##  $ : chr [1:141] "?" "this" "has" "to" ...
##  $ : chr [1:550] "?" "the" "?" "?" ...
##  $ : chr [1:147] "?" "worst" "mistake" "of" ...
##  $ : chr [1:43] "?" "begins" "better" "than" ...
##  $ : chr [1:123] "?" "lavish" "production" "values" ...
##  $ : chr [1:562] "?" "the" "?" "tells" ...
##  $ : chr [1:233] "?" "just" "got" "out" ...
##  $ : chr [1:130] "?" "this" "movie" "has" ...
##  $ : chr [1:450] "?" "french" "horror" "cinema" ...
##  $ : chr [1:99] "?" "when" "i" "rented" ...
##  $ : chr [1:117] "?" "i" "love" "cheesy" ...
##  $ : chr [1:238] "?" "anyone" "who" "could" ...
##  $ : chr [1:109] "?" "b" "movie" "at" ...
##  $ : chr [1:129] "?" "a" "total" "waste" ...
##  $ : chr [1:163] "?" "laputa" "castle" "in" ...
##  $ : chr [1:752] "?" "at" "the" "height" ...
##  $ : chr [1:212] "?" "i" "have" "only" ...
##  $ : chr [1:177] "?" "chances" "are" "is" ...
##  $ : chr [1:129] "?" "shown" "in" "australia" ...
##  $ : chr [1:140] "?" "despite" "some" "occasionally" ...
##  $ : chr [1:256] "?" "i" "hate" "reading" ...
##  $ : chr [1:888] "?" "the" "problems" "with" ...
##  $ : chr [1:93] "?" "the" "original" "demille" ...
##  $ : chr [1:142] "?" "this" "is" "a" ...
##  $ : chr [1:220] "?" "the" "dvd" "version" ...
##  $ : chr [1:193] "?" "we" "had" "to" ...
##  $ : chr [1:171] "?" "bela" "lugosi" "appeared" ...
##  $ : chr [1:221] "?" "i" "have" "read" ...
##  $ : chr [1:174] "?" "i" "caught" "this" ...
##  $ : chr [1:647] "?" "a" "strong" "woman" ...
##  $ : chr [1:233] "?" "the" "first" "episode" ...
##  $ : chr [1:162] "?" "i" "don't" "know" ...
##  $ : chr [1:597] "?" "many" "have" "stated" ...
##  $ : chr [1:234] "?" "detective" "tony" "rome" ...
##  $ : chr [1:51] "?" "sorry" "i" "just" ...
##  $ : chr [1:336] "?" "on" "24" "october" ...
##  $ : chr [1:139] "?" "as" "with" "many" ...
##  $ : chr [1:231] "?" "this" "so" "called" ...
##  $ : chr [1:704] "?" "this" "was" "the" ...
##  $ : chr [1:142] "?" "this" "film" "was" ...
##  $ : chr [1:861] "?" "warning" "this" "review" ...
##  $ : chr [1:132] "?" "the" "script" "for" ...
##  $ : chr [1:122] "?" "hamlet" "is" "by" ...
##  $ : chr [1:570] "?" "in" "this" "swimming" ...
##  $ : chr [1:55] "?" "police" "story" "is" ...
##  $ : chr [1:214] "?" "sorry" "but" "i" ...
##  $ : chr [1:103] "?" "when" "philo" "vance" ...
##  $ : chr [1:186] "?" "i" "am" "rarely" ...
##  $ : chr [1:113] "?" "i" "actually" "saw" ...
##  $ : chr [1:169] "?" "before" "i" "watched" ...
##  $ : chr [1:469] "?" "this" "low" "grade" ...
##  $ : chr [1:138] "?" "hey" "look" "deal" ...
##  $ : chr [1:302] "?" "any" "movie" "in" ...
##  $ : chr [1:766] "?" "kubrick" "meets" "king" ...
##  $ : chr [1:351] "?" "wow" "i" "can't" ...
##  $ : chr [1:146] "?" "i" "admit" "i" ...
##  $ : chr [1:59] "?" "i" "watched" "the" ...
##  $ : chr [1:206] "?" "without" "question" "this" ...
##  $ : chr [1:107] "?" "i" "saw" "this" ...
##  $ : chr [1:152] "?" "please" "don't" "rent" ...
##  $ : chr [1:186] "?" "i" "found" "this" ...
##  $ : chr [1:431] "?" "the" "sea" "is" ...
##  $ : chr [1:147] "?" "i'm" "probably" "one" ...
##  $ : chr [1:684] "?" "dressed" "to" "kill" ...
##  $ : chr [1:383] "?" "spoilers" "in" "first" ...
##  $ : chr [1:324] "?" "?" "clutter" "wife" ...
##  $ : chr [1:252] "?" "i" "wasn't" "expecting" ...
##  $ : chr [1:263] "?" "if" "you're" "a" ...
##  $ : chr [1:787] "?" "a" "stunning" "realization" ...
##  $ : chr [1:211] "?" "this" "movie" "isn't" ...
##  $ : chr [1:314] "?" "some" "not" "so" ...
##  $ : chr [1:118] "?" "amazing" "effects" "for" ...
##  $ : chr [1:390] "?" "robert" "altman" "shouldn't" ...
##  $ : chr [1:132] "?" "tim" "robbins" "is" ...
##  $ : chr [1:710] "?" "'the" "adventures" "of" ...
##  $ : chr [1:306] "?" "as" "always" "this" ...
##  $ : chr [1:167] "?" "whoever" "wrote" "the" ...
##  $ : chr [1:115] "?" "in" "the" "early" ...
##  $ : chr [1:95] "?" "went" "to" "see" ...
##  $ : chr [1:158] "?" "one" "of" "my" ...
##  $ : chr [1:156] "?" "really" "truly" "?" ...
##  $ : chr [1:82] "?" "the" "biggest" "reason" ...
##  $ : chr [1:502] "?" "this" "film" "screened" ...
##  $ : chr [1:314] "?" "i" "should" "explain" ...
##  $ : chr [1:190] "?" "i" "don't" "often" ...
##  $ : chr [1:174] "?" "my" "comments" "may" ...
##  $ : chr [1:60] "?" "for" "a" "movie" ...
##  $ : chr [1:145] "?" "i" "saw" "this" ...
##  $ : chr [1:214] "?" "in" "fact" "marc" ...
##  $ : chr [1:659] "?" "blood" "legacy" "starts" ...
##  $ : chr [1:408] "?" "i'd" "really" "really" ...
##  $ : chr [1:515] "?" "tashan" "the" "title" ...
##  $ : chr [1:461] "?" "where" "do" "i" ...
##  $ : chr [1:202] "?" "i" "really" "must" ...
"i" "really" "must" ... ## $ : chr [1:238] "?" "tom" "?" "?" ... ## $ : chr [1:170] "?" "not" "that" "much" ... ## $ : chr [1:107] "?" "this" "movie" "is" ... ## [list output truncated] ``` ] .pull-right[ .bold[After vectorization] `\(\rightarrow\)` 2D Tensor of one-hot encoded words ```r train_data_vec[, 1:10] ## pad start unknown the and a of to is br ## [1,] 1 1 0 1 1 1 1 1 1 0 ## [2,] 1 1 0 1 1 1 1 1 1 0 ## [3,] 1 1 0 1 0 1 1 1 1 0 ## [4,] 1 1 0 1 1 1 1 1 1 1 ## [5,] 1 1 0 1 1 1 1 1 0 1 ## [6,] 1 1 0 1 0 0 0 1 0 1 ## [7,] 1 1 0 1 1 1 1 1 1 0 ## [8,] 1 1 0 1 0 1 1 1 1 1 ## [9,] 1 1 0 1 1 1 1 1 1 0 ## [10,] 1 1 0 1 1 1 1 1 1 1 ## [11,] 1 1 0 1 1 1 1 1 1 1 ## [12,] 1 0 0 1 1 1 1 0 1 0 ## [13,] 1 1 0 1 1 1 1 1 1 1 ## [14,] 1 1 0 1 1 1 1 1 1 0 ## [15,] 1 1 0 1 1 1 0 1 1 1 ## [16,] 1 1 0 1 1 1 1 1 0 0 ## [17,] 1 1 0 1 1 1 1 1 1 0 ## [18,] 1 1 0 1 1 1 1 1 1 1 ## [19,] 1 1 0 1 1 1 1 1 1 0 ## [20,] 1 1 0 1 1 1 1 1 1 0 ## [21,] 1 1 0 1 1 1 1 1 1 1 ## [22,] 1 1 0 1 1 1 1 1 1 1 ## [23,] 1 1 0 1 1 1 1 1 1 1 ## [24,] 1 1 0 1 1 1 1 1 1 1 ## [25,] 1 1 0 1 0 1 1 1 1 0 ## [26,] 1 1 0 1 1 1 1 1 1 0 ## [27,] 1 1 0 1 1 1 1 1 1 1 ## [28,] 1 1 0 1 1 1 1 1 1 1 ## [29,] 1 1 0 1 1 1 1 1 1 1 ## [30,] 1 1 0 1 1 1 1 1 1 0 ## [31,] 1 1 0 1 1 1 1 1 1 1 ## [32,] 1 1 0 1 1 1 1 1 1 1 ## [33,] 1 1 0 1 1 1 1 1 1 0 ## [34,] 1 1 0 1 1 1 1 1 1 0 ## [35,] 1 1 0 1 1 1 1 1 1 1 ## [36,] 1 1 0 1 1 1 1 1 1 0 ## [37,] 1 0 0 1 1 1 0 0 0 0 ## [38,] 1 1 0 1 1 1 1 1 1 1 ## [39,] 1 1 0 1 1 1 1 1 1 1 ## [40,] 1 1 0 1 1 1 1 1 1 0 ## [41,] 1 1 0 1 1 1 1 1 1 1 ## [42,] 1 1 0 1 1 1 1 1 1 1 ## [43,] 1 1 0 1 1 1 1 1 1 1 ## [44,] 1 1 0 1 1 1 1 1 1 0 ## [45,] 1 1 0 1 1 1 1 1 1 0 ## [46,] 1 1 0 1 1 1 1 1 1 1 ## [47,] 1 1 0 1 1 0 1 1 1 0 ## [48,] 1 1 0 1 1 1 1 1 1 1 ## [49,] 1 1 0 1 1 1 1 1 1 1 ## [50,] 1 1 0 1 1 1 1 1 1 0 ## [51,] 1 1 0 1 1 1 0 1 0 0 ## [52,] 1 1 0 1 1 1 1 1 1 1 ## [53,] 1 1 0 1 1 1 1 1 1 0 ## [54,] 1 1 0 1 1 1 1 1 1 1 ## [55,] 1 1 0 1 1 1 1 1 1 0 ## [56,] 1 1 0 1 1 1 1 1 1 1 ## [57,] 1 1 0 1 1 1 1 1 1 1 ## [58,] 1 1 0 1 1 1 0 1 1 1 ## [59,] 1 0 0 1 1 1 1 1 1 0 ## [60,] 1 1 0 1 1 0 1 1 1 1 ## [61,] 1 1 0 1 1 1 1 1 0 1 ## [62,] 1 1 0 1 1 1 1 1 1 0 ## [63,] 1 1 0 1 1 1 1 1 1 0 ## [64,] 1 1 0 1 1 1 1 1 1 1 ## [65,] 1 0 0 1 1 1 1 1 1 0 ## [66,] 1 1 0 1 1 1 1 1 1 1 ## [67,] 1 1 0 1 1 1 1 1 1 1 ## [68,] 1 1 0 1 1 1 1 1 1 1 ## [69,] 1 1 0 1 1 1 1 1 1 1 ## [70,] 1 1 0 1 1 1 1 1 1 1 ## [71,] 1 1 0 1 1 1 1 1 1 1 ## [72,] 1 1 0 1 1 1 1 1 1 0 ## [73,] 1 1 0 1 1 1 1 1 1 1 ## [74,] 1 1 0 1 1 1 1 1 1 1 ## [75,] 1 1 0 1 1 1 1 1 1 1 ## [76,] 1 1 0 1 1 1 1 1 1 0 ## [77,] 1 1 0 1 1 1 1 1 1 1 ## [78,] 1 1 0 1 1 1 1 1 1 0 ## [79,] 1 1 0 1 1 1 1 1 1 0 ## [80,] 1 1 0 1 1 1 1 1 1 1 ## [81,] 1 1 0 1 1 1 1 1 1 0 ## [82,] 1 1 0 1 1 1 1 0 1 0 ## [83,] 1 1 0 1 1 1 1 1 0 1 ## [84,] 1 1 0 1 1 1 0 1 0 0 ## [85,] 1 1 0 1 1 1 1 1 1 1 ## [86,] 1 1 0 1 1 1 1 1 1 1 ## [87,] 1 1 0 1 1 1 1 1 1 0 ## [88,] 1 1 0 1 1 1 1 1 1 1 ## [89,] 1 1 0 1 1 1 0 1 0 0 ## [90,] 1 1 0 1 1 1 1 1 1 0 ## [91,] 1 1 0 1 1 1 1 1 1 1 ## [92,] 1 1 0 1 0 1 1 1 1 1 ## [93,] 1 1 0 1 1 1 1 1 1 1 ## [94,] 1 1 0 1 1 1 1 1 1 1 ## [95,] 1 1 0 1 1 1 1 1 1 1 ## [96,] 1 1 0 1 1 1 1 1 1 0 ## [97,] 1 1 0 1 1 1 1 1 1 1 ## [98,] 1 1 0 1 1 1 1 1 1 1 ## [99,] 1 1 0 1 1 1 1 0 1 1 ## [100,] 1 1 0 1 1 1 1 1 1 0 ## [ reached getOption("max.print") -- omitted 24900 rows ] ``` ] --- # Weight regularization * Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)` * Weight regularization: `\(\text{minimize} \left( SSE + P \right)\)` * `\(\text{L}^2\)` norm (aka 
---
# Weight regularization

* Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)`

* Weight regularization: `\(\text{minimize} \left( MSE + P \right)\)`

* `\(\text{L}^2\)` norm (aka _weight decay_): `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} w_j^2 \right)\)`

<br>

<img src="03-case-studies_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---
# Weight regularization

* Regular loss function: `\(\text{minimize} \left( MSE = \frac{1}{n} \sum^n_{i=1}(Y_i - \hat Y_i)^2 \right)\)`

* Weight regularization: `\(\text{minimize} \left( MSE + P \right)\)`

* `\(\text{L}^2\)` norm (aka _weight decay_): `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} w_j^2 \right)\)`

* `\(\text{L}^1\)` norm: `\(\text{minimize } \left( MSE + \lambda \sum^p_{j=1} | w_j | \right)\)`

<img src="03-case-studies_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---
# Dropout

.pull-left[

* During each training run, each neuron has probability *p* of being dropped (set to 0)

* Forces more nodes to identify and relay the signal in the data

* Helps prevent the model from latching onto happenstance patterns (noise) that are not significant

]

.pull-right[

<img src="images/dropout_illustration.png" width="960" style="display: block; margin: auto;" />

]
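---
# Weight regularization & dropout

A minimal sketch of both techniques in a keras model (assumes `n_inputs` holds the number of input features; the penalty and dropout rates are illustrative):

```r
model <- keras_model_sequential() %>%
  # L2 weight decay penalizes large weights in this layer
  layer_dense(units = 16, activation = "relu", input_shape = n_inputs,
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  # randomly zero out 50% of this layer's outputs during training
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 16, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")
```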
---
# Back home

<br><br><br><br>

[.center[
<i class="fas fa-home fa-10x"></i>
]](https://github.com/rstudio-conf-2020/dl-keras-tf)

.center[https://github.com/rstudio-conf-2020/dl-keras-tf]