In this case study, our objective is to predict the sale price of a home. This is a regression problem since the goal is to predict a continuous numeric value (e.g., $119,201, $168,594, $301,446). To predict the sale price, we will use both numeric and categorical features of the home.
As you proceed, you’ll work through the steps we discussed in the last module. First, load the packages we’ll use:
library(keras) # for deep learning
library(testthat) # unit testing
library(tidyverse) # for dplyr, ggplot2, etc.
library(rsample) # for data splitting
library(recipes) # for feature engineering
For this case study we will use the Ames housing dataset provided by the AmesHousing package.
ames <- AmesHousing::make_ames()
dim(ames)
[1] 2930 81
This data has been partially cleaned up and has no missing data:
sum(is.na(ames))
[1] 0
However, this tabular data is a mix of numeric and categorical features that we will need to address.
str(ames)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2930 obs. of 81 variables:
$ MS_SubClass : Factor w/ 16 levels "One_Story_1946_and_Newer_All_Styles",..: 1 1 1 1 6 6 12 12 12 6 ...
$ MS_Zoning : Factor w/ 7 levels "Floating_Village_Residential",..: 3 2 3 3 3 3 3 3 3 3 ...
$ Lot_Frontage : num 141 80 81 93 74 78 41 43 39 60 ...
$ Lot_Area : int 31770 11622 14267 11160 13830 9978 4920 5005 5389 7500 ...
$ Street : Factor w/ 2 levels "Grvl","Pave": 2 2 2 2 2 2 2 2 2 2 ...
$ Alley : Factor w/ 3 levels "Gravel","No_Alley_Access",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Lot_Shape : Factor w/ 4 levels "Regular","Slightly_Irregular",..: 2 1 2 1 2 2 1 2 2 1 ...
$ Land_Contour : Factor w/ 4 levels "Bnk","HLS","Low",..: 4 4 4 4 4 4 4 2 4 4 ...
$ Utilities : Factor w/ 3 levels "AllPub","NoSeWa",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Lot_Config : Factor w/ 5 levels "Corner","CulDSac",..: 1 5 1 1 5 5 5 5 5 5 ...
$ Land_Slope : Factor w/ 3 levels "Gtl","Mod","Sev": 1 1 1 1 1 1 1 1 1 1 ...
$ Neighborhood : Factor w/ 28 levels "North_Ames","College_Creek",..: 1 1 1 1 7 7 17 17 17 7 ...
$ Condition_1 : Factor w/ 9 levels "Artery","Feedr",..: 3 2 3 3 3 3 3 3 3 3 ...
$ Condition_2 : Factor w/ 8 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Bldg_Type : Factor w/ 5 levels "OneFam","TwoFmCon",..: 1 1 1 1 1 1 5 5 5 1 ...
$ House_Style : Factor w/ 8 levels "One_and_Half_Fin",..: 3 3 3 3 8 8 3 3 3 8 ...
$ Overall_Qual : Factor w/ 10 levels "Very_Poor","Poor",..: 6 5 6 7 5 6 8 8 8 7 ...
$ Overall_Cond : Factor w/ 10 levels "Very_Poor","Poor",..: 5 6 6 5 5 6 5 5 5 5 ...
$ Year_Built : int 1960 1961 1958 1968 1997 1998 2001 1992 1995 1999 ...
$ Year_Remod_Add : int 1960 1961 1958 1968 1998 1998 2001 1992 1996 1999 ...
$ Roof_Style : Factor w/ 6 levels "Flat","Gable",..: 4 2 4 4 2 2 2 2 2 2 ...
$ Roof_Matl : Factor w/ 8 levels "ClyTile","CompShg",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Exterior_1st : Factor w/ 16 levels "AsbShng","AsphShn",..: 4 14 15 4 14 14 6 7 6 14 ...
$ Exterior_2nd : Factor w/ 17 levels "AsbShng","AsphShn",..: 11 15 16 4 15 15 6 7 6 15 ...
$ Mas_Vnr_Type : Factor w/ 5 levels "BrkCmn","BrkFace",..: 5 4 2 4 4 2 4 4 4 4 ...
$ Mas_Vnr_Area : num 112 0 108 0 0 20 0 0 0 0 ...
$ Exter_Qual : Factor w/ 4 levels "Excellent","Fair",..: 4 4 4 3 4 4 3 3 3 4 ...
$ Exter_Cond : Factor w/ 5 levels "Excellent","Fair",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Foundation : Factor w/ 6 levels "BrkTil","CBlock",..: 2 2 2 2 3 3 3 3 3 3 ...
$ Bsmt_Qual : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 3 6 3 3 3 6 ...
$ Bsmt_Cond : Factor w/ 6 levels "Excellent","Fair",..: 3 6 6 6 6 6 6 6 6 6 ...
$ Bsmt_Exposure : Factor w/ 5 levels "Av","Gd","Mn",..: 2 4 4 4 4 4 3 4 4 4 ...
$ BsmtFin_Type_1 : Factor w/ 7 levels "ALQ","BLQ","GLQ",..: 2 6 1 1 3 3 3 1 3 7 ...
$ BsmtFin_SF_1 : num 2 6 1 1 3 3 3 1 3 7 ...
$ BsmtFin_Type_2 : Factor w/ 7 levels "ALQ","BLQ","GLQ",..: 7 4 7 7 7 7 7 7 7 7 ...
$ BsmtFin_SF_2 : num 0 144 0 0 0 0 0 0 0 0 ...
$ Bsmt_Unf_SF : num 441 270 406 1045 137 ...
$ Total_Bsmt_SF : num 1080 882 1329 2110 928 ...
$ Heating : Factor w/ 6 levels "Floor","GasA",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Heating_QC : Factor w/ 5 levels "Excellent","Fair",..: 2 5 5 1 3 1 1 1 1 3 ...
$ Central_Air : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ Electrical : Factor w/ 6 levels "FuseA","FuseF",..: 5 5 5 5 5 5 5 5 5 5 ...
$ First_Flr_SF : int 1656 896 1329 2110 928 926 1338 1280 1616 1028 ...
$ Second_Flr_SF : int 0 0 0 0 701 678 0 0 0 776 ...
$ Low_Qual_Fin_SF : int 0 0 0 0 0 0 0 0 0 0 ...
$ Gr_Liv_Area : int 1656 896 1329 2110 1629 1604 1338 1280 1616 1804 ...
$ Bsmt_Full_Bath : num 1 0 0 1 0 0 1 0 1 0 ...
$ Bsmt_Half_Bath : num 0 0 0 0 0 0 0 0 0 0 ...
$ Full_Bath : int 1 1 1 2 2 2 2 2 2 2 ...
$ Half_Bath : int 0 0 1 1 1 1 0 0 0 1 ...
$ Bedroom_AbvGr : int 3 2 3 3 3 3 2 2 2 3 ...
$ Kitchen_AbvGr : int 1 1 1 1 1 1 1 1 1 1 ...
$ Kitchen_Qual : Factor w/ 5 levels "Excellent","Fair",..: 5 5 3 1 5 3 3 3 3 3 ...
$ TotRms_AbvGrd : int 7 5 6 8 6 7 6 5 5 7 ...
$ Functional : Factor w/ 8 levels "Maj1","Maj2",..: 8 8 8 8 8 8 8 8 8 8 ...
$ Fireplaces : int 2 0 0 2 1 1 0 0 1 1 ...
$ Fireplace_Qu : Factor w/ 6 levels "Excellent","Fair",..: 3 4 4 6 6 3 4 4 6 6 ...
$ Garage_Type : Factor w/ 7 levels "Attchd","Basment",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Garage_Finish : Factor w/ 4 levels "Fin","No_Garage",..: 1 4 4 1 1 1 1 3 3 1 ...
$ Garage_Cars : num 2 1 1 2 2 2 2 2 2 2 ...
$ Garage_Area : num 528 730 312 522 482 470 582 506 608 442 ...
$ Garage_Qual : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Garage_Cond : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Paved_Drive : Factor w/ 3 levels "Dirt_Gravel",..: 2 3 3 3 3 3 3 3 3 3 ...
$ Wood_Deck_SF : int 210 140 393 0 212 360 0 0 237 140 ...
$ Open_Porch_SF : int 62 0 36 0 34 36 0 82 152 60 ...
$ Enclosed_Porch : int 0 0 0 0 0 0 170 0 0 0 ...
$ Three_season_porch: int 0 0 0 0 0 0 0 0 0 0 ...
$ Screen_Porch : int 0 120 0 0 0 0 0 144 0 0 ...
$ Pool_Area : int 0 0 0 0 0 0 0 0 0 0 ...
$ Pool_QC : Factor w/ 5 levels "Excellent","Fair",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Fence : Factor w/ 5 levels "Good_Privacy",..: 5 3 5 5 3 5 5 5 5 5 ...
$ Misc_Feature : Factor w/ 6 levels "Elev","Gar2",..: 3 3 2 3 3 3 3 3 3 3 ...
$ Misc_Val : int 0 0 12500 0 0 0 0 0 0 0 ...
$ Mo_Sold : int 5 6 6 4 3 6 4 1 3 6 ...
$ Year_Sold : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
$ Sale_Type : Factor w/ 10 levels "COD","Con","ConLD",..: 10 10 10 10 10 10 10 10 10 10 ...
$ Sale_Condition : Factor w/ 6 levels "Abnorml","AdjLand",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Sale_Price : int 215000 105000 172000 244000 189900 195500 213500 191500 236500 189000 ...
$ Longitude : num -93.6 -93.6 -93.6 -93.6 -93.6 ...
$ Latitude : num 42.1 42.1 42.1 42.1 42.1 ...
The numeric variables are on different scales. For example:
ames %>%
select(Lot_Area, Lot_Frontage, Year_Built, Gr_Liv_Area, Garage_Cars, Mo_Sold) %>%
gather(feature, value) %>%
ggplot(aes(feature, value)) +
geom_boxplot() +
scale_y_log10(labels = scales::comma)
There are also categorical features that are ordinal and could be ordered:
ames %>%
select(matches("(Qual|Cond|QC|Qu)$")) %>%
str()
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2930 obs. of 12 variables:
$ Overall_Qual: Factor w/ 10 levels "Very_Poor","Poor",..: 6 5 6 7 5 6 8 8 8 7 ...
$ Overall_Cond: Factor w/ 10 levels "Very_Poor","Poor",..: 5 6 6 5 5 6 5 5 5 5 ...
$ Exter_Qual : Factor w/ 4 levels "Excellent","Fair",..: 4 4 4 3 4 4 3 3 3 4 ...
$ Exter_Cond : Factor w/ 5 levels "Excellent","Fair",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Bsmt_Qual : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 3 6 3 3 3 6 ...
$ Bsmt_Cond : Factor w/ 6 levels "Excellent","Fair",..: 3 6 6 6 6 6 6 6 6 6 ...
$ Heating_QC : Factor w/ 5 levels "Excellent","Fair",..: 2 5 5 1 3 1 1 1 1 3 ...
$ Kitchen_Qual: Factor w/ 5 levels "Excellent","Fair",..: 5 5 3 1 5 3 3 3 3 3 ...
$ Fireplace_Qu: Factor w/ 6 levels "Excellent","Fair",..: 3 4 4 6 6 3 4 4 6 6 ...
$ Garage_Qual : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Garage_Cond : Factor w/ 6 levels "Excellent","Fair",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Pool_QC : Factor w/ 5 levels "Excellent","Fair",..: 4 4 4 4 4 4 4 4 4 4 ...
And some of the categorical features have many levels:
ames %>%
select_if(~ is.factor(.) & length(levels(.)) > 8) %>%
glimpse()
Observations: 2,930
Variables: 8
$ MS_SubClass  <fct> One_Story_1946_and_Newer_All_Styles, One_Story_1946_and_Newer_All_Styles, O…
$ Neighborhood <fct> North_Ames, North_Ames, North_Ames, North_Ames, Gilbert, Gilbert, Stone_Bro…
$ Condition_1  <fct> Norm, Feedr, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, No…
$ Overall_Qual <fct> Above_Average, Average, Above_Average, Good, Average, Above_Average, Very_G…
$ Overall_Cond <fct> Average, Above_Average, Above_Average, Average, Average, Above_Average, Ave…
$ Exterior_1st <fct> BrkFace, VinylSd, Wd Sdng, BrkFace, VinylSd, VinylSd, CemntBd, HdBoard, Cem…
$ Exterior_2nd <fct> Plywood, VinylSd, Wd Sdng, BrkFace, VinylSd, VinylSd, CmentBd, HdBoard, Cme…
$ Sale_Type    <fct> WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , …
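Before we handle this, it can help to see just how many levels each factor has, since the high-cardinality features will create the most one-hot encoded columns later on; a quick sketch:
# count the levels of each categorical feature; the features with the most
# levels expand into the most one-hot encoded columns
ames %>%
  select_if(is.factor) %>%
  map_int(nlevels) %>%
  sort(decreasing = TRUE) %>%
  head(10)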
Consequently, our first challenge is transforming this dataset into numeric tensors that our model can use.
One of the first things we need to do is create a train and test set; as you probably noticed, this data does not come pre-split the way MNIST did. We can use the rsample package to create our train and test datasets.
Note: initial_split() randomly samples the 70/30 split, so our data gets shuffled in the process.
set.seed(123)
ames_split <- initial_split(ames, prop = 0.7)
ames_train <- analysis(ames_split)
ames_test <- assessment(ames_split)
dim(ames_train)
[1] 2051 81
dim(ames_test)
[1] 879 81
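Note: initial_split() can also stratify on the response so that the training and test sets have similar Sale_Price distributions; a minimal sketch (the *_strat objects are only for illustration and are not used below):
# stratified 70/30 split on the response
set.seed(123)
ames_split_strat <- initial_split(ames, prop = 0.7, strata = "Sale_Price")
ames_train_strat <- analysis(ames_split_strat)
ames_test_strat  <- assessment(ames_split_strat)

# the response distributions should now look similar in both sets
summary(ames_train_strat$Sale_Price)
summary(ames_test_strat$Sale_Price)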
All inputs and response values in a neural network must be tensors of either floating-point or integer data. Moreover, our feature values should not be large relative to the randomized initial weights, and all our features should take values in roughly the same range.
Consequently, we need to vectorize our data into a format conducive to neural networks. For this data set, we’ll transform our data by:
1. removing near-zero variance categorical features (step_nzv),
2. lumping rarely occurring categorical levels into an "other" category (step_other),
3. integer (ordinal) encoding the quality/condition features (step_integer),
4. normalizing numeric features with a Yeo-Johnson transformation (step_YeoJohnson),
5. centering and scaling (standardizing) the numeric features (step_center, step_scale),
6. one-hot encoding the remaining categorical features (step_dummy).
Note: we’re using the recipes package (https://tidymodels.github.io/recipes)
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
step_nzv(all_nominal()) %>% # step #1
step_other(all_nominal(), threshold = .01, other = "other") %>% # step #2
step_integer(matches("(Qual|Cond|QC|Qu)$")) %>% # step #3
step_YeoJohnson(all_numeric(), -all_outcomes()) %>% # step #4
step_center(all_numeric(), -all_outcomes()) %>% # step #5
step_scale(all_numeric(), -all_outcomes()) %>% # step #5
step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE) # step #6
blueprint
Data Recipe
Inputs:
Operations:
Sparse, unbalanced variable filter on all_nominal
Collapsing factor levels for all_nominal
Integer encoding for matches, (Qual|Cond|QC|Qu)$
Yeo-Johnson transformation on all_numeric, -, all_outcomes()
Centering for all_numeric, -, all_outcomes()
Scaling for all_numeric, -, all_outcomes()
Dummy variables from all_nominal, -, all_outcomes()
This next step computes the relevant information (means and standard deviations of numeric features, names of one-hot encoded features) on the training data so there is no information leakage from the test data.
prepare <- prep(blueprint, training = ames_train)
prepare
Data Recipe
Inputs:
Training data contained 2051 data points and no missing data.
Operations:
Sparse, unbalanced variable filter removed Street, Alley, Land_Contour, Utilities, ... [trained]
Collapsing factor levels for MS_SubClass, MS_Zoning, Lot_Shape, Lot_Config, ... [trained]
Integer encoding for Overall_Qual, Overall_Cond, Exter_Qual, Exter_Cond, Bsmt_Qual, ... [trained]
Yeo-Johnson transformation on Lot_Frontage, Lot_Area, Overall_Qual, ... [trained]
Centering for Lot_Frontage, Lot_Area, Overall_Qual, Overall_Cond, ... [trained]
Scaling for Lot_Frontage, Lot_Area, Overall_Qual, Overall_Cond, ... [trained]
Dummy variables from MS_SubClass, MS_Zoning, Lot_Shape, Lot_Config, Neighborhood, ... [trained]
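If you want to verify exactly what each step learned from the training data, recipes lets you inspect the trained steps with tidy(); a minimal sketch (the step numbers follow the recipe order above, where centering is step 5 and scaling is step 6):
# list all steps in the recipe and whether they have been trained
tidy(prepare)

# the means learned by the centering step and the standard deviations learned
# by the scaling step (both estimated on ames_train only)
tidy(prepare, number = 5)
tidy(prepare, number = 6)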
We can now vectorize our training and test data. If you scroll through the data, you will notice that all features are now numeric and are either 0/1 (one-hot encoded features) or have mean 0 and generally range between -3 and 3.
baked_train <- bake(prepare, new_data = ames_train)
baked_test <- bake(prepare, new_data = ames_test)
# unit testing to ensure all columns are numeric
expect_equal(map_lgl(baked_train, ~ !is.numeric(.)) %>% sum(), 0)
expect_equal(map_lgl(baked_test, ~ !is.numeric(.)) %>% sum(), 0)
baked_train
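As an additional sanity check, we can confirm that no baked feature takes extreme values; a quick sketch:
# largest absolute value in each feature column; one-hot columns max out at 1
# and standardized numeric columns should mostly stay within a few units of 0
feature_max <- baked_train %>%
  select(-Sale_Price) %>%
  map_dbl(~ max(abs(.x)))

summary(feature_max)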
Lastly, we need to create the final feature and response objects for the training and test data. keras and TensorFlow require the features and labels to be separate objects, so we split them here: the features need to be a 2D tensor, which is why we apply as.matrix(), and the response needs to be a vector, which is why we apply pull().
x_train <- select(baked_train, -Sale_Price) %>% as.matrix()
y_train <- baked_train %>% pull(Sale_Price)
x_test <- select(baked_test, -Sale_Price) %>% as.matrix()
y_test <- baked_test %>% pull(Sale_Price)
# unit testing to confirm x & y tensors have the same number of observations
expect_equal(nrow(x_train), length(y_train))
expect_equal(nrow(x_test), length(y_test))
Our final feature set now has 188 input variables:
dim(x_train)
[1] 2051 188
dim(x_test)
[1] 879 188
To get started, let’s build a simple model with…
Now, start with the default batch size of 32 and then compare it with smaller values (e.g., 16) and larger values (e.g., 128). You’re looking to balance the progression of the loss learning curve against the training speed.
Comment: The default batch size of 32 performs pretty well in this case, but you could easily have chosen lower (8 or 16) or higher (64, 128) values without negative impact. We can see that the loss is still trending downward, so we should have lots of room for improvement.
n_feat <- ncol(x_train)
model <- keras_model_sequential() %>%
layer_dense(units = 128, activation = "relu", input_shape = ncol(x_train)) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = "sgd",
loss = "msle",
metrics = "mae"
)
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
validation_split = 0.2
)
history
Trained on 1,640 samples (batch_size=32, epochs=10)
Final epoch (plot to see history):
loss: 34.61
mae: 180,347
val_loss: 34.72
val_mae: 180,139
plot(history)
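One way to run the batch-size comparison systematically is to wrap the model above in a small helper and loop over candidate values; a minimal sketch (fit_with_batch_size is just an illustrative helper, and the exact losses will vary from run to run):
# train the simple model above with a given batch size and return the best
# validation loss observed
fit_with_batch_size <- function(batch_size) {
  model <- keras_model_sequential() %>%
    layer_dense(units = 128, activation = "relu", input_shape = n_feat) %>%
    layer_dense(units = 1)

  model %>% compile(
    optimizer = "sgd",
    loss = "msle",
    metrics = "mae"
  )

  history <- model %>% fit(
    x_train,
    y_train,
    batch_size = batch_size,
    validation_split = 0.2,
    verbose = 0
  )

  min(history$metrics$val_loss)
}

batch_sizes <- c(16, 32, 128)
data.frame(
  batch_size = batch_sizes,
  min_val_loss = map_dbl(batch_sizes, fit_with_batch_size)
)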
Now go ahead and start assessing different adaptive learning rate optimizers (e.g., RMSprop, Adam).
Try a variety of learning rates. Recall that we typically start assessing rates on a logarithmic scale (e.g., 0.1, 0.01, …, 0.0001).
Comment: The default learning rates on the common adaptive learning rate optimizers show a slow progression down the loss curve, so we can afford to use a larger learning rate. I found that RMSprop tended to provide the best results at this point.
model <- keras_model_sequential() %>%
layer_dense(units = 128, activation = "relu", input_shape = ncol(x_train)) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.1),
loss = "msle",
metrics = "mae"
)
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
validation_split = 0.2
)
history
Trained on 1,640 samples (batch_size=32, epochs=10)
Final epoch (plot to see history):
loss: 0.01915
mae: 15,686
val_loss: 0.01508
val_mae: 16,308
plot(history)
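The same looping idea works for a learning-rate sweep; a minimal sketch (fit_with_lr is again just an illustrative helper):
# train the model with a given RMSprop learning rate and return the best
# validation loss observed
fit_with_lr <- function(lr) {
  model <- keras_model_sequential() %>%
    layer_dense(units = 128, activation = "relu", input_shape = n_feat) %>%
    layer_dense(units = 1)

  model %>% compile(
    optimizer = optimizer_rmsprop(lr = lr),
    loss = "msle",
    metrics = "mae"
  )

  history <- model %>% fit(
    x_train,
    y_train,
    batch_size = 32,
    validation_split = 0.2,
    verbose = 0
  )

  min(history$metrics$val_loss)
}

learning_rates <- c(0.1, 0.01, 0.001, 0.0001)
data.frame(
  lr = learning_rates,
  min_val_loss = map_dbl(learning_rates, fit_with_lr)
)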
Add the following callbacks and see if your performance improves:
- callback_early_stopping() with patience = 3 and min_delta = 0.00001
- callback_reduce_lr_on_plateau() with patience = 1
Comment: Adding early stopping improves performance because we can increase the number of epochs but stop training when necessary; most of my optimal models stopped at around 15 epochs at this point. Adding callback_reduce_lr_on_plateau() helps as well. I plot the learning rate by epoch below, and we can see that it reduces multiple times, which lets our model eke out a little more performance.
model <- keras_model_sequential() %>%
layer_dense(units = 128, activation = "relu", input_shape = ncol(x_train)) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.1),
loss = "msle",
metrics = "mae"
)
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
epochs = 30,
validation_split = 0.2,
callbacks = list(
callback_early_stopping(patience = 3, min_delta = 0.00001),
callback_reduce_lr_on_plateau(patience = 1)
)
)
history
Trained on 1,640 samples (batch_size=32, epochs=16)
Final epoch (plot to see history):
loss: 0.01694
mae: 14,681
val_loss: 0.01404
val_mae: 15,922
lr: 0.0001
plot(history)
Plotting the learning rate shows that it reduced multiple times during training:
plot(history$metrics$lr)
Now start to explore different widths and depths for your model.
Comment: I followed the same approach we used in the previous module to assess combinations of different numbers of nodes and hidden layers, and I used the tensorboard callback to save my model runs so I could analyze them.
train_model <- function(n_units, n_layers, log_to) {
# Create a model with a single hidden input layer
model <- keras_model_sequential() %>%
layer_dense(units = n_units, activation = "relu", input_shape = n_feat)
# Add additional hidden layers based on input
if (n_layers > 1) {
for (i in seq_len(n_layers - 1)) {
model %>% layer_dense(units = n_units, activation = "relu")
}
}
# Add final output layer
model %>% layer_dense(units = 1)
# compile model
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.1),
loss = "msle",
metrics = "mae"
)
# train model and store results with callback_tensorboard()
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
epochs = 30,
validation_split = 0.2,
callbacks = list(
callback_early_stopping(patience = 3, min_delta = 0.00001),
callback_reduce_lr_on_plateau(patience = 1),
callback_tensorboard(log_dir = log_to)
),
verbose = FALSE
)
return(history)
}
grid <- expand_grid(
units = c(128, 256, 512, 1024),
layers = c(1:3)
) %>%
mutate(id = paste0("mlp_", layers, "_layers_", units, "_units"))
grid
The initial results don’t show any glaring trends; all of the models have minimum validation loss scores in the range 0.0135-0.015.
for (row in seq_len(nrow(grid))) {
# get parameters
units <- grid[[row, "units"]]
layers <- grid[[row, "layers"]]
file_path <- paste0("ames/", grid[[row, "id"]])
# provide status update
cat(layers, "hidden layer(s) with", units, "neurons: ")
# train model
m <- train_model(n_units = units, n_layers = layers, log_to = file_path)
min_loss <- min(m$metrics$val_loss, na.rm = TRUE)
# update status with loss
cat(min_loss, "\n")
}
Looking at the TensorBoard results shows that any of these models would be a decent choice: they all produce relatively similar results, they have low variance (meaning they are stable models), they show minimal overfitting, and compute time is definitely not a problem.
tensorboard("ames")
TensorBoard 2.0.1 at http://127.0.0.1:7065/ (Press CTRL+C to quit)
Started TensorBoard at http://127.0.0.1:7065
Comment: After a little more experimenting, I found that a funnel-shaped architecture (wide early layers that progressively narrow) tended to produce a little more improvement in performance:
model <- keras_model_sequential() %>%
layer_dense(units = 1024, activation = "relu", input_shape = n_feat) %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.1),
loss = "msle",
metrics = "mae"
)
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
epochs = 30,
validation_split = 0.2,
callbacks = list(
callback_early_stopping(patience = 3, min_delta = 0.00001),
callback_reduce_lr_on_plateau(patience = 1)
)
)
history
Trained on 1,640 samples (batch_size=32, epochs=15)
Final epoch (plot to see history):
loss: 0.01102
mae: 12,501
val_loss: 0.01345
val_mae: 14,026
lr: 0.00000001
plot(history) + scale_y_log10()
If your model is overfitting, try to add:
- weight regularization to the hidden layers (e.g., kernel_regularizer = regularizer_l2(l = xxx)). Remember, we typically start by assessing values on a logarithmic scale between 0.1 and 0.00001.
- dropout (layer_dropout()) between each layer. Remember, dropout rates typically range from 20-50%.
Comment: Pretty much any weight regularizer hurt model performance. In this case, since our validation loss shows minimal overfitting, we can probably disregard any additional regularization.
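For reference, the weight-regularization option might look like the following sketch (the l value of 0.001 is purely illustrative); the dropout variant I actually ran follows.
# funnel-shaped model with L2 weight regularization on each hidden layer
model_l2 <- keras_model_sequential() %>%
  layer_dense(units = 1024, activation = "relu", input_shape = n_feat,
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 512, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 256, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 1)

model_l2 %>% compile(
  optimizer = optimizer_rmsprop(lr = 0.1),
  loss = "msle",
  metrics = "mae"
)
# fit exactly as before (same callbacks); omitted here for brevity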
model <- keras_model_sequential() %>%
layer_dense(units = 1024, activation = "relu", input_shape = n_feat) %>%
layer_dropout(0.2) %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dropout(0.2) %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dropout(0.2) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.1),
loss = "msle",
metrics = "mae"
)
history <- model %>% fit(
x_train,
y_train,
batch_size = 32,
epochs = 30,
validation_split = 0.2,
callbacks = list(
callback_early_stopping(patience = 3, min_delta = 0.00001),
callback_reduce_lr_on_plateau(patience = 1)
)
)
history
Trained on 1,640 samples (batch_size=32, epochs=14)
Final epoch (plot to see history):
loss: 0.03473
mae: 25,796
val_loss: 0.01615
val_mae: 15,914
lr: 0.00001
At this point we could repeat the process and…
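Whatever final model we settle on, the last step would be to see how well it generalizes to the held-out test data; a minimal sketch, assuming model is the trained model object from above:
# loss and MAE on the test set
model %>% evaluate(x_test, y_test, verbose = 0)

# predictions are on the original dollar scale, since Sale_Price was never
# transformed by the recipe
pred <- model %>% predict(x_test)
head(data.frame(predicted = as.vector(pred), actual = y_test))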