---
title: "Can You Improve Sentiment Polarity with LSTMs?"
output:
  html_notebook:
    toc: yes
    toc_float: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, message = FALSE, warning = FALSE)
```

This project is designed to test your current knowledge of applying LSTMs to the
[Cornell Movie Review dataset](http://www.cs.cornell.edu/people/pabo/movie-review-data/)
provided by Cornell University. This dataset contains the movie reviews introduced
in [Pang & Lee (2004)](https://bit.ly/2SWGVBZ), with 2000 total observations.
Detailed information about the data can be found [here](https://bit.ly/2N08o22).

Your goal is to develop and compare the performance of a word embedding deep
learning classifier to one that incorporates LSTM sequence embedding. I will
guide you along the way, but this project expects you to do most of the work,
from importing and preprocessing the text to building the models.

Nearly all the code that you need can be found in these notebooks:

* [Intro to word embeddings](http://bit.ly/dl-imdb-embeddings)
* [Intro to LSTMs](http://bit.ly/dl-lstm-intro)

___Good luck!___

# Requirements

```{r}
library(keras)
library(tidyverse)
library(fs)
library(glue)
library(testthat)
```

# Import the data

For those in the workshop, we have already downloaded the movie review data for
you into the `"/materials/data/cornell_reviews"` directory. Outside of the
workshop, you can find the download instructions [here](http://bit.ly/dl-rqmts).

```{r}
# get path to data
if (stringr::str_detect(here::here(), "conf-2020-user")) {
  movie_dir <- "/home/conf-2020-user/data/cornell_reviews/data"
} else {
  movie_dir <- here::here("materials", "data", "cornell_reviews", "data")
}

fs::dir_tree(movie_dir, recurse = FALSE)
```

__Step 1__: You can see the data have already been separated into positive vs
negative sets. The actual reviews are contained in individual .txt files. Similar
to [Intro to word embeddings](http://bit.ly/dl-imdb-embeddings), let's go ahead
and use this structure to our advantage by iterating over each review and...

1. creating the path to each individual review file,
2. creating a label based on the “neg” or “pos” folder the review is in, and
3. saving the output as a data frame with each review on an individual row.

```{r}
training_files <- _____ %>%
  dir_ls() %>%                      # list the "neg" and "pos" folders
  map(dir_ls) %>%                   # list every review file within each folder
  set_names(basename) %>%           # name each list element by its folder ("neg"/"pos")
  plyr::ldply(data_frame) %>%       # stack into a data frame, one row per review
  set_names(c("label", "path"))     # rename the columns

# you should have 2000 total observations
expect_equal(nrow(training_files), 2000)
```

Go ahead and take a look at your data frame.

```{r}
training_files
```

__Step 2__: How many observations are in each response label (i.e. "neg" vs "pos")?

```{r}
count(training_files, _____)
```

__Step 3__: Next, let's iterate over each row and

1. save the label in a labels vector,
2. import the movie review, and
3. save in a texts vector.

```{r}
obs <- nrow(training_files)
labels <- vector(mode = "integer", length = obs)
texts <- vector(mode = "character", length = obs)

for (file in seq_len(obs)) {
  label <- training_files[[file, "label"]]
  path <- training_files[[file, "path"]]
  
  labels[file] <- ifelse(label == "neg", 0, 1)
  texts[file] <- readChar(path, nchars = file.size(path)) 
}
```

The number of observations in your text should be equal to the number of responses.

```{r}
expect_equal(length(texts), length(labels))
```

Go ahead and check out the text of a couple reviews.

```{r}
texts[_____]
```

# Data exploration

__Step 4__: Before preprocessing, let's get a sense of two attributes that will
help us set two of our preprocessing hyperparameters:

1. How many unique words exist across all our reviews? We'll use this to determine
a good starting point for preprocessing our text.

2. What is the distribution of word count across all movie reviews (i.e. mean, 
median)? We'll use this to determine a good starting point for preprocessing our
text.

```{r}
# reference http://bit.ly/dl-imdb-embeddings for code options
```
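
If you're not sure where to start, here is one possible sketch. It assumes the
`texts` character vector from Step 3 and uses simple regex splitting, so the
counts will only approximate what the Keras tokenizer computes later:

```{r}
# rough word count per review (split on whitespace)
words_per_review <- stringr::str_count(texts, "\\S+")
summary(words_per_review)

# rough number of unique words across all reviews (lowercase, letters/apostrophes only)
all_words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
length(unique(all_words[all_words != ""]))
```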

# Data preprocessing

__Step 5__: Now let's tokenize our text sequences. To do so we:

1. Specify how many words we want to include. Remember, a good starting point is
   to use roughly 50% of the number of unique words in the data. This is a
   hyperparameter that you can always come back to and adjust.
2. Create a `text_tokenizer` object which defines how we want to preprocess the
   text. The defaults are sufficient.
3. Apply the tokenizer to our text with `fit_text_tokenizer()`.
4. Extract our vectorized review data with `texts_to_sequences()`.

```{r}
# 1
top_n_words <- _____

# 2-3
tokenizer <- text_tokenizer(num_words = _____) %>% 
  fit_text_tokenizer(texts)

# 4
sequences <- texts_to_sequences(tokenizer, _____)
```

Go ahead and check out the first vectorized sequence. It should look familiar
from earlier modules.

```{r}
# The vectorized first instance:
sequences[[1]]
```

We can see how our tokenizer converted our original text to a cleaned up 
version:

```{r} 
cat(crayon::blue("Original text:\n"))
texts[[1]]

cat(crayon::blue("\nRevised text:\n"))
paste(unlist(tokenizer$index_word)[sequences[[1]]] , collapse = " ")
```

__Step 6__: Next, since each review is a different length, we need to limit
ourselves to a certain number of words so that all our text sequences are the
same length. 

To do so we:

1. Specify the max length for each sequence. You can start out with 500 and then
tune this hyperparameter later.
2. Use `pad_sequences()` to truncate or pad reviews to the specified `max_len`.

```{r}
max_len <- _____
features <- pad_sequences(_____, maxlen = _____)
```
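
If you're unsure what `pad_sequences()` does with its defaults, here is a tiny
toy illustration using two made-up "reviews" of different lengths:

```{r}
toy <- list(c(1L, 2L, 3L), c(4L, 5L, 6L, 7L, 8L, 9L))
pad_sequences(toy, maxlen = 4)
# by default, both padding and truncation happen at the start ("pre") of each sequence
```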

You now have your preprocessed feature data: a 2D tensor (aka matrix) with 2000
observations (rows) and `max_len` columns.

```{r}
dim(features)

expect_true(is.matrix(features))
expect_equal(dim(features), c(obs, max_len))
```

You can see how the final preprocessed sequence looks for the first movie review
with the following code:

```{r}
paste(unlist(tokenizer$index_word)[features[1,]], collapse = " ")
```

# Model training

__Step 7__: To train our model we will use the `validation_split` procedure
within `fit()`. Remember, this takes the last XX% of our data to use as our
validation set. But recall that our data was organized into "neg" and "pos"
folders, so we should shuffle our data to make sure our validation set doesn't
end up being all positive or all negative reviews!

```{r}
set.seed(123)
index <- sample(_____)

x_train <- features[index, ]
y_train <- labels[index]

# there should be 2 unique values (0 - neg, 1 - pos) in last 30% of data
expect_equal(
  length(unique(y_train[floor(length(y_train) * 0.7):length(y_train)])), 
  2
  )
```

## Word embedding model

__Step 8__: We're now ready to start modeling. For our first model, let's create
a model that:

1. applies a word embedding layer
   - `input_dim` should equal `top_n_words`
   - `input_length` should equal `max_len`
   - start with `output_dim` = 16
2. flattens the embeddings
3. classifies with a dense layer

You can use early stopping if you'd like (a sketch follows the list below), but
for the first model:

* use the default learning rate
* 20 epochs is more than enough
* use a batch size of 32
* use a validation split of 30%
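
If you do want to try early stopping, here is a minimal sketch of one possible
callback; the `monitor`, `patience`, and `restore_best_weights` values below are
just reasonable starting points you can tune:

```{r}
early_stop <- callback_early_stopping(
  monitor = "val_loss",          # watch the validation loss
  patience = 5,                  # stop after 5 epochs without improvement
  restore_best_weights = TRUE    # roll back to the best epoch's weights
)
# pass it to fit() via callbacks = list(early_stop)
```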

```{r}
model_basic <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = _____,      # number of words we are considering
    input_length = _____,   # length that we have set each review to
    output_dim = _____      # length of our word embeddings
    ) %>%  
  layer______() %>%
  layer_dense(units = _____, activation = _____)
  
model_basic %>% compile(
  optimizer = _____,
  loss = _____,
  metrics = "accuracy"
)

history_basic <- model_basic %>% 
  fit(
    x_train, y_train,
    epochs = _____,
    batch_size = _____,
    validation_split = _____
    )
```

Run the following code to check out your optimal loss and corresponding accuracy.

```{r}
best_epoch <- which.min(history_basic$metrics$val_loss)
best_loss <- history_basic$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history_basic$metrics$val_accuracy[best_epoch] %>% round(3)

glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
```
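
You can also visualize the full training history; `plot()` has a built-in method
for Keras history objects:

```{r}
plot(history_basic)
```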

## Word embedding + LSTM model

__Step 9__: Now let's build on the previous model by adding an LSTM layer after
the `layer_embedding` layer. When feeding an embedding layer into an LSTM layer
you __do not__ need to flatten the layer. Reference the
[Intro to LSTMs notebook](http://bit.ly/dl-lstm-intro#train-an-lstm). For this
first LSTM model use `units = 32`.

```{r}
model_lstm <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = _____,
    input_length = _____,
    output_dim = _____
    ) %>%  
  layer______(units = _____) %>%
  layer_dense(units = _____, activation = _____) 

model_lstm %>% compile(
  optimizer = _____,
  loss = _____,
  metrics = "accuracy"
)

history_lstm <- model_lstm %>% fit(
  x_train, y_train,
  epochs = _____,
  batch_size = _____,
  validation_split = _____
)
```

Run the following code to check out your optimal loss and corresponding accuracy.

1. How does it compare to the word embedding only model?
2. Why do you think there is a difference?

```{r}
best_epoch <- which.min(history_lstm$metrics$val_loss)
best_loss <- history_lstm$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history_lstm$metrics$val_accuracy[best_epoch] %>% round(3)

glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
```
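
To make the comparison easier, here is one possible sketch that plots the
validation loss of both models on the same graph (it assumes both history
objects above exist):

```{r}
val_loss_df <- bind_rows(
  tibble(
    model = "embedding only",
    epoch = seq_along(history_basic$metrics$val_loss),
    val_loss = history_basic$metrics$val_loss
  ),
  tibble(
    model = "embedding + LSTM",
    epoch = seq_along(history_lstm$metrics$val_loss),
    val_loss = history_lstm$metrics$val_loss
  )
)

ggplot(val_loss_df, aes(epoch, val_loss, color = model)) +
  geom_line() +
  labs(y = "validation loss")
```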

## Search for a better model

__Step 10__: Spend the rest of the time tuning hyperparameters and see if you
can find a better model. Things you can try:

* Preprocessing hyperparameters
   - adjust the number of words to retain in the word index (`top_n_words`)
   - adjust the size of the sequences (`max_len`)
* Word embedding layer
   - adjust the `output_dim`
* LSTM layer
   - adjust the number of `units`
   - add dropout (ref http://bit.ly/dl-lstm-intro#your-turn-5min-1)
   - maybe even add a 2nd LSTM layer
* Other
   - adjust the learning rate (or even the optimizer, e.g. try "adam")
   - adjust the `batch_size`
   - add a callback to adjust the learning rate upon plateauing (see the sketch below)
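
As a starting point for that last idea, here is a minimal sketch of a
reduce-learning-rate-on-plateau callback; the `factor` and `patience` values are
assumptions you should tune:

```{r}
reduce_lr <- callback_reduce_lr_on_plateau(
  monitor = "val_loss",   # watch the validation loss
  factor = 0.1,           # multiply the learning rate by 0.1 when triggered
  patience = 3            # wait 3 stagnant epochs before reducing
)
# pass it to fit() via callbacks = list(reduce_lr)
```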

[🏠](https://github.com/rstudio-conf-2020/dl-keras-tf)