Deep Learning ver3 Lesson 4

Vikas Jha
4 min read · Nov 15, 2018

Refer to notes: Lesson 1, Lesson 2, Lesson 3

This article is compiled based on my notes on Lesson 4 of Deep Learning for Coders ver3.

· Caution: Update the fastai library and course material before running any code.

Jeremy started with IMDB Reviews Dataset analysis.

IMDB Reviews Dataset

Continued from Lesson 3

· During numericalization, every word in the vocabulary requires a row in the weight matrix. Left unrestricted, this matrix would become too large and sparse, so the vocabulary is limited to 60,000 rows. Any word that does not appear at least twice is replaced by the unknown token.
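A minimal sketch of the numericalization idea in plain Python (not the actual fastai implementation), assuming limits similar to fastai's defaults of max_vocab=60000 and min_freq=2:

```python
from collections import Counter

# Toy corpus, already tokenised into words
tokens = "the movie was great the acting was great the plot was thin".split()

max_vocab, min_freq = 60000, 2          # limits similar to fastai's defaults
counts = Counter(tokens)

# Keep only frequent words; everything else maps to the unknown token 'xxunk'
itos = ['xxunk'] + [w for w, c in counts.most_common(max_vocab) if c >= min_freq]
stoi = {w: i for i, w in enumerate(itos)}

# Numericalization: each token becomes a row index into the weight matrix
ids = [stoi.get(w, 0) for w in tokens]
print(itos)   # ['xxunk', 'the', 'was', 'great']
print(ids)    # rare words ('movie', 'acting', 'plot', 'thin') all map to 0
```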

Use pretrained model. Image courtesy: Jeremy Howard & Rachel Thomas

· The model is based on transfer learning from WikiText-103. Instead of starting from scratch with random weights, we start with pretrained weights obtained by training on a big dataset (a cleaned subset of Wikipedia called WikiText-103).
· The model is further fine-tuned on the IMDB dataset, as the English used in IMDB reviews is not the same as that on Wikipedia. The model is pretrained to guess what the next word is, with the input being all the previous words. This has a recurrent structure (RNN: Recurrent Neural Network).

· Data object inputs are (a sketch follows this list):
— all the text files in the path
— split ratio to create the validation dataset
— labels for the model
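A hedged sketch of how this can look with the fastai v1 data block API (exact method names vary slightly between fastai versions; `path` is assumed to point at the unpacked IMDB data):

```python
from fastai.text import *

path = untar_data(URLs.IMDB)            # download/unpack the IMDB reviews dataset

# Language-model data: all text files in the listed folders, a random 10%
# validation split, and labels that are simply "the next word" (label_for_lm)
data_lm = (TextList.from_folder(path)
           .filter_by_folder(include=['train', 'test', 'unsup'])
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=48))
data_lm.save('data_lm.pkl')             # save so tokenisation is not repeated
```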

· TextDataBunch object: For the language model, the TextDataBunch ignores the labels, shuffles the texts each epoch, and sends batches that read the text in order, with the target being the next word in the sentence.

· Language Model Learner inputs:
— TextDataBunch object
— Pretrained model
— Dropout: To avoid overfitting.
· Further steps: learning rate estimation, fitting with ‘fit_one_cycle’ (refer to previous lesson notes for details), then unfreezing and refitting; see the sketch below.
· The accuracy obtained is around 30%, i.e. roughly 1 out of 3 next words is guessed correctly.
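A sketch of those steps, again assuming a fastai v1-style API; the learning rates and epoch counts here are illustrative, not necessarily the ones used in the lesson:

```python
# Language-model learner on top of the pretrained WikiText-103 weights
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

learn.lr_find()                          # estimate a good learning rate
learn.recorder.plot()

learn.fit_one_cycle(1, 1e-2)             # train the newly added head first
learn.unfreeze()                         # then unfreeze the whole model
learn.fit_one_cycle(1, 1e-3)             # and fine-tune at a lower learning rate

learn.save_encoder('fine_tuned_enc')     # keep the encoder for the classifier below
```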

· Next, a new classification data object is created, which keeps the labels (unlike the previous language-model data). The fitting steps are the same as for the previous model; see the sketch below. During the second run, all but the last 2 layers are frozen. Similarly, during the third run, all but the last 3 layers are frozen.
The accuracy obtained in the classification model is more than 94% with 3 epochs.
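A sketch of the classifier stage under the same fastai v1 assumptions, reusing the vocabulary and encoder from the language-model sketch above; the gradual unfreezing (freeze_to(-2), freeze_to(-3), then unfreeze) mirrors the runs described above:

```python
# Classification data: this time the folder names (pos/neg) are kept as labels
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
             .split_by_folder(valid='test')
             .label_from_folder(classes=['neg', 'pos'])
             .databunch(bs=48))

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')     # reuse the fine-tuned language-model encoder

learn.fit_one_cycle(1, 2e-2)             # first run: only the head is trained
learn.freeze_to(-2)                      # second run: last 2 layer groups unfrozen
learn.fit_one_cycle(1, 1e-2)
learn.freeze_to(-3)                      # third run: last 3 layer groups unfrozen
learn.fit_one_cycle(1, 5e-3)
learn.unfreeze()                         # finally fine-tune everything
learn.fit_one_cycle(2, 1e-3)
```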

Tabular Data

· Deep learning is currently not much used for tabular data analysis. However, tabular data analysis is at the root of most real-world use cases.

Tabular data use cases. Image courtesy: Jeremy Howard & Rachel Thomas

· Using deep learning reduces the requirement for feature engineering (though it does not remove it completely). As it is more generalizable, the maintenance requirement is lower.
· Pandas is used to import tabular data. Pandas can read and import data from most sources.

· The tabular module from the fastai library is used for the analysis; a sketch follows this list. tabular_data_from_df converts a DataFrame to a DataBunch for modelling using the data block API.
· Continuous variables can be used as-is after pre-processing (called so because it is done ahead of time, once, unlike transforms for images). Pre-processing involves filling missing values, categorifying, and normalization. Missing data is replaced with the median. The training data is the template for missing-value imputation, and the same process is repeated on the test data.
· For categorical variables, embeddings are used.
· 84% accuracy is obtained after 1 epoch.
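A hedged sketch of the tabular pipeline with the fastai v1 data block API, assuming the Adult (salary) sample dataset used in the lesson; the exact column names and hyperparameters are illustrative:

```python
import pandas as pd
from fastai.tabular import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')       # pandas reads the tabular data

dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status',
             'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]   # the pre-processing steps above

data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800, 1000)))  # hold out rows 800-999 for validation
        .label_from_df(cols=dep_var)
        .databunch())

# Embeddings are created automatically for the categorical variables
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)
```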

Collaborative Filtering

· Collaborative filtering is used in recommendation systems. It works on preferences of several users.

· There are two ways to represent data for the purpose.

Two ways to represent data for Collaborative filtering

· The second way of representing the data leads to a big, sparse matrix. Data is usually not stored like this.
· The data used in this case is the MovieLens dataset.
· The collaborative-filtering learner from fastai is used; a sketch follows below.
· Cold-start problem: if there is a new user or a new movie, there are no ratings available from that user or for that movie, respectively. In such cases, collaborative filtering cannot work. A metadata-driven model for the cold start can be one solution.
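A short sketch of the collaborative-filtering part, assuming the fastai v1 API and the small MovieLens sample (a CSV of userId, movieId, rating triples):

```python
import pandas as pd
from fastai.collab import *

path = untar_data(URLs.ML_SAMPLE)        # small MovieLens sample used in the lesson
ratings = pd.read_csv(path/'ratings.csv')

# Each row is one (user, movie, rating) triple -- the first, non-sparse representation
data = CollabDataBunch.from_df(ratings, seed=42)

# n_factors is the embedding size; y_range squashes predictions into the rating scale
learn = collab_learner(data, n_factors=50, y_range=(0., 5.5))
learn.fit_one_cycle(3, 5e-3)
```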

Embeddings

· As mentioned above, if the second way of representing data for collaborative filtering is used, the resulting matrix is very big and sparse. Processing this huge matrix is computationally challenging. Furthermore, it is tough to represent meaningful relationships between such vectors (the ratings for a movie, or the ratings by a user).

· As a solution to this sparsity, embeddings are employed. They convert sparse vectors lying in an N-dimensional hyperspace to a lower-dimensional, denser space.

Embeddings example (courtesy: Google ML crash course)

· In the lecture, Jeremy shows an example of movies vs users, where each sparse vector for a user, of size N1×1 (N1 being the number of movies), is condensed into a 4×1 vector (without bias) or a 5×1 vector (with bias). Similarly, the vector of ratings for each movie, of size N2×1 (N2 being the number of users), is condensed into a 4×1 vector (without bias) or a 5×1 vector (with bias). Taking the dot product of the embeddings for a user and a movie gives us the rating for that movie by that user.
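A tiny numeric illustration of that dot product (the embedding values below are made up, not taken from the lecture spreadsheet):

```python
import numpy as np

# Hypothetical learned embedding (length 4) plus a scalar bias for one user and one movie
user_emb,  user_bias  = np.array([0.21, -0.40, 0.95, 0.10]), 0.30
movie_emb, movie_bias = np.array([0.55,  0.12, 0.80, -0.05]), 0.15

# Predicted rating = dot product of the two embeddings, plus both biases
pred_rating = user_emb @ movie_emb + user_bias + movie_bias
print(round(pred_rating, 3))
```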

· Using the actual ratings available, the randomly initialized embeddings can be iteratively optimized towards values that reduce the mean squared error (MSE); Jeremy demonstrates this in Excel. A Python sketch of the same idea follows.
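The iterative optimisation done in Excel can be sketched in plain Python as a small gradient-descent loop over randomly initialised embedding matrices (toy data, no bias terms, purely illustrative):

```python
import numpy as np

np.random.seed(0)
n_users, n_movies, n_factors = 15, 15, 5

# Toy ratings matrix; 0 means "no rating", just like the blanks in the spreadsheet
ratings = np.random.randint(0, 6, size=(n_users, n_movies)).astype(float)
rated = ratings > 0

# Randomly initialised embeddings: one row (length n_factors) per user / per movie
U = np.random.randn(n_users, n_factors) * 0.1
M = np.random.randn(n_movies, n_factors) * 0.1

lr = 0.01
for step in range(1000):
    err = (U @ M.T - ratings) * rated    # error only where an actual rating exists
    gU, gM = err @ M, err.T @ U          # gradients of the squared error
    U -= lr * gU                         # gradient-descent step for user embeddings
    M -= lr * gM                         # ...and for movie embeddings

mse = ((U @ M.T - ratings)[rated] ** 2).mean()
print(round(mse, 4))                     # MSE drops as the embeddings are optimised
```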
