Deep Learning ver3 Lesson 2

Vikas Jha
4 min read · Oct 31, 2018

Refer notes: Lesson 1

This article is compiled from my notes on Lesson 2 of Deep Learning for Coders (ver3).

· Caution: Update the fastai library and course material before running any code.

Creative applications of CNN

— Cleaning up a WhatsApp folder.
— Deep CNN & Data Augmentation for Environment Sound Classification (converting sound to images and processing the images).
— Suvash Thapaliya: Identifying Devanagari script using deep learning.
— The Mystery of the Origin — Cancer Type Classification using the Fast.AI Library.
— New vs old bus in Panama classifier.
— Identifying cities’ locations from satellite images.

· Jeremy again stressed getting hands-on experience. This article by James Dellinger is inspirational: https://medium.com/@jamesdell/if-i-can-you-can-and-you-should-a470d7aea89d. Learning deep learning should be approached like learning soccer/football: one does not start by understanding gravity, friction, and airflow, but by kicking the ball.

Credits: Jeremy Howard & Rachel Thomas.

Creating new datasets based on Google image search

— Go to Google image search and search for what you want images of.
— Press Ctrl+Shift+J to open the browser’s JavaScript console, and paste in the JavaScript command that collects all the image URLs into a text file.
— Upload the text file to the server, and make a new folder for the downloaded images. Use download_images() to download the images into that folder.
— Repeat these steps for every other keyword/category (a sketch of the download step follows this list).
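
As a rough sketch of the download step in fastai v1 (the version used in the ver3 course), assuming hypothetical URL files named urls_<class>.txt and a data/bears folder:

```python
from fastai.vision import *  # fastai v1, as used in the ver3 course

path = Path('data/bears')               # hypothetical dataset folder
classes = ['teddy', 'grizzly', 'black']

for c in classes:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)
    # urls_<class>.txt is the text file saved from the browser console step
    download_images(path/f'urls_{c}.txt', dest, max_pics=200)
```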

Model

· The model attempts to classify teddy, black, and grizzly bears, so 3 folders with the respective images have been made.

· Some images are corrupt, irrelevant, or cartoonish: use verify_images() to remove them (see the sketch below).
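
Continuing the sketch above, verify_images() can delete images that fail to open (max_size additionally shrinks oversized files):

```python
for c in classes:
    print(c)
    # delete unreadable/corrupt files; resize anything larger than 500px
    verify_images(path/c, delete=True, max_size=500)
```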

· Now, an ImageDataBunch is prepared (see the sketch below):
— A separate validation set is needed, so 20% of the images are reserved for it and not used in training.
— The images are normalized (refer to the Lesson 1 notes for details).
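
A sketch along the lines of the lesson notebook; the seed makes the random 20% validation split reproducible:

```python
np.random.seed(42)  # fix the random validation split
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)
```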

· Training the model:
— Model: ResNet34; metric: error rate; 5 epochs (see the sketch below).
— Model accuracy is around 98%.
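
Roughly, the training step in fastai v1 looks like this (create_cnn was the function name at the time of this lesson; later fastai v1 releases rename it cnn_learner):

```python
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(5)   # 5 epochs, per the notes above
learn.save('stage-1')    # checkpoint before unfreezing
```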

· Unfreezing the layers:
— Unfreezing allows the weights of all layers to be modified. However, the choice of learning rate becomes crucial here: earlier layers need only fine adjustments and thus small learning rates, while later layers can use higher rates. Using the same learning rate everywhere can lead to loss of accuracy.
— The learning rate finder is used to reach an appropriate learning rate. ‘Look out for the strongest downward slope which sticks around for a while.’ If there are multiple such slopes, try each and see which works best.
— Learning rates for the unfrozen layers are provided as a range of values, with the lower end applied to earlier layers and the upper end to later layers (a sketch follows this list).
— With 2 more epochs, the error rate is around 1.4%, compared to 2% earlier.
— The remaining error can come from the model itself, in which case more training cycles will help (since the noise is random); or it can come from biased/garbage data, in which case further training of the model will be pointless.
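
A sketch of the unfreezing workflow; the slice bounds here are illustrative and should be read off the learning-rate finder plot:

```python
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()    # look for the strongest sustained downward slope

# lower bound applies to the earliest layers, upper bound to the last layers
learn.fit_one_cycle(2, max_lr=slice(3e-5, 3e-4))
learn.save('stage-2')
```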

· Model in production
— For inference/prediction, the model can run on a CPU.
— Load the weights > load the image to predict on > predict (see the sketch below).
— The model can then be easily deployed as a web app.
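
A minimal CPU inference sketch, assuming the trained learner was exported with learn.export() (export/load_learner landed in fastai v1 releases slightly after this lesson; the image path is hypothetical):

```python
defaults.device = torch.device('cpu')    # inference does not need a GPU
learn = load_learner(path)               # loads path/'export.pkl'
img = open_image(path/'black'/'00000021.jpg')   # hypothetical test image
pred_class, pred_idx, outputs = learn.predict(img)
print(pred_class)
```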

Things which can go wrong

— Validation loss shoots up: the learning rate might be too high.
— Slow convergence: the validation error goes down, but it takes a lot of time; the learning rate might be too low.
— Training loss > validation loss: underfitting; more training/epochs are required.
— Validation error decreases for a while, then shoots up: overfitting may be the issue.

Behind-the-scenes concepts

· Tensor vs matrix vs vector: a tensor is a generalized matrix. In usual parlance, a 1-D array is called a vector, a 2-D array is called a matrix, and anything with more than 2 dimensions is called a tensor.
· For difference between matrix and tensor, refer: https://medium.com/@quantumsteinke/whats-the-difference-between-a-matrix-and-a-tensor-4505fbdc576c.

(Image courtesy: Adam Geitgey, https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721)

· A greyscale image is represented by a matrix giving the intensity of each pixel in the 2-D plane. If the image is coloured, a 3rd dimension represents the colour channels, and the representation becomes a tensor.
· In the bear identification problem (referred to above), the pixels are processed into 3 probability values, which are subsequently passed through an argmax function that returns the index of the category with the maximum probability (see the sketch below). This extrapolates to more than 3 categories.
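
A tiny illustration of that final step, with made-up probability values:

```python
import numpy as np

classes = ['black', 'grizzly', 'teddy']
probs = np.array([0.05, 0.80, 0.15])   # hypothetical model outputs
print(classes[np.argmax(probs)])       # -> 'grizzly'
```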

Process of Optimization

· Linear regression: trying to fit a line through a bunch of points, with the metric being MSE (mean squared error).
· y = X·a,
where y is the vector of outputs and X is the matrix of inputs. The intercept is taken care of by including a column of 1s in the X matrix, and a is the vector of coefficients. Refer to http://matrixmultiplication.xyz for a visualization of matrix multiplication.
· Solving the equation for the coefficients that minimize the optimization metric (MSE here) gives the fitted line.
· A Matplotlib plot can be animated to visualize the process of fitting the line.
· Two ways to solve: analytically (e.g. via the normal equations) and iteratively (gradient descent). A sketch of the analytic route follows.
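
A sketch of the analytic route in NumPy, on made-up data scattered around the line y = 3x + 2 (the coefficients and noise level are assumptions):

```python
import numpy as np

n = 100
x = np.random.uniform(-1, 1, n)
X = np.stack([x, np.ones(n)], axis=1)    # column of 1s handles the intercept
y = X @ np.array([3., 2.]) + 0.1 * np.random.randn(n)

# least-squares solution of y = X·a (equivalent to solving the normal equations)
a, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a)   # roughly [3., 2.]
```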

Iterative method

· weight_updated = weight - lr * grad.
· Gradient descent calculates the gradient at the current coefficient/weight values to decide which way to move to reach the minimum. It uses the whole dataset for each update of the coefficients/weights.
· Stochastic gradient descent (SGD): updates the weights by calculating the gradient on batches rather than on the whole dataset at once. For instance, with a batch size of 64, only 64 images at a time are used to update the weights.
· Minibatch: a random group of points used for each weight update (a gradient-descent sketch follows this list).
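
A sketch along the lines of the lesson’s SGD notebook, using PyTorch autograd for the gradients (full-batch gradient descent here for simplicity; the learning rate and iteration count are illustrative):

```python
import torch

n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1., 1)                # random inputs; second column of 1s is the intercept
y = x @ torch.tensor([3., 2.]) + 0.1 * torch.randn(n)

a = torch.zeros(2, requires_grad=True)  # coefficients to be learned
lr = 1e-1

def mse(y_hat, y): return ((y_hat - y) ** 2).mean()

for t in range(100):
    loss = mse(x @ a, y)
    loss.backward()                     # gradient of the MSE w.r.t. a
    with torch.no_grad():
        a -= lr * a.grad                # weight_updated = weight - lr * grad
        a.grad.zero_()

print(a)   # approaches [3., 2.]
```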

Overfitting vs Underfitting

· Refer: https://qr.ae/TUhvtz.
· Overfitting with a large number of coefficients can be avoided using regularization (a sketch follows).
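
As a rough illustration, extending the gradient-descent sketch above (wd is a hypothetical weight-decay strength), L2 regularization adds a penalty on large coefficients to the loss:

```python
wd = 0.1   # hypothetical weight-decay (L2) strength
loss = mse(x @ a, y) + wd * (a ** 2).sum()   # MSE plus an L2 penalty on the coefficients
```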
