Lab 4

1) Neural Net training is a leaky abstraction (requires cognitive load)

Based on Andrej Karpathy's blog.

Follow the Guide to practice on your own.

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. It’s common to see things like:

>>> your_data = # plug your awesome dataset here
>>> model = SuperCrossValidator(SuperDuper.fit, your_data, ResNet50, SGDOptimizer)
# conquer world here

That’s cool! A courageous developer has taken the burden of understanding model fitting, optimization, validation, the model itself and so on from you, and largely hidden the complexity behind a few lines of code. Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier. Backprop + SGD does not magically make your network work. Batch norm does not magically make it converge faster. And just because you can formulate your problem as Reinforcement Learning doesn’t mean you should. If you insist on using the technology without understanding how it works, you are likely to fail. You should not do it anyway, since it takes away the growth, the fun and, above all, the sanity that is rightfully yours! That brings us to …

2) Neural net training fails silently

When you break or misconfigure code you will often get some kind of an exception. You plugged in an integer where something expected a string. The function only expected 3 arguments. This import failed. That key does not exist. The number of elements in the two lists isn’t equal. In addition, it’s often possible to create unit tests for a certain functionality.

This is just a start when it comes to training neural nets. Everything could be syntactically correct while the whole thing is still not arranged properly, and that is really hard to tell. The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test. For example, perhaps you forgot to flip your labels when you left-right flipped the image during data augmentation. Your net can still (shockingly) work pretty well, because it can internally learn to detect flipped images and then left-right flip its predictions. Your misconfigured neural net will throw exceptions only if you’re lucky; most of the time it will train but silently work a bit worse.
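To make the flipped-label example concrete, here is a minimal sketch. It assumes the label is an (x, y) pixel coordinate of a keypoint in the image; the helpers `flip_lr`, `flip_keypoint_lr` and `pixel_at` are hypothetical names written for this illustration, not part of any library. Note that the buggy variant runs without any exception.

```python
def flip_lr(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def flip_keypoint_lr(keypoint, width):
    """Mirror an (x, y) pixel coordinate to match a left-right flipped image."""
    x, y = keypoint
    return (width - 1 - x, y)


def pixel_at(img, point):
    """Read the pixel value at an (x, y) coordinate."""
    x, y = point
    return img[y][x]


# A 3x4 toy image with a single bright pixel at x=0, y=1.
image = [
    [0, 0, 0, 0],
    [9, 0, 0, 0],
    [0, 0, 0, 0],
]
keypoint = (0, 1)  # label: location of the bright pixel
width = len(image[0])

flipped_image = flip_lr(image)

# Buggy augmentation: the image is flipped, the label is not. Nothing fails.
buggy_label = keypoint

# Correct augmentation: the label is mirrored together with the image.
correct_label = flip_keypoint_lr(keypoint, width)

print(pixel_at(flipped_image, buggy_label))    # label no longer points at the bright pixel
print(pixel_at(flipped_image, correct_label))  # label still points at the bright pixel
```

The only way to catch the buggy variant is to look at the augmented samples together with their labels, for instance by drawing the label on top of the image.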

As a result, (and this is really difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering. Now, suffering is a perfectly natural part of getting a neural network to work well, but it can be mitigated by being thorough, defensive, paranoid, and obsessed with visualizations of basically every possible thing. The qualities that in my experience correlate most strongly to success in deep learning are patience and attention to detail.


Approach


In light of the above two facts, Karpathy has developed a specific process that he follows when applying a neural net to a new problem, and following it will help us ease our learning curve. In particular, it builds from simple to complex, and at every step of the way we make concrete hypotheses about what will happen and then either validate them with an experiment or investigate until we find some issue. What we try very hard to prevent is the introduction of a lot of “unverified” complexity at once, which is bound to introduce bugs/misconfigurations that will take forever to find. If writing your neural net code was like training one, you’d want to use a very small learning rate and guess and then evaluate the full test set after every iteration.
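One possible hypothesis-then-experiment step can be sketched as follows. It assumes a 10-class problem trained with softmax cross-entropy, which this section does not prescribe, and the function names are illustrative. The hypothesis is that a model which scores every class equally before training has a loss of ln(10), roughly 2.303; the experiment measures it.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[label])


num_classes = 10  # assumption: ten digit classes

# Hypothesis: equal scores mean probability 1/num_classes per class,
# so the loss should equal ln(num_classes).
expected_loss = math.log(num_classes)

# Experiment: evaluate the loss on all-zero scores for an arbitrary label.
measured_loss = cross_entropy([0.0] * num_classes, label=3)

print(f"expected {expected_loss:.4f}, measured {measured_loss:.4f}")
assert math.isclose(expected_loss, measured_loss, rel_tol=1e-9)
```

If the measured value of a real, freshly initialized model is far from the expected one, that is a reason to stop and investigate before adding anything else.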

Classification Pipeline

The goal is to set up an end-to-end skeleton for a baseline: the MNIST dataset and a linear classifier.

Become one with the Data - Dataloader

Use the Data script to examine the data. For this exercise, we will use the MNIST dataset of handwritten digits. It is one of the simplest datasets for learning how a neural network classifies images.

 MNIST Dataset

  1. Which features help you recognize the number?
  2. Does this feature always apply?
  3. Will it work for other images?
  4. For which harder cases can we use the knowledge learned from the MNIST dataset?
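A few basic statistics can support the questions above. The sketch below is not the lab's Data script: `summarize` is a hypothetical helper, and random arrays with MNIST's layout (N x 28 x 28, uint8 pixels, labels 0 to 9) stand in for the real data so the example runs without a download.

```python
import numpy as np


def summarize(images, labels):
    """Return basic statistics worth checking before any training."""
    classes, counts = np.unique(labels, return_counts=True)
    return {
        "shape": images.shape,
        "dtype": str(images.dtype),
        "pixel_min": int(images.min()),
        "pixel_max": int(images.max()),
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
        # Mean image per class: shows which pixels are typical for a digit.
        "class_means": {int(c): images[labels == c].mean(axis=0) for c in classes},
    }


# Stand-in data (assumption): replace with the arrays loaded by the Data script.
rng = np.random.default_rng(0)
labels = np.arange(200) % 10
images = rng.integers(0, 256, size=(200, 28, 28), dtype=np.uint8)

stats = summarize(images, labels)
print(stats["shape"], stats["dtype"], stats["pixel_min"], stats["pixel_max"])
print(stats["class_counts"])
print(stats["class_means"][0].shape)
```

On the real dataset, plotting the per-class mean images and a handful of individual samples next to them is a quick way to see which features hold for most digits and which samples break them.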

Learning the Model

Download the Training Script for classification of handwritten digits. Program a linear classifier (a shallow, one-layer network) that predicts the digit in the image, implement the learning pipeline, and compute the accuracy of the model on the labeled validation and test datasets.

  1. How do you get the weights?
  2. What is the output format of the model?
  3. How do you optimize model weights to perform better?
  4. How suitable is a linear network for handwritten digits?
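As a reference point for these questions, here is a minimal sketch of a linear classifier in NumPy. Several things are assumptions rather than the lab's prescribed solution: the softmax cross-entropy loss, full-batch gradient descent, the learning rate, and all function names. Synthetic 2-D blobs with 3 classes stand in for flattened MNIST images (784 features, 10 classes), so the example runs without the dataset.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def train_linear(x, y, num_classes, lr=0.1, epochs=200):
    """Fit W and b of scores = x @ W + b by full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[y]
    for _ in range(epochs):
        probs = softmax(x @ w + b)
        # Gradient of the mean cross-entropy loss with respect to the scores.
        grad_scores = (probs - one_hot) / n
        w -= lr * (x.T @ grad_scores)
        b -= lr * grad_scores.sum(axis=0)
    return w, b


def predict(x, w, b):
    """The model outputs one score per class; the prediction is the argmax."""
    return np.argmax(x @ w + b, axis=1)


def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return float(np.mean(pred == y))


# Stand-in data (assumption): 3 well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
y = np.arange(600) % 3
x = centers[y] + rng.normal(scale=0.5, size=(600, 2))

# Simple split: first 400 samples for training, the rest for validation.
w, b = train_linear(x[:400], y[:400], num_classes=3)
val_acc = accuracy(predict(x[400:], w, b), y[400:])
print(f"validation accuracy: {val_acc:.3f}")
```

For MNIST, `x` would be the images flattened to shape (N, 784) and scaled to a small range such as [0, 1], and `num_classes` would be 10. Accuracy should be reported on data the weights were never fitted to.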