Have your buzzword and eat it, too

It’s my [birthday], and I’ll [learn] if I want to.

Today is my birthday, and I would like to begin my next ordinal revolution around the sun just right by setting a very ambitious goal for myself:

One year from today (i.e., March 7th, 2016), I would like to have a working understanding of how to design, train, and implement deep learning systems.

What do I mean by “deep learning”?

Let’s take a (selected) definition from the Wikipedia page on deep learning:

> Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks.

Using the power of the interwebs, my laptop, and perhaps some EC2 instances (if necessary), I would like to blend the theoretical with the practical and apply different varieties of deep learning architectures to different varieties of problems over the course of the next year. Luckily, there is a lot of literature out there on applying deep convolutional neural networks to computer vision problems, AlexNet’s results on ImageNet being the canonical example (I will delve into some of these problems and architectures in subsequent posts). Given my passion for music and science, I’d like to branch out a bit and apply these methods more generally. Music genre classification would be fantastic!
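As a taste of the core operation behind convolutional nets, here is a minimal 2D convolution (technically cross-correlation, as most deep learning libraries compute it) in plain NumPy. This is a sketch for intuition only; real frameworks run this on the GPU over many images and kernels at once.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the
    image and take an elementwise product-and-sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, -1.0]])  # a tiny horizontal edge detector
result = conv2d(image, kernel)    # each entry is image[i, j] - image[i, j+1]
```

In a convolutional network, the kernel weights are not hand-picked like this; they are learned by gradient descent, and many such kernels are stacked into layers.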

Doing unsupervised learning with deep neural nets would also be quite fascinating. I really like the notion that it’s possible to learn useful representations, and even bootstrap classification and regression, without labeled data, because:

  1. Unsupervised methods are scalable (i.e., they don’t require hand-coded labels by humans).
  2. Unsupervised methods are less prone to human mislabeling/misclassification.
  3. Unsupervised methods can tell us things we don’t know about the structure of the data, its representations, or the underlying phenomena. QuocNet is a very well-known example of this.
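As a toy illustration of point 3 (using scikit-learn’s k-means rather than a deep net), an unsupervised method can recover group structure from raw points with no labels at all:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of points; note that no labels are provided.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Ask k-means to discover 2 groups purely from the geometry of the data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_  # first three points share one label, last three the other
```

Deep unsupervised models (RBMs, autoencoders) push the same idea much further, learning layered feature representations rather than just cluster assignments.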

Lastly, I’m not just interested in the theoretical underpinnings, though anyone who knows me well knows that I tend to gravitate toward wherever there’s more mathematics: I am interested in the engineering! I am fascinated by the use of GPUs for doing deep learning. Look forward to some posts about tensors and embarrassingly parallel computational methods :-). E.g., here is the canonical form in which the Ricci flow is written:

$$\frac{\partial g_{ij}}{\partial t} = -2 R_{ij}$$

Wherever possible, I will try to cite my sources and include both math and code, whether inline or as links to math or code living elsewhere (e.g., in a GitHub repo).

```python
# Example adapted from scikit-learn: http://bit.ly/1GhbMbd
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
model = BernoulliRBM(n_components=2)
model.fit(X)
# fit() returns the estimator itself, which the REPL echoes as:
# BernoulliRBM(batch_size=10, learning_rate=0.1, n_components=2, n_iter=10,
#              random_state=None, verbose=0)
```