Deep Unsupervised Learning
Lecture notes from Berkeley-cs294 https://sites.google.com/view/berkeley-cs294-158-sp19/home
Lec 1
Categories
- Generative Model
- Non-generative representation learning
Motivation for Unsupervised Learning
Applications
- Generate/predict fancy samples
- Detect anomalies / deviations from the distribution (which humans can do quite well without training)
- Data Compression (because of predictability)
- Use the internal representation for other tasks!
- Pre-training
Types of Questions
Core Question: Modeling a Distribution
- Density estimation: Given $x$, be able to evaluate $p(x)$
- Sampling: Generate new $x$ according to $p(x)$ (both tasks are sketched below)
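To make both tasks concrete, here is a minimal sketch (my illustration, not from the lecture; it assumes NumPy and a toy 1-D Gaussian model $p(x;\theta)$ with $\theta = (\mu, \sigma)$): density estimation fits $\theta$ by maximum likelihood, and sampling draws new points from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # toy 1-D dataset

# Density estimation: fit theta = (mu, sigma) by maximum likelihood.
mu, sigma = data.mean(), data.std()

def p(x):
    # Evaluate the fitted density p(x; theta) at a point x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Sampling: draw fresh x ~ p(x; theta).
new_x = rng.normal(loc=mu, scale=sigma, size=5)

print(p(2.0))  # density is high near the data mean
print(new_x)   # new samples from the fitted model
```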
Where classical statistics fails
Naïve method of density estimation: Histogram
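A minimal sketch of the histogram estimator in 1-D (my illustration, assuming NumPy; the bin count of 30 is arbitrary). It works fine here, which makes the high-dimensional failure below more striking.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # toy 1-D dataset

# Histogram estimate: p(x) ≈ (count in x's bin) / (N * bin_width).
counts, edges = np.histogram(data, bins=30)
bin_width = edges[1] - edges[0]
density = counts / (len(data) * bin_width)

def p_hist(x):
    # Piecewise-constant density read off from the histogram.
    i = np.searchsorted(edges, x, side="right") - 1
    return density[i] if 0 <= i < len(density) else 0.0

print(p_hist(0.0))  # ≈ 0.40 for standard normal data (1 / sqrt(2*pi))
```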
Lessons from the failure of histograms
- In high dimensions, a histogram model amounts to memorizing the input dataset: with enough bins, almost every data point lands in its own bin
- Curse of dimensionality
- If the density $p(x)$ is an independent parameter at every point, the number of parameters grows exponentially with dimension!
- If each data point contributes to the estimation of only one parameter, we will never have enough data points!
- So we need to model the distribution as a parametrized function $p(x;\theta)$, where $\theta$ has far lower dimension than the space of all densities
- Data should be reused as much as possible to estimate $\theta$: ideally every data point informs every component of $\theta$ (see the sketch after this list)
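A back-of-the-envelope sketch of why parametrization helps (the numbers are my own, not the lecture's): with 10 bins per axis, a histogram needs $10^d$ parameters, while even a full-covariance Gaussian $p(x;\theta)$ needs only $O(d^2)$, and every data point informs every entry of $\theta$ rather than a single bin.

```python
# Parameter counts for a histogram vs. a full-covariance Gaussian,
# as a function of the data dimension d (illustrative numbers).
bins_per_dim = 10

for d in (1, 2, 10, 100):
    histogram_params = bins_per_dim ** d    # one value per bin: exponential in d
    gaussian_params = d + d * (d + 1) // 2  # mean + symmetric covariance: O(d^2)
    print(f"d={d:>3}: histogram {histogram_params:.2e} params, "
          f"Gaussian {gaussian_params} params")
```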