48 items tagged
Analytical Theory of Spectral Bias in Diffusion Sampling and Learning
Motivation Consider a distribution $p(x)$. We can “convolve” it with a kernel $p(\tilde{x}\mid x)=q(\tilde{x}-x)$; the marginal distribution of $\tilde{x}$ is denoted $p_\sigma(\tilde{x})$. We want to model the score of this convolved distribution, $\nabla\log p_\sigma(\tilde{x})$, and that of the original distribution.
Motivation Recently, a line of research on generative image models has emerged, diffusion models, which showed performance competitive with GANs [^1]. More recently, a larger-scale version gave rise to the groundbreaking model DALL-E 2 and its precursor GLIDE.
Motivation Simply put, the “kernel trick” is the observation that sometimes only inner products appear in the formulation of an algorithm. Because of this, we can substitute the inner product with a fancier kernel function, i.e. an inner product in some other space. This post is about another usage of the kernel trick: Kernel (ridge) Regression.
Motivation Understand the use of kernels in regression problems. For usage in unsupervised learning / dimension reduction, see the notes on Kernel PCA. Kernel in Classification Kernels are usually introduced in SVM classification problems. The rationale is that a dataset that is not linearly separable could become separable in a high-dimensional feature space under the mapping $\phi:\mathcal X\to\mathcal F$.
Motivations Many CNN models have become the bread and butter of the modern deep learning pipeline. Here I’m summarizing some famous CNN architectures and their key innovations as I use them.
Note on Compiling Torch C Extensions Motivation Sometimes fusing operations in a C library, without going through Python, can accelerate your model, especially for key operations that occur often and that lots of data pass through.
Philosophy The spirit of Variational Inference is to solve a Bayesian inference problem with optimization. In the latent factor scenario, it does not try to apply Bayes' rule directly, but fits the posterior distribution within a class of distributions $q(z;\nu)$ by minimizing the KL divergence between the two.
Environment Bug https://github.com/rosinality/stylegan2-pytorch/issues/70 Compiler not found bug We need to change compiler_bindir_search_path in ./stylegan2/dnnlib/tflib/custom_ops.py so that it points to the C compiler on the machine. Note that Visual Studio 2019 is not supported, so we have to use 2017!
Motivation This is a simple example. https://github.com/ProGamerGov/pytorch-old-tensorflow-models if pretrained: self.load_state_dict(torch.hub.load_state_dict_from_url(model_urls['inceptionv1'], progress=progress)) The official blog about how to use this is here. Hosting Weights The major challenge is publishing the weights online. For that you need a public file hosting service, such as Google Drive or OneDrive.
Note on MiniMax (Updating) Motivation This is a very traditional way of solving turn-based games like chess or tic-tac-toe. Its climax was the Deep Blue AI playing chess. Note that some people think of the GAN training procedure as a min-max game between G and D, which is also interesting.
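To make the idea concrete, here is a minimal sketch of plain minimax on tic-tac-toe. The board encoding (a 9-character string, row-major, with `' '` for empty cells) and the function names `winner`, `minimax`, and `best_move` are my own illustrative choices, not taken from the post.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, cells indexed 0..8 in row-major order.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def other(player):
    return 'O' if player == 'X' else 'X'


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value with `player` to move: +1 if X wins under optimal play,
    -1 if O wins, 0 for a draw. X maximizes, O minimizes."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0  # board full, nobody won
    values = [minimax(board[:i] + player + board[i + 1:], other(player))
              for i, cell in enumerate(board) if cell == ' ']
    return max(values) if player == 'X' else min(values)


def best_move(board, player):
    """Pick the empty cell whose resulting position is best for `player`."""
    moves = [i for i, cell in enumerate(board) if cell == ' ']

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other(player))

    return max(moves, key=score) if player == 'X' else min(moves, key=score)


if __name__ == "__main__":
    print(minimax(' ' * 9, 'X'))        # value of the empty board
    print(best_move('XX OO    ', 'X'))  # X can complete the top row
```

The memoization via `lru_cache` is only a convenience so that the full game tree is searched quickly; the recursion itself is the textbook max/min alternation.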
Motivation This is one step forward from Data transport between python and matlab, since sometimes you not only want to transport data, but want to share some code in python or matlab. How can we do so?
Reinforcement Learning deals with environments and rewards. An agent has a set of actions $a_j$ for interacting with the environment (state $s_i$), and the environment is changed by these actions. From time to time, a reward comes from the environment!
Note on Photometric Reasoning Shape $\hat n$, lighting $l$, and reflectance $\rho$ affect image appearance $I$. Can we infer them back? $$ I=\rho\langle\hat n,l\rangle $$ How much do shading and photometric effects tell us about shape in natural settings?
Note on GAN
Notes with reference to the YouTube lecture series by Hongyi Li.
Architecture Developments
Self Attention
Used in Self-Attention GAN and BigGAN

    import torch
    import torch.nn as nn


    class Self_Attn(nn.Module):
        """Self attention layer."""

        def __init__(self, in_dim, activation):
            super(Self_Attn, self).__init__()
            self.chanel_in = in_dim
            self.activation = activation
            # 1x1 convolutions producing query, key and value feature maps
            self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
            # Learnable weight of the attention branch, initialized to zero
            self.gamma = nn.Parameter(torch.zeros(1))
            self.softmax = nn.Softmax(dim=-1)

        def forward(self, x):
            """
            inputs:
                x: input feature maps (B x C x W x H)
            returns:
                out: self attention value + input feature
                attention: B x N x N (N is Width*Height)
            """
            m_batchsize, C, width, height = x.size()
            proj_query = self.query_conv(x).view(m_batchsize, -1, width * height).permute(0, 2, 1)  # B x N x C//8
            proj_key = self.key_conv(x).view(m_batchsize, -1, width * height)  # B x C//8 x N
            energy = torch.bmm(proj_query, proj_key)  # B x N x N
            attention = self.softmax(energy)  # B x N x N
            proj_value = self.value_conv(x).view(m_batchsize, -1, width * height)  # B x C x N
            out = torch.bmm(proj_value, attention.permute(0, 2, 1))
            out = out.view(m_batchsize, C, width, height)
            out = self.gamma * out + x
            return out, attention

Style GAN
BigGAN
Conditional GAN
Text Conditioning
Text is processed and combined with the noise vector.
Note on Hardware Based Computational Photography Now we have far more computational power than before! Besides, many images will go through complex algorithms as postprocessing. But we can also optimize camera measurement, so that results look even better.
Computational Photography Basically, enhance images by computation! Intersection of 3 fields: Optics, Vision, Graphics. Mainly two kinds of work: co-designing the camera and image processing (optics + vision), and using Vision to help Graphics generate better images faster! CG2REAL CG rendering is very computationally intensive!
Stereo A basic stereo algorithm can be formulated as a Markov Random Field, so all the methods of MRF inference can be used. Prior Planar Prior Natural scenes are usually piecewise planar! How do we impose this idea on the depth map?
Semantics Vision Task Note that semantic and geometric reasoning are conceptually similar to each other. Stereo and optical flow are about finding correspondences / matches. Object recognition is, in some sense, finding correspondence w.r.t. a template and making the template match the observation. Semantic Vision before CNN So ancient semantic detectors work like this
Continuing Image Prior and Generative Model. Probabilistic Graphical Models come into the picture when we want to model and deal with a complex distribution over many variables. When we start to add structure to the model, so that not everything depends on everything, the dependency relationships among variables emerge as a graph structure.
Note on Advanced Computer Vision This is the course note for Advanced Computer Vision Class (CS 659a) These are links to notes for individual modules and specific domain notes. Basic Computer Vision
Image Prior: Modeling Spatial Relationship Materials: https://www.cse.wustl.edu/~ayan/courses/cse659a/lec1.html#/ This is the basis for most further applications. We need a regularizer for a spatial configuration $$\hat X=\arg\min_X \phi(X)+R(X)$$ This could be interpreted in a Bayesian way,
Deep Learning Environment Currently we find that multiple versions of CUDA can be installed on Windows, and different frameworks can use different CUDA versions side by side. PyTorch Tensorflow Co-environment Currently, we can have
Objective Here I want to compare several common deep learning frameworks and make sense of their workflows. Core Logic Tensorflow General Comments: TF is more like a library, in which many low-level operations are defined and programs are long. In contrast, Keras, which can use Tensorflow as its backend, has a similar level of abstraction to PyTorch, which is a higher-level deep learning package. TFLearn may also be a higher-level wrapper.
Note on Online Regression Algorithm Least Squares Problem Classical least squares linear regression is $$ \hat \beta_{ls}=\arg\min_\beta\|y-X\beta\|^2_2 $$ With regularization it becomes a ridge or lasso regression problem
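For reference, the standard regularized objectives can be written in the same notation. The regularization strength $\lambda\ge 0$ and the subscripts $ridge$ and $lasso$ are symbols I am assuming here; they do not appear in the excerpt.

$$ \hat \beta_{ridge}=\arg\min_\beta\|y-X\beta\|^2_2+\lambda\|\beta\|^2_2, \qquad \hat \beta_{lasso}=\arg\min_\beta\|y-X\beta\|^2_2+\lambda\|\beta\|_1 $$

For $\lambda>0$ the ridge problem has the closed-form solution $\hat\beta_{ridge}=(X^TX+\lambda I)^{-1}X^Ty$, while the lasso has no closed form in general and is solved iteratively.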
Motivation Sometimes we want to examine the Hessian or Jacobian of a function w.r.t. some variables. For that purpose, automatic differentiation (autograd) can help us. Autograd mechanism In essence, autograd requires a computational graph (a Directed Acyclic Graph). For each computational node (e.g. $z=f(x,y)$), we define a forward computation $(x,y)\mapsto z,\ z=f(x,y)$, mapping bottom to top, and a backward computation mapping the partial derivative w.r.t. the top variable to the partial derivatives w.r.t. the bottom variables: $\partial_z\mapsto (\partial_x,\partial_y);\ (g_x,g_y)=g(g_z;x,y)$.
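The forward/backward pairing described above can be sketched as a toy scalar reverse-mode autodiff. This is an illustration only, not how PyTorch or any real framework implements it. The class name `Var` and its methods are hypothetical.

```python
import math


class Var:
    """A scalar node in a computational graph (toy reverse-mode autodiff)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (input_node, local partial derivative d self / d input).
        self.parents = parents
        self.grad = 0.0  # accumulated d output / d self, filled by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # Forward: z = x + y. Backward: g_x = g_z, g_y = g_z.
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # Forward: z = x * y. Backward: g_x = g_z * y, g_y = g_z * x.
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def sin(self):
        # Forward: z = sin(x). Backward: g_x = g_z * cos(x).
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        """Propagate gradients from this node down to every input.
        Assumes all .grad fields are still at their initial 0.0."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # post-order: inputs before the nodes using them

        visit(self)
        self.grad = 1.0
        # Reversed post-order handles each node after all of its consumers.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


if __name__ == "__main__":
    x, y = Var(2.0), Var(3.0)
    z = x * y + x.sin()
    z.backward()
    print(x.grad, y.grad)  # dz/dx = y + cos(x), dz/dy = x
```

Each operator stores the local partial derivatives at forward time, so the backward pass is just the chain rule accumulated over the graph in reverse topological order.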
L-BFGS algorithm Motivation L-BFGS is one of the not-so-simple optimization algorithms that we may encounter in large-scale optimization problems. Not so simple means it is not simply a first-order algorithm, and the deviation from that is well motivated by theoretical arguments. This note aims to understand this algorithm.
Note on Local Feature Descriptors Before the advent of convolutional neural networks, many techniques to represent and detect local features had been invented. As lower-level feature detectors, many of them are strongly mathematically motivated. Some are still used in Computer Vision tasks as a preprocessing step.
Note on Patch Based Shape Interpretation These are 2 related papers that both employ a patch-based approach to tackle the shape-from-shading problem. Typically patches have a simpler appearance, so it is easier to collect statistics on them or fit a model to them. The spirit is to find a local explanation for patches in an image. However, as there will be ambiguity in local patches, the algorithm should not over-commit to any one of the explanations, and should keep the distribution of possible shapes. It then takes these local shape proposals and sees which can be stitched together and make sense globally.
Installation Official note on installation https://caffe.berkeleyvision.org/installation.html Installing CPU version on CHPC Install Miniconda Install caffe using conda: conda install -c intel caffe lsb_release -d Description: CentOS release 6.10 (Final) Building GPU version on CHPC (Not succeeded yet…. aborted)
Note on CNN Interpretability 2 major ways of interpreting CNNs: Feature visualization: see what a hidden neuron is interested in. Attribution: see what part of an image activates a filter or detector. Activation Atlas These works try to find a toolkit for visualizing DeepNN and building up a human-computer interface for DeepNN.
Note on Feature Visualization Motivation We want to understand what the hidden units “represent” What are they tuned to? What’s the favorite stimuli? Why should we find the most excitable stimuli? Resources DeepDream.ipynb Tensorflow
Deep Unsupervised Learning Lecture notes from Berkeley-cs294 https://sites.google.com/view/berkeley-cs294-158-sp19/home Lec 1 Category Generative Model Non-generative representation learning Motivation for Unsupervised Learning Application Generate/predict fancy samples Detect Anomaly / deviation from distribution Which human can do quite well without training Data Compression (because of predictability) Use the inner representation to do other tasks! Pre-training Type of Question Core Question: Modeling a Distribution
Using Google Cloud Service for Large Scale Image Labelling Installing Google SDK https://cloud.google.com/sdk/docs/quickstart-windows Create a new Google Cloud Platform Project Download Google Cloud SDK After installation, run gcloud init and log in to your account there! Select the GCP Project and the computing zone Finish the SDK configuration! Installing Google API for different programs (like Vision, which we use) https://cloud.google.com/python/
Note on Computer Vision Lecture Notes from CS559. Lec01: Image Formation In principle, digital images are formed by measuring energy (counting photons) over an array. But several pre-processing steps make it interesting and relevant to processing.
Objective Build the software environment for Scientific Computing, Data Analysis and Deep Learning on a GPU-enabled Linux workstation. This post mainly summarizes the tools and references for building up a Linux working environment. I’ll update the errors and troubleshooting notes as I encounter them.
DeepLabCut Trouble Shooting @(Ponce Lab) Install DLC On a Windows machine, follow the steps in the install tutorial to set up the whole conda environment. Fail at first step Many of us fail at the very first step, with an error message like
Motivation Although there is a myriad of methods for neural and behavioral signal recording, the questions asked about neural data are usually less diverse. Ultimately, everything is numbers, and we process numbers with algorithms.
Problem Setting The original problem of non-negative matrix factorization is simple. If the dissimilarity $D(A\|HW)$ between the original matrix and the reconstructed one is the L2 distance, then $$ \arg\min_{H,W} \|A-HW\|_F^2, \\ s.t.\ W\succeq0, H\succeq0 $$ The non-negative constraint applies element-wise.
Constrained CMA-ES Algorithm Target CMA-ES was originally used for unconstrained optimization. To adapt it to constrained optimization, we have to handle the boundary in some way. So how could it handle this geometric boundary?
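I have recently been reading [^1], and this is my note on it. Objective of Algorithm The Belief Propagation algorithm addresses the problems of estimating marginal probabilities and finding the most probable state in graphical models such as Markov random fields and Bayesian networks. This general algorithm goes by many names, such as sum-product, max-product, min-sum, and Message Passing, and it belongs to the more general family of Message Passing algorithms. It can also be seen as a general framework or philosophy, so it has many famous special cases in models with different structures, and these specific algorithms have their own names (e.g. the forward-backward algorithm, the Kalman Filter, etc.). In statistical learning problems, one usually distinguishes models from algorithms: a model makes some assumptions, abstracts some aspect of reality, and sets up the structure of the problem, while an algorithm solves the problem (often by converting it into an optimization problem). The Belief Propagation algorithm introduced in this post belongs to the latter, but to understand it we first need to understand the model it corresponds to, namely the probabilistic graphical model. Graphical Models: What relates graph to probability? Anyone meeting probabilistic graphical models for the first time (like me) will ask: what do probability and graphs have to do with each other? We know that a graph is an intuitive way to represent binary relations between things, and usually consists of vertices and edges $(\mathcal V, \mathcal E)$. In a probabilistic graphical model, the vertices usually represent random variables, and the edges represent relations between random variables.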
Note on Automatic 3D Instance Segmentation Pipeline In this note I try to summarize several recent works on Automatic 3D Instance Segmentation, whose most direct application is saturated reconstruction of neural morphology in an imaging volume (mostly scanning Electron Microscopy, though it seems it can be generalized to other imaging modalities), which is one of the most important methods of high-throughput connectomics[^1].
How to automatically analyze behavior video? DeepLabCut is a powerful tool to rapidly[^1] train a neural network (based on ResNet) to track keypoints in movement videos, esp. those of moving humans or animals. It is thus a game-changing tool for all kinds of behavior quantification for neuroscience and psychology researchers (it can be applied to nearly any behavioral science topic, e.g. motor learning, motor control, facial expression, social interaction…). The workflow is relatively simple, and once the network has been trained, video analysis takes little time and can be done automatically. Because of this, it is really favorable to researchers doing long-term ecological video recording.
Task description Finding/generating the stimuli that evoke the strongest response of a neuron in a visual system is in essence an optimization problem. But the optimization task at hand has several unique features that are essential to the choice of optimization algorithm, for example,
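Preface I have long planned to write notes regularly during my PhD, to record the interesting and beautiful things I have recently learned. If I were doing a PhD in mathematical physics or theoretical neuroscience, such notes would be like an explorer's journal in the world of mathematics, and could be called This week’s finding: roughly, which books I read this week, what math I learned, which models I played with, what tricks or math games I discovered, and what beautiful figures I made. (See a site from which I have dug up many interesting things, This Week’s Finds in Mathematical Physics, the weekly math notes that a mathematical physics professor at UCR kept up for more than ten years.) In reality, however, I am doing a PhD in Neuroscience, so I can probably only write about learning, not findings. (And for the brain, one week is not enough to learn much that is new…) The content of these notes will therefore be more varied: one part is technical, covering newly learned mathematics, statistical methods, and machine learning methods, and perhaps new experimental techniques and the physical principles behind them; the other part is conceptual, perhaps neural or psychological experimental results heard at recent seminars, or related and interesting philosophical discussions. I expect I will gradually discover which content is better suited to sharing, and which content is more helpful to me and to readers once written down. After a period of adjustment, this post series should develop its own style.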