30 items tagged
Motivations Many CNN models have become the bread and butter of modern deep learning pipelines. Here I summarize some famous CNN architectures and their key innovations as I use them.
Note on Neural Tuning and Information Given a stimulus with $D$ intrinsic dimensions, we consider how a single neuron or a population of neurons is informative about this stimulus space. Specific Information (Mutual Information) The setup for computing specific information is simple: given a particular response $r$, compute the reduction in entropy of the stimulus $\mathbb s$.
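A concrete sketch of that computation (the joint distribution and function names below are illustrative, not from the note): the specific information of a response $r$ is $I_{sp}(r) = H(S) - H(S \mid R = r)$.

```python
import numpy as np

def specific_information(p_joint, r):
    """Specific information of response r about the stimulus S:
    I_sp(r) = H(S) - H(S | R = r), in bits.
    p_joint[s, r] holds the joint distribution P(S = s, R = r)."""
    H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    p_s = p_joint.sum(axis=1)                          # marginal P(S)
    p_s_given_r = p_joint[:, r] / p_joint[:, r].sum()  # posterior P(S | R = r)
    return H(p_s) - H(p_s_given_r)

# Made-up 2-stimulus, 2-response joint: response r=1 fully identifies the stimulus.
p = np.array([[0.25, 0.25],
              [0.50, 0.00]])
i_sp = specific_information(p, 1)  # H(S) = 1 bit, H(S | r=1) = 0, so 1.0 bit
```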
Note on MiniMax (Updating) Motivation This is the classical way of solving turn-based games like chess or tic-tac-toe; its culmination was Deep Blue playing chess. Note that some people think of the GAN training procedure as a min-max game between G and D, which is also interesting.
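The algorithm can be sketched as an exhaustive game-tree search; the tiny `TreeGame` wrapper below is an assumed interface for illustration, not any standard API.

```python
def minimax(state, maximizing, game):
    """Exhaustive minimax: the maximizing player picks the highest-valued
    move, the minimizing player the lowest. `game` supplies moves, results,
    and terminal scores (a generic interface assumed for illustration)."""
    if game.is_terminal(state):
        return game.score(state), None
    best_value, best_move = (float('-inf') if maximizing else float('inf')), None
    for move in game.moves(state):
        value, _ = minimax(game.result(state, move), not maximizing, game)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move

# A toy two-ply game tree: leaf values are scores for the maximizer.
class TreeGame:
    def __init__(self, tree): self.tree = tree
    def is_terminal(self, s): return not isinstance(self.tree[s], list)
    def score(self, s): return self.tree[s]
    def moves(self, s): return self.tree[s]
    def result(self, s, m): return m

tree = {'root': ['L', 'R'], 'L': ['L1', 'L2'], 'R': ['R1', 'R2'],
        'L1': 3, 'L2': 5, 'R1': 2, 'R2': 9}
value, move = minimax('root', True, TreeGame(tree))
# min over L's leaves is 3, over R's is 2, so the maximizer takes 'L' with value 3
```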
Notes on Cortical Waves Methods
Reinforcement Learning deals with an environment and rewards. The agent has a set of actions $a_j$ with which it interacts with the environment (in state $s_i$); these actions change the environment, and from time to time a reward arrives from the environment!
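A minimal sketch of this agent-environment loop, with a made-up one-dimensional environment (all names here are illustrative, not from any RL library):

```python
import random

class ChainEnv:
    """1-D chain of states 0..4; a reward of +1 arrives on reaching state 4."""
    def __init__(self):
        self.state = 0
    def step(self, action):                      # action is -1 or +1
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

random.seed(0)
env, total_reward, done = ChainEnv(), 0.0, False
while not done:                                  # the agent-environment loop
    action = random.choice([-1, +1])             # a random policy, for illustration
    state, reward, done = env.step(action)
    total_reward += reward
```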
Note on Photometric Reasoning Shape $\hat n$, lighting $l$, and reflectance $\rho$ affect image appearance $I$. Can we infer them back? $$I=\rho\langle\hat n, l\rangle$$ How much do shading and photometric effects tell us about shape in natural settings?
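The forward direction of this model is easy to sketch; assuming unit normals and a clamp at zero (no light from behind the surface), rendering is just the formula above applied per pixel:

```python
import numpy as np

def render(albedo, normals, light):
    """I = albedo * <n, l> per pixel, clamped at zero.
    albedo: (H, W); normals: (H, W, 3) unit normals; light: (3,) direction."""
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading

# A flat frontoparallel patch (all normals +z) lit head-on reflects its full albedo.
normals = np.zeros((2, 2, 3)); normals[..., 2] = 1.0
albedo = np.full((2, 2), 0.8)
I = render(albedo, normals, np.array([0.0, 0.0, 1.0]))   # 0.8 everywhere
```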
Note on GAN Notes with reference to the YouTube lecture series by Hongyi Li. Architecture Developments Self-Attention Used in Self-Attention GAN and BigGAN:

```python
import torch
import torch.nn as nn

class Self_Attn(nn.Module):
    """Self-attention layer"""
    def __init__(self, in_dim, activation):
        super(Self_Attn, self).__init__()
        self.channel_in = in_dim
        self.activation = activation
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        """
        inputs:
            x : input feature maps (B x C x W x H)
        returns:
            out : self-attention value + input feature
            attention : B x N x N (N = W * H)
        """
        m_batchsize, C, width, height = x.size()
        proj_query = self.query_conv(x).view(m_batchsize, -1, width * height).permute(0, 2, 1)  # B x N x C'
        proj_key = self.key_conv(x).view(m_batchsize, -1, width * height)                       # B x C' x N
        energy = torch.bmm(proj_query, proj_key)                                                # B x N x N
        attention = self.softmax(energy)
        proj_value = self.value_conv(x).view(m_batchsize, -1, width * height)                   # B x C x N
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(m_batchsize, C, width, height)
        out = self.gamma * out + x  # learnable residual gate, initialized to 0
        return out, attention
```

Style GAN BigGAN Conditional GAN Text Conditioning Text is processed and combined with the noise vector.
Note on Hardware-Based Computational Photography We now have far more computational power than before! Moreover, many images go through complex postprocessing algorithms. But we can also optimize the camera measurement itself, so that the results look even better.
Computational Photography TOC {:toc} Basically, enhance images by computation! The intersection of 3 fields: Optics, Vision, Graphics. Mainly two kinds of work: co-design the camera and the image processing (optics + vision), and use Vision to help Graphics generate better images faster! CG2REAL CG rendering is computationally intensive!
Motivation This is a brief analytical note on how physical self-motion of the eye / camera induces optic flow in a static environment, and then a discussion of how a system can separate these two components instantaneously.
Stereo Basic The stereo problem can be formulated as a Markov Random Field, so methods for MRF inference can all be used. Prior Planar Prior Natural scenes are usually piecewise planar! How can we impose this idea on the depth map?
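A sketch of the MRF formulation: the energy of a disparity labeling is a sum of per-pixel data costs plus a pairwise smoothness penalty between neighbors, and for a toy problem we can minimize it by brute force. The truncated-linear pairwise term below is one common choice, assumed here for illustration:

```python
import numpy as np
from itertools import product

def mrf_energy(data_cost, disp, lam):
    """E(d) = sum_p data_cost[p, d_p] + lam * sum over neighboring pixels
    of the truncated-linear penalty min(|d_p - d_q|, 2), on a 1-D scanline."""
    unary = data_cost[np.arange(len(disp)), disp].sum()
    pairwise = np.minimum(np.abs(np.diff(disp)), 2).sum()
    return unary + lam * pairwise

# 3-pixel scanline, 2 disparity labels; the middle pixel's data term is noisy.
data = np.array([[0., 5.],
                 [4., 1.],
                 [0., 5.]])
best = min(product([0, 1], repeat=3),
           key=lambda d: mrf_energy(data, np.array(d), lam=2.0))
# With lam=2.0 the smoothness prior overrides the noisy middle pixel: (0, 0, 0)
```

With a weaker prior (e.g. `lam=0.5`) the data term wins and the middle pixel flips, which is exactly the data-fidelity vs. smoothness trade-off the MRF encodes.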
Semantic Vision Task Note that semantic and geometric reasoning are conceptually similar to each other. Stereo and optical flow are about finding correspondences / matches; object recognition is, in some sense, finding a correspondence with a template and making the template match the observation. Semantic Vision before CNN Early semantic detectors worked like this
TOC {:toc} Continuing Image Prior and Generative Model. Probabilistic Graphical Models come into the scene when we want to model and work with complex distributions over many variables. When we start to add structure into the model, so that not everything depends on everything else, the dependency relationships among variables emerge as a graph structure.
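A minimal numeric example of such structure, with made-up probability tables: a chain $a \to b \to c$ factorizes the joint as $p(a)\,p(b\mid a)\,p(c\mid b)$, so $c$ depends on $a$ only through $b$.

```python
import numpy as np

# Made-up conditional probability tables for binary a, b, c on the chain a -> b -> c.
p_a = np.array([0.6, 0.4])
p_b_given_a = np.array([[0.9, 0.1],   # rows index a, columns index b
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.7, 0.3],   # rows index b, columns index c
                        [0.1, 0.9]])

# The graph structure turns the joint into a product of local factors.
joint = (p_a[:, None, None]
         * p_b_given_a[:, :, None]
         * p_c_given_b[None, :, :])   # joint[a, b, c] = p(a) p(b|a) p(c|b)
```

The payoff of the structure: the full joint over the three binaries needs 7 free numbers, while the factors need only 1 + 2 + 2, and checking `joint / p(a, b)` recovers `p_c_given_b` exactly, i.e. $c \perp a \mid b$.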
Note on Advanced Computer Vision This is the course note for Advanced Computer Vision Class (CS 659a) These are links to notes for individual modules and specific domain notes. Basic Computer Vision
Image Prior: Modeling Spatial Relationship Materials: https://www.cse.wustl.edu/~ayan/courses/cse659a/lec1.html#/ TOC {:toc} This is the basis for most further applications. We need a regularizer for a spatial configuration: $$\hat X=\arg\min_X \phi(X)+R(X)$$ This could be interpreted in a Bayesian way,
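A small sketch of this template, assuming (for illustration) a quadratic data term $\phi(X)=\|X-Y\|^2$ and a quadratic smoothness regularizer $R(X)=\lambda\sum_i (X_{i+1}-X_i)^2$ on a 1-D signal, minimized by plain gradient descent:

```python
import numpy as np

def denoise(y, lam=2.0, steps=1000, lr=0.05):
    """Gradient descent on phi(x) + R(x), with phi(x) = ||x - y||^2 (data
    fidelity) and R(x) = lam * sum (x[i+1] - x[i])^2 (quadratic smoothness)."""
    x = y.copy()
    for _ in range(steps):
        grad_data = 2.0 * (x - y)
        dx = np.diff(x)
        grad_smooth = np.zeros_like(x)
        grad_smooth[:-1] -= 2.0 * dx     # d/dx_i of sum (x[i+1] - x[i])^2
        grad_smooth[1:] += 2.0 * dx
        x -= lr * (grad_data + lam * grad_smooth)
    return x

y = np.ones(50); y[10] = 5.0   # a flat signal with one outlier spike
x_hat = denoise(y)             # the regularizer pulls the spike toward its neighbors
```

The data term keeps the estimate near the observation while the prior suppresses the lone spike, which is the Bayesian MAP reading mentioned above: likelihood plus log-prior.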
Notes on Visual Imagery Definition: Recreating the sensory world in the mind in the absence of physical stimuli. Usage in daily cognition Closely related to memory: we solve some cognitive tasks by recreating the visual scene in the mind and examining the mental picture! Some tasks involve spatial memory, some feature memory! Usage in creative work Provides another way of thinking, other than verbal and logical induction. Intuition Characteristics of Imagery Is the representation spatial or propositional?
Note on Local Feature Descriptors Before the advent of convolutional neural networks, many techniques for representing and detecting local features had been invented. As lower-level feature detectors, many of them are strongly mathematically motivated. Some are still used in Computer Vision tasks as a preprocessing step.
Note on Patch Based Shape Interpretation These are two related papers that both employ a patch-based approach to the shape-from-shading problem. Patches typically have simpler appearance, so it is easier to collect statistics on them or fit a model to them. The spirit is to find a local explanation for patches in an image. However, since local patches are ambiguous, the algorithm should not over-commit to any one explanation, but keep a distribution over possible shapes. It then takes these local shape proposals and sees which can stitch together and make sense globally.
Note on Categorization and Concepts From lecture notes from Science of Behavior Configuration The relative configuration of single elements Example: Face What defines a face? Components Essential features Configural properties Relative invariance to many changes in the stimulus
Note on CNN Interpretability Two major ways of interpreting CNNs: Feature visualization: see what a hidden neuron is interested in. Attribution: see what part of the image activates a filter or detector. Activation Atlas These works try to build a toolkit for visualizing deep NNs and a human-computer interface to them.
Based on the Goldstein book chapter and a lecture from Jeff Beck Note on Forms of Memory Pinning down a definition can be very tricky! Definition Retaining, retrieving, and using information after the original information (stimulus) is no longer present. (Inner view) Any process by which some past experience affects the way the subject thinks and behaves in the future. (Outer view) This can generalize even to inanimate things: the memory of a magnet! Use of Memory Long-term Memory Human: remember things relevant for life (names, passwords, birthdays, information about others, addresses, knowledge). Ecological: caches of food, foraging locations. Short-term Memory Continuity of awareness Different forms Memory has many forms.
Informative Fragment Approach to Object Recognition It is intuitive that some basic features in the image of an object are informative about its category. Thus, even in occluded images, the revealed fragments can provide such information, so that we can recognize the object from a few patches.
Note on Feature Visualization Motivation We want to understand what the hidden units “represent”: What are they tuned to? What is their favorite stimulus? Why should we search for the stimulus that excites them most? Resources DeepDream.ipynb Tensorflow
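The "favorite stimulus" search can be sketched on a toy linear unit (weights made up for illustration): gradient ascent on the input under a unit-norm constraint recovers the analytic optimum $w/\|w\|$. Real feature visualization does the same thing through a deep network instead of a dot product.

```python
import numpy as np

w = np.array([3.0, -1.0, 2.0])        # the unit's weights (assumed)
optimal = w / np.linalg.norm(w)       # analytic optimum for a linear unit

x = np.ones(3) / np.sqrt(3.0)         # deterministic starting stimulus
for _ in range(200):
    x = x + 0.1 * w                   # gradient of r(x) = w . x is just w
    x /= np.linalg.norm(x)            # project back onto the unit sphere

# x converges to the direction of w, the unit's "favorite stimulus"
```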
Deep Unsupervised Learning Lecture notes from Berkeley CS294-158 https://sites.google.com/view/berkeley-cs294-158-sp19/home Lec 1 Category Generative models Non-generative representation learning Motivation for Unsupervised Learning Applications: Generate/predict fancy samples Detect anomalies / deviations from a distribution, which humans do quite well without training Data compression (because of predictability) Use the inner representation to do other tasks (pre-training) Type of Question Core question: modeling a distribution
Note on Animal Perception From a lecture of Science of Behavior What does it feel like to be a bat!? Umwelt: the sensory world of an animal, which can be very different from ours. Different precision, range … Using the same modality in different ways: sound imaging Electro-/Magneto-reception “The more extreme your claim, the stronger your evidence must be!”
Note on Computer Vision Lecture notes from CS559. TOC {:toc} Lec01: Image Formation In principle, digital images are formed by measuring energy (counting photons) over an array. But several pre-processing steps make the process interesting and relevant to later processing.
Note on Behavioral Study History of Behavioral Science Two historical trends of study combine into modern comparative cognition / behavioral science. Comparative Psychology More in the context of psychology: Anthropocentric
Note on Network Communication TOC {:toc} General Introduction A network connects devices to transfer data / information. LAN and WAN LAN: localized network, connecting machines in the same area. WAN: wide area; the Internet is the largest WAN! The two types are less distinct now; they have been blurred by cellular tech and wireless networking.
Note on Selective Attention From Christof Koch's 2013 lecture Visual Attention and Consciousness; Goldstein, chapter on Attention. In natural language, attention refers to a family of abilities: Vigilance / overall attention Selective attention: processing something at the cost of other things Distributed attention Automaticity (action / perception tasks that do not consume capacity) Selective attention is different from general attention, i.e. arousal.
Note on Computation by Biologically Plausible Learning From a lecture by Cengiz Pehlevan, 2019 Philosophy Neural dynamics can be a substrate of computation. Neural dynamics and plasticity dynamics can both perform optimization, and the biological setting forms a source of constraints on the variables.