For a 0th-order optimization you only need a way to generate samples around the current point (sampling) and a way to estimate the gradient from those samples. Several aspects will affect the performance of your optimizer.
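A minimal NumPy sketch of one such optimizer, using the standard Gaussian-smoothing (antithetic) gradient estimator; `sigma` and `n_samples` are exactly the kinds of knobs that affect performance:

```python
import numpy as np

def zeroth_order_gradient(f, x, sigma=0.1, n_samples=50, rng=None):
    """Estimate grad f(x) from function values only (no autodiff)."""
    rng = np.random.default_rng(rng)
    # Sampling: draw random perturbation directions around the current point.
    eps = rng.standard_normal((n_samples, x.size))
    # Gradient estimation: antithetic finite differences along each direction.
    fwd = np.array([f(x + sigma * e) for e in eps])
    bwd = np.array([f(x - sigma * e) for e in eps])
    return ((fwd - bwd)[:, None] * eps).sum(axis=0) / (2 * sigma * n_samples)

# Usage: gradient descent on a quadratic using only function evaluations.
f = lambda x: np.sum(x ** 2)
x = np.ones(5)
for _ in range(200):
    x -= 0.1 * zeroth_order_gradient(f, x, sigma=0.05)
```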
Basic Categories of Synaptic Learning Rules
- Associative vs. non-associative: a non-associative rule involves activity on one side only, while an associative rule involves activity of both the pre- and post-synaptic neuron. Associative rules are more often used in computation.
- Local vs. global (homo- vs. hetero-synaptic): a hetero-synaptic rule modifies the weight $w_{B\to A}$ based on activity at synapse $w_{C\to A}$.
- Direction: potentiation vs. depression.
- Time scale: short term vs. long term.

Hebbian Learning
$$ \tau_w \frac{dw}{dt} = u v $$
An updated version of this is the covariance rule, an extended Hebbian learning rule.
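For reference, a standard form of the covariance rule (following the usual textbook presentation, e.g. Dayan & Abbott; the threshold choice $\theta_v = \langle v\rangle$ is conventional, not stated above):
$$ \tau_w \frac{dw}{dt} = (v-\theta_v)\,u, \qquad \theta_v = \langle v\rangle \;\Rightarrow\; \tau_w \left\langle \frac{dw}{dt}\right\rangle = \mathrm{Cov}(u,v) $$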
Note on StyleGAN 1 and 2
Adaptive Instance Normalization
BigGAN actually has a similar design to this.
StyleBlock
Mapping Network
The mapping network warps the input normal distribution into a more complex manifold. This both eases sampling (a simple Gaussian suffices) and lets a complex distribution serve as the effective input.
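A minimal PyTorch sketch of AdaIN as used in the StyleGAN1 style block; here I assume `style_scale` and `style_bias` come from an affine transform of the mapped latent $w$ (names are illustrative):

```python
import torch

def adain(x, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization: normalize each feature map per
    sample, then re-scale and shift with style-derived statistics.
    x: (N, C, H, W); style_scale, style_bias: (N, C)."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    x_norm = (x - mu) / (sigma + eps)
    return style_scale[:, :, None, None] * x_norm + style_bias[:, :, None, None]
```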
Motivation
It’s well known that neurons are noisy signal transmitters, but to measure how noisy they are we need statistics that quantify it.
Classic SNR Definition
SNR is typically defined in a linear Gaussian system.
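As a sketch of that classic definition, assuming an additive model $y = s + n$ with zero-mean Gaussian noise (the specific model is my assumption, not stated in the note):
$$ y = s + n, \qquad \mathrm{SNR} = \frac{\sigma_s^2}{\sigma_n^2}, \qquad \mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{\sigma_s^2}{\sigma_n^2} $$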
Smoothness of Function on Manifold
Motivations
Dirichlet Energy
Dirichlet energy is defined as the integral of the squared norm of a function’s gradient over a set. So it is a functional on the smooth functions on that set, $C^\infty (M)\to \mathbb{R}$.
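Concretely (a standard definition; the factor $\tfrac{1}{2}$ is a common convention):
$$ E[f] = \frac{1}{2}\int_M \|\nabla f(x)\|^2 \, dx $$
Small $E[f]$ means $f$ varies slowly over $M$, which is the sense of smoothness used here.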
Finding the Pareto Frontier
Problem statement
Given $N$ points $x_i$ in $d$-dimensional space, find the set of points for which you cannot improve on any dimension without decreasing another. In other words, the points in this set are not dominated by any other point (and do not dominate each other).
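A brute-force NumPy sketch ($O(N^2 d)$; function and variable names are mine):

```python
import numpy as np

def pareto_mask(X, maximize=True):
    """Boolean mask of the Pareto frontier of X, shape (N, d).
    Point j dominates point i if it is >= on every dimension and
    strictly > on at least one (maximization convention)."""
    Y = X if maximize else -X
    n = len(Y)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Check whether any other point dominates point i.
        dominators = np.all(Y >= Y[i], axis=1) & np.any(Y > Y[i], axis=1)
        if dominators.any():
            mask[i] = False
    return mask

# Example: for random 2-d points the frontier is the upper-right envelope.
X = np.random.rand(100, 2)
frontier = X[pareto_mask(X)]
```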
https://towardsdatascience.com/understanding-compositional-pattern-producing-networks-810f6bef1b88
CPPN
An image or a sound can be thought of as a continuous function over space, $I[x,y]$. As such, this function can be modelled by a neural network! The basic idea of CPPN is simple: feed the $x,y$ coordinates into a neural network and read out the pattern at that point. The idea is quite general: regress images, voxels, or sequences onto the underlying spatial/temporal grid (e.g. a meshgrid).
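A minimal PyTorch sketch of this idea (the architecture, width, and tanh activations are illustrative choices, not from the article):

```python
import torch
import torch.nn as nn

# A CPPN: map (x, y) coordinates to a pixel intensity.
cppn = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1), nn.Sigmoid(),  # grayscale intensity in [0, 1]
)

# Build the underlying spatial grid and evaluate the network on it.
xs = torch.linspace(-1, 1, 128)
grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1)  # (128, 128, 2)
image = cppn(grid.reshape(-1, 2)).reshape(128, 128)
```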
Max Flow Min Cut Theorem
- https://en.wikipedia.org/wiki/Cederbaum%27s_maximum_flow_theorem
- https://en.wikipedia.org/wiki/Max-flow_min-cut_theorem
- https://en.wikipedia.org/wiki/Graph_cuts_in_computer_vision
Note on Geodesic and Curvature on Manifold
Fitting a Linear-Nonlinear-Poisson Model
Poisson Likelihood
We know that the Poisson distribution reads
$$ Pr(X=k\mid\lambda)=\frac{\lambda^k e^{-\lambda}}{k!}\\ \log Pr(X=k\mid\lambda)=k\log \lambda -\lambda -\log k! $$
Here we have discrete data such as spike counts $y_i$, paired with input data $x_i$, and we have assumed a functional form transforming $x$ into the rate $f(x)$. Given the data, how can we write down a loss function to optimize?
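One answer, under the assumption $\lambda_i = f(x_i)$: the negative log-likelihood of the data, where the $\log y_i!$ term is dropped because it is constant in the model parameters:
$$ \mathcal{L} = -\sum_i \log Pr(y_i \mid f(x_i)) = \sum_i \big[f(x_i) - y_i \log f(x_i)\big] + \mathrm{const} $$
Minimizing $\mathcal{L}$ over the parameters of $f$ fits the linear-nonlinear-Poisson model by maximum likelihood.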