Jan 1, 2025
Nov 1, 2023
Motivations Many CNN models have become the bread and butter of modern deep learning pipelines. Here I summarize some famous CNN architectures and their key innovations as I use them.
Jan 17, 2021
Note on Compiling Torch C Extensions Motivation Sometimes fusing operations in a C library, bypassing Python, can accelerate your model, especially for key operations that occur often and that lots of data pass through.
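A minimal sketch of the idea, assuming PyTorch's `torch.utils.cpp_extension.load_inline`; the `fused_mul_add` op and the module name are illustrative, not from the post:

```python
import torch
from torch.utils.cpp_extension import load_inline

# Toy fused op in C++: one pass over the data instead of two separate kernels.
cpp_source = """
#include <torch/extension.h>

torch::Tensor fused_mul_add(torch::Tensor a, torch::Tensor b, torch::Tensor c) {
    return a * b + c;
}
"""

# load_inline compiles the source on the fly and auto-generates pybind11
# bindings for the listed functions.
module = load_inline(
    name="fused_ops",             # hypothetical module name
    cpp_sources=cpp_source,
    functions=["fused_mul_add"],
)

a, b, c = (torch.randn(1024) for _ in range(3))
assert torch.allclose(module.fused_mul_add(a, b, c), a * b + c)
```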
Sep 15, 2020
Environment Bug https://github.com/rosinality/stylegan2-pytorch/issues/70 Compiler-not-found bug: compiler_bindir_search_path in ./stylegan2/dnnlib/tflib/custom_ops.py needs to be changed to point at the C compiler installed on the machine. Note: Visual Studio 2019 is not supported, so you have to use 2017!
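A sketch of the change, assuming MSVC 2017 Community is installed (the exact 14.xx.xxxxx version directory differs per machine):

```python
# In ./stylegan2/dnnlib/tflib/custom_ops.py -- point the search path at the
# MSVC 2017 toolchain actually installed locally (VS 2019 is not supported).
compiler_bindir_search_path = [
    'C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64',
]
```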
Jun 10, 2020
Note on Laplacian-Beltrami (Diffusion) Operator Motivation The Laplacian on graphs and on discrete geometry (meshes) is a very useful tool. One core intuition: just like the Laplacian in $\mathbb{R}^n$, it is related to diffusion and the heat equation. Recall that the diffusion equation is $$ \frac{\partial u}{\partial t} = \Delta u $$
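A toy sketch of that intuition on a graph (an illustration, not from the post): with the graph Laplacian $L = D - W$, which discretizes $-\Delta$, explicit Euler steps of $du/dt = -Lu$ diffuse heat along the edges.

```python
import numpy as np

# Toy graph: a path of 5 nodes with unit edge weights.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W  # graph Laplacian L = D - W

u = np.zeros(5)
u[0] = 1.0                      # all the heat starts at node 0
dt = 0.1
for _ in range(100):
    u = u - dt * (L @ u)        # explicit Euler step of du/dt = -L u

print(u)  # heat spreads out toward uniform; the total (mean) is conserved
```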
May 8, 2020
Spectral Graph Theory and Segmentation Motivation Spectral graph theory is a powerful tool because it sits at the center of multiple representations: it connects graphs, manifolds, and linear algebra. It is related to dynamics on graphs, Markov chains, and random walks (diffusion). It can be applied to any point cloud; images and meshes are well suited. It can be used to perform clustering, segmentation, etc. Linear Algebra Review There are several ways to view an eigenvalue problem
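A concrete sketch of the clustering angle (a toy example, not from the post): build a similarity graph over the points, take the Laplacian's second-smallest eigenvector (the Fiedler vector), and threshold it to split two clusters.

```python
import numpy as np

# Two obvious clusters of 1-D points.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Fully connected similarity graph with a Gaussian kernel.
d2 = (pts[:, None] - pts[None, :]) ** 2
W = np.exp(-d2 / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian

# Eigenvectors of L in ascending eigenvalue order; the 2nd-smallest
# ("Fiedler") vector encodes the cut between weakly connected groups.
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)  # e.g. [0 0 0 1 1 1] -- the two point groups separate
```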
Apr 22, 2020
Note on Photometric Reasoning Shape $\hat n$, lighting $l$, and reflectance $\rho$ affect image appearance $I$. Can we infer them back? $$ I=\rho\langle\hat n,l\rangle $$ How much do shading and photometric effects tell us about shape in natural settings?
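A tiny numerical sketch of that forward model, assuming a Lambertian surface with the dot product clamped at zero for back-facing light:

```python
import numpy as np

rho = 0.8                         # reflectance (albedo)
n_hat = np.array([0.0, 0.0, 1.0]) # surface normal, unit length
l = np.array([1.0, 0.0, 1.0])
l = l / np.linalg.norm(l)         # light direction, unit length

I = rho * max(0.0, n_hat @ l)     # I = rho * <n_hat, l>, clamped below at 0
print(I)                          # ~0.566: cos(45 degrees) scaled by albedo
```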
Mar 17, 2020
Note on GAN Notes with reference to the YouTube lecture series by Hongyi Li. Architecture Developments Self-Attention Used in Self-Attention GAN and BigGAN:

```python
import torch
import torch.nn as nn

class Self_Attn(nn.Module):
    """Self attention Layer"""

    def __init__(self, in_dim, activation):
        super(Self_Attn, self).__init__()
        self.channel_in = in_dim
        self.activation = activation
        # 1x1 convolutions project the input into query/key/value feature maps.
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        # Learnable residual weight, initialized to 0 so the layer starts as identity.
        self.gamma = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        """
        inputs:
            x: input feature maps (B x C x W x H)
        returns:
            out: self attention value + input feature
            attention: B x N x N (N is Width*Height)
        """
        m_batchsize, C, width, height = x.size()
        proj_query = self.query_conv(x).view(m_batchsize, -1, width * height).permute(0, 2, 1)  # B x N x C'
        proj_key = self.key_conv(x).view(m_batchsize, -1, width * height)                       # B x C' x N
        energy = torch.bmm(proj_query, proj_key)                                                # B x N x N
        attention = self.softmax(energy)                 # softmax over the last dimension
        proj_value = self.value_conv(x).view(m_batchsize, -1, width * height)                   # B x C x N
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))                                 # B x C x N
        out = out.view(m_batchsize, C, width, height)
        out = self.gamma * out + x                       # residual connection
        return out, attention
```

StyleGAN BigGAN Conditional GAN Text Conditioning Text is processed and combined with the noise vector.
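A quick usage sketch for the Self_Attn module above (the shapes and arguments are illustrative):

```python
import torch

# Apply the layer to a batch of 8 feature maps with 64 channels at 32x32.
layer = Self_Attn(in_dim=64, activation='relu')
x = torch.randn(8, 64, 32, 32)
out, attn = layer(x)
print(out.shape)   # torch.Size([8, 64, 32, 32]) -- same shape as the input
print(attn.shape)  # torch.Size([8, 1024, 1024]) -- N x N with N = 32 * 32
```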
Mar 7, 2020
Note on Hardware Based Computational Photography Now we have far more computational power than before! Moreover, many images go through complex algorithms in post-processing. But we can also optimize the camera measurement itself, so that the results look even better.
Feb 27, 2020