Note on GAN
Note with reference to the Youtube lecture series Hongyi Li.
Architecture Developments
Self Attention
Used in Self-Attention GAN and BigGAN
class Self_Attn(nn.Module):
""" Self attention Layer"""
def __init__(self,in_dim,activation):
super(Self_Attn,self).__init__()
self.chanel_in = in_dim
self.activation = activation
self.query_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1)
self.key_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1)
self.value_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim , kernel_size= 1)
self.gamma = nn.Parameter(torch.zeros(1))
self.softmax = nn.Softmax(dim=-1) #
def forward(self,x):
"""
inputs :
x : input feature maps( B X C X W X H)
returns :
out : self attention value + input feature
attention: B X N X N (N is Width*Height)
"""
m_batchsize,C,width ,height = x.size()
proj_query = self.query_conv(x).view(m_batchsize,-1,width*height).permute(0,2,1) # B X CX(N)
proj_key = self.key_conv(x).view(m_batchsize,-1,width*height) # B X C x (*W*H)
energy = torch.bmm(proj_query,proj_key) # transpose check
attention = self.softmax(energy) # BX (N) X (N)
proj_value = self.value_conv(x).view(m_batchsize,-1,width*height) # B X C X N
out = torch.bmm(proj_value,attention.permute(0,2,1) )
out = out.view(m_batchsize,C,width,height)
out = self.gamma*out + x
return out,attention
Style GAN
BigGAN
Conditional GAN
Text Conditioning
Text is processed and combined with noise vector.
Image Conditioning
Image could be sent directly as spatial input! Thus you have conditional GAN.
Comments: Conditional GAN is similar to supervised learning, but doesn’t map one input to one single output, you can get a distribution of possible output based on one input. Instead of getting a mean output!
Conditioning vector should be sent into Discriminator to inform him what to discriminate.
Unsupervised Conditional GAN
Theory of GAN
Minimize some divergence between 2 distributions. Note when the discriminator is fully trained then the output score of it should match the relative probability of natural image and synthesized image.
Thus the optimal discriminator loss is a form of divergence between 2 distributions, i.e. “Discriminability”. This discriminability should decrease as better and better G is trained.
Thus you should really