
Cycle Generative Adversarial Network (CycleGAN)

GANs were proposed by Ian Goodfellow et al. in 2014. CycleGAN is used to transfer the characteristics of one set of images to another, that is, to map one image distribution onto another. CycleGAN treats this as an image reconstruction problem. We first take an input image (x) and use the generator G to convert it into a translated image. We then reverse the process with a second generator F, which maps the translated image back to a reconstruction of the original. Finally, we compute the reconstruction error between the real and the reconstructed image, for which the CycleGAN paper uses the L1 (mean absolute error) loss. The most important feature of CycleGAN is that it can perform this image translation on unpaired images, where no one-to-one correspondence exists between the input images and the output images.



Architecture 

Like all adversarial networks, CycleGAN has two parts: a generator and a discriminator. The job of the generator is to produce samples from the desired distribution, and the job of the discriminator is to determine whether a sample comes from the actual distribution (real) or was produced by the generator (fake).



The CycleGAN architecture differs from other GANs in that it contains 2 mapping functions (G and F) that act as generators, together with their corresponding discriminators (Dx and Dy). The generator mapping functions are as follows:

G : X → Y

F : Y → X

Here X is the input image distribution and Y is the desired output distribution (such as Van Gogh-style paintings). The corresponding discriminators are:


 

Dx: distinguishes F(Y) (generated inverse output) from X (real input distribution)

Dy: distinguishes G(X) (generated output) from Y (real output distribution)



 

To further regularize the mappings, the authors use two more loss functions in addition to the adversarial loss: the forward cycle consistency loss and the backward cycle consistency loss. The forward cycle consistency loss refines the cycle:

x → G(x) → F(G(x)) ≈ x

The backward cycle consistency loss refines the cycle:

y → F(y) → G(F(y)) ≈ y
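The two cycles can be illustrated with a minimal sketch. It assumes the L1 (mean absolute) distance as the reconstruction error and uses flattened lists of floats in place of images. The toy functions `G`, `F` and `F_bad` are hypothetical stand-ins for trained generators, chosen only so that the arithmetic is easy to follow.

```python
from typing import Callable, Sequence

Image = Sequence[float]  # a flattened image, purely for illustration


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(
    G: Callable[[Image], Image],
    F: Callable[[Image], Image],
    xs: Sequence[Image],
    ys: Sequence[Image],
) -> float:
    """Forward term |F(G(x)) - x| plus backward term |G(F(y)) - y|."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical toy generators: G brightens every pixel by 0.5.
def G(x):
    return [p + 0.5 for p in x]


# F darkens by the same amount, so it undoes G exactly.
def F(y):
    return [p - 0.5 for p in y]


# F_bad darkens by too little, so neither cycle closes.
def F_bad(y):
    return [p - 0.25 for p in y]


xs = [[0.0, 0.25], [0.5, 0.75]]  # samples from domain X
ys = [[1.0, 1.25]]               # samples from domain Y (unpaired with xs)

perfect = cycle_consistency_loss(G, F, xs, ys)        # both cycles close
imperfect = cycle_consistency_loss(G, F_bad, xs, ys)  # both cycles drift
print(perfect, imperfect)
```

When F inverts G the loss is zero. With `F_bad`, every pixel is off by 0.25 after each cycle, so the forward and backward terms each contribute 0.25.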

 



 

Generator Architecture:



 

Each CycleGAN generator has three sections: an encoder, a transformer, and a decoder.

The input image is passed into the encoder. The encoder extracts features from the input image using convolutions, compressing the spatial representation of the image while increasing the number of channels. The encoder consists of 3 convolutions, two of which have stride 2, so each spatial dimension is reduced to 1/4 of the original image size. For example, an input image of size (256, 256, 3) produces an encoder output of size (64, 64, 256).



 

The output of the encoder, after the activation function is applied, is passed into the transformer. The transformer contains 6 or 9 residual blocks depending on the input size (6 blocks for 128×128 images, 9 blocks for 256×256 and higher-resolution images). The output of the transformer is then passed into the decoder, which uses 2 fractionally strided convolution (deconvolution) blocks to restore the representation to the original size.


 


 

The architecture of the generator (shown here with 6 residual blocks) is:

c7s1-64, d128, d256, R256, R256, R256, R256, R256, R256, u128, u64, c7s1-3



 

where c7s1-k denotes a 7×7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. dk denotes a 3×3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Rk denotes a residual block that contains two 3×3 convolution layers with the same number of filters in both layers. uk denotes a 3×3 fractional-strided-Convolution-InstanceNorm-ReLU layer with k filters and stride 1/2 (i.e. a deconvolution operation).
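To check that this layer notation is consistent with the shapes quoted earlier, the following sketch parses the architecture string and traces the (height, width, channels) shape layer by layer. The padding values (3 for the 7×7 layers, 1 for the 3×3 layers) and the output padding of 1 in the deconvolutions are assumptions chosen so that sizes halve and double exactly. The helper names are hypothetical and not taken from any library.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial size after a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding, output_padding):
    """Spatial size after a transposed (fractionally strided) convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_generator(spec, height, width, channels=3):
    """Return the (H, W, C) shape at the input and after every layer in `spec`."""
    shapes = [(height, width, channels)]
    for token in (t.strip() for t in spec.split(",")):
        if token.startswith("c7s1-"):
            # 7x7 convolution, stride 1; assumed padding 3 keeps the size.
            channels = int(token[len("c7s1-"):])
            height = conv_out(height, 7, 1, 3)
            width = conv_out(width, 7, 1, 3)
        elif token.startswith("d"):
            # 3x3 convolution, stride 2; assumed padding 1 halves the size.
            channels = int(token[1:])
            height = conv_out(height, 3, 2, 1)
            width = conv_out(width, 3, 2, 1)
        elif token.startswith("R"):
            # A residual block leaves the shape unchanged.
            if int(token[1:]) != channels:
                raise ValueError(f"{token} expects {token[1:]} input channels")
        elif token.startswith("u"):
            # 3x3 transposed convolution, stride 2 (i.e. stride 1/2).
            channels = int(token[1:])
            height = deconv_out(height, 3, 2, 1, 1)
            width = deconv_out(width, 3, 2, 1, 1)
        else:
            raise ValueError(f"unknown layer token: {token!r}")
        shapes.append((height, width, channels))
    return shapes


SPEC = ("c7s1-64, d128, d256, R256, R256, R256, "
        "R256, R256, R256, u128, u64, c7s1-3")

shapes = trace_generator(SPEC, 256, 256)
for token, shape in zip(["input"] + SPEC.split(", "), shapes):
    print(f"{token:>8}: {shape}")
```

Under these assumptions the encoder output (after d256) is (64, 64, 256), the residual blocks keep that shape, and the decoder restores the original (256, 256, 3).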



 

Discriminator Architecture:



 

The discriminator is a PatchGAN. A regular GAN discriminator maps a 256×256 image to a single scalar output that signifies “real” or “fake”, whereas the PatchGAN maps a 256×256 image to an N×N array of outputs X, where each Xij signifies whether patch ij of the image (here a 70×70 patch) is real or fake.


 


 

The architecture of the discriminator is:


 

C64-C128-C256-C512



 

where Ck is a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. InstanceNorm is not applied to the first layer (C64). After the last layer, a convolution is applied to produce a 1-dimensional (single-channel) output.
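The patch size seen by each discriminator output can be worked out from the kernel sizes and strides. The sketch below computes the receptive field and the output grid size for two configurations: the one read literally from the description (stride 2 in all four Ck layers, followed by a final 4×4 stride-1 convolution), and a variant in which the last Ck layer uses stride 1. The variant, the final layer's kernel size and the padding of 1 are assumptions, not details given in the text above.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a single output unit.

    `layers` is a list of (kernel, stride) pairs ordered from input to output.
    """
    field = 1
    for kernel, stride in reversed(layers):
        field = (field - 1) * stride + kernel
    return field


def output_size(size, layers, padding=1):
    """Side length of the output grid for a square input of side `size`."""
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# C64-C128-C256-C512 all with stride 2, then an assumed final 4x4 stride-1 conv.
all_stride_2 = [(4, 2)] * 4 + [(4, 1)]

# Assumed variant: the last Ck layer (C512) uses stride 1 instead of 2.
last_stride_1 = [(4, 2)] * 3 + [(4, 1), (4, 1)]

for name, layers in [("all stride 2", all_stride_2),
                     ("C512 stride 1", last_stride_1)]:
    print(name, receptive_field(layers), output_size(256, layers))
```

Under these assumptions only the second configuration yields a 70×70 receptive field, and it maps a 256×256 input to a 30×30 grid of real/fake scores. So 70×70 describes the patch each score looks at, not the size of the output array.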



 

Cost Function:


 



 


 

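The cost function is the sum of the adversarial losses for both mappings and the cycle consistency loss, weighted by λ:

L(G, F, Dx, Dy) = L_GAN(G, Dy, X, Y) + L_GAN(F, Dx, Y, X) + λ L_cyc(G, F)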

and our aim is to solve:

G*, F* = arg min_{G, F} max_{Dx, Dy} L(G, F, Dx, Dy)
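A small numerical sketch shows how the terms of this objective combine. It assumes a least-squares form of the adversarial loss and a cycle weight of λ = 10, neither of which is stated in the text above. The discriminator scores are made-up numbers and the function names are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def lsgan_generator_loss(fake_scores):
    """Generator side: push discriminator scores on fakes towards 1 (real)."""
    return mean((s - 1.0) ** 2 for s in fake_scores)


def lsgan_discriminator_loss(real_scores, fake_scores):
    """Discriminator side: scores on real images -> 1, on fakes -> 0."""
    real_term = mean((s - 1.0) ** 2 for s in real_scores)
    fake_term = mean(s ** 2 for s in fake_scores)
    return real_term + fake_term


def generator_objective(dy_on_fake_y, dx_on_fake_x, cycle_loss, lam=10.0):
    """Adversarial terms for G and F plus the weighted cycle consistency term."""
    return (lsgan_generator_loss(dy_on_fake_y)    # G tries to fool Dy
            + lsgan_generator_loss(dx_on_fake_x)  # F tries to fool Dx
            + lam * cycle_loss)


# Made-up scores: Dy is unsure about G's outputs, Dx is fully fooled by F's.
dy_on_fake_y = [0.5, 0.5]
dx_on_fake_x = [1.0, 1.0]
cycle_loss = 0.125  # e.g. the L1 forward + backward reconstruction error

total = generator_objective(dy_on_fake_y, dx_on_fake_x, cycle_loss)
d_loss = lsgan_discriminator_loss(real_scores=[1.0, 0.5],
                                  fake_scores=[0.0, 0.5])
print(total, d_loss)
```

With these numbers the adversarial terms contribute 0.25 and 0.0, and the weighted cycle term contributes 1.25. In this example the cycle consistency term dominates the generators' objective. In training, the generators take a step to decrease `total` while each discriminator takes a step to decrease its own loss, which approximates the min-max problem above.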

Applications: 


 

Style Transfer Results

Comparison of different Style Transfer Results


 

Evaluation Metrics:


 


 

Results:


 


 

In this task, the authors scraped data from Google Maps and Google Earth, evaluated different GAN methods on it, and compared the outputs to the ground truth.


 

Classification Performance for different metrics


 

Some of the results from the Cityscapes dataset are as follows:


 


 

Drawbacks and Limitations:


 

Failure Cases of Cycle GAN


 
