In this blog post, we discuss how we leveraged a capsule network as the discriminator in a generative adversarial architecture. We apply our model to several datasets in the MNIST family, hoping that the vector activations produced by a capsule network provide more nuanced feedback to the generator and thus lead to improved generated images. We are especially interested in transformed and rotated images, on which capsule networks perform particularly well. We achieve significant improvements over a deep convolutional GAN baseline. To further our analysis, we pinpoint areas of the latent space in which particular image transformations or patterns arise. Our code is located here.
Two recently popularized, innovative models are GANs (generative adversarial networks) and CapsNets (capsule networks). In this post, we explain how we integrate the two into a CAPSGAN.
The strength of a capsule network lies in its dynamic routing, which sends the output of each capsule to higher-level capsules in proportion to the agreement (scalar product) between the capsule's prediction vectors and the higher-level capsules' outputs. Capsule networks also internally represent objects as vectors rather than scalar activations, providing far more expressive power than a traditional CNN. This representation also makes the network robust to rotations, which we test against a rotated MNIST dataset.
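To make the routing idea concrete, here is a minimal NumPy sketch of routing-by-agreement as described by Sabour et al. (2017). The shapes and names (`u_hat` for the prediction vectors, `squash`, `dynamic_routing`) are illustrative assumptions; a full implementation batches over examples and learns the transformation matrices that produce `u_hat`.

```python
import numpy as np

def squash(s, eps=1e-8):
    # Squashing non-linearity: keeps a vector's orientation while
    # mapping its length into [0, 1), so length can act as a probability.
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors, shape (num_lower, num_upper, dim_upper).
    # Returns the upper-capsule outputs, shape (num_upper, dim_upper).
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits
    for _ in range(num_iters):
        # Coupling coefficients: softmax of the logits over upper capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijk->jk', c, u_hat)   # weighted sum of predictions
        v = squash(s)                           # upper-capsule outputs
        b += np.einsum('ijk,jk->ij', u_hat, v)  # agreement update (dot product)
    return v
```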
Generative adversarial networks are models used to generate images from a learned distribution. They are composed of two components: a generator, which learns to generate images from vectors sampled from a latent distribution, and a discriminator, which learns to distinguish generated images from real ones in the dataset. The two components play a minimax game (formalized below), by the end of which the generator can produce images from the data distribution by randomly sampling the latent space. Traditionally, architectures such as deep convolutional neural networks are used for both components, but we propose using a capsule network for the discriminator.
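For reference, the minimax game the two networks play is the standard GAN objective of Goodfellow et al. (2014), where $G$ is the generator, $D$ is the discriminator (in our case the capsule network), $p_{\text{data}}$ is the data distribution, and $p_z$ is the latent prior:

$$\min_G \max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$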
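To make the alternating optimization concrete, here is a minimal PyTorch sketch of one training step. The `G` and `D` modules, the latent size, and the helper name `gan_step` are hypothetical placeholders rather than our actual training code; it assumes the capsule discriminator's output has been reduced to a single real/fake probability per image.

```python
import torch
import torch.nn as nn

def gan_step(G, D, real, opt_G, opt_D, latent_dim=100):
    """One alternating GAN update. G: latent vector -> image;
    D: image -> probability the image is real, shape (batch, 1)."""
    bce = nn.BCELoss()
    batch = real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(batch, latent_dim)).detach()  # block gradients into G
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update: the non-saturating heuristic, i.e. maximize
    # log D(G(z)) rather than minimize log(1 - D(G(z))).
    g_loss = bce(D(G(torch.randn(batch, latent_dim))), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```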