Autoencoders in Computer Vision
In my previous article, I talked about Generative Adversarial Networks (GANs), a class of Deep Learning algorithms that belong to the category of generative models.
In this article, I’m going to introduce another way of generating new images, with a different approach. The algorithms we are going to use for this purpose belong to the class of Autoencoders. Generally speaking, an autoencoder (AE) learns to represent some input information (in our case, images) by compressing it into a latent space, then reconstructing the input from its compressed form as a new, auto-generated output image (again in the original domain space).
The first step of the job (compressing the information into a latent space) is done by an encoder, while the decompression phase is done by a decoder. Performance is evaluated by measuring the distance between the original and the generated data (in the case of computer vision, images).
As you can see, in the field of Computer Vision, the main difference between GANs and AEs is the way their performance is measured. With GANs, the generator aims at fooling the discriminator by producing images that are likely to be taken as real rather than generated. With Autoencoders, on the other hand, performance is measured via a loss function that captures the difference between the original image and the generated one, and training aims at minimizing it.
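To make the idea of a "distance" between the original and the reconstructed image concrete, here is a minimal sketch of a pixel-wise mean squared error. The function name `reconstruction_mse` and the toy 2x2 arrays are my own illustrative assumptions, and MSE is only one of several possible distance choices.

```python
import numpy as np

def reconstruction_mse(original, reconstructed):
    """Mean squared pixel-wise difference between two images of the same shape."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if original.shape != reconstructed.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((original - reconstructed) ** 2))

# Toy 2x2 "images" with pixel values in [0, 1] (made-up numbers)
original = np.array([[0.0, 1.0], [1.0, 0.0]])
perfect = original.copy()
blurry = np.array([[0.5, 0.5], [0.5, 0.5]])

print(reconstruction_mse(original, perfect))  # 0.0
print(reconstruction_mse(original, blurry))   # 0.25
```

The lower the value, the closer the reconstruction is to the original; a perfect copy gives exactly zero.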
Now that we have an idea of how Autoencoders work, let’s have a look at how to build one with Python and Keras.
Building an Autoencoder
To build an AE, we need three components: an encoder network which compresses the image, a decoder network which decompresses it, and a distance metric which evaluates the similarity between the original and the generated image. The whole model is trained to find the weights of both inner networks that minimize the loss (that is, the distance metric).
As we are talking about Neural Networks, we can be as creative as we please in building the two components of our model. For example, we can build a simple fully connected AE or a deep fully connected one. We can also decide to make the two components Convolutional Neural Networks rather than simple NNs.
For this article, we will look at how a very simple model works, one composed of fully connected layers. I will be using the Keras package and, to build the NNs, the Keras functional API. As input data, I will use the MNIST digits dataset.
Let’s first create a train and test set of our images and visualize one of them:
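Before moving to Keras, the three components can be sketched with plain NumPy. This is a toy illustration under my own assumptions (a linear encoder and decoder without biases, synthetic data, plain gradient descent on the MSE), not the model used in the rest of the article; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples with 8 features that really live in a 2-D subspace
n_samples, n_features, latent_dim = 200, 8, 2
X = rng.normal(size=(n_samples, latent_dim)) @ rng.normal(size=(latent_dim, n_features))

# Components 1 and 2: encoder and decoder weights (linear layers, no bias)
W_enc = 0.1 * rng.normal(size=(n_features, latent_dim))
W_dec = 0.1 * rng.normal(size=(latent_dim, n_features))

def loss_fn(X, X_hat):
    # Component 3: the distance metric (mean squared error)
    return float(np.mean((X_hat - X) ** 2))

learning_rate = 0.05
losses = []
for step in range(2000):
    Z = X @ W_enc          # encode into the latent space
    X_hat = Z @ W_dec      # decode back to the original space
    losses.append(loss_fn(X, X_hat))

    # Gradients of the MSE with respect to both weight matrices
    grad_out = 2.0 * (X_hat - X) / X.size
    grad_dec = Z.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)

    # Joint update: both networks are trained at the same time
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.6f}")
```

The point to notice is that a single loss drives the updates of both weight matrices, which is what "training the whole model" means here.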
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape) #let's have a look at the shape
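import matplotlib.pyplot as plt
plt.imshow(x_train[0].reshape(28, 28), cmap='gray')  # show the first training digit as a 28x28 image
plt.show()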
Great, now let’s build the whole model:
import keras
from keras import layers
encoding_dim = 24  # 24 floats: a compression factor of 784 (input size) / 24 ≈ 32.7
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)  # encoder: 784 -> 24
decoded = layers.Dense(784, activation='sigmoid')(encoded)  # decoder: 24 -> 784
autoencoder = keras.Model(input_img, decoded)  # full model: input -> reconstruction
encoder = keras.Model(input_img, encoded)  # standalone encoder
encoded_input = keras.Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]  # reuse the decoder layer of the autoencoder
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))  # standalone decoder
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # per-pixel binary crossentropy loss
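To give an intuition of what a per-pixel binary crossentropy computes, here is a small NumPy sketch. It is not the Keras implementation: the function name `per_pixel_binary_crossentropy`, the clipping constant `eps` and the example pixel values are assumptions made for illustration.

```python
import numpy as np

def per_pixel_binary_crossentropy(x, x_hat, eps=1e-7):
    """Average binary cross-entropy over all pixels; inputs are in [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    x_hat = np.clip(np.asarray(x_hat, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))))

x = np.array([0.0, 1.0, 1.0, 0.0])        # made-up target pixels
good = np.array([0.05, 0.95, 0.9, 0.1])   # close reconstruction
uninformative = np.full(4, 0.5)           # every pixel predicted as 0.5

print(per_pixel_binary_crossentropy(x, good))
print(per_pixel_binary_crossentropy(x, uninformative))  # equals ln(2), about 0.693
```

Since the pixel values were scaled to [0, 1] and the decoder ends with a sigmoid, each output can be read as a per-pixel probability, which is why this loss is a workable choice here.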
Now we have to train it on our dataset:
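autoencoder.fit(x_train, x_train,  # input and target are the same images
                validation_data=(x_test, x_test))  # epochs and batch_size are left at the Keras defaults here; tune them as needed
encoded_imgs = encoder.predict(x_test)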
decoded_imgs = decoder.predict(encoded_imgs)
Let’s see how it performs on our input data:
As you can see, even with a basic fully connected NN, we were able to decently reproduce the initial image.
In the context of image generation, GANs have outperformed AE methods in recent years. However, AEs are not only employed for image generation, nor are they used solely in the Computer Vision field. Indeed, they have a variety of applications:
- Dimensionality reduction → this is a technique widely used in Machine Learning which aims at projecting the domain space into a smaller feature space, where it is easier to find relevant patterns in data. Hence, dimensionality reduction also makes Information Retrieval easier. If you are interested in the topic of dimensionality reduction, you can read my former article here.
- Anomaly detection → because of the way they work, AEs are trained to reproduce the most frequent characteristics of the original data. As such, whenever there are anomalies in the original data, the reconstruction performance is worse than average, which can be used as an indicator of potential anomalies.
- Image processing → in addition to image generation, AEs have other applications in the computer vision field, such as denoising, which improves the quality of poor-quality images and videos. Similarly, and paired with super-resolution, AEs are also used in the clinical field to improve the quality of X-ray images, so that they can be further used for image segmentation or object detection tasks.
Hope you enjoyed the reading! I will soon post another article on the topic of AEs, focusing on a particular version of them called Variational Autoencoders, so stay tuned!
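As an illustration of the anomaly detection use case listed above, here is a minimal sketch that flags samples with an unusually high reconstruction error. The helper names, the "mean + 3 standard deviations" threshold and the synthetic arrays are all assumptions of mine; in practice the reconstructions would come from a trained model (for instance via `autoencoder.predict`).

```python
import numpy as np

def reconstruction_errors(x, x_reconstructed):
    """Per-sample mean squared error between inputs and their reconstructions."""
    x = np.asarray(x, dtype=np.float64)
    x_reconstructed = np.asarray(x_reconstructed, dtype=np.float64)
    return np.mean((x - x_reconstructed) ** 2, axis=1)

def flag_anomalies(errors, reference_errors, n_std=3.0):
    """Flag samples whose error exceeds mean + n_std * std of the reference errors."""
    threshold = reference_errors.mean() + n_std * reference_errors.std()
    return errors > threshold, threshold

rng = np.random.default_rng(42)
# Synthetic stand-ins for flattened images and their reconstructions
x_normal = rng.uniform(size=(500, 784))
x_normal_rec = x_normal + rng.normal(scale=0.05, size=x_normal.shape)  # well reconstructed
x_odd = rng.uniform(size=(5, 784))
x_odd_rec = x_odd + rng.normal(scale=0.5, size=x_odd.shape)            # poorly reconstructed

reference = reconstruction_errors(x_normal, x_normal_rec)
flags, threshold = flag_anomalies(reconstruction_errors(x_odd, x_odd_rec), reference)
print(threshold, flags)
```

The threshold rule is a design choice: a percentile of the reference errors would work just as well, and the right cut-off depends on how many false alarms are acceptable.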