GoogLeNet is a 22-layer deep convolutional network whose architecture was presented at the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014, whose main tasks were object detection and image classification. You can read the official paper here.
The main novelty in the architecture of GoogLeNet is the introduction of a particular module called Inception.
To understand why this introduction represented such an innovation, we should spend a few words on the architecture of standard Convolutional Neural Networks (CNNs) and the common trade-off a practitioner faces while building them. Since the following will be a very high-level summary of CNNs, if you are curious about this topic I recommend my previous article about CNN architecture.
Common Trade-Off in CNN
CNNs are made of the following components:
- Convolutional stage (+ non-linear transformation via activation functions)
- Pooling stage
- Dense stage
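The three stages above can be sketched with plain NumPy. This is a minimal, illustrative forward pass (the input size, kernel size, and output dimension here are arbitrary choices for the example, not taken from GoogLeNet):

```python
import numpy as np

def relu(x):
    # Activation function: the non-linear transformation after the convolution
    return np.maximum(0, x)

def conv2d(image, kernel):
    # Convolutional stage: valid 2-D convolution, single channel, stride 1
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Pooling stage: non-overlapping max pooling
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def dense(x, weights, bias):
    # Dense stage: fully connected layer on the flattened features
    return x @ weights + bias

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))      # toy single-channel "image"
kernel = rng.standard_normal((3, 3))     # one learned filter

feat = relu(conv2d(image, kernel))       # (6, 6) feature map
pooled = max_pool(feat)                  # (3, 3) after 2x2 pooling
logits = dense(pooled.flatten(),         # (2,) class scores
               rng.standard_normal((9, 2)), np.zeros(2))
print(feat.shape, pooled.shape, logits.shape)  # (6, 6) (3, 3) (2,)
```

Each stage shrinks or reshapes the spatial dimensions, which is exactly where the layer-by-layer decisions discussed next come into play.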
Basically, each time we add a new layer before the Dense layers (which sit at the end of the network), we face two main decisions:
- Deciding whether we want to go with a Pooling or Convolutional operation;