Understanding the Inception Module in GoogLeNet

Valentina Alto
4 min read · Jan 9, 2021


GoogLeNet is a 22-layer deep convolutional network whose architecture was presented at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014, whose main tasks were object detection and image classification. You can read the official paper here.

The main novelty in the architecture of GoogLeNet is the introduction of a particular module called Inception.
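To make the idea concrete, here is a minimal sketch of an Inception block in its "naive" form: several convolutions of different kernel sizes, plus a pooling branch, applied in parallel to the same input and concatenated along the channel axis. This is written in PyTorch, and the channel counts are illustrative choices, not the ones used in the actual GoogLeNet architecture.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception block: parallel 1x1, 3x3 and 5x5 convolutions
    plus 3x3 max pooling, concatenated along the channel dimension.
    Channel counts are illustrative, not those from the paper."""

    def __init__(self, in_channels, c1, c3, c5):
        super().__init__()
        # padding keeps the spatial size identical across branches,
        # so the outputs can be concatenated channel-wise
        self.branch1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # every branch sees the same input; results are stacked on dim 1
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1,
        )

block = NaiveInception(in_channels=16, c1=8, c3=12, c5=4)
out = block(torch.randn(1, 16, 28, 28))
print(out.shape)  # channels = 8 + 12 + 4 + 16 = 40
```

Note how the module sidesteps the choice of a single kernel size: the network learns how much weight to give each branch instead.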

To understand why this introduction was such an innovation, we should spend a few words on the architecture of standard Convolutional Neural Networks (CNNs) and the common trade-off practitioners have to make while building them. Since the following is a very high-level summary of CNNs, if you are curious about this topic I recommend my previous article about CNNs' architecture.

Common Trade-Off in CNN

CNNs are made of the following components:

  • Convolutional stage (+ non-affine transformation via activation functions)
  • Pooling stage
  • Dense stage
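The three stages above can be sketched as a minimal CNN; this PyTorch snippet is only an illustration, with arbitrary layer sizes chosen for a 28×28 single-channel input:

```python
import torch
import torch.nn as nn

# A minimal CNN covering the three stages listed above (sizes are illustrative):
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional stage
    nn.ReLU(),                                  # non-affine activation
    nn.MaxPool2d(2),                            # pooling stage: 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # dense stage
)

logits = model(torch.randn(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```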

Basically, before the Dense layers (which are placed at the end of the network), each time we add a new layer we face two main decisions:

  • Deciding whether we want to go with a Pooling or Convolutional operation;
