GoogLeNet is a 22-layer deep convolutional network whose architecture was presented at the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014, whose main tasks were object detection and image classification. You can read the official paper here.
The main novelty in the architecture of GoogLeNet is the introduction of a particular module called Inception.
To understand why this introduction represented such an innovation, we should spend a few words on the architecture of standard Convolutional Neural Networks (CNNs) and the common trade-off a practitioner faces while building them. Since the following will be a very high-level summary of CNNs, if you are curious about this topic I recommend my previous article about CNN architecture.
Common Trade-Off in CNN
CNNs are made of the following components:
- Convolutional stage (+ non-linear transformation via activation functions)
- Pooling stage
- Dense stage
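The three stages above can be sketched with plain NumPy. This is a minimal, illustrative forward pass (the input size, kernel size, and output dimension here are arbitrary choices for the example, not taken from GoogLeNet):

```python
import numpy as np

def relu(x):
    # Activation function: the non-linear transformation after the convolution
    return np.maximum(0, x)

def conv2d(image, kernel):
    # Convolutional stage: valid 2-D convolution, single channel, stride 1
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Pooling stage: non-overlapping max pooling
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def dense(x, weights, bias):
    # Dense stage: fully connected layer on the flattened features
    return x @ weights + bias

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))      # toy single-channel "image"
kernel = rng.standard_normal((3, 3))     # one learned filter

feat = relu(conv2d(image, kernel))       # (6, 6) feature map
pooled = max_pool(feat)                  # (3, 3) after 2x2 pooling
logits = dense(pooled.flatten(),         # (2,) class scores
               rng.standard_normal((9, 2)), np.zeros(2))
print(feat.shape, pooled.shape, logits.shape)  # (6, 6) (3, 3) (2,)
```

Each stage shrinks or reshapes the spatial dimensions, which is exactly where the layer-by-layer decisions discussed next come into play.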
Basically, each time we add a new layer before the Dense layers (which sit at the end of the network), we face two main decisions:
- Deciding whether we want to go with a Pooling or Convolutional operation;