Depth-wise Separable Convolutions (shorthand: DepSep convolutions) have been proposed as an efficient alternative to traditional convolutions. They are used in models such as MobileNet (Howard et al., 2017) and EfficientNet (Tan and Le, 2019). They have fewer parameters and require fewer floating-point operations (FLOPs) to compute. However, due to the complexity of modern compute accelerators such as GPUs, metrics such as FLOPs and parameter counts may not correlate with real-world performance.
In this post, we will explore some of the differences between normal convolutions and DepSep convolutions. We will investigate how these differences translate to real-world performance through benchmarks, and try to explain the disparities between theoretical and real-world performance on GPUs.
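To make the "fewer parameters, fewer FLOPs" claim concrete, here is a minimal sketch that counts weights and multiply-accumulates (MACs) for a single layer. It assumes a square kernel, stride 1, "same" padding, and no bias terms. The function names and the example layer shape (3x3 kernel, 64 input channels, 128 output channels, 56x56 output) are hypothetical choices for illustration, not taken from any particular model.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depsep_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def macs(params, h_out, w_out):
    """Multiply-accumulates: every weight is applied once per output pixel."""
    return params * h_out * w_out


# Hypothetical layer shape, chosen only for illustration.
k, c_in, c_out, h, w = 3, 64, 128, 56, 56

std = conv_params(k, c_in, c_out)
sep = depsep_params(k, c_in, c_out)

print("standard params:", std)
print("depsep params:  ", sep)
print("standard MACs:  ", macs(std, h, w))
print("depsep MACs:    ", macs(sep, h, w))
print("depsep / standard ratio:", sep / std)
```

Under these assumptions the ratio works out to 1/c_out + 1/k², so with a 3x3 kernel the DepSep layer needs roughly one ninth of the weights and MACs. FLOPs are often reported as twice the MAC count, which leaves the ratio unchanged. These are purely theoretical counts and say nothing about measured runtime.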