Knowledge graphs (KGs) provide structured sources of information for many downstream tasks, so building large KGs from a large text corpus is an interesting problem. Being able to learn a KG from web-scale corpora means that we could leverage the large amount of unstructured information on websites (e.g. TechCrunch) to build structured knowledge bases. At a large scale, however, a KG is hard to maintain, because it is difficult to keep track of issues such as fact coverage, freshness and correctness. This blog post serves as a short introduction to the techniques used in building a simple KG.
Depth-wise Separable Convolutions (shorthand: DepSep convolutions) have been proposed as an efficient alternative to traditional convolutions. They are used in models such as MobileNet (Howard et al., 2017) and EfficientNet (Tan et al., 2019), among others. They have fewer parameters and require fewer floating-point operations (FLOPs) to compute. However, due to the complexities of modern compute accelerators such as GPUs, metrics such as FLOPs and parameter counts may not correspond to real-world performance.
In this post, we will explore some of the differences between normal convolutions and DepSep convolutions. We will investigate how these differences translate to real-world performance through benchmarks, and try to explain the disparities between theoretical and real-world performance on GPUs.
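To make the theoretical savings concrete, here is a minimal sketch that counts parameters and multiply-accumulate operations (MACs) for a standard convolution and for a DepSep convolution (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution). The function names and the layer dimensions are hypothetical and chosen only for illustration. The sketch assumes no bias terms, folds stride and padding into the output size, and counts MACs; FLOPs are often reported as roughly twice that number.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Return (parameters, MACs) for a standard k x k convolution without bias."""
    params = k * k * c_in * c_out
    # Every output position applies the full set of weights once.
    macs = params * h_out * w_out
    return params, macs


def depsep_conv_cost(c_in, c_out, k, h_out, w_out):
    """Return (parameters, MACs) for a depthwise k x k plus pointwise 1 x 1 convolution."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 output map.
std_params, std_macs = standard_conv_cost(64, 128, 3, 56, 56)
ds_params, ds_macs = depsep_conv_cost(64, 128, 3, 56, 56)

# The cost ratio simplifies to 1 / c_out + 1 / k**2.
ratio = ds_params / std_params

print(f"standard: {std_params} params, {std_macs} MACs")
print(f"DepSep:   {ds_params} params, {ds_macs} MACs")
print(f"DepSep / standard ratio: {ratio:.3f}")
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where the often-quoted figure of roughly 8 to 9 times fewer operations comes from. These counts say nothing about memory traffic or kernel efficiency, which is why they may not match measured GPU performance.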
This post provides a primer on the Transformer model architecture. The Transformer is extremely adept at sequence modelling tasks such as language modelling, where the elements of a sequence exhibit temporal correlations with each other.