On the computational side, there have been confusions about how TPUs and GPUs relate to BERT. BERT was done with 4 TPU pods (256 TPU chips) in 4 days. Does this mean only Google can train a BERT model? Does this mean that GPUs are dead? There are two fundamental things to understand here: (1) A TPU is a matrix multiplication engine — it does matrix multiplication and matrix operations, but not much else. It is fast at computing matrix multiplication, but one has to understand that (2) the slowest thing in matrix multiplication is to get the elements from the main memory and load it into the processing unit. In other words, the most expensive part in matrix multiplication is memory loads. Note the computational load for BERT should be about 90% for matrix multiplication. From these facts, we can do a small technical analysis on this topic.
Deep learning is a field with intense computational requirements and the choice of your GPU will fundamentally determine your deep learning experience. With no GPU this might look like months of waiting for an experiment to finish, or running an experiment for a day or more only to see that the chosen parameters were off and the model diverged. With a good, solid GPU, one can quickly iterate over designs and parameters of deep networks, and run experiments in days instead of months, hours instead of days, minutes instead of hours. So making the right choice when it comes to buying a GPU is critical. So how do you select the GPU which is right for you? This blog post will delve into that question and will lend you advice which will help you to make a choice that is right for you.
[Read more…] about Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning
With the release of the Titan V, we now entered deep learning hardware limbo. It is unclear if NVIDIA will be able to keep its spot as the main deep learning hardware vendor in 2018 and both AMD and Intel Nervana will have a shot at overtaking NVIDIA. So for consumers, I cannot recommend buying any hardware right now. The most prudent choice is to wait until the hardware limbo passes. This might take as little as 3 months or as long as 9 months. So why did we enter deep learning hardware limbo just now?
This morning I got an email about my blog post discussing the history of deep learning which rattled me back into a time of my academic career which I rather not think about. It was a low point which nearly ended my Master studies at the University of Lugano, and it made me feel so bad about blogging that I took two long years to recover. So what has happened?
This blog post looks at the growth of computation, data, deep learning researcher demographics to show that the field of deep learning could stagnate over slowing growth. We will look at recent deep learning research papers which strike up similar problems but also demonstrate how one could to solve these problems. After discussion of these papers, I conclude with promising research directions which face these challenges head on.
Convolution is probably the most important concept in deep learning right now. It was convolution and convolutional nets that catapulted deep learning to the forefront of almost any machine learning task there is. But what makes convolution so powerful? How does it work? In this blog post I will explain convolution and relate it to other concepts that will help you to understand convolution thoroughly.