I am a master's student in informatics at the University of Lugano, Switzerland, currently working as a visiting researcher with the UCL Machine Reading Group, where I am advised by Sebastian Riedel and work on information retrieval and knowledge graphs. Before that, I did two research internships at Microsoft, where I worked on automatic, personalized knowledge graphs for speech recognition and on memory optimizations for deep learning. Before that, I did another research internship with the UCL Machine Reading Group, where I focused on neural link prediction in knowledge graphs, that is, inferring new relationships between concepts, people, and organizations from the overall structure of the knowledge graph.
I started out in research by building my own GPU cluster and developing algorithms to speed up deep learning on GPU clusters.
I am passionate about large-scale deep learning and unsupervised learning, which I think can be achieved through hierarchical associative memory akin to psychological schemas. I am fond of neuroscience and see my work as a symbiosis of neuroscience and deep learning: the more I know about the brain and its behavior, the more I know about AI; the more I know about AI, the more I know about the brain.
I am also passionate about computational efficiency. I try to write algorithms and software that scale efficiently to practical, large-scale workloads.
In the past, I also took part in Kaggle competitions, where I reached world rank 63.
I obtained a degree in applied mathematics at the Open University and completed a dual apprenticeship as a Mathematical and Technical Software Developer, during which I worked in the automation industry.
Besides deep learning, I am also very interested in understanding the human brain, human nature, the human condition, and their evolution. In my spare time, I like to study and think about fields related to these topics.
Feel free to contact me at firstname.lastname@gmail.com. If you have questions regarding deep learning, I prefer that you post them as comments on one of my blog posts (or on this page if they do not fit any blog post); this way everyone can benefit from your questions and my answers.
Publications
2018
Convolutional 2D Knowledge Graph Embeddings, Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel. Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018 (acceptance rate 25%).
2016
8-Bit Approximations for Parallelism in Deep Learning, Tim Dettmers, 4th International Conference on Learning Representations (ICLR), 2016 (conference track, acceptance rate 25%).
Computational and Parallel Deep Learning Performance Benchmarks for the Xeon Phi, Tim Dettmers, Hanieh Soleimani, technical report, 2016.
Research internships
2017-09 – Now UCL Machine Reading Group, London
2017-06 – 2017-09 Microsoft Research, Redmond
2017-01 – 2017-06 UCL Machine Reading Group, London
2016-06 – 2016-09 Microsoft Research, Redmond
Hackathons
2015-06 Hamburg Hackathon 2; 1st place (19 teams). Analyzing the Twitter stream to find viral content in real time.
2014-11 Media Hack Day – Video, Berlin; 1st place (13 teams). Finding the most viral YouTube videos using Twitter and YouTube data.
2014-06 Hamburg Hackathon; 1st place in Technology (17 teams). Using deep neural networks to predict personality on Twitter.
Awards & Honors
2016/2017 Google Scholarship
2011/2012 Best regional graduate in apprenticeship “Mathematical-technical Software Developer”
Are you building your own ML machines or buying gaming machines with NVIDIA 980 GPUs? What is a good desktop to buy for ML? I do not really want to put together my own computer.
It is always significantly cheaper to build one's own machine, and you can customize the parts better. Gaming machines often have some parts that are better than needed and some important parts that are worse than needed. However, for single-GPU systems overall performance is determined by the GPU (about 90%) and not by the surrounding hardware (about 10%); so any gaming rig with a GTX 980 or any other good deep learning card will be all that you need – all choices are good.
Can I use MATLAB for deep learning to make a project on speech recognition? I need to add this program to a system in a bank for voice recognition.
A quick Google search yielded these MATLAB deep learning libraries:
https://github.com/huashiyiqike/LSTM-MATLAB
http://www.vlfeat.org/matconvnet/
Hope that helps.
What reading list (blogs/online courses/books) would you suggest for someone with an economics background who wishes to become competitive at Kaggle?
Keep up the good work!
The Coursera ML course by Andrew Ng is excellent and will get you acquainted with all the different basic machine learning algorithms. With an economics background you may already know some of these algorithms, but a refresher might be nice.
Once you have done that, the main goal would be to become good at feature engineering. This is a very difficult task, and there is little information available – you will mainly learn through experience. The most important resource will be Kaggle forum threads of past competitions where the winners explain their methods; try to go through all the old competitions and understand the methodology of the winners – you will learn a lot from this! Also look at the "Beating the benchmark" threads for each competition. There you will usually be able to learn the easy basics to get started on a problem.
After this you will mainly learn through experience: I would use Python and the scikit-learn (sklearn) module and start to do as many competitions as you can. Try to apply what the winners mentioned in their forum posts. From here it is practice, practice, practice: Initially you will have little success, but do not be discouraged by this, it is normal; it will take a few months until you score well, about 1-2 years until you perform at master's level, and 3-7 years until you perform at world-class level.
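A minimal sketch of the kind of scikit-learn workflow I mean is shown below; the synthetic dataset and the gradient boosting baseline are just stand-ins for whatever competition data and model you end up using:

```python
# Minimal Kaggle-style workflow: load data, cross-validate a baseline, check a hold-out score.
# make_classification is a stand-in for loading a real competition's train.csv with pandas.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier()  # a common tabular-data baseline
print("cv accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())

model.fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))
```

Most of the real work then goes into the feature engineering that happens before the fit call.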
Hope this helps – good luck!
Very much appreciated, Tim. It's going to be a long journey, but I look forward to completing it. See you at Kaggle!
Hi. Thank you very much for the awesome posts about hardware. They helped me a lot. By the way, I am just curious – what library do you use for your deep learning networks?
Thanks for your feedback. Currently, I am developing my own deep learning library from scratch, which will feature automatic GPU computation for any algorithm and a NumPy interface. Besides that, I found Torch7 to be the best (among Theano/Pylearn2 and Caffe): It is easy to use, features many highly optimized routines that other libraries do not offer, and is easy to extend.
Hi, looking at this, is Torch7 still relevant for DL with around 10GB to 30GB of data? It is easier to install Torch7 compared to Theano, but my question is, how about the GPU utilization?
P.S. Thanks for that extensive guide on choosing the right GPU.
Torch7 and Theano are both good and should offer about the same GPU utilization.
Hey! Just saw your blog. Nice endeavor. I'm an undergrad myself as well. I saw your article on hardware for deep learning. I had been thinking of building a machine for my research. Can you share some details about your system?
I currently have 4x GTX Titan, an i7-4820K (4 cores, 8 threads) with PCIe 3.0 support, 32GB RAM, and an SSD + 1TB HD.
Your website is amazing. Thank you so much for putting these together.
What is the motherboard you are using?
I am putting together the following – what do you think? If I use an AMD CPU with PCIe 2.0, the cost could come down by $400. Would I be sacrificing a lot by using PCIe 2.0?
Thank you,
Deepak
CPU: Intel Core i7-4820K 3.7GHz Quad-Core Processor ($319.99)
CPU Cooler: RAIJINTEK AIDOS BLACK 48.6 CFM Sleeve Bearing CPU Cooler ($24.21)
Motherboard: Asus Rampage IV Gene Micro ATX LGA2011 Motherboard ($229.99)
Memory: G.Skill Ripjaws X Series 8GB (2 x 4GB) DDR3-1600 Memory ($59.98)
Storage: Western Digital Caviar Blue 1TB 3.5″ 7200RPM Internal Hard Drive ($51.49)
Video Card: EVGA GeForce GTX 960 4GB SuperSC ACX 2.0+ Video Card ($238.98)
Case: Corsair 100R ATX Mid Tower Case ($47.99)
Power Supply: Corsair CX 500W 80+ Bronze Certified Semi-Modular ATX Power Supply ($66.98)
Total: $1039.61
I have an Asus Rampage and a Gigabyte motherboard; for most people the Asus Rampage works fine, but for me it would not work with 4 GPUs. So I switched to my Gigabyte motherboard, which works just fine with 4 GPUs. Generally, there is not much you can do wrong with a motherboard; in my case it was just bad luck with the Asus Rampage.
I have no data for AMD CPUs, but because CPUs are not that important, I do not think you can go wrong there. If you go with PCIe 2.0, then you really should only work with one GPU (0-10% performance decrease); otherwise, if you plan to have multiple GPUs in the future, try to get a PCIe 3.0 CPU and motherboard (about a 40% performance decrease for 4 GPUs on PCIe 2.0). AMD vs. Intel should be at most a 2-5% performance difference.
An undergrad?
That really surprised me; your articles are so well organized that I thought you might be a young professor or engineer!
I am new to deep learning, and I find many preliminary concepts in your blog posts. That helps me a lot, since my mentor just sent me a series of papers and I am confused by the jargon.
Thank you! It is quite ironic, because I flunked out of high school due to my writing. I am dyslexic, and while I have a terrible time writing with pen and paper, I am much better when I can use a computer. This might seem quite paradoxical, but it is actually quite common for dyslexics: Dyslexics can often write well, but they need help with the mechanics of writing to do so. I am so grateful that I live in a time where computers exist that can help me with that.
I have to agree with jokeren! Very impressive blog and great subject area expertise from an undergrad. Look forward to following your blog as I begin to learn about this field! Do you have a space where I can ask you or other experts questions?
Thank you! It is best to ask questions in the form of a comment under the most appropriate blog post, or here for anything else; that way the questions and answers will be visible to everyone.
Thank you for an interesting blog. It would be great to meet; I am a few hundred km north of you!
Unfortunately, I will move to Switzerland soon, where I will study at the University of Lugano, and I am currently quite busy organizing my move. If you want to discuss or talk about things, the best way to reach me is via email. If you have some questions, you are always free to ask them here, so that other people will be able to see the Q&A.
Thanks for your excellent blog posts. What's your email? I want to discuss deep learning with you in more detail.
My email is firstname.lastname@gmail.com. If you have specific questions, I prefer that you ask them as a comment on a specific blog post. That way the question and answer will be accessible to everyone.
Hi Tim,
I often read your very useful articles. I just want to know how I can connect an external GPU (Tesla K40) to my laptop (Intel Core i5 2.5GHz with 8GB RAM). I want to run some deep learning experiments on a GPU (with the Caffe library).
Is there any dock-type system to connect an external GPU to a laptop? If you could provide some useful links to study, that would be great.
Many thanks.
Ranju Mandal
My name is Canaan Tinotenda Madongo and I am a 2nd-year master's student at Beihang University in Beijing, China. I am doing my research on person re-identification, where I am also supposed to have an understanding of deep learning and its application to person re-identification. I am asking for your help with the above, regarding how best I can understand deep learning and its applicability to my research topic. Your assistance will be greatly appreciated as I am new to this interesting topic of deep learning.
I am planning to write two blog posts on the topic of how to learn deep learning, but currently I have no timeline for that. I do not have the resources to help you personally, and it is difficult to give you guidance from the information that you provided. You can read my Deep Learning in a Nutshell series to get started, and after that you might read some papers, take some courses, or even jump right into software frameworks and learn how to train networks on small datasets.
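If it helps, a first hands-on experiment can be as small as the sketch below, which trains a tiny multi-layer perceptron on scikit-learn's built-in digits dataset; the dataset and hyperparameters are only placeholders for whatever you end up working with:

```python
# Train a tiny neural network on a small dataset as a first end-to-end experiment.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```

Once something like this runs, moving to a proper deep learning framework and to person re-identification datasets is mostly a matter of swapping out the data loading and the model.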
Thank you so much Tim, I really appreciate it…
Hi Tim,
I am using a GTX 1070 and have run into some trouble installing Ubuntu on my Windows 10 machine; the problem is described here.
http://askubuntu.com/questions/812985/how-do-i-deal-with-the-input-signal-out-of-range-problem-while-installing-ubun
I would be more than grateful if you could look into the problem; I would love any help! I have been haunted by this for days.
Best,
Tony
I never had such an error. Did you install Ubuntu via the Windows installer? Also, this link might be relevant.
Hey,
I would like to buy an NVIDIA GTX 1060 GPU, but I am stuck on choosing one from these brands. Could you help me pick the best one from the list below:
1. Zotac Amp Edition
2. Zotac Mini GTX 1060
3. MSI Armor
4. MSI Gaming X
Thanks and Regards
Harsha
The brands all use the same chip, so it is all about the cooling that they provide. I do not know much about fan design, but you might find temperature benchmarks for each brand in gaming reviews.
Hey Tim,
Thanks for getting back to me. From here: http://gadgets.ndtv.com/laptops/reviews/msi-geforce-gtx-1060-gaming-x-and-zotac-geforce-gtx-1060-amp-edition-review-862713.
The MSI has good performance. However, it says the size could be a problem. Could you help me with a good CPU setup where I could install this MSI GPU or the Zotac GPU?
I am really a tyro with GPUs; I need some guidance before I procure them.
The size should be fine. The GTX 1060 is a relatively short GPU; even the "longer" version by Zotac should be no problem. As for the CPU, any CPU will do if you are planning on using just one GPU. For multiple GPUs it is important that your CPU supports at least 32 PCIe lanes. You can find this number in the specifications or, in the case of Intel, on ark.intel.com in the "Max # of PCI Express Lanes" field.
Hi,
I am currently using deep CNNs to develop an image-based search solution for an e-commerce company. They have about 500 categories, and the main problem is that there are many categories where the variance within the category is very high; for example, a single sports category will have a hockey stick and also a box of hockey sticks where the whole hockey stick is not visible, and there are many categories like this. Do you have any suggestions for me?
There is not much that you can do there. This is often the case with real-world data. You can either get more data or try to get cleaner data (preferably both). Academic datasets are usually also cleaned up in this more manual and tedious way (or rather with Amazon Mechanical Turk and the like).
Hi
I really appreciate your sharing. I've also just started to learn DL. This really helps.
I've done a simple test of object detection on Linux. Now I'd like to try to port it to mobile devices, like an Android phone or a Raspberry Pi. I mean, I'd deploy a CNN on mobile devices to implement an object detection application. I was wondering, could you kindly provide some tips for choosing hardware for mobile devices?
About the CNN model: memory consumption during deployment is about 500MB per image, and the computational complexity of one forward pass is about 8.35E+10 FLOPs.
It is difficult to make explicit choices for mobile phone hardware, because often you want all mobile phone users to be able to use your software. So the most general option would be to run it on the mobile phone CPU. This of course requires some pretty sophisticated shrinking and optimization of the model. This is not yet within reach for most users of deep learning; for small applications there is TensorFlow and now Caffe2, but both options still seem quite experimental.
If you want to aim at specific phones, then the best ones should be those that have the Tegra K1 chip. NVIDIA's cuDNN works on these chips and provides quite some computing power. Memory might still be an issue here though, but you could work on your own framework to use it on such devices. However, keep in mind that your software might become obsolete quickly, as software firms working internally on Android frameworks might open source them at any time.
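To get a feeling for whether this is feasible, a rough back-of-the-envelope estimate of the forward-pass time helps; the throughput numbers below are only illustrative assumptions, not measurements of any particular device:

```python
# Rough feasibility estimate: forward-pass time = FLOPs / sustained device throughput.
flops_per_forward_pass = 8.35e10  # from the model described above

# Assumed sustained throughputs (FLOP/s); real numbers vary a lot with the device,
# the numerical precision, and how well the kernels are optimized.
assumed_throughput = {
    "mobile CPU (~10 GFLOP/s)": 1e10,
    "mobile GPU/DSP (~100 GFLOP/s)": 1e11,
    "Tegra K1-class GPU (~300 GFLOP/s)": 3e11,
}

for device, flops_per_second in assumed_throughput.items():
    seconds = flops_per_forward_pass / flops_per_second
    print(f"{device}: ~{seconds:.1f} s per forward pass")
```

Under these assumptions you would be looking at several seconds per image on a plain mobile CPU, which is why model shrinking (pruning, quantization, smaller architectures) is usually unavoidable.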
Hi Tim,
Cool looking graphs of the gridworld in your post on NVidia: https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-reinforcement-learning/
I want to use similar ones in my talks. Did you use some existing visualizer/framework, or did you write the code yourself? Could you refer me to it or share it?
Thanks very much in advance!
Hi Denis,
I used Inkscape for this. I had only worked with Inkscape once before, so it still took some time to produce these figures, but I found the entire experience of designing figures quite good. I think with more experience it will be quite painless and fast to design nice figures. I would recommend Inkscape.
Ah, I see, so you didn't run some RL learning algorithm to generate the plots. I also resort to Inkscape for my lectures when I have no time to write code for experiments 🙂 But for RL slides I want to generate plots from real experiments to show various aspects of learning. I tried to find a simple-to-use gridworld visualizer and didn't find one, so I ended up writing one myself.
Anyway, thanks for the recommendation!
Denis.
Hi Tim,
I'm building my own deep learning rig!
From your experience, which of these graphics cards do you recommend:
Pascal Titan X or Pascal GTX 1080 Ti?
From the Ti announcement, it seems the 1080 Ti is even faster than the Titan X. Does that performance edge carry over to deep learning?
Thanks in Advance,
Venkat
If you can wait for the GTX 1080 Ti, definitely go for the GTX 1080 Ti. It just gives you much more bang for the buck and the 1GB difference is negligible.
Hi Tim,
Thanks for the clarification. I already have a couple of them on pre-order. I hope it won't be too late.
Best,
Venkat
Hello Tim,
When you're running your neural network, how important is the amount of DDR4 SDRAM relative to the GDDR memory on the GPU? What should be the minimum amount?
Which CPU models support at least 32 PCIe lanes (SLI ready)? They seem to be really expensive.
Thanks in Advance,
Gaylou
Having a 1:1 ratio of CPU to GPU memory is often a good choice. This will differ depending on your use case and the software framework you use, but generally you will be okay with 1:1. If you have more than one GPU, you can often relax this to 1:2 in terms of CPU:GPU memory.
The lanes are only important if you plan to do a lot of parallelization across GPUs. Often you do not want to do that anyway, because it is messy and requires special code in most cases. So I would go with a cheaper setup with fewer lanes and just use no parallelism across GPUs.
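As a small worked example of that rule of thumb (the GPU sizes here are hypothetical):

```python
# CPU RAM suggested by the CPU:GPU memory rule of thumb; GPU sizes are made up.
gpu_memory_gb = [11, 11]            # e.g. two hypothetical 11GB cards
total_gpu_gb = sum(gpu_memory_gb)

print("1:1 rule:", total_gpu_gb, "GB of CPU RAM")              # 22 GB
print("relaxed 1:2 rule:", total_gpu_gb / 2, "GB of CPU RAM")  # 11 GB
```

In practice you would round up to the nearest common DIMM configuration, e.g. 16GB or 32GB.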
Hi Tim,
I am new to deep learning but serious about it. Do you think using Ubuntu on the following gaming machine would be a good starting point? I will be coding models in Python.
https://www.razerzone.com/gb-en/gaming-systems/razer-blade
Theoretically it should work and the laptop seems like a good choice; practically, probably not so much. The problem with laptops can be that their hardware is often not well supported by Ubuntu. So before buying a laptop, you should make sure that you can actually run its hardware with Ubuntu. Usually, Lenovo laptops provide the best support for Linux systems, but even here very new Lenovo laptops usually do not have full Ubuntu (or Linux) support. Often you can get it working with some hacks here and there, such as custom kernels, but this is difficult, especially if you do not have the experience. In some cases kernel updates will break NVIDIA drivers and thus render your laptop unusable for CUDA / deep learning.
So again, make sure your hardware is supported. If it is supported, the laptop that you link to is a good choice.
Thank you so much Tim.
I love your blog; it's been extremely useful for building my own DL setup. Right now I'm undertaking a large-scale project, and as helpful and informative as your blog posts are, I cannot piece together all the necessary information from them, and hence I'm asking you for help.
The aim is to have multiple nets running on one server. I think this is possible by allocating different GPUs inside the server to separate tasks, for example GPU 0 for NetA and GPU 12 for NetB. I'm planning to build a server with multiple GPUs inside it for this task (quite obviously). Currently what I want to do is have one huge server with lots of compute capability, and have all the developers sitting on separate laptops or something similar developing the code. Later, when a developer is done, he will copy the Python (or whatever) files onto the server via just a plain old USB drive (still to be finalized) and then run the Python files using the terminal on the server itself. Will this work? Will this cause too much of an inconvenience to the devs? Is this a better alternative to virtual machines?
I have a few more questions to bug you with :P. Could you please give me an email address? I wish to discuss this in more detail with you. I actually tried to mail firstname.lastname@gmail.com… I'm not very sure what I was expecting (it told me it was an invalid address).
Thanks
This will work fine, and it seems to be a standard workflow (except that developers will usually copy the files over the network and not via a USB drive). I would not virtualize the nodes, since that often causes problems with the GPUs. In general, if you have machines with many GPUs and you give developers SSH access, it will be more than fine and they will be able to do their job. You can restrict access so that developers can only change local files; for that to work, of course, you will need to set up all the tools that the developers need beforehand, or have them use virtual environments as offered by local Anaconda installations.
The email should be correct; tim.dettmers@gmail.com.
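On the point of assigning different nets to different GPUs: the usual way to do this is per process, via the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch (the script names and GPU indices are placeholders):

```python
# Pin each training job to its own GPU by setting CUDA_VISIBLE_DEVICES before the
# deep learning framework is loaded inside that process. Script names are placeholders.
import os
import subprocess

jobs = {
    "train_net_a.py": "0",   # NetA only sees GPU 0
    "train_net_b.py": "12",  # NetB only sees GPU 12
}

for script, gpu in jobs.items():
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    subprocess.Popen(["python", script], env=env)  # each job runs as its own process
```

Equivalently, a developer can simply run CUDA_VISIBLE_DEVICES=0 python train_net_a.py from a terminal on the server.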
Thanks a bunch! This makes my life so much easier. I'm planning to set up 6 machines with 20 GPUs each, linked together via InfiniBand so it becomes as good as one machine with 120 GPUs. Afterwards, is it as easy as installing Ubuntu 14.04 LTS on the head node and opening up different terminals and choosing different GPUs? I heard something about Slurm and how it is needed in HPC clusters; what is the deal with that?
Thank you once again
Is firstname.lastname@gmail.com your current email address?
Love your blog posts!
Yes, it is tim.dettmers@gmail.com.
Hello Tim,
I'm new to deep learning and I'm thinking about building a system. What do you think of the new X399 platform and Threadripper? I wanted to start off with two Titans and upgrade to four later down the line, but I wasn't sure if I could take full advantage of the Titans with the X399 platform. Also, if you don't think X399 or AMD is a good choice, which boards would you recommend?
It is a bit too early to tell how good the X399 boards / CPUs will be. I think 64 lanes are overkill for any rig that has fewer than 4 GPUs. The extra cores are not really needed for deep learning, nor is the DDR4 RAM. Overall it looks really fancy and expensive with no real payoff. The X399 platform might be more suitable for other domains, but for deep learning it does not give any big benefit. I personally would probably go with a used DDR3 system with a 36+ lane CPU / board and some GTX 1070.
Hi Tim. Thanks for your blog. Posting this as of July 2017. I'm a doctor/researcher interested in training convolutional neural networks (DenseNets, ResNets, Inception nets) using TensorFlow, Keras, and Python. My current system is old (2GB GPU, 8GB RAM), so I need a new build.
I've settled on the ASUS X99-E 10G WS motherboard, which has 40 PCIe lanes and multiple PCIe slots; the CPU is a 6-core i7-6850K, with 64GB of fast DDR4 RAM, a 1TB SSD, and larger 8-10TB hard drives. Only ONE GPU card for now.
The decision is between the 12GB Titan Xp (April 2017, $1500) and the 11GB GTX 1080 Ti (2017 release, $750). With the new 16GB Volta V100 coming out soon (Volta's 5120 cores vs. Pascal's 3584), I'm afraid of serious buyer's remorse with the Titan Xp. (I really want that extra 4GB!)
I think the answer is to get the 11GB GTX 1080 Ti while I wait for the Volta V100 PCIe card to be released, and then sell the 1080 Ti and upgrade when it is out. I am concerned the Titan Xp will take a huge hit in resale price when Volta comes on the market.
Questions:
A) Do you agree?
B) Can I mix and match the 1080 Ti with 1-2 Volta cards later? Will it cause incompatibility problems or slow my training?
C) Have you heard of any problems with the 1080 Ti and CUDA/cuDNN or other DL incompatibilities? Or does it work like the Titan Xp?
Thanks for your thoughts.
A) I would not buy a Volta V100 in any case; it will be very cost-inefficient at about $4k-$6k apiece. There will be a commercial variant, however, in 2018 Q1/Q2.
B) I would do exactly this. You will not be able to parallelize across all GPUs, but you can still train models on the GTX 1080 Ti and parallelize across the GTX 1180s.
C) The GTX 1080 Ti is good. No problems for any deep learning application or software.
Small typo in the year of your internship
“Research internships
2016-06 – Now Microsoft Research, Redmond”
Thanks for catching that!
Hi Tim,
Do you provide one-to-one lectures? Can I have your email for further discussion?
Thank you
Hi Tim,
Can we use the TFLearn API for distributed training of DNNs over a cluster of GPUs or over Spark?
Thanks!
TensorFlow, in theory, supports distributed training, but I have not yet seen any successful implementation or usage of that outside of Google. So you can use it, but you probably do not want to. If you need parallel training over GPUs via Spark, you can look at CNTK and its distributed computing framework. It is well designed for such a task, although I have not seen it being used outside of Microsoft either.
Hi, Tim!
I am new to machine learning.
Which laptop would you suggest buying to start out running MATLAB, Python, R, Excel, TensorFlow, and further machine learning? Am I right to be looking at laptops under 900 USD such as the DELL Vostro 5468, DELL Vostro 5568, ASUS S406UA, ASUS S510UQ, HP 15-bs563ur, or Acer Swift 3 SF315-51? Which of them would you advise me to buy?
Should I buy Intel Graphics 620 + AMD Radeon 530, or Intel Graphics 620 + NVIDIA GeForce 940MX? Are an i5 and a 256GB SSD enough for the beginning of my journey?
Try to buy a laptop with an NVIDIA GPU; that should get you started with TensorFlow and machine learning. To just try things out, your GPU needs only about 1-2GB of RAM. If you want to be a bit more serious about it, try to find an NVIDIA GPU with 4-6GB or more of RAM. The rest does not matter that much.
Tim-
I'm an avid fan of your blog and its mix of practical advice and theory. We're working on pedestrian avoidance using in-car mini-ITX servers. With Texas Instruments c66x devices we can fit 64 c66x cores into a 4″ x 9″ x 9″ box using half-size PCIe cards, along with a quad-core Atom, all under 150 W. c66x devices are extremely energy efficient at 8/16-bit convolutions with 32-bit accumulation, and they are C/C++ programmable, so we have lots of options for running the most recent compressed models, and for running different ones concurrently.
The problem with using TI devices is the future roadmap; they have moved away from high-performance multicore CPUs and towards multicore ARM SoCs with a limited number of c66x cores, which is not an efficient architecture for HPC servers.
Can you recommend a half-size GPU PCIe card? We need to keep each server at 150 W or less.
Thanks.
-Jeff
Hi Jeff,
I recommend a single-slot version of the GTX 1050 Ti. They are very efficient in terms of FLOPS/watt. With the right software, you could also look into the right CPU, as GPUs are often not so efficient when it comes to inference, but I think the GTX 1050 Ti would give you enough power to do real-time inference at 75W.