Tim Dettmers

My Journey Towards Coding Agents: Building SERA

2026-01-27 by Tim Dettmers

If you look at how people cook coding agents today, they have an industrial kitchen: large-scale reinforcement learning systems with many components for efficiency spanning hundreds of GPUs, complex repository mixing, and large teams working on all angles to optimize the data pipeline for training. For the family of Open Coding Agents we released today from Ai2, we had the equivalent of a hot plate and a frying pan: 32 GPUs and five bright-eyed researchers who wanted to cook state-of-the-art coding agents.

This blog post is about how we cooked that coding agent.


Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work

2026-01-13 by Tim Dettmers

If you are reading this, you probably feel the FOMO. Maybe you have seen the Twitter threads about coding agents completing entire features in minutes. Maybe a colleague mentioned they are “10x more productive” now, or influencers keep saying that AGI is here and that you need to learn their particular thing right away. Maybe you tried Claude Code and felt confused about why the magic everyone talks about is not working for you. This blog post is for those who want to cut through the hype and understand what actually works, what does not, and how to think about using agents to automate more and more of your own job and become more productive.

I have been using agents, primarily Claude Code, for eight months to automate my own work. What you will read here is not speculation or theory. It is the product of hundreds of hours of experimentation, many failures, and some surprising successes. As a professor who does not write much code anymore, my perspective is different from the software engineering discourse that dominates Twitter. Most of my agent use is actually for writing: blog posts, grant proposals, meta reviews. While these problems might be non-traditional, they show how coding agents can be used for all kinds of tasks, even beyond coding itself. This helps you understand how far you can go in the different directions of agent use.


Why AGI Will Not Happen

2025-12-10 by Tim Dettmers 4 Comments

If you are reading this, you probably have strong opinions about AGI, superintelligence, and the future of AI. Maybe you believe we are on the cusp of a transformative breakthrough. Maybe you are skeptical. This blog post is for those who want to think more carefully about these claims and examine them from a perspective that is often missing in the current discourse: the physical reality of computation.


Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning

2023-01-30 by Tim Dettmers 1,665 Comments

Deep learning is a field with intense computational requirements, and your choice of GPU will fundamentally determine your deep learning experience. But what features are important if you want to buy a new GPU? GPU RAM, cores, tensor cores, caches? How do you make a cost-efficient choice? This blog post will delve into these questions, tackle common misconceptions, give you an intuitive understanding of how to think about GPUs, and offer advice that will help you make a choice that is right for you.


LLM.int8() and Emergent Features

2022-08-17 by Tim Dettmers 13 Comments

When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch was about how I use advanced quantization methods to achieve transformer inference at scale with no performance degradation, which makes large models more accessible. The other pitch was about emergent outliers in transformers and how they radically change what transformers learn and how they function.

From that, I learned that quantization research is like printers. Nobody cares about printers. Nobody likes printers. But everybody is happy if printers do their job.


How to Choose Your Grad School

2022-03-13 by Tim Dettmers 18 Comments

If you are reading this, then you have probably finished the long and arduous journey to grad school. You emerged victorious, and this success is well-deserved. But which school should you choose? How do you make the right choice if all schools look great in their own way? This blog post is centered around these questions. It is most useful if you are a computer science student aiming to study machine learning and, in particular, natural language processing in the US, but most of the information here is equally valid for any field of research and any country.

The choice of grad school that is right for you can be tricky and confusing. We live in a time of hyper-competitiveness, where even undergrads need to optimize for metrics like paper count to make it to the next level — grad school. This heavily career-centered perspective was probably advantageous to get you into grad school, and it remains crucial to get you to the level after that: a great job in industry or academia. So choosing the school which is best for your career can feel like an obvious choice. However, a PhD is a very long journey, and choosing your grad school based on this perspective alone might make you more vulnerable to burn-out, disillusionment, and general dissatisfaction.

In this blog post, I will discuss this career-centered perspective in detail, but I will also provide you with three other views that hopefully help you make a balanced choice that leads not only to academic success but also to long-term satisfaction and a full and rich life. Balancing your decision across all four perspectives probably leads to a better choice than looking at one angle alone. Before I go into the details, let me briefly introduce these four perspectives: the Career Perspective, the Identity Perspective, the Stability Perspective, and the Variability Perspective.

On Creativity in Academia

2019-09-03 by Tim Dettmers 5 Comments

I recently had a discussion about creativity with a colleague. We were discussing music and how creative many bands and groups are. At the end of our conversation, my friend told me, half sarcastic and half serious, how much more creative the people in the music industry are than he is, and that he just cannot find good ideas in his area of research even though he has tried so hard for such a long time. I was a bit surprised because I thought of him as someone very creative. However, it is not uncommon to hear scientists lament their lack of creativity compared to academic superstars. I think the way we think about creativity in academia is a bit distorted, and a clearer view can help us feel less bad about our own creativity.

Sparse Networks from Scratch: Faster Training without Losing Performance

2019-07-11 by Tim Dettmers 38 Comments

This blog post is about my work with Luke Zettlemoyer, Sparse Networks from Scratch: Faster Training without Losing Performance, on fast training of neural networks that we keep sparse throughout training. We show that by developing an algorithm, sparse momentum, we can initialize a neural network with sparse random weights and train it to dense performance levels, all within a single training run. Furthermore, if we use optimized sparse convolution algorithms, we can speed up training by between 3.5x for VGG and 12x for Wide Residual Networks. This stands in stark contrast to computationally expensive methods which require repetitive prune-and-retrain cycles, as used by the Lottery Ticket Hypothesis (Frankle and Carbin, 2019) and other work. Thus we show that training sparse networks to dense performance levels does not require “winning the initialization lottery” but can be done reliably from random weights if combined with a method that moves weights around the network in a smart way. We call the paradigm that maintains sparsity throughout training while maintaining dense performance levels sparse learning. While this work shows that sparse learning is possible, future work holds the promise to train larger and deeper networks on more data while requiring the same or fewer computational resources than current dense networks.
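To make the idea of "keeping a network sparse throughout training" a bit more concrete, here is a minimal single-layer sketch in plain Python. It is not the sparse momentum algorithm from the paper: the function names (`sparse_init`, `prune_and_regrow`), the pruning criterion (smallest weight magnitude), the regrowth criterion (largest momentum magnitude), and all parameter values are assumptions chosen for illustration. The only properties it is meant to show are a sparse random initialization and a step that moves weights around while the number of active weights stays fixed.

```python
import random


def sparse_init(n_weights, density, rng):
    """Create a flat weight list where only a random subset is active.

    Returns (weights, mask); inactive positions hold 0.0.
    Hypothetical helper, not part of any library.
    """
    n_active = round(density * n_weights)
    active = set(rng.sample(range(n_weights), n_active))
    mask = [i in active for i in range(n_weights)]
    weights = [rng.gauss(0.0, 0.1) if m else 0.0 for m in mask]
    return weights, mask


def prune_and_regrow(weights, mask, momentum, prune_rate):
    """Move weights around while keeping the active count constant.

    Assumed criteria (illustrative only): drop the active weights with the
    smallest magnitude, then enable the same number of previously inactive
    positions with the largest momentum magnitude.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(int(prune_rate * len(active)), len(inactive))

    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    regrown = sorted(inactive, key=lambda i: -abs(momentum[i]))[:k]

    new_weights = list(weights)
    new_mask = list(mask)
    for i in pruned:
        new_weights[i] = 0.0
        new_mask[i] = False
    for i in regrown:
        new_mask[i] = True  # regrown weights start at zero and are trained later
    return new_weights, new_mask


rng = random.Random(0)
weights, mask = sparse_init(20, 0.25, rng)
# Stand-in for an optimizer's momentum buffer; random values for the demo.
momentum = [rng.gauss(0.0, 1.0) for _ in range(20)]
new_weights, new_mask = prune_and_regrow(weights, mask, momentum, 0.4)
print(sum(mask), sum(new_mask))  # active count before and after the step
```

In a real training loop, a step like `prune_and_regrow` would run periodically between ordinary gradient updates, and the mask would be applied after each update so that inactive weights stay at zero.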

A Full Hardware Guide to Deep Learning

2018-12-16 by Tim Dettmers 945 Comments

Deep Learning is very computationally intensive, so you will need a fast CPU with many cores, right? Or is it maybe wasteful to buy a fast CPU? One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high-performance system.


Machine Learning PhD Applications — Everything You Need to Know

2018-11-26 by Tim Dettmers 154 Comments

I studied in depth how to be successful in my PhD applications and it paid off: I got admitted to Stanford, University of Washington, UCL, CMU, and NYU. This blog post is a mish-mash of how to proceed in your PhD applications from A to Z. It discusses what is important and what is not. It discusses application materials like the statement of purpose (SoP) and how to make sense of these application materials.
