Credit Assignment in Deep Learning

2017-09-16 by Tim Dettmers

This morning I got an email about my blog post discussing the history of deep learning, which rattled me back into a time of my academic career that I would rather not think about. It was a low point which nearly ended my Master’s studies at the University of Lugano, and it made me feel so bad about blogging that I took two long years to recover. So what happened?

When I started my master’s, I worked on blog posts for NVIDIA that gave introductions to deep learning. Part of this blog post series also discussed the history of deep learning. I therefore discussed what I thought were the historical milestones with the largest impact, but in doing so, I inadvertently assigned credit to the researchers who I thought had a strong impact on the field. I worked on this blog post and circulated it in my deep learning class’s forums, to the dismay of my then advisor, who holds the opposite view to mine.

To evaluate the credit that a research idea deserves, I believe it matters not only who had the idea first; it is equally important to actually make it work (the implementation). My ex-advisor believed that it only really matters who was first to publish the idea.

My advisor scolded me in class for my views since he felt very strongly that the first idea counts and that my view is plain wrong. To redeem myself and to salvage the relationship with him, I felt coerced to change my blog post according to his wishes.

This quasi-censorship of my blog post eviscerated me, and in consequence, I lost all desire to blog for two years. Despite my efforts, the relationship with my then advisor deteriorated further, and I had to look for a new advisor.

Looking back at the blog post that I produced, I feel ashamed. It does not express my personal views. I value integrity, and my behavior did not reflect who I want to be.

I write this blog post to discuss my true beliefs about credit assignment and why I believe that the idea, its communication and its implementation are all equally important.

Who Deserves Credit for Deep Learning Ideas?

There has been a lot of discussion about how to assign credit to researchers, or in other words, how to determine whose work had a large impact. Note that I do not discuss here who deserves credit for discovering an idea; I look at who deserves credit for the impact that an idea has. Looking at this, there are two main camps: the first believes that ideas and implementation count equally, and the second believes that what counts is who had the idea first.

The problem with this discussion is that it is not a scientific topic, but a philosophical one. How do we determine what has how much value? We use the scientific method. What is the scientific method in philosophy? Use reductions to arrive at simple statements, then use logic to derive other factual statements. Failing that, as in this case, we construct thought experiments in which we isolate variables and take them to extremes. Let’s do this now to get insight into the issue.

All Ideas, No Communication, No Implementation

Let’s imagine there exists a person who has come up with all ideas in deep learning of the past and all ideas in deep learning of the future. However, this person cannot communicate with either words or writing. This person also cannot write code. How much credit does such a person deserve?

I would argue that such a person deserves zero credit. In fact, I think it is epistemologically correct that this person deserves no credit because nobody can know that he or she deserves credit.

All Ideas, 1 Communication + No Ideas, Full Communication

We have a Person 1 that invented everything in deep learning. Now this person can communicate, but he or she is so unclear that only a single Person 2 can understand these ideas.

Now, Person 2 has no creativity but is a perfect communicator. Person 2 basically just translates what Person 1 said and the entire world understands. Who deserves credit here?

It is tempting to think that Person 2 deserves all the credit because Person 1 is useless without Person 2. But similarly, Person 2 is useless without Person 1.

Both people thus deserve equal credit — no one can achieve anything without the other.

All Ideas, Full Communication, 1 Implementation

Let’s increase the complexity of the problem. Let us say the duo of Person 1 and Person 2 spread the ideas so that the entire world understands deep learning, but let us assume that all people are implementation agnostic: nobody can make deep learning work. The world knows about all deep learning ideas but cannot solve any problem with them. In such a world, the ideas of deep learning are quickly abandoned by the large majority due to their uselessness (just like the majority of the population does not care much about pure mathematics, e.g., few care whether $a^n + b^n = c^n$ has positive integer solutions for any integer $n > 2$).

Enter Person 3. Person 3 has no creativity, cannot communicate, but he or she can implement all the deep learning ideas in a practical manner. The world looks at this person’s code and suddenly is able to solve all problems which are solvable with deep learning.

Who deserves the most credit: Person 1, Person 2, or Person 3?

As discussed before, Person 1 and Person 2 deserve equal credit, and also here, I would argue, that Person 3 deserves equal credit.

This becomes apparent when we think about the value of ideas. Ideas are useful when they have an effect. If they have no effect or only a small effect, they deserve no recognition or little recognition. If deep learning ideas had no practical value, they would not deserve more recognition than, say, the idea that there might be something beyond the observable universe: it is a nice idea, but it will never produce anything of much value.
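One way to check the equal-credit conclusion is the Shapley value from cooperative game theory, which a reader also suggests in the comments. This is a swapped-in formalization, not the reasoning used above. The sketch assumes an all-or-nothing value function: the world gets value 1 only if ideas, communication, and implementation are all present, and 0 otherwise. The names `shapley` and `all_or_nothing` are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal value that p adds when joining the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


players = ("ideas", "communication", "implementation")


def all_or_nothing(coalition):
    # Assumption: deep learning only produces value when all three roles are present.
    return 1 if coalition == frozenset(players) else 0


shapley_credit = shapley(players, all_or_nothing)
for role, share in shapley_credit.items():
    print(role, share)
```

Under this assumed value function every role receives the same share, because each one is pivotal in exactly the join orders where it arrives last.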

Comparative Individual Value For Collective Contributions

The evaluation changes if we distribute the contributions of ideas, communication, and implementation among many individuals. If we take the three scenarios above, expand Persons 1-3 into groups of people, and subject them to comparative evaluation, that is, ask how much value each individual’s contribution has compared to everyone else’s, we arrive at the following thought experiment.

1 Ideas, 1000 Communication, 1000 Implementation

Suppose we have 1 person who has all the ideas, 1000 people who can understand these ideas and communicate them to the world, and 1000 people who can implement them to yield practical value. How do we assign credit?

As discussed, it is reasonable that each of the areas, (1) ideas, (2) communication, and (3) implementation, deserves equal credit. If the groups of 1000 people made contributions (communications and implementations) of equal value, it would be fair to say that:

1 Ideas: 1/3 credit
1000 Communication: 1/3000 credit each
1000 Implementation: 1/3000 credit each.

We see in this case the one person with the idea should receive the largest amount of credit.

Similarly, if we weight the numbers differently, and if we assume contributions of individuals in groups are equal, then this credit assignment holds for all other combinations like (1000, 1, 1000), or (10000, 1000, 1).
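The arithmetic behind this split can be sketched in a few lines, assuming (as above) that each area gets an equal share and that everyone within an area contributes equally. The function name `per_person_credit` is made up for illustration.

```python
from fractions import Fraction


def per_person_credit(group_sizes):
    """Split credit equally across areas, then equally among the people in each area."""
    area_share = Fraction(1, len(group_sizes))
    return {area: area_share / size for area, size in group_sizes.items()}


# The (1, 1000, 1000) scenario from the text.
credit_a = per_person_credit({"ideas": 1, "communication": 1000, "implementation": 1000})

# The (10000, 1000, 1) scenario: now the single implementer gets the largest share.
credit_b = per_person_credit({"ideas": 10000, "communication": 1000, "implementation": 1})

for name, credit in (("(1, 1000, 1000)", credit_a), ("(10000, 1000, 1)", credit_b)):
    print(name, {area: str(share) for area, share in credit.items()})
```

In every combination, whoever sits in the smallest group receives the largest individual share.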

Timing and Relational Effects

In the real world, we have timing effects and relational effects. Not all 1000 Ideas, Communication, or Implementation people will publish their work at the same time, but they will have a specific sequence. In this sequence, they will influence and build on each other — they stand on the shoulders of giants. Who are the giants? Who deserves what amount of credit?

If we think about it, it is not much different from our first analysis. Let’s take Person 1, who only has ideas and can communicate his or her ideas to only one other person, Person 2; Person 2, standing on Person 1’s shoulders, is only able to communicate the ideas to one other person, Person 3; Person 3, standing on Person 2’s shoulders, in turn, can communicate the ideas clearly to the entire world.

If we express the abilities of people as numbers which represent their fraction of all value in ideas, communication, and implementation, written as [ideas, communication, implementation], we could weight Person 1, Person 2, and Person 3 in this way:

Person1: [1, 1/10^10, 0]
Person2: [0, 1/10^10, 0]
Person3: [0, 1, 0]

This means that Person 1 has all the ideas (1) and could communicate these ideas to 1 person (we assume a total population of 10 billion people to make the math easier). Person 2 has no ideas, could understand Person 1’s idea but could only communicate this idea to one other person, Person 3. Person 3 has no ideas, understands the idea of Person 2 and can communicate it so that everybody understands. Note that this example is simplified so that all people are implementation agnostic.

From these fractions, we see that Person 2 has almost no fraction of contributions since Person 2 is not creative and also not a good communicator. However, if we look at the relational effects we know Person 3 would have no value without Person 2, and Person 1 would also have no value without Person 2. So how do we solve this credit assignment problem?

We can try to solve this problem by modeling it as a weighted graph that expresses relationships over time and the relationships of the fractions with respect to the world.

Graphical representation of this particular credit assignment problem: The world has 10^10 people (self-weight: 1). Person 1 (P1) has all the ideas that exist in the world (1) and can communicate to one other person in the world (1/10^10), that is P2 (1); P2 can communicate the ideas to one person in the world (1/10^10), which is P3 (1); P3 can communicate the idea to the entire world in an understandable way (1). Connections between P1-P2, and P2-P3 are bidirectional, meaning that it is important (a) to understand and (b) to communicate ideas.

How do we weight the contribution of each person in this case? There are many answers to this, but PageRank is a good fit here. PageRank works as discussed above: credit is assigned comparatively, so if we have a (1, 1000, 1000) distribution, the largest chunk of PageRank goes to the single person. Thus it reflects our evaluation system. PageRank also takes into account the relationships between nodes and their recursive weight (standing on the shoulders of giants).

Using the scenario above, we find the contributions as follows:

Name	PageRank	Relative Contribution
P2	0.3450	0.4319
P1	0.2697	0.3376
P3	0.1841	0.2305

We see that P2 has the largest contribution despite being only the bridge between P1 and P3 who have the largest fractions (all the ideas and full communication abilities). However, P1’s success depends on P2, and P3’s success depends on P2 and as such P2 is the most critical link in the entire system.

This is quite insightful. If you understand some obscure research and communicate this to just a few researchers who, in turn, influence many other researchers then you will have made a substantial contribution to the deep learning community.

It would not feel this way because you will probably not experience any fame or recognition here. The recognition will come for P1 (having ideas) and P3 (communicating ideas). But still, the numbers do not lie here.

This experiment was quite interesting, and if you want to experiment a bit by yourself, you can download the code to see what happens if you add more people and more relationships among these people. This exercise can give quite some insight into what is valuable for research.
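A minimal sketch for such experiments, assuming a plain power-iteration PageRank with a damping factor of 0.85 and a simplified chain P1 <-> P2 <-> P3 with unit edge weights. This is not the downloadable code mentioned above. The graph leaves out the world node and the 1/10^10 weights, so it will not reproduce the numbers in the table; it only shows the qualitative effect that the bridge node ranks highest. The names `pagerank` and `relative_contribution` are made up for illustration. The second helper renormalizes scores over a chosen set of nodes, which is how the Relative Contribution column appears to relate to the PageRank column (0.3450 / (0.3450 + 0.2697 + 0.1841) ≈ 0.4319).

```python
def pagerank(weights, damping=0.85, tol=1e-12, max_iter=1000):
    """Weighted PageRank by power iteration; weights[src][dst] is the edge weight."""
    nodes = sorted(set(weights) | {d for out in weights.values() for d in out})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node receives the same teleport share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            out = weights.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[src] / n
            else:
                for dst, w in out.items():
                    new[dst] += damping * rank[src] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def relative_contribution(ranks, people):
    """Renormalize scores so that the selected people sum to 1."""
    total = sum(ranks[p] for p in people)
    return {p: ranks[p] / total for p in people}


# Assumed toy graph: bidirectional chain with unit weights.
chain = {
    "P1": {"P2": 1.0},
    "P2": {"P1": 1.0, "P3": 1.0},
    "P3": {"P2": 1.0},
}
chain_ranks = pagerank(chain)
for person, score in sorted(chain_ranks.items(), key=lambda kv: -kv[1]):
    print(person, round(score, 4))
```

Adding more people or changing edge weights in `chain` is enough to explore how much a single connecting node matters.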

Response to Criticism on Reddit

There has been some sharp criticism on Reddit concerning ideas expressed in this blog post. The user metacurse makes the point that in science we usually credit those researchers who had the idea first, and that communication and implementation are not valued. For example, we value Albert Einstein highly for the discovery of general relativity and the photoelectric effect, not Neil deGrasse Tyson for communicating them; similarly, Cocks is credited for RSA even though he never implemented it in any way that was widely used (and he could not produce public implementations due to the classified status of RSA). However, this entire argument is rather weak and unfair:

I do not discuss who should be credited for an idea or the usage of the idea, I discuss who should be credited for the overall impact of an idea. These are very different questions.
He uses examples to try to prove his own hypothesis when we know that examples cannot prove anything (he uses classical philosophic techniques, which have some value but do not generate reliable knowledge the way analytical philosophy does). He mocks me for not using examples myself.
He appeals to the emotion of the readers, by saying that my views endorse unethical ideas like “stealing olds ideas and rebranding them as your own” when it has nothing to do with my argument (reductio ad Hitlerum). He does this quite successfully swaying many emotional readers. I do not think this is helpful.

To sharpen the contrast and show why metacurse’s argument is not relevant to mine, take this thought experiment.

We have a super genius who knows about all possible ideas and writes them down so that everybody can understand them easily. Then she locks these notes away in a locker and dies the next second. Over the next billions of years humanity rediscovers all ideas and uses them to build a flourishing society where all living things live in harmony and every being is fulfilled and so forth. One second before the last human dies in heat death, that human discovers the notebook.

Metacurse’s argument would look for the answer to the question: Should our super genius be credited for inventing everything? Metacurse would argue, yes, and I would totally agree.

What I discuss in this blog post: How much impact did our super genius have on the overall impact of all ideas? Very little, she never had any direct or even indirect effect with any of the ideas; the only impact she had was that one other person understood that she had the ideas before others had them. That is the total impact of her ideas. Her impact is almost zero.

Conclusion

Here I discussed how it is best to think about contributions in deep learning. From thought experiments, we could see that ideas, their communication, and their implementation are equally important contributions.

We also discussed how timing effects and dependencies could be modeled in a relational graph. We found that people who link ideas to communicators can make substantial contributions to the research community even if they themselves are not creative or good communicators. Creating the links between influential ideas and influential communicators (or people who implement) is important here.

Comments

Murray Frank says
2018-01-03 at 02:48
Giving credit is a long debated problem. Frequently someone comes up with an idea that has a huge influence. Then other people say that in reality someone else had really thought of the idea earlier. Often such claims are true. In other cases you can see the essence of the idea but not the whole thing in the earlier work. In some cases we retroactively give credit. In other cases it does not happen. For example, Kuhn and Tucker came up with a standard theorem in optimization in 1951. Eventually people realized that it was also in Karush’s 1939 master’s thesis. To this day you will see the theorem called the Kuhn-Tucker theorem, and you will also see it listed as the Karush-Kuhn-Tucker theorem. There are many such examples.
- Tim Dettmers says
  2018-01-15 at 22:49
  There are many interesting examples indeed! Do you think this relates to how past researchers communicated their work, or to how “mature” their work is in general (master’s thesis vs. full researchers)?
AdamP says
2017-11-15 at 08:01
Hi, just found this blog, great stuff!
Just a minor point – “Communication can be important even after publication. Just look at Immanuel Kant’s work, which is probably the most important philosophical work, yet it was not read for some time because nobody understood his ideas.”
I find that very strange, not a good example at all. “Probably the most important philosophical work” – I don’t know what that’s based on. “Arguably”, arguably, but ‘probably’?! I’ve never heard anyone claim that.
It’s news to me that Kant wasn’t read for some time. Whatever “some time” means. But I don’t think that’s right at all.
And “nobody understood his ideas” is even more murky. (There’s not even a single thing you could point to and call “an understanding of his ideas”, i.e. there are a wide range of interpretations, even to this day. What one person calls an understanding, to another is gross misunderstanding, etc.)
His Critiques have a repellent, almost impenetrable style, granted, maybe that’s what you meant.
p.s. Gauss invented the FFT, apparently, though it seems he never told anyone, not sure how much credit he deserves. I kept expecting to see his name on these pages in that connection. 🙂
- Tim Dettmers says
  2017-11-16 at 22:49
  I am talking about the “Critique of Pure Reason” here. Kant published it, and it was poorly received because people could not understand it. He rewrote it 6 years later, and suddenly people could actually understand his points, which in turn could help other people understand. Through this Kant became the most talked-about philosopher during that time.
Karthikeyan Chittayil says
2017-09-30 at 07:53
Tim, I think you have a nice way of putting complex concepts in simple words, and elementary maths. Please keep it up. As you have brought it out, communication indeed is very important. Keep blogging !
- Tim Dettmers says
  2017-10-01 at 15:25
  Thank you, that means a lot to me!
Yun Teng says
2017-09-28 at 03:16
Enlightening as always!
The saying “Those who can, do, those who can’t, teach” has always bothered me.
Because of that, I really liked your “Timing and Relational Effects” example with the PageRank, which showed that Person 2 was the most important, and even Person 3 had 0.2305 contribution.
To me, Person 2 is like a mentor/advisor and Person 3 is an instructor with many students, both roles having significant impact in the real world.
- Tim Dettmers says
  2017-09-29 at 14:52
  Indeed, I think this is a good way to think about Person 2 and Person 3.
Alison B Lowndes says
2017-09-18 at 12:59
I will read this in full when I get a chance but just wanted to add that if I’d listened to my Supervisor I’d have researched neural networks on CPU! I didn’t listen to him – which is lucky – because he also told me I’d never be able to recognise features in histology images!? It’s a tough world out there so you just have to learn to be humble and courageous at the same time.
PS My Supervisor also told me to steer clear of your (ex) Supervisor!
PPS We still want to hire you!
- Tim Dettmers says
  2017-09-25 at 15:12
  Thanks for your comment, Alison. I really appreciate it! Indeed it can be messy with the wrong supervisor, but I must also say that it was a good experience for me since I learned a lot from that experience. With that, I will be able to make a better choice for my PhD advisor. So in the end, it was not so bad after all!
Jeff says
2017-09-18 at 01:26
Hmm in other fields a lot of credit is given to the original person who came up with it, even if it wasn’t used or popularized right away. Like in computer graphics you give credit to the mathematician who came up with quaternions even if (as far as I know) they weren’t used for years any where else. It was just some obscure math. Likewise the guy who came up with plate tectonics was considered a quack when it was introduced, yet years later when we accept it we give him credit (even if he couldn’t popularize it). I think in a sense the purpose of academia and universities is to go beyond what’s necessarily useful today, to explore the far off distance, even if it isn’t worth popularizing right now (because there’s no use for it).
My understanding of Deep Learning is a lot of it got popularized due to faster computing machines, in particular GPUs. Certainly I believe the person who implemented DL on GPUs deserves a lot of credit for it, but I wouldn’t dismiss people who came beforehand with ideas because they didn’t implement it right away. (Actually this is kind of inspiring me to take a look into who first decided to use quaternions in graphics to see interesting early things they may have done.)
I was thinking maybe you’re coming from more of a corporate standpoint, where all that matters to you is utilization. But even in the corporate world credit to obscure ideas is given. An example is Apple. Popularizing GUIs and what have you. But we still give credit to Xerox, and even in interviews Steve Jobs discusses this!
In your examples you give this idea about someone being unable to communicate their ideas to the world. That makes sense to me, if they couldn’t get it out and it remained so obscure that it only remained in their minds, they probably don’t deserve much credit (like you say there wouldn’t even be proof). But if someone gets a publication out, that is no longer obscure, and I would say that’s worthy of credit assignment.
- Tim Dettmers says
  2017-09-18 at 11:34
  You talk about who to credit for an idea. This blog post does not discuss this topic. This blog post discusses how the impact of the idea is distributed among people and thus how much credit people should receive.
  Xerox, of course, should be credited with the idea of the GUI. It was their original research. But who gets how much credit for the impact the idea of a GUI had over time?
  Communication can be important even after publication. Just look at Immanuel Kant’s work, which is probably the most important philosophical work, yet it was not read for some time because nobody understood his ideas. It was similar for the LSTM. People just could not understand the paper and thus the significance of LSTMs.
  Note that all these are mere examples which do not yield any reliable knowledge. You can look at it with the scientific method from other disciplines too, and I think this would be a better way to contribute to this discussion.
  For example, in social network analysis, effects similar to those I describe here are well known (central nodes in a network are strong even though their only merit is their network connectivity itself). You can see similar things in some games in game theory. This can be used to describe these effects mathematically, and thus I believe these theories are better than examples, which have a hard time proving an argument.
  - Jeff says
    2017-09-18 at 14:22
    Ah sorry, I think I misunderstood your blog post originally, thought you were dismissing original credit. Impact isn’t something I have thought about seriously, and I think the topic is something that could easily be brushed aside for the status quo with lazy statements like “impact isn’t something I have thought about seriously” or with hostility to change. So with that said I think it’s good you’re questioning credit assignment, even if you are met with a lot of hostility. So thank you.
    I agree communication is important. I am very new to deep learning, and I find the initiatives within the field for improving communication to be extremely inspiring and helpful to me. Including your own work, especially your last blog post about research direction and computational efficiency. So thank you and I hope you continue to write.
Rein Halbersma says
2017-09-16 at 21:36
Nice post! You could also interpret the credit assignment problem as a bargaining game in which each player bargains over the deployment of its assets (ideas, communication, implementation) to create something of value. Applying the tools from cooperative game theory, I would expect a solution concept like the Shapley-value to emerge as a fair credit assignment. Linking pins such as communicators connecting different communities also have great value in such bargaining games.
- Tim Dettmers says
  2017-09-16 at 21:56
  Thanks for your comment — this is a very interesting analogy! I think something like the Shapley-value and its problem fit this entire problem quite well and I would expect the solutions to be quite similar.
