If you are reading this, then you probably finished the long and arduous journey to grad school. You emerged victorious, and this success is well-deserved. But which school should you choose? How do you make the right choice if all schools look great in their own way? This blog post is centered around these questions. It is most useful if you are a computer science student aiming to study machine learning and, in particular, natural language processing in the US, but most of the information here applies equally to other fields of research and other countries.
The choice of grad school that is right for you can be tricky and confusing. We live in a time of hyper-competitiveness, where even undergrads need to optimize for metrics like paper count to make it to the next level — grad school. This heavily career-centered perspective was probably advantageous to get you into grad school, and it remains crucial to get you to the level after that: a great job in industry or academia. So choosing the school which is best for your career can feel like an obvious choice. However, a PhD is a very long journey, and choosing your grad school based on this perspective alone might make you more vulnerable to burn-out, disillusionment, and general dissatisfaction.
In this blog post, I will discuss this career-centered perspective in detail, but I will also provide three other views that will hopefully help you make a balanced choice that leads not only to academic success but also to long-term satisfaction and a full and rich life. Balancing your decision based on all four perspectives probably leads to a better choice than looking at one angle alone. Before I go into the details, let me briefly introduce these four perspectives: The Career Perspective, the Identity Perspective, the Stability Perspective, and the Variability Perspective.
A quite intuitive perspective is the Career Perspective, which is about determining and weighing the factors that help you to be successful in your PhD and have a successful career.
A different perspective is the Identity Perspective: not looking at your career but at who you want to be and how your choice enables and facilitates that identity. The social environment that you are in has a strong causal effect on your development: We are strongly influenced by the people and culture around us, and the friends of friends that you do not even know will make you honest/deceitful, selfish/selfless, caring/exploiting, and so forth. If you choose a school where the unwritten motto is “The worth of a person is measured in papers and citations” you will slowly but surely grow to be a person that would live by such a motto. Would you like to be such a person? So by choosing a school you in some way also define and constrain the person that you can become.
The Stability Perspective says that choosing the “right” school is an illusion but that there are other choices that matter much more because they give you the stability that you need to succeed in the arduous PhD journey. It is well known that the effect of most moderately painful or enjoyable events that significantly affect your life will wear off within about two years and that you will return to your baseline happiness and stay there. However, some things are more stable. A great and friendly social environment where you always feel supported and not alone will provide you with the most basic human needs and will make a 5-year-or-so journey a breeze. On the other hand, a tiny research group with a distant advisor will make for an uncertain, lonely, and stressful 5 years.
Another valid way to select a school is by the variability of experience it will offer: the Variability Perspective. You probably sacrificed in some way to get into grad school. You neglected your passions outside work, neglected friends or your partner or your family, neglected self-development, neglected to work on your mental, physical or spiritual health, or you neglected other things that are important to you. By choosing the school that is best for your career, you might very well continue on this path of neglect. When does it stop? Once you have completed an excellent PhD, you might labor on by choosing that super competitive assistant professor job, then tenure, then being a leading figure in your field, and so on. There is nothing wrong with such a path through life, but continuous exploitation will lead to local minima. The two most common regrets of the dying are “I wish I’d had the courage to live a life true to myself, not the life others expected of me” and “I wish I hadn’t worked so hard.” The dying probably would have avoided these regrets if they had known better. Making sure you have the time and opportunity for further exploration is very helpful in gathering the information necessary to make better choices in the future that do not lead to regret.
The Career Perspective: Choosing Based on Expected Success
The career perspective looks at the most critical factors for your academic success and success beyond that and chooses the school that is best according to these factors. Let me go through each factor. I list the factors in order of importance, starting from the most important.
Finding suitable advisors is probably the most crucial task when choosing between grad schools. One could even go further and argue that one should not choose a school, but one should choose an advisor. A lousy advisor can make you miserable, unproductive, and stressed, and might be the main reason why you would want to drop out of the program. The right advisor will help you to be productive, stay healthy, and enjoy doing your research. It is important to emphasize personal fit here: Some advisors are great for you and bad for others and vice versa. The following criteria will help you identify advisors that might be better for you than others. However, there is a great deal of gut feeling to this decision. It is a bit like dating: even if everything looks right on paper, that does not mean this is the right person for you.
Another important note is that you should be looking for advisors and not a single advisor. This complicates an already complicated process, but it is risky to choose a school based on a single person. Relationships are complicated, and things might not work out as expected with your advisor. If possible, you should have an alternative advisor to whom you can switch if things do not work out with your first choice. This strategy also offers the possibility of being co-advised: two advisors who complement each other may provide a great fit even though a single advisor might not.
The following advisor-related factors do not have a particular order.
Research style is probably the most elusive quality but also by far the most important quality that you acquire during your PhD. While many would say that the goal of a PhD is to become an independent researcher, the truth is that with the steep requirements for ML/NLP PhD positions, many students are already somewhat close to being independent. They can generate ideas with ease and execute them confidently in research projects. However, the actual quality that new students lack is research style.
Harriet Zuckerman is probably the person who has studied scientific expertise in the greatest qualitative depth. For her work Scientific Elite, she interviewed a large share of the US Nobel laureates of her time. She found that these individuals often rose through the ranks through accumulated advantage. One advantage helped them secure the next position/grant/collaboration, which increased their advantage and helped them secure the next one, and so forth. Zuckerman found that the main advantage gained through this ladder-climbing was not necessarily more resources (money, equipment), but having the opportunity to culturally adopt the research style of other successful scientists. Consistent with this, most future Nobel laureates were advised by Nobel laureates or by scientists who later became Nobel laureates. So good questions to ask are: Can my research advisor’s research style help me in my career? Do I want to be a researcher that follows the style of my advisor?
While your advisor will be the focal point of research culture, research culture is also created through interactions between your advisor’s lab students. It is usually subconsciously adopted over time. Most students might not be aware of how they were shaped by their advisor and research group. It happens automatically and does not necessarily require explicit thinking or effort.
To give you some examples of what facets of a research style might look like:
- Ideas are cheap and belong to the research group. Execution of those ideas as research projects is real research.
- Novel ideas are everything. If someone publishes something even remotely similar to what you have, you should give up the project and work on something nobody is working on.
- Good science is good math. A paper should be mathematically solid so that it will stand for years, holding valuable insights and generalizations that go beyond the current theoretical application.
- Good science is robust science. A paper should have careful claims with robust evidence. This will help make the field progress more quickly by providing reliable information to build on.
- Good science is a good research vision. A paper should be about what is possible in the future and where a line of research could lead. Evidence augments vision, but a paper without vision is blind, incremental, and will be forgotten.
- Good science is good insight. Some insights can be extrapolated and be applied to many other scientific problems, many of which have not been formulated yet. Finding and expressing these insights is vital for scientific progress.
- It is all about productivity. Research is inherently noisy and messy, and it’s tough to predict the outcome of an idea or set of experiments in the development stage. Navigating this uncertainty is best done through fast iterations and balancing multiple projects to maximize the chances of a big success.
- Good science is collaborative. Different people can bring unique perspectives to a project and increase the chance of serendipitous insights. Collaborations bring the best out of people and can result in a whole that is greater than the sum of its parts.
- Good science is solitary. To gain the deepest insights into a problem, one has to understand a problem in its fullness without outside help. While collaborators can join later, the all-encompassing understanding of a problem through solitary pursuit is critical to tackling the most important scientific problems and for growing as a researcher.
These are just some examples, and usually, a research style is made up of a multitude of facets like these. Research style is complex but can best be encapsulated by the question “What does good/bad research look like?” However, if you ask this question during the visit days, you often find that people answer what they think good research is supposed to look like, rather than what it looks like for them. Therefore, better questions for visit days are:
- “What are examples of research papers you like?”
- “What research papers (in your area) do you think are the most important ones of the last few years?”
These questions often reveal what people think are important problems and the “correct” manner of approaching these problems. Both qualities are strongly related to research style.
Since the acquisition of research style is largely automatic and subconscious, it is crucial to understand which research style you will be adopting by joining a particular school and lab/advisor. So, what can such adoption look like?
Take, as an example, a friend who at the start of his PhD could be described as a minimalistic hacker-researcher: someone who tinkers with minimal changes to a system to improve it in a simple manner. He teamed up with an insight-driven neat professor for his PhD. After a couple of years in his PhD, he learned to be an insight-driven hacker. He builds hacks, understands the deep relationships of how his hack affects the system, and then extracts this insight in the most minimalistic and well-formulated way possible along with his practical hack. The combination is a pretty potent mix: the minimalistic insight-driven hacker-researcher. This person finds small hacks that yield robust results, along with insights into how the hack relates to other research and concepts.
One friend described me as a product-driven experimental hacker, meaning someone that rapidly prototypes changes to a system and tests them experimentally for reliable effects. If reliable effects are found, the hack is extracted into a product that other researchers can easily use. I was pretty surprised by that view at first, but I now think it pretty much hits the nail on the head.
Some friends I would describe as:
- concept-centered experimental visionary
- gregarious cool-stuff-can-be-good-science collaborator
- mind-the-gap collaborator
- principled neat-and-tidy collaborator
- I-like-cool-stuff researcher
It is important to note that there is no right or wrong, good or bad research style. For example, in Zuckerman’s work, two Nobel laureates in the same field would sometimes have radically opposite research styles, yet each style made the laureate and their students successful. Similarly, while an I-like-cool-stuff style sounds unimpressive, the Feynman-like playfulness of an I-like-cool-stuff researcher might lead to significant discoveries that others overlook because they do not deem these problems serious enough to work on.
Looking at my friends, they often came in with a particular mindset, and looking at them now, they very clearly adopted the central cultural tenet of their research environment.
It can be very empowering to enter grad school with this view. A friend of mine entered grad school and, upon hearing this interpretation, actively thought about how he could augment his research style with a particular advisor’s style. He switched advisors until he found the right ones. Then he leaned in and tried to adopt the advisor’s central cultural research facet as quickly as possible. My friend’s primary research advisor told him four years into the PhD that there was nothing left to teach him and that he should graduate and move on to learn more. I was not surprised, and I think it is directly related to my friend’s viewpoint that adopting a research style and developing research taste is the most crucial element of a PhD.
So while elusive and hard to define, the research style of a particular advisor or department can be an essential consideration in choosing the grad school that is right for you.
While the following sections will dive into other angles on choosing potential advisors, they can also be interpreted as sub-components of research style, particularly the advisor values section.
Advisor Research Fit
Students often do not know what to look for in an advisor and often cling to the idea that they need to find an advisor who is interested in the same research that they want to do. There is some truth to this idea, but it is more dangerous than it is helpful. Of my friends in their 2nd year, about 66% changed their research direction completely, many of them in the first year. That number is higher if I look at later years. Most of them still work in their subfield (robotics/NLP/vision), but they switched to different research areas within those fields. Some examples:
- multilingual parsing -> multilingual models -> machine translation
- question answering -> dialog -> reinforcement learning -> semantic parsing
- NLP architectures -> machine translation -> model efficiency
- human pose recognition -> sim2real
- question answering -> model efficiency -> interpretability -> model efficiency
What you see from these transitions is that an exact fit is not needed with advisors since your research interests will change. The same is true for your potential advisors: they might no longer be interested in research that they are well known for, or they might be interested in a direction which they have not yet published in. Compared to students, advisors have much more breadth and might be equally interested in many different research directions at once. Furthermore, while new professors are often compelled to stick closer to a specific research direction until they get tenure, tenured professors can be very flexible in research directions, and their interest might also be influenced significantly by the interests of their students. More senior professors are often happy to take on a completely new research direction that is interesting to you and compelling to them — this can be the advantage of hands-off advisors, which I will talk about in the next section.
Despite the overall fluidity of research interests of both you and your advisor, it is a good idea to have at least some overlap. It might be worth asking about the advisor’s long-term research vision, but be aware that such plans are often not well fleshed out and can change quickly based on changes in the field (e.g., BERT). It might also be worth looking at the values of an advisor because they are rather stable over time and can hint at which kind of research the advisor likes (more on values later).
Advising Style: Hands-on vs Hands-off
Advising styles can be mainly separated into hands-on and hands-off styles. What does this mean?
In general, what you can expect from a hands-off advisor is that you do all the work, and your advisor gives you feedback on what you have produced. With a hands-on style, the advisor also helps with producing it in some way.
More concretely, a hands-on advisor might help you with many details of your research: brainstorming research ideas, discussing research ideas and problems in detail, helping define research problems and ideas, thinking about a narrative for your paper, formulating claims, structuring the research project into pieces with milestones, checking in frequently to discuss partial results, discussing programming problems and bugs, providing rapid feedback, steering the project to prevent failure, and providing detailed feedback on the write-up and presentation slides. All of these are signs of a hands-on advising style.
A hands-off advisor will help you with the high-level aspects of your research: discussing the viability and impact of a research idea, discussing research narrative/pitches/claims, discussing research results, and providing (high-level) feedback on the final paper draft and slides. Working with a very hands-off advisor has many benefits, but in terms of direct help and interaction, you often cannot expect much more than what I list here.
The hands-on / hands-off distinction is a continuum rather than a strict dichotomy: usually, an advisor exhibits a mix of these traits. For example, some advisors might be very hands-off overall but very involved in idea generation, while others really like to give detailed feedback on writing. Usually, advisors also adjust slightly to the needs of each student and can be more hands-on in research areas where they are well-established. It is useful to talk to students to find out exactly in which areas the advisor is hands-on or hands-off. Areas here can refer to activity areas (help with writing, brainstorming ideas, thinking about a research story, etc.), technical areas (helping with bugs/code, finding the right software framework), and research areas (machine translation, question answering, etc.). So you should not ask students, “Is your advisor hands-on or hands-off?”, but instead you should ask, “Is your advisor hands-on with giving feedback on writing?” and so forth. Ask about the areas that are most important to you (your weak areas).
A hands-on advisor is great if you are less experienced in research, need more structure and deadlines, are unsure about potential research topics, and are externally motivated. A hands-off advisor is great if you want more freedom and independence, and also if you want to learn more through failure and adversity: being on your own for a good portion of the PhD can be difficult, but it also makes you a better independent researcher. An advisor who is not overly helpful can be good for you in the long term but difficult for you in the short term, especially in the first year or when you need to navigate important milestones such as conference deadlines.
Usually, hands-off advisors are more senior and can also provide more connections for internships and collaborations and are able to link ideas to some good-old research ideas that most people forgot about. They usually also have a more extensive lab with postdocs and a range of senior PhD students, which can provide valuable hands-on advice.
A hands-on advisor can usually develop you in more detail and, in the ideal case, will provide a gradual increase of independence towards the end of your PhD. Through this process, you will become similar to your advisor since a hands-on advisor develops you in his or her image. This can be a good or bad thing, depending on what you want to do with your career. If your hands-on advisor’s research vision is highly sought after in industry or academia, it is an advantage; if the market is saturated with the same research vision, you are just another fish in the sea.
Advisor Values, Strengths, and Weaknesses
What does the advisor care about? This is often overlooked, but the values of your advisor can make or break a good fit. They also define the environment within the research group. Why do values matter?
As noted above, interests change all the time, but values are much more stable and vital for a healthy relationship. While differences in interests are often fine (machine translation vs question answering), differences in values can create conflicts (overclaiming vs underclaiming). In general, you want the same as in any relationship: Share as many values as possible and have differences in strengths and weaknesses which complement each other. So what do values in an academic relationship look like?
Neats vs Hackers
One fundamental difference in academic values is whether your advisor is a Neat or a Hacker. A Neat is someone who values systematic investigation, sound assumptions, proofs, precise claims, and theoretical progress, whereas a Hacker believes that adherence to rigid schemes slows down progress. A Neat is careful in their methodology, cautious in making claims, and lets results speak for themselves: “Another solid result for the literature.” A Hacker first and foremost values results and their impact and practicality: anything “that works” is acceptable. A Hacker values unconstrained exploration, integration of things “that work”, and progress that makes a difference in the real world. Hackers are usually less careful about making precise claims because they believe it is more important to think about the (yet unproven) potential and possibilities of an idea. Hackers like to show off their work: “Look at this cool method! The results are unbelievable!”
Neither of these roles is inherently more valuable than the other: both are needed to make progress in science. The best results in science often come from critical discussion and work across these camps.
This is also a continuum. I am a Hacker at heart, but I get offended if someone misuses (or does not use) statistics or if someone makes theoretical claims built on weak theoretical foundations.
Discretion and In-group Cohesion
Does the advisor value discretion and privacy while still being open with you? This encourages honesty and directness between you and your advisor, but you might know less about what other students work on and how they make progress on their projects. A lack of such information might feel isolating. In contrast, an advisor who tells you about the other students’ projects and progress makes it easy for you to get involved with them, which facilitates in-group collaboration and cohesion: you stick together and support each other, and it feels a bit like a family. The problem is that if you say something, everybody will know soon enough, so you need to be careful about what you say. That can be stressful and can lead to a culture of closedness or faking: “Everything is okay with my project. I do not need help!”
Well-being and Research Progress
Does the advisor value your well-being over research progress or vice versa? An advisor who values your well-being will make sure that there is freedom for work-life balance and that your 1-on-1 meetings are not only about research. While your mental health and stress levels are first and foremost your responsibility, an empathetic advisor will be able to see if you are overdoing it and can offer guidance to avoid overwork and burn-out. On the other hand, such an advisor makes it easier for you to slack off and have research projects slide into oblivion, which can stop progress and make you feel depressed or make you feel like a failure.
Advisors who push you to your limit might be a good fit if you need some pushing to be productive. However, too much pushing, or any pushing if you do not like to be pushed, might cause burn-out or high stress, or might make you anxious about meeting the high expectations of your advisor.
Does the advisor value sharp, direct criticism or indirect, gentle hinting that something is off? If your advisor values head-on criticism, he or she will call out bullshit and tell you how much your project idea sucks. This is difficult to take as a student that labored hard on a research idea. On the other hand, you do not need to waste more time on this idea and can move on. If you are able to remain calm and digest such feedback, then you might be able to quickly adjust an idea and make it work. With such an advisor, you know a research idea is air-tight if he or she gives you good feedback, and it makes you proud that this idea “passed your advisor.” From there, it is easy to move on and work on the idea.
An advisor with an indirect communication style will hint that something is off, but you might not know what or why. That can make progress slow, or it can create considerable uncertainty about whether your project is any good, even months into the project. However, your feelings are not hurt with such a communication style. Furthermore, this indirectness might also demonstrate the intellectual humility of your advisor: if an experienced advisor believes he or she can be wrong, it might open up the possibility for a candid dialog exploring what is true and what is not. This is an admirable quality that many intellectuals value highly, and it might rub off onto you. In the long term, indirect communication has the advantage that you need to think about problems more by yourself, which makes you more independent and a better researcher.
Strengths and Weaknesses
As noted before, beyond values, it is also essential to think about how you and your advisor’s strengths and weaknesses complement each other. This is generally important for collaborations. For example, you might be great at executing research projects quickly so that you get the evidence to decide along which path you push the project, but you might be bad at generating good research ideas. An advisor that matches your core values and complements your weaknesses — idea generation in this case — will make a great tag team partner and will make it easy to wrestle those challenging research projects into submission. On the other hand, sharing weaknesses can make you and your advisor blind to problems in your research. Good advisors will recognize your weaknesses and strengths and will try to complement your style of research.
Self-reflection: Key to a Good Decision
To understand how your values, strengths, and weaknesses relate to those of a potential advisor, it might be well worth finding time for a session or two of careful self-reflection to understand who you are and how you align with potential advisors and schools. Beyond alignment, it might also help you identify schools and advisors that could facilitate growth toward specific values and strengths that you cherish but do not yet possess.
Some questions that could get you started: Can you deal with direct, sharp criticism? How much do you value your privacy? How much do you value honest and open conversation? Are you more like a Hacker or more like a Neat? Are you a “family person” who favors very close cohesion within the research group? How self-motivated are you? Do you need deadlines and milestones to keep you motivated and on track? Do you work well if someone pushes you? How much work-life balance do you need?
Advisor Availability and Absent-mindedness
Availability does make a difference. It is better to have more frequent meetings, even if these meetings are with a more hands-off advisor and even if you have no results. If an advisor also works at a startup/company or has many students, meetings might be infrequent, canceled, or postponed, and additional meetings in times of need might not be possible. However, availability is not only about your advisor’s busy schedule but also their attitude. For some advisors, student meetings are “holy” and are rarely canceled or rescheduled. Some advisors are also open to frequent meetings in times of need while others are not.
Another closely related factor is absent-mindedness. There are advisors who forget about projects, and you have to explain to them over and over again what you are working on. Even if an advisor is available, a certain degree of absent-mindedness can make interactions frustrating and unproductive. On the other hand, similarly to a hands-off advisor, this forces you to think carefully about your project and formulate exact problems before a meeting, which will make you a better researcher in the long term. Being able to formulate your project as a concise elevator pitch with a precise definition is a highly valuable skill that will impress anyone whom you meet at a conference. At the other extreme are advisors who reserve blocks of time to think about your project on their own outside of meetings. This has obvious benefits: better feedback, guidance, and new angles on the project, which might improve it significantly. On the other hand, this can make you dependent on your advisor’s thoughts, which can prevent you from becoming an excellent independent researcher.
Peers, Postdocs, and Research Group
The peers and the research group are the second most important factor in choosing a school, and this factor is not far behind the advisor in importance. Regarding research interests, it is a bit similar to advisors: your peers’ research interests change over time but will usually stay in a related area. As such, it might be possible to have long and fruitful collaborations with particular students, but it is probably more realistic to see people in your research group as peers with whom you discuss research ideas and from whom you get feedback.
But there are other things which are robust over time, such as general interests and values. During your visit days, you get to know some of your possible peers (both other admits and people in research groups), and sometimes it is evident when you “click”. Although it is difficult to get to know people in detail in this short time, a group of people that you click with might be a good reason to go to that school. A friend who supports you through the difficult times and who challenges you to grow will be very helpful in your journey through a PhD and beyond.
Beyond individual peers, you should also consider the research group of your potential advisor in your choice of grad school. The dynamics of the research group are quite revealing about its norms and values. The values of the advisor (see above) strongly shape the dynamics in the research group. You can use a framework similar to the one presented above to assess the values and expectations of your peers within a research group.
Another critical view on research groups are the power dynamics and diversity, which are strong predictors for the success of the overall research group. Research says groups work best if a powerful individual brings together people with very diverse backgrounds, views, and experiences, and once they are among themselves gives up his or her power and lets these people collaborate on an even playing field.
Diversity is particularly important for creative endeavors because diversity helps to prevent echo-chambers. Let’s say you have a group of hackers that reads about a new research method A:
Hacker 1: “Wow method A is so exciting. The results on Task C are so great! It would be so cool to mash it together with method B and try it on task D!”
Hacker 2: “You are right, that would be so interesting!”
Hacker 1: “Let’s do it! Let’s hack it together!”
The dynamics are very different if you add some neats into the mix:
Neat 1: “From Author, et al. 2020, I know that the standard deviation on task C is quite high and I think confidence intervals from method A would overlap with method X — the results from method A do not seem any better than results from the simpler method X. So, by Occam’s razor, I do not think there is any reason to extend method A.”
Neat 2: “I think their performance is mostly explained by their unusual initialization rather than method A. With that initialization you expect lower relative differences between the eigenvalues of the Hessian and thus faster training, so I think the number of epochs is a confounding factor and their comparison is invalid. I do not believe method A is actually better. They should have used the same initialization or at least done a grid search over learning rates and epochs for a proper comparison.”
This is an example of a neat vs hacker debate, but the same goes for many other traits and values. For example, if you have only people who discuss ideas with direct, blunt criticism, the interactions can feel pretty overwhelming and intense, and good ideas might be lost within the group because it is too tiresome to talk about them. Instead, a mix of playful and serious people might be able to balance free idea generation with rigor and carefulness.
Other extreme dichotomies might include theory vs applications thinking: “Life is temporary, only proofs are eternal.” vs “If you make the “greatest” invention ever and it does not affect a single life, then what is the point of that?” Quantitative vs qualitative thinking: “If you cannot measure it, it does not exist!” vs “Try to measure how much you love your spouse and then tell me which number it is — it does not work!” There are probably many more of these extremes.
Of course, virtually nobody believes in these statements, but some people identify more with one than the other, and having a healthy mix of each of these perspectives within a research group prevents groupthink, bias, and unreasonable extremes.
Postdocs and Senior PhD Students
As briefly mentioned above, postdocs and senior PhD students can also have a tremendous impact on the advising situation and should be considered carefully in your choice. If your advisor has postdocs and senior PhD students who frequently collaborate with new PhD students, it can be a big win for both parties: You get additional hands-on experience and a research perspective that is different from your advisor’s (especially with postdocs), and they might be able to get another publication before they move on to the next job. Having senior PhD students and postdocs is particularly valuable if your potential advisor is hands-off. In this case, you can get the best of both worlds in terms of advising.
Other important factors for a good research group are how much ideas are shared and discussed (what happens in a regular research group meeting) and how much students collaborate (easy to check by looking at their publications). The degree of collaboration is also a good proxy of group cohesion. I will talk a bit more about the importance of socializing in research groups further below in the “Stability Perspective” section, and I will not repeat myself here.
School Name and Resources
To make rational choices about the prestige of a school, it is essential to understand why it actually matters.
The scientific reason why school names matter is that they represent a proxy of accumulated advantage, which is a good predictor of current performance. Cumulative advantage is the idea that the more privilege you had in life, the more likely you had the resources (money, educated parents, mentoring, good peers, free time, extracurricular activities, extensive social network) to do well (rapid development, good grades) and this gives you more resources (better schools, better jobs, better connections) to do even better (promotions, tenure, grants) which yields even more resources (even more extensive social network, collaborations, grants, funding) to do even better (Nobel prize, Fields Medal, unicorn startups).
The distribution of advantage at any of these stages is highly unequal, with the top few percent being the most productive and gaining the most resources: ⅓ of the US population get a Bachelor’s degree, 2% a PhD, 0.2% a top 20 undergrad degree, 0.06% a tenure track position, and 0.0006% of people publish 41% of papers in research journals. But at the same time, at a top school, 73% of PhD positions are given to people with undergrad degrees from the top 20 schools, and the top 18 schools produce 50% of professors. We can do a back-of-the-envelope calculation with these data, making some simplifying assumptions, to estimate the probability of becoming a professor if you do a bachelor’s or a PhD at a top 18 school. If we assume that the 50% of professors from the top 18 schools are equally distributed, then 1/36 of all professors come from each top 18 school.
Thus, if you do a PhD at a top 18 school, your prior probability of becoming a professor jumps from 0.06% to about 2.8%. That is about 50 times more probable, but still only as likely as rolling two sixes with a pair of dice. This means you can increase your chances dramatically by choosing a prestigious school, but the odds are still heavily stacked against you. Similar statistics hold true for making other choices based on prestige or school ranking. Making a choice based on school ranking alone will probably not lead to success. Other factors, like a great advisor, great peers, a productive research group, school culture, and social opportunities, are probably more critical for success.
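If you want to retrace the back-of-the-envelope numbers above, here is a minimal sketch. It only reproduces the arithmetic under the simplifying assumptions already stated (the 50% share is split equally across the 18 schools, and the per-school share is read as the new prior). The constant and function names are my own illustrative choices, and the input figures are the ones quoted in this post, not independently verified.

```python
from fractions import Fraction

# Figures quoted in the post (taken as given, not verified here).
BASELINE_TENURE_TRACK = Fraction(6, 10_000)  # 0.06% get a tenure track position
TOP_SCHOOL_PROFESSOR_SHARE = Fraction(1, 2)  # top 18 schools produce 50% of professors
NUM_TOP_SCHOOLS = 18


def per_school_share(total_share: Fraction, num_schools: int) -> Fraction:
    """Share of all professors per school, assuming an equal split (a simplification)."""
    return total_share / num_schools


# Per-school share under the equal-split assumption.
share = per_school_share(TOP_SCHOOL_PROFESSOR_SHARE, NUM_TOP_SCHOOLS)

# Probability of rolling two sixes with a pair of fair dice.
two_sixes = Fraction(1, 6) ** 2

# How many times larger the per-school figure is than the baseline.
ratio = share / BASELINE_TENURE_TRACK

print(f"per-school share:  {share} ({float(share):.2%})")
print(f"two sixes:         {two_sixes}")
print(f"ratio to baseline: {float(ratio):.1f}x")
```

The per-school share comes out to exactly 1/36, the same as the two-sixes probability, and the ratio to the 0.06% baseline is roughly 46, which is where the "about 50 times" comes from.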
Some Failure and Adversity is Critical for Success
A different perspective that might seem unintuitive at first is that a long streak of privilege can have harmful long-term consequences for you. Failure and adversity are great tools for personal growth and growth as a researcher. This is a well-established finding in psychology: To succeed in life, you need to fail sometimes but not too often. The intuition is that the extremes of privilege or adversity lead to poor mental models of perfectionism and learned helplessness, respectively, while occasional failures lead to a mindset of learned industriousness. This means that too much privilege will make you afraid to take risks and fail because you never failed before. Occasional failure will make you resilient because you know adversity is normal and temporary, a mindset that enables the pursuit of creative but risky ideas.
For example, if you are at a top 20 school, it might be expected of you that you behave like a top 20 school researcher: Publish many world-class papers in a short period of time. Such a competitive environment might encourage “safe” research that is easily publishable over creative research that is prone to fail. Such a school, while providing a boost in privilege and resources, might prevent you from becoming a successful and creative researcher in the long-term. Challenging yourself in a non-perfectionist way is important — make sure there is enough opportunity for lessons learned through failure at the school that you choose.
A similar view is common in industry: Failing at a startup is often seen as a requirement for founding a successful one. Joining a scrappy startup might make you a skilled engineer, while joining a big tech company might lead to stagnation in skill because you are just another cog in the machine.
After the release of BERT, some of my peers felt energized by the exciting results of large pretrained models, but equally many of them seemed defeated. It must be excruciating to see your research sub-field and the research that you worked so hard on being crushed by the simple idea of throwing more GPUs and data at the problem. But this is the reality that we live in. With the advent of GPT-4, we reach another such critical point, and you should take care to be in a position where you can do meaningful research after GPT-4. What does this mean?
In most sub-fields, there are general ideas that are unaffected by scale or even incorporate scaling into their outlook. As such, hope is not lost even if your research will be affected by GPT-4. However, this may mean that the research you are doing will be very different before and after GPT-4—just like it was with BERT. While it is very unpredictable how research will shift with models like GPT-4, it can be worthwhile to think of possible ways your research could adapt in specific scenarios.
While it might run completely counter to your current research and values, it might be helpful to think about hypothetical worst-case scaling scenarios. Consider this example, which may border on outrageous depending on your values. While most bias and fairness research at *CL conferences frames large pretrained models as a problem, there are already hints that scaling models might solve such issues if used correctly. As such, a worst-case hypothetical to ask yourself could be: “If scaling laws for bias and fairness show that scaling eventually resolves bias and fairness issues, what would you do?” Possible answers in this particular case include:
- Shift your research to the broader perspective of human preferences, which already has a foothold in scaling.
- Incorporate scaling laws into your research and analyze the properties of models/data/methods that lead to improved bias scaling, particularly at a smaller scale or with fewer resources.
- Try to scale along dimensions other than compute and data, for example, by using reinforcement learning on bias and fairness user feedback data.
- Think about alternative research sub-fields you might want to switch to if this happens.
- Relate bias and fairness to some effect known to be a factor at scale. For example, how does memorization relate to bias and fairness?
The next question to ask yourself is this: how many computational resources will I need to proceed in this manner? And with that, it follows: will I have these computational resources if I join a particular school over another?
School location (campus & city)
I will not elaborate here since I will address the important considerations for these factors mostly in the Stability and Variability Perspective sections. To foreshadow a bit how to think about these: The campus and city offer opportunities to do the things that you know will ground you and make you stable so that you can sustain the difficult journey that a PhD is (stability). Each city and campus also offers a different range of new activities and experiences (variability), which help you to explore who you are and what you like and make you a fuller, more vibrant human being.
How much weight to give this factor is very personal: it can be insignificant, or it can be more important than the other factors listed before. It is worth stopping to think and reflect quite a bit on this. But more on that in the Stability and Variability Perspective sections.
There are other factors that I could write about, but they are not that important. Housing, living costs, and stipend or salary fall into this category. There are differences, and one school might pay more than another or have lower living costs, but the outcome will be the same: you will not be rich, you will not be poor, and whatever your living arrangements are, they will feel like home eventually. At some universities, you can work part-time in an industry research lab and make much more money, but it also adds extra complications.
It also makes sense to consider the university culture and the research group culture, but these are quite closely tied to the values that your potential advisor, peers, and research group hold. Culture is also closely related to the Identity Perspective, which I will introduce next.
The Identity Perspective: Who do you want to be?
If you choose a particular school, you will be actively shaped by the environment you live in and the people you interact with on a day-to-day basis. The identity perspective is then the perspective where you try to optimize for the person that you might become. While the career perspective looks at the question “How much success am I expected to have?” the identity perspective looks at the person that you might become and asks, “Do I want to be that person?”
Choosing a school based on who you think you will be is a very personal and subjective choice. I do not believe some specific examples will help you understand how to think about this choice. Instead, I want to give you my personal experience and how that experience shaped my belief about the person that I might become for each school. I gathered most of this experience during visit days where people tend to show their best selves, but I also experienced interactions at conferences and internships, which might have been a bit more authentic. I believe the aggregate identities can give you an accurate picture of the person that you might become.
My Experience at Visit Days
While I have had mixed experiences with students of most schools, there is one school from which I never met anyone who was nice or treated me with decency. They would often ask where I studied, and if the answer was not the right university, they would move on to people who actually studied or had studied at these “right” universities. Sometimes, they would look at my badge, and if it showed my university (Università della Svizzera italiana), they would proceed to ignore me in a conversation with other people. During the visit days of one school, someone looked at my badge, which displayed my undergrad university, The Open University, and said: “Wow! It is nice that they give people like you a chance!” He left before I could respond. I am sure there are friendly people at these places, but if I meet 15 people from a university and they are all very superficial and disrespectful, that is quite telling. So do I want to be a vain, shallow, and rude person? No, thanks.
At another school, I had the most alienating and isolating experience of all my visit days. People from elite universities formed cliques and did not let other people in. The accommodation that I had to book because of flight scheduling was not paid for. I felt as if I was being made fun of for my food preferences. My meeting with a potential advisor was botched, and I needed to share a time slot with another student. This happened twice. One potential advisor was not there and did not try to contact me before or after the visit days. Another potential advisor belittled me in the meeting I had with him. Many people that I met at that school seemed to feel the need to put on a happy face when they were actually very sad or stressed. Do I want to be a person that supports an environment where anything goes, where deception is the norm? Do I want to be a person that feels the need to show off a “happy face” even when miserable? No, thanks.
My Experience at the University of Washington
In stark contrast, the environment at the University of Washington (UW) was designed so that nothing could go wrong. I felt that everyone had a good time at the UW visit days. That at least shows that people are conscious and aware of social dynamics. What was most striking to me was the student panel: A visitor brought up the question of mental health and stress during the PhD. The panel went all out and talked about their mental health issues, how they coped with them, and what mental health resources are at your disposal as a UW student. Similarly, many students were honest and open, and they emphasized that the students are a team and that they look after each other. They also made clear that time outside of work is very important to them. One other very endearing thing to me was how friendly people in Seattle are in general. When I arrived in New York, I could immediately feel the tension and impatience. The opposite is true for Seattle. There is a certain gentleness about things. Drivers are more patient and responsible. In Seattle, people exit through the back door of articulated buses and yell to the bus driver: “Thank you!” And in their faces, you can see that they do not just say that for show or to conform to social norms; they mean it. So, do I want to be an honest and open student who admits to his struggles, supports his peers, is collaborative, has a life outside work, and kindly thanks the bus driver? Yes, please.
You might think it is silly to decide on a grad school based on whether people in that city thank the bus driver. But really it is not! Since I started at UW, I have embraced being a bus-driver-thanking person. Maybe this made me kinder. Maybe it made me more appreciative of the hard work that all the people around me do. Maybe it made me write this blog post, with which I hope to help you in your difficult choice. Maybe, if I had chosen another school, this blog post would be about a cool startup idea; or perhaps I would just have spent that extra time on more research. Identities matter. With the choice of studying at UW, I, in part, chose who I want to be.
If I take off my UW hat, I can see that one could also have a very different interpretation. Maybe the mean people that disrespected me at the elite school were just protecting their limited resources by giving time to people that probably mattered more. Perhaps the lovely people from UW are socially naive — trying to make everybody happy, which is clearly not possible. Maybe it is wiser to concentrate resources where they matter.
It can also be different for different people. I have a good friend who is very blunt and direct, which is totally normal in the country he comes from, where this bluntness is a signal of trust and honesty. However, calling out bullshit in a blunt and direct way does not fit in nicely with the UW identity and culture, and that can lead to misunderstandings and problems.
The Stability Perspective: Schools do not matter, but what does?
Grad school is incredibly tough: By definition, the final goal of the PhD is to gain the skill to independently explore and confront the unknown to produce new knowledge. This requires a lot of self-motivation, enduring failure and rejection, and lots of hard work. I often heard how difficult a PhD is, but I did not believe it. Now I understand what that means. And it has not only been tough for me, but for most of my peers. As such, you want to have something to cling on to that stabilizes you and helps you to cope and enjoy the experience.
The stability perspective acknowledges that many factors, such as happiness, gradually revert to a personal set-point while other elements are stable over time and provide you with energy, resilience, and courage in the long-term. As such, the stability perspective is about prioritizing factors that you know will help you to have a successful grad school experience in the long-term. Research shows that relationships are the most important and stable source of well-being. So it is crucial to look at the social environment when you choose a grad school.
Usually, within a grad school, the social environments are the office, research group meetings, other group meetings, lunchtime, and social activities organized by the department or by grad organizations. Fewer research groups have social outings as a group, but that definitely helps to make grad school more enjoyable and manageable.
One reason why I really wanted to go to University College London (UCL) was that I already knew the people there, and they were super friendly and helpful. Sebastian Riedel is an absolutely great advisor and a very wise person, and it was a joy working with him. But another important reason why I wanted to join UCL was actually the daily lunches.
Someone would announce “lunch” in the office. Some would go downstairs to buy food. Some people brought food. Then we would all sit around the table. We would chat about our everyday lives and everyday problems. Sometimes about our passions or one curiosity or another. Some politics and news. Sometimes some research ideas. It felt like a family where people cared for each other. It was great! It gave me the energy and focus to do great work even after lunchtime, when I am usually tired and less focused. If I knew I could get this experience at a school or research group, it would definitely be part of the reason to choose that school.
A great source of stability for me at the University of Washington (UW) has been my office. As desks freed up, we moved more and more NLP people into our office. Now we have an NLP office where we chat about research, support each other before deadlines, and in general take care of each other, and it feels great. If I had known that I would have all these great, friendly peers around me, this would have been another reason to choose UW.
The office environment, group meetings, and social lunches are things that will keep you stable and mentally tough throughout the PhD, and they can be a valid and important reason to choose one school over another.
Beyond the social environment, there can also be fundamental personal reasons to prefer one school over another. These are usually not discussed much because they are too distinct — I will give you some examples anyway, which might be a guide on how to think about your personal reasons.
For me personally, Stanford was one of the top choices. Stanford is impressive academically, and I had a great fit with potential advisors there. However, one other thing that stood out for me was the bicycle track around Stanford. I am an avid inline skater (or rollerblader, as it is called in the US), and inline skating gives me a lot of stability. It is vital for my mental health. The joy and freedom I feel while skating help me to get through the dark periods in my life. The bicycle track around Stanford is an absolute dream if you are a skater: very smooth, flat, dry weather. I imagined myself getting up at 5:00 am and skating every morning through a deserted campus. What a pleasant thought!
Another popular topic is relationships, family, and friends. Most people do a PhD because it is their passion. If you can combine your passion with a great partner, it could give you all you need to flourish as a person. If you can go to a school together with your partner, this can be a good reason to choose one school over another. If you make this choice, you should, however, also be aware that doing a PhD is a significant stressor for a relationship, and you should think about whether you would still like to be at a particular school if your relationship ended. For most of my friends who started the PhD with me, and for me as well, the relationship ended partly due to the PhD. A PhD is not easy for a relationship: moving across a country or continent, adjusting to a new culture, working long hours, little pay, night shifts, and high stress before deadlines. The pressure and stress from a PhD can make you depressed, anxious, absent-minded, and unresponsive, which is not the ideal state to be in for a relationship. You get used to this and learn how to handle these stressors, but the first year of a PhD in particular can be an enormous strain on your relationship. On the other hand, being able to bring your partner along or to reunite with your family will give you great strength and motivation. You will be able to push harder and further with your research. You will be able to cope better and recuperate faster. A PhD is challenging, and having the most important person close behind you makes it much more manageable.
The Variability Perspective: The possibility of a better you
The stability perspective was about choosing a school based on factors that you know will stabilize you so that you can do your best work. The variability perspective is about choosing a school based on possibilities that will enable you to become your full self: a flourishing human being. Possibilities mean you do not know for sure that these factors are important to you, but you have a hunch, a feeling, a common thread through your life that suggests that you just need to try certain things. Schools that enable you to explore certain unexplored parts of yourself and your interests have the potential to make you mature and a fuller human being with the right breadth and depth of experiences. Schools with low variability, or the wrong kind of it, bear the risk that at some point in life you will stop and feel that you have lived a life that was not your own.
But it is not only about experiences per se but also about memories. Even the greatest moments pass, and your happiness will regress back to its mean, but memories will stick with you. Memories that you create will be your own for your entire life. However, if you look back at your most precious memories, they are probably not about the time when you hit the library and studied really, really hard for that test. More likely, they are about unique experiences and moments that are emotionally meaningful and involve people that you care about. How likely are you to have these exceptional moments at a school where it is common culture to work really hard on weekends to get in that extra paper for the next conference deadline? How likely are you to have these exceptional moments at a school where the campus is deserted on the weekend, and your advisor tells you “It is time to submit the final paper draft” at 10:00 pm on the deadline night, even though the deadline is 4:00 am the next morning?
Academic excellence is great and important, but it is not all that matters in life.
You might have experienced that first-hand in this crazy competitive environment where it is all about coming out on top to make it to the next stage: be it PhD admissions, an excellent postdoc position, the superb research scientist job in industry, an assistant professor job, tenure, being recognized as a “great” professor, and so forth. If you want to turn the hamster wheel, you can turn it all day long just fine. But as you turn the wheel over and over, you might realize, to your dismay, that you never had the experiences that other people call common life experience.
Maybe you wanted to learn to play the guitar or try that cool sport at the gym, but then you realize the research deadline is in 3 months, and it will be tight — so you better put in those couple of extra hours! Maybe you had the feeling that you might really enjoy doing improv theater with the local group, but somehow you could not squeeze it in between classes and research. Maybe you always wanted to write some blog posts about that one topic that you are passionate about, but how can you justify spending your weekend on a blog post when on Monday you have a meeting with your advisor, and you do not have new results yet? Maybe you wanted to improve your social skills, and you want to ask your coworker to go out and have fun, but then you realize that all your coworkers have no time because they are stressing out about the next research deadline: “Let’s do it after the deadline!” If you find yourself in a trap like this, it might be time that you make choices that offer you a different range of experiences and opportunities.
The critical bit is that a school should not only have opportunities that interest you, but the culture should also be one that encourages the exploration of those opportunities. If you live in the best city in the world and have the best people around you, it still does not really work if your advisor and coworkers expect you to work on weekends and long hours and give you a hard time when you do not. Both opportunity and a culture that supports exploration are needed for a choice to have a good variability of experiences. The variability in experiences and memories that you will get is more like the minimum of those two factors, so also try to figure out how much freedom you have in your research group.
What might a concrete case look like where variability makes sense? Let me tell you a bit about my situation when I was about to start my master’s degree. During my Bachelor’s studies, I discovered machine learning and a bit later deep learning. I was hooked and realized that this is something I wanted to do for the rest of my life. However, I also knew that if I wanted to get into a good PhD program, I would need to have research experience. The problem was that at the online university where I was studying, I could not do research, and since I did not have any credentials at that time, nobody I contacted wanted to work with me on research. So I decided to quit my job, study full time, and do my own research during my online Bachelor’s studies. The work was relentless and fraught with dead ends and failure, but I did not want to give up. I was highly motivated to succeed, and so I decided to isolate myself and work tirelessly with an intense focus on a research problem that involved parallelization on multiple GPUs. It was a surreal time in which months would sometimes fly by without any human contact. I eventually succeeded, wrote up a paper, and published it at ICLR 2016. This was a big success, but it took its toll. While my peers gained life experience and social experience and got to know who they are and what they like and enjoy, I just learned how to do research and degenerated into a weird, confused, isolated hermit. On top of that, all my PhD applications were rejected, and my only options were master’s programs.
I did not want to do a master’s since I knew I would not learn much. I knew math, I knew computer science, I knew machine learning. It seemed that a master’s degree would just be a piece of paper and that I would not benefit from the experience.
Enter the variability perspective.
When looked from the variability perspective, doing a master’s is an excellent opportunity to figure out parts of life that are unclear and to catch up on social and life experience. Since I was already pretty good at the things that I need to study for the degree, I could slack off in class and just focus on something outside of class. That is precisely what I did, and the experience was absolutely marvelous, and it made me into the person that I am today.
I chose the University of Lugano for my master’s degree. It had small, intimate classes perfect for getting to know everyone. The master’s degree was highly international, and usually, we did not have two people of the same nationality in a class. I overcame my social anxiety, just hung out with people, and developed my social skills. I made my first romantic experiences, which are still very special to me. I also learned that I do not like to hang out with people in bars, who would tell me how drunk they were the last time and what they did in their drunken state, or how nice their vacations were. But then I organized a weekly philosophical evening with two friends where we would talk about philosophy, neuroscience, psychology, research in deep learning, rationality, game theory, altered states of consciousness and how all these things relate and it was always lots of fun and very meaningful to us — from there I knew where I belonged.
In my spare time, I experimented with blog posts. For example, I wrote about the future of computing and how it relates to the brain and deep learning. I experimented with writing guest blog posts for NVIDIA. I experimented with spontaneous blog posts about whatever was on my mind. I finished one such blog post in a single morning, and I would consider it one of my best, despite the little effort I invested in it.
I got back into inline skating. Inline skating along the park and lake in Lugano was a unique experience. I will always remember the early mornings in a deserted town, with mist on the mountains, skating at high speed along the still water and beautiful flowers.
I also used my extra time to gather more research experience with an internship at Microsoft Research in the US and a research internship at UCL in London. Besides being very valuable for my self-development, these experiences were instrumental to the success of my PhD applications.
At the end of it, I can say that I met people from dozens of different countries with very different cultures. I lived in four different countries across two continents. I learned how to be a good friend and learned with which people I belong. I learned what my place is in this world. I revived the joy in some activities that I had enjoyed in the past and found new activities that I enjoyed. I made unique memories that will always be with me. All of these experiences and memories are at the very heart of the variability perspective. I could not have gotten all of this if I had chosen a program that offered more of the same or if I had submitted to the attitude of “my master’s degree is just a piece of paper.” So with your choice of grad school, you also have the power to choose the range of experiences that will shape you into the beautiful person that you will become.
This blog post features contributions from Gabriel Ilharco. I would like to thank Hattie Zhou, Nelson Liu, Noah Smith, Gabriel Ilharco, Mitchell Wortsman, Luke Zettlemoyer, Aditya Kusupati, Jungo Kasai, and Ofir Press for their valuable feedback on drafts of this blog post.
2022-03: Added sections on Research Style and Computational Resources.