In this blog post I will delve into the brain and explain its basic information processing machinery and compare it to deep learning. I do this by moving step by step along the brain's electrochemical and biological information processing pipeline and relating it directly to the architecture of convolutional nets. Along the way we will see that a neuron and a convolutional net are very similar information processing machines. While performing this comparison, I will also discuss the computational complexity of these processes and thus derive an estimate for the brain's overall computational power. I will use these estimates, along with knowledge from high performance computing, to show that it is unlikely that there will be a technological singularity in this century.
This blog post is complex as it arcs over multiple topics in order to unify them into a coherent framework of thought. I have tried to make this article as readable as possible, but I might not have succeeded in all places. Thus, if you find yourself in an unclear passage, it might become clearer a few paragraphs down the road where I pick up the thought again and integrate it with another discipline.
First I will give a brief overview of the predictions for a technological singularity and related topics. Then I will start the integration of ideas between the brain and deep learning. I finish by discussing high performance computing and how this all relates to predictions about a technological singularity.
The part which compares the brain's information processing steps to deep learning is self-contained, and readers who are not interested in predictions for a technological singularity may skip ahead to that part.
Part I: Evaluating current predictions of a technological singularity
There were a lot of headlines recently about predictions that artificial intelligence will reach super-human intelligence as early as 2030 and that this might herald the beginning of human extinction, or at least dramatically alter everyday life. How was this prediction made?
Factors which help to predict a singularity
Ray Kurzweil has made many very accurate predictions, and his method for reaching these predictions is quite simple for computing devices: look at the exponential growth of computing power, efficiency, and size, and then extrapolate. This way, one could easily predict the emergence of small computers which fit into your hands, and with a bit of creativity, one could imagine that one day there would be tablets and smartphones. The trends were there; you just needed to imagine what could be done with computers which you can hold in your hand.
Similarly, Ray Kurzweil predicted the emergence of strong AI which is as intelligent or more intelligent than humans. For this prediction he also used data for the exponential growth of computing power and compared this to an estimate for the computational power of the brain.
He also acknowledges that the software will be as important as the hardware, and that the software development of strong AI will take longer because such software can only be developed once fast computer systems are available. This can be felt in the area of deep learning, where solid ideas of the 1990s were infeasible due to slow computers. Once graphics processing units (GPUs) were used, these computing limitations were quickly removed and rapid progress could be made.
However, Kurzweil also stresses that once the hardware level is reached, the first “simple” strong AI systems will be developed quickly. He sets the date for brain-like computational power to 2020 and the emergence of strong AI (the first human-like intelligence or better) to 2030. Why these numbers? With persisting growth in computing power, in 2019 we will reach computing power which is equivalent to that of the human brain — or will we?
This estimate is based on two things: (1) The estimate for the complexity of the brain, (2) the estimate for the growth in computing power. As we will see, both these estimates are not up-to-date with current technology and knowledge about neuroscience and high performance computing.
Our knowledge of neuroscience doubles about every year. Using this doubling period, in 2005 we would only have possessed about 0.098% of the neuroscience knowledge that we have today. This number is a bit off, because the doubling time was about 2 years in 2005 while it is less than a year now, but overall it is well below 1%.
The thing is that Ray Kurzweil based his predictions on the neuroscience of 2005 and never updated them. An estimate for the brain's computational power based on 1% of the neuroscience knowledge does not seem right. Here is a small list of a few important discoveries made in the last two years which increase the estimated computational power of the brain by many orders of magnitude:
- It was shown that brain connections, rather than being passive cables, can themselves process information and alter the behavior of neurons in meaningful ways, e.g. brain connections help you to see the objects in everyday life. This fact alone increases the brain's computational complexity by several orders of magnitude
- Neurons which do not fire still learn: There is much more going on than electrical spikes in neurons and brain connections: Proteins, which are the little biological machines which make everything in your body work, combined with local electric potential do a lot of information processing on their own — no activation of the neuron required
- Neurons change their genome dynamically to produce the right proteins to handle everyday information processing tasks. Brain: “Oh you are reading a blog. Wait a second, I just upregulate this reading-gene to help you understand the content of the blog better.” (This is an exaggeration — but it is not too far off)
Before we look at the complexity of the brain, let us first look at brain simulations. Brain simulations are often used to predict human-like intelligence. If we can simulate a human brain, then it will not be long until we are able to develop human-like intelligence, right? So the next paragraph looks at this reasoning. Can brain simulations really provide reliable evidence for predicting the emergence of artificial intelligence?
The problems with brain simulations
Brain simulations simulate the electrical signals which are emitted by neurons and the size of the connections between neurons. A brain simulation starts with random signals and the whole system stabilizes according to rules which are thought to govern information processing steps in the brain. After running these rules for some time, stable signals may form which can be compared to the signals of the brain. If the signals of the simulation are similar to recordings of the brain, this increases our confidence that our chosen rules are somewhat similar to the rules that the brain uses. Thus we can validate large-scale information processing rules in the brain. However, the big problem with brain simulations is that this is pretty much all we can do.
We do not gain any understanding of what these signals mean or what function they could possess. We cannot test any meaningful hypotheses with this brain model other than the vague “our rules produce similar activity”. The lack of precise hypotheses which make accurate predictions (“If the activity is like this, then the circuit detected an apple instead of an orange”) is one of the loudest criticisms of the European brain simulation project. The brain project is regarded as rather useless by many neuroscientists, and even dangerous, because it sucks away money from useful neuroscience projects which actually shed light on neural information processing.
Another problem is that these brain simulations rely on models which are outdated, incomplete and which dismiss many biological aspects of neural information processing. This is mainly because the electrical information processing in the brain is much better understood. Another, more convenient, reason is that current models are already able to reproduce the needed output patterns (which is the main goal after all), so there is no need to update these models to be more brain-like.
So to summarize, the problems with brain simulations are:
- Not possible to test specific scientific hypotheses (compare this to the large hadron collider project with its perfectly defined hypotheses)
- Does not simulate real brain processing (no firing connections, no biological interactions)
- Does not give any insight into the functionality of brain processing (the meaning of the simulated activity is not assessed)
The last point is the most important argument against the usefulness of brain simulations for strong-AI estimation. If we could develop a brain simulation of the visual system which would do well on, say, the MNIST and ImageNet data sets, this would be useful to estimate progress in brain-like AI. But without this, or any similar observable function, brain simulations remain rather useless with respect to AI.
With this said, brain simulations are still valuable to test hypothesized general rules of information processing in the brain —we have nothing better for this — but they are quite useless to make sense of what the information processing in the brain means, and thus constitute unreliable evidence for predicting the progress in AI. Anything that relies on brain simulation as evidence for predictions of future strong-AI should be looked at with great skepticism.
Estimating the brain's computational complexity
As mentioned in the introduction, the estimates of the brain’s complexity are a decade old and many new discoveries made this old estimate obsolete. I never came across an estimate which is up to date, so here I derive my own estimate. While doing this, I will focus mostly on the electrochemical information processing and neglect the biological interactions within the neuron, because they are too complex (and this blog post is already very long). Therefore the estimate that is derived here can be thought of as a lower bound of complexity — it should always be assumed that the brain is more complex than this.
During the construction of this model of complexity, I will also relate every step in the model to its deep learning equivalents. This will give you a better understanding of how closely deep learning is related to the brain, and how fast deep learning really is compared to the human brain.
Defining reference numbers for the model
We know some facts and estimates which help us to start with our model building:
- The brain uses learning algorithms which are very different from deep learning, but the architecture of neurons is similar to convolutional nets
- The adult brain has 86 billion neurons, about 10 trillion synapses, and about 300 billion dendrites (tree-like structures with synapses on them)
- The brain of a child has far more than 100 billion neurons, and has synapses and dendrites in excess of 15 trillion and 150 billion, respectively
- The brain of a fetus has more than a trillion neurons; neurons which are misplaced die quickly (this is also the reason why adults have fewer neurons than children)
- The cerebellum, the super computer of the brain, contains roughly ¾ of all neurons (this ratio is consistent in most mammal species)
- The cerebrum, the main driver of “intelligence”, contains roughly ¼ of all neurons
- An average neuron in the cerebellum has about 25000 synapses
- An average neuron in the cerebrum has about 5000-15000 synapses
The number of neurons is well known; the number of synapses and dendrites is only known within certain bounds, and I chose conservative estimates here.
The average number of synapses per neuron differs wildly between neurons, and the numbers here are rough averages. It is known that most synapses in the cerebellum are made between dendrites of Purkinje neurons and two different types of neurons whose connections either “climb up” or “run parallel” to the Purkinje neuron's dendrites. It is known that Purkinje cells have about 100000 synapses each. Because these cells have by far the largest weight in the cerebellum, one can best estimate the complexity of the brain by looking at these neurons and at the interactions that they make.
It is important to differentiate between the complexity of a brain region and its functional importance. While almost all computation is carried out by the cerebellum, almost all important functions are carried out by the cerebrum (or cortex). The cortex uses the cerebellum to generate predictions, corrections and conclusions, but the cortex accumulates these insights and acts upon them.
For the cerebrum it is known that neurons almost never have more than 50000 synapses, and unlike the cerebellum, most neurons have a number of synapses within the range of 5000-15000.
How do we use these numbers?
A common approach for estimating the computational complexity of the brain is to assume that all information processing in the brain can be represented by the combination of impulses when a neuron fires (action potentials) and the size (mostly the number of receptors) of the synapses that each neuron has. Thus one can multiply the estimates for the number of neurons and their synapses and add everything together. Then one multiplies this by the firing rate of an average neuron, which is about 200 action potentials per second. This model is what Ray Kurzweil uses to create his estimate. While this model was okay a few decades ago, it is not suitable to model the brain from a modern viewpoint, as it leaves out much of the important neural information processing, which is so much more than mere firing neurons.
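To make the arithmetic of this older approach concrete, here is a minimal Python sketch using the reference numbers quoted in this post (these are not Kurzweil's exact figures, so the result only illustrates the method, not his estimate):

```python
# Rough "neurons x synapses x firing rate" estimate, i.e. total synapses times rate.
# Numbers are the reference values quoted in this post, not Kurzweil's own.
synapses = 10e12        # conservative total synapse count used above
firing_rate = 200       # action potentials per second for an average neuron

operations_per_second = synapses * firing_rate
print(f"{operations_per_second:.2e} synaptic operations per second")  # ~2e15
```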
A model which approximates the behavior of neurons more accurately is the extended linear-nonlinear-Poisson cascade model (LNP). The extended LNP model is currently viewed as an accurate model of how neurons process information. However, the extended LNP model still leaves out some fine details, which are deemed unimportant for modeling large-scale brain function. Indeed, adding these fine details to the model would add almost no additional computational complexity, but would make the model harder to understand — thus including these details in simulations would violate the scientific method, which seeks to find the simplest models for a given theory. However, this extended model is actually very similar to deep learning and thus I will include these details here.
There are other good models that are also suitable for this. The primary reason why I chose the LNP model is that it is very close to deep learning. This makes this model perfect to compare the architecture of a neuron to the architecture of a convolutional net. I will do this in the next section and at the same time I will derive an estimate for the complexity of the brain.
Part II: The brain vs. deep learning — a comparative analysis
Now I will explain step by step how the brain processes information. I will mention the steps of information processing which are well understood and which are supported by reliable evidence. On top of these steps, there are many intermediary steps at the biological level (proteins and genes) which are still poorly understood but known to be very important for information processing. I will not go into depth into these biological processes but provide a short outline, which might help the knowledge hungry readers to delve into these depths themselves. We now begin this journey from the neurotransmitters released from a firing neuron and walk along all its processes until we reach the point where the next neuron releases its neurotransmitters, so that we return to where we started.
The next section introduces a couple of new terms which are necessary to follow the rest of the blog post, so read it carefully if you are not familiar with basic neurobiology.
Neurons use the axon — a tube-like structure — to transmit their electric signals over long stretches in the brain. When a neuron fires, it fires an action potential — an electrical signal — down its axon, which branches into a tree of small endings called axon terminals. At the end of each of these axon terminals sit proteins which convert this electrical message back into a chemical one: small balls — called synaptic vesicles — filled with neurotransmitters are released into an area outside of the neuron, called the synaptic cleft. This area separates the axon terminal from the beginning of the next neuron (a synapse) and allows the neurotransmitters to move freely to pursue different tasks.
The synapses are most commonly located at a structure which looks very much like the roots of a tree or plant; this is the dendritic tree composed of dendrites which branch into larger arms (this represents the connections between neurons in a neural network), which finally reach the core of the cell, called the soma. These dendrites hold almost all synapses which connect one neuron to the next and thus form the principal connections. A synapse may hold hundreds of receptors to which neurotransmitters can bind.
You can imagine this compound of axon terminal and synapses at a dendrite as the (dense) input layer (of an image if you will) into a convolutional net. Each neuron may have fewer than 5 dendrites or as many as a few hundred thousand. Later we will see that the function of the dendritic tree is similar to the combination of a convolutional layer followed by max-pooling in a convolutional network.
Going back to the biological process, the synaptic vesicles merge with the surface of the axon terminal and turn themselves inside-out, spilling their neurotransmitters into the synaptic cleft. There the neurotransmitters drift in a vibrating motion due to the temperature in the environment, until they (1) find a fitting lock (receptor protein) which fits their key (the neurotransmitter), (2) encounter a protein which disintegrates them, or (3) encounter a protein which pulls them back into the axon (reuptake) where they are reused. Antidepressants mostly work by either preventing or enhancing the reuptake of the neurotransmitter serotonin; preventing reuptake yields changes in information processing after some days or weeks, while enhancing reuptake leads to changes within seconds or minutes. So neurotransmitter reuptake mechanisms are integral to minute-to-minute information processing. Reuptake is ignored in the LNP model.
However, the combination of the amount of neurotransmitters released, the number of synapses for a given neurotransmitter, and how many neurotransmitters actually make it into a fitting protein on the synapse can be thought of as the weight parameter in a densely (fully) connected layer of a neural network, or in other words, the total input to a neuron is the sum of all axon-terminal-neurotransmitter-synapse interactions. Mathematically, we can model this as the dot product between two matrices (A dot B; [amount of neurotransmitters of all inputs] dot [amount of fitting proteins on all synapses]).
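As a toy illustration of this view, here is a small numpy sketch (all numbers are made up) that treats the neurotransmitter amounts released by each axon terminal and the receptor counts on the matching synapses as two vectors whose dot product gives the total input, just like a weighted sum in a dense layer:

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs = 1000                                # toy number of axon-terminal/synapse pairs

released = rng.random(n_inputs)                # neurotransmitter amount released per terminal
receptors = rng.integers(1, 500, n_inputs)     # fitting receptors on the matching synapse

total_input = released @ receptors             # dot product ~ dense-layer weighted sum
```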
After a neurotransmitter has locked onto a fitting protein on a synapse, it can do a lot of different things: most commonly, neurotransmitters will just (1) open up channels to let charged particles flow (through diffusion) into the dendrites, but they can also cause a rarer effect with huge consequences: the neurotransmitter (2) binds to a G-protein which then produces a protein signaling cascade which (2a) activates (upregulates) a gene which is then used to produce a new protein which is integrated into either the surface of the neuron, its dendrites, and/or its synapses; or which (2b) alerts existing proteins to do a certain function at a specific site (create or remove more synapses, unblock some entrances, attach new proteins to the surface of the synapse). This is ignored in the LNP model.
Once the channels are open, negatively or positively charged particles enter the dendritic spine. A dendritic spine is a small mushroom-like structure onto which the synapse is attached. These dendritic spines can store electric potential and have their own dynamics of information processing. This is ignored in the LNP model.
The particles that may enter the dendritic spine are either negatively or positively charged — some neurotransmitters only open channels for negative particles, others only for positive ones. There are also channels which let positively charged particles leave the neuron, thus increasing the negativity of the electric potential (a neuron “fires” if it becomes too positive). The size and shape of the mushroom-like dendritic spine corresponds to its behavior. This is ignored in the LNP model.
Once particles have entered the spine, there are many things they can affect. Most commonly, they will (1) just travel along the dendrites to the cell body of the neuron and then, if the cell gets too positively charged (depolarization), they induce an action potential (the neuron “fires”). But other actions are also common: the charged particles accumulate in the dendritic spine directly and (2) open up voltage-gated channels which may polarize the cell further (this is an example of the dendritic spine information processing mentioned above). Another very important process is (3) dendritic spikes.
Dendritic spikes
Dendritic spikes are a phenomenon which has been known to exist for some years, but only in 2013 were the techniques advanced enough to collect the data to show that these spikes are important for information processing. To measure dendritic spikes, you have to attach some very tiny clamps onto dendrites with the help of a computer which moves the clamp with great precision. To have some sort of idea where your clamp is, you need a special microscope to observe the clamp as you progress onto a dendrite. Even then you mostly attach the clamp in a rather blind manner, because at such a tiny scale every movement made is a rather giant leap. Only a few teams in the world have the equipment and skill to attach such clamps onto dendrites.
However, the direct data gathered by those few teams was enough to establish dendritic spikes as important information processing events. Due to the introduction of dendritic spikes into computational models of neurons, the complexity of a single neuron has become very similar to a convolutional net with two convolutional layers. As we will see later, the LNP model also uses non-linearities very similar to a rectified linear function, and also makes use of a spike generator which is very similar to dropout — so a neuron is very much like an entire convolutional net. But more about that later; back to dendritic spikes and what exactly they are.
Dendritic spikes occur when a critical level of depolarization is reached in a dendrite. The depolarization discharges as an electric potential along the walls of the dendrite and may trigger voltage-gated channels along its way through the dendritic tree, and eventually, if strong enough, the electric potential reaches the core of the neuron where it may trigger a true action potential. If the dendritic spike fails to trigger an action potential, the opened voltage-gated channels in neighboring dendrites may do exactly that a split second later. Due to the channels opened by the dendritic spike, more charged particles enter the neuron, which then may either trigger (common) or stifle (rare) a full action potential at the neuron's cell body (soma).
This process is very similar to max-pooling, where a single large activation “overwrites” other neighboring values. However, after a dendritic spike, neighboring values are not overwritten like during max-pooling used in deep learning, but the opening of voltage-gated channels greatly amplifies the signals in all neighboring branches within the dendritic tree. Thus a dendritic spike may heighten the electrochemical levels in neighboring dendrites to a level which is more similar to the maximum input — this effect is close to max-pooling.
Indeed it was shown that dendritic spikes in the visual system serve the same purpose as max-pooling in convolutional nets for object recognition: in deep learning, max-pooling is used to achieve (limited) rotation, translation, and scale invariance (meaning that our algorithm can detect an object in an image where the object is rotated, moved, or shrunk/enlarged by a few pixels). One can think of this process as setting all surrounding pixels to the same large activation and making each activation share the weight to the next layer (in software the values are discarded for computational efficiency — this is mathematically equivalent). Similarly, it was shown that dendritic spikes in the visual system are sensitive to the orientation of an object. So dendritic spikes do not only have computational similarity, but also similarity in function.
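For readers less familiar with max-pooling, here is a minimal numpy sketch of the deep learning operation that dendritic spikes are being compared to: a small 1D convolution followed by taking the maximum over each pooling window (toy data only; no claim that this reproduces the biology):

```python
import numpy as np

x = np.random.rand(12)                       # toy input activations along one dendrite-like axis
kernel = np.array([0.25, 0.5, 0.25])         # toy convolutional filter

conv = np.convolve(x, kernel, mode="same")   # convolution step
pooled = conv.reshape(-1, 3).max(axis=1)     # max-pooling over windows of 3:
                                             # the largest activation dominates its window
```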
The analogy does not end here. During neural back-propagation — that is when the action potential travels from the cell body back into the dendritic tree — the signal cannot backpropagate into the dendritic branch where the dendritic spike originated because these are “deactivated” due to the recent electrical activity. Thus a clear learning signal is sent to inactivated branches. At first this may seem like the exact opposite from the backpropagation used for max-pooling, where everything but the max-pooling activation is backpropagated. However, the absence of a backpropagation signal in a dendrite is a rare event and represents a learning signal on its own. Thus, dendrites which produce dendritic spikes have special learning signals just like activated units in max-pooling.
To better understand what dendritic spikes are and what they look like, I very much want to encourage you to watch this video (for which I do not have the copyright). The video shows how two dendritic spikes lead to an action potential.
This combination of dendritic spikes and action potentials and the structure of the dendritic tree has been found to be critical for learning and memory in the hippocampus, the main brain region responsible for forming new memories and writing them to our “hard drive” at night.
Dendritic spikes are one of the main drivers of computational complexity which have been left out from past models of the complexity of the brain. Also, these new findings show that neural back-propagation does not have to be neuron-to-neuron in order to learn complex functions; a single neuron already implements a convolutional net and thus has enough computational complexity to model complex phenomena. As such, there is little need for learning rules that span multiple neurons — a single neuron can produce the same outputs we create with our convolutional nets today.
But these findings about dendritic spikes are not the only advance made in our understanding of the information processing steps during this stage of the neural information processing pathway. Genetic manipulation and targeted protein synthesis are sources that increase computational complexity by orders of magnitude, and only recently have we made advances which reveal the true extent of biological information processing.
Protein signaling cascades
As I said in the introduction of this part, I will not cover the parts of biological information processing extensively, but I want to give you enough information so that you can start learning more from here.
One thing one has to understand is that a cell looks very different from how it is depicted in textbooks. Cells are crawling with proteins: there are about 10 billion proteins in any given human cell and these proteins are not idle: they combine with other proteins, work on a task, or jitter around to find new tasks to work on.
All the functions described above are the work of proteins. For example, the key-and-lock mechanism and the channels that act as gatekeepers for the charged particles that leave and enter the neuron are all proteins. The proteins I mean in this section are not these common proteins, but proteins with special biological functions.
As an example, the abundant neurotransmitter glutamate may bind to an NMDA receptor, which then opens up its channels for many different kinds of charged particles; after being opened, the channel only closes when the neuron fires. The strength of synapses is highly dependent on this process, where the synapse is adjusted according to the location of the NMDA receptor and the timing of signals which are backpropagated to the synapses. We know this process is critical to learning in the brain, but it is only a small piece in a large puzzle.
The charged particles which enter the neuron may additionally induce protein signaling cascades on their own. For example, the cascade below shows how an activated NMDA receptor (green) lets charged calcium ions (Ca2+) inside, which triggers a cascade that eventually leads to AMPA receptors (violet) being trafficked and installed on the synapse.
It was shown again and again that these special proteins have a great influence on the information processing in neurons, but it is difficult to pick out a specific type of protein from this seemingly chaotic soup of 10 billion proteins and study its precise function. Findings are often complex with a chain of reactions involving many different proteins until a desired end-product or end-function is reached. Often the start and end functions are known but not the exact path which led from one to the other. Sophisticated technology helped greatly to study proteins in detail, and as technology gets better and better we will further our understanding of biological information processing in neurons.
Genetic manipulation
The complexity of biological information processing does not end with protein signaling cascades. The 10 billion proteins are not a random soup of workers that do their tasks; rather, these workers are produced in specific quantities to serve the specific functions that are relevant at the moment. All this is controlled by a tight feedback loop involving helper proteins, DNA, and messenger RNA (mRNA).
If we use programming metaphors to describe this whole process, then the DNA represents the whole github website with all its public packages, and messenger RNA is a big library which features many other smaller libraries with different functions (something like the C++ boost library).
It all begins with a programming problem you want to solve (a biological problem is detected). You use google and stackoverflow to find recommendations for libraries which you can use to solve the problem, and soon you find a post that suggests that you use library X to solve problem Y (problem Y is detected on a local level in a cell with the known solution of protein X; the protein that detected this defect then cascades into a chain of protein signals which leads to the upregulation of the gene G which can produce protein X; here upregulation is a “Hey! Produce more of this, please!” signal to the nucleus of the cell where the DNA lies). You download the library and compile it (the gene G is copied (transcribed) as a short string of mRNA from the very long string of DNA). You then configure the installation (the mRNA leaves the nucleus) with the respective configuration (the mRNA is translated into a protein; the protein may be adjusted by other proteins after this), and install the library in a global “/lib” directory (the protein folds itself into its correct form, after which it is fully functional). After you have installed the library, you import the needed part of the library into your program (the folded protein travels (randomly) to the site where it is needed) and you use certain functions of this library to solve your problem (the protein does some kind of work to solve the problem).
Additional to this, neurons may also dynamically alter their genome, that is they can dynamically change their github repository to add or remove libraries.
To understand this process further, you may want to watch the following video, which shows how HIV produces its proteins and how the virus can change the host DNA to suit its needs. The process described in this video animation is very similar to what is going on in neurons. To make it more similar to the process in neurons, imagine that HIV is a neurotransmitter and that everything contained in the HIV particle is in the neuron in the first place. What you have then is an accurate representation of how neurons make use of their genes and proteins:
You may ask: isn't it the case that every cell in your body has (almost) the same DNA in order to be able to replicate itself? Generally, this is true for most cells, but not for most neurons. Neurons will typically have a genome that is different from the original genome that you were assigned at birth. Neurons may have additional or fewer chromosomes and have sequences of information removed from or added to certain chromosomes.
It was shown that this behavior is important for information processing and, if it goes awry, may contribute to brain disorders like depression or Alzheimer’s disease. Recently it was also shown that neurons change their genome on a daily basis to meet everyday information processing demands.
So when you sit at your desk for five days, and then on the weekend decide to go on a hike, it makes good sense that the brain adapts its neurons for this new task, because entirely different information processing is needed after this change of environment.
Equally, in an evolutionary sense, it would be beneficial to have different “modes” for hunting/gathering and social activity within the village — and it seems that this mechanism might serve something like this purpose. In general, the biological information processing apparatus is extremely efficient at responding to slower information processing demands that range from minutes to hours.
With respect to deep learning, an equivalent function would be to alter the function of a trained convolutional net in significant but rule-based ways; for example, to apply a transformation to all parameters when changing from one task to another (recognition of street numbers -> transform parameters -> recognition of pedestrians).
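There is no standard deep learning operation that does this, but a hypothetical sketch might look like the following: take the parameters of a trained network and apply a fixed, rule-based transformation to every weight tensor whenever the task changes. All names and the transformation itself are made up purely for illustration:

```python
import numpy as np

def switch_task(params, scale=0.9, shift=0.01):
    """Hypothetical rule-based transformation applied to all parameters
    when changing tasks (e.g. street numbers -> pedestrians)."""
    return {name: scale * w + shift for name, w in params.items()}

# Toy "trained" parameters of a small network.
params = {"conv1": np.random.randn(16, 3, 3), "fc": np.random.randn(10, 64)}
pedestrian_params = switch_task(params)
```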
Nothing of this biological information processing is modeled by the LNP model.
Looking back at all this, it seems rather strange that so many researchers think that they can replicate the brain’s behavior by concentrating on the electrochemical properties and inter-neuron interactions only. Imagine that every unit in a convolutional network has its own github, from which it learns to dynamically download, compile and use the best libraries to solve a certain task. From all this you can see that a single neuron is probably more complex than an entire convolutional net, but let us continue with our focus on electrochemical processes and see where it leads us.
Back to the LNP model
After all this, there is only one more relevant step in information processing for our model. Once a critical level of depolarization is reached, a neuron will most often fire, but not always. There are mechanisms that prevent a neuron from firing. For example, shortly after a neuron has fired, its electric potential is too positive to produce another fully-fledged action potential, and thus it cannot fire again. This blockage may be present even when a sufficient electric potential is reached, because this blockade is a biological function and not a physical switch.
In the LNP model, this blockage of an action potential is modeled as an inhomogeneous Poisson process. A Poisson process as a model means that the neuron has a very high probability of firing the first or second time it reaches its threshold potential, but it may also happen (with exponentially decreasing probability) that the neuron does not fire for many more threshold crossings.
There are exceptions to this rule, where neurons disable this mechanism and fire continuously at the rates which are governed by the physics alone — but these are special events which I will ignore at this point. Generally, this whole process is very similar to dropout used in deep learning which uses a uniform distribution instead of a Poisson distribution; thus this process can be viewed as some kind of regularization method that the brain uses instead of dropout.
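To illustrate the comparison, here is a small numpy sketch (toy rates and time bin) that contrasts a standard dropout mask — in practice a Bernoulli mask drawn from uniform random numbers — with an inhomogeneous-Poisson-style spike generator, where the probability of at least one spike in a short time bin depends on the instantaneous firing rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 10

# Standard dropout: each unit is kept with probability 1 - p.
p = 0.5
dropout_mask = rng.random(n_units) > p

# Poisson-style spike suppression: higher instantaneous rates make firing
# in a short time bin (dt seconds) more likely, but never certain.
rates = rng.uniform(5, 200, n_units)          # toy instantaneous firing rates in Hz
dt = 0.005
spike_mask = rng.poisson(rates * dt) > 0      # at least one spike in the bin
```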
In the next step, if the neuron fires, it releases an action potential. The action potential has very little difference in its amplitude, meaning the electric potential generated by the neuron almost always has the same magnitude, and thus is a reliable signal. As this signal travels down the axon it gets weaker and weaker. When it flows into the branches of the axon terminal, its final strength will be dependent on the shape and length of these branches; so each axon terminal will receive a different amount of electrical potential. This spatial information, together with the temporal information due to the spiking pattern of action potentials, is then translated into electrochemical information (it was shown that they are translated into spikes of neurotransmitters themselves that last about 2ms). To adjust the output signal, the axon terminal can move, grow or shrink (spatial), or it may alter its protein makeup which is responsible for releasing the synaptic vesicles (temporal).
Now we are back at the beginning: Neurotransmitters are released from the axon terminal (which can be modeled as a dense matrix multiplication) and the steps repeat themselves.
Learning and memory in the brain
Now that we went through the whole process back to back, let us put all this into context to see how the brain uses all this in concert.
Most neurons repeat the process of receive-inputs-and-fire about 50 to 1000 times per second; the firing frequency is highly dependent on the type of neuron and on whether the neuron is actively processing a task. Even if a neuron does not process a task, it will fire continuously in a random fashion. Once some meaningful information is processed, this random firing activity makes way for highly synchronized activity between neighboring neurons in a brain region. This synchronized activity is poorly understood, but is thought to be integral to understanding information processing in the brain and how it learns.
Currently, it is not precisely known how the brain learns. We do know that it adjusts synapses with some sort of reinforcement learning algorithm in order to learn new memories, but the precise details are unclear and the weak and contradicting evidence indicates that we are missing some important pieces of the puzzle. We got the big picture right, but we cannot figure out the brain’s learning algorithm without the fine detail which we are still lacking.
Concerning memories, we know that some memories are directly stored in the hippocampus, the main learning region of the brain (if you lose your hippocampus in each brain hemisphere, you cannot form new memories). However, most long-term memories are created and integrated with other memories during your REM sleep phase, when so called sleep spindles unwind the information of your hippocampus to all other brain areas. Long-term memories are generally all local: Your visual memories are stored in the visual system; your memories for your tongue (taste, texture) are stored in the brain region responsible for your tongue, etcetera.
It is also known that the hippocampus acts as a memory buffer. Once it is full, you need to sleep to empty its contents to the rest of your brain (through sleep spindles during REM sleep); this might be why babies sleep so much and so irregularly — once their learning buffer is full, they sleep to quickly clear their buffer in order to learn more after they wake. You can still learn when this memory buffer is full, but retention is much worse and new memories might compete with other memories in the buffer for space and displace them — so really get your needed amount of sleep. Sleeping less and irregularly is unproductive, especially for students who need to learn.
Because memories are integrated with other memories during your “write buffer to hard-drive” stage, sleep is also very important for creativity. The next time you recall a certain memory after you slept, it might be altered with some new information that your brain thought to be fitting to attach to that memory.
I think we have all had this: we wake up with some crazy new idea, only to see that it was quite nonsensical in the first place — so our brain is not perfect either and makes mistakes. But other times it just works: one time I tortured myself with a math problem for 7 hours non-stop, only to go to bed disappointed with only about a quarter of the whole problem solved. After I woke, I immediately had two new ideas for how to solve the problem: the first did not work, but the second made things very easy and I could sketch a solution to the math problem within 15 minutes — an ode to sleep!
Now why do I talk about memories when this blog post is about computation? The thing is that memory creation — or in other words, a method to store computed results for a long time — is critical for any intelligence. In brain simulations, one is satisfied if the synapses and activations occur in the same distribution as they do in the real brain, but one does not care if these synapses or activations correspond to anything meaningful — like memories or “distributed representations” needed for functions such as object recognition. This is a great flaw. Brain simulations have no memories.
In brain simulations, the diffusion of electrochemical particles is modeled by differential equations. These differential equations are complex, but can be approximated with simple techniques like Euler’s method. The result has poor accuracy (meaning high error), but the algorithm is very computationally efficient and the accuracy is sufficient to reproduce the activities of real neurons along with their size and distribution of synapses. The great disadvantage is that we generally cannot learn parameters from a method like this — we cannot create meaningful memories.
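As a sketch of what such an approximation looks like, here is a forward-Euler step for a simple one-dimensional diffusion equation (a toy stand-in for the much more elaborate equations used in real simulators; all constants are made up):

```python
import numpy as np

D, dt, dx = 0.1, 0.01, 1.0           # toy diffusion constant, time step, grid spacing
c = np.zeros(50)
c[25] = 1.0                          # initial concentration spike

for _ in range(1000):
    laplacian = np.roll(c, 1) - 2 * c + np.roll(c, -1)   # discrete second derivative
    c = c + dt * D * laplacian / dx**2                   # forward Euler update
```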
However, as I have shown in my blog post about convolution, we can also model diffusion by applying convolution — a very computationally complex operation. The advantage of convolution is that we can use methods like maximum-likelihood estimation with backpropagation to learn parameters which lead to meaningful representations akin to memories (just like we do in convolutional nets). This is exactly akin to the LNP model with its convolution operation.
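The convolutional view of the same process can be sketched like this: one diffusion step becomes a convolution with a small kernel, and in a learning setting that kernel would be a parameter fitted by maximum likelihood with backpropagation rather than fixed by hand (it is fixed here only to keep the example short):

```python
import numpy as np

kernel = np.array([0.25, 0.5, 0.25])   # in a convolutional net this kernel would be learned
c = np.zeros(50)
c[25] = 1.0                             # initial concentration spike

for _ in range(100):
    c = np.convolve(c, kernel, mode="same")   # one diffusion step expressed as convolution
```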
So besides its great similarity to deep learning models, the LNP model is also justified in that it is actually possible to learn parameters which yield meaningful memories (where with memories I mean here distributed representations like those we find in deep learning algorithms).
This then also justifies the next point where I estimate the brain’s complexity by using convolution instead of Euler’s method on differential equations.
Another point to take away for our model is that we currently have no complexity assigned to the creation of memories (we have only modeled the forward pass, not the backward pass with backpropagation). As such, we underestimate the complexity of the brain, but because we do not know how the brain learns, we cannot make any accurate estimates for the computational complexity of learning. With that said and kept in the back of our minds, let us move on to bringing the whole model together for a lower bound of computational complexity.
Bringing it all together for a mathematical estimation of complexity
The next part is a bit tricky: We need to estimate the numbers for N, M, n and m and these differ widely among neurons.
We know that 50 billion of the 86 billion neurons in the brain are cerebellar granule neurons, so these neurons and their connections will be quite important in our estimation.
Cerebellar granule neurons are very tiny neurons with about 4 dendrites. Their main input is from the cortex. They integrate these signals and then send them along a T-shaped axon which feeds into the dendrites of Purkinje neurons.
Purkinje neurons are by far the most complex neurons, but there are only about 100 million of them. They may have more than 100000 synapses each and about 1000 dendrites. Multiple Purkinje neurons bundle their outputs in about a dozen deep nuclei (bunches of densely packed neurons) which then send signals back to the cortex.
This process is crucial for non-verbal intelligence, abstract thinking and abstract creativity (creativity: name as many words beginning with the letter A; abstract creativity: what if gravity bends space-time (general relativity)? What if these birds belonged to the same species when they came to this island (evolution)?). It was thought a few decades ago that the cerebellum only computes outputs for movement; for example, while Einstein’s cerebrum was handled and studied carefully, his cerebellum was basically just cut off and put away, because it was regarded as a “primitive” brain part.
But since then it has been shown that the cerebellum forms 1:1 connections with most brain regions of the cortex. Indeed, changes in the front part of the cerebellum during the ages of 23 to 25 may change your non-verbal IQ by up to 30 points, and changes of 10-15 IQ points are common. This is very useful in most instances, as we lose neurons which perform functions which we do not need in everyday life (calculus, or the foreign language which you learned but never used).
So it is crucial to get the estimation of the cerebellum right not only because it contains most neurons, but also because it is important for intelligence and information processing in general.
Estimation of cerebellar filter dimensions
Now if we look at a single dendrite, it branches off into a few branches and thus has a tree-like structure. Along its total length it is usually packed with synapses. Dendritic spikes can originate in any branch of a dendrite (spatial dimension). If we take 3 branches per dendrite and 4 dendrites in total, we have convolutional filter sizes of 3 and 4 for cerebellar granule neurons. Since linear convolution over two dimensions is the same as convolution over one dimension followed by convolution over the other dimension (this holds when the two-dimensional filter is separable), we can also model this as a single 3×4 convolution operation. Also note that this is mathematically identical to a model that describes the diffusion of particles originating from different sources (feature map) which diffuse according to a rule in their neighborhood (kernel) — this is exactly what happens at a physical level. More on this view in my blog post about convolution.
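A small scipy check of that identity with a hypothetical separable 3×4 filter (all weights and inputs are toy values):

```python
import numpy as np
from scipy.signal import convolve2d

k_branch = np.array([0.2, 0.6, 0.2])            # hypothetical weights over 3 branches
k_dendrite = np.array([0.1, 0.4, 0.4, 0.1])     # hypothetical weights over 4 dendrites
k2d = np.outer(k_branch, k_dendrite)            # separable 3x4 filter

x = np.random.rand(8, 8)                        # toy 2D input
one_pass = convolve2d(x, k2d)                   # single 3x4 convolution
two_pass = convolve2d(convolve2d(x, k_branch[:, None]), k_dendrite[None, :])

assert np.allclose(one_pass, two_pass)          # identical because k2d is separable
```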
Here I have chosen to represent the spatial domain with a single dimension. It was shown that the shape of the dendritic tree is also important in the resulting information processing and thus we would need two dimensions for the spatial domain. However, data is lacking to represent this mathematically in a meaningful way and thus I proceed with the simplification to one spatial dimension.
The temporal dimension is also important here: Charged particles may linger for a while until they are pumped out of the neuron. It is difficult to estimate a meaningful time frame, because the brain uses continuous time while our deep learning algorithms only know discrete time steps.
No single estimate makes sense from a biological perspective, but from a psychological perspective we know that the brain can take up unconscious information that is presented in an image in about 20 milliseconds (this involves only some fast, special parts of the brain). For conscious recognition of an object we need more time — at least 65 milliseconds, and on average about 80-200 milliseconds for reliable conscious recognition. This involves all the usual parts that are active for object recognition.
From these estimates, one can think about this process as “building up the information of the seen image over time within a neuron”. However, a neuron can only process information if it can differentiate meaningful information from random information (remember, neurons fire randomly if they do not actively process information). Once a certain level of “meaningful information” is present, the neuron actively reacts to that information. So in a certain sense information processing can be thought of as an epidemic of useful information that spreads across the brain: Information can only spread to one neuron, if the neighboring neuron is already infected with this information. Thinking in this way, such an epidemic of information infects all neurons in the brain within 80-200 milliseconds.
As such we can say that, while the object lacks details in the first 20 milliseconds, there is full detail at about 80-200 milliseconds. If we translate this into discrete images at the rate of 30 frames per second (normal video playback) —or in other words time steps — then 20 milliseconds would be 0.6 time steps, and 80-200 milliseconds 2.4-6 time steps. This means, that all the visual information that a neuron needs for its processing will be present in the neuron within 2.4 to 6 frames.
To make calculations easier, I now choose a fixed time dimension of 5 time steps for neural processes. This means for the dendrites we have spatio-temporal convolutional filters of size 3x4x5 for cerebellar granule neurons. For Purkinje neurons a similar estimate would be filters of a size of about 10x1000x5. The non-linearity then reduces these inputs to a single number for each dendrite. This number represents an instantaneous firing rate, that is, the number represents how often the neuron fires in the respective interval of time, for example 5 Hz, 100 Hz, 0 Hz, etcetera. If the potential is too negative, no spike will result (0 Hz); if the potential is positive enough, then the firing rate is often proportional to the magnitude of the electric potential — but not always.
It was shown that dendritic summation of this firing rate can be linear (the sum), sub-linear (less than the sum), supra-linear (more than the sum) or bistable (less than the sum, or more than the sum, depending on the respective input); these behaviors of summation often differ from neuron to neuron. It is known that Purkinje neurons use linear summation, and thus their summation to form a spike rate is very similar to the rectified linear function max(0,x) which is commonly used in deep learning. Non-linear sums can be thought of as different activation functions. It is important to add that the activation function is determined by the type of the neuron.
The filters in the soma (or cell body) can be thought of as an additional temporal convolutional filter with a size of 1 in the spatial domain. So this is a filter that reduces the input to a single dimension with a time dimension of 5, that is, a 1x1x5 convolutional filter (this will be the same for all neurons).
Again, the non-linearity then reduces this to an instantaneous firing rate, which then is dropped out by a Poisson process, which is then fed into a weight-matrix.
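Putting the stages together, here is a heavily simplified numpy caricature of this per-neuron pipeline for a cerebellar granule neuron — dendritic spatio-temporal filters, a rectifying non-linearity, a somatic temporal filter, Poisson spike suppression, and a dense output stage. All numbers are toy values and the wiring is only meant to mirror the structure described above, not to be a faithful LNP implementation:

```python
import numpy as np

rng = np.random.default_rng(7)
relu = lambda v: np.maximum(0.0, v)

# Toy granule-neuron dimensions from the text: 4 dendrites x 3 branches x 5 time steps.
inputs = rng.random((4, 3, 5))                     # synaptic drive per branch and time step

# "Convolutional" dendritic stage: one 3x5 spatio-temporal filter per dendrite,
# reduced by a rectifying non-linearity to one firing-rate contribution each.
dendritic_filters = rng.random((4, 3, 5))
dendrite_rates = relu((inputs * dendritic_filters).sum(axis=(1, 2)))

# Somatic stage: a 1x1x5-style temporal filter over a short history of the summed
# dendritic drive (faked here as five identical values to keep the sketch short).
soma_filter = rng.random(5)
soma_rate = relu(np.full(5, dendrite_rates.sum()) @ soma_filter)

# Poisson "dropout": even above threshold the neuron may fail to fire.
fires = rng.poisson(soma_rate * 0.01) > 0

# Dense output stage: vesicle release times receptor counts at ~100 downstream synapses.
output = (soma_rate if fires else 0.0) * rng.random(100)
```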
At this point I want to again emphasize that it is not correct to view the output of a neuron as binary; the information conveyed by a firing neuron is more like an if-then-else branch: “if(fire == True and dropout == False){ release_neurotransmitters(); }else{ sleep(0.02); }”
The neurotransmitters are the true output of a neuron, but this is often confused. The source of this confusion is that it is very difficult to study neurotransmitter release and its dynamics with a synapse, while it is ridiculously easy to study action potentials. Most models of neurons thus model the output as action potentials because we have a lot of reliable data here; we do not have such data for neurotransmitter interactions at a real-time level. This is why action potentials are often confused as the true outputs of neurons when they are not.
When a neuron fires, this impulse can be thought of as being converted to a discrete number at the axon terminal (number of vesicles which are released) and is multiplied by another discrete number which represents the amount of receptors on the synapse (this whole process corresponds to a dense or fully connected weight in convolutional nets). In the next step of information processing, charged particles flow into the neuron and build up a real-valued electric potential. This has also some similarities to batch-normalization, because values are normalized into the range [0,threshold] (neuron: relative to the initial potential of the neuron; convolutional net: relative to the mean of activations in batch-normalization). When we look at this whole process, we can model it as a matrix multiplication between two real-valued matrices (doing a scaled normalization before or after this is mathematically equivalent, because matrix multiplication is a linear operation).
Therefore we can think of axon-terminal-synapse interactions between neurons as a matrix multiplication between two real-valued matrices.
Estimation of cerebellar input/output dimensions
Cerebellar granule neurons typically receive inputs from about four axons (most often connections from the cortex). Each axon forms about 3-4 synapses with the dendritic claw of the granule neuron (a dendrite ending shaped as if you would hold a tennis ball in your hand) so there are a total of about 15 inputs via synapses to the granule neurons. The granule neuron itself ends in a T shaped axon which crosses directly through the dendrites of Purkinje neurons with which it forms about 100 synapses.
Purkinje neurons receive inputs from about 100000 connections made with granule neurons, and they themselves make about 1000 connections in the deep nuclei. There are estimates which are much higher, and no accurate number for the synapse count exists as far as I know. The number of 100000 synapses might be a slight overestimate (but 75000 would be too conservative), but I use it anyway to make the math simpler.
All these dimensions are taken times the time dimension as discussed above, so that the input for granule neurons for example has a dimensionality of 15×5.
So with this we can finally calculate the complexity of a cerebellar granule neuron together with the Purkinje neurons.
So my estimate would be 1.075×10^21 FLOPS for the brain; the fastest computer on earth as of July 2013 had 0.58×10^15 FLOPS for practical application (more about this below).
Part III: Limitations and criticism
While I discussed how the brain is similar to deep learning, I did not discuss how the brain is different. One great disparity is that the dropout in the brain works with respect to all inputs, while dropout in a convolutional network works with respect to each single unit. What the brain is doing makes little sense in deep learning right now; however, if you think about combining millions of convolutional nets with each other, it makes good sense to do as the brain does. The dropout of the brain certainly would work well to decouple the activity of neurons from each other, because no neuron can depend on information from a single other neuron (because it might be dropped out), so that it is forced to take into account all the neurons it is connected with, thus eliminating biased computation (which is basically regularization).
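A tiny numpy sketch of that difference (toy sizes; in the brain the suppression would follow the Poisson process discussed earlier, not a fixed probability): standard dropout masks individual units independently, while the brain-style variant passes on or suppresses a neuron's entire output as a whole:

```python
import numpy as np

rng = np.random.default_rng(3)
activations = rng.random((8, 16))        # 8 "neurons", each with a 16-dimensional output
p = 0.5

# Deep-learning dropout: an independent mask per unit.
per_unit = activations * (rng.random(activations.shape) > p)

# Brain-style variant: a whole neuron's output is kept or dropped together.
per_neuron = activations * (rng.random(8) > p)[:, None]
```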
Another limitation of the model is that it is a lower bound. This estimate does not take into account:
- Backpropagation, i.e. signals that travel from the soma to the dendrites; the action potential is reflected within the axon and travels backwards (these two things may almost double the complexity)
- Axon terminal information processing
- Multi-neurotransmitter vesicles (can be thought of as multiple output channels or filters, just as an image has multiple colors)
- Geometrical shape of the dendritic tree
- Dendritic spine information processing
- Non-axodendritic synapses (axon-axon and axon-soma connections)
- Electrical synapses
- Neurotransmitter induced protein activation and signaling
- Neurotransmitter induced gene regulation
- Voltage induced (dendritic spikes and backpropagating signals) gene regulation
- Voltage induced protein activation and signaling
- Glia cells (besides having an extremely abnormal brain (about one in a billion), Einstein also had abnormally high levels of glia cells)
All these things have been shown to be important for information processing in the brain. I did not include them in my estimate because this would have made everything:
- Too complex: What I have discussed so far is extremely simple if you compare that to the vastness and complexity of biological information processing
- Too special: Non-axodendritic synapses can have unique information processing algorithms completely different from everything listed here, e.g. direct electrical communication between a neighboring bundle of neurons
- And/or evidence is lacking to create a reliable mathematical model: Neural backpropagation, geometry of the dendritic trees, and dendritic spines
Remember that these estimates are for the whole brain. Local brain regions might have higher computational processing speed than this average when they are actively processing stimuli. Also remember that the cerebellum accounts for almost all of this computational processing. Other brain regions integrate the knowledge of the cerebellum, but the cerebellum acts as a transformation and abstraction module for almost all information in the brain (except vision and hearing).
But wait, we can do all this with much less computational power! We already have super-human performance in computer vision!
I would not say that we have super-human performance in computer vision. What we have is a system that beats humans at naming things in images that are taken out of the context of the real world (what happens before we see something in the real world shapes our perception dramatically). We can almost always recognize things in our environment, but we most often just do not know (or care about) the name of what we see.
Humans do not have a visual system built for labeling things. Try to make a list of 1,000 common physical objects in the real world; it is not an easy task.
To not recognize an object for us humans would mean that we see an object but cannot make sense of it. If you forgot the name of an old classmate, it does not mean you did not recognize her; it just means you forgot her name. Now imagine you get off a train stop and you know a good friend is waiting for you somewhere at the stop. You see somebody 300 meters away waving their hands who is looking in your direction — is it your friend? You do not know; you cannot recognize if it is her. That’s the difference between mere labels and object recognition.
Now if you cannot recognize something in a 30×30 pixel image, but the computer can, this does not necessarily mean that the computer has super-human object recognition performance either. First and foremost it means that your visual system does not work well on pixelated information. Our eyes are just not used to that.
Now take a look out of a window and try to label all the things you see. It will be very easy for most things, but for some things you will not know the correct label! For example, I do not know the names of a few plants that I see when I look out of my window. However, I am fully aware of what I am seeing and can name many details of each object. Just by assessing their appearance, I know a lot about how much water and sunshine the unknown plants need, how fast they grow, in which way they grow, whether they are old or young specimens; I know what they feel like if I touch them, or more generally, how these plants grow biologically and how they produce energy, and so on. I can do all this without knowing their names. Current deep learning systems cannot do this and will not do this for quite some time. Human-level performance in computer vision is far away indeed! We have just reached the very first step (object recognition), and now the task is to make computer vision smart, rather than merely good at labeling things.
Evolutionarily speaking, the main functions of our visual system have little to do with naming the things we see: hunting and avoiding being hunted, orienting ourselves in nature while foraging, making sure we pick the right berries and extract roots efficiently. These are all important functions, but probably one of the most important functions of our vision is its social function within a group or relationship.
If you Skype with someone, the communication is quite different when they have their camera enabled compared to when they do not. It is also very different to communicate with someone whose image is on a static 2D surface compared to communicating in person. Vision is critical for communication.
Our deep learning systems cannot do any of this efficiently.
Making sense of a world without labels
One striking case which also demonstrates the power of vision for true understanding of the environment without any labels is the case of Genie. Genie was strapped into place and left alone in a room at the age of 20 months. She was found with severe malnutrition 12 years later. She had almost no social interaction during this time and thus did not acquire any form of verbal language.
Once she got in contact with other human beings she was taught English as a language (and later also sign language), but she never really mastered it. Instead she quickly mastered non-verbal language and was truly exceptional at that.
To strangers she communicated almost exclusively with non-verbal language. There are accounts of strangers stopping in their tracks, leaving everything behind, walking up to her and handing her a toy or some other item; that item was always something known to be liked and desired by Genie.
In one instance a woman got out of her car at a stoplight at an intersection, emptied her purse and handed it to Genie. The woman and Genie did not exchange a word; they understood each other completely non-verbally.
So what Genie did was to pick up cues with her visual system, translate the emotional and cognitive state of that woman into non-verbal cues and actions, and then use these to change the woman's mental state, so that the woman came to desire to give her purse to Genie (a purse which Genie probably could not even see).
Clearly, Genie was exceptional at non-verbal communication. But what would happen if you pitted her against a deep learning object recognition system? The deep learning system would be much better than Genie on any data set you could pick. Would it be fair to say that the convolutional net is better at object recognition than Genie? I do not think so.
This shows how primitive and naïve our approach to computer vision is. Object recognition is a part of human vision, but it is not what makes it exceptional.
Can we do with less computational power?
“We do not need as much computational power as the brain has, because our algorithms are (will be) better than that of the brain.”
I hope you can see after the descriptions in this blog post that this statement is rather arrogant.
We do not know how the brain really learns. We do not understand information processing in the brain in detail. And yet we dare to say we can do better?
Even if we did know how the brain works in all its details, it would still be rather naïve to think we could create general intelligence with much less. The brain developed over many hundreds of millions of years of evolution. Evolutionarily, it is the most malleable organ there is: the human cortex shrank by about 10% during the last 20,000 years, and the human brain adapted rapidly to the many ways we use verbal language, a very recent development in evolutionary terms.
It has also been shown that the number of neurons in each animal's brain is almost exactly the number it can sustain through feeding (we probably killed off the majority of all mammoths by about 20,000 years ago). We humans have such large brains because we invented fire and cooking, with which we could predigest food, making it possible to sustain more neurons. Without cooking, our calorie intake would not be high enough to sustain our brains and we would helplessly starve (at least a few thousand years ago; now you could survive on a raw vegan diet easily by walking into a supermarket and buying a lot of calorie-dense foods). Given this, it is very likely that brains are exhaustively optimized to produce the best information processing possible for the typical calorie intake of the respective species; the function which is most expensive in an animal will be most ruthlessly optimized to enhance survival and procreation. This is also very much in line with all the complexity of the brain: every little function is optimized thoroughly, and only as technology advances can we understand, step by step, what this complexity is for.
There are many hundreds of different types of neurons in the brain, each with its designated function. Indeed, neuroscientists can often differentiate brain regions and their functions by looking at the changing architecture and neuron types in a region. Although we do not understand the details of how these circuits perform information processing, we can see that each of these unique circuits is carefully designed to perform a certain kind of function. These circuits are often replicated in evolutionarily distinct species which share a common ancestor that branched off hundreds of millions of years ago, showing that such structures are evolutionarily optimal for the tasks they process.
The equivalent in deep learning would be to have 10,000 different architectures of convolutional nets (each with its own set of activation functions and more) which we combine meticulously to improve the overall function of our algorithm. Do you really think we can build something which produces equally complex information processing but follows a simple, general architecture?
It is rather naïve to think that we can outwit this fantastically complex organ when we are not even able to understand its learning algorithms.
On top of this, the statement that we will develop better algorithms than the brain uses is unfalsifiable. We can only prove it by achieving it; we cannot disprove it. Thus it is a rather nonsensical statement with little practical value. Theories are usually useful even when there is not enough evidence to show that they are correct.
The standard model of physics is an extremely useful theory, used by physicists and engineers around the world in their daily work to develop the high-tech products we enjoy; and yet this theory is not complete: it was amended just a few days ago when a new particle was proven to exist in an LHC experiment.
Now imagine another model which you could only use once the existence of all particles had been proven. Such a model would be rather useless: as long as it makes no predictions about the behavior of the world, we cannot use it to develop and manufacture electronics. Similarly, the statement that we can develop more efficient algorithms than the brain does not help us; it rather makes further progress more difficult. The brain should really be our main point of orientation.
Another argument, typical of Yann LeCun (he made a similar argument during a panel), goes like this: arguably, airplanes are much better at flying than birds are; yet if you describe the flight of birds it is extremely complex and every detail counts, while the flight of airplanes is described simply by the fluid flow around an airfoil. Why is it wrong to expect this simplicity from deep learning when compared to the brain?
I think this argument has some truth in it, but essentially it asks the wrong question. It is clear that we need not replicate everything in detail in order to achieve artificial intelligence; the real question is: where do we draw the line? If you learn that neurons can be modeled in ways that closely resemble convolutional nets, would you go so far as to say that this model is too complex and we need to make it simpler?
Part IV: Predicting the growth of practical computational power
There is one dominant measure of performance in high-performance computing (HPC), and that measure is floating point operations per second (FLOPS) on the High Performance LINPACK (HPL) benchmark, which measures how many computations a system can do per second when performing distributed dense matrix operations on hundreds or thousands of computers. The TOP500 list of supercomputers, a historical list based on this benchmark, is the main reference point for the performance of a new supercomputer system.
But the LINPACK benchmark comes with a big caveat. It does not reflect the performance of the real, practical applications which run on modern supercomputers on a daily basis, and thus the fastest computers on the TOP500 list are not necessarily the fastest computers for practical applications.
Everybody in the high performance computing community knows this, but it is so entrenched in the business routine of this area that when you design a new supercomputer system, you basically have to show that your system will get a good spot on the TOP500 in order to get funding for that supercomputer.
Sometimes such systems are practically unusable, like the Tianhe-2 supercomputer, which still holds the top spot on the LINPACK benchmark after more than three years. The potential of this supercomputer goes largely unused because it is too expensive to run (electricity) and the custom hardware (custom network, Intel Xeon Phi) requires new software, which would need years of development to reach the level of sophistication of standard HPC software. The Tianhe-2 runs at only roughly one third of its capacity; in other words, it practically stands idle for nearly 2 out of every 3 minutes. Its predecessor, the Tianhe-1, the fastest computer in the world in 2010 (according to LINPACK), has not been used since 2013 for bureaucratic reasons.
While other supercomputers of similar design outside of China fare better, they typically still do not perform well in practical applications. This is because the accelerators used, such as graphics processing units (GPUs) or Intel Xeon Phis, can deliver high FLOPS in such a setup but are severely limited by network bandwidth bottlenecks.
To correct the growing uselessness of the LINPACK benchmark, a new measure of performance was developed: the high performance conjugate gradient benchmark (HPCG). This benchmark performs a conjugate gradient solve, which requires more communication than LINPACK and as such comes much closer to the performance numbers of real applications. I will use this benchmark to create my estimates for a singularity.
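For readers who want to see where the extra communication comes from, below is a plain (unpreconditioned) conjugate gradient loop in Python, with comments marking the operations that would require communication if the matrix and vectors were distributed across machines. The actual HPCG benchmark adds a multigrid preconditioner, so this is only a sketch of the communication pattern, not the benchmark itself.

```python
import numpy as np

# Conjugate gradient for A x = b. In a distributed setting, the two dot products
# per iteration each require a global reduction across all machines, and the
# matrix-vector product requires exchanging boundary values with neighbors.
def cg(A, b, iters=50):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r                     # global reduction
    for _ in range(iters):
        Ap = A @ p                     # neighbor communication in the distributed case
        alpha = rs_old / (p @ Ap)      # another global reduction
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r                 # global reduction for the next iteration
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Tiny usage example on a symmetric positive definite matrix.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg(A, b, iters=25))   # close to the exact solution [0.0909..., 0.6363...]
```

LINPACK, by contrast, spends most of its time in large dense matrix multiplications, which keep each machine busy with local work between comparatively rare communication phases.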
However, this benchmark still dramatically overestimates the computing power that can be reached for artificial intelligence applications when we assume that these applications are based on deep learning.
Deep learning is currently the most promising technique for reaching artificial intelligence. It is certain that deep learning — as it is now — will not be enough, but one can say for sure that something similar to deep learning will be involved in reaching strong AI.
Deep learning, unlike other applications, has an unusually high demand for network bandwidth. It is so high that for some supercomputer designs in the TOP500, a deep learning application would run more slowly than on your desktop computer. Why is this so? Because parallel deep learning involves massive parameter synchronization, which requires extensive network bandwidth: if your network bandwidth is too low, then at some point deep learning gets slower and slower the more computers you add to your system. As such, very large systems which are usually quite fast may be extremely slow for deep learning.
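A rough way to see when the network becomes the bottleneck is to compare the time a worker spends computing one batch with the time it needs to exchange its gradients. All the numbers in the sketch below (model size, per-batch compute time, link speeds) are illustrative assumptions, not measurements.

```python
# Toy bottleneck estimate for synchronous data-parallel deep learning.
# All numbers below are illustrative assumptions.

params = 100e6                 # assumed model size in parameters
grad_bytes = params * 4        # 32-bit gradients

compute_time = 0.25            # assumed seconds of compute per batch per worker

for name, gbit_per_s in [("1 Gbit/s Ethernet", 1), ("100 Gbit/s InfiniBand", 100)]:
    bandwidth = gbit_per_s * 1e9 / 8              # bytes per second
    sync_time = grad_bytes / bandwidth            # ignores allreduce tricks and compression
    print(f"{name}: compute {compute_time:.2f}s vs. gradient exchange {sync_time:.2f}s")
```

With the slow link, a worker spends far longer exchanging gradients than computing, which is exactly the regime where adding more machines makes training slower rather than faster.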
The problem with all this is that developing new network interconnects which enable high bandwidth is difficult, and advances are made much more slowly than advances in computing modules like CPUs, GPUs and other accelerators. Just recently, Mellanox reached a milestone by manufacturing switches and InfiniBand cards which operate at 100 Gbit/s. This development is still rather experimental, and it is difficult to manufacture fiber-optic cables which can operate at this speed. As such, no supercomputer implements this new development as of yet. And with this milestone reached, there will not be another one for quite a while: the doubling time for network interconnect bandwidth is about 3 years.
Similarly, there is a memory problem. While the theoretical processing power of CPUs and GPUs keeps increasing, the memory bandwidth of RAM is almost static. This is a great problem, because we are now at a point where it costs more time to move the data to the compute circuits than to actually do the computation.
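The sketch below illustrates this point for a simple element-wise operation, where almost all the time goes into moving data rather than computing on it; the peak compute and bandwidth figures are assumed round numbers for a modern accelerator, not measurements.

```python
# Time to move data vs. time to compute for an element-wise addition y = a + b.
# Peak numbers are assumed round figures, not measurements.

peak_flops = 6e12          # assumed peak compute, in FLOPS
mem_bandwidth = 300e9      # assumed memory bandwidth, in bytes per second

n = 100_000_000                       # number of 32-bit elements
flops = n                             # one addition per element
bytes_moved = 3 * 4 * n               # read a and b, write y

t_compute = flops / peak_flops                # roughly 0.02 ms
t_memory = bytes_moved / mem_bandwidth        # roughly 4 ms
print(f"compute: {t_compute*1e3:.3f} ms, data movement: {t_memory*1e3:.1f} ms "
      f"({t_memory / t_compute:.0f}x longer)")
```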
With new developments such as 3D memory, one can be sure that further increases in memory bandwidth will be achieved, but after that we have nothing to increase performance further. We need new ideas and new technology; memory will not keep scaling simply by getting smaller and smaller.
However, currently the biggest hurdle of them all is power consumption. The Tianhe-2 uses 24 megawatts of power, which amounts to $65k-$100k in electricity cost per day, or about $23 million per year. The power consumed by the Tianhe-2 would be sufficient to power about 6000 homes in Germany or 2000 homes in the US (with A/C usage).
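The dollar figures follow from simple arithmetic once an electricity price is assumed; the sketch below uses an assumed range of $0.11 to $0.17 per kWh, which reproduces the quoted $65k-$100k per day and roughly $23 million per year at the lower price.

```python
# Electricity cost of a 24 MW machine under assumed electricity prices.
power_mw = 24
kwh_per_day = power_mw * 1000 * 24            # 576,000 kWh per day

for price in (0.11, 0.17):                    # assumed $/kWh
    per_day = kwh_per_day * price
    per_year = per_day * 365
    print(f"${price}/kWh: ${per_day:,.0f} per day, ${per_year / 1e6:.1f}M per year")
```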
Physical limitations
Furthermore, there are physical problems around the corner. Soon our circuits will be so small that electrons start to show quantum effects. One such quantum effect is quantum tunneling, in which an electron effectively sits in two neighboring circuits at once and randomly decides which of the two locations it will end up in.
If this happened at a larger scale, it would be as if you charged your phone right next to your TV and the electrons decided they would rather go to your phone cable than to your TV, jumping over to the phone cable and cutting off the power to your TV. Quantum tunneling will become relevant in 2016-2017 and has to be taken into account from then on. New materials and "insulated" circuits are required to make everything keep working from that point onward.
With new materials we need new production techniques, which will be very costly because all computer chips have so far relied on the same old but reliable production process. We need research and development to make our known processes work with these new materials, and this will cost not only money but also time. This will also fuel a continuing trend in which the cost of producing computer chips increases exponentially (and growth may slow due to these costs). Currently, the tally stands at about $9bn for a semiconductor fabrication plant (fab), a figure that has been increasing at a relatively stable rate of about 7-10% per year over the past decades.
After this, we are at plain physical limits. A transistor will be composed of not much more than a handful of atoms. We cannot go smaller than this, and this level of manufacturing will require extensive effort to get such devices working properly. This will start to happen around 2025, and growth may slow from there on due to physical limitations.
Recent trends in the growth of computational power
So to summarize the previous section: (1) LINPACK performance does not reflect practical performance because it does not test memory and network bandwidth constraints; (2) memory and network bandwidth are now more important than computational power, yet (3) advances in memory and network bandwidth are sporadic and cannot keep pace with the growth in computational power; (4) electricity costs are a severe limitation (try to justify a dedicated power plant for a supercomputer when citizens face sporadic power outages); and (5) computational power will run into physical boundaries in the next couple of years.
It may not come as a surprise, then, that the growth in computational power has been slowing down in recent years; this is mainly due to power efficiency, which can only be improved gradually, but the other factors also take their toll, such as network interconnects which cannot keep up with accelerators like GPUs.
If one takes the current estimate of practical FLOPS of the fastest supercomputer, the Tianhe-2 with 0.58 petaflops on HPCG, then it would take 21 doubling periods until the lower bound of the brain's computational power is reached. If one uses Moore's Law, we would reach that by 2037; if we take the growth rate of the last 60 years, about 1.8 years per doubling period, we reach it in the year 2053. If we take a lower estimate of 3 years per doubling period, due to the problems listed above, we reach it in 2078. While memory bandwidth is currently the bottleneck for practical supercomputing applications, this may soon change to networking bandwidth, which doubles only about every 3 years. So the 2078 estimate might be quite accurate.
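As a sanity check on these dates, here is the projection spelled out in Python. The baseline year (taken here as 2015) and the exact doubling period attributed to Moore's Law (taken here as one year) are my assumptions, so the printed years land within a year or two of the figures quoted above.

```python
import math

brain_flops = 1.075e21        # lower-bound estimate from Part II
practical_flops = 0.58e15     # Tianhe-2 on HPCG
baseline_year = 2015          # assumed starting year for the projection

doublings = math.ceil(math.log2(brain_flops / practical_flops))   # -> 21

# Doubling periods in years: aggressive (Moore's-Law-like), the ~60-year
# historical average, and the bandwidth-limited case discussed above.
for label, period in [("aggressive (1.0 yr)", 1.0),
                      ("historical average (1.8 yr)", 1.8),
                      ("bandwidth-limited (3.0 yr)", 3.0)]:
    print(f"{label}: brain-scale practical compute around {baseline_year + doublings * period:.0f}")
```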
Now remember that (1) the HPCG benchmark still reports much higher performance than typical deep learning applications achieve, because these rely much more on network and memory bandwidth, and (2) my estimate for the computational complexity of the brain is a lower bound. One can see that an estimate beyond 2100 might not be too far off. To sustain such a long and merciless increase in computing performance will require that we develop and implement many new ideas while operating at the border of physical limitations as early as 2020. Will this be possible?
Where there’s a will, there’s a way — the real question is: Are we prepared to pay the costs?
Conclusion
Here I have discussed the information processing steps of the brain and their complexity and compared them to those of deep learning algorithms. I focused on a discussion of basic electrochemical information processing and neglected biological information processing.
I used an extended linear-nonlinear-Poisson cascade model as groundwork and related it to convolutional architectures.
With this model I could show that a single neuron has an information processing architecture which is very similar to current convolutional nets, featuring convolutional stages with rectified non-linearities whose activities are then regularized by a dropout-like method. I also established a connection between max-pooling and voltage-gated channels which are opened by dendritic spikes. Similarities to batch normalization exist as well.
This straightforward similarity gives strong reason to believe that deep learning is really on the right path. It also indicates that ideas borrowed from neurobiological processes are useful for deep learning (the catch so far being that progress in deep learning architectures has often preceded the corresponding knowledge about neurobiological processes).
My model shows that the brain can be estimated to operate at at least 10^21 operations per second. With current rates of growth in computational power we could achieve supercomputers with brain-like capabilities by the year 2037, but estimates beyond the year 2080 seem more realistic when all the evidence is taken into account. This estimate only holds if we succeed in overcoming limitations such as physical barriers (for example quantum tunneling), the capital costs of semiconductor fabrication plants, and growing electricity costs. At the same time, we constantly need to innovate to solve the memory bandwidth and network bandwidth problems which are, or will be, the bottlenecks in supercomputing. With these considerations taken into account, it is rather unlikely in practice that we will achieve human-like processing capabilities anytime soon.
Closing remarks
My philosophy for this blog post was to present all the information on a single web page rather than scatter it around. I think this design helps to create a sturdier fabric of knowledge which, with its interwoven strands from different fields, paints a more thorough picture of the main ideas involved. However, it has been quite difficult to organize all this information into a coherent picture, and some points might be more confusing than enlightening. Please leave a comment below to let me know if the structure and content need improvement, so that I can adjust my next blog post accordingly.
I would also love general feedback for this blog post.
Also make sure to share this blog post with your fellow deep learning colleagues. People with raw computer science backgrounds often harbor misconceptions about the brain, its parts and how it works. I think this blog post could be a suitable remedy for that.
The next blog post
The second post in this series on neuroscience and psychology will focus on the most important brain regions and their function and connectivity. The third and last part in the series will focus on psychological processes, such as memory and learning, and what we can learn from them with respect to deep learning.
Acknowledgments
I would like to thank Alexander Tonn for his useful advice and for proofreading this blog post.
Important references and sources
Neuroscience
Brunel, N., Hakim, V., & Richardson, M. J. (2014). Single neuron dynamics and computation. Current opinion in neurobiology, 25, 149-155.
Chadderton, P., Margrie, T. W., & Häusser, M. (2004). Integration of quanta in cerebellar granule cells during sensory processing. Nature, 428(6985), 856-860.
De Gennaro, L., & Ferrara, M. (2003). Sleep spindles: an overview. Sleep medicine reviews, 7(5), 423-440.
Ji, D., & Wilson, M. A. (2007). Coordinated memory replay in the visual cortex and hippocampus during sleep. Nature neuroscience, 10(1), 100-107.
Liaw, J. S., & Berger, T. W. (1999). Dynamic synapse: Harnessing the computing power of synaptic dynamics. Neurocomputing, 26, 199-206.
Ramsden, S., Richardson, F. M., Josse, G., Thomas, M. S., Ellis, C., Shakeshaft, C., … & Price, C. J. (2011). Verbal and non-verbal intelligence changes in the teenage brain. Nature, 479(7371), 113-116.
Smith, S. L., Smith, I. T., Branco, T., & Häusser, M. (2013). Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature, 503(7474), 115-120.
High performance computing
Dongarra, J., & Heroux, M. A. (2013). Toward a new metric for ranking high performance computing systems. Sandia Report, SAND2013-4744, 312.
Interview: Why there will be no exascale computing before 2020
Slides: Why there will be no exascale computing before 2020
Interview: Challenges of exascale computing
Image references
Anwar, H., Roome, C. J., Nedelescu, H., Chen, W., Kuhn, B., & De Schutter, E. (2014). Dendritic diameters affect the spatial variability of intracellular calcium dynamics in computer models. Frontiers in cellular neuroscience, 8.
Omar says
This article is pretty interesting, I am new to AI and I find the contents very exciting but I wish the image links were not broken. It just makes it hard to visualize what I am reading.
Still, this is a great read. Thanks!
Tim Dettmers says
Thanks for leaving a comment about that. I have a plug-in that detects dead/broken links, but it did not pick up on this. Thanks for letting me know!
Rolando says
Superb blog! Do you have any recommendations for aspiring writers?
I’m planning to start my own website soon but I’m a little lost on everything. Would you advise starting with a free platform like Wordpress or going for a paid option? There are so many options out there that I’m totally confused. Any recommendations?
Cheers!
Tim Dettmers says
Yes, I would go for free WordPress. I did that too for a couple of years before I got my own server and domain.
Peter Kinnmark says
Even without a definition of intelligence, it’s reasonable to claim that humans have it. But human intelligence, probably closely linked to at least the senses, is the result of evolution, not our thinking. There is no reason to assume that our intellectual power is enough to properly decode the workings of the brain, the imprint of 600 million years of evolution. Predicting strong AI based on the progress of computer hardware is thus a false idea.
The above dilemma is true for any step in the improvement of intelligence. The idea that intelligence, summed up in a function i, will automatically lead to a new intelligence, i+1, is false by a parallel argument. Thus Kurzweil’s singularity is not just wrong. It’s in fact infinitely wrong.
Tim Dettmers says
Thanks for your comment. I can see your point and it makes sense to me. I believe that it is not really possible to accurately picture a distinct future, but it helps to have some reasonable paths to the future. I think mine is reasonable as is yours, but in the end we have to wait it out to see what turns out to be true.
dirk bruere says
It does not matter whether Kurtzweil is right or wrong. It is something we will discover by experiment. Right now NN h/w is on a steep “More than Moore” curve. When that starts to level out in 10-15 years we will have a better view of the landscape. Meanwhile at the very least narrow AI promises to deliver very impressive performance enhancements in a vast number of fields. Also, I’m not too sure about the emphasis on Human scale AGI. What we want of AI may well be quite achievable without “recreating the Human brain”. Let’s just see where this goes
Valentino Zocca says
I don’t need to reiterate what other people have said about how good this article is. I want to add that Yann LeCun, in a recent talk (https://www.youtube.com/watch?v=vdWPQ6iAkT4&t=6300), mentioned that our brain does roughly 10^18 to 10^20 operations per second, while the fastest CPU can only do 10^13 operations per second. This is another measure of how much better the brain still is.
John Smith says
Please provide references where you say “Neurons change their genome dynamically to produce the right proteins to handle everyday information processing tasks”
This is interesting, and I think you are talking about epigenetic changes (e.g. DNA methylation), which change the expression levels of the proteins but do not actually change the DNA code itself.
Unless it DOES change the DNA code itself, which would also be nice to have a reference paper to read.
John Smith says
Also your example with HIV, can you back that up, you’re implying that the cells use reverse transcription to add new dna?
The DNA code is different in neurons because of mutations and the fact that neurons are very long-lived cells: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558435/pdf/nihms888777.pdf
The change is through epigenetics, not code editing…
https://www.sciencedaily.com/releases/2015/04/150427112559.htm
Tim Dettmers says
Not all viruses work like this, but retroviruses work like this in general.
Tim Dettmers says
Here a popular reference for a short overview. For papers just follow the links in the article: https://www.scientificamerican.com/article/scientists-surprised-to-find-no-two-neurons-are-genetically-alike/
Ghost says
Great article, when will you write the second and third part?
Tim Dettmers says
I would really like to work on this and it might happen when I start my PhD next year. In the next 9 months I will be swamped with work.
lhyz says
Another estimation that would be interesting, if possible, would be to start with the brain’s known energy consumption and work backwards to the maximum number of operations it could yield.
You would have to treat both chemical pathways and electrical as information processing operations.
For example, a pathway from sensor to motor consisting of four neurons in series would be seen as a very simple circuit, but complexity rises when you account for all the chemical activity that could enhance the information processing.
And more interesting: how is the energy used? Electrical spikes could be seen as information packets that need energy to overcome the ohmic resistance throughout the axon. But what “resistance” do the chemical pathways have?
If you think of a dead cell, entropy would take over. It takes energy to make order.
But in the neuron cell it seems as if everything is triggered in some chain reaction that doesn’t consume energy; it would presumably need energy to restore the triggers.
Tim Dettmers says
That is an interesting angle. I think one could also come up with very good estimates based on this. In the end, supercomputers are also constrained by energy rather than computation, and an estimate of the energy efficiency of the brain (how many operations can we crank out with 20 watts of brain power) would be a very good measure of where we are now compared to the brain.
Your last point is very much true. The ion pumps in the neuron consume more than 80% of all energy. It is as you say: the brain needs to fight against entropy, and this is the largest energy cost. Other parts are quite negligible in energy consumption.
dirk bruere says
Given the Landauer Limit of around 0.2eV per bit at room temperature, doesn’t this imply the brain is limited to, at most, around 10^22 operations per second? And probably several orders of magnitude less given likely inefficiencies.
dirk bruere says
Sorry – 0.02eV
Paul says
Supercomputers are constrained by wire density and heat dissipation
Not by power.
Fyi
Alex Sobolevski says
This is so good. Thank you. I am printing this out to re-read it many times and take more notes. As someone who had only a handful of courses in physiology and machine learning – I enjoyed and learned a lot.
Tim Dettmers says
Comments like these always give me the motivation to write more about such topics — thank you!
George Michaelson says
I would invite you to consider an alternate hypothesis: What if Ray Kurzweil has not actually successfully predicted anything in this domain, and so the application of even strong logic to inferences of his predictions, carries no weight? The arithmetical equivalent statement would be “if your proof depends on divide-by-zero, what do you say to anyone observing you can prove anything with a proof which includes a divide-by-zero”
I appreciate this is only a polemic. I don’t expect to convince you.
But as a thought experiment, instead of paying respect to Kurzweil, and building on him, why don’t you consider what it would mean for your model-extension, if in fact, his words carried no force at all.
What would you change? What would be the basis of your starting points? What would be invalidated, and what becomes more supposition grounded in your own belief?
(from a computer science perspective, many of us (I freely admit) outside the AI/DL field do not regard what Ray Kurzweil says as carrying strong directive force, or of being “predictive” in any sense we understand. There is no singularity, and we are not in the process of being uplifted into it. Machine Intelligence is not intelligent, and deep learning is neither deep, nor learning. Do not mistake terms of art for colloquial meaning.)
Tim Dettmers says
That is a bit of what I am trying to do. I try to show that Ray Kurzweil’s view is too simplistic and naive, and to replace it with something that makes more sense. Still, I cannot predict the future with this, but I can show boundaries. With current progress, it will take at least a century for AGI to happen. Personally, I think it will take much longer, but I cannot predict that. If progress is good and exponential, we can hope for AGI in 100 years and not really any earlier. So what I am providing here is a lower bound, not a prediction on its own.
:/ says
“to show that it is unlikely that there will be a technological singularity in this century.”
… or in any other century, since the very concept is nonsense.
Exponential growth in processing power or technology does not lead to any kind of singularity: exp(t) increases faster and faster with time, but never reaches infinity.
(Anyway, things that appear to be exponential growth usually turn out to actually be logistic growth: as the carrying capacity of the system is reached, growth slows down and then stops. Logistic growth *definitely* doesn’t have a singularity.)
Tim Dettmers says
Logistic growth is indeed what seems to be happening. I agree that the concept of a singularity is nonsense. With my new blog posts, I see again that many disagree even when the facts are rather clear.
Dan says
I re-read your post and find it most interesting. A holistic and thorough analysis. Two things I would like to remark: 1. The fact that we could totally replicate the network of the brain means little in itself. Imagine you can replicate a complex computer: if you do not know what kind of software it uses, the hardware tells you very little about it.
2. As IT technology progresses, the gap between hardware and software (H/S gap) increases, meaning that we get closer to the brain’s capacity to compute but have no clue about the software (the neurological level, the equivalent of machine code for a computer). And we don’t have a theory of intelligence that would explain the differences between humans on one hand and between humans and animals on the other.
Tim Dettmers says
There are few differences between the human brain and other animal brains. There are some details, like prefrontal interneurons, which only a few animals share with us, but the major structures and neuronal composition of you and, say, a hippo are basically the same. The difference is only the count of neurons.
I agree, however, that there is a mismatch between what hardware does and what software does. Yet to some extent the hardware determines the software. Any software on traditional processors uses the hardware in the same way (memory operations, disk reads, vectorized computation) to achieve its goal, and the details only differ for specific applications (mobile phone vs. database). This is the same as saying: humans use the same brain structures to achieve a goal, but your specific implementation of riding a bike is different from mine (or riding a bike is different from walking).
I would argue the specifications of our hardware determine to some extent what behavior we get. With current hardware, we get stupid but fast behavior (looking up a table, looking up a website, matching a string in memory, doing a matrix multiplication vs. picking something up, learning a language, seeing the objects in the world). If we assume that we stick to this hardware, I think my blog post makes quite correct predictions.
DMH says
Hi Tim, thanks for this extraordinary, excellent and very instructive article! 🙂
You said in the “Genetic Manipulation” section
” You may ask, isn’t it so that every cell in your body has (almost) the same DNA in order to be able to replicate itself? Generally, this is true for most cells, but not true for most neurons. Neurons will typically have a genome that is different from the original genome that you were assigned to at birth. Neurons may have additional or fewer chromosomes and have sequences of information removed or added from certain chromosomes…”
I’m not an expert in biology, but this statement seems to imply that the genetic structure of neurons changes during a person’s life, even though the “general” genetic structure of the rest of the body doesn’t. Is that correct?
I’m asking this because there is a strong debate now related with trans-sexuality in which some people say that it is “innate” (which is absurd, from the very definition of sex), but if the above statement means that the genetic structure of the neurons (in the brain I suppose) can change interactively during a person’s life, then the “naturalness” of trans-sexuality could arguably be defined as “acquired”, instead of “innate”.
Do you think that this explanation of trans-sexual behavior/nature could be correct in some way?
Thanks again!
Tim Dettmers says
Theoretically, this makes sense. Changes at the genetic level could alter a person’s nature/behavior towards becoming transsexual. Practically, I think more simple explanations are more valuable (Occam’s Razor). It is difficult to test if a person has become transsexual because of genetic changes at the neuronal level, and I think other hypotheses which actually can be tested are more valuable.
Ali F. says
Tim,
I hope scum-of-the-earth comments from brave folks like “Bart Simpson” haven’t prevented you from doing more articles like this. I will give him this: in picking a name like Bart Simpson, he at least did not attempt to mislead us regarding his true colors. He’s not putting his full name when he throws eggs, but at least he knows what he is; just another Bart.
Be encouraged Tim: for every flame-war hero and troll that came after you here and elsewhere, there have been five to ten times as many who found your research refreshing and sober all at once. If I could use one word it would be “exemplary”.
I for one am a sold-out born-again Christian. Yet I take nothing but pleasure in your posts. So you see? You have the ear of scientists and religious nuts alike!
Keep on brother,
-Ali
Tim Dettmers says
Thanks for your comment! I will definitely keep on pushing. I hope to get to blogging about the brain soon.
Hieu Nguyen says
I’m very grateful for this super informative post.
Thank you 😀
Tjaco says
Tim, breathtaking post. Tremendous breadth and depth at the same time. Amazing and outstanding. Your conclusion underlines an important introspective truth, namely that it is just unbelievable what our brains can do and that computers are nowhere near that. It also seems that our brains can learn some tasks (e.g. language comprehension) using much less data than a computer can, although this may change as learning algorithms become better.
dirk bruere says
The thing is, nobody knows to what level the brain must be simulated in order to create a Human scale AGI. These are all necessary experiments upon which nobody should place too many hopes. However, the fact that deep learning algorithms and rather crude neural nets are producing very significant (commercially useful) results does indicate we might not need computing power at the higher end of the Human brain estimates in order to get rather remarkable AI.
Pavel Surmenok says
Hi Tim,
Very interesting article!
Could you please point me to any papers which explain how neurons may also dynamically alter their genome?
Thanks,
Pavel
Tim Dettmers says
Here is some relevant work: http://www.nature.com/articles/ncomms12091
The paper has more to do with epigenetic methylation of DNA than with direct changes at the genome level, but the mechanism is similar and the study has very clear results: changes to the DNA at the neuronal level have a strong effect on the information processing of these neurons.
Takuya Matsuda says
I have a few comments on cerebellum and supercomputers.
Cerebellum: in 2016 T. Yamasaki et al. performed real-time simulations of a cat cerebellum with a billion neurons on a modest supercomputer, “Shoubu”, with 2 PFLOPS at RIKEN, Japan.
Supercomputer
A supercomputer 100 times more capable will be available at the end of 2017 in Japan, and it will be able to conduct simulations of a human cerebellum with 100 billion neurons.
I have seen a simulation of motor cortex, basal ganglia and thalamus combined in order to mimic Parkinson’s disease on the 10 PF machine. In 2018-2019 an exascale supercomputer will be available in Japan, so simulations of the entire human cerebrum will be feasible.
Tim Dettmers says
The problem with these models is that they are too simplistic and do not provide much information about what the brain is really doing. Simulations have been run, but scientists are still unsure how useful they were. Very few papers with solid results have been published on brain simulations. So, to relate this to your argument: it may be that they are able to simulate simple models of the cerebellum, but I do not think it really matters, in that it does not help in the quest for AI and it does not help us understand the brain.
Rick says
I think you present a pretty convincing argument that dendrites do some amount of convolution (and max pooling-like aggregation) across their synapses. However we also know that within brain regions neurons cluster around specific types of processing; for example orientation columns within the visual cortex. To me this suggests that the kind of convolution being done by a single neuron is more akin to a single filter in a convolution layer, rather than the entire convolution layer.
So with the orientation column example: it seems that the dendrites of a neuron in a particular column, in a particular part of the receptive field, process convolutions that allow it to recognise orientation in that part of the receptive field. In DNNs we use a single filter that we “slide” around an image. In biology evolution has crafted for us the ability to recreate that same filter within each neuron. This is less space efficient but is like a compiler optimisation where the convolution operation is inlined into each neuron – it produces larger “code” (more neurons) but has better parallelism.
I assume that this sort of organisation is common in the brain, i.e. that clusters of neurons perform the same types of convolution but each one represents a particular filter within a specific part of a receptive field. But the way you present your argument suggests that you believe a single neuron is capable of processing an entire set of convolution filters across the entire visual field. Have I misinterpreted?
Tim Dettmers says
Thank you for pointing this out. My post is not clear on this issue. I agree that the best interpretation, when one follows this dendritic convolution argument, is that there is only a single filter.
This is complicated by the fact that a neuron often has multiple branches where activity is concentrated. One could interpret this as multiple filters that are active at different times, but it is unclear to me if that interpretation would make sense.
This would be another point where I am off, which would change my estimate, but I think the overall point of this blog post still stands: deep learning is in some ways similar to the brain, and we still cannot hope to build neural networks which are as capable as brains.
Thanks again for your comment, Rick, such discussion is much appreciated.
Dan Vasii says
It is very difficult to make people understand what they refuse to. The basis of animal intelligence is the causal chain. For humans this means effectively nothing; it is even lower than the base. I saw with my own eyes how a six-month-old baby pushed the button of a music device: he crawled to it and started it! Meanwhile an intelligent adult ape, like a chimp or a gorilla, needs one or two days to realise that the key can be turned and this way the door will open. The causal chain is trivial for man. Human intelligence works in a totally different way: by creating in the human mind a virtual model and analysing it based on previous information, experience and SIMILAR MODELS! This similarity is totally abstract and animals do not function that way. This essential difference between AnI and HumI actually explains the dead end in AI: there is no way to replicate the basis of HumI.
dirk bruere says
Logical reasoning in crows:
Dan Vasii says
https://www.youtube.com/watch?v=nY_TIv8Ykks
In fact monkeys might talk, if only they had the brain. They do not have it.
Anton says
Tim, thanks for this extensive post: I learned a lot. You looked at a (very computationally expensive) model of neurons, and thereby of the brain, in order to estimate the computational power of the brain. However, I think there is a flaw in this reasoning. The flaw is that the computational cost of a model does not equal the computational power of the thing modeled. For instance, there are computationally very expensive models of the workings of one transistor, but this does not relate at all to the computational capabilities of that transistor. Analogous to your reasoning, a model of the flow of electrons in the channel of a transistor enables us to (partially) understand the working of a transistor, and by estimating the computational cost of this model, we could get a (lower bound on) the computational power of the transistor. Subsequently, multiplying the number of transistors in a processor by the computational cost of the transistor model would give us (a lower bound on) the computational capabilities of the processor. Obviously, this isn’t right. You said it yourself: we have no idea how the brain actually learns, and until we do, I think we have no way of reliably estimating the computational capabilities of the brain.
Tim Dettmers says
This is a good point and I think it is quite right. A related point is that we currently do not understand when a neuron actually produces “useful” information. Synchronicity is probably important here, but it is still poorly understood. We observe similar things in deep learning, where we can remove 90% of the model parameters and still obtain a model with the same expressiveness.
But when you think about it, both in deep learning and in human biology we need to go through “useless” expressions of information to arrive at the “useful” ones. I think in that sense it makes sense to also look at the amount of “useless” processing and take it into account. Probably the degree of that processing is quite different in deep learning, but I think these “useless” forces become important in a pseudo-evolutionary process which exploits randomness to find useful expressions of information, knowledge and reasoning. In that sense I think a computationally rich model is still reasonable.
Dan Vasii says
There is a fundamental catch-22 in AI. Right now there are two kinds of intelligence that man uses simultaneously, and that is the problem. One is AnI, animal-grade intelligence. It deals with concrete problems and not with abstract ones, unlike human-grade intelligence. As long as scientists cannot explain the fact that animals are unable to deal with the abstract (that is why an animal cannot get a joke, and does not have art, science and so on), while humans can, the very understanding of intelligence as a phenomenon is fundamentally flawed, preventing any real advance in the field. As such, there are two trends that are still making scientists believe there is progress. One trend is digitalizing human capabilities, such as the capability to play a certain game or recognize certain patterns. The other is believing that if they create systems such as Watson (basically just a Golden Retriever of information, therefore AnI), useful by themselves, they have also made progress. The fundamental flaw is to believe that COMPUTING=THINKING.
Tim Dettmers says
I think animals can reason quite well. Some animals even make use of abstract moral reasoning, as demonstrated in experiments with different ape species [1]. The hunting patterns of certain whale and dolphin species also indicate that they have complex reasoning. There are several species of birds, whales and so forth which have and cultivate culture. I think the boundaries between animal and human reasoning are much more fluid than you assume, and I would argue that we are still far from reaching even the level of animal reasoning. What you mention is more information retrieval (IBM Watson) than animal reasoning, and I would agree on that point.
[1] See https://www.youtube.com/watch?v=GcJxRqTs5nk for a good summary
Dan Vasii says
I think you commit the keyhole fallacy: when the researcher looked through the keyhole, he saw an eye, that of the monkey he was studying. But the two had totally different purposes: the monkey wanted to see if it could get more food, while the researcher wanted to see what kind of behaviour the monkey exhibits.
Animals do not have humour, religion, science or art, all results of abstract thinking. Even things that some may consider abstract are not: for example, some birds can tell the difference between piles of seeds (one seed, two, three…), but this does not in any way involve the concept of numbers.
Joe says
Various people have posted on the differences in estimating computer level performance required to match human level intelligence, with estimates from 10^15 – 10^21 (Bostrom has summary of arguments).
However, I can’t help but notice that these levels i.e. 10^21 are the maximum possible performance NOT the actual performance used by a human (& as shown by fMRI)
i.e. what is required to perform a given task e.g. vision, movement etc using mechanical substitutes? In many cases this is GFlops – TFlops not EFlops & the combination of all tasks is a summation NOT a product
More importantly, whilst some tasks are still not at human level (e.g. gymnastics), other tasks have been surpassed by computers, e.g. encyclopedic knowledge, “seeing” in multiple frequencies not just visible light, recently transcription of voice, multiple-language translation through inference etc.; i.e. there are many tasks that AI machines perform significantly better than any human on earth. At the moment humans can, on average, perform more tasks better than all current AI.
Question: how long before someone with deep pockets combines all tasks achievable by AI into a single entity?
It is also important to compare AI to “average” human, not the best in the world.
e.g. AlphaGo was compared to a previous world champion (now a top 10 player). BUT Lee Sedol is a top 10 player out of 7 billion humans. AlphaGo can be put on ANY computer in the world & ALL of them can play at that level.
One of the reasons why I think Kurzweil has been relatively accurate in his predictions is that he doesn’t look at the individual technologies & mini trends (= sigmoidal) but only at the overall trend (= exponential). To that end, his estimates of ~2045 being the point we reach singularity is not unreasonable.
Tim Dettmers says
Note that fMRI plots measure the difference between baseline BOLD activity and task BOLD activity and do not say anything about activity per se. In general all neurons in the brain are active all the time. There is coordinated, synchronized activity which is a good indicator that some important computation takes place, but it is still poorly understood.
Combining current AI models is an extremely difficult undertaking, and currently there exist no methods for combining any two systems, say computer vision and language understanding. You can combine two modes of information when you design a model for a specific task, but we are unable to put two separate systems together in a coherent manner that makes sense.
Human-level performance is often reached on research datasets. If you apply these methods to real-world data, you often get poor performance without careful engineering. This is especially apparent if you use these methods in Kaggle competitions, or better yet, in the corporate world. Generalizing these methods is a difficult undertaking that only the largest companies like Google can manage. Also note that these systems do very specific tasks. An algorithm may detect objects in an image, but it is unable to interpret the relations between these objects. For example, no algorithm exists that detects visually whether a situation is dangerous, funny, or interesting, and without such judgement any interaction with humans or exploration of terrain is hardly possible. We can translate simple language, but translating poetry, recipes, or cultural texts is far beyond the reach of current algorithms.
The limitation here is also not money, but talent. So pouring money into the problem will not accelerate it in the short term. A few decades will be needed to train the talent needed for a collaborative push towards AGI.
Joe says
Self-driving cars have been around for a while and, subject to legislation, will be ubiquitous in the 2020s. They can “see” and react to the environment better than any human. This alone is evidence of vision-based algorithms performing at or better than human levels, especially with multiple sensors.
Please do not respond re: the ethical decision of which human the AI decides to save when a careless human steps in front of an AI-driven car.
AI is the next evolution of the human brain, which if nothing else is limited by size constraints.
I disagree with your estimates of the human brain’s processing capacity for many reasons, but mainly because you over-complicate the processing behind decision making.
I suggest you make the following table. Write down every single processing and physical task a human does, from calculating a simple 2-digit number, reading, and comprehension, to any physical activity. For each task, estimate the processing power required by current technology to achieve it; e.g. calculating a 5 x 5 digit number requires a significant amount of energy and processing for a human but only a calculator’s worth of CPU and energy. At some tasks humans will excel, e.g. mostly physical ones, whereas at other tasks, e.g. calculation, sensor reading etc., an AI will excel. Once you sum all tasks you will find that the brain’s computing power is significantly less than previously thought and well within the reach of modern AI. I just don’t think anyone has put the whole lot together yet. The other significant issue is efficiency. The human body is orders of magnitude more efficient than machines for the moment: human brain @ 20 watts, body ~ 100 watts. As to the question of when: any date between 2035-2080 is reasonable based on current projections.
Tim Dettmers says
Lifting a chess piece, capturing another with the same hand, putting it on the square, and placing the captured chess piece at the side of the board: that is a problem so difficult for a robot arm that we could not solve it with the technology we currently have (at least for the general case: all chess pieces, all board positions, all angles, etc.). This is extremely easy for most humans. So I agree that computers can specialize in things and achieve results which go beyond the abilities of people — I do not doubt this at all! In that sense AI will dominate the world by 2035. What I doubt is that machines will have any significant intelligence. They will be good at crunching numbers, summarizing data, and spitting out narrow responses for narrow contexts, but that's it.
Joe says
Have a look at:
(i) a robot playing & beating a table-tennis player
https://www.youtube.com/watch?v=tIIJME8-au8
(ii) robots performing surgery better than the best surgeons
http://allaboutroboticsurgery.com/surgicalrobots.html
(iii) cooking robots
http://www.moley.com/
(iv) innumerable industrial robots with better efficiency than any human
http://www.fanuc.eu/bg/en/robots
There are still many free-form calisthenic motor activities that robots can’t do but they will get there. Humans will not!
As for intelligence – AI is already smarter than any human.
All the calculations most scientists do to estimate the total processing power of the brain assume that the processing power of each synapse is additive. IT IS NOT! i.e., 100 billion connections do NOT work in concert to perform a single task. If they did, you would die. This is in contrast to an AI, which can be programmed to do so. I personally use the HPCG or TEPS benchmark to compare machines to human performance, as it is a more realistic benchmark of computer processing power than the Linpack HPL benchmark.
Currently, I see that the only thing we need to figure out is how to program “I exist and want to live” into an AI, i.e. self-awareness. If someone can figure out what the “program” for that is, then you have a sentient AI which will be significantly smarter than any human alive today.
Theories on this exist e.g.
https://phys.org/news/2013-11-super-intelligent-machine-equation.html – AIXI formulae
https://www.sciencealert.com/a-robot-has-just-passed-a-classic-self-awareness-test-for-the-first-time – personally I’m not convinced due to various assumptions
However, we are not that far from achieving this, and the latest innovations in FPGAs
http://www.anandtech.com/show/11748/hot-chips-intel-emib-and-14nm-stratix-10-fpga-live-blog-845am-pt-345pm-utc
and the IBM TrueNorth chip
https://www.ibm.com/blogs/research/2017/07/brain-inspired-cvpr-2017/
for improving the efficiency of DNN calculations mean that within the next 5-10 years we will have human-like efficiency in hardware and software.
The question is not whether human-level AI will be achieved in the next 20 years BUT whether humans will be ready for it. I just hope that the technology for human-machine integration develops at the same time, because all those who do not accept it will become obsolete.
Tim Dettmers says
The problem with (i-iv) is that these require simple, repetitive movements with few joints and actuators. A surgical robot will only be able to perform non-critical routine surgery. If something unexpected happens, it will not have the fine localization and movement abilities of a human to correct the problem. Robots will assist surgeons, but they will never replace them.
Your brain usually uses all brain regions all the time (except during sleep, when it is more like 75%). You are right that not all neurons are ever active at exactly the same moment, but if you look over the stretch of a second this is true. The same holds for a computer: not all transistors compute something all the time (even if your processor is at 100% capacity). By definition, information can only exist in patterns and not in randomness. The brain and the computer are similar in this respect.
Eric Hoft says
Bravo! I really enjoyed it, and I am going to forward it to other people that I know; a circle of my friends is very interested in the future of AI, so I will have to forward it to them. What is your take on the work of DeepMind with its AlphaGo supercomputer, since this blog was published before Lee Se-Dol was beaten? Do you think this is indicative of a string of breakthroughs that could lead to a low-level Artificial General Intelligence in the next 20 years?
Tim Dettmers says
Thank you, I am glad that you enjoyed it!
The problem with Go is that it is a game with very simple rules. Although the game can be very complex, the behavior learned by AlphaGo is only good for this game. DeepMind is working on more sophisticated games like Quake and StarCraft, and I think once they beat humans there one might see where all of this is going. Currently, it is just too limited to some very specific domains, and in my opinion it makes no sense to extrapolate from that.
Paul says
What a great post! Very thought-provoking and educational. Thank you.
“Unless an AI is fully independent and can grow by itself we do not really have reached the singularity.”
False. All it has to do is increase the growth rate of the human race + AI at a rate that is exponentially faster in the “longer” run than having another human around would.
You make it sound like the first AI needs to be able to do everything on its own. Why would you say that?
All you need for a singularity is to change the dynamic of what we already had before AI by a little bit… and a little bit goes a long way if we decide to build zillions more of these thingies because they're so useful and fun to have around, etc.
e.g. if we had super amazing math-savant octopi that can't even get out of the rain, they would provide a huge kicker to the human race. And imagine 1000 other such specialized savants. Don't you think they will eventually help the planet create even better versions of themselves? Isn't that the start of the singularity?
You seem to think the ‘singularity’ means a very short instant in time with a large inflection. I would argue that the singularity will be happening and (given humans' capability to assume that each new wonderment/horror is the new norm) all of us will still get up and have our morning coffee.
But a gradually increasing blossoming of technologies will get us to the technology that you imagine. It will just happen across 10-20 years, not one. During that time more and more people will realize what's happening, but some will still say, “Nope, still looks like a non-singularity morning to me!”
Nevertheless, very soon you get the whole enchilada.
I noticed you said you are unwilling to make claims without actual data. Sadly, this statement means you are lacking in the skills required to prognosticate the future. 😉
Cheers!
paul
Tim Dettmers says
What you say makes sense. I think I was thinking and writing a bit in black-and-white terms. I think the singularity will happen gradually, but I do not think that an AI that is better at some things will have such a huge impact if it uses too many resources in terms of computational power. I think we will be on the verge of the singularity when powerful AIs, which are good at some things but generally sub-human, become so cheap to operate that a lot of people can run them, or at least large corporations like Google will be able to have many of them. I think that will still take considerable time.
Thanks again for your comment!
Paul says
How many people actually advance the leading edge of the human race? The number is really small. Einstein was one in a billion. Most of us make very small contributions, and the vast majority struggle to keep up with entropy.
So I don’t think you need that much power, besides you can always use solar and checkpoint your results during the night 😉
J R says
And so it begins: http://www.nextplatform.com/2016/08/08/deep-learning-chip-upstart-set-take-gpus-task/
This is what I was talking about a year ago. I didn't expect it to happen so quickly, but here we are: a specialized deep learning chip that packs 55 teraops per second and an interconnect of 2.4 terabits per second. Read about how they designed the chip; it's completely different from traditional CPUs and GPUs: you drop all the general computing crap and just optimize for what matters for deep learning. Suddenly the prediction of the growth of practical computational power isn't so certain, is it?
Tim Dettmers says
Interesting. That might indeed be the future. Thanks for sharing!
Scott LeGrand says
Great article! I was going to write something up this weekend about how LINPACK enabled Intel’s dominance of HPC until the resurgence in deep learning, but you covered that nicely.
That said, I think you've built a bit of a strawman here with human-level intelligence as the goalpost. I don't think you need 1000 EFLOPS to see the emergence of a computational intelligence. Cephalopods (~500M neurons) demonstrate significant problem-solving ability and planning skills. Approximating one would bring us to 25 EFLOPS, which by your own roadmap we would reach, or come within an order of magnitude of, by 2025 or so, certainly by 2050. That's well within the upper end of my own lifetime, and that of most of the deep learning community, no?
An octopus-level intelligence seems more than enough to create an annoying popup maximizer, or a polymorphic virus that understands itself and its enemies sufficiently to continuously evolve adversarial variants of itself to evade detection. Further, since such an entity would be able to concentrate entirely on the task at hand, I will go out on a limb and suspect that 1-2 EFLOPS is more than enough for whatever serves as its brain analog, commanding and coordinating a loosely coupled network of agents inhabiting laptops and mobile devices. And I suspect that, much like ants, there would be multiple copies of the brain in case one or more were eradicated, perhaps even a distributed variant spread across sub-PFLOP devices (100 TFLOPS? We're at ~20 already in 2016 with the GTX 1080 and dp2a) that would be even harder to kill.
All this brings us to ~2020, a year I jokingly refer to as The Rodent Singularity. And if such a system doesn’t arise from the same sort of idiots who disable messaging on their mobile web site to force one to use their insipid messaging app, there are plenty of virus and botnet writers who will jump on the opportunity to do so. I’m somewhat surprised that some primitive variants of this haven’t already happened, but then, understanding machine learning is a very lucrative career path at the moment so perhaps there’s no real incentive to turn black hat yet.
And of course, Ray Kurzweil, upon seeing such a thing come into existence, will quickly redefine his predictions to insist that he nailed it back in 2005.
Tim Dettmers says
I quite agree with you. There will be specialized computer programs, or “agents”, that have human-like capabilities in their niche task, as these require much less computational resources. But for a true singularity we would need a human-like AI to teach other human-like AIs so that they quickly get better and better. If we had a lot of rodent-brain specialists we would not see a singularity, because these agents would not be able to learn from each other. They would be simple, intelligent tools for the specialized task at hand.
To create memories that can be used across domains you first need to be able to understand information across domains. For a singularity to happen, an AI would at least need to be able to understand a large portion of interdisciplinary domains and thus would need computational power similar to that of our brain.
If you could teach an octopus how to do math, and how to teach others to do math, it would still be unable to match our abilities. Even if such an octopus understood the theory of everything physically and mathematically, it would not be capable of much intelligent action per se; for example, it would still be unable to create a machine which protects it from whales and sharks. For that it would also need knowledge and understanding of engineering, craftsmanship, metallurgy, and chemistry, and since such an endeavor would be hopeless alone, the octopus would also need an understanding of politics, justice, morality, leadership, and so forth.
This thought experiment is not so different for machines. If an intelligent AI hacked a car factory to build new AIs, it would still need to understand everything listed above to succeed in its plans. A robot that builds cars is good for just that, and any adjustment requires intelligence in many domains. Hacking some 3D printers will get an AI only so far. Unless an AI is fully independent and can grow by itself, we have not really reached the singularity.
Even if an AI took over the internet and spread everywhere, it would be unable to improve itself without human aid. It could read the internet, maybe understand large portions of it, but what then? It could influence people on the internet to do its bidding, but what are 100, 1,000, or even 10,000 people going to do? It takes quite some computational resources to convince people to do something radical. Probably the AI would be so smart that you could not distinguish it from a human, so when it messages you or talks to you, you will not be able to tell whether it is an AI or not, but you will be quite sure that it is just another lunatic.
Being born with Einstein's brain has a probability of approximately one in a billion. But even if you had the brain of Einstein, you would be incapable of changing the world to any large extent. I do not think anything below human intelligence will ever be more than a mere tool for us. I think for a true singularity we would need enormous computational power, probably much more than a human brain is capable of. And even then there are many limitations.
So how can an octopus-like intelligence lead to a singularity?
test says
Hey there, You’ve done a fantastic job. I’ll certainly digg
it and personally recommend to my friends. I am confident they’ll be benefited from this site.
Tim Dettmers says
Thank you!
Eduardo says
Hi, I am starting a new research project and would like to know if there is any chance or trend for improving performance through model parallelization?
Tim Dettmers says
Not right now. One option might be to use in-cache model parallelism with 8 Tesla P100s, but this is a very special application which will only work for recurrent nets. In general, data parallelism is all that you really need.
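For readers wondering what the difference looks like in practice, here is a minimal, framework-free sketch of synchronous data parallelism: every worker holds a full copy of the model, computes gradients on its own shard of the batch, and the gradients are averaged before a single shared update. The linear model, shard count, and learning rate below are my own toy choices for illustration, not anything from this post.

    import numpy as np

    def shard_gradient(X, y, w):
        # gradient of the mean squared error of a linear model on one shard
        residual = X @ w - y
        return X.T @ residual / len(y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 8))              # full batch: 64 examples, 8 features
    true_w = rng.normal(size=8)
    y = X @ true_w + 0.01 * rng.normal(size=64)

    w = np.zeros(8)                           # every "device" shares this model copy
    n_workers = 4
    for step in range(200):
        # each worker computes a gradient on its own slice of the batch
        grads = [shard_gradient(Xs, ys, w)
                 for Xs, ys in zip(np.array_split(X, n_workers),
                                   np.array_split(y, n_workers))]
        # all-reduce: average the gradients, then apply one shared update
        w -= 0.1 * np.mean(grads, axis=0)

Model parallelism, by contrast, would split the parameters themselves across devices, which is why it only pays off for models whose parameters do not fit or run efficiently on a single device.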
Mark Gubrud says
Tim, thanks for this salient contribution to an important debate. Most human-level AGI naysayers are just denialists, I think; but you have laid out arguments in detail.
However, it seems to me that if one were to take your method for estimating the brain’s “FLOPS” and apply it to modern digital computers, rather than just quoting the measured performance, one would write a series of mini-articles on the complexity of algorithms embedded in the gate-level architecture, the intricacies of device physics, quantum mechanics and so on, concluding that the “FLOPS” of the machine was actually many orders of magnitude higher than the measured performance. If one approached this as a scientist armed with microscopes and mass specs but little or only fragmentary knowledge of computer science, one might say it was “arrogant” of anyone to assume that the lower-level physical structure of devices could be ignored or modeled by some simple Boolean functions. If you knew that you were dealing with a binary machine, you could say that its behavior, at least to those aspects we care about, was insensitive to details of its physical structure; but if we were talking about an analog computer or something like a retina chip, that would no longer be true.
I think one has to break up “information processing” in the brain into two large baskets: one is essentially electrical activity on the network as it exists now, which must account for immediate behavior, and the other is learning, which modifies that structure over time. Most of the interior complexity of things that individual neurons and local networks are doing besides summing and firing relate to the latter basket, while it still seems likely that activity in the former basket cannot be much more complex than the time series of axon firing events, defined to ms resolution. In fact, it seems that even the activity in the learning basket must be sufficiently determined by the firing time series.
Of course, you need the internal mechanisms for learning to occur, otherwise no mind. But it isn’t clear that the basic algorithms, whatever they are, are really as complex as neuroscience reads out their implementations to be. Birds are still much more complex structures than airplanes, but airplanes fly pretty well. Or at least, birds are more complex if we arrogantly assume we can ignore the microstructure of aluminum and so on. Reduce your description of a bird to its aerodynamics and suddenly you are looking at something a lot simpler.
A benchmark I like to cite of where we are in the development of AI is that state-of-the-art systems now do things like visual object recognition and processing sound to natural language text, each of which apparently keeps a macroscopic fraction of the neocortex occupied at any given time. You say that deep learning vision systems only “label” objects but don’t have semantic understanding of them, but the latter surely involves more than just the visual areas of cortex and, conversely, once an AI can “label” (identify) an object it can access database information about it – including information that is predictive of whether it’s good to eat or might eat you. As you are aware, this business of prediction or narrative generation is now the frontier for AI.
I think your method is a correct rebuttal to those who suggest that “brain uploading” and simulation is a likely route to human-level AGI, but we knew that already without much doubt. AI takes inspiration from neuroscience, but it is a branch of engineering which looks for efficient solutions using the technology that is available. For example, we don’t know how to make self-tuning transistors, but we can make self-tuning transistor circuits and self-tuning numerical neural net models. Rather than comparing structures, I think we should compare performance on both narrow and more general tasks. This gets us back to the real world, where we can see that AI is indeed already outperforming and displacing humans in the workforce on a massive scale.
Tim Dettmers says
Thank you, Mark, for your lengthy comment. It is well thought out and adds some good points to the discussion. I agree with you on some points and disagree with you on others. Unfortunately, I do not have time to respond in detail, but really appreciate the view you have on the topic — thanks!
Mohamad Zeina says
This is very thought-provoking. It has not only made me re-evaluate my thoughts on the Singularity, but also on the brain in general.
Regarding your statement “It was also shown that the number of neurons in each animal’s brain is almost exactly the amount which it can sustain through feeding” – do you have a specific reference for this? I would love to include that idea in an essay but I can’t find any source for it, I’d appreciate any help finding the original study.
Tim Dettmers says
This talk sums it up pretty neatly.
Carlos Perez says
The brain evolved through an inefficient, time-consuming process known as evolution. One cannot make the argument that it is anywhere near optimal in its functioning.
Deep learning networks continue to evolve in that the number of artificial nodes used continues to decrease for the same recognition problem. The Inception networks that Google has implemented have pruned the number of nodes by around 20 times. The required numeric precision has gone down to 12 bits; in fact, there is some work on nodes with binary weights.
We still don't understand why these networks work at all; however, there is a fundamental mechanism to learning that is independent of the actual construction of the hardware.
AlphaGo has exhibited super-human capability that no human can beat. It is actually terrifying that it is able to bootstrap new knowledge by playing against itself. Although most real-world problems have imperfect information, the kind of intuition that humans use to navigate complex problems does not seem to be out of reach of current technologies.
I therefore think that the predictions here are a bit too conservative.
Tim Dettmers says
Evolution picks up on genes which increase fitness by less than 1% and optimizes them until a new niche forms, which results in a new branch of species. If you think about it, all animals alive today stem from a single organism and have been optimized by evolution. Even today evolution is a highly active process: fish get smaller due to global warming, and humans and other animals have bacterial species in their guts which are optimized for each individual's gut environment. A large fraction of these bacteria evolve so rapidly that after a few months they could be classified as a new species.
If you look at mathematical optimization, evolutionary strategies can yield better results than gradient-based methods on hard, non-differentiable problems, but they are currently not used for deep learning because they are too slow.
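To make the term concrete, here is a minimal sketch of the simplest such method, a (1+1) evolution strategy on a toy objective; the objective function, step size, and iteration count are my own illustrative choices and not part of the original argument.

    import numpy as np

    def objective(x):
        # toy fitness: a shifted sphere function (lower is better)
        return np.sum((x - 3.0) ** 2)

    rng = np.random.default_rng(0)
    x = rng.normal(size=10)        # current candidate solution
    sigma = 0.5                    # mutation step size
    best = objective(x)

    for generation in range(2000):
        child = x + sigma * rng.normal(size=x.shape)   # mutate the parent
        score = objective(child)
        if score < best:                               # selection: keep the child only if it improves
            x, best = child, score

Note that no gradient is ever computed; the search only needs function evaluations, which is why such strategies can handle non-differentiable problems but become slow once the parameter count reaches into the millions.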
Evolution can be best modelled with game theory. If you think that evolution is nowhere near optimal, can you give some examples which show that it is not an optimal strategy under game theory?
GoogLeNet trades parameters for depth, and it is no surprise that it does better. If you translate the “depth” in deep learning to the depth in the human brain, you will find that the depth in the brain is much larger. So much larger, in fact, that you would need to bound it by time to get any sensible number. So I do not think that there is any real trend which shows that we can do more with fewer units. This is a coincidence where we apply some special model to some special data set. Our data sets are very, very small right now, and on such data sets you can expect to require fewer parameters as the dataset gets more and more optimized. I do not see that this will hold true for very large data sets.
Currently we cannot leverage the reduction in precision since no hardware can execute it efficiently. The new GPUs from NVIDIA will support 16-bit computations, which will be a big step. We have yet to see bit-wise neural networks on hardware that perform well; emulation on current hardware is just too slow to be beneficial right now.
AlphaGo learned to play one game. Life itself can be thought to be composed of thousands and thousands of sub-games. I do not see how AlphaGo can generalize in an efficient, unsupervised fashion.
Valentino Zocca says
AlphaGo beat humans at one task, albeit a complex one, but just one. Computers have been able to beat humans for a long time, for example in terms of computational speed, but the range of their activity, though expanding, is still minuscule with respect to what humans can do. We may soon have cars that drive better than humans do, and machines that, in general, perform better than humans in specific tasks, but that is a far cry from having machines with a “general purpose” intelligence, which is what evolution, over millions of years, was able to come up with.
Valentino Zocca says
Tim, this is one of the best posts I have read in a while, and I like how you combed through several arguments to make your points very clear. I wrote a post on the topic of singularity and took the liberty of quoting this post in mine. If you want to look at my post, here it is: http://tinyletter.com/vzocca/letters/the-singularity-that-will-never-happen
MK says
The flaw with the aircraft analogy is that we came up with a much simpler solution because, and only because, we agreed to sacrifice many other qualities, such as safety or robustness.
Pavel Surmenok says
>>An average neuron in the cerebellum has about 25000 million synapses
Is it 25 billion synapses per neuron?
Tim Dettmers says
That was a typo, thanks for pointing that out!
Philippe says
I have a background mostly in computer science and neuroscience. I have read Mr. Kurzweil's book The Singularity Is Near, and I always was, and still am, fascinated by the brain. I have to say that I would like you to be right; I would like to think that we are very far from attaining this singularity, but in my opinion, we are awfully close.
First, there is a lot of complexity in the brain that isn't really needed and is just there because the brain evolved from biological life. This complexity is superfluous to intelligence. In the end, the atomic element of intelligence is the neuron, whatever its substrate is. It can be made of software, or it can be made of cells that need to break down sugar to function. Neurons could even be made of wood and gears; it doesn't matter. What matters is that they can sum up action potentials from afferent neurons and send a signal to some others. There are roughly two big differences between how the brain functions and how we try to mimic it using software and neural networks: 1. not every neuron can make any other fire, since each neuron is sensitive only to certain neurotransmitters, and 2. the brain is highly compartmentalized; it has evolved structures that are connected only in a certain way to other structures, and the way they are connected isn't efficient and is the product of evolution. Those aren't things we should “try” to reproduce.
You spoke about back-propagation. Back-propagation is a clever trick that we can use because we use computers; the neural network itself doesn't do the math that is needed to reassign better weights to its neurons, the program does. In the brain, something probably much less efficient happens. Before being “trained”, a motivated child's brain will try, using a lot of energy, to find a way to connect its neurons so that the desired effect is achieved (I am not sure exactly how it happens, but certain proteins guide the growth of axons during various stages of development, and that might be part of the answer). The brain evolved reward pathways so that it can get feedback from the external world. When newly formed networks of neurons correspond to how a problem must be solved in the real world, the person will somehow know, socially or otherwise, that the goal has been achieved, and the acknowledgment of that achieved goal will trigger the release of, say, dopamine, which in a way tells the neurons that this particular arrangement of dendrites and the “weights” between them is working and should stay that way. Much less efficient than back-prop, but it works without a meta layer on top of the neural network that does the back-prop. I think, but this is personal, that back-propagation is something that the brain, or any neural network, must not do in order to have proper consciousness, because it happens at a level over which the neural network really has no control.
There are also a lot of neurons that aren't in the neocortex, but as far as intelligence is concerned, they really don't matter that much. They are mostly hardwired to respond in predefined ways so that our organs work well and homeostasis is maintained. The neurons contained in these parts of the brain should be subtracted from the total if we use the total number of neurons to estimate how long it is going to take to get to human-level intelligence using neural networks.
The brain frankly is complex, but it really isn't that complex. It's about as complex as, you know, the behavior of an ant colony. The colony is complex, and the ants are at least of the level of complexity of a neuron, but it is from the fact that the individual ant is hardwired to behave a certain way toward its fellow ants that the complexity of the colony arises. There are a few basic modules in the brain that are hardwired to be motivated to do certain things. Even if we were to create the most insanely “smart” neural network, it wouldn't take over the world, because it assigns no value to doing so. The baby human wants to drink milk and eat; it naturally wants to try to walk! It naturally “wants” things. That's the key. The rest isn't complicated. The neurons fine-tune their connections so that the human can achieve what he or she is programmed to want to do. The brain is only neurons connected to each other, and we can do that in software already. The only part we have yet to figure out is the “dumb” part of the human brain, the part that creates our motivation to do all kinds of things that only have value for creatures that have evolved in a social environment.
In 15-20 years, we will get to a point where we are going to say: ah, if we had known what we know today about how the brain is arranged, we could have recreated it in software in 2015. Literally, the technology is now advanced enough; it's just that we lack some basic insight into why the brain's parts are connected the way they are and which parts of the neocortex motivate other parts so that they obey the will of the being. In fact, the technology might even be too advanced already. Once we figure out enough about the brain to know how to recreate it in software, we will have to find a way to slow the software neurons down, because they will be operating at a speed that is not compatible with moving a body of a certain size through space (that is, if we want this newly engineered brain to have a body and be able to move).
Sorry for the somewhat messy text. If something doesn't make sense I can elaborate.
Tim Dettmers says
Thank you for your detailed comment. The complexity of the brain either might be needed or it might not be needed — I currently do not see evidence in either direction (there are no intelligent algorithms yet, and the most “intelligent” algorithms require ever deeper and larger architectures to run well). So I think this is more a belief than something one can discuss reasonably.
Backpropagation in real neurons was shown to function similarly to backpropagation in artificial neurons, but the problem is that signals do not seem to propagate beyond a single neuron. If we could train our algorithms like children, we would waste a lot of energy, but we would really have general intelligence which slowly grows. I think we need to find a trade-off in between these extremes — current algorithms are just not the solution.
It was shown that changes in the anterior cerebellum cause the largest changes in non-verbal intelligence during the brain development of young adults. So I would say that this brain region is critical for intelligence at least in some way.
I think brains are quite a bit more complicated than ant colonies. Maybe less complex than I described in this blog post? I do not know.
I am quite unsure about the future and I am uneasy making predictions which are not founded on data. I know how fast computers grow and I can estimate how fast the brain is. If you find better data or evidence for your reasoning, please let me know and I would be happy to change my conclusions accordingly.
Mike says
Human intelligence and artificial intelligence are not the same and should not be considered equal. General AI will not be similar to human intelligence, it cannot be. The brain of an AI entity will not be the same as the human brain just because they have been conceived for completely different tasks; one is to operate and control a living organism with biological needs, the other is to specifically perform intellectual tasks better than a human brain.
The human brain does many things that an AI brain will not have to do, like all the conscious and unconscious body processes and movements and that is computational power that the AI brain will not require.
Right now AI is much more efficient in doing things that the human brain can do rather poorly like mathematical calculations, geometry and data analysis. Right now AI is everywhere and systems like Watson prove that AI is much better and getting better by the day, at gathering and analyzing data and transforming it into information, than the human brain.
The point of General AI or the singularity is not to have self aware computers that can walk and pass as human beings. The point of general AI is to develop an “artificial” intelligence, that can learn, communicate and solve intellectual problems and tasks better than the human brain and that is happening already.
Do not expect Blade Runner type androids any time soon, with emotions and longing for more time as living beings, but general AI is in its formative years already and its growth is exponential, and that is what the singularity is about.
Tim Dettmers says
This is very true and I would not argue otherwise. But if we look at Watson: it is quite intelligent already, yet it failed to yield good results in the medical field, which is quite disappointing and shows that we need more than that to do well on these kinds of tasks. For important problems like natural language processing, it can be expected that artificial intelligence will need to come quite close to human reasoning.
However, we have no exact measure of how fast our algorithms will need to develop to do amazing feats; comparison with the brain is useful to find a good bound on what we can expect if we need full human intelligence. So for hard tasks you could still expect that we need less time than I give here, but it will not be much less. If you will, I just give a good approximation of how long it will take to achieve good performance on complex tasks.
Friedrich der Große says
This looks like an excellent blog which touched upon many points about the biological complexity of the human brain and its evolution.
However, I wish to point out something that many others here in the responses pointed out already.
Firstly, you state that it is arrogant to presume “we don’t need all that computation power”. By the same token, I could also state that it would be arrogant to presume that we do.
Evolution does not always pick the most efficient means to accomplish a task. It picks “whatever works” given a deep set of constraints and how they interact with natural selection. And in many cases, Evolution will go down a track it cannot reverse, but must build on top of.
I will categorically state (albeit as conjecture) that evolution did not, and probably could not, have chosen the most efficient path leading to the evolution of our brains as we know them today.
Also, you stated that we do not have a full understanding of how our brains do what they do today. This is true. But I will also state that we may not need a full understanding anyway.
Why not just use the principles of Evolution, augmented with some smart decisions, to achieve the General AI that we seek?
The NEAT algorithm (NeuroEvolution of Augmenting Topologies) is one small example of what is possible. Already with the many, many approaches to neural nets today, we create structures we do not understand but which do the work we intended them to do.
Creating more and more sophisticated means of evolving solutions to these very difficult problems is probably the way we need to go to achieve general artificial intelligence.
Tim Dettmers says
Thank you — this is solid criticism. I admit that some of my opinions are a bit extreme and even arrogant. However, I still think that the evolved brain is the best indicator of what is needed for human intelligence. If we rule out the brain as a basis for comparison, we do not have much else against which to compare our continuously improving algorithms; other constructs are just too theoretical to have merit in an argument.
Tony says
Hey Tim, I have been wondering about this topic for years. It's really good to hear another person's point of view. Kurzweil is great, but it's really good to hear thoughts from someone who builds deep learning systems himself.
What are your thoughts on quantum computing? I know Google and NASA have started looking into it, and it could potentially disrupt our views on computation cost. Because the quantum computing unit is so physically small, could it also affect the bandwidth concern you brought up earlier? I don't think fiber optic cables of such small lengths would cost anything.
Thanks,
Tony
Tim Dettmers says
The problem with quantum computing is that only some special problems can be solved with it in an efficient manner; for most other problems they are just as slow as normal computers (where bits = qubits). Because only some special parts of deep learning can be accelerated with quantum computers (first and foremost convolutions), they will not represent a significant source of speedups in the future (you need to speed up all parts to gain the exponential advantage). The best approach would be to use them in an integrated manner with normal computers, but it will be very hard to achieve good performance on such hybrid systems because the mode of computation is just very different from classical computers.
Tony says
Gotcha. I didn't know if they could be used for more intensive systems (convolutional deep belief networks). I agree with you that integrating classical and quantum computers is a nightmare. It's hard enough to get them to work in lab conditions; an actual consumer product would be far in the future.
Mark Gubrud says
I’ll wager that quantum computers can’t speed up convolutions any more than optical computers can, and optical implementation is more likely to be physically attainable.
William says
Honestly, I haven't done my due diligence of reading the entire post in detail, but one thing that caught my eye and concerns me is the following: is there any reason why you're operating under the assumption that in order to reach the “singularity” (a concept that is itself poorly defined), the computing power needs to be on par with that of the human brain?
Moreover, as far as I understand, we do not currently completely understand the brain’s computational model. On the more philosophical note: just because we took a certain path in evolution does not mean this is the only path to intelligence. Perhaps we could develop systems more efficient than the brain (i.e. requiring less computational power) that would still pass for truly intelligent systems.
Tim Dettmers says
In animals, and especially in mammals, one of the strongest constraints on intelligence is caloric intake: the number of neurons is strongly correlated with how many nutrients an animal can digest (recently it has been argued that, instead of calories, carbohydrates alone drive this trend, because the brain can only really work well on glucose). Because we see this in almost every species, it is a very strong evolutionary constraint, and from this you have to conclude that energy is the main problem in computation (this is true for modern high performance computing as well) and that the brain evolved first and foremost in the direction of energy and computational efficiency.
If this is true — and all the important upcoming problems in HPC are related to energy and computational efficiency (network and memory bandwidth) — then the human brain is the main measuring stick of computational efficiency.
The human brain is inefficient insofar as it has functions which would not be required by an intelligent agent (e.g. movement might be solved in a simpler manner). So indeed, we will see super-human artificially intelligent agents rather soon, but these agents will not perform better than humans on all levels. I do think that a singularity, or in my picture a runaway effect of intelligent machines designing more intelligent machines, requires all or most intellectual capacities a human can possess.
To be more direct, I think other systems which are much more efficient than the brain are very much possible, but we do not even have any concept of how that would look, or how to solve the details at a physical level. Because I cannot compare computation to phantoms which do not exist, I use the brain here, just because it is the best thing that we can compare this to and which actually exists. It is just the closest you can get to a scientific argument; other comparisons are just philosophy.
William says
Thanks for the response. I think what follows in this comment is essentially echoing the comments of many others.
It seems that, as far as the human brain (and not only the human brain) is concerned, nature chose to tackle complexity with quantity (this is a well-known phenomenon in statistical physics, by the way). Perhaps a few tens of billions of neurons and all the computational power they provide (and the energy they require) are not necessary to execute a given task with a given efficiency, but merely sufficient. For instance, to classify the 10 handwritten digits 0, 1, …, 9 (a typical textbook example) one does not need the human brain. In fact, one doesn't even need a neural network; there are other, far more efficient ways (see the sketch below). Thus if we know a priori the task that we are trying to accomplish, as well as the constraints that we're working under, we may optimize with respect to those constraints to accomplish the task efficiently. Nature did not have this advantage.
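To make that point concrete, here is a minimal non-neural classifier for the ten handwritten digits: a nearest-centroid rule that averages the training images of each class and assigns a test image to the class with the closest mean. The dataset loader and the accuracy figure are only illustrative; this is a sketch of the idea, not a benchmark from the post.

    import numpy as np
    from sklearn.datasets import load_digits

    X, y = load_digits(return_X_y=True)          # 8x8 grayscale digits, flattened to 64 features
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    train, test = idx[:1200], idx[1200:]

    # one mean image ("centroid") per digit class
    centroids = np.stack([X[train][y[train] == d].mean(axis=0) for d in range(10)])

    # assign each test image to its nearest centroid
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
    accuracy = (dists.argmin(axis=1) == y[test]).mean()
    print(f"nearest-centroid accuracy: {accuracy:.2f}")   # roughly 0.9, with no neural network involved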
Tim Dettmers says
Nature also does this in some ways. You have a tiny brain region which is used just to recognize faces, for example (and other specialized patterns if you are an expert in certain areas, e.g. chess positions or bird and car names). Language is also parsed in a three-step process from grammar to semantics to semantic-syntactic integration, and each step in this process is crucial for language understanding; this process is very efficient at disentangling the complexity of language.
Computational resources in certain areas are often enhanced if you need to do a certain task frequently (taxi driver = large hippocampus), which shows that the efficiency in such specialized areas must already be high; otherwise we would see an increase in efficiency rather than an increase in capacity.
Often a single brain region will be used by most other brain regions to provide the computation for overlapping tasks (most prominently the left dorsolateral prefrontal cortex and the posterior cingulate cortex), which also hints at efficiency rather than quantity.
dirk bruere says
Just a minor point for the foreseeable future (which is about 5-10 years in electronics): there has been a proof of principle that we could build computers that operate at the Landauer limit, which is almost certainly far more efficient than the brain:
http://phys.org/news/2016-03-magnetic-chips-energy-efficiency.html
John D. says
What do you say in response to the criticism on reddit: https://www.reddit.com/r/MachineLearning/comments/3eriyg/the_brain_vs_deep_learning_part_i_computational/cti3m7t
Tim Dettmers says
I addressed this criticism in other comments which brought it up, please have a look. I am also planning an update to address these shortcomings, but currently I have little spare time to invest in it — so for now my responses in the comments section will have to suffice.
Tor Økland Barstad says
Thank you for this thorough and informative blogpost 🙂
I have nothing to say in regards to what you say about the complexity of the brain besides that it was interesting reading, but I will make the following short points:
* It seems very reasonable to hold open the possibility that breakthroughs in molecular manufacturing could make hardware very cheap: https://www.youtube.com/watch?v=cdKyf8fsH6w. Even if, for the sake of argument, we assume that we will reach limits in performance per volume within a decade or two, there could still be a radical breakthrough in regards to costs.
* You write: “On top of this, the statement that we will develop better algorithms than the brain uses is unfalsifiable. We can only prove it when we achieve it, we cannot disprove it. Thus it is a rather nonsensical statement that has little practical value.” While it may be unreasonable to assert that we will, it seems very clear to me that it's unreasonable to confidently assume that we won't.
* It may be that a system that’s heavily inspired by the brain, but also based on non-brain-inspired ideas and/or evolutionary methods, will be able to match humans in regards to generalised intelligence in problem solving / creativity / abstract reasoning / those kinds of things, long before it can do everything the brain can do, and without having the same “computational power” as the brain (to say that the brain is not optimised for this kind of intelligence is an understatement). Of course, this could also not be the case, but the possibility makes it less reasonable to assume that human-level AI necessarily is very far off.
* Just like it's hard to know how challenging it will be to reach human-level AI, it's challenging to know how long it will take to solve the problems of AI friendliness / goal alignment. Given what's at stake, and how little it would take to multiply current efforts manyfold, it would make sense to start working much more seriously on these questions now, as opposed to later. While some of these questions may be unrealistic to work on before we know more about the architectures that will reach human-level AI first, others are not. (For those interested in learning more about this, this book gives a good introduction and overview: http://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742).
Herbert says
When calculating the complexity of real-time brain emulation, your order-of-magnitude estimate seems to hinge entirely on the complexity of the “dendritic convolution” operation in Purkinje cells. Here I have to admit I don't fully understand your model: “100000×5” corresponds to the accumulation of 1e5 connections through 5 time steps, and “10×1000×5” represents the diffusion in 1000 dendrites with 10 branches each(?), also over 5 time steps. Why do you multiply these two numbers? Does your model calculate the diffusion through 1e4 dendritic branches for each of the 1e5 connected neurons separately, per neuron, per timestep?
If you made a simplified model that does the dendrite-bound computation once per timestep, for all the incoming connections combined, the order-1e5 operations for the connections would swamp the order-1e4 operations for dendritic computation; you would get an estimate that is mostly dependent on the number of synapses and the neuron firing rate, and indeed close to Kurzweil's estimate.
Tim Dettmers says
The algorithmic complexity of a discrete convolution is given by the product of all dimensions of the kernel and the propagator/feature map; more concretely, a linear convolution of a propagator of size N×M with a kernel of size n×m has complexity O(NMnm). Since we have a three-dimensional convolution whose dimensions I assume to be independent (it was pointed out that they are in fact not independent, so my model is a simplification), the complexity is given by the product of all tensor dimensions of the kernel and the propagator/feature map — that is what I am using in my model.
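As a quick sanity check on the arithmetic, here is a minimal sketch that just multiplies out the tensor dimensions quoted in the comment above; the numbers come from that comment, and everything else is illustrative.

    from math import prod

    def conv_ops(feature_map_dims, kernel_dims):
        # product of all tensor dimensions of the propagator/feature map and the kernel
        return prod(feature_map_dims) * prod(kernel_dims)

    # dimensions quoted above: 100000 x 5 synaptic inputs, 10 x 1000 x 5 dendritic kernel
    ops_per_neuron_per_timestep = conv_ops((100000, 5), (10, 1000, 5))
    print(f"{ops_per_neuron_per_timestep:.1e}")   # 2.5e+10 operations per neuron per timestep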
Currently, I do not know of any sources which give reliable estimates of dendritic computation per time slice — if you find any, please let me know. Please also take into account that dendrites also process information when they do not fire, just like neurons do, so dendritic firing rates (which as far as I know have not yet been estimated) do not give the complete picture here. But again, if you have more information, please let me know.
Herbert says
Ah, so you have a 1e6 feature map (synaptic input) and a convolution with a 1e5 kernel (dendritic branches), pooling into one output (axon)? That would not really reflect the dynamics, though, since one synapse is (for the duration of a time step at least) fixed at its position on one of the dendritic branches and it only directly influences the membrane potential there. Your justification for convolution seems to be that there is diffusion happening, but for modeling diffusion you need a kernel specific to your PDE (e.g. gaussian) which is independent of the locations or number of synapses. (And wouldn’t really work because of the connectivity structure between branches… One could probably do a time discrete model by multiplying the vector of the 1e5 dendrite compartment states with a (sparse) matrix reflecting connectivity among them.)
Tim Dettmers says
Yes, this is problematic in my model, but I cannot model the diffusion as a continuous model and also make it comparable to deep learning and digital computers at the same time (which operate in discrete time steps).
In a continuous model you do not need a specific kernel, as it is given by the solution to the differential equation. Here I simply assume that the solution to the differential equation, that is, the continuous convolution, is also a good fit when you discretize it. The other problems that you mention would also vanish in a continuous model. So why do I use a discrete model?
While I give up on a continuous model and thus introduce these problems, I gain the possibility to compare my model against deep learning and computers. It is ugly — but I do not think there is any better way to get deep learning, digital computers and neural computation into a single model. We simply cannot model continuous computation by discrete computation with accuracy (by definition), but I think my approximation is fair.
Mike says
Herbert, the complexity of a brain is a result of the complexity of the body under its supervision, and is not necessarily connected with brain functionality. So a controller for an artificial system could be a few orders of magnitude simpler while carrying the same functionality.
The main problem is knowing what a brain does in the body. Let's say you had a fully functioning artificial brain in your possession. Could you explain how that artificial brain would be used?
Randy Jones (@Randy_AKA_Caro) says
Just a quick note to say thanks and well done. You have laid out a wealth of information here with clarity and with patience. When I encounter people worried that a sudden outbreak of sentience is a real danger, this post is where I send them. Maybe I’m optimistic, but I think any interested lay person can take in enough here in an hour to understand just how uninformed many of the claims of the singularity folks are.
A whole similar post could be written on the “software” side of things. Even imagining we have the necessary hardware, current software engineering practice provides no help in priming such a system with a dynamic state that can acquire more consciousness.
I think the comments directly above about the gut and embodiment point to an important piece of the puzzle. We are all born needing help from others to eat—in learning to get that help we end up building mental models of other minds, and then in turn, of our own.
If you have an hour sometime, Gerry Sussman brings up some great related points here: http://www.infoq.com/presentations/We-Really-Dont-Know-How-To-Compute
Mike says
Randy, despite the good work, the author holds a misleading opinion that is not supported by factual knowledge.
Please read my posts, and remember my prediction: all projects aimed at modelling the “mind” will fail due to a lack of scientific basis.
Best regards, Mike
Michael Zeldich says
The problem is not in technical abilities; instead it is in understanding the functionality of a brain, and in a methodological problem. A methodology in which a brain is studied without taking into account the body it belongs to is misleading.
I predict that all current attempts to understand brain functionality as an isolated organ will fail.
Best regards, Mike Zeldich
Tim Dettmers says
This is very true. It has been shown that the neurons in the gut (more than in a cat's brain) and the gut microbiome are important for information processing in the brain. Eventually we will need to understand this as well, but I think trying to understand a few isolated parts of the brain is a good start.
Bart Simpson says
It's not about science, or math, or an understanding of fundamental physical principles…
It's all about ego, arrogance, ignorance, self-infatuation (like the article above), and of course the promise of big money…
Thousands of idiots all going… blah blah blah…
Tim Dettmers says
I think that you are right that no science, not even math or physics are totally objective — there is always some ideation at work (or in your words self-infatuation etc.), which wants to twist things into the worldview of a certain person or group of people (e.g. Einstein: “God does not play dice.”). However, I do not think this undermines the validity or usefulness of science, or this article. The theory of general relativity or the dangers of smoking were heavily rejected at one point, but adopted later, because science is reinterpreted with each iteration of newly arriving evidence. The evidence in this article is laid out open for criticism and scrutiny — and indeed this article has been criticised and scrutinized.
The purpose of this article is not to say that everything is as I describe it here — I believe that under the current evidence one cannot fully say that everything works as I described it here, but you also cannot really say that it does not work like this in general. This article provides new ideas which help to combine neuroscience and deep learning into a coherent framework which is based on moderate (but not strong) evidence. Although this approach has its flaws I am very positive that this article is quite useful for many people and researchers.
Omar says
Hey Tim,
This was a great piece. I wanted to ask for some advice. I am currently a self-taught software engineer, and have been working for a little over two years. I am going to be pursuing a Master's in CS, and am interested in focusing on AI. The struggle I am having is not knowing whether I like the idea of AI versus liking the actual work of it. I know I really want to work with some kind of augmented intelligence. The work Quid is doing is pretty awesome, and if I can help in building Jarvis (from Iron Man), that would be pretty damn awesome.
I know we are a long way from that, but the advice I wanted to ask you is whether I should pursue a master's right now, or wait a little and tinker on my own for a bit. I want to go back to school to be in an environment of learning, to be surrounded by like-minded people, and to spend my entire time enhancing my knowledge and skills (instead of doing it part-time alongside work). I want to play with AI for a bit before I commit to doing a master's in CS with a focus on AI. Any advice on what material to go through, and on when I would feel comfortable knowing whether I want to pursue this or not? Also, what is the reality of working in AI? I hear it's a lot of math. I was told there is math in programming, but it's really more logical thinking than doing math theorems. Is working in AI really about algorithmic design? And if so, is algorithmic design all about math theorems? I guess what I am looking for is some more light to be shed on the actual work of AI rather than the hype that is filling the industry.
Thanks man, I appreciate it.
Tim Dettmers says
It is difficult to give you concrete advice, Omar, because the right thing for you depends on many factors. However, I can give you a sneak peek of what it is like to work with “AI” algorithms.
If you were working on something like Quid, your job would be similar to that of a software engineer (the feeling is similar), but the workflow is a bit different. You mainly spend time cleaning data (80%+) and in discussions/meetings/presentations (10%), while only a small part is dedicated to algorithms (and most often deep learning, or “AI-like” algorithms, will not be used here). If you want to get a feel for this kind of work, I would suggest you sign up at http://www.kaggle.com and enter some competitions there (make sure to read the forums to get starter code). You only need a Master of Science degree for this (best would be a degree in data science).
For something like Jarvis, you will need a PhD in deep learning, and you will work primarily on research in a private lab or at an academic institution. Regarding mathematical ability, you can read this answer on Quora. The work in research is highly competitive and you are expected to make scientific contributions regularly; if you do not succeed, you eventually have to leave academia. To prevail, 60-70 hour weeks are not uncommon, and in the end only a few (< 5%) make it and become successful researchers who can spearhead endeavours like developing a Jarvis-like AI. The usual work is to pursue ideas which could become significant scientific contributions, most of which will fail. So to prevail at that you really need passion for this kind of work.
Hope this gives some insights into these branches of work.
Abelard Lindsey says
Nice paper, Tim. It's part of the reason why I do not expect AI in the foreseeable future.
You nicely describe the hierarchy of different memory and processing mechanisms of the brain. This will be very difficult to model with software.
I am not a neurobiologist. But I know enough to realize that brains are very different from semiconductor electronics. One thing about brains is that they are structurally dynamic. The dendrites continuously remodel themselves (while you are sleeping). Another thing is that there are 206 different chemical types of synapses. Semiconductor electronics does not capture this dynamism at all.
One other reason why we will not get AI anytime soon is the software. There is no Moore’s Law progression in software. This “deep learning” stuff (which I’m trying to teach myself by using Python language) represents the only significant software innovation since the development of the high level programming languages such as “C” in the late 1960’s. Software seems to have these discontinuous jumps of innovation followed by long periods of stagnation. Yeah, hardware will get to brain equivalent in about 50 years. It will take centuries to get there with the software.
The other issue is semiconductor fabrication technology (something I do have experience in) of deposition, patterning, etch will reach its limits likely at the 5nm, maybe 3nm, level in about 10 years. A new fabrication technology based on molecular self-assembly and growth will be necessary to breach this limit (and reduce the cost of the fabrication facilities themselves). This technology will likely be developed and will take us down to the molecular level which will be an absolutely hard limit. Depending on the size of the molecules used (bio-molecules are quite large), this could be anywhere from 3 nanometers down to half a nanometer. The issue then becomes scaling up the structure to the size of a human brain. This is enough work to keep all of us busy for another 50 years.
Instead of obsessing over sentient AI, it is much more fruitful (and profitable) to focus on improving techniques such as machine vision, decision making, and motion control necessary for useful robotics and improved automation.
J R says
3nm… Do you really know how small 3nm is? A synaptic vesicle has a diameter of about 40nm, and the active zone of a synapse has a diameter of about 300nm. Also remember that transistors are several orders of magnitude faster than the operations in a chemical synapse.
Abelard Lindsey says
Yes, but can we really duplicate all of these functions of the brain using semiconductor electronics, even those at 3nm design rule? The chip would have to be scaled up to be the size of a human brain. No one is proposing 3-D scaling on this level in the foreseeable future.
Even if we get the hardware to make a brain-equivalent computer in the next 50 years, the software is going to take 2-3 centuries at least. There is not and never has been any “Moore’s Law” progression in software. Even most software/AI people think this.
http://www.overcomingbias.com/2012/08/ai-progress-estimate.html
J R says
You don’t have to build the whole brain in one chip, you can use multiple chips and multiple machines. Also, Samsung is already using multiple layers of transistors in their new flash memory.
Software cannot be used as an argument against human level AI, because the brain doesn’t have software in the traditional sense, there is no comparison. If the brain can be said to be “programmed”, then it is “programmed” by experience. We have plenty of real or virtual experience to offer to any human level AI, and we also have a lot of expertise on teaching other humans which may be applicable to human level AI.
Tim Dettmers says
The problem with multiple chips/machines is bandwidth, which currently dominates computing performance. 3D memory will help for the next few years, but after that the bandwidth problem remains, and it is unclear how we will overcome the next hurdles after 3D memory.
The brain is programmed by DNA; different DNA yields differently programmed brains.
J R says
An axon's bandwidth is not that great if you take the slow AP firing rate into account; I think its bandwidth per area is comparable to a normal Ethernet cable. I'm not sure what you mean by the next hurdle, but if we can reach 5nm, I think that should be sufficient for building brain-like computation devices.
The genome does provide the initial configuration of the brain. Note though that the genome is not that large; it can fit onto one DVD, and we have a lot of software that is bigger than this. And only a small part of the genome is for the brain, and a tiny percentage of that is human specific.
Tim Dettmers says
Do you really think you can compare the bandwidth of neurons with the bandwidth of computer chips and computer networks? In computers, the main cost stems from moving data rather than from computing on it; smaller transistors will not help much to alleviate this problem, as RAM does not scale as well with smaller transistors, and network interconnects do not really profit from smaller transistors either.
I agree with you, that computer code can be more complex than DNA code. But it was not my intention to compare DNA to computer code, just to point out that the brain is configured in a certain way even before experience.
J R says
You may have trouble matching the brain's bandwidth if you continue to use the von Neumann architecture, I'm not sure. It would need some detailed analysis, taking into account the brain's connectome (it favors local connections over global connections, just like our computers/networks) and network protocols (our interconnects are designed to meet 1000-page specifications; you could probably speed them up quite a bit if you simplify the protocol to carry APs and lose some error correction checks).
But that's not what I'm looking at. With a back-of-the-envelope, first-principles estimate, I'm looking at what we can achieve using near-term semiconductor industrial capability if we design our device to be like the brain, and it looks to me like we're pretty close to nature's capability.
Tim Dettmers says
I think we both agree that we disagree. I suggest that you write a blog post similar to this one where you derive your own estimate. This way you can show me that I am wrong about what I wrote.
J R says
Let’s agree to disagree then. I don’t think I know enough to write a blog entry, and I’m not sure how useful it would be. Human level AI won’t arrive early or late because of one blog post, it may however arrive early because people spend time working on it, which is what I should be doing.
Tim Dettmers says
Well said and I agree that this blog post is rather irrelevant for progress on human level AI. As you said, the best we can really do is to actively try to bring about human level AI — I will also try to do my part in that. Cheers!
Mike says
Tim, that is not true, nor necessary. One has to design a device with brain-like functionality, not a replica of a brain.
Actually, today's science does not provide a basis for such development, and there is a methodological problem: the functions of a brain cannot be understood without taking into account the functions of the body as the medium between the environment and the brain.
Tim Dettmers says
I agree. But while we do not have to design a brain-like device to attain brain-like functionality, a brain-like interpretation of our current devices could be helpful to gain insights into how we can improve them — mimicry is often simpler than creativity.
Mike says
Tim, let's say that you, or anybody else, find a functional model of a brain on the table. How will that model be used if its functionality is unknown?
Tim Dettmers says
I think if you put down a full model of the brain, not much would happen right now. We do not know how to integrate the brain's model with common data science problems like object recognition, and it would take a long time to get there even if we had a whole model of the brain.
But I think over time, when these two fields intersect more and more, we will have a better picture of how to make use of such information about the brain. I think it is important at this time to say that understanding the brain could be useful for deep learning. I do not think totally ignoring the brain is wise, but I also do not think following the brain 1:1 is wise. I think we can learn something from the brain, because it obviously gets a lot of things right, and this is my whole point.
Mike says
Tim, so you agree that the results from investment in modeling the brain are next to zero.
However, the situation is not hopeless. I do have an understanding of what a brain does, how the behavior of a living creature is determined, and how to approach the design of an artificial subjective system on a factual basis.
The problem is the absence of sufficient funding and a team for accomplishing that task.
Subjective experience, which is the basis for determining behavior, can only belong to a subject. So we are talking about the design of an artificial living person.
That is the only possible way to have a super-intelligent artificial system capable of working by itself, without any further programming.
We could discuss that opportunity, if you wish, here: Skype – Subjective1 (New-York), or szeldich(at)g..il(dot)com
Robert Bynum says
Language is an artificial intelligence. You use far, far fewer bits to transmit intelligent concepts and build new meanings from experience. You do not have to transmit every single synaptic state to another person for them to understand your thoughts.
Tim Dettmers says
Language consists of symbols, in written or spoken form, which are able to create a thought in your brain that is similar to the thought that I had in my brain, but most of the information is not contained in the word but in our mental representation. The word is just a cue that induces the thought.
If I say the word cat, then the word itself contains only a few bits of information, but we humans have very rich associations with that word because we have a deep understanding of what cats are, what they do, what they desire, what they do when they are angry, what they do when they enjoy themselves etcetera.
A cat expert's mental representation might fill hundreds of pages of unique information when put on paper, while a child that learns and understands the word cat for the first time has a very limited mental representation of what a cat really is. So words can neither be associated with a computational cost at the neurological level, nor do they themselves signify intelligence.
If we want machines to understand language as we do, words are not enough; we need a full mental representation of the world and common knowledge — thus real language understanding is already strong AI.
jms90h5 says
One of the best pieces I’ve read in quite a while! Thank you so much for taking the time to share your thoughts.
As I was reading through your discussion on the physiology of the brain, the thought that kept coming back to me was how well that relates to our current understanding of complex adaptive systems and emergent behavior. I’m very much looking forward to your future installments and would offer a suggestion that perhaps in your third installment you present some discussion on emergent behavior. From my perspective the basic operation of a system, whether biologic or synthetic, isn’t where the “magic” happens. Certainly those underlying mechanisms are important to understand, but IMO the threshold of “intelligence”, (and potentially consciousness, although one does not imply the other), will only be crossed as an emergent behavior. Your article makes many excellent justifications pointing to an extremely high level of complexity required for something like consciousness to appear.
Claudio says
Hi,
If all you have written is correct, or close to correct, then how can you explain something like this? -> http://www.rifters.com/crawl/?p=6116
Tim Dettmers says
There is little evidence to reason about this case, because such cases are rare, but I will have a go at it anyway.
First of all, there are two important facts: (1) Patients with hydrocephalus often start out with brains that have normal structure (often the structure of a developing child), so everything is already in place; (2) patients with hydrocephalus still possess most of their neurons, which are situated mostly at the surface of the brain. So hydrocephalus mostly destroys the connections between brain regions rather than the neurons (numerous neurons die too, but not as severely as the connections between neurons, which lie mostly beneath the surface of the brain).
When hydrocephalus expands and slowly squishes everything to the edges, many connections are destroyed, but the overall connectivity within a brain region remains; thus functioning is only impaired rather than destroyed. When the hydrocephalus expands slowly, the brain has quite some time to adapt. If it is slow enough, a human might survive this and still function relatively okay (as in the 75 IQ case), but significant impairment besides a low IQ should exist.
However, to make it possible to have an IQ of 130 while having most of your brain wiped out by hydrocephalus, you need the right genes, which enable you to develop extensive connections between brain regions. Dyslexics often have genes that produce extensive connections between brain regions, but fewer connections within a brain region; for autistic people this is generally the other way around (this is also why savantism is more common in autistic people). However, these connectivity patterns are not global.
Einstein's brain, for example, showed strong connectivity between brain regions, while some brain regions important for his thought in mathematics and physics additionally showed strong local connectivity. For example, Einstein often thought in "muscular" terms when thinking about physics, because he had highly increased density in the brain region responsible for tongue and lips, which was tightly connected to his left dorsolateral prefrontal cortex, the brain region most important for cognitive reasoning.
Similarly, if a person had genes for strong connectivity between brain regions, then important functional pathways could stay connected even with extreme hydrocephalus, thus normal social functioning would be possible. But even then I guess a high IQ could only manifest if strong local connections are made in brain regions important for cognition. In the 130 IQ individual with math degree, this might have been similar to Einstein, some regions that are useful for doing math and cognition had increased local connectivity.
Charles B. says
Congratulations for this excellent blog post. It’s indeed one of the best I’ve ever read about the topic!
You raise a lot of very good points and give a lot of thought material.
In my opinion, the most interesting one, still in the shadows of the general public debate about AI, is the one you raise about meaning and about how we do not know what the brain really learns. We have only second-hand representations, filtered by our own way of seeing and describing in a coherent way how the world behaves. But, at the same time, we can't totally detach from our own inner representations and from what we believe and want to achieve with our research (Kuhn's brilliant research on the scientific method and its paradigms comes to mind).
This is, in my opinion, the main reason for which we can’t obtain a full strong AI but, at best, an instrument to replicate part of our own thought patterns, without being able to create totally new meaning.
It's very fascinating to dive deep into scientific research and discover how, the deeper we go, the less we really "know". And it's also very interesting to see how the human mind reasons with patterns and processes in building its model and representation of "reality".
I’d really like to know your opinions on the matter. Again, congratulations and thanks a lot for the great article you posted.
Tim Dettmers says
I also think we cannot replicate the brain in its details, but I also think that it is not necessary to do all of that. I think the intelligent algorithms that we will develop in the next decades will be quite significant on their own, even when they differ from the brain.
However, I also think insight into the brain’s algorithms and functionality will yield the insight that we need to come up with these good, but not brain-like algorithms.
Andreas Geldner says
Hi, I find your article fascinating – even if I am not an expert in your field. I have linked to it on my blog worldwidewirtschaft.blogspot.de and have tried a (German language) summary to encourage my readers to delve into the text, which despite its clarity needs a little bit of effort. I hope I caught your intentions with my summary. Any improvements are welcome.
Tim Dettmers says
This is excellent, thank you!
Takayuki Muranushi says
> Since linear convolution over two dimensions is the same as convolution over one dimension followed by convolution over the other dimension, we can also model this as a single 3×4 convolution operation.
Moreover, we can store the results of convolution over one dimension in memory, then perform the convolution over the other dimension. This will cost 3+4 computations, not 3×4, so the latter would be an overestimate. I believe at least we should separate the temporal dimension, since the number of temporal convolvers should be limited to the number of such memory spots. Such memory spots, I guess, biologically correspond to the places where "charged particles may linger for a while."
Then our estimation:
10000[synapses/neuron] × 5[soma temporal convolution] × 5[branches/dendrite] × 50[dendrite/synapse] × 5[dendrite temporal convolution] × 8.6e10[neurons/brain] × 200Hz = 1.075e21 flop/s
will be like this:
(5[soma temporal convolution] + 10000[synapses/neuron] × (5[dendrite temporal convolution] + 5[branches/dendrite] × 50[dendrite/synapse])) × 8.6e10[neurons/brain] × 200Hz ≒ 4.4e19 flop/s
I’m not sure where the temporal convolution terms should go, because I’m not familiar with the biological neurons.
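A quick numeric check of both totals (a minimal Python sketch; the per-neuron factors are simply copied from the two formulas above):

```python
# Back-of-the-envelope check of the two estimates quoted above.
synapses      = 10_000   # synapses per neuron
soma_temporal = 5        # soma temporal convolution
branches      = 5        # branches per dendrite
dendrites     = 50       # dendrites per synapse
dend_temporal = 5        # dendrite temporal convolution
neurons       = 8.6e10   # neurons per brain
rate_hz       = 200      # update rate

# Fully multiplicative estimate (the original one)
multiplicative = (synapses * soma_temporal * branches * dendrites
                  * dend_temporal * neurons * rate_hz)

# Separated estimate: store the one-dimensional result, then convolve the other dimension
separated = (soma_temporal
             + synapses * (dend_temporal + branches * dendrites)) * neurons * rate_hz

print(f"multiplicative: {multiplicative:.3e} flop/s")  # ~1.075e+21
print(f"separated:      {separated:.3e} flop/s")       # ~4.4e+19
```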
Thank you so much for this elaborate article. This is so interesting as a scientific article, and gives us a lot of hint towards how we can design and improve learning algorithms.
Best regards,
Daniel Smith says
> Since linear convolution over two dimensions is the same as convolution over one dimension followed by convolution over the other dimension, we can also model this as a single 3×4 convolution operation.
Only true if the convolution is linearly separable!
Tim Dettmers says
Thanks for this correction. I will look into the consequences and will update the math and argument.
Mike says
Comparing on the basis of computational power and complexity is interesting but ignores another dimension of the issue. AI and ML are misnamed because Turing machines aren't intelligent and don't learn. They are adaptive pattern recognition algorithms. Searle's Chinese Room and Goedel's Incompleteness Theorem suggest human minds are more than Turing machines. However fast or complex computers get, so long as they are Turing machines they will never be capable of understanding. I wonder if it would be possible to build a machine that doesn't just calculate that 2+2=4 but understands what it means. If it exists, whatever it is, it would be more than a Turing machine and something that might truly be comparable to a human mind.
Tim Dettmers says
Maybe I will add a short paragraph on this in an update, thank you!
Indeed, these might be hurdles which cannot be overcome. However, it might not be necessary to overcome them in order to have a useful intelligent agent. AI will have a significant impact on everything, that much we know for sure, even if AI is only a tool (like Google search) rather than an agent. And we do not have to wait until 2080 for that; big changes will probably come within the next two decades.
Arthur Breitman says
Godel’s Incompleteness theorem suggests nothing of the sort. There is no evidence that the human brain can decide the truth of indecidable propositions, nor is it clear what that would even mean. This is a modern god of the gaps in disguise.
Searl’s Chinese room thought experiment only demonstrates Searl’s inability to conceive that a complex system can has conscious thought. It’s pure crap, on par with the phlogiston and the “elan vital”.
Mike says
You express a lot of confidence for topics that remain undecided and the center of one of the most lively debates in philosophy and technology.
Doug says
Excuse me? Searle’s Chinese Room thought experiment is many things, but comparable to phlogiston, or “pure crap” it is not. I suppose someone who doesn’t know how to spell the man’s name and uses “can has” in public can’t be expected to know what he is talking about. As penance, you are to spend a year looking after toddlers. If at the end of that year, you still imagine that human behavior is entirely computational, there is no hope for you.
Daniel Waltrip says
Your comment contains several dozen words but somehow fails to allocate a single one towards defending your proposition that the Chinese room experiment sheds any light on consciousness or intelligence.
Rodrigo de Almeida says
As for Gödel's theorems, I'd say the first shows that the principles of Principia Mathematica (or any strong-enough system containing PA), along with some good assumptions, yield a formal system S in which it is possible to "construct" an undecidable true proposition G, "self-referential" in the sense that it states that it itself cannot be proved.
There is no possible conservative enlargement of S, say by adding G as an axiom, such that S will be both correct and complete.
The second theorem says that no such S can prove its own consistency. On the other hand, a "curious" fact is that by using transfinite induction (I'm being deliberately vague here) one can have a powerful-enough system S' to prove the consistency of S, assuming S' to be consistent. Now, while saying that "Goedel's Incompleteness Theorem suggest human minds are more than Turing machines" is indeed either wrong or too vague to have meaning, it is indeed possible to "step outside" S and prove its consistency on some rational basis, let's say. Alas, we have to assume that S' is consistent.
I find Tarski's way of doing all this particularly clear (although it is used to prove another very close result, the undefinability of truth theorem). The language/metalanguage distinction makes it more perspicuous where the machinery is being enriched, where the assumptions go, and so on.
Mark Gubrud says
The problem with the Chinese Room is that it can't exist. Let's assume per convention that it must pass the Turing test. It must be capable of responding convincingly in any conversation of any length. But this is impossible if it is faking intelligence. For any Room capable of responding convincingly to any conversation up to length n symbols, conversations of length n+1 will break it. However, the required size of a look-up table to handle any conversation up to length n is some godawful exponentially exploding function of n; if you think about the books in the Room, they must fill a Solar System to handle a five-minute conversation, and would collapse into a black hole before reaching the size required for an hour's worth of library. And of course, the poor soul condemned to fetch the response would require years to find even the first one, so we're really unlikely to fool the judges with this.
Now, it is argued that the Chinese Room need not be a look-up table but should actually compute its responses in some way. Actually, any computation that yields the same results as the table can be considered a compression of the table, and some suggest that the compactness of the compressed representation can be considered a measure of intelligence. But I would argue that there is only going to be one kind of computation that will yield a really efficient compression, and that is something isomorphic to an “actual artificial general intelligence” whatever that turns out to be. No way is Searle going to be able to do that computation by hand in real time (to say the least). But if it were being done by a suitably fast processing substrate… well, that would be your artificial brain.
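To put rough numbers on the look-up-table version of the Room (a minimal sketch; the vocabulary size and conversation length are illustrative assumptions, not values from the thought experiment itself):

```python
import math

vocab = 3000       # assumed working vocabulary of Chinese characters (illustrative)
n_symbols = 100    # a short exchange of roughly 100 symbols (illustrative)

# log10 of the number of distinct symbol sequences the table would have to cover
log_entries = n_symbols * math.log10(vocab)
print(f"A complete look-up table needs ~10^{log_entries:.0f} entries")  # ~10^348

# For scale: the observable universe contains roughly 10^80 atoms.
```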
calvin says
This is one of the best descriptions of the computational problems of intelligence and brain modeling I have ever seen. I work on machine consciousness and fret about these very things. The computational requirements for human level cognition do look enormous. But, I have three critiques of your conclusions… well, more hopes than critiques actually.
1) Brain size may be largely irrelevant to intelligence and consciousness. E.g., there are examples of high-functioning individuals with hydrocephalus who have much smaller than normal sized brains: https://www.newscientist.com/article/dn12301-man-with-tiny-brain-shocks-doctors/ The brain size necessary to achieve general intelligence may be substantially smaller than we think. An elephant's brain is over 3 times larger (both in volume and neurons), but we don't think of elephants as being more intelligent than humans. Bird brains are much smaller and different from human brains, and birds engage in a whole host of useful, intelligent behavior, as demonstrated by parrots and crows, including memory, pattern and object recognition, performance of complex tasks, and even some language. Hopefully, brain size, based on number of neurons, correlates poorly with intelligence.
2) How much of the brain is actually dedicated to what we could call the representational problems? It could be that we are not that intelligent at all, in terms of mass to cognitive power. And there is good evidence for this. The child prodigy has no more brain mass than an ordinary child, and only a bit more than a child with Down syndrome. But we know it's not the size of the brain of the child with Down syndrome that seems to matter; it's how the cells themselves are structured and behave. This is almost certainly true for the child prodigy. What matters about brains isn't volume per se, but cell structure and function (you cover this very well, btw). If we understood how the structure and interaction of neurons (and other brain cells) produces cognition, we may discover that intelligence could be achieved a whole lot more efficiently than how it's currently done in brains.
3) Even though we are approaching silicon and lithographic limits of processor manufacture, we are not necessarily constrained to those approaches. The brain is a bunch of molecules, proving there is no theoretical reason we could not construct other kinds of molecular machines that reproduce the cognitive functions of the brain that do not rely on our existing processor technologies. Other approaches may let us get further down the road of Moore’s law without the costs you suggest.
Otherwise, I think your analysis is correct… I just hope it's wrong.
Tim Dettmers says
1. Brain size may indeed not be too important. Einstein had a smaller brain than the average human, and Neanderthals had larger brains than we have. However, note that elephants and blue whales have larger brains, but not more brain cells and connections (brain size and the number of neurons/connections are only proportional within a species). I think the intelligence of birds is an important point you make; their brains are not only small but contain relatively few neurons/connections compared to their intelligence.
2. Connectivity is most important for intelligence. For example, connectivity to the left dorsolateral prefrontal cortex alone explains roughly 20% of intelligence. People with autism, who on average have lower than average IQ, have strong local connectivity but poor connectivity between brain regions; this is even more extreme in savants. This evidence would mean that it is much more important how we design our networks than how large they are. So yes, we might be able to improve upon the brain, but it is unknown by how much, and it makes no sense to derive an estimate. So I will leave it at that.
3. Indeed, there will be magnificent technology only a few decades down the road, but right now these systems are not practical and thus I cannot calculate estimates based on them, because I do not know how they perform on deep learning. I have to assume that exponential growth in computing means "we will use new technologies as they become available" and that new technologies will scale in a similarly exponential manner; this is crude, but it is the best approach that I know of for making estimates.
Juan says
Great article, I have saved it to read more carefully. Just a note that you may be aware of by now, but for possible future readers: the number of neurons in bird brains has been shown to be equivalent to the number of neurons in primates, especially in the forebrain: http://www.pnas.org/content/113/26/7255.full
Tim Dettmers says
Thanks for the link! Another interesting thing is that birds are also the only animals which can exceed the neuron densities of humans.
Brent Hatcher says
Since birds are descendants of dinosaurs, maybe that is one of the reasons they ruled the Earth for so long.
Tim Dettmers says
It could be a reason, but also note that the density of neurons is not that important if your energy intake is the limiting factor. Humans would be able to power more neurons without starving, but a baby's head cannot grow larger because then it would be impossible to give birth. With eggs, dinosaurs would not have had that problem. So if they had somehow invented fire and cooking, they might have been much smarter than us because (1) of their density of neurons, but also (2) in eggs the head size does not matter that much.
ML says
You mention: However, the direct data gathered by those few teams was enough to establish dendritic spikes as important information processing events. Due to the introduction of dendritic spikes into computational models of neurons, the complexity of a single neuron has become very similar to a convolutional net with two convolutional layers.
I am not sure I see where the exact equivalence is found. You say you relate the single parts to equivalent parts in a convolutional net, but I don’t see that. Can you help me pointing out where exactly that is done? Thanks!
Tim Dettmers says
The diffusion of particles (information) can accurately be modelled by convolution or by a system of nonlinear differential equations. Previous models only recognized such processes within a neuron (approximate differential equations followed by a non-linearity to yield a firing rate). If you take into account diffusion in dendrites, you receive two firing rates and equivalently two systems of nonlinear equations (or two convolutions). Usually nonlinear differential equations are chosen to model this, but we can replace these equations with convolution (the solution to circular diffusion from a source can be transformed into a convolution; the value of the solution does not change, only its representation) and receive a system which is much more like a convolutional net.
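To illustrate that equivalence, here is a minimal NumPy sketch (with made-up concentrations and an assumed diffusion constant) of one diffusion step written once as a finite-difference update of the diffusion equation and once as a convolution with a small kernel; both give the same values:

```python
import numpy as np

# Particle concentration along a 1-D stretch of dendrite (arbitrary toy values)
c = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0])
D = 0.2  # diffusion coefficient times time step (assumed, dimensionless)

# One explicit finite-difference step of dc/dt = D * d^2c/dx^2
laplacian = np.roll(c, 1) - 2 * c + np.roll(c, -1)
step_pde = c + D * laplacian

# The same step expressed as a convolution with a 3-tap kernel
kernel = np.array([D, 1 - 2 * D, D])
step_conv = np.convolve(c, kernel, mode="same")

print(np.allclose(step_pde, step_conv))  # True (boundaries agree here because the endpoints are zero)
```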
BitCuration (@bitcuration) says
The analogy of the bird and the aircraft is a good one and well known. I suppose you can say the same for a modern personal computer at the level of the metal, the physics of what constitutes the motherboard, the memory chips, or the cooling fans for that matter. But without understanding the von Neumann architecture, it really doesn't matter how much knowledge of that you gain when it comes to building another computer from scratch.
In that vein, this entire article is trying to explain how a motherboard works at the electron level. If the NN is the equivalent of the logic gates and the motherboard design, then what is really missing is the computer's OS and the von Neumann architecture.
Tim Dettmers says
I will write more about the bird and aircraft argument in an update.
As for electrons in semiconductors: They do not contain any information because their flow is just random (this can be modelled as a random walk). The flow of particles in dendrites is not just a random walk, but a random walk along a concentration gradient which is maintained in distinct patterns by the cell (this can be modelled by either convolution or equivalently partial differential equations).
You do not need partial differential equations to model the flow of electrons in semiconductors to make predictions about their flow. A model built on variables for the electrical conductivity of the material and time will be an excellent model for the flow of electrons in a semiconductor.
Thus, if I take into account the diffusion of electrons in semiconductors, it will add zero computation; however, it was shown that dendrites are not passive cable-like structures, but do meaningful computation that contributes significantly to information processing. There are dozens of papers about this now; just have a look at the Nature paper in my reference list if you have any doubt about this.
Christopher Nguyen says
Excellent article and set of arguments, Tim. I appreciate the concrete exploration from nearly first principles.
Now, this particular set of comments about electron flow in semiconductor is quite incorrect. The exact opposite is true: you can’t model semiconductor currents without considering both the drift and diffusion components, and it is precisely the concentration gradients in, e.g., depletion regions in p-n junctions (and it’s the junctions that make semiconductors interesting at all) that account for electron/hole flows. You may be thinking about conductors or reverse-biased junctions for which diffusion current is negligible.
However I do sympathize with the argument, if so implied, that there isn’t anything that automatically dismisses the validity of a too-detailed model, e.g., of “a motherboard at the electron level”.
Cheers.
Tim Dettmers says
Thank you. What you say is very true. I stumbled over my own feet in that argument. However, I think my main point stands, that the diffusion of electrons does not contain any information which is used in computation of digital hardware.
But I see your point though: If I say that diffusion in neurons is controlled information processing, where pumps and tempo-spatial integration produce certain output for certain input, you can also say that switching certain transistors according to certain input produces certain output. Indeed, dendrites are often thought to be akin to a transistor. However, one must not forget that the unit of information for a transistor is a bit, while the value in a neuron is analog (dendritic spikes) or multi-valued (neurotransmitters).
I will look at that argument again and will see if I can derive a separate meaningful estimate by viewing a dendrite as a transistor-like processor.
Christopher Nguyen says
Re digital vs analog and transistors: there’s also nothing that fundamentally prevents a transistor from being used in analog or MVL mode (other than desired precision); indeed they were primarily used in analog mode before digital, and indeed there are neuromorphic efforts using analog units. They are not at the 7nm node but rather at 1-2 orders of magnitude coarser, due to the precision-size trade-off, hence not (yet) computationally interesting at-scale. But that is going off on a tangent.
I think your arguments about the scaling challenges at the architectural level are reasonably sound (subject to more back-of-the-envelope calculations of my own) without your having to invalidate the possibility of a Si-based neuromorphic computing unit, if that's the concern.
Witali Dunin-Barkowski says
Witali Dunin-Barkowski 6:13 PM:
Your table in the section "Estimation of cerebellar input/output dimensions" yields a wonderful 10^11 for the number of neurons (correct, this is the number of granule cells) and 10^21 for the number of operations/sec. This gives 10^10 ops/sec per granule cell. Indeed, a granule cell can sometimes fire with a frequency of up to 500 imp/s, but very rarely. The exact number is not known, but the average firing frequency of granule cells is in the region of 10 imp/s. With 5 incoming and up to five hundred outgoing connections, these numbers will in no case result in
10^10 ops/sec. I would evaluate it at about (505 synapses) × (10 imp/s) ~ < 10^4 ops/sec. This is much more realistic, and can be attained in artificial systems.
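For readers who want to follow the arithmetic, here are the two per-granule-cell figures side by side (a minimal sketch; the numbers are the ones quoted in this comment):

```python
# Rate implied by the article's cerebellar table
total_ops = 1e21   # operations per second for the cerebellum
neurons   = 1e11   # granule cells
print(f"implied per granule cell: {total_ops / neurons:.1e} ops/s")  # 1.0e+10

# Counter-estimate from average firing rates
synapses = 505     # ~5 incoming plus ~500 outgoing connections
avg_rate = 10      # average firing rate in imp/s
print(f"firing-rate estimate:     {synapses * avg_rate:.1e} ops/s")  # ~5.1e+03
```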
Witali Dunin-Barkowski 7:40 PM+1:
"An average neuron in the cerebellum has about 25000 million synapses" – are you sure (misprint?)? If you count granule cells, they have ~500 synapses (almost all of them outgoing). Purkinje cells might have up to 200,000 synapses (many more incoming than outgoing).
Witali Dunin-Barkowski 8:22 PM+1:
"The neurotransmitter (2) binds to a G-protein which then produces a protein" – alas! You are not right.
In fact, almost all synapses have ionotropic and metabotropic receptors on the postsynaptic side. Transmitters bind with approximately equal chances to both types of receptors. Activation of ionotropic receptors leads to electrical excitation/inhibition interactions. Activation of metabotropic receptors starts signaling cascade processes. Both metabotropic and ionotropic processes run in parallel, although the metabotropic ones are much slower. Because of these inaccuracies in handling the facts of experimental neuroscience, I would not advise your readers to take seriously your conclusions regarding the possibility or impossibility of the singularity.
Tim Dettmers says
These are indeed some errors that I made, and I will fix them in the next days in an update. Thank you.
gigi says
As another commenter above pointed out, using general purpose processors (CPU or GPU) to mimic neurons is highly inefficient.
IBM already has a chip (IBM TrueNorth) with hardware neurons, and they claim it is 10,000 times more energy efficient than simulating them. Of course, their neurons could be very simplistic ones (without the convnet function, for example).
As deep learning becomes more important, I would expect more and more of it to be moved to hardware (as video encoding/decoding, for example, moved from software to hardware).
Tim Dettmers says
As I pointed out above, currently we can do nothing better than make estimates that make sense for the comparison with digital computers.
Neuromorphic chips as they are designed now are unlikely to be used for deep learning in the future. Currently, the IBM chip only reaches 15% error on MNIST, which is worse than linear methods. This also demonstrates that simple models of the brain are unlikely to work (such chips often use a version of the integrate-and-fire neuron, which also features in Ray Kurzweil's estimate).
Also see this post by Yann LeCun on neuromorphic computing.
Jayeson Lee-Steere says
In the same post you linked to, LeCun describes a conv-net implementation that would be as power efficient as TrueNorth at the same 28nm node. If I did my math right using your numbers for brain computational complexity, that is just a couple of orders of magnitude away from common supercomputer power levels. See also: http://www-micrel.deis.unibo.it/~conti/wp-content/uploads/2014/10/paper_hwce.pdf
I don't share your memory bandwidth concerns either. The high bandwidth requirements caused by the high connectivity in neural nets are localized; the biggest gift in bandwidth optimization problems I can recall. Certainly something custom chip designs could (and appear to) deal with.
I will also note, the brain’s level of computational activity is localized at any given time, and neither a cerebellum nor both halves of cortex are required to be considered to have human level intelligence.
Might I suggest that the article is a bit pessimistic – taking the worst possible computational cost of an extremely noisy biological system, then looking at running it on an extrapolation of current fault-free hardware that has the wrong architecture. I realize you consider it a lower bound, but it is a lower bound that assumes every operation is essential, all the time, and that its value can’t be implemented in some other way.
Tim Dettmers says
1. You can run convolution fast on this hardware, but a modern convolutional net will not run on it (otherwise somebody would be using it). Also do not forget that this hardware is more or less experimental, like memristor-based, analog, and quantum computers. Currently we cannot use them.
In the future we might use such components, and this might be the question mark I put in the supercomputer graph after the year 2020. Such technology will help to grow computational power and, to some extent, bandwidth.
2. Please read my other blog post about the parallelization of deep learning if you have any doubts. I developed deep learning systems which run on GPU clusters and optimized their runtime by quite a bit. Soon I will publish a new compression technique which will set the state of the art in model and data parallelism for convolutional networks. So when I say that bandwidth is the problem, you can trust me on that (or read my other blog posts if you don't trust me; read the comments too).
3. I do not think it is pessimistic to show that our current systems are close to the brain. We just need to keep doing what we do now, add in some hierarchical architectures, and wait for computers to get fast enough, and we will be in pretty good shape for general artificial intelligence! I would call that pretty optimistic!
Of course, by pessimistic you mean the growth in computational power. But here I present you not with opinion, but with data and analysis from experts. Just have a look at the references: three different HPC researchers who work on some of the fastest supercomputers in the world agree that the growth of supercomputing is in trouble. I would not call this pessimistic, but rather realistic.
J R says
The part comparing a single neuron to an ANN is very interesting; however, I'm not sure about the later part where the flops of the brain and a high performance supercomputer are compared directly. It seems to me the brain and the computer use completely different optimization strategies. It is true that the brain has a huge number of nodes that can do information processing, and it uses very little power, but at least from my limited understanding, it is also very inaccurate due to its analog nature, i.e. if you give a neuron the same input you don't always get the exact same output; this is why you have some random terms in the simulation. Computers, on the other hand, are very reliable: you always get the same output given the same input. We want it this way, and it has a high price in terms of complexity and power usage (for example, we need a relatively big voltage difference to differentiate 0 from 1, because we really don't want to mistake 1 for 0).
So a direct emulation of the brain using a computer is using the weakness of the computer (high power, low parallelism) to emulate the strength of the brain (low power, high parallelism); the result can't be good. In order to get better results, you either need to play to the computer's strength, which I think would mean better algorithms which rely on the accuracy of the computer, or you redesign the machine to be more like neurons. The former is already discussed in the blog post, and I like Yann LeCun's argument. For the latter, there are some non-von-Neumann architectures, such as memcomputing or neuromorphic chips, that may show promise. If you compare the size of a transistor (14nm in production, 5nm in 5 years) to the size of a neuron (4000nm diameter for the smallest one), we can pack a whole lot of transistors into the footprint of a neuron; the question is how we make use of those transistors.
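A crude footprint comparison under those numbers (a minimal sketch that treats the neuron as a flat disk and a transistor as a square of one feature size, ignoring wiring and real cell layouts):

```python
import math

neuron_diameter_nm = 4000                      # smallest neuron, as quoted above
neuron_area = math.pi * (neuron_diameter_nm / 2) ** 2

for label, feature_nm in [("14nm (in production)", 14), ("5nm (projected)", 5)]:
    # Very rough: assume one transistor per feature-size square
    transistors = neuron_area / feature_nm ** 2
    print(f"{label}: ~{transistors:.1e} transistors per neuron footprint")
# 14nm: ~6.4e+04, 5nm: ~5.0e+05
```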
Tim Dettmers says
These are all good arguments. With this blog post I tried to establish something which is practically useful, and because analog computers are not used widely or successfully, you cannot avoid the awkwardness of transforming analog computation into digital computation. I think my approach of mathematical modelling, and then using the known complexities of that mathematics on digital computers, is the fairest comparison that we can make as of now.
The problem with memristors and non-von-Neumann architectures is that no successful computers exist which make use of these ideas in a practical sense, and thus it is hard to include them in a practical framework; there is just no data I can base any estimates on. I will surely revise my estimates once such computers become available for practical computational tasks.
Jonathan says
I'd just like to point out that you are generalizing brain simulations to only one kind, the connectome-type mass brain simulation where we simply model physical connections and test that the simulation resembles the brain in homeostasis. There are lots of other smaller (and even bigger http://www.sciencemag.org/content/338/6111/1202 ) models that explicitly test functionality and do a lot for our understanding of actual brain function. In fact, one of the main stimuli for the criticism of the EU Brain Project was the shift of focus from these models to the larger simulations you speak of.
Tim Dettmers says
This is good criticism and I agree; I will update my blog post with this information when I find the time.
Pete Chapman says
Tim this is a fascinating read. Thank you for sharing. Biological systems are so mind blowingly more complex and larger scale than any conceivable digital system. We could run today’s Internet for a hundred years and transmit less data than the DNA in the cells of one human being.
JustLeavingAQuickComment says
The human genome is 3 billion base pairs long. Even if you count both strands and the ribosome, and do not compress the data, it's not that much data.
That’s the beauty of it.
In case you want actual estimates: actual genome [700MB], sequenced genome [200GB], compressed genome as variations from another one [4-125MB?].
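The uncompressed figure is easy to check (a minimal sketch of the arithmetic):

```python
base_pairs = 3e9        # human genome length
bits_per_base = 2       # A, C, G, T encode in 2 bits, uncompressed
size_mb = base_pairs * bits_per_base / 8 / 1e6
print(f"{size_mb:.0f} MB uncompressed")  # ~750 MB, consistent with the ~700MB figure above
```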
By the way, most of the genome is 'junk' anyway as far as we know; a lot of it consists of long-forgotten sequences that have been hanging in there for a long time doing pretty much nothing. Look up what transposons are!
As it was pointed out in the blog, biology is very efficient, even when it’s not.
Tim Dettmers says
You are right, our genome contains very compressed data. The interesting thing is, if you want to describe everything in and about a single human being you can fill probably terabytes of information, yet all of this rich expression came just from a compressed sequence of information (my partial genome from 23andMe is just 25MB) — I think this is one of the things which makes the entire biology of life so interesting and exciting!
Frank says
I used to work at Numenta (as a side note, all those “dreams” generated top-down by DBNs of one kind or another are something we used to do… 5 years ago, or at least we observed it, but didn’t pursue). I would say the #1 problem we had was scale, and there was no technology in existence to approach the scale of the brain. The second problem, IMHO, is that it is not yet entirely clear how the brain does what it does exactly. We have many pieces of the puzzle, but maybe not a complete picture yet (I haven’t checked the neuroscience literature lately though). I don’t know if it is true that brains would be just a souped up DBN, for example, although it seems that deserves to be explored. But some people will say that QM is involved in brain processing – I have no opinion myself as I am not a neuroscientist, but I’ve heard it. All in all, it seemed to me that at the time (5 years ago), there were still a lot of unknowns about the brain.
Anthus says
I have been a neuron firing randomly and intend to convert to firing actively, spreading this article in concert with all the other people tweeting, sharing, posting to comprise our hive mind. From one neuron to another, thanks for your hard work!
ptrman0x0 says
Where/what is the source of the claim that neurons use and modify DNA “programs”?
Tim Dettmers says
I forgot about that. I will add it in an update later today.
GMoney says
Interesting post. However, I think the case you make for one single neuron being equivalent to a whole convolution net is quite weak. Since this is quite important for your argument, it would help if you expanded on that more convincingly.
Tim Dettmers says
I relate the single parts to equivalent parts in a convolutional net, what more can I say? You mean justifying things like dendritic spikes with more evidence, or justifying my mathematical modelling in more detail?
I could cite more evidence for dendritic spikes, as for the mathematical modelling you will find more information about that in my blog post about convolution (the examples about convolution as diffusion of information is most relevant).
Adamant-ish says
Very interesting and thorough. I’m just a programmer so not properly qualified to respond but I had to say that your argument that the brain is well optimised doesn’t seem to hold water.
You reason that because there is resource pressure for brains to be small that means they must be fully optimised. That’s like building a program in a slow running language like javascript, provisioning the minimum possible machine size for it then concluding that it couldn’t be much smaller if written in C++. In fact, I wouldn’t be surprised if we discover the brain to be millions of times less efficient than our inventions in terms of excess system complexity. Energy consumption may turn out to be another matter though.
Evolution doesn’t necessarily move forward all the time or strategise its designs well. It has the benefit of time to look clever but in many ways it isn’t.
Before planes, many thought that we would need a system with the complexity and fluidity of a flapping wing to achieve agile flight. By coming to understand the underlying principles, we achieved nearly the same goals with much simpler technology.
Of course, intelligence is a far harder problem than flight, and it may be that the first effective AI will not be mostly human-designed but will need to build itself through an evolution-like process that nobody understands. But I tend to think we'll be able to step in at a few points and make some decisive tweaks.
I just don’t buy the defence of the brain’s magnificence. If the brain were well optimised for useful work:
1) It wouldn’t make the kind of constant mistakes in reasoning which mean that smart people can be hugely more effective than average ones. If the retort to this is that effectiveness != evolutionary fitness then that wins the argument against brains too.
2) We wouldn’t need to have fine-grained discussions about why current AI visual systems aren’t really better than human’s even though they could appear to be. You show that our brain is at least a million times bigger than a supercomputer so if it were optimised, this wouldn’t be debatable. We would have nothing that slightly resembles AI yet.
3) The cerebellum wouldn't require three quarters of the brain's computational power just for orchestrating muscle movements, considering we already have robots which do it on a simpler level.
4) It wouldn’t keep producing smart people who are mentally ill.
Is there something big I’m misunderstanding?
Tim Dettmers says
1) Google trained a convolutional net on many hundreds of millions of images and I believe it had an accuracy of about 20% (80% error); if you increase the possible knowledge and decision making, you increase the possibility for error exponentially. This is why ImageNet is so much harder than other data sets, and this is the same reason we humans make mistakes on a daily basis. It is the same as: the more you know, the more you don't know.
2) Arguably, we have nothing that resembles AI yet; our deep learning systems are currently able to recognize objects in images, which is a basic, non-cognitive (in the sense of "thinking") ability of the brain. It is true that we have systems which throw sentence creation from images into the mix, but arguably these systems do not have a real understanding of what language is.
3) We do not have robots with the same degrees of freedom that animals have: Try to get a robot to throw a ball in the same way as humans do — robots currently are not able to do this
4) Current views seem to suggest that genes which raise the probability of a certain mental illness are good as long as you do not have the disease. Or in other words: genes that make you schizophrenic, depressed, psychopathic, or psychotic increase your fitness (creativity, intelligence, sociability) as long as you do not have schizophrenia, depression, psychopathy, or delusions in the first place. Of course this is much more complex; for example, in shamanistic societies schizophrenia may increase your fitness dramatically because you easily become the religious authority due to your ability to receive "prophecies". This may be another reason why we have genes that cause schizophrenia; remember that most humans may have lived in such societies at one point.
Thus I do not think it is correct to say that nature does something wrong by producing mental illness. If you model gene inheritance distributions with a Markov chain that settles to probability distributions which produce optimal fitness according to rules governed by evolutionary game theory, you will probably see that genes that cause mental illness are necessary to maximize the fitness of the overall gene pool (there are certainly papers on this which you can look up); thus mental illness is inevitable under optimal conditions.
Does this make more sense now?
DamianReloaded says
Hi, Great article. Just popped in this discussion to note that CNNs are far from being equivalent in design to the visual cortex, so precision comparisons between the two are kinda off base (like comparing a bicycle and a bus, both will get you there but the bicycle needs to go back and forth for each passenger and it may break). It’s difficult to make a prognosis based on unknown discoveries, but I’d be willing to bet that in the next 10 years our basic (superficial) understanding of how biological neural nets process information will improve and will help developing new artificial ones that will close the gap between biological and artificial intelligence.
Tim Dettmers says
I did not try to say that they are equivalent, but rather that neurons are closer to CNNs than they are to perceptrons. Thinking that a CNN simply consists of neurons is not biologically sensible. It is better to say that a few layers of a CNN constitute one neuron. If you take this view, then you realize how far we are from matching the performance of humans in machines.
Tim Dettmers says
For more criticism on this post, please read the article above.
In response to the article: I would not say that my claims are too much of an exaggeration; rather, they present a different angle on the problem. If I look at the biological evidence and how it can be modelled mathematically, my model emerges as a consequence of it.
I do not think, as you state, that these are fixed rules we have to adhere to. I think there is a lot of wiggle room for algorithms and architectures. My main point is really that deep learning is on the right track, and if we continue what we are doing in deep learning, we will eventually get to general artificial intelligence; it will just take a bit more time than we thought.
Another main point I make is that we do not have to replicate biology, because we already do. This is almost the opposite of what you claim I say in this article.