Will this article ever end?

30 October 2020

The obvious answer is yes: there are only so many atoms in the universe to make paper and ink out of, so of course it will have to end at some point. But questions of this ilk, like so many apparently simple questions, can spark an interesting discussion. The more interesting question is this:

Question: Do all yes/no questions have an answer, and if so, is there an algorithm that can compute it just from the statement itself?

The first half of the question is clearly false, think about paradoxes like ‘Is the sentence “This sentence is false” true?’. For further discussion, please go talk to your favourite philosopher (mine is Nick Bostrom). The second half is something we will be looking into in more detail; more precisely, how such a question led to the creation of modern computer science. To rephrase the previous question with a bit more formality and clarity:

Question: Given a yes/no statement in formal logic, does there exist an algorithm that can compute the yes/no answer?

Formal logic is the logic used by mathematicians and logisticians (see How can we tell that $2\neq 1$?). Further discussion is well outside the scope of this article, but it is just there to make sure that the question the algorithm would try to solve doesn’t produce paradoxes like the one above. This question, or one very similar but in German and itself written in the language of formal logic, was posed by one of the most famous mathematicians of the 20th century—David Hilbert. It is called the decision problem, or Entscheidungsproblem in German. We can clearly see that this is an important question. Almost any question in mathematics can be posed in terms of formal logic and mathematics can be used almost anywhere. So being able to show that there will always be an algorithm to compute the answer to any question would be of incredible use. It would mean that there would always be a clear method to solve any problem that can be reduced to mathematics.

Before we can dig our teeth into this problem though, we must first properly define what we mean by an algorithm. Since we live in a world built by, and upon, algorithms, we intuitively know that it is something that takes in some information, does something to it, and then gives us some sort of ‘useful’ answer. This is a very nebulous definition, clearly, as I used the term ‘does something’. That and the idea of utility is a heavily contextual notion (if I’m stuck and dehydrated in a desert, a proof to the Riemann hypothesis isn’t going to be very useful to me). A slightly more formal definition of an algorithm in plain English would be the definition from Marshall H Stone and Donald Knuth:

“… a set of rules that precisely defines a sequence of operations such that each rule is effective and definite and such that the sequence terminates in a finite time.”

Or in layman’s terms, a finite sequence of well-defined operations. But as any mathematician knows, writing something in plain English is usually much easier than doing so in mathematics.

This is where Alan Turing comes into the picture. Turing was one of the greatest mathematicians of the 20th century—if not of all time—and was a key figure in the foundation of theoretical computer science and artificial intelligence; he is often called the father of both fields. Turing did a whole host of wonderful and amazing work in his short career. Unfortunately, despite the revolutionary work that he did both academically and in breaking Nazi codes during the second world war, his work was not truly recognised until long after his passing. This was, of course, partly to do with the secretive work of codebreaking, but also because he was gay at a time when homosexual acts were illegal. In 1952, he was convicted of ‘gross indecency’ and given the option between a prison sentence or chemical conversion therapy; the chemical injections he chose led to depression, and later, his premature death by suicide. Turing died at the age of only 41, and we can only wonder what astonishing work he would have done if the prejudices of the time hadn’t caused his premature death.

Turing’s first groundbreaking work was solving Hilbert’s problem and defining what was meant by an algorithm in formal mathematics using his elegant concept of the Turing machine. A definition and answer had been given a few months earlier by another mathematician called Alonzo Church—a pioneer in his own right and someone who Turing would later study under for his PhD. However, not only was Turing’s solution a greater conceptual leap than Church’s proof, but it was also a more intuitive and pragmatic idea than the heavy formal logic of Church’s approach, called lambda calculus. But enough beating around the bush, let us talk about the Turing machine.

The Turing machine and the Church–Turing thesis

Turing’s idea of how to formalise an algorithm in a mathematical framework was to design a machine that would be similar to the concepts proposed by Ada Lovelace and Charles Babbage, two more pioneering mathematicians and founders of the field of computer science. This was a mathematical construction composed of an infinite tape, a box (normally called a head) that could read what’s written on the tape and write new things onto the tape, a motor to move the head left or right, a set of numbers/symbols to read, and a processor full of instructions. The idea is that the input for the algorithm is written on the tape, the Turing machine then reads this and performs a sequence of simple actions dictated by the processor, perhaps using the tape to temporarily store information, and then writes the output of the algorithm on the tape. We’ll look at an explicit example of a Turing machine working in a minute, but first all the components of a Turing machine fit and work together like so:

The processor can have (finitely) many different configurations of its internal settings: we shall call these states.
The Turing machine itself doesn’t have any memory. But we can encode what it previously read by switching into one of many different states.
The Turing machine is fully deterministic and well-defined. Given the current state and the symbol the head is reading, we know exactly what the machine will do.
In a given state, the Turing machine can do any combination of the following: read the tape, write to the tape, move the head left or right, and (depending on what was read from the tape) change the state that the processor is in.
There has to be at least one state in the processor that will stop the Turing machine and signal the end of the computation: this is usually called a halt state for the Turing machine. This is typically done when the head has printed the output of the algorithm.

A schematic representation of a Turing machine.

Turing’s great insight was not only the simplicity of his construction but the notion that perhaps any form of computation could be represented as a Turing machine. This was later expanded upon during his study under Church for his PhD, leading to a groundbreaking theory which is now the cornerstone of computer science and a branch of logic and mathematics called computability theory. The Church–Turing thesis states that any computable function (a function computable by a machine or any formal algorithm that solves a problem) can be translated into an equivalent computation model involving a Turing machine. There is still some ambiguity about what exactly a computable function is and if every possible computation is represented as a computable function. But as these questions are still being discussed by academics all over the world, we will sidestep this thorny issue.

The Church–Turing thesis quickly leads to the conclusion that since our brain can compute mathematics and all algorithms that have been developed thus far can be computed by our brains, every algorithm ever computed is equivalent to a Turing machine! Because of this fact, the modern definition of an algorithm is based around Turing machines.

Now that we have a mathematical definition of a general algorithm, can we answer Hilbert’s question? Yes, we can, but first let’s get a little more comfortable with Turing machines.

Example Turing machine: how to build a dam in three easy steps

Like most things in mathematics, the best way to get some familiarity with an idea is to see a worked through example. The toy machine I will use to illustrate Turing machines is one from a rather famous family of Turing machines created by Tibor Radó called busy beavers s (Radó also
discovered a general solution to Plateau’s problem, see The birth of the Fields medal). They are very simple machines that are fed an infinite tape full of 0s and their aim in life is to write as many 1s as possible on the tape; but it can’t just print 1s forever, it has to stop eventually. The example we will look at is the 3-state busy beaver.

The processor for a 3-state busy beaver, which terminates as soon as the head reads a 1 while the processor is in state A.

Starting in state A, the Turing machine reads a 0, so it erases this and writes a 1 in its place. Then it moves the head so that it is positioned over the next symbol, and the processor transitions into state B. The next symbol the head reads is again 0, but this time because it’s in a different state the Turing machine leaves this 0 as it is, moves the head again, and transitions into state C. The machine just keeps repeating this process until it enters its halt state.

This 3-state busy beaver halts after 14 steps and writes six 1s on the tape. You can’t do more without more states. The coloured squares show the head position and the state of the processor.

Although we don’t see it here, where the Turing machine derives its main source of power is its ability to switch state depending on the information on the tape and encoded information from its previous state. If you are familiar with a bit of programming, this is very similar to jumping to a memory location in assembly language or conditional statements and loops that are found in higher-level languages like Python. If you want to write one out for yourself in full—I recommend you don’t. Even a simplified binary number checker takes more than a page of explanation of explicit states and even longer if you wanted to show it all working—I know from painful experience. But if you’re feeling very proactive, try to play out the beaver from above: you could use it on a row of coins to represent the tape with $\text{tails}=0$ and $\text{heads}=1$. Extra credit for making a 4-state beaver.

The Turing machine and the decision problem or: how I learned to stop worrying and love the halting problem

Now, finally, back to Hilbert’s Entscheidungsproblem. How did Turing solve it? Turing’s own paper and full technical proof isn’t too hard to read or understand. I will, however, provide a truncated version of the proof and try to provide an alternate method while keeping the core intuition.

In the paper, Turing proposes a method to encode a Turing machine, the states, and inputted symbols, into what he coins a description number. Similar to what we do to with text and Unicode—uniquely representing some text with a binary number. This description number describes the machine, and therefore the computation that it represents, completely! He then asks the obvious next question.

Question: Can one compute the description number of a Turing machine from a yes/no statement in formal logic?

If we could, then we would show that any yes/no statement has at least one corresponding Turing machine (which Turing showed was equivalent to a computable algorithm) and so the answer to Hilbert’s question would be yes! Unfortunately, this is not the case; so, the answer to Hilbert’s question is a resounding no.

Turing was able to show this by using an idea from another Titan in the world of mathematics and logic, Kurt Gödel. Gödel is famous for a lot of important results from the fields of logic, mathematics, and philosophy, and is an interesting person in his own right. However, the key Turing used—and therefore the thing we will be looking at—is Gödel’s incompleteness theorem. This is a wonderfully mind-bending theorem that deserves its own article, and I highly recommend you go read into it (try Gödel’s Proof by E Nagel and JR Newman), but the important piece of logic that Turing adapted was the idea of using something being self-referential to reach a contradiction. We can see this in the paradox I mentioned at the start: ‘Is the sentence “This sentence is false” true?’. Gödel’s idea was very similar, but to formalise it in the realm of formal logic was his great achievement.

To use this insight, Turing proposed a Turing machine, and therefore a computable algorithm, that I’m sure anyone who has ever written code wishes they had in their back pocket: a Turing machine to see if another Turing machine, with a given input, will halt. Or in other words, a computable algorithm which can tell you whether an algorithm, together with an input, would eventually stop and give you an answer, or whether it would just keep cranking through calculations forever.

As an aside, an algorithm that could do this would not only be a boon for any budding programmer wishing to debug their code—but also an incredibly useful tool in pure mathematics. You could solve Goldbach’s conjecture by applying this mystical Turing machine to a simple program to search for the first counterexample! (What would the program look like?) Unfortunately, like most things that are too good to be true, this doesn’t exist. Let’s see why!

The halt-deducing machine $H$.

Let’s suppose we have this amazing halt-deducing Turing machine and call it $H$ (for $H$alt). If Hilbert’s decision problem has a positive answer, this machine will exist—as asking if a Turing machine halts can be written in formal logic—so let’s not get bogged down in the details of how it would hypothetically work. But, on a high level, if we give $H$ a description of a Turing machine and its input, $H$ will either halt and print ‘halts’ onto the tape if the inputted Turing machine halts: or it will halt and print ‘loop’ onto the tape if the inputted Turing machine loops forever. In particular, note that $H$ itself will always halt.

We can then modify this machine a little bit. To $H$ we attach an inverter which will flip its output: if $H$ outputs ‘halts’, the inverter will go into a loop and not halt; if $H$ outputs ‘loop’, the inverter will halt and output a picture of a scorpion. As mentioned before, $H$ needs two inputs: the Turing machine itself and its input. We wish to check if $H$ terminates when we apply it to itself, so we need to put $H$ into both of its input slots. To do this, we can attach a device to copy the input into it, which we can call a cloner. We can call this new Turing machine $H^*$:

The modified Turing machine, $H^*$.

Now that we have $H^*$, and the idea from Gödel of self-application, all we need to do is input $H^*$ into itself. When we input $H^*$ we have two possibilities. If $H^*$ does halt, then the inverter means that $H^*$ doesn’t halt. But if $H^*$ doesn’t halt, then $H^*$ does halt. We have reached a contradiction!

But the only thing we assumed is that $H$, and therefore $H^*$, exists. But as the existence of $H$ leads to a contradiction, we must conclude that a Turing machine like $H$ cannot exist. However, as a Turing machine is a mathematical object and asking if it will halt can be phrased in terms of formal logic, a yes to Hilbert’s question would mean that $H$ does exist! Therefore Hilbert’s question has negative answer: not all yes/no questions can be answered by an algorithm.

This argument is the same as the one Turing presented in his original paper, albeit simplified: there he shows that the description number of $H$ and $H^*$ can’t be calculated as the computation would go on forever. This is what we call an uncomputable number as no finite algorithm could ever compute it.

Typically, the numbers we are use every day ($\pi$, $\mathrm{e}$, $5$, $3/4$, etc) only require a finite amount of information to represent them. This can be the number itself if it is an integer, or a quotient of integers, but some numbers with infinite non-repeating decimal expansions can also be represented with a finite amount of information, like $\pi$ and $\mathrm{e}$. This is because $\pi$ and $\mathrm{e}$ both have finite generating formulae that can produce every digit. In other words, if you give me a position in $\pi$ or $\mathrm{e}$, like the 1737th position, the formula can find the 1737th digit of the number after a finite number of steps. This isn’t always the case: if I have an infinitely long number whose digits are random, you wouldn’t expect to be able to compute its decimal expansion using a finite algorithm. These numbers are uncomputable as they cannot be computed or represented in any finite sense.

The legacy of Turing and his machines

The legacy of the Turing machine and the ideas Turing put forward during his lifetime are hard to quantify. It is always hard to know what the world would be like without certain people and their ideas in it. Just limiting Turing’s body of work down to his idea of the Turing machine, we can imagine that the world would perhaps be almost unrecognisable. The concept of computers and their programs that were pioneered by Ada Lovelace and Charles Babbage had been around for about 93 years before Turing’s own paper on the matter. Despite Church developing his lambda calculus to solve Hilbert’s problem, it is quite difficult to imagine whether the modern computer would be around today, and if so, what it would look like. Perhaps the best view on the importance of Turing’s work comes from a sentiment put forward by John von Neumann, another intellectual juggernaut of the 20th century, to Stan Frankel. Frankel, writing in a letter to Brian Randell, says,

“… [Von Neumann] firmly emphasised to me, and to others I am sure, that the fundamental conception is owing to Turing—insofar as not anticipated by Babbage, Lovelace and others.”

This was in reference to Von Neumann architecture, the modern framework of the stored program computer, which almost every computing device in the world, past 1945, can draw its direct heritage from. That, and the wider body of work credited to Turing, is why we call him the father of theoretical computer science. There is much more to talk about relating to Turing and his work, but as I said at the beginning, this article must end.

Emilio McAllister Fognini

Emilio is a master’s student at UCL and is a first order approximation of an analysis student. He may be a finite state Turing machine, but he hasn’t halted or gone into a loop yet; we’ll just have to wait and see…

In conversation with Christina Pagel
Ellen Jolley asks for (independent) Sage advice
How can we tell that 2≠1?
Maynard manages to prove that 2≠1 in less space than it took Bertrand Russell to prove that 1+1=2
Colouring for mindfulness
Florian Bouyer explains the beautiful geometry behind his mathematical colouring-in designs.
The birth of the Fields medal
Who is behind the so-called Nobel prize of mathematics? Gerda Grase investigates.
(Not) squaring the circle
Sam Hartburn attempts the impossible
Fun with Markov chains
I love Markov is I love Markov chains love me.

The Turing machine and the Church–Turing thesis

Example Turing machine: how to build a dam in three easy steps

The Turing machine and the decision problem or: how I learned to stop worrying and love the halting problem

The legacy of Turing and his machines

More from Chalkdust