How can we tell that 2≠1?

What if I told you that $2=1$? You may say I’m wrong. OK, well, what if I proved it to you? We can both agree that there’s an $x$ and a $y$ where $x = y$. From there, multiply, subtract, factorise, divide, substitute, divide again, and you get $2 = 1$.

Still not happy? You’re probably unconvinced by my so-called ‘proof’. OK, I say, and, after a minute, hand you a sheet of paper with the steps hastily scrawled on it. It’s better, but you’re still displeased. This time, I’ve made clear what steps I’m taking from $x = y$ to $2 = 1$. However, you point out, I don’t justify any of these steps. Nodding slowly, I take my time and write out a very nice, orderly proof, complete with justifications for each line:
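One standard reconstruction of that orderly proof, following exactly the steps named earlier (multiply, subtract, factorise, divide, substitute, divide again), runs:

\begin{align*}
1.\quad & x = y && \text{given}\\
2.\quad & x^2 = xy && \text{multiply both sides by } x\\
3.\quad & x^2 - y^2 = xy - y^2 && \text{subtract } y^2 \text{ from both sides}\\
4.\quad & (x+y)(x-y) = y(x-y) && \text{factorise both sides}\\
5.\quad & x + y = y && \text{divide both sides by } x-y\\
6.\quad & y + y = y && \text{substitute } x = y\\
7.\quad & 2 = 1 && \text{divide both sides by } y
\end{align*}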

At this point, you spot my mistake: in going from line 4 to 5, I have divided both sides by $x-y$. But we began with the assumption that $x = y$, meaning that $x-y = 0$, and dividing by 0 is not defined! This means that lines 5 to 7 are operating on nonexistent values and are therefore meaningless.

You’re happy with yourself, but something is bothering you. To reveal my mistake, you asked me to be more precise. But why stop here? Because you found what you were looking for? That’s not how truth is found.

My proof, like all proofs, is a path from one statement to another, just as we may follow the path from $ax^2 + bx + c = 0$ to $x = \big(-b \pm \sqrt{b^2-4ac}\big)/(2a)$, or from the existence of rectangles to the transitivity of parallelism (see below). Along this path I have made several intermediate statements, and linked them together with justifications. You found that one of my links is flawed, and you wonder how we know that the others aren’t also wrong. You begin to question foundational principles, wondering, for instance, why we’re even allowed to do the same thing to both sides of an equation.

Euclidean geometry: For the unfamiliar—Euclidean geometry (standard geometry on a flat surface) rests on 5 assumptions, one of which (the parallel postulate) has historically been regarded as ugly. In attempting to eliminate the parallel postulate, mathematicians have found numerous other statements that are equivalent to it, such as that a rectangle exists or that parallelism is transitive.

You keep digging deeper and deeper, questioning more and more of what you previously took to be correct. Eventually, you come across a piece of mathematics that is perhaps the most beautiful and elegant thing you’ve ever laid your eyes upon: natural deduction.

Natural deduction

Natural deduction is one result of asking for deeper and deeper justification when doing maths. A system of natural deduction is a set of very simple, almost irrefutable rules that act to formalise our intuition about what is definitely true.

Reiteration (R): if $P$, then $P$.

These rules include such things as reiteration, which simply allows us to repeat ourselves. Precisely, reiteration says that if you know that a statement $P$ is true, then you can conclude that $P$ is true. This is hardly controversial.

Conjunction introduction ($\land$I): if $P$ and $Q$ then $P\land Q$.

There are two rules for the natural idea of ‘and’. First is the so-called conjunction introduction rule, stating that if you know that $P$ and $Q$ are both true, then you may conclude $P \land Q$, pronounced ‘$P$ and $Q$’. On the other side, we have conjunction elimination, stating that if you know that $P \land Q$ is true, then you may conclude $P$ and also may conclude $Q$.

Conjunction elimination ($\land$E): if $P\land Q$, then $P$ and $Q$.

These rules don’t feel like they do much besides swapping out ‘and’ for ‘$\land$’; however, doing so is important for formality and precision.

Disjunction introduction ($\lor$I): if $P$, then $P\lor Q$.

Things start to get tricky with the rules codifying ‘or’. The first, disjunction introduction, tells us that if $P$ is true, then you may conclude $P \lor Q$, pronounced ‘$P$ or $Q$’: if I am hungry, then it’s also true that I’m either hungry or tired.

Disjunction elimination ($\lor$E): if $P\lor Q$ and from $P$ we can prove $X$ and from $Q$ we can prove $X$, then $X$.

The second rule, disjunction elimination, states that if $P \lor Q$ is true, and from $P$ you can prove $X$, and from $Q$ you can prove $X$, then you may conclude $X$. More colloquially, if either $P$ or $Q$ is true, and in both cases $X$ is true, too, then $X$ is true. For example, if I’m either well-rested or well-fed, and being well-rested makes me happy, and being well-fed makes me happy, then I must be happy.

Implication introduction ($\Rightarrow$I): if from $P$ we can prove $Q$, then $P\Rightarrow Q$.

Then come the rules regarding implication. We have implication introduction, stating that if from $P$ we can prove $Q$, then we may conclude $P \Rightarrow Q$, pronounced ‘$P$ implies $Q$’. And we have implication elimination (also known as modus ponens), which states that if $P \Rightarrow Q$ is true and $P$ is true, then we can conclude $Q$. If the weather being rainy implies that I am cosy, and the weather is rainy, then I must be cosy.

Implication elimination ($\Rightarrow$E): if $P\Rightarrow Q$ and $P$, then $Q$.

Finally, we come to the most arcane rules, those handling negation. The negation of $P$ is written $\neg P$ and pronounced ‘not $P$’. Before talking about the $\neg P$ rules, however, we must first introduce a new symbol: $\bot$ (pronounced ‘bottom’), which represents impossibility or contradiction. We can then introduce bottom introduction, which states that if both $P$ and $\neg P$ are true, which is absurd (usually: there are systems of logic, called paraconsistent logics, that admit both $P$ and $\neg P$ at the same time), then we can conclude $\bot$, to represent this impossibility.

Bottom introduction ($\bot$I): if $P$ and $\neg P$, then $\bot$.

We’re then able to make use of $\bot$ through negation introduction, which states that if from $P$ we can prove $\bot$, then we can conclude $\neg P$. This is reasonable; if $P$ being true led to a contradiction, then $P$ isn’t true, so $\neg P$ is.

Negation introduction ($\neg$I): if from $P$ we can prove $\bot$, then $\neg P$.

Finally we have negation elimination. This one is a nice easy way to end: it says that if we know $\neg \neg P$, then we can conclude $P$. If something isn’t not true, then it must be true!

Negation elimination ($\neg$E): if $\neg\neg P$, then $P$.

And with that, we have completed (one kind of) natural deduction, laying out a framework for proofs based on undeniable principles so that we can be completely confident in our results.
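As a quick sanity check, we can also verify that rules like these are semantically valid: the conclusion holds in every assignment of truth values that makes the premises hold. The sketch below is an illustration, not part of the formal system (and rules whose premises are subproofs, like $\Rightarrow$I, $\lor$E, and $\neg$I, concern derivability rather than truth tables, so they need a different treatment), but it checks several of the rules by brute force:

```python
from itertools import product

def valid(premises, conclusion, n_vars=2):
    """A rule is semantically valid if the conclusion holds in every
    truth assignment in which all of the premises hold."""
    return all(conclusion(*vals)
               for vals in product([False, True], repeat=n_vars)
               if all(p(*vals) for p in premises))

# Conjunction introduction: from P and Q, conclude P and Q.
assert valid([lambda p, q: p, lambda p, q: q], lambda p, q: p and q)
# Disjunction introduction: from P, conclude P or Q.
assert valid([lambda p, q: p], lambda p, q: p or q)
# Implication elimination (modus ponens): from P implies Q and P, conclude Q.
assert valid([lambda p, q: (not p) or q, lambda p, q: p], lambda p, q: q)
# Negation elimination: from not-not-P, conclude P.
assert valid([lambda p, q: not (not p)], lambda p, q: p)
# A non-rule for contrast: from P implies Q and Q, we may NOT conclude P.
assert not valid([lambda p, q: (not p) or q, lambda p, q: q], lambda p, q: p)
```

Here $P \Rightarrow Q$ is encoded by its truth table, $\neg P \lor Q$.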

Now, you may be wondering, hey, maths is about numbers and shapes and functions and vector fields, but all we’ve been working with are $P$s and $Q$s! Not a single $n$ or $x$, let alone an $f$, has been written so far!

Fear not! Purely logical systems such as natural deduction are a key ingredient in building typical maths. For example, to define numbers, we may first extend to predicate logic, then construct the naturals (via the Peano axioms), which we’ll use to make the integers and the rationals (via equivalence classes), and finally the reals (via Dedekind cuts).

So, in fact, we still get to work with all the maths we’re used to! Plus, thanks to natural deduction, we have the added benefit of being confident about what we’re doing at every layer of abstraction!

Predicate and propositional logic: The logic we’ve been building, with $\land$, $\lor$, $\Rightarrow$, $\neg$, and $\bot$, is known as propositional or zeroth-order logic. Predicate or first-order logic is an extension of propositional logic wherein our statements ($P$, $Q$, $X$, etc) may be parametrised. So as well as having $H$ mean that ‘I am hungry’, we may also have $\mathcal H(x)$ mean that ‘$x$ is hungry’. Additionally, predicate logic includes two quantifiers, $\forall$ and $\exists$, which respectively mean ‘for every’ and ‘there exists’: $\forall x \mathcal H(x)$ means that everyone is hungry, and $\exists x \mathcal H(x)$ means that (at least) one person is hungry.

So what?

If you’re anything like I was at age 17, or anything like how I portrayed you in the beginning of this article, you’re drooling right now. It’s like all of your fantasies regarding rigour and precision have been heard and answered by divine mathematicians.

But maybe you’re not intrinsically motivated by rigour, so you’re less excited by natural deduction. Which is fine! I’m not hurt. Maybe a little bit. Or maybe you just feel that this is overkill—did you really need all this work to know that $2 \neq 1$? Or maybe you’re not convinced that these rules are correct; perhaps you don’t agree that from $\neg \neg P$ we can conclude $P$.

Excluding the middle: If you don’t agree, you are not alone! That $\neg \neg P$ entails $P$ is a consequence of a rule called the law of excluded middle, which states that $P \lor \neg P$. (This law is built into the system of natural deduction that we created.) Some mathematicians (the intuitionists or constructivists) reject the law of excluded middle, thus also forfeiting that $\neg \neg P$ entails $P$. One reason to question the law of excluded middle is that it allows us to prove that something exists without stating what it is. For instance, we are able to prove that an irrational number raised to the power of an irrational number can be rational, but without giving an actual example. If we reject the law of excluded middle, then all such proofs must actually construct an example.

Still, I posit, natural deduction is worth your time. Because we’ve been so rigorous in building the system up, we gain the benefit of knowing exactly what we’re talking about. Before establishing such precision, we may have used $P \Rightarrow Q$, but without a sense of what, exactly, it really means. Now we have a precise definition: it means that from $P$ we can derive $Q$ (as per implication introduction); and it means that if we have $P$ then we can conclude $Q$ (as per implication elimination); and it means nothing else.

From this precision, we reap at least two amazing things: metamathematics and computers.

For one, we can now dip our toes into the metamathematical branch of proof theory, where we prove things about proof systems. For instance, we may wonder if natural deduction—or any proof system—is complete, meaning, roughly, that any question we can ask within the system can be answered by the system. Likewise, we may wonder if it’s consistent, meaning we can never prove $\bot$. Or perhaps it’s both? (Interestingly, no logical system capable of sustaining ordinary arithmetic can be both at once, due to Gödel’s incompleteness theorem.) Proof theory is full of fascinating and surprising results, all enabled by being very precise about what we’re talking about.

@mathsproofbot and @mathstableaubot both prove the true statements that @mathslogicbot tweets.

Additionally, with our newfound precision, we get to enlist computers! Because computers generally demand the utmost precision in order to be of any use, they aren’t of much help until we achieve such rigour. Now, however, we are able to get their help writing proofs, using proof assistants such as Coq and Lean, or interactive proof-writing systems such as my own (see maynards.site/items/fitch/full). There are even programs that can write proofs entirely for us, as exemplified by @mathsproofbot and @mathstableaubot.

So let us return to my claim that $2=1$. How can we reject this? ‘By intuition’ is the easiest way: clearly $2$ is not $1$. However, now we may also turn to our system of natural deduction, where we were very careful about what we took to be true, and point out that $2 = 1$ is not true in this system. To exemplify this, we can show that it will be rejected by proof assistants and proof-writing algorithms. Finally, we may rest confident. That is, until we dig deeper once again, questioning the principles of our system of natural deduction…

The birth of the Fields medal

What comes to mind when you think of the most prestigious awards in mathematics? Chances are, it’s either the Abel prize or the Fields medal. I think most of us have heard about Abel at one point or another, but I became increasingly curious about Fields. Who is behind the so-called Nobel prize of mathematics?

John Charles Fields

John Charles Fields was born in 1863 in Ontario, Canada, and studied at the University of Toronto before moving to the United States in 1887 to study for a PhD at Johns Hopkins University in Baltimore. Fields was involved in several mathematical societies—the Royal Society of Canada and the Royal Canadian Institute among others—and he spent most of his life lecturing at the University of Toronto. Though a fine mathematician, it was at organising mathematical events and promoting research that he truly excelled.

One of his greatest achievements was working with the International Mathematical Union to hold the 1924 International Congress of Mathematicians (ICM) in Toronto. This was only the second congress held by the union after the First World War. At the first, Germany, Austria–Hungary, Bulgaria and Turkey were excluded from the union, and many worried this rift might escalate. But Fields was determined to make the 1924 congress work. He spent months in Europe fundraising tirelessly. Some say personal acquaintance with the rulers of Europe aided his efforts—attendance at a dinner with the king of Sweden and a later meeting with Mussolini in Bologna certainly support this claim—but no direct sources remain. After organising the congress, with money to spare, he paid for European mathematicians’ travel costs across the Atlantic. And this is where the idea of the Fields medal was born.

The proposal

It is proposed to found two gold medals to be awarded at successive International Mathematical Congress for outstanding achievements in mathematics […]

The idea of the award was first presented on 24 February 1931. Because of Fields’ talent for acquiring funds, the medals were to be funded by the remaining finances from the 1924 ICM. Fields wrote, “It is proposed to found two gold medals to be awarded at successive International Mathematical Congress for outstanding achievements in mathematics […]” The awards would be open to the whole world and would be made by an international committee.

He proposed that the medals would be handed out at every congress for work already done, but also to encourage further achievement, starting with the next congress in 1936. Unfortunately, Fields’ health started to decline in 1932. Just days before his death, he added a note to his will leaving an additional 47,000 Canadian dollars to fund the medal. As he had intended, the very first medals were awarded at the 1936 ICM in Oslo, Norway.

Today, Fields medals are awarded every four years to between two and four brilliant mathematicians under the age of 40 with a prize of C$15,000. The age limit was not directly due to Fields himself; it was added in 1966 to promote diversity, although recently this choice has been criticised as there is anecdotal evidence suggesting female mathematicians fare better later in their careers.

The medal itself was designed by Robert Tait McKenzie and features Fields’ monogram alongside a profile of Archimedes and the date of the medal’s founding, incorrectly written in Roman numerals: MCNXXXIII (1933). The reverse features an illustration of one of Archimedes’ most famous achievements: deducing the surface areas and volumes of the cylinder and the sphere which can be inscribed within the cylinder. Each face of the medal bears a Latin phrase:

TRANSIRE SUUM PECTUS MUNDOQUE POTIRI

(To transcend one’s human limitations and master the universe)

on the obverse, and the reverse reads:

CONGREGATI EX TOTO ORBE MATHEMATICI OB SCRIPTA INSIGNIA TRIBUERE

(Mathematicians gathered from the whole world to honour noteworthy contributions to knowledge)

The date of the prize being awarded and the recipient’s name are engraved on the rim of the medal.

In 1936, Jesse Douglas and Lars Ahlfors were the first to receive Fields medals. What was so special about these two mathematicians, that they stood out from across the globe to become the first winners of the (arguably) highest honour in mathematics?

Jesse Douglas

Jesse Douglas’s love for mathematics started in high school. While studying at the City College of New York, he became the youngest recipient of the college’s Belden medal for excellence in mathematics in his first year. After his undergraduate degree, Douglas began his doctoral study at Columbia University under the supervision of Edward Kasner, who introduced Douglas to the problem that became his most noteworthy achievement—Plateau’s problem.

Plateau’s problem (also known as the soap bubble problem) asks us to show the existence of a minimal surface for a given boundary, possibly with other constraints. This has a fascinating physical application in the form of soap films. A thin soap film spanning a frame, due to the action of surface tension, will always take the shape of minimum surface area: a so-called minimal surface. In 1931, Douglas discovered and proved a general solution, for which he came to win the Fields medal (a general solution was also found independently by Tibor Radó, who later created the busy beaver family of Turing machines, see Will this article ever end?). Before this contribution, only some special cases had been proven, by such mathematicians as Schwarz, Weierstrass, and Riemann.

Around the same time, Douglas was working at the Massachusetts Institute of Technology as an assistant professor, later to be promoted to an associate professor. He continued to work on Plateau’s problem even after solving it, focusing on further generalisations of the problem. He published 11 papers between 1939 and 1940 on these generalisations.

After working on Plateau’s problem for many years, Douglas diverted his attention to group theory, making significant contributions to the field. He spent the last 10 years of his life as a professor, back at the City College of New York.

Lars Ahlfors

Lars Ahlfors was a Finnish mathematician born in April 1907. As a child, he already loved maths, and this love only grew, even though much of his life was spent in the midst of war.

Ahlfors started his university studies at the University of Helsinki and was taught by two of the best Finnish mathematicians of the time: Ernst Lindelöf and Rolf Nevanlinna. After this, he followed Nevanlinna to Zürich and discovered that mathematics can be taught in a different way. Lars came to understand “that [he] was supposed to do mathematics, not just learn it”. During his time in Zürich, Ahlfors encountered Denjoy’s conjecture (now known as the Denjoy–Carleman–Ahlfors theorem), which he proved in 1929. Loosely, the theorem determines the number of values an entire function (a function that is differentiable everywhere in the complex plane) can take at infinity. This is what gave Ahlfors international recognition, though he always credited the significant help of Nevanlinna and Pólya as the main influence that led to his proof. Though impressive, this was not what earned Ahlfors his Fields medal. That award was specifically credited to a single paper, which shed some light on Nevanlinna’s theory of meromorphic functions and quasiconformal mappings (more stuff to do with complex functions). This paper is now recognised as one of the starting points of an essentially new branch of analysis called metric topology.

Having already gone through the First World War as a child, Ahlfors was just about to go through the next challenge—the Second World War. But, surprisingly, his research benefited. He was able to devote himself to his work completely, even though libraries had closed due to the lack of students. In 1944, Lars was offered a position in Zürich, opportune timing since the Soviet Union was attacking Finland and Ahlfors’s own health was poor. So he, his wife, and two young children planned to flee to Switzerland, via Stockholm, the UK and then Paris.

Times were tough, and Ahlfors was only able to take 10 crowns with him. So what did he do on arrival in Stockholm? He smuggled his Fields medal across the border and sold it in a pawn shop! He later reflected, “I’m sure it is the only Fields medal that has been in a pawn shop. As soon as I got a little money some people in Sweden helped me retrieve it.” What a relief! Imagine if someone had actually bought it!

Luckily, the family made it to Switzerland safely. Slowly, though, Ahlfors began to feel that his invitation had been not an honour, but simply an attempt to fill a position they couldn’t find anyone else for. So when an offer came in from Harvard, Ahlfors gladly accepted it and remained there for more than 30 years until his retirement.

Since Douglas and Ahlfors first won in 1936, 58 other mathematicians have been awarded the medal, including the only person to decline the award—Grigori Perelman—and the first and so far only woman to receive the award—Maryam Mirzakhani. The next ICM is due to be held in 2022 in St Petersburg, Russia, and the story of the Fields medal and its winners will continue.


Will this article ever end?

The obvious answer is yes: there are only so many atoms in the universe to make paper and ink out of, so of course it will have to end at some point. But questions of this ilk, like so many apparently simple questions, can spark an interesting discussion. The more interesting question is this:

Question: Do all yes/no questions have an answer, and if so, is there an algorithm that can compute it just from the statement itself?

The answer to the first half is clearly no: think about paradoxes like ‘Is the sentence “This sentence is false” true?’. For further discussion, please go and talk to your favourite philosopher (mine is Nick Bostrom). The second half is something we will look into in more detail; more precisely, how such a question led to the creation of modern computer science. Let’s rephrase the question with a bit more formality and clarity:

Question: Given a yes/no statement in formal logic, does there exist an algorithm that can compute the yes/no answer?

Formal logic is the logic used by mathematicians and logicians (see How can we tell that $2\neq 1$?). Further discussion is well outside the scope of this article; the requirement is just there to make sure that the question the algorithm would try to solve doesn’t produce paradoxes like the one above. This question, or one very similar but in German and itself written in the language of formal logic, was posed by one of the most famous mathematicians of the 20th century—David Hilbert. It is called the decision problem, or Entscheidungsproblem in German. We can clearly see that this is an important question: almost any question in mathematics can be posed in terms of formal logic, and mathematics can be used almost anywhere. So being able to show that there will always be an algorithm to compute the answer to any question would be of incredible use. It would mean that there would always be a clear method to solve any problem that can be reduced to mathematics.

Before we can dig our teeth into this problem, though, we must first properly define what we mean by an algorithm. Since we live in a world built by, and upon, algorithms, we intuitively know that it is something that takes in some information, does something to it, and then gives us some sort of ‘useful’ answer. This is clearly a very nebulous definition, as I used the term ‘does something’, and the idea of utility is a heavily contextual notion (if I’m stuck and dehydrated in a desert, a proof of the Riemann hypothesis isn’t going to be very useful to me). A slightly more formal definition of an algorithm in plain English is the one given by Marshall H Stone and Donald Knuth:

“… a set of rules that precisely defines a sequence of operations such that each rule is effective and definite and such that the sequence terminates in a finite time.”

Or in layman’s terms, a finite sequence of well-defined operations. But as any mathematician knows, writing something in plain English is usually much easier than doing so in mathematics.

This is where Alan Turing comes into the picture. Turing was one of the greatest mathematicians of the 20th century—if not of all time—and was a key figure in the foundation of theoretical computer science and artificial intelligence; he is often called the father of both fields. Turing did a whole host of wonderful and amazing work in his short career. Unfortunately, despite the revolutionary work that he did both academically and in breaking Nazi codes during the Second World War, his work was not truly recognised until long after his passing. This was, of course, partly to do with the secretive work of codebreaking, but also because he was gay at a time when homosexual acts were illegal. In 1952, he was convicted of ‘gross indecency’ and given the choice between a prison sentence and chemical conversion therapy; the chemical injections he chose led to depression, and later, his death by suicide. Turing died at the age of only 41, and we can only wonder what astonishing work he would have done if the prejudices of the time hadn’t caused his premature death.

Turing’s first groundbreaking work was solving Hilbert’s problem and defining what was meant by an algorithm in formal mathematics using his elegant concept of the Turing machine. A definition and answer had been given a few months earlier by another mathematician called Alonzo Church—a pioneer in his own right and someone who Turing would later study under for his PhD. However, not only was Turing’s solution a greater conceptual leap than Church’s proof, but it was also a more intuitive and pragmatic idea than the heavy formal logic of Church’s approach, called lambda calculus. But enough beating around the bush, let us talk about the Turing machine.

The Turing machine and the Church–Turing thesis

Turing’s idea of how to formalise an algorithm in a mathematical framework was to design a machine similar to the concepts proposed by Ada Lovelace and Charles Babbage, two more pioneering mathematicians and founders of the field of computer science. This was a mathematical construction composed of an infinite tape, a box (normally called a head) that could read what’s written on the tape and write new things onto the tape, a motor to move the head left or right, a set of numbers/symbols to read, and a processor full of instructions. The idea is that the input for the algorithm is written on the tape; the Turing machine then reads this and performs a sequence of simple actions dictated by the processor, perhaps using the tape to temporarily store information, and finally writes the output of the algorithm on the tape. We’ll look at an explicit example of a Turing machine working in a minute, but first, here is how all the components of a Turing machine fit and work together:

  • The processor can have (finitely) many different configurations of its internal settings: we shall call these states.
  • The Turing machine itself doesn’t have any memory. But we can encode what it previously read by switching into one of many different states.
  • The Turing machine is fully deterministic and well-defined. Given the current state and the symbol the head is reading, we know exactly what the machine will do.
  • In a given state, the Turing machine can do any combination of the following: read the tape, write to the tape, move the head left or right, and (depending on what was read from the tape) change the state that the processor is in.
  • There has to be at least one state in the processor that will stop the Turing machine and signal the end of the computation: this is usually called a halt state for the Turing machine. This is typically done when the head has printed the output of the algorithm.

A schematic representation of a Turing machine.

Turing’s great insight was not only the simplicity of his construction but the notion that perhaps any form of computation could be represented as a Turing machine. This was later expanded upon during his PhD study under Church, leading to a groundbreaking theory which is now the cornerstone of computer science and of a branch of logic and mathematics called computability theory. The Church–Turing thesis states that any computable function (a function computable by a machine, or any formal algorithm that solves a problem) can be translated into an equivalent computation model involving a Turing machine. There is still some ambiguity about what exactly a computable function is and whether every possible computation can be represented as a computable function. But as these questions are still being discussed by academics all over the world, we will sidestep this thorny issue.

The Church–Turing thesis quickly leads to the conclusion that since our brain can compute mathematics and all algorithms that have been developed thus far can be computed by our brains, every algorithm ever computed is equivalent to a Turing machine! Because of this fact, the modern definition of an algorithm is based around Turing machines.

Now that we have a mathematical definition of a general algorithm, can we answer Hilbert’s question? Yes, we can, but first let’s get a little more comfortable with Turing machines.

Example Turing machine: how to build a dam in three easy steps

Like most things in mathematics, the best way to get some familiarity with an idea is to see a worked-through example. The toy machine I will use to illustrate Turing machines is one from a rather famous family of Turing machines created by Tibor Radó, called busy beavers (Radó also discovered a general solution to Plateau’s problem, see The birth of the Fields medal). They are very simple machines that are fed an infinite tape full of 0s, and their aim in life is to write as many 1s as possible on the tape; but a busy beaver can’t just print 1s forever, it has to stop eventually. The example we will look at is the 3-state busy beaver.

The processor for a 3-state busy beaver, which terminates as soon as the head reads a 1 while the processor is in state A.

Starting in state A, the Turing machine reads a 0, so it erases this and writes a 1 in its place. Then it moves the head so that it is positioned over the next symbol, and the processor transitions into state B. The next symbol the head reads is again 0, but this time because it’s in a different state the Turing machine leaves this 0 as it is, moves the head again, and transitions into state C. The machine just keeps repeating this process until it enters its halt state.

This 3-state busy beaver halts after 14 steps and writes six 1s on the tape. You can’t do more without more states. The coloured squares show the head position and the state of the processor.

Although we don’t see it here, the Turing machine derives its main source of power from its ability to switch state depending on the information on the tape and the information encoded by its previous state. If you are familiar with a bit of programming, this is very similar to jumping to a memory location in assembly language, or to the conditional statements and loops found in higher-level languages like Python. If you want to write one out for yourself in full—I recommend you don’t. Even a simplified binary number checker takes more than a page of explicit states to explain, and even longer if you want to show it all working—I know from painful experience. But if you’re feeling very proactive, try to play out the beaver from above: you could use it on a row of coins to represent the tape, with $\text{tails}=0$ and $\text{heads}=1$. Extra credit for making a 4-state beaver.
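Alternatively, you could let a computer play out the beaver. The sketch below simulates a Turing machine in a few lines; the transition table is the standard 3-state busy beaver and is consistent with the behaviour described above (in state A the machine writes a 1 and moves to B on reading a 0, and halts on reading a 1), though treat it as an illustrative reconstruction rather than the exact diagram from the article:

```python
# Each entry maps (state, symbol read) to (symbol to write, head move, next state).
# Standard 3-state busy beaver table: state A halts as soon as it reads a 1.
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, 0, "HALT"),
    ("B", 0): (0, +1, "C"),
    ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "C"),
    ("C", 1): (1, -1, "A"),
}

def run(rules, state="A", max_steps=10_000):
    tape = {}   # sparse tape: any cell not yet written holds a 0
    pos = 0
    steps = 0
    while state != "HALT" and steps < max_steps:
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

steps, ones = run(RULES)
print(steps, ones)  # prints: 14 6
```

Fourteen steps and six 1s, matching the run shown above.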

The Turing machine and the decision problem or: how I learned to stop worrying and love the halting problem

Now, finally, back to Hilbert’s Entscheidungsproblem. How did Turing solve it? Turing’s own paper, with the full technical proof, isn’t too hard to read or understand. I will, however, give a truncated version of the proof, taking an alternative route while keeping the core intuition.

In the paper, Turing proposes a method to encode a Turing machine, its states, and its input symbols into what he coins a description number—similar to what we do with text and Unicode, uniquely representing some text with a binary number. This description number describes the machine, and therefore the computation that it represents, completely! He then asks the obvious next question.

Question: Can one compute the description number of a Turing machine from a yes/no statement in formal logic?

If we could, then we would show that any yes/no statement has at least one corresponding Turing machine (which Turing showed was equivalent to a computable algorithm) and so the answer to Hilbert’s question would be yes! Unfortunately, this is not the case; so, the answer to Hilbert’s question is a resounding no.

Turing was able to show this by using an idea from another titan in the world of mathematics and logic, Kurt Gödel. Gödel is famous for a lot of important results from the fields of logic, mathematics, and philosophy, and is an interesting person in his own right. However, the key Turing used—and therefore the thing we will be looking at—is Gödel’s incompleteness theorem. This is a wonderfully mind-bending theorem that deserves its own article, and I highly recommend you go read into it (try Gödel’s Proof by E Nagel and JR Newman), but the important piece of logic that Turing adapted was the idea of using self-reference to reach a contradiction. We can see this in the paradox I mentioned at the start: ‘Is the sentence “This sentence is false” true?’. Gödel’s idea was very similar, but formalising it in the realm of formal logic was his great achievement.

To use this insight, Turing proposed a Turing machine, and therefore a computable algorithm, that I’m sure anyone who has ever written code wishes they had in their back pocket: a Turing machine to see if another Turing machine, with a given input, will halt. Or in other words, a computable algorithm which can tell you whether an algorithm, together with an input, would eventually stop and give you an answer, or whether it would just keep cranking through calculations forever.

As an aside, an algorithm that could do this would not only be a boon for any budding programmer wishing to debug their code—but also an incredibly useful tool in pure mathematics. You could solve Goldbach’s conjecture by applying this mystical Turing machine to a simple program to search for the first counterexample! (What would the program look like?) Unfortunately, like most things that are too good to be true, this doesn’t exist. Let’s see why!
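As for what that counterexample-hunting program might look like: here is a sketch in Python (the helper names are mine). By construction it halts exactly when Goldbach’s conjecture is false, so a halting oracle applied to it would settle the conjecture.

```python
def is_prime(n):
    """Trial division -- slow, but certainly correct."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def has_goldbach_pair(n):
    """Is the even number n a sum of two primes?"""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def first_goldbach_counterexample():
    """Halts (returning the counterexample) iff Goldbach's conjecture
    is false; otherwise it runs forever."""
    n = 4
    while has_goldbach_pair(n):
        n += 2
    return n

print(all(has_goldbach_pair(n) for n in range(4, 100, 2)))  # True: no counterexample down here
```

Every even number up to astronomically large bounds has already been checked by hand (well, by computer), so don’t hold your breath waiting for `first_goldbach_counterexample()` to return.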

The halt-deducing machine $H$.

Let’s suppose we have this amazing halt-deducing Turing machine and call it $H$ (for $H$alt). If Hilbert’s decision problem has a positive answer, this machine will exist—as asking if a Turing machine halts can be written in formal logic—so let’s not get bogged down in the details of how it would hypothetically work. But, on a high level, if we give $H$ a description of a Turing machine and its input, $H$ will either halt and print ‘halts’ onto the tape if the inputted Turing machine halts, or it will halt and print ‘loop’ onto the tape if the inputted Turing machine loops forever. In particular, note that $H$ itself will always halt.

We can then modify this machine a little bit. To $H$ we attach an inverter which will flip its output: if $H$ outputs ‘halts’, the inverter will go into a loop and not halt; if $H$ outputs ‘loop’, the inverter will halt and output a picture of a scorpion. As mentioned before, $H$ needs two inputs: the Turing machine itself and its input. We wish to check if $H$ terminates when we apply it to itself, so we need to put $H$ into both of its input slots. To do this, we can attach a device to copy the input into it, which we can call a cloner. We can call this new Turing machine $H^*$:

The modified Turing machine, $H^*$.

Now that we have $H^*$, and the idea from Gödel of self-application, all we need to do is input $H^*$ into itself. When we input $H^*$ we have two possibilities. If $H^*$ does halt, then the inverter means that $H^*$ doesn’t halt. But if $H^*$ doesn’t halt, then $H^*$ does halt. We have reached a contradiction!

The only thing we assumed is that $H$, and therefore $H^*$, exists. But as the existence of $H$ leads to a contradiction, we must conclude that a Turing machine like $H$ cannot exist. However, as a Turing machine is a mathematical object and asking if it will halt can be phrased in terms of formal logic, a yes to Hilbert’s question would mean that $H$ does exist! Therefore Hilbert’s question has a negative answer: not all yes/no questions can be answered by an algorithm.

This argument is the same as the one Turing presented in his original paper, albeit simplified: there he shows that the description number of $H$ and $H^*$ can’t be calculated as the computation would go on forever. This is what we call an uncomputable number as no finite algorithm could ever compute it.

Typically, the numbers we use every day ($\pi$, $\mathrm{e}$, $5$, $3/4$, etc) only require a finite amount of information to represent them. This can be the number itself if it is an integer, or a quotient of integers, but some numbers with infinite non-repeating decimal expansions can also be represented with a finite amount of information, like $\pi$ and $\mathrm{e}$. This is because $\pi$ and $\mathrm{e}$ both have finite generating formulae that can produce every digit. In other words, if you give me a position in $\pi$ or $\mathrm{e}$, like the 1737th position, the formula can find the 1737th digit of the number after a finite number of steps. This isn’t always the case: if I have an infinitely long number whose digits are random, you wouldn’t expect to be able to compute its decimal expansion using a finite algorithm. These numbers are uncomputable: they cannot be computed or represented in any finite sense.
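To make the idea of a finite recipe generating an infinite expansion concrete, here is a short Python sketch (the function name is mine) that uses the series $\mathrm{e} = \sum_{k\ge 0} 1/k!$ to produce as many digits of $\mathrm{e}$ as you ask for:

```python
from decimal import Decimal, getcontext

def e_digits(n):
    """e truncated to n digits after the decimal point, computed from
    the series e = 1/0! + 1/1! + 1/2! + ... (a finite recipe that
    pins down every digit of an infinite expansion)."""
    getcontext().prec = n + 10                 # work with guard digits
    total, term, k = Decimal(0), Decimal(1), 0
    while term > Decimal(10) ** (-(n + 5)):    # stop once terms are negligible
        total += term
        k += 1
        term /= k
    return str(total)[:n + 2]                  # '2.' plus n digits

print(e_digits(10))  # 2.7182818284
```

A truly random infinite decimal admits no such function: there is no finite rule that would hand you its 1737th digit.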

The legacy of Turing and his machines

The legacy of the Turing machine and the ideas Turing put forward during his lifetime are hard to quantify. It is always hard to know what the world would be like without certain people and their ideas in it. Even limiting Turing’s body of work to his idea of the Turing machine, we can imagine that the world would perhaps be almost unrecognisable. The concept of computers and their programs, pioneered by Ada Lovelace and Charles Babbage, had been around for about 93 years before Turing’s own paper on the matter. Even though Church developed his lambda calculus to solve Hilbert’s problem, it is quite difficult to imagine whether the modern computer would be around today without Turing, and if so, what it would look like. Perhaps the best view on the importance of Turing’s work comes from a sentiment put forward by John von Neumann, another intellectual juggernaut of the 20th century, to Stan Frankel. Frankel, writing in a letter to Brian Randell, says,

“… [Von Neumann] firmly emphasised to me, and to others I am sure, that the fundamental conception is owing to Turing—insofar as not anticipated by Babbage, Lovelace and others.”

This was in reference to the von Neumann architecture, the modern framework of the stored-program computer, from which almost every computing device built since 1945 draws its direct heritage. That, and the wider body of work credited to Turing, is why we call him the father of theoretical computer science. There is much more to talk about relating to Turing and his work, but as I said at the beginning, this article must end.


Fun with Markov chains

Probability has found itself a comfortable position amongst industry mathematics. And now, with the advancement of the computer age, a 100-year-old piece of maths has jumped to the forefront: Markov chains. These can be used for an awe-inspiring range of ideas, including predicting the weather, mapping the probability distribution of Monopoly properties, and even generating Donald Trump tweets. But what exactly are Markov chains, and why are they so unique?

Understanding Markov chains

Let’s say we are having a very nice day and are happy as a result, and that for some reason unknown to us, we only possess the emotional capacity to be either happy or sad. Can we predict what our feelings will be tomorrow? We know that our feelings tomorrow depend on our feelings today; that is, they are a probabilistic function whose input is today’s mood, and only today’s mood. Or in other words, a Markov chain!

So how do we do this? Let a happy day and a sad day each be states. We will represent these as nodes on a graph. Now we need some probabilities for the changing of states. Suppose we consider ourselves optimistic people: let’s say there’s a 5/6 chance that if we are happy today, we will be happy tomorrow, and a 1/2 chance if we are sad today, we will be happy tomorrow. Since something must happen tomorrow (all apocalyptic events excluded), then the sums of the probabilities leaving each state must add to 1.

♪ Because I’m happy with a 5/6 chance tomorrow if I’m feeling happy today… ♪ Image: Frank Schwichtenberg, CC BY-SA 4.0

This is an example of a two-state Markov chain, and while simple, it is a useful visual aid for our understanding. Much more formal and rigorous definitions can be found online, but in a nutshell, a Markov chain consists of a set of states where the probability of transitioning to any state depends solely on the current state. This memoryless dependence is called the Markov property and is what makes this neat piece of maths unique.

We can even use this Markov chain to predict how many happy days we’re going to have over the next week. One way we can do this is by following the chain by hand. Take a fair six-sided die and put a finger on the happy state: let’s say today was a good day. Now roll the die and if 1–5 is rolled, follow the looped arrow back to the happy state, else follow the arrow to the sad state. If we were unfortunate enough to find ourselves on the sad state, let even rolls—2, 4 and 6—send us down the arrow to the happy state; and odd—1, 3 and 5—loop back round to the sad state. If we make seven rolls where each roll is a new day, our finger will conclude on the prediction for how we’ll be feeling a week today. If we note the state after each roll, we’ll have predictions for each of the seven days. Now we can plan our week accordingly!
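The die-rolling procedure above is easy to hand to a computer. A quick Python sketch (the function name is mine, and the rolls are virtual):

```python
import random

def simulate(days, seed=None):
    """Follow the two-state mood chain for `days` die rolls,
    starting from a happy day; return the list of predicted moods."""
    rng = random.Random(seed)
    state, diary = 'happy', []
    for _ in range(days):
        roll = rng.randint(1, 6)                         # one roll per day
        if state == 'happy':
            state = 'happy' if roll <= 5 else 'sad'      # 5/6 chance to stay happy
        else:
            state = 'happy' if roll % 2 == 0 else 'sad'  # 1/2 chance to cheer up
        diary.append(state)
    return diary

print(simulate(7, seed=2021))  # one possible week of moods
```

Run it a few times without a seed and you’ll get a different week each time, as befits a probabilistic model.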

What about in a month, then? Or a year? Following this diagram for 365 days, the rolling becomes a bit tedious. Instead, we can speed this up by taking ideas from linear algebra and creating a transition matrix. Let each column of the matrix represent the current state and the rows represent the probabilities of transitioning to each state. For this example, the transition matrix is given by \[ \begin{pmatrix}{\frac56}&{\frac12}\\ {\frac16}&{\frac12} \end{pmatrix}, \] where $H$ = happy and $S$ = sad label the columns (and rows) in that order. We can use this representation to make a prediction for any number of days in the future. This would work equally well with the rows and columns swapped, as long as we stay consistent about which convention we are using. We can also check whether the matrix has been created properly: the columns (in this case) should add to 1. If they don’t, something has gone horribly wrong! Say we wanted a prediction for two days’ time. We first raise this transition matrix to the power of 2, giving the following calculation:

\[ \begin{pmatrix}{\frac56}&{\frac12}\\ {\frac16}&{\frac12} \end{pmatrix}^2 = \begin{pmatrix} {\frac{5}{6} \times \frac{5}{6} + \frac{1}{2} \times \frac{1}{6}}&{\frac{5}{6} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2}}\\ {\frac{1}{6} \times \frac{5}{6} + \frac{1}{2} \times \frac{1}{6}}&{\frac{1}{6} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2}} \end{pmatrix} = \begin{pmatrix} {\frac79}&{\frac23}\\ {\frac29}&{\frac13} \end{pmatrix}. \]

Let’s see what is going on here by looking at the first entry of the new matrix. Like before, this is the probability of starting on happy and finishing back on happy, but this time for the day after tomorrow. If we relate this back to the graph, we can see that each term is a different possible path of probabilities (try saying that five times as fast) for exactly two days; $(5/6) \times (5/6)$ is happy tomorrow then happy the day after, and $(1/6)\times (1/2)$ is sad tomorrow and back to happy the day after. We then just choose a starting state. Like before, let’s be happy today: we represent this as a vector $\left(\begin{smallmatrix} 1 \\[2pt] 0 \end{smallmatrix}\right)$. We then do the following calculation with our matrix we just raised to the power of 2:

\[ \begin{pmatrix}{\frac79}&{\frac23}\\{\frac29}&{\frac13} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} {\frac79}\\ {\frac29} \end{pmatrix}. \]

In return, we will get this column vector where the first entry is the probability of being on the happy state in exactly two days, and the second entry is being on the sad state. The cool thing is that we can put this matrix to any power $n$, which will give us a set of predictions for the $n$th day from today. Now, we’re mathematicians, so we need to do what any good mathematician should do and ask what happens when $n$ tends to infinity. This is something called the long run average probability and can be used to predict the behaviour of the Markov chain for some arbitrary but significant time in the future. This can be found analytically by taking some more ideas from linear algebra including a process called matrix diagonalisation, which would call for the use of more advanced mathematics. So instead, with a little rough-around-the-edges Python code (see the bottom of this article) we can get a pretty good idea. As $n$ gets very large, it is seen that the happy state tends to $0.75$, or $3/4$, and the sad state to $0.25$, or $1/4$. Therefore, given a significant amount of time in the future, this Markov chain is predicting a 75% chance we’ll be happy: not bad, all things considered.
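In fact we don’t need the full diagonalisation machinery to watch the limit appear: the second eigenvalue of this matrix is $1/3$, so the transient part decays like $(1/3)^n$ and even a modest matrix power has already converged. A minimal numpy check, using the matrix defined above:

```python
import numpy as np

# Transition matrix: columns are today's state (H, S), rows tomorrow's.
A = np.array([[5/6, 1/2], [1/6, 1/2]])

# Raise A to a large power: both columns converge to the
# long-run distribution (3/4, 1/4), whatever today's mood.
P = np.linalg.matrix_power(A, 50)
print(P)
```

Both columns being identical is exactly the statement that the starting state no longer matters in the long run.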

We should, of course, acknowledge the limitations of this as a model. There are plenty of emotions not taken into consideration, nor whether a day holds special significance, eg a birthday. However, it’s easy to imagine how this can be expanded to accommodate these other ideas with bigger graphs and larger matrices. The below graph shows how we might add a third emotion, for example boredom: assign the probabilities based on whatever data we may be using, then fill out the graph accordingly.

We must be careful, however! When adding the new probabilities, it’s vital we ensure the sum of all arrows exiting a state still sum to exactly $1$. In addition, we are after all working with probabilities, and as such should be aware that a low chance does not mean impossible, and a high chance does not mean inevitable. This is just a model, and real life is rarely so organised.

What can be done with them?

Markov chains are used in a wide array of industry-based mathematics. But, with the rise of the technological age, more light-hearted, fun and fascinating applications have emerged. One very interesting use is text prediction: quite a simple idea, but we can have a lot of fun with it! If we take a few short sentences, say

“I love Markov chains.”
“Markov chains love me.”
“Markov is I.”

and we wanted to generate new original sentences from these, how could we do it? Representing each word as a state, we can create a Markov chain. The probabilities between states are then derived, intuitively, from how often words follow each other within these sentences. Take the word ‘Markov’. In these short sentences, ‘Markov’ is followed by either ‘chains’ or ‘is’; looking at the frequencies, ‘chains’ follows ‘Markov’ $2/3$ of the time, and ‘is’ $1/3$ of the time. Do this analysis for every word and we have all the states and probabilities we need. Hence, like before, we can draw this up.
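This frequency analysis is only a few lines of Python (variable names are mine; punctuation is stripped for simplicity):

```python
from collections import Counter, defaultdict

sentences = [
    "I love Markov chains",
    "Markov chains love me",
    "Markov is I",
]

# Count, for each word, which words follow it across all sentences.
follows = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

# Turn the counts into transition probabilities.
probs = {w: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
         for w, cnt in follows.items()}

print(probs['Markov'])  # 'chains' with probability 2/3, 'is' with 1/3
```

To generate a sentence, start at a word and repeatedly sample the next word from its probability table until you hit a word with no successors.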

Once again, the probabilities allow for this to be played with using a fair six-sided die, with which we can generate some new sentences:

“I love Markov is I love Markov chains love me.”
“Markov chains love Markov chains.”
“I.”

Notice that not every pair of states has a pair of probabilities between them. This is because some states have a zero chance of following the current word. For example, in none of the three short sentences does ‘me’ directly follow ‘chains’, leading to the zero chance of ‘me’ following ‘chains’ in the Markov chain. We could include these arrows on our graph, annotated with their zero chance; however, this would just add clutter, especially since these extra arrows are inconsequential for our purposes anyway.

Fake Donald Trump tweets generated by a Markov chain built by Filip Hráček. Image: Reproduced with permission from Filip Hráček

There’s an amusing example of this method in Hannah Fry and Thomas Oléron Evans’s The Indisputable Existence of Santa Claus, where they use it to generate new Queen’s speeches with interesting results. Similarly, the sentences we’ve generated aren’t really sentences, despite how much hilarity ensues, but imagine if we had a much larger data set… say, someone’s Twitter. Twitter’s public availability lends itself as a large library of data containing people’s thoughts and opinions that we can use. One such person is the 45th president of the United States, Donald Trump, who often likes to express his opinions publicly with questionable legitimacy, although here, this can work in our favour. Filip Hráček is a programmer from the Czech Republic who, as an experiment, created a Donald Trump tweet generator using Markov chains.

The probabilities become a lot more reliable when we introduce this large data set, causing the system to generate more coherent sentences. They’re still not perfect, but as was mentioned, this data coming from Donald Trump works in our favour. By setting each word in the tweet to a state, like before, the probabilities become more fine-tuned with the vast amount of data at hand. This model also employs something called an order-2 Markov chain: instead of the next state depending solely on the previous one, the model takes the two previous states, or in this case words, into account when deciding on the next. A Reddit user under the handle Deimorz took this one step further and created a subreddit entirely populated with posts and comments generated by bot accounts using Markov chains. We can’t interact with them, but it’s unnerving to see the conversations they have relating to some topic unknown to us, especially since the bots are seemingly fully invested. There are quite a number of diverse experiments such as these online which use Markov chains, and which you can play with. Curiously, on the more serious side of application, this is the very primitive idea behind text prediction on mobile phones.

Andrey Andreyevich Markov, a Russian mathematician born in 1856, created this piece of mathematics in his work on probability theory. Markov chains have since been used in applied mathematics spanning many fields: they appear in computer science, business statistics and mathematical biology, contribute to Google’s PageRank and predictive text on mobile phones, and plenty more. What’s curious is that Markov developed this process for letter analysis in his native Russian: given a letter, calculating which letter is likely to come next. His motivation came from probability and statistics, and an idea called the law of large numbers. For further reading, I’d recommend a research paper, The five greatest applications of Markov chains by Philipp von Hilgers and Amy N Langville, which explains some of the most revolutionary uses of Markov chains in history, including going more in depth into Markov’s own application. It covers historical uses in information theory and computer performance, and was one of the most interesting reads I found when conducting research for this article.

Now it’s your turn! Get out some dice and play with these examples, or even come up with your own. There is so much more to this neat piece of mathematics worth exploring, far too much to squeeze into this article. So I implore you to get reading, and start creating!

Python code for predicting mood

Some simple but functional code to find the probabilities for 52 weeks:

import numpy as np

# Transition matrix: columns are today's state (H, S),
# rows are tomorrow's state (H, S)
A = np.array([[5 / 6, 1 / 2], [1 / 6, 1 / 2]])
start = np.array([[1], [0]])  # start on a happy day
for x in range(0, 365, 7):
    b = np.linalg.matrix_power(A, x)
    c = np.dot(b, start)  # probabilities of (happy, sad) after x days
    print(x // 7, c.flatten())

Oπnions: Should I share my code?

Six years ago, in September 2014, I started working on my PhD. By Christmas, I was doing calculations using code that would’ve taken the average PhD student three to four years to write. But this wasn’t down to my own programming skills: this was thanks to my supervisor and his previous students deciding to work on open source software.

Open source software is software whose developers release the source code that makes up the software, rather than just releasing a binary or application that the user can run. Almost always, open source software is free to use and adapt.


Counting Countdowns

Rachel Riley puts the last of the six numbers on the rack and presses the button to generate a random target. The host—whoever the host is now, I haven’t really watched Countdown since Richard Whiteley died—says a few words and starts a 30-second timer. And the contestant thinks, ‘Now I need to combine those numbers to make the target. I know! I’ll just check all of the possible calculations with these six numbers! I wonder how many there are.’

The aim of the game is to combine the six given numbers using the four basic arithmetic operations ($+, -, \times$ and $\div$) to reach a target number. You do not need to use all of the numbers, but you may not use any number more often than it appears on the board. Also, you’re inexplicably not allowed to use fractions at any point.

For example, given the numbers in the big picture above, with the displayed target of 333, you might solve it as \[ ((50-4) \times 7)+5 + 6, \quad \text{or as} \quad ((50-6)\times 7) + 25,\] or several other acceptable ways.

TL;DR: including the trivial answer of 0, there are 974,861 distinct answers that can be reached by applying the four basic operations to combine six numbers, if you ignore (a) coincidences and (b) Countdown’s fraction-phobia.
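The brute-force check the contestant imagines is surprisingly compact to write down. Here is a hedged Python sketch (function name mine), respecting the show’s rules: only exact divisions, and each number used no more often than it appears:

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def reachable(nums):
    """Every value obtainable from the sorted tuple `nums` using
    +, -, x and exact division, combining any subset of the numbers."""
    out = set(nums)
    for i, j in combinations(range(len(nums)), 2):
        a, b = max(nums[i], nums[j]), min(nums[i], nums[j])
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        candidates = {a + b, a * b, a - b}
        if b != 0 and a % b == 0:
            candidates.add(a // b)   # only exact divisions allowed
        for c in candidates:
            out |= reachable(tuple(sorted(rest + (c,))))
    return out

# Five of the numbers from the example board:
print(333 in reachable((4, 5, 6, 7, 50)))  # True: ((50-4)*7)+5+6
```

The full six-number board takes noticeably longer to enumerate, and counting distinct results in this spirit is the same kind of bookkeeping that lies behind the count quoted above.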



Chaotic scattering: uncertainty and fractals from reflections

Any Chalkdust reader who has spent time playing billiards, snooker, or pool, will probably have some kind of feel for how the balls move around the table, how they bounce off each other, and how they bounce off the cushions. After a while, you just know instinctively how they should behave, at least in very simple situations. Reading this article is unlikely to improve your potting statistics; writing it has certainly not improved mine.

Specular reflection of a ray hitting the edge of a disc.

We can start by shrinking the cue ball to the size of a point, and imagining the object ball being squashed into a hard-edged circular disc that is so heavy that it does not recoil in a collision. The path of the incoming cue ball can be represented by a ray. Rays always follow straight lines between bounces, mimicking a ball rolling across a perfectly smooth horizontal table. Reflection of the incident ray by the disc is shown on the right. Draw a line from the disc’s centre through the point where the ray hits the circumference. That line is the normal, and the law of specular reflection tells us that angle out equals angle in, or in symbols, $\theta_\text{ref} = \theta_\text{inc}$.
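The law of reflection has a tidy vector form: if $\boldsymbol{d}$ is the incoming direction and $\boldsymbol{n}$ the outward unit normal at the hit point, the outgoing direction is $\boldsymbol{d} - 2(\boldsymbol{d}\cdot\boldsymbol{n})\,\boldsymbol{n}$. A small Python sketch for the unit disc, where the hit point on the circumference conveniently is the unit normal:

```python
import math

def reflect(d, n):
    """Specular reflection of direction d in the unit normal n:
    r = d - 2 (d . n) n, so that angle out = angle in."""
    dot = d[0] * n[0] + d[1] * n[1]
    return (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])

# A ray travelling along +x strikes a unit disc centred at the origin,
# hitting it 45 degrees up its left-hand side...
hit = (-math.cos(math.radians(45)), math.sin(math.radians(45)))
r = reflect((1.0, 0.0), hit)   # the hit point doubles as the unit normal
print(r)  # ...and bounces off travelling straight up: approximately (0, 1)
```

Chaining this rule, with straight lines in between, is all that is needed to compute the zig-zagging trajectories discussed below.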

To make things a bit more interesting, place identical discs at the corners of an equilateral triangle. That setup was first analysed in detail around three decades ago by physicist Pierre Gaspard and chemist Stuart Rice. Scientifically, their groundbreaking paper provided a paradigm for chaotic scattering in a plane, but we can think of it as just an abstract game of pinball. The three-disc system turns out to be really useful in applied mathematics for understanding billiard-type problems, while in physics it crops up in fields from laser optics to the kinetic theory of gases.

Formulating the problem

The three-disc arrangement is shown below. Each disc has a radius of 1 unit, the size of the gap between them is $\ell$, and the shaded area $\Omega$ is called the scattering region. Any incoming ray is defined by an incidence angle $\theta_0$ and a displacement $x_0$. We can select whatever values we like for $\theta_0$ and $x_0$, and will refer to the pair of numbers $(x_0,\theta_0)$ as the input.

An incident ray can generally end up doing only one of four things, which are the allowed outputs of the system. If the ray enters $\Omega$, it ricochets between the discs before eventually escaping through one of the three gaps. The fourth possible outcome is that the ray is reflected away early on and so never enters $\Omega$ at all. Sometimes, a ray may also remain trapped inside $\Omega$ forever! But that situation is so incredibly unlikely that we can discard it here. Our question to try and answer is: for a given input, what will the ray do?

The three-disc system with the gaps labelled 1 to 3 and colour-coded, with the red cross showing the origin of $(x,y)$ coordinates. The incident ray has initial position $x_0$ and angle $\theta_0$ (positive angles are measured in the clockwise sense).

The butterfly effect

The zig-zagging path of any ray (its trajectory) can be computed using straight lines and applying the law of specular reflection every time it hits a disc. Crucially, our scattering problem is deterministic because it is governed by mathematical rules in which there is absolutely no randomness whatsoever.

Given an input, classical physics demands that there must exist a unique output. Seems reasonable and obvious. What is not so reasonable and obvious is that a tiny change to the input can lead to a dramatic change in the output. This strange phenomenon—where a system can be susceptible to minuscule fluctuations—is known technically as sensitive dependence on initial conditions. The term ‘butterfly effect’ is perhaps more widely known, coined by mathematician and meteorologist Edward Lorenz in the early 1970s as a nod to the unpredictability he discovered in a toy model of the weather. It purports, only half-jokingly, that a butterfly flapping its wings in Brazil can set off tornadoes in Texas.

The butterfly effect appears in our scattering problem when we have, for example, two incoming rays starting from the same $x_0$ value but with slightly different $\theta_0$ values. The difference between their paths may become magnified through successive bounces, building up remarkably quickly until the two trajectories have diverged and no longer bear any resemblance to one another. Ultimately, we may even find that the two almost (but not quite!) identical inputs lead to completely different outputs.

A demonstration of the butterfly effect in the three-disc system. On the right, $\theta_0$ has been increased by 0.001° compared to the left.

Scattering that exhibits the butterfly effect is said to be chaotic. In common parlance, ‘chaotic’ is often used to convey disorder or perhaps even (perceived) randomness. Here, we are deploying the word in a scientific sense. While it might sometimes look random, the three-disc system cannot do anything except behave deterministically. It also hides some very intricate and very ordered structure that can be seen if we look in the right place$\dots$

Exit basins and their properties

Ideally, we would like to relate a whole range of inputs to their corresponding outputs in one go. A nice way to do that is to think of $x_0$ and $\theta_0$ as labelling the horizontal and vertical axes, respectively, in a plane (just like the $x$–$y$ axes in coordinate geometry). Let us impose a square grid on a section of that $(x_0,\theta_0)$ plane, where all the points represent different initial conditions. For each point in turn, the outcome of the computation is recorded. The collection of outputs is then overlaid on top of the grid, and colour-coded according to our three-disc system in figure 1 to create a kind of map.


An example is shown below when the gap between the discs is relatively small. The map answers our earlier question in a very direct and effective way. It tells us exactly what a ray does as a function of starting point $(x_0,\theta_0)$, and even cuts out all the unnecessary information about individual trajectories. But a single map is just the tip of a giant iceberg, and many more questions now follow.

Figure 3: Exit basins as the gap width gradually increases (top to bottom: $\ell = 0.07$, $0.09$, $0.11$ and $0.13$); the plane of initial conditions $(x_0,\theta_0)$ in the left-hand column is $[-0.1,0.1] \times [-10°,10°]$.

We can see that the $(x_0,\theta_0)$ plane is divided into four coloured regions, known as exit basins, which identify a unique output for a given input—white for gap 1, red for gap 2, black for gap 3 (initial conditions giving rise to trajectories that never enter $\Omega$ are shown in grey). The boundaries of these basins form striated patterns that are intertwined in extremely complicated ways. An important feature of the map is its self-similarity, where zooming-in on any portion of a boundary region uncovers smaller-scale substructure which looks like the pattern as a whole. Self-similarity persists all the way down to arbitrarily-fine length scales, and that property is typically one of the signatures of a mathematical fractal.

It has been known since the mid 1990s that the basins can also possess the infinitely-mindbending Wada property which, in visual terms, means the boundary between two colours never quite forms. Amazingly, the remaining colours somehow always nestle in. Wada says that we can never jump from a black region to a red region without also jumping across a white region, and the same is true for the other black-red-white permutations. More subtly, any jump over a boundary necessarily involves crossing all three colours an infinite number of times!

As an analogy to the Wada property, think about the points on a number line. Any number is either rational (expressible as a ratio of integers, like $1/2$ or $3/4$) or irrational ($\pi$ or $\sqrt{2}$ are the usual suspects). We cannot jump from $1/2$ to $3/4$ without necessarily jumping over all the numbers in between, both the (countably-infinite) rationals and the (uncountably-infinite) irrationals. The same idea would also hold if we were to jump between any two irrational numbers instead.

It turns out that the basins vary with the gap width $\ell$ (see below). For instance, the map takes up more of the $(x_0,\theta_0)$ plane as $\ell$ increases. That result makes physical sense: we would expect the scattering region to be accessible from a wider range of initial conditions as the gaps become bigger. We can also see, qualitatively, that the pattern complexity reduces as $\ell$ increases since the density of the striations (loosely speaking, the number of stripes per unit area of boundary) drops off as the gaps expand.

The map looks the same when rotated $180^\circ$ with black and white swapped.

The striations in the maps look quite linear, especially when magnified. However, some curving can be seen if we look over a wider range of the $(x_0,\theta_0)$ plane than shown here, particularly when the gap size increases.

There is also a fundamental physical principle determining how the stripes must fit together (perhaps you spotted it earlier). Look on the right: if the map is rotated by $180^\circ$ and the black and white areas are swapped over, it still looks the same! This is a consequence of the fact that there is no difference between gaps 1 and 3. Our three-disc system has a line of mirror symmetry along the $y$-axis, and so the trajectories from $(-x_0,-\theta_0)$ and $(x_0,\theta_0)$ are necessarily mirror images of each other.

Uncertainty and unpredictability

So far, we have found that the butterfly effect can play a key role in scattering, and that basins become less finely structured as the gaps increase in size. Far from being independent, these attributes are two sides of the same coin. One way to establish their connection is through the concept of uncertainty, thinking more carefully about what it means for a deterministic system to have an uncertain outcome. How can that apparent contradiction be allowed by physics?!

Let us start with an arbitrary initial condition, say $\boldsymbol{x}_0 \equiv (x_0,\theta_0)$, and from that point define two nearby initial conditions, $\boldsymbol{x}_{0+\varepsilon} \equiv (x_0+\varepsilon,\theta_0)$ and $\boldsymbol{x}_{0-\varepsilon} \equiv (x_0-\varepsilon,\theta_0)$, where $\varepsilon$ is a small positive number satisfying $\varepsilon \ll 1$. Equally, we could consider $(x_0,\theta_0+\varepsilon)$ and $(x_0,\theta_0-\varepsilon)$; it makes no difference to what follows. If the trajectories from all three starting points lead to the same outcome, then the input $\boldsymbol{x}_0$ is not susceptible to the butterfly effect when disturbances are of size $\varepsilon$.

We can quantify the uncertainty in a fixed region of the $(x_0,\theta_0)$ plane. Select at random a large number $N$ of $\boldsymbol{x}_0$ points within that region, and let $N_\varepsilon$ be the number of those points which exhibit the butterfly effect at the scale $\varepsilon$. Then, the fraction of our initial conditions with uncertain outcomes is given by the power law ${f_\varepsilon \equiv N_\varepsilon/N \sim \varepsilon^{2-D}}$. The pure number $D$ is the uncertainty fractal dimension. Smooth boundaries are associated with $D = 1$, a value that coincides exactly with the dimension of a typical line in Euclidean geometry (where points are 0D, lines are 1D, areas are 2D, and volumes are 3D). Accordingly, $f_\varepsilon$ scales with $\varepsilon^{2-1}=\varepsilon$, so that halving $\varepsilon$ halves the fraction of initial conditions with uncertain outcomes.

Fractal boundaries are those with $1 < D < 2$, where $D$ is non-integer and lies somewhere between the Euclidean dimensions of a line and an area. We can think of such boundaries as becoming increasingly rough and irregular (sometimes called area-filling) as $D$ approaches 2. In those cases, $f_\varepsilon$ scales with $\varepsilon$ to a power that drops towards zero as $D \rightarrow 2$. The crux of the matter is that for fractal boundaries, the proportion of uncertain points, $f_\varepsilon$, can fall off relatively slowly even for big reductions in $\varepsilon$.
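To get a feel for these scalings, here is a minimal Python sketch (mine, not from the article) of the power law $f_\varepsilon \sim \varepsilon^{2-D}$: it computes the factor by which $f_\varepsilon$ shrinks when $\varepsilon$ is halved, for a few values of $D$.

```python
# Sketch of the power law f_eps ~ eps^(2-D): the factor by which the
# fraction of uncertain outcomes changes when eps is multiplied by `factor`.
def uncertain_fraction_ratio(D, factor=0.5):
    return factor ** (2 - D)

for D in (1.0, 1.5, 1.92):
    # When D is close to 2, halving eps barely dents f_eps.
    print(D, round(uncertain_fraction_ratio(D), 3))
```

For a smooth boundary ($D=1$), halving $\varepsilon$ halves $f_\varepsilon$; for $D=1.92$, halving $\varepsilon$ only shrinks $f_\varepsilon$ by about 5%.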

Uncertain outcomes are associated with circle $\Sigma^\prime$ (which overlaps a basin boundary) but not with circle $\Sigma$.

An intuitive way to visualise unpredictability is to consider a small circle $\Sigma$ with radius $\varepsilon$ in the $(x_0,\theta_0)$ plane, as on the right. The centre of that circle of uncertainty is placed on the ‘ideal’ initial condition, $\boldsymbol{x}_0$. However, if we know $\boldsymbol{x}_0$ only to within an error $\varepsilon$, the ‘true’ initial condition may lie anywhere in the neighbourhood $\Sigma$ around $\boldsymbol{x}_0$. If $\Sigma$ contains only one colour, the outcome is independent of the finite accuracy. Uncertainty appears whenever $\Sigma$ impinges on a basin boundary, in which case all colours are contained and the outcome cannot be predicted. That is a recipe for a deterministic system to behave unpredictably, but in such a way that our faith in physics (thankfully) remains intact.

After a quick calculation, we find the following formula relating $D$ to the way in which $f_\varepsilon$ varies across the scales: \[ D = 2 - \frac{\mathrm{d}\,\log_{10}f_\varepsilon}{\mathrm{d}(\log_{10}\varepsilon)}. \]

Computed log-log plots for the three-disc system using $N = 2^{20}$ initial conditions. For the five values of $\ell$ considered, straight-line fits have slopes corresponding to uncertainty dimensions $D \approx$ 1.92, 1.89, 1.86, 1.84, and 1.81, respectively.

The slope of the log-log graph is thus a crucial piece of information, and it needs to be obtained by curve fitting. A set of graphs is shown on the left for the centre panes of the exit basins in figures 2 and 3, with $\varepsilon$ varying across six orders of magnitude. For $\ell = 0.05$, we find $D \approx 1.92$, while for $\ell = 0.13$, we find $D \approx 1.81$. The largest (smallest) uncertainty dimension occurs for arrangements with the narrowest (widest) gaps, and the conclusion we can rightly draw is that the smaller the gap, the greater the sensitivity to initial fluctuations.
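The fitting procedure itself is easy to sketch in code. Below is a hypothetical Python version: the data here are an exact synthetic power law with $D = 1.9$, not the article's computed values, which come from counting uncertain initial conditions at each scale.

```python
# Sketch: estimate the uncertainty dimension D from (eps, f_eps) pairs
# via a least-squares straight-line fit in log-log space, D = 2 - slope.
import math

def uncertainty_dimension(eps_values, f_values):
    xs = [math.log10(e) for e in eps_values]
    ys = [math.log10(f) for f in f_values]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return 2 - slope

# Synthetic data: an exact power law f = eps^0.1, ie slope 0.1 and D = 1.9.
eps = [10 ** -k for k in range(2, 8)]
f = [e ** 0.1 for e in eps]
print(uncertainty_dimension(eps, f))  # ~1.9
```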

But what does it all mean more generally? It means something quite profound: systems associated with larger values of $D$, irrespective of their physical nature, have more complicated basin boundaries. So in a sense, the fractal dimension becomes a convenient yardstick for comparing the susceptibility of different systems to the butterfly effect.

Concluding remarks

We’ve applied some very deep ideas—ideas that have come to play a pivotal role in modern understandings of physics—to a toy scattering problem. But the bigger picture to think about is that simplicity can very often be deceptive. Uncertain outcomes are present whenever finite precision in our knowledge of initial conditions becomes important, not just here but pretty much everywhere. Perhaps one of the most important scientific discoveries of the 20th century is that unpredictability, so beautifully encoded within fractal basin boundaries, is in the DNA of physical law.

If you’re still not convinced, or if you want to amaze people with all this stuff, take four silvered spheres such as Christmas tree decorations and a bit of Blu-Tack or glue (see How to make). Form a tetrahedron with the decorations, place it on top of a mirror, and surround it on three sides with three pieces of cardboard (each of a different colour). Now look into the gaps of the tetrahedron. What you’ll see is chaotic scattering in action, and an impressive fractal pattern made possible by the law of specular reflection. You can see an example in the banner image at the top, and read more about it in the article Christmas Chaos by Windell Oskay.

I always derive great pleasure from seeing simple systems behave in complicated ways (the greater the contrast, the better). And it is inevitably mathematics that allows us to unpack what is going on. The fact that even the most trivially-familiar laws of Nature can conspire to create such complexity should be a source of wonder and inspiration to us all—not just Chalkdust readers!

post

In conversation with Trachette Jackson

Michigan. Image: Wikimedia commons user Wapcaplet, CC BY-SA 3.0.

Oncology, the study of cancer, is just one of many specialisms which increasingly employ the predictive power of applied mathematics. This issue we chat with Trachette Jackson, professor of mathematics at the University of Michigan, to learn about the surprising effectiveness of mathematics in the treatment of cancer, as well as to hear about her own journey into mathematical oncology.

Modelling in medicine

Cancer cells. Image: Public domain.

Trachette starts by bringing us up to date on how mathematics has been used in cancer treatment. “The mathematical approach has been applied to just about every aspect of tumour growth, starting decades ago.” One aspect is cancer therapeutics: “We write down equations that describe the mechanism of action of new drugs and how the tumour responds, then make predictions about how best to deliver those drugs.” Another aspect is more fundamentally biological, such as how cells are transformed: “You can find mathematical equations about the probabilities of acquiring mutations and under what circumstances a tumour forms, as well as what the composition of that tumour will be and how many cells in that tumour will have these different mutations.” This sort of modelling allows us to diagnose or assess the risk of cancer developing, as well as treat it.

post

How to cheat at cards

As a child I didn’t want to be a mathematician—I didn’t even know that was a job. Instead, for a short time, my dream was to be a crime-fighting card cheat. My interest arose from the TV series Sword of Justice where the main character obtained some help from card cheats in his quest to take down the despicable crime syndicate that had framed him and sent him to prison. (The opening credits on YouTube will attest to the programme’s brilliance.)

Despite the incompatibility of combining crime-fighting and card cheating, I told my family that that was to be my chosen career. Unfortunately, I wasn’t very good at it, usually announcing my intentions to cheat before offering to play a quick game with someone. Nonetheless, to this day my family refuse to play cards with me—even Uno.

Along the way, I did learn a thing or two though. So allow me to explain to you an impressive trick that all the best card cheats know and that has some interesting mathematics hidden behind it.

The perfect shuffle

Every card cheat needs to be able to give the appearance of shuffling a deck of cards so that it remains in order. However, there is an entirely legitimate shuffle that allows us to cheat: the perfect shuffle. It mixes the cards but in a perfectly known and determined way.

A perfect (out) shuffle.

A standard deck has 52 cards. Split them into two equal piles and then interweave the piles so that the new order of cards alternates between the two piles.

The effect on the deck is shown on the right, where the cards are labelled 0 to 51 for reasons which will become clear later.

Instructions on how to do the shuffle are given below. Admittedly, it is not easy and plenty of practice is required, but it is satisfying once learned and has an impressive ‘wow!’ reaction when performed.

Eight perfect shuffles

What can we do with the perfect shuffle when we have learned it? First we have an interesting—and hopefully surprising—fact:

Perfect shuffle a deck eight times. It is now back in the order you started with!

Modular arithmetic is also used by clocks: they count the time mod 12.

With this you can perform a surprising trick. Put the deck in new deck order. If you perfect shuffle it five times, then the deck looks shuffled. There are patterns if one looks closely but you can show it to someone and with a straight face claim it is mixed up. Do the perfect shuffle three times for them—so now you have done it eight times in total—and to your spectator’s bafflement the deck is suddenly in order!
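If you don't trust the cards (or your hands), the eight-shuffle fact is easy to check in a few lines of Python. This is an illustrative sketch, not part of the original trick:

```python
# Sketch: verify that eight perfect (out) shuffles restore a 52-card deck.
def out_shuffle(deck):
    """Interleave the two halves so that the top card stays on top."""
    half = len(deck) // 2
    return [card for pair in zip(deck[:half], deck[half:]) for card in pair]

deck = list(range(52))        # cards numbered 0 to 51, top to bottom
d = deck
for i in range(8):
    d = out_shuffle(d)
    print(i + 1, d == deck)   # only the eighth shuffle gives True
```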

How can we use mathematics to prove this surprising fact? We start by numbering our cards. As above, from top to bottom, we shall number them 0 to 51, rather than the usual 1 to 52.

Next we use modular arithmetic. Arithmetic modulo $n$ is where we consider all numbers that have the same remainder after division by $n$ to be equivalent numbers. So $5=17 \mod 4$ means that 5 and 17 are equivalent modulo 4 as they both have remainder 1 after division by 4. Sometimes people write $5\equiv 17 \mod 4$ rather than using the equals sign.
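In most programming languages, arithmetic mod $n$ is the remainder operator; here is a one-line Python check of the example above:

```python
# 5 and 17 both leave remainder 1 after division by 4, so 5 = 17 mod 4.
print(5 % 4, 17 % 4)  # 1 1
```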

With our 0 to 51 numbering the formula for finding where a card goes to after a perfect shuffle is relatively easy:

(Except for card 51) the card at position $x$ goes to position $2x \mod 51$.

For example, card 7 goes to $ 2 \times 7 \mod 51 = 14 \mod 51$, ie position 14. Note that this is the 15th physical card in the deck as we started counting at 0.

Similarly card 35 goes to \begin{align*} 2 \times 35 \mod 51 &= 70 \mod 51 \\ &= 51 + 19 \mod 51\\ &= 19. \end{align*} That is, card 35 goes to position 19.

Note that the card 7 gets lower in the deck and card 35 gets higher. So immediately we see that a perfect shuffle is beginning to mix the cards.

The formula has the exception of card 51. It is easy to see in the picture on the previous page that this card does not move and so we don’t have to worry about it in the formula—we will ignore it from now on.

But why does this $x\mapsto 2x \mod 51 $ result hold? Consider first the top half of the deck, ie all the cards in positions 0 to 25. During the shuffle, the two cards in a pair of consecutive cards get a card between them. So the new positions of the top half cards will be multiplied by 2. Since we can have at most $2\times 25=50$, which is less than 51, we get $x\mapsto 2x\mod 51$ since $2x$ is in this case actually equal to $2x\mod 51$ (rather than just congruent).

For the bottom half cards we see that their positions should also be multiplied by 2 since they get cards between them. However, we note that card 26 (the top card of the bottom half of the deck) needs to go to position 1 (the second physical card position!). One way to do this is to take mod 51 (since $2\times 26 \mod 51 = 1$). It is easy to check that we can use this for all the other cards.
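We can sanity-check the $x\mapsto 2x \bmod 51$ rule by building the shuffled deck explicitly. A short self-contained Python sketch (variable names are my own):

```python
# Sketch: check the x -> 2x mod 51 rule by constructing the interleaved deck.
deck = list(range(52))
half = 26
shuffled = [0] * 52
for i in range(half):
    shuffled[2 * i] = deck[i]             # top-half card i lands at position 2i
    shuffled[2 * i + 1] = deck[half + i]  # bottom-half card 26+i lands at 2i+1
for x in range(51):                       # card 51 is the exception: it stays put
    assert shuffled.index(x) == (2 * x) % 51
print(shuffled.index(7), shuffled.index(35))  # 14 and 19, as computed above
```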

Now let’s see what happens when we do eight perfect shuffles on a particular card, say card 12. We just repeat the process that the card in position $x$ will go to position $2x \mod 51$: \[ \begin{array}{lccrcr} {\text{once}} & 12 & \to & 24 \mod 51 & & \\ {\text{twice}} & 24 & \to & 48 \mod 51 & & \\ {\text{three times}} & 48 & \to & 96 \mod 51 & =& 45\mod 51 \\ {\text{four times}} & 45 & \to & 90 \mod 51 & =& 39\mod 51 \\ {\text{five times}} & 39 & \to & 78 \mod 51 & =& 27\mod 51 \\ {\text{six times}} & 27 & \to & 54 \mod 51 &=& 3 \mod 51 \\ {\text{seven times}} & 3 & \to & 6 \mod 51 & & \\ {\text{eight times}} & 6 & \to & 12 \mod 51 &=& 12 \mod 51. \end{array} \] Of course, that doesn’t prove that this happens for the whole deck. I may have been cheating by picking one that worked—and you should never trust a card cheat! So let’s do the calculation for a general $x$. If we repeat the $2x\mod 51$ operation we get $2(2x)\mod 51$. Continuing in this way for the full 8 times we see card $x$ goes to position $2^8x\mod 51$. Simplifying mod 51 we get \begin{align*} 2^8 x \mod 51 &= 256 x \mod 51 \\ &=\left( (5\times 51) x + x \right) \mod 51\\ &=\left( 0 + x \right) \mod 51\\ &=x. \end{align*}

Hence, we have proved that eight shuffles move the card in position $x$ back to position $x$. If we have a different number of cards in our deck, then the number of perfect shuffles needed to return the deck to its original order varies. For example, a deck with 50 cards needs 21 perfect shuffles. The number of shuffles required for the deck is called the order. The following table shows the order of the perfect shuffle for a selection of decks where $N$ denotes the number of cards:

$N$ order
4 2
6 4
8 3
10 6
12 10
14 12
16 4
18 8
50 21
52 8
54 52

The word order arises due to a connection with group theory and the order of elements of a certain group. The details of this and many more hardcore mathematical facts can be found in Magic Tricks, Card Shuffling and Dynamic Computer Memories by S Brent Morris: one of its appendices is a list of all orders for $N$ up to 200.
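The orders in the table can be reproduced by brute force. For an even deck of $N$ cards the out shuffle acts as $x \mapsto 2x \bmod (N-1)$, so its order is the smallest $k$ with $2^k = 1 \bmod (N-1)$; a quick Python sketch:

```python
# Sketch: reproduce the order table. The order of the out shuffle on an even
# deck of N cards is the multiplicative order of 2 mod N-1.
def shuffle_order(N):
    k, power = 1, 2 % (N - 1)
    while power != 1:
        power = (2 * power) % (N - 1)
        k += 1
    return k

for N in (4, 6, 8, 10, 12, 14, 16, 18, 50, 52, 54):
    print(N, shuffle_order(N))   # matches the table above
```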

Binary fun with in and out perfect shuffles

Up to this point I have talked of a perfect shuffle, ie singular. In fact, there are two types:

  • A perfect out shuffle: the top card remains the top card.
  • A perfect in shuffle: the top card becomes the second card.

A perfect in shuffle.

The perfect out shuffle is just what we have been calling the perfect shuffle. The in shuffle is the new one. Its effect on deck order can be seen below—the top card ends up ‘in’ the deck.

By combining out and in shuffles we can move the top card to anywhere in the deck. And, as we shall see, from a mathematical perspective we get a surprising appearance of binary.

Alex Elmsley (1929–2006) was a mathematician and magician. He studied physics and mathematics at the University of Cambridge before becoming interested in computers. Among magicians he is known for inventing a convincing false count of cards which is now known as the Elmsley Count.

Elmsley was very interested in the perfect shuffle (which among magicians is also known as the Faro shuffle) and investigated the mathematics behind it. In particular, he found a method for moving the top card to any position in the deck through a sequence of in and out shuffles.

Let us suppose we wish to move the top card to position $k$ in the deck. (We keep the numbering of the cards from 0 to 51 so the top card is in position 0.) To move the card via a sequence of in and out shuffles we proceed as follows:

Translate $k$ to binary. Then do out and in shuffles where $\text{out} = 0$ and $\text{in} = 1$.

Note how brilliant it is that ‘o’ looks like 0 and ‘i’ looks like 1.

For example, suppose we want to move the top card to position 13. Well, 13 is written as 1101 in binary. So we do ‘i i o i’: in in out in.

Moving the top card to position 13.

So why does this work? Imagine we have a card in position $x$ in the top half of the deck. (The case of the bottom half is similar and is, of course, left as an exercise.) What does an out shuffle do to the binary representation of the number? Since an out shuffle is $x\mapsto 2x \mod 51$ it just adds a zero to the representation. So, a card at position 101 moves to 1010. This is no different to when we tell children that to multiply by 10 they can just place a 0 at the end.

Now, an in shuffle is given by \[ x\mapsto 2x +1 \mod 53 . \] That is, multiply by 2, add 1 and take mod 53 (note, not 51 this time). This can be checked in the same way as for the out shuffle: for instance, card 26—the top card of the bottom half—must become the new top card, and indeed $2\times 26+1 = 53 = 0 \mod 53$. For a binary representation this just adds a 1 to the end. Thus, the card in position 101 moves to position 1011.

To put this all together, consider what happens in the above example where we move the top card to position 13. That is, we apply in in out in and track the progress of the initial top card. The first in shuffle moves the top card to position 1 in the deck. The second in shuffle moves it to 11. The out moves it to 110 and the last in moves it to position 1101.
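Elmsley's method is also straightforward to simulate. In this Python sketch (function names are my own), each 0 bit triggers an out shuffle and each 1 bit an in shuffle, and we check where the original top card ends up:

```python
# Sketch of Elmsley's trick: write k in binary, do an out shuffle for each
# 0 and an in shuffle for each 1; the top card then lands at position k.
def out_shuffle(deck):
    half = len(deck) // 2
    return [c for pair in zip(deck[:half], deck[half:]) for c in pair]

def in_shuffle(deck):
    half = len(deck) // 2
    return [c for pair in zip(deck[half:], deck[:half]) for c in pair]

def move_top_card(k, N=52):
    deck = list(range(N))
    for bit in format(k, "b"):   # eg 13 -> "1101" -> in in out in
        deck = in_shuffle(deck) if bit == "1" else out_shuffle(deck)
    return deck.index(0)         # final position of the original top card

print(move_top_card(13))  # 13
```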

OK, so this is an unexpected and charming appearance of binary but it only moves one card. That’s useful, but let’s see how we can move more cards and cheat in a real game.

Dealing four aces

An impressive bit of cheating is to deal yourself four aces in a game of poker—that’s a hand that is hard to beat. For our version we need a game with three other players. Take out the four aces and place them on top. A card cheat can do this secretly with ease. Now, do two perfect out shuffles. Deal the cards starting with you, ie go round in a circle giving each person a card until everyone has five cards each. You now have a hand with the four aces!

The reason that this works is simple. At the start, the pack (starting at the top) looks like

AAAAxxxxxxx…

where A denotes an ace and x denotes an indifferent card (whose value is unimportant).

Two perfect out shuffles give you all the aces (green).

Do the first shuffle and indifferent cards get placed between the aces. So we get

AxAxAxAxxxxxxx…

Do the second shuffle and we get a set up ready for a good hand when there are three other players:

AxxxAxxxAxxxAxxxxxxx…

Well, of course, if you are going to cheat and win all the time, people are going to notice. To dodge this, professional card cheats would often work in pairs, with the winning alternating between the two, reducing the chances of discovery. So in this poker game of four players you need an accomplice. Suppose the four of you are sitting in a circle, with you dealing first to the person on your left and then going clockwise.

AAAA!

Wherever your accomplice sits you will be able to deal them the four aces. As before, arrange the four aces on top. Suppose your partner is sitting at position 3. You need to deal them the third card—the card at position 2 in our numbering. Let’s move that top ace to position 2. In binary the number 2 is 10 so we do ‘io’ perfect shuffles. Hey presto, the top ace is now in the right place and furthermore so are the other aces as the two shuffles put three cards between each of them. A normal deal now will give your partner-in-crime all four aces. Let’s see that in a diagram:

start: AAAAxxxxxxx…
in shuffle: xAxAxAxAxxxxx…
out shuffle: xxAxxxAxxxAxxxAxxxxx…

Wherever your partner sits you can do the right sequence of in and out shuffles to ensure that they receive the aces!
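The whole routine can be simulated too. In this illustrative sketch (cards 0 to 3 stand in for the aces; the `deal` helper is my own), we check both the self-deal and the accomplice version:

```python
# Sketch: simulate the four-ace routine; cards 0-3 play the aces on top.
def out_shuffle(deck):
    half = len(deck) // 2
    return [c for pair in zip(deck[:half], deck[half:]) for c in pair]

def in_shuffle(deck):
    half = len(deck) // 2
    return [c for pair in zip(deck[half:], deck[:half]) for c in pair]

def deal(deck, players=4, cards_each=5):
    """Deal round the circle: player i receives positions i, i+4, i+8, ..."""
    return [deck[i::players][:cards_each] for i in range(players)]

# Dealing to yourself: two out shuffles put the aces at positions 0, 4, 8, 12.
my_hands = deal(out_shuffle(out_shuffle(list(range(52)))))
print(my_hands[0])       # contains all four aces

# Accomplice in seat 3 (position 2): 2 is 10 in binary, so do 'i' then 'o'.
their_hands = deal(out_shuffle(in_shuffle(list(range(52)))))
print(their_hands[2])    # contains all four aces
```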

If you wish to know more about card shuffling and mathematics, then the best book to read is (as mentioned already) Magic Tricks, Card Shuffling and Dynamic Computer Memories by S Brent Morris. And remember, card cheating is a crime—so don’t get caught!

How to do the perfect shuffle

Splitting the deck. Image: C Houston

The perfect shuffle requires cards in good condition so buy a new deck on which to learn and don’t use it for anything else—cards buckled in card games will not weave very well. There are various methods to achieve a perfect shuffle. This version is done ‘in the hands’ but some can be done on the table and don’t look in any way suspicious.

Squaring the cards. Image: C Houston

Hold the lower half of the deck in your left hand with fingers arranged around it as above. The right hand has a similar grip but does not go all the way round, and the index finger sometimes contacts the top of the deck, sometimes not.

Butt the cards together to square them up. If the cards are not square, then perfect shuffling is impossible. Press the halves together, with the right index finger touching both top cards so that the halves can be put in position. For an out shuffle the right hand part should be slightly higher so that the left hand top card will go under the right hand top card at the end of the shuffle. For an in shuffle the right hand part needs to be lower.

A perfect shuffle. Image: C Houston

Remove the index finger and begin applying pressure at the bottom corner where the two packs meet. Once interweaving has started let the deck ‘do the work’. It took me many weeks to be able to do this consistently so don’t be discouraged if you can’t do it straight away. Like many skills it needs practice!