In conversation with Christina Pagel

This interview was conducted on 11 August 2020, and our discussion of the pandemic reflects this.

Given everything we read in the news during this pandemic, it is no longer a surprise to anyone that maths plays a crucial role in solving problems that affect our daily lives. This has been thrown into the spotlight recently, with mathematical modellers advising government policies across the world and statisticians holding the key to decoding the chaos of pandemic data; but it has been going on behind the scenes for quite some time. Operational research (OR) is the branch of applied maths dedicated to using maths to make better decisions, and it can be applied to almost any field. If that sounds vague, fret not, because we sat down with Christina Pagel, professor of operational research and director of the Clinical Operational Research Unit (CORU) at UCL, and a member of the Independent Sage (Scientific Advisory Group for Emergencies) committee, to clear up exactly what it entails.

“Operational research is a really applied branch of maths, and you can use any kind of maths, as long as you’re answering a real world problem.” But some maths is more typical of operational research than others. For example, queueing theory is the mathematical theory behind modelling queues and making them more efficient, ie deciding who gets served in what order. Another classic of OR is optimisation, which is choosing how to allocate resources given certain constraints and goals, such as minimising costs or maximising profit. “That’s used everywhere from transport, health care, emergencies… The travelling salesman is a really well-known optimisation problem—how do I visit these destinations in the shortest time possible?” Chalkdust would like to apologise to readers for any distress caused by the reminder of their decision maths A-level module.

To save you from digging out your old lecture notes: a Poisson distribution describes events that occur independently at a fixed rate, while an exponential distribution is the probability distribution of the time between Poisson events.

Data analysis is crucial too. “How do we use the data that people have to help them make decisions? That’s a big branch of operational research.” And of course, as with most branches of maths these days, simulations play a big role. “Say in queueing theory, it’s fine as long as you have Poisson arrivals and exponential service times, but once you get to real life and see that actually you have this funky algorithm for choosing who gets served and how long it takes, then you start having to use simulation because you just can’t solve it analytically.”

But the field is very problem-focused, and for Christina the maths is of only secondary interest to the questions themselves. “I’ve become less interested, as I become older and more senior, in the novelty or difficulty of the mathematics and much more interested in the problem.” And it shows—she has worked on more problems than there are maths puns in an issue of Chalkdust.

Paediatrics, politics, periods…

Great Ormond Street hospital, c~1872. Image: Welcome Collection CC BY 4.0

As director of CORU at UCL, Christina focuses on operational research applied to healthcare. She recently held a position as researcher in residence at Great Ormond Street hospital (GOSH), a children’s hospital in London, helping them solve problems like predicting how many beds they will need, or when the children’s respiratory disease peak will be. The peak is about a month and half earlier than the adult flu peak, and Christina built a model for GOSH to let them know when it begins. This is crucial to know because when it comes, “demand will double very quickly and they don’t have more capacity.”

This is only the tip of the iceberg in regards to all the problems healthcare needs mathematicians to solve. Another classic operational research problem that CORU has worked on recently is investigating the ideal placement of the UK’s 11 specialised ambulance services which transport sick children from local hospitals to paediatric intensive care hospitals, as well as the possibility of changing the number of ambulances at each location. “We also do simple models of vaccination programs for the [UK government’s] Department of Health and Social Care. If I’m introducing a new vaccine, what is the impact of other vaccines? How many times do I have to vaccinate? That involves a mixture of theoretical modelling and then data analysis. Sometimes we mix them together, like in queue models for how many health visitors you need to serve a certain community with certain needs, which is something we’re doing right now for instance.”

If you can’t articulate what you’re trying to do, then all of your solutions for getting there are meaningless.

For even further evidence of the infinite set of problems mathematicians are in demand to solve (as if we needed it), Christina tells me she began a fellowship in the US in 2016 to study their healthcare system, but unexpectedly found herself more useful in political science. “Within about two months of me getting there, Trump was elected. And it became really clear that he was going to try to repeal Obamacare. He failed, but I didn’t know that at the time.” She felt there was no point working to improve a health system that was about to be upheaved, but she saw politicians arguing about Obamacare and she realised that she had a unique perspective on how to understand their feuds.

“I thought, ‘Do we understand what the goal is in the situation?’ That’s a classic operational research point of view. If you can’t articulate what you’re trying to do, then all of your solutions for getting there are meaningless.” She devised a survey for politicians to understand what their goal was. The survey had thirteen possible goals that were developed with a focus group of serving politicians and academic health policy experts, such as ‘improve health’, ‘reduce costs’ and ‘reduce inequality’, and participants were asked to rank them on importance. Using a voting system, she was able to give the items an ‘overall’ ranking, and used a stats technique for plotting multidimensional priorities to see how people on different parts of the political spectrum felt about healthcare. “It wasn’t anything particularly sophisticated, but they just hadn’t ever done that!” What was common sense to Christina was a completely new way of looking at the problem to political scientists.

For Christina, applied maths goes far beyond merely applying maths. She’s an interdisciplinary science communicator able to turn her hand to everything from politics to physics to biology. Image: Reproduced with permission from Christina Pagel

The methodology was a triumph in and of itself—perhaps fortunately, since the more challenging task of showing that politicians agreed on the goals didn’t transpire quite as planned. “I thought everyone would say improving health is the most important thing. But actually, improving health was only most important for Democrats, and second most important was reducing inequalities and improving access to healthcare. Whereas, for Republicans, the most important thing was reducing costs, and the second most important thing was reducing the involvement of government in healthcare, which to me was really bizarre, but that was important for them. Improving health came fifth out of thirteen, and last was reducing inequality.” But even though they could not agree, the survey still clarified exactly why they couldn’t agree. “It’s really helped them understand how they can talk to each other. For instance, if you’re a Democrat, and you want to push a policy because it reduces inequality, to your Republican colleagues that’s not the angle you use, you have to explain how it reduces costs.”

She is now working on a project looking at women’s period pain. “It’s not really my expertise, but if no one else is going to do it, then I’ll do it.” She is working with the Health Foundation to look at GP records to quantify the problem, which she hopes will convince medical researchers to give the issue serious attention. “80% of women at some point in their lives suffer from really bad period pain, and about 20% have some years of their life where actually it’s debilitating for two or three days a month. People have just found a way to live with it, when you shouldn’t have to live with that—why should you have to live with that? So we’re now trying to take it further and make it into a bigger project.” Picking up a problem wherever she sees one to solve is rather a habit of hers it seems. Of course, healthcare has had one particularly big problem to solve recently.

…and pandemics

Well, we had to talk about it eventually.

Operational research has played a crucial role in managing the pandemic from the beginning, and Christina laments that even better use of it has not been made. “There are loads of places operational research could have helped [the UK government] to do better.” An obvious issue is distribution of PPE (personal protective equipment), but there are many examples. “For instance, 30% of people with Covid-19 in intensive care units (ICUs) had kidney failure, so the whole country ran short of renal medicine, and that had knock on effects on people receiving dialysis.” When medicine is in short supply like that, how should it be distributed and prioritised? “How do you decide how many ICU beds you need when you’re reorganising hospitals? How do you decide how many emergency hospitals you need, like the Nightingales? All of that is OR. Even things like oxygen supply—Covid leaves so many people on supplemental oxygen that hospitals were running short, so how do you manage that? Because if you run out, then everyone in the hospital who needs oxygen is screwed which you obviously don’t want.”

Christina has been playing her part as a member of Independent Sage—or, as she affectionately calls it, “indie Sage”—a group of scientists who produce independent advice on the UK’s handling of the pandemic, to challenge and analyse that given by the government’s official scientific advisory group, Sage. Although initially she was expecting to be doing operational research, it became more a public communication of science role. It turns out this is something she excels at. “Because I’ve been working across disciplines—clinicians, patients, people in the government, local commissioners—I’ve had to always try to explain things to lots of different types of people. That’s been really helpful in indie Sage, in that I’m not in a silo.” She now does weekly YouTube briefings (on the indie_SAGE channel) breaking down where we are at with the pandemic and collating government data from countries around the world, and reasonably regularly appears on TV and radio explaining the latest numbers.

I’ve been working across disciplines—clinicians, patients, people in the government, local commissioners—I’ve had to always try to explain things to lots of different types of people.

Independent Sage believes the UK government should be aiming to achieve elimination of Covid. “There’s a technical difference between elimination and eradication. Eradication is what we’re trying to do to polio and what we did to smallpox, but elimination is what New Zealand did, which is zero community transmission.” This would mean the virus can only enter the country via travellers, which Christina says could hopefully be handled with effective test and trace, and quarantine. “And once you’ve done that, you can go back to normal life! Masks, social distancing, you don’t have to worry about that stuff.” Critics of the strategy say it is simply unachievable. “But it’s not saying you’re never going to get a case. Small outbreaks are much easier to stamp down. It’s like in my house, I have a zero fire policy, I’m not going to let any fire come out, and if it does I’ll put a tea towel over it. We’re stuck in this limbo where you can open mostly but not completely, and if you relax when you haven’t got it down far enough it goes out of control. We’re saying get it down far enough, and you do that through really, really good contact tracing. You have to break the chain of transmission, that’s what South Korea did, that’s what China did.” Unfortunately, between speaking with Christina and writing this article, it’s starting to seem like this prediction may be coming true.

Elimination… is zero community transmission… and once you’ve done that, you can go back to normal life! Masks, social distancing, you don’t have to worry about that stuff.

So what does she think the UK should have done to get to such low levels of Covid? “You close down the areas that are really risky. We know outside is safe, but indoor pubs… it’s not a good idea. When countries opened shops, nothing really happened, but when they opened pubs, a few weeks later cases went up. Pubs, restaurants, bars, household parties… all of that causes superspreading events.”

If we had put on more restrictions in the short term while cases were still low, Christina believes we could, in a matter of weeks, have been able to achieve low enough levels to try to eliminate Covid and then we would be in a much better position to reopen schools and have students return to university. The returning of students to university poses a particular concern. “Younger people are much less likely to get symptoms, so they may get Covid and have no idea. And if we don’t have a really good contact tracing system, you can’t stop that. Whereas a really good contact tracing system stops people without symptoms going out, that’s how it works.” To clarify, she doesn’t believe the problem was opening up too early, but rather too quickly. “We opened up schools, and then two weeks later we opened shops, and two weeks later bars, and then gyms and then workplaces. But actually every time you open something, you need to give it about four weeks before you see anything in the data.” More patience with easing restrictions could have avoided the need for local lockdowns. “Local lockdowns are very damaging, whereas if they just waited and got to very low levels of Covid, it would have been fine.”

Master of all trades

Hearing how multi-disciplinary her current job is, perhaps it should not be a surprise that Christina has dipped her toes in quite a few fields before settling on maths, and has four master’s degrees to show for it. “I did maths as an undergrad, because I wanted to be a physicist, which made perfect sense at the time. And I thought quantum mechanics was awesome, so my first master’s was in quantum theory.” But when it came to choosing a PhD, she was told she would have to do a topic that was very heavy on tensors, and she understandably ran for the hills. “I thought, ‘I’m out then, it’s not for me!’ So I got a job and decided to do a part-time MA in classics because I always loved history, I loved ancient history, I loved Latin. I chose maths for my degree because you can’t skip from doing maths A-level to doing a master’s in maths, but you can do that in humanities.” Even though supposedly she needed an arts degree, she was admitted to the MA in classics with a first in maths, simply because people find maths impressive. “People will just give you the benefit of the doubt, it’s actually really handy.” She really enjoyed the opportunity to learn purely for pleasure.

I did a PhD in space physics, because I thought… ‘I want to be an astronaut.’

Eventually, physics called her back. “I did a PhD in space physics, because I thought… ‘I want to be an astronaut.’” Again, they were more than happy to accept a student with a background in maths. But she found herself frustrated with the obscurity of her work. “No one would have cared if I got it wrong. Literally, there were ten people in the world interested in that area of physics.” This is what inspired her to go into research that had a very direct application. “I thought this practical use of maths to help concrete problems is really appealing, so then I came to CORU.”

Although she had found her new calling, she couldn’t bring herself to give up her other passions yet. “I did another part-time master’s in medieval history, and again I really loved it.” The final master’s, which she did later in her career, was in statistics. “That was because people in health think if you do maths, then you’re a statistician. So I thought I’ll just do a stats qualification so I can say I am! But it wasn’t nearly as fun.” However, she did enjoy having her talents in maths reaffirmed. “I spend most of my time now doing project management, so it was kind of nice to do maths again and realise I could still do it.”

It is certainly good to hear that mathematicians have this power to jump around to any field they want. “The earlier you switch, the easier it is. For me, I looked at people who were working at CORU at the time, and about half the unit had done undergrads in physics, so I knew it was fine. We advertise that we don’t mind if people come with a different background. Maths teaches you how to think, and it is really flexible. You can’t change field and expect everything to stay the same. You have to be willing to learn a new programming language, a completely different way of looking at things. Operational research suits people who aren’t wedded to methods, or a certain type of way of doing things, but are actually really interested in real problems.” It may be a relief to anyone choosing modules or PhD topics that their choice won’t limit their career options—and in fact, as our conversation comes to a close, she has some advice for people making these decisions now. “If you’re thinking about doing a PhD, it does matter who your supervisor is. It’s quite an intense relationship, they’re the person who is going to be guiding you into becoming an independent scientist, and having someone who doesn’t want to do it or who is not that engaged can just be a really bad experience. And you have to find it interesting—because you’re going to be doing it for three years, and that’s a long time!”


How can we tell that 2≠1?

What if I told you that $2=1$? You may say I’m wrong. OK, well, what if I proved it to you? We can both agree that there’s an $x$ and a $y$ where $x = y$. From there, multiply, subtract, factorise, divide, substitute, divide again, and you get $2 = 1$.

Still not happy? You’re probably unconvinced by my so-called ‘proof’. OK, I say, and, after a minute, hand you a sheet of paper with the following hastily scrawled on it: It’s better, but you’re still displeased. This time, I’ve made clear what steps I’m taking from $x = y$ to $2 = 1$. However, you point out, I don’t connect any of these steps. Nodding slowly, I take my time and write out a very nice, orderly proof, complete with justifications for each line:

At this point, you spot my mistake: in going from line 4 to 5, I have divided both sides by $x-y$. But we began with the assumption that $x = y$, meaning that $x-y = 0$, and dividing by 0 is not defined! This means that lines 5 to 7 are operating on nonexistent values and are therefore meaningless.

You’re happy with yourself, but something is bothering you. To reveal my mistake, you asked me to be more precise. But why stop here? Because you found what you were looking for? That’s not how truth is found.

My proof, like all proofs, is a path from one statement to another, just as we may follow the path from $ax^2 + bx + c = 0$ to $x = \big(-b \pm \sqrt{b^2-4ac}\big)/{2a}$, or from the existence of rectangles to the transitivity of parallelism (see below). Along this path I have made several intermediate statements, and linked them together with justifications. You found that one of my links is flawed, and you wonder how we know that the others aren’t also wrong. You begin to question foundational principles, wondering, for instance, why we’re even allowed to do the same thing to both sides of an equation.

Euclidean geometry: For the unfamiliar—Euclidean geometry (standard geometry on a flat surface) rests on 5 assumptions, one of which (the parallel postulate) has historically been regarded as ugly. In attempting to eliminate the parallel postulate, mathematicians have found numerous other statements that are equivalent to it, such as that a rectangle exists or that parallelism is transitive.

You keep digging deeper and deeper, questioning more and more of what you previously took to be correct. Eventually, you come across a piece of mathematics that is perhaps the most beautiful and elegant thing you’ve ever laid your eyes upon: natural deduction.

Natural deduction

Natural deduction is one result of asking for deeper and deeper justification when doing maths. A system of natural deduction is a set of very simple, almost irrefutable rules that act to formalise our intuition about what is definitely true.

Reiteration (R): if $P$, then $P$.

These rules include such things as reiteration, which simply allows us to repeat ourselves. Precisely, reiteration says that if you know that a statement $P$ is true, then you can conclude that $P$ is true. This is hardly controversial.

Conjunction introduction ($\land$I): if $P$ and $Q$ then $P\land Q$.

There are two rules for the natural idea of ‘and’. First is the so-called conjunction introduction rule, stating that if you know that $P$ and $Q$ are both true, then you may conclude $P \land Q$, pronounced ‘$P$ and $Q$’. On the other side, we have conjunction elimination, stating that if you know that $P \land Q$ is true, then you may conclude $P$ and also may conclude $Q$.

Conjunction elimination ($\land$E): if $P\land Q$, then $P$ and $Q$.

These rules don’t feel like they do much besides swapping out ‘and’ for ‘$\land$’; however, doing so is important for formality and precision.

Disjunction introduction ($\lor$I): if $P$, then $P\lor Q$.

Things start to get tricky with the rules codifying ‘or’. The first, disjunction introduction, tells us that if $P$ is true, then you may conclude $P \lor Q$, pronounced ‘$P$ or $Q$’: if I am hungry, then it’s also true that I’m either hungry or tired.

Disjunction elimination ($\lor$E): if $P\lor Q$ and from $P$ we can prove $X$ and from $Q$ we can prove $X$, then $X$.

The second rule, disjunction elimination, states that if $P \lor Q$ is true, and from $P$ you can prove $X$, and from $Q$ you can prove $X$, then you may conclude $X$. More colloquially, if either $P$ or $Q$ is true, and in both cases $X$ is true, too, then $X$ is true. For example, if I’m either well-rested or well-fed, and being well-rested makes me happy, and being well-fed makes me happy, then I must be happy.

Implication introduction ($\Rightarrow$I): if from $P$ we can prove $Q$, then $P\Rightarrow Q$.

Then come the rules regarding implication. We have implication introduction, stating that if from $P$ we can prove $Q$, then we may conclude $P \Rightarrow Q$, pronounced ‘$P$ implies $Q$’. And we have implication elimination (also known as modus ponens), which states that if $P \Rightarrow Q$ is true and $P$ is true, then we can conclude $Q$. If the weather being rainy implies that I am cosy, and the weather is rainy, then I must be cosy.

Implication elimination ($\Rightarrow$E): if $P\Rightarrow Q$ and $P$, then $Q$.

Finally, we come to the most arcane rules, those handling negation. The negation of $P$ is written $\neg P$ and pronounced ‘not $P$’. Before talking about the $\neg P$ rules, however, we must first introduce a new symbol: $\bot$ (pronounced ‘bottom’), which represents impossibility or contradiction. We can then introduce bottom introduction, which states that if both $P$ and $\neg P$ are true, which is absurd (usually… there are systems of logic that admit both $P$ and $\neg P$ at the same time, called paraconsistent logics), then we can conclude $\bot$, to represent this impossibility.

Bottom introduction ($\bot$I): if $P$ and $\neg P$, then $\bot$.

We’re then able to make use of $\bot$ through negation introduction, which states that if from $P$ we can prove $\bot$, then we can conclude $\neg P$. This is reasonable; if $P$ being true led to a contradiction, then $P$ isn’t true, so $\neg P$ is.

Negation introduction ($\neg$I): if from $P$ we can prove $\bot$, then $\neg P$.

Finally we have negation elimination. This one is a nice easy way to end: it says that if we know $\neg \neg P$, then we can conclude $P$. If something isn’t not true, then it must be true!

Negation elimination ($\neg$E): if $\neg\neg P$, then $P$.

And with that, we have completed (one kind of) natural deduction, laying out a framework for proofs based on undeniable principles so that we can be completely confident in our results.

Now, you may be wondering, hey, maths is about numbers and shapes and functions and vector fields, but all we’ve been working with are $P$s and $Q$s! Not a single $n$ or $x$, let alone an $f$, has been written so far!

Fear not! Purely logical systems such as natural deduction are key ingredient for building typical maths. For example, to define numbers, we may first extend to predicate logic, then construct the naturals (via the Peano axioms), which we’ll use to make the integers and the rationals (via equivalence classes), then finally the reals (via Dedekind cuts).

So, in fact, we still we get to work with all the maths we’re used to! Plus, due to the use of natural deduction, we have the added benefit of being confident about what we’re doing at every layer of abstraction!

Predicate and propositional logic: The logic we’ve been building, with $\land$, $\lor$, $\Rightarrow$, $\neg$, and $\bot$, is known as propositional or zeroth-order logic. Predicate or first-order logic is an extension of propositional logic wherein our statements ($P$, $Q$, $X$, etc) may be parametrised. So as well as having $H$ mean that ‘I am hungry’, we may also have $\mathcal H(x)$ mean that ‘$x$ is hungry’. Additionally, predicate logic includes two quantifiers, $\forall$ and $\exists$, which respectively mean ‘for every’ and ‘there exists’: $\forall x \mathcal H(x)$ means that everyone is hungry, and $\exists x \mathcal H(x)$ means that (at least) one person is hungry.

So what?

If you’re anything like I was at age 17, or anything like how I portrayed you in the beginning of this article, you’re drooling right now. It’s like all of your fantasies regarding rigour and precision have been heard and answered by divine mathematicians.

But maybe you’re not intrinsically motivated by rigour, so you’re less excited by natural deduction. Which is fine! I’m not hurt. Maybe a little bit. Or maybe you just feel that this is overkill—did you really need all this work to know that $2 \neq 1$? Or maybe you’re not convinced that these rules are correct; perhaps you don’t agree that from $\neg \neg P$ we can conclude $P$.

Excluding the middle: If you don’t agree, you are not alone! That $\neg \neg P$ entails $P$ is a consequence of a rule called the law of excluded middle, which states that $P \lor \neg P$. (This law is built-in to the system of natural deduction that we created.) Some mathematicians (the intuitionists or constructionists) reject the law of excluded middle, thus also forfeiting that $\neg \neg P$ entails $P$. One reason to question the law of excluded middle is that it allows us to state that something exists without stating what it is. For instance, we are able to prove that an irrational number raised to the power of an irrational number can be rational, but without giving an actual example. If we reject the law of excluded middle, then all such proofs must actually construct an example.

Still, I posit, natural deduction is worth your time. Because we’ve been so rigorous in building the system up, we gain the benefit of knowing exactly what we’re talking about. Before establishing such precision, we may have used $P \Rightarrow Q$, but without a sense of what, exactly, it really means. Now we have a precise definition: it means that from $P$ we can derive $Q$ (as per implication introduction); and it means that if we have $P$ then we can conclude $Q$ (as per implication elimination); and it means nothing else.

From this precision, we reap at least two amazing things: metamathematics and computers.

For one, we now can dip our toes into the metamathematical branch of proof theory, where we prove things about proof systems. For instance, we may wonder if natural deduction—or any proof system—is complete, meaning, roughly, that any question we can ask within the system can be answered by the system. Likewise, we may wonder if it’s consistent, meaning we can never prove $\bot$. Or perhaps it’s both? (Interestingly, logical systems capable of sustaining mathematics are never both at once, due to Gödel’s incompleteness theorem.) Proof theory is full of fascinating and surprising results, all enabled by being very precise about what we’re talking about.

@mathsproofbot and @mathstableaubot both prove the true statements that @mathslogicbot tweets.

Additionally, with our newfound precision, we get to enlist computers! Because computers generally demand the utmost precision in order to be of any use, they aren’t of much help until we achieve such rigour. Now, however, we are able to get their help writing proofs, using proof assistants such as Coq and Lean, or interactive proof-writing systems such as my own (see There are even programs that can write proofs entirely for us, as exemplified by @mathsproofbot and @mathstableaubot.

So let us return to my claim that $2=1$. How can we reject this? ‘By intuition’ is the easiest way: clearly $2$ is not $1$. However, now we may also turn to our system of natural deduction, where we were very careful about what we took to be true, and point out that $2 = 1$ is not true in this system. To exemplify this, we can show that it will be rejected by proof assistants and proof-writing algorithms. Finally, we may rest confident. That is, until we dig deeper once again, questioning the principles of our system of natural deduction…


Colouring for mindfulness

Imagine three mice equally distanced from each other, ie at the vertices of an equilateral triangle. If at the same time, all three mice start chasing their neighbour clockwise, then each of their paths would be a logarithmic curve. But this is rather hard to draw, especially if we want to restrict ourselves to only using a ruler.

Instead, let us imagine that the mice can only run in a straight line and need to stop to reassess their direction. If at a given stage we draw their intended path, and assume that the mice cover a tenth of the distance to the next mouse before stopping and reassessing their direction, we get the picture below. While these pictures have been drawn using straight lines only, we see three logarithmic spirals emerging:

Stages 1, 2, 3, and 20.

But why stop there? Why not start with $4$, $5$ or $n$ mice on the vertices of a regular square, pentagon or $n$-gon? The following instructions show a very algorithmic approach to drawing these patterns:

Continue reading


The birth of the Fields medal

What comes to mind when you think of the most prestigious awards in mathematics? Chances are, it’s either the Abel prize or the Fields medal. I think most of us have heard about Abel at one point or another, but I became increasingly curious about Fields. Who is behind the so-called Nobel prize of mathematics?

John Charles Fields

John Charles Fields was born in 1863, in Ontario, Canada and studied at the University of Toronto before moving to the United States in 1887 to study for a PhD at Johns Hopkins University in Baltimore. Fields was involved in several mathematical societies—the Royal Society of Canada and the Royal Canadian Institute among others—and he spent most of his life lecturing at the University of Toronto. Though a great mathematician, he excelled at organising mathematical events and promoting research.

One of his greatest achievements was working with the International Mathematical Union to hold the 1924 International Congress of Mathematicians (ICM) in Toronto. This was only the second congress held by the union after the First World War. During the first congress, Germany, Austria–Hungary, Bulgaria and Turkey were excluded from the union and many were worried this might escalate. But Fields was determined to make the 1924 congress work. He spent months in Europe fundraising tirelessly. Some say personal acquaintances with rulers of Europe aided his efforts—attendance of a dinner with the king of Sweden and a later meeting with Mussolini in Bologna certainly support this claim—but no direct sources remain. After organising the conference, with money to spare, he paid for European mathematicians’ travel costs across the Atlantic. And this is where the idea of the Fields medal was born.

The proposal

It is proposed to found two gold medals to be awarded at successive International Mathematical Congress for outstanding achievements in mathematics […]

The idea of the award was first presented on 24 February 1931. Because of Fields’ talent for acquiring funds, the medals were to be funded by the remaining finances from the 1924 ICM. Fields wrote, “It is proposed to found two gold medals to be awarded at successive International Mathematical Congress for outstanding achievements in mathematics […]” The awards would be open to the whole world and would be made by an international committee.”

He proposed the medals would be handed out at every congress for work already done, but also to encourage further achievement, starting with the next congress in 1936. Unfortunately, Fields’ health started to decline in 1932. Just days before his death, he noted in his will to leave an additional 47,000 Canadian dollars to fund the medal. As he had intended, the very first medals were awarded in the 1936 ICM in Oslo, Norway.

Today, Fields medals are awarded every four years to between two and four brilliant mathematicians under the age of 40 with a prize of C$15,000. The age limit was not directly due to Fields himself; it was added in 1966 to promote diversity, although recently this choice has been criticised as there is anecdotal evidence suggesting female mathematicians fare better later in their careers.

The medal itself was designed by Robert Tait McKenzie and features Fields’ monogram alongside a profile of Archimedes and the date of the medal’s founding, incorrectly written in Roman numerals: MCNXXXIII (1933). The reverse features an illustration of one of Archimedes’ most famous achievements: deducing the surface areas and volumes of the cylinder and the sphere which can be inscribed within the cylinder. Each face of the medal bears a Latin phrase:


(To transcend one’s human limitations and master the universe)

on the obverse, and the reverse reads:


(Mathematicians gathered from the whole world to honour noteworthy contributions to knowledge)

The date of the prize being awarded and the recipient’s name is engraved on the rim of the medal.

In 1936, Jesse Douglas and Lars Ahlfors were the first to receive Fields medals. What was so special about these two mathematicians, that they stood out from across the globe to become the first winners of the (arguably) highest honour in mathematics?

Jesse Douglas

Jesse Douglas’s love for mathematics started in high school. While studying at the City College of New York, he became the youngest recipient of the college’s Belden medal for excellence in mathematics in his first year. After his undergraduate degree, Douglas began his doctoral study at Columbia University under the supervision of Edward Kasner, who introduced Douglas to the problem that became his most noteworthy achievement—Plateau’s problem.

Plateau’s problem (also known as the soap bubble problem) is about showing the existence of a minimal surface for a given boundary, and possibly with other constraints. This has a fascinating physical application in the form of soap films. A frame filled with a thin soap bubble, due to the action of surface tension, will always take the shape of minimum surface area: a so-called minimal surface. In 1931, Douglas discovered and proved a general solution, for which he came to win the Fields medal (although this was also done independently by Tibor Radó: Radó also created the busy beaver family of Turing machines, see Will this article ever end). Before this contribution, only some special cases had been proven by such mathematicians as Schwarz, Weierstrass, and Riemann.

Around the same time, Douglas was working at the Massachusetts Institute of Technology as an assistant professor, later to be promoted to an associate professor. He continued to work on Plateau’s problem even after solving it, focusing on further generalisations of the problem. He published 11 papers between 1939 and 1940 on these generalisations.

After working on Plateau’s problem for many years, Douglas diverted his attention to group theory, making significant contributions to the field. He spent the last 10 years of his life as a professor, back at the City College of New York.

Lars Ahlfors

Lars Ahlfors was a Finnish mathematician born in April 1907. As a child, he already loved maths. This love only grew although most of his life was spent in the midst of war.

Ahlfors started his university studies at the University of Helsinki and was taught by two of the best Finnish mathematicians at the time: Ernst Lindelöf and Rolf Nevanlinna. After this, he followed Nevanlinna to Zürich and discovered mathematics can be taught in a different way. Lars came to understand “that [he] was supposed to do mathematics, not just learn it”. During his time in Zürich, Ahlfors encountered Denjoy’s conjecture (now known as the Denjoy–Carleman–Ahlfors theorem) which he proved in 1929. Loosely, the theorem determines the number of values an entire function (a function that is differentiable everywhere in the complex plane) can take at infinity. This is what gave Ahlfors international recognition, though he himself always credits the significant help of Nevanlinna and Pólya as the main influences that lead to his proof. Though impressive, this was not what earned Ahlfors his Fields medal. That award was specifically credited to a single paper. This paper shed some light on Nevanlinna’s theory of meromorphic functions and quasiconformal mappings (more stuff to do with complex functions). Currently this paper is recognised as one of the starting points for essentially a new branch of analysis called metric topology.

Having already gone through the First World War as a child, Ahlfors was just about to go through the next challenge—the Second World War. But, surprisingly, his research benefited. He was able to devote himself to his work completely, even though libraries had closed due to the lack of students. In 1944, Lars was offered a position in Zürich, opportune timing since the Soviet Union was attacking Finland and Ahlfors’s own health was poor. So he, his wife, and two young children planned to flee to Switzerland, via Stockholm, the UK and then Paris.

Times were tough, and Ahlfors was only able to take 10 crowns with him. So what did he do on arrival in Stockholm? He smuggled his Fields medal across the border and sold it in a pawn shop! He later reflected, “I’m sure it is the only Fields medal that has been in a pawn shop. As soon as I got a little money some people in Sweden helped me retrieve it.” What a relief! Imagine if someone had actually bought it!

Luckily, the family made it to Switzerland safely. Slowly, though, Ahlfors began to feel that his invitation had been not an honour, but simply an attempt to fill a position they couldn’t find anyone else for. So when an offer came in from Harvard, Ahlfors gladly accepted it and remained there for more than 30 years until his retirement.

Since Douglas and Ahlfors first won in 1936, 58 other mathematicians have been awarded the medal, including the only person to decline the award—Grigori Perelman—and the first and so far only woman to receive the award—Maryam Mirzakhani. The next ICM is due to be held in 2022 in St Petersburg, Russia, and the story of the Fields medal and its winners will continue.



Will this article ever end?

The obvious answer is yes: there are only so many atoms in the universe to make paper and ink out of, so of course it will have to end at some point. But questions of this ilk, like so many apparently simple questions, can spark an interesting discussion. The more interesting question is this:

Question: Do all yes/no questions have an answer, and if so, is there an algorithm that can compute it just from the statement itself?

The first half of the question is clearly false, think about paradoxes like ‘Is the sentence “This sentence is false” true?’. For further discussion, please go talk to your favourite philosopher (mine is Nick Bostrom). The second half is something we will be looking into in more detail; more precisely, how such a question led to the creation of modern computer science. To rephrase the previous question with a bit more formality and clarity:

Question: Given a yes/no statement in formal logic, does there exist an algorithm that can compute the yes/no answer?

Formal logic is the logic used by mathematicians and logisticians (see How can we tell that $2\neq 1$?). Further discussion is well outside the scope of this article, but it is just there to make sure that the question the algorithm would try to solve doesn’t produce paradoxes like the one above. This question, or one very similar but in German and itself written in the language of formal logic, was posed by one of the most famous mathematicians of the 20th century—David Hilbert. It is called the decision problem, or Entscheidungsproblem in German. We can clearly see that this is an important question. Almost any question in mathematics can be posed in terms of formal logic and mathematics can be used almost anywhere. So being able to show that there will always be an algorithm to compute the answer to any question would be of incredible use. It would mean that there would always be a clear method to solve any problem that can be reduced to mathematics.

Before we can dig our teeth into this problem though, we must first properly define what we mean by an algorithm. Since we live in a world built by, and upon, algorithms, we intuitively know that it is something that takes in some information, does something to it, and then gives us some sort of ‘useful’ answer. This is a very nebulous definition, clearly, as I used the term ‘does something’. That and the idea of utility is a heavily contextual notion (if I’m stuck and dehydrated in a desert, a proof to the Riemann hypothesis isn’t going to be very useful to me). A slightly more formal definition of an algorithm in plain English would be the definition from Marshall H Stone and Donald Knuth:

“… a set of rules that precisely defines a sequence of operations such that each rule is effective and definite and such that the sequence terminates in a finite time.”

Or in layman’s terms, a finite sequence of well-defined operations. But as any mathematician knows, writing something in plain English is usually much easier than doing so in mathematics.

This is where Alan Turing comes into the picture. Turing was one of the greatest mathematicians of the 20th century—if not of all time—and was a key figure in the foundation of theoretical computer science and artificial intelligence; he is often called the father of both fields. Turing did a whole host of wonderful and amazing work in his short career. Unfortunately, despite the revolutionary work that he did both academically and in breaking Nazi codes during the second world war, his work was not truly recognised until long after his passing. This was, of course, partly to do with the secretive work of codebreaking, but also because he was gay at a time when homosexual acts were illegal. In 1952, he was convicted of ‘gross indecency’ and given the option between a prison sentence or chemical conversion therapy; the chemical injections he chose led to depression, and later, his premature death by suicide. Turing died at the age of only 41, and we can only wonder what astonishing work he would have done if the prejudices of the time hadn’t caused his premature death.

Turing’s first groundbreaking work was solving Hilbert’s problem and defining what was meant by an algorithm in formal mathematics using his elegant concept of the Turing machine. A definition and answer had been given a few months earlier by another mathematician called Alonzo Church—a pioneer in his own right and someone who Turing would later study under for his PhD. However, not only was Turing’s solution a greater conceptual leap than Church’s proof, but it was also a more intuitive and pragmatic idea than the heavy formal logic of Church’s approach, called lambda calculus. But enough beating around the bush, let us talk about the Turing machine.

The Turing machine and the Church–Turing thesis

Turing’s idea of how to formalise an algorithm in a mathematical framework was to design a machine that would be similar to the concepts proposed by Ada Lovelace and Charles Babbage, two more pioneering mathematicians and founders of the field of computer science. This was a mathematical construction composed of an infinite tape, a box (normally called a head) that could read what’s written on the tape and write new things onto the tape, a motor to move the head left or right, a set of numbers/symbols to read, and a processor full of instructions. The idea is that the input for the algorithm is written on the tape, the Turing machine then reads this and performs a sequence of simple actions dictated by the processor, perhaps using the tape to temporarily store information, and then writes the output of the algorithm on the tape. We’ll look at an explicit example of a Turing machine working in a minute, but first all the components of a Turing machine fit and work together like so:

  • The processor can have (finitely) many different configurations of its internal settings: we shall call these states.
  • The Turing machine itself doesn’t have any memory. But we can encode what it previously read by switching into one of many different states.
  • The Turing machine is fully deterministic and well-defined. Given the current state and the symbol the head is reading, we know exactly what the machine will do.
  • In a given state, the Turing machine can do any combination of the following: read the tape, write to the tape, move the head left or right, and (depending on what was read from the tape) change the state that the processor is in.
  • There has to be at least one state in the processor that will stop the Turing machine and signal the end of the computation: this is usually called a halt state for the Turing machine. This is typically done when the head has printed the output of the algorithm.

A schematic representation of a Turing machine.

Turing’s great insight was not only the simplicity of his construction but the notion that perhaps any form of computation could be represented as a Turing machine. This was later expanded upon during his study under Church for his PhD, leading to a groundbreaking theory which is now the cornerstone of computer science and a branch of logic and mathematics called computability theory. The Church–Turing thesis states that any computable function (a function computable by a machine or any formal algorithm that solves a problem) can be translated into an equivalent computation model involving a Turing machine. There is still some ambiguity about what exactly a computable function is and if every possible computation is represented as a computable function. But as these questions are still being discussed by academics all over the world, we will sidestep this thorny issue.

The Church–Turing thesis quickly leads to the conclusion that since our brain can compute mathematics and all algorithms that have been developed thus far can be computed by our brains, every algorithm ever computed is equivalent to a Turing machine! Because of this fact, the modern definition of an algorithm is based around Turing machines.

Now that we have a mathematical definition of a general algorithm, can we answer Hilbert’s question? Yes, we can, but first let’s get a little more comfortable with Turing machines.

Example Turing machine: how to build a dam in three easy steps

Like most things in mathematics, the best way to get some familiarity with an idea is to see a worked through example. The toy machine I will use to illustrate Turing machines is one from a rather famous family of Turing machines created by Tibor Radó called busy beavers s (Radó also
discovered a general solution to Plateau’s problem, see The birth of the Fields medal)
. They are very simple machines that are fed an infinite tape full of 0s and their aim in life is to write as many 1s as possible on the tape; but it can’t just print 1s forever, it has to stop eventually. The example we will look at is the 3-state busy beaver.

The processor for a 3-state busy beaver, which terminates as soon as the head reads a 1 while the processor is in state A.

Starting in state A, the Turing machine reads a 0, so it erases this and writes a 1 in its place. Then it moves the head so that it is positioned over the next symbol, and the processor transitions into state B. The next symbol the head reads is again 0, but this time because it’s in a different state the Turing machine leaves this 0 as it is, moves the head again, and transitions into state C. The machine just keeps repeating this process until it enters its halt state.

This 3-state busy beaver halts after 14 steps and writes six 1s on the tape. You can’t do more without more states. The coloured squares show the head position and the state of the processor.

Although we don’t see it here, where the Turing machine derives its main source of power is its ability to switch state depending on the information on the tape and encoded information from its previous state. If you are familiar with a bit of programming, this is very similar to jumping to a memory location in assembly language or conditional statements and loops that are found in higher-level languages like Python. If you want to write one out for yourself in full—I recommend you don’t. Even a simplified binary number checker takes more than a page of explanation of explicit states and even longer if you wanted to show it all working—I know from painful experience. But if you’re feeling very proactive, try to play out the beaver from above: you could use it on a row of coins to represent the tape with $\text{tails}=0$ and $\text{heads}=1$. Extra credit for making a 4-state beaver.

The Turing machine and the decision problem or: how I learned to stop worrying and love the halting problem

Now, finally, back to Hilbert’s Entscheidungsproblem. How did Turing solve it? Turing’s own paper and full technical proof isn’t too hard to read or understand. I will, however, provide a truncated version of the proof and try to provide an alternate method while keeping the core intuition.

In the paper, Turing proposes a method to encode a Turing machine, the states, and inputted symbols, into what he coins a description number. Similar to what we do to with text and Unicode—uniquely representing some text with a binary number. This description number describes the machine, and therefore the computation that it represents, completely! He then asks the obvious next question.

Question: Can one compute the description number of a Turing machine from a yes/no statement in formal logic?

If we could, then we would show that any yes/no statement has at least one corresponding Turing machine (which Turing showed was equivalent to a computable algorithm) and so the answer to Hilbert’s question would be yes! Unfortunately, this is not the case; so, the answer to Hilbert’s question is a resounding no.

Turing was able to show this by using an idea from another Titan in the world of mathematics and logic, Kurt Gödel. Gödel is famous for a lot of important results from the fields of logic, mathematics, and philosophy, and is an interesting person in his own right. However, the key Turing used—and therefore the thing we will be looking at—is Gödel’s incompleteness theorem. This is a wonderfully mind-bending theorem that deserves its own article, and I highly recommend you go read into it (try Gödel’s Proof by E Nagel and JR Newman), but the important piece of logic that Turing adapted was the idea of using something being self-referential to reach a contradiction. We can see this in the paradox I mentioned at the start: ‘Is the sentence “This sentence is false” true?’. Gödel’s idea was very similar, but to formalise it in the realm of formal logic was his great achievement.

To use this insight, Turing proposed a Turing machine, and therefore a computable algorithm, that I’m sure anyone who has ever written code wishes they had in their back pocket: a Turing machine to see if another Turing machine, with a given input, will halt. Or in other words, a computable algorithm which can tell you whether an algorithm, together with an input, would eventually stop and give you an answer, or whether it would just keep cranking through calculations forever.

As an aside, an algorithm that could do this would not only be a boon for any budding programmer wishing to debug their code—but also an incredibly useful tool in pure mathematics. You could solve Goldbach’s conjecture by applying this mystical Turing machine to a simple program to search for the first counterexample! (What would the program look like?) Unfortunately, like most things that are too good to be true, this doesn’t exist. Let’s see why!

The halt-deducing machine $H$.

Let’s suppose we have this amazing halt-deducing Turing machine and call it $H$ (for $H$alt). If Hilbert’s decision problem has a positive answer, this machine will exist—as asking if a Turing machine halts can be written in formal logic—so let’s not get bogged down in the details of how it would hypothetically work. But, on a high level, if we give $H$ a description of a Turing machine and its input, $H$ will either halt and print ‘halts’ onto the tape if the inputted Turing machine halts: or it will halt and print ‘loop’ onto the tape if the inputted Turing machine loops forever. In particular, note that $H$ itself will always halt.

We can then modify this machine a little bit. To $H$ we attach an inverter which will flip its output: if $H$ outputs ‘halts’, the inverter will go into a loop and not halt; if $H$ outputs ‘loop’, the inverter will halt and output a picture of a scorpion. As mentioned before, $H$ needs two inputs: the Turing machine itself and its input. We wish to check if $H$ terminates when we apply it to itself, so we need to put $H$ into both of its input slots. To do this, we can attach a device to copy the input into it, which we can call a cloner. We can call this new Turing machine $H^*$:

The modified Turing machine, $H^*$.

Now that we have $H^*$, and the idea from Gödel of self-application, all we need to do is input $H^*$ into itself. When we input $H^*$ we have two possibilities. If $H^*$ does halt, then the inverter means that $H^*$ doesn’t halt. But if $H^*$ doesn’t halt, then $H^*$ does halt. We have reached a contradiction!

But the only thing we assumed is that $H$, and therefore $H^*$, exists. But as the existence of $H$ leads to a contradiction, we must conclude that a Turing machine like $H$ cannot exist. However, as a Turing machine is a mathematical object and asking if it will halt can be phrased in terms of formal logic, a yes to Hilbert’s question would mean that $H$ does exist! Therefore Hilbert’s question has negative answer: not all yes/no questions can be answered by an algorithm.

This argument is the same as the one Turing presented in his original paper, albeit simplified: there he shows that the description number of $H$ and $H^*$ can’t be calculated as the computation would go on forever. This is what we call an uncomputable number as no finite algorithm could ever compute it.

Typically, the numbers we are use every day ($\pi$, $\mathrm{e}$, $5$, $3/4$, etc) only require a finite amount of information to represent them. This can be the number itself if it is an integer, or a quotient of integers, but some numbers with infinite non-repeating decimal expansions can also be represented with a finite amount of information, like $\pi$ and $\mathrm{e}$. This is because $\pi$ and $\mathrm{e}$ both have finite generating formulae that can produce every digit. In other words, if you give me a position in $\pi$ or $\mathrm{e}$, like the 1737th position, the formula can find the 1737th digit of the number after a finite number of steps. This isn’t always the case: if I have an infinitely long number whose digits are random, you wouldn’t expect to be able to compute its decimal expansion using a finite algorithm. These numbers are uncomputable as they cannot be computed or represented in any finite sense.

The legacy of Turing and his machines

The legacy of the Turing machine and the ideas Turing put forward during his lifetime are hard to quantify. It is always hard to know what the world would be like without certain people and their ideas in it. Just limiting Turing’s body of work down to his idea of the Turing machine, we can imagine that the world would perhaps be almost unrecognisable. The concept of computers and their programs that were pioneered by Ada Lovelace and Charles Babbage had been around for about 93 years before Turing’s own paper on the matter. Despite Church developing his lambda calculus to solve Hilbert’s problem, it is quite difficult to imagine whether the modern computer would be around today, and if so, what it would look like. Perhaps the best view on the importance of Turing’s work comes from a sentiment put forward by John von Neumann, another intellectual juggernaut of the 20th century, to Stan Frankel. Frankel, writing in a letter to Brian Randell, says,

“… [Von Neumann] firmly emphasised to me, and to others I am sure, that the fundamental conception is owing to Turing—insofar as not anticipated by Babbage, Lovelace and others.”

This was in reference to Von Neumann architecture, the modern framework of the stored program computer, which almost every computing device in the world, past 1945, can draw its direct heritage from. That, and the wider body of work credited to Turing, is why we call him the father of theoretical computer science. There is much more to talk about relating to Turing and his work, but as I said at the beginning, this article must end.


Fun with Markov chains

Probability has found itself a comfortable position amongst industry mathematics. And now, with the advancement of the computer age, a 100-year-old piece of maths has jumped to the forefront, namely Markov chains. These can be used for an awe-inspiring range of ideas including predicting the weather, mapping the probability distribution of a Monopoly properties, and even Donald Trump tweet generation. But what exactly are Markov chains, and why are they so unique?

Understanding Markov chains

Let’s say we are having a very nice day and are happy as a result, and that for some reason unknown to us, we only posses the emotional capacity to either be happy or sad. Can we predict what our feelings will be tomorrow? We know that our feelings tomorrow will be dependent on our feelings today; that is, a probabilistic function where the input is today’s mood and only today’s mood. Or in other words, a Markov chain!

So how do we do this? Let a happy day and a sad day each be states. We will represent these as nodes on a graph. Now we need some probabilities for the changing of states. Suppose we consider ourselves optimistic people: let’s say there’s a 5/6 chance that if we are happy today, we will be happy tomorrow, and a 1/2 chance if we are sad today, we will be happy tomorrow. Since something must happen tomorrow (all apocalyptic events excluded), then the sums of the probabilities leaving each state must add to 1.

♪ Because I’m happy with a 5/6 chance tomorrow if I’m feeling happy today… ♪ Image: Frank Schwichtenberg, CC BY-SA 4.0

This is an example of a two-state Markov chain, and while simple, it presents a useful visual aid to help our understanding. Much more formal and rigorous definitions can be found online, but in a nutshell, a Markov chain consists of a set of states where the probability of transitioning to any state is solely dependent on the current state. This dependence is called the Markov property and is what makes this neat piece of maths unique.

We can even use this Markov chain to predict how many happy days we’re going to have over the next week. One way we can do this is by following the chain by hand. Take a fair six-sided die and put a finger on the happy state: let’s say today was a good day. Now roll the die and if 1–5 is rolled, follow the looped arrow back to the happy state, else follow the arrow to the sad state. If we were unfortunate enough to find ourselves on the sad state, let even rolls—2, 4 and 6—send us down the arrow to the happy state; and odd—1, 3 and 5—loop back round to the sad state. If we make seven rolls where each roll is a new day, our finger will conclude on the prediction for how we’ll be feeling a week today. If we note the state after each roll, we’ll have predictions for each of the seven days. Now we can plan our week accordingly!

What about in a month then? Or a year? Following this diagram for 365 days, the rolling becomes a bit tedious. Instead, we can speed this up by taking ideas from linear algebra, and creating a transition matrix. Let each column of the matrix represent the current state and the rows represent the probabilities of transitioning to each state. For this example, the transition matrix is given by where $H$ = happy and $S$ = sad. We can use this representation to make a prediction for any number of days in the future. It should be noted that this could equally work with the columns and rows swapped, but as long as we’re conscious of which way we’re doing it, it simply becomes context. We can check whether the matrix has been created properly; the columns (in this case) should add to 1. If they don’t, something has gone horribly wrong! Say we wanted a prediction for two days’ time. We first raise this transition matrix to the power of 2, giving the following calculation:

\[ \begin{pmatrix}{\frac56}&{\frac12}\\ {\frac16}&{\frac12} \end{pmatrix}^2 = \begin{pmatrix} {\frac{5}{6} \times \frac{5}{6} + \frac{1}{2} \times \frac{1}{6}}&{\frac{5}{6} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2}}\\ {\frac{1}{6} \times \frac{5}{6} + \frac{1}{2} \times \frac{1}{6}}&{\frac{1}{6} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2}} \end{pmatrix} = \begin{pmatrix} {\frac79}&{\frac23}\\ {\frac29}&{\frac13} \end{pmatrix}. \]

Let’s see what is going on here by looking at the first entry of the new matrix. Like before, this is the probability of starting on happy and finishing back on happy, but this time for the day after tomorrow. If we relate this back to the graph, we can see that each term is a different possible path of probabilities (try saying that five times as fast) for exactly two days; $(5/6) \times (5/6)$ is happy tomorrow then happy the day after, and $(1/6)\times (1/2)$ is sad tomorrow and back to happy the day after. We then just choose a starting state. Like before, let’s be happy today: we represent this as a vector $\left(\begin{smallmatrix} 1 \\[2pt] 0 \end{smallmatrix}\right)$. We then do the following calculation with our matrix we just raised to the power of 2:

\[ \begin{pmatrix}{\frac79}&{\frac23}\\{\frac29}&{\frac13} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} {\frac79}\\ {\frac29} \end{pmatrix}. \]

In return, we will get this column vector where the first entry is the probability of being on the happy state in exactly two days, and the second entry is being on the sad state. The cool thing is that we can put this matrix to any power $n$, which will give us a set of predictions for the $n$th day from today. Now, we’re mathematicians, so we need to do what any good mathematician should do and ask what happens when $n$ tends to infinity. This is something called the long run average probability and can be used to predict the behaviour of the Markov chain for some arbitrary but significant time in the future. This can be found analytically by taking some more ideas from linear algebra including a process called matrix diagonalisation, which would call for the use of more advanced mathematics. So instead, with a little rough-around-the-edges Python code (see the bottom of this article) we can get a pretty good idea. As $n$ gets very large, it is seen that the happy state tends to $0.75$, or $3/4$, and the sad state to $0.25$, or $1/4$. Therefore, given a significant amount of time in the future, this Markov chain is predicting a 75% chance we’ll be happy: not bad, all things considered.

We should, of course, acknowledge the limitations of this as a model. There are plenty of emotions not taken into consideration, nor whether a day holds special significance, eg a birthday. However, it’s easy to imagine how this can be expanded to accommodate for these other ideas with bigger graphs and larger matrices. The below graph shows how we might add a third emotion, for example boredom. Assign the probabilities based on whatever data we may be using, then fill out the graph accordingly.

We must be careful, however! When adding the new probabilities, it’s vital we ensure the sum of all arrows exiting a state still sum to exactly $1$. In addition, we are after all working with probabilities, and as such should be aware that a low chance does not mean impossible, and a high chance does not mean inevitable. This is just a model, and real life is rarely so organised.

What can be done with them?

Markov chains are used in a wide array of industry-based mathematics. But, with the rise of the technological age, these more light-hearted, fun and fascinating applications have emerged. A very interesting such use of these can be text prediction; quite a simple idea but we can have a lot of fun with it! If we take a few short sentences, say

“I love Markov chains.”
“Markov chains love me.”
“Markov is I.”

and we wanted to generate new original sentences from these, how could we do it? Representing each word as a state, we can create a Markov chain. The probabilities between states are then intuitively derived from the frequency of words following each other within these sentences. Take the word ‘Markov’. In these short sentence, the word ‘Markov’ is followed by the word ‘chains’ or ‘is’. However, if we look at the frequencies of these words, we have that ‘chains’ follows ‘Markov’ $2/3$ of the time, and ‘is’ for $1/3$. Do this analysis for every word and we have all the states and probabilities we need. Hence, like before, we can draw this up.

Once again, the probabilities allow for this to be played with using a fair six-sided die, with which we can generate some new sentences:

“I love Markov is I love Markov chains love me.”
“Markov chains love Markov chains.”

Notice that not every pair of states has a pair of probabilities between them. This is because some states have a zero chance of following the current word. For example, in none of the three short sentences does ‘me’ directly follow from ‘chains’, leading to the zero chance of ‘me’ following ‘chains’ in the Markov chain. We could include these arrows on our graph, annotating them with the zero chance, however this would lead to an unnecessary clutter, especially since these extra arrows would be inconsequential for our purposes anyway.

Fake Donald Trump tweets generated by a Markov chain built by Filip Hráček. Image: Reproduced with permission from Filip Hráček

There’s an amusing example of this method in Hannah Fry and Thomas Oléron Evans’s The Indisputable Existence of Santa Claus where they use this method to generate new Queen’s speeches with interesting results. Similarly, the sentences we’ve generated aren’t really sentences despite how much hilarity ensues, but imagine if we had a much larger data set\dots say someone’s Twitter. Twitter’s public availability lends itself as a large library of data containing peoples’ thoughts and opinions that we can use. One such person is 45th president of the United States, Donald Trump, who often likes to express his opinions publicly with questionable legitimacy, although here, this can work in our favour. Filip Hráček is a programmer from the Czech Republic who, as an experiment, created a Donald Trump tweet generator using Markov chains.

The probabilities become a lot more reliable when we introduce this large data set, causing the system to generate more coherent sentences. They’re still not perfect, but as was mentioned, this data coming from Donald Trump works in our favour. By setting each word in the tweet to a state, like before, the probabilities become more fine-tuned with the vast amount of data at hand. This model also employs something called an order-2 Markov chain. What this means is that instead of the next state being solely dependent on the previous, the model takes into account two states, or in the case the words, for making the decision on the next. A Reddit user under the handle Deimorz took this one step further and created a subreddit entirely populated with posts and comments generated by bot accounts using Markov chains. We can’t interact with them, but it’s unnerving to see the conversations they have relating to some topic unknown to us, especially since the bots are seemingly fully invested. There are quite a number of diverse experiments such as these online which use Markov chains, and which you can play with. Curiously, on the more serious side of application, this is the very primitive idea behind text prediction on mobile phones.

Andrey Andreyevich Markov, a Russian mathematician born in 1856, was the creator of this piece of mathematics in his work in probability theory. They have since been used in applied mathematics spanning many fields. Uses of Markov chains appear in computer science, business statistics, mathematical biology, contributing to Google’s PageRank, predictive text on mobile phones, and plenty more. What’s curious is that Markov developed this process for letter analysis in his native language of Russian: given a letter, calculating what is likely to be the next one. His motivation was based in probability and statistics and idea called the law of large numbers. For further reading, I’d recommend a research paper, The five greatest applications of Markov chains by Philipp von Hilgers and Amy N Lanville, which explains some of the most revolutionary uses of Markov chains in history, including going more in depth into Andrey Andreyevich Markov’s own application. It talks about historical uses in information theory, computer performance, and was one of the most interesting reads I found when conducting research for this article.

Now it’s your turn! Get out some dice and play with these examples, or even come up with your own. There is so much more to this neat piece of mathematics worth exploring, far too much to squeeze into this article. So I implore you to get reading, and start creating!

Python code for predicting mood

Some simple but functional code to find the probabilities for 52 weeks:

import numpy as np
A = np.array([[5 / 6, 1 / 2], [1 / 6, 1 / 2]])
B = []
for x in range(0, 365, 7):
    b = np.linalg.matrix power(A, x)
    start = np.array([[1], [0]])
    c =,start)
    B.append(x / 7)

Oπnions: Should I share my code?

Six years ago, in September 2014, I started working on my PhD. By Christmas, I was doing calculations using code that would’ve taken the average PhD student three to four years to write. But this wasn’t down to my own programming skills: this was thanks to my supervisor and his previous students deciding to work on open source software.

Open source software is software whose developers release the source code that makes up the software, rather than just releasing a binary or application that the user can run. Almost always, open source software is free to use and adapt. Continue reading


Counting Countdowns

Rachel Riley puts the last of the six numbers on the rack and presses the button to generate a random target. The host—whoever the host is now, I haven’t really watched Countdown since Richard Whiteley died—says a few words and starts a 30-second timer. And the contestant thinks, ‘Now I need to combine those numbers to make the target. I know! I’ll just check all of the possible calculations with these six numbers! I wonder how many there are.’

The aim of the game is to combine the six given numbers using the four basic arithmetic operations ($+, -, \times$ and $\div$) to reach a target number. You do not need to use all of the numbers, but you may not use any number more often than it appears on the board. Also, you’re inexplicably not allowed to use fractions at any point.

For example, given the numbers in the big picture above, with the displayed target of 333, you might solve it as \[ ((50-4) \times 7)+5 + 6, \quad \text{or as} \quad ((50-6)\times 7) + 25,\] or several other acceptable ways.

TL;DR: including the trivial answer of 0, there are 974,861 distinct answers that can be reached by applying the four basic operations to combine six numbers, if you ignore (a) coincidences and (b) Countdown’s fraction-phobia.

Continue reading


Chaotic scattering: uncertainty and fractals from reflections

Any Chalkdust reader who has spent time playing billiards, snooker, or pool, will probably have some kind of feel for how the balls move around the table, how they bounce off each other, and how they bounce off the cushions. After a while, you just know instinctively how they should behave, at least in very simple situations. Reading this article is unlikely to improve your potting statistics; writing it has certainly not improved mine.

Specular reflection of a ray
hitting the edge of a disc.

We can start by shrinking the cue ball to the size of a point, and imagining the object ball being squashed into a hard-edged circular disc that is so heavy that it does not recoil in a collision. The path of the incoming cue ball can be represented by a ray. Rays always follow straight lines between bounces, mimicking a ball rolling across a perfectly smooth horizontal table. Reflection of the incident ray by the disc is shown on the right. Draw a line from the disc’s centre through the point where the ray hits the circumference. That line is the normal, and the law of specular reflection tells us that angle out equals angle in, or in symbols, $\theta_\text{ref} = \theta_\text{inc}$.

To make things a bit more interesting, place identical discs at the corners of an equilateral triangle. That setup was first analysed in detail around three decades ago by physicist Pierre Gaspard and chemist Stuart Rice. Scientifically, their groundbreaking paper provided a paradigm for chaotic scattering in a plane but we can think of it as just an abstract game of pinball. The three-disc system turns out to be really useful in applied mathematics for understanding billiard-type problems, while in physics it crops up in fields from laser optics to the kinetic theory of gases.

Formulating the problem

The three-disc arrangement is shown below. Each disc has a radius of 1 unit, the size of the gap between them is $\ell$, and the shaded area $\Omega$ is called the scattering region. Any incoming ray is defined by an incidence angle $\theta_0$ and a displacement $x_0$. We can select whatever values we like for $\theta_0$ and $x_0$, and will refer to the pair of numbers $(x_0,\theta_0)$ as the input.

An incident ray can generally end up doing only one of four things, which are the allowed outputs of the system. If the ray enters $\Omega$, it ricochets between the discs before eventually escaping through one of the three gaps. The fourth possible outcome is that the ray is reflected away early on and so never enters $\Omega$ at all. Sometimes, a ray may also remain trapped inside $\Omega$ forever! But that situation is so incredibly unlikely that we can discard it here. Our question to try and answer is: for a given input, what will the ray do?

The three-disc system with the gaps labelled 1 to 3 and colour-coded, with the red cross showing the origin of $(x,y)$ coordinates. The incident ray has initial position $x_0$ and angle $\theta_0$ (positive angles are measured in the clockwise sense).

The butterfly effect

The zig-zagging path of any ray (its trajectory) can be computed using straight lines and applying the law of specular reflection every time it hits a disc. Crucially, our scattering problem is deterministic because it is governed by mathematical rules in which there is absolutely no randomness whatsoever.

Given an input, classical physics demands that there must exist a unique output. Seems reasonable and obvious. What is not so reasonable and obvious is that a tiny change to the input can lead to a dramatic change in the output. This strange phenomenon—where a system can be susceptible to minuscule fluctuations—is known technically as sensitive dependence on initial conditions. The term ‘butterfly effect’ is perhaps more widely known, coined by mathematician and meteorologist Edward Lorenz in the early 1970s as a nod to the unpredictability he discovered in a toy model of the weather. It purports, only half-jokingly, that a butterfly flapping its wings in Brazil can set off tornadoes in Texas.

The butterfly effect appears in our scattering problem when we have, for example, two incoming rays starting from the same $x_0$ value but with slightly different $\theta_0$ values. The difference between their paths may become magnified through successive bounces, building up remarkably quickly until the two trajectories have diverged and no longer bear any resemblance to one another. Ultimately, we may even find that the two almost (but not quite!) identical inputs lead to completely different outputs.

A demonstration of the butterfly effect in the three-disc system. On the right, $\theta_0$ has been increased by 0.001° compared to the left.

Scattering that exhibits the butterfly effect is said to be chaotic. In common parlance, ‘chaotic’ is often used to convey disorder or perhaps even (perceived) randomness. Here, we are deploying the word in a scientific sense. While it might sometimes look random, the three-disc system cannot do anything except behave deterministically. It also hides some very intricate and very ordered structure that can be seen if we look in the right place$\dots$

Exit basins and their properties

Ideally, we would like to relate a whole range of inputs to their corresponding outputs in one go. A nice way to do that is to think of $x_0$ and $\theta_0$ as labelling the horizontal and vertical axes, respectively, in a plane (just like the $x$–$y$ axes in coordinate geometry). Let us impose a square grid on a section of that $(x_0,\theta_0)$ plane, where all the points represent different initial conditions. For each point in turn, the outcome of the computation is recorded. The collection of outputs is then overlaid on top of the grid, and colour-coded according to our three-disc system in figure 1 to create a kind of map.

Figure 2: A demonstration of the butterfly effect in the three-disc system. On the right, $\theta_0$ has been increased by 0.001° compared to the left.

An example is shown below when the gap between the discs is relatively small. The map answers our earlier question in a very direct and effective way. It tells us exactly what a ray does as a function of starting point $(x_0,\theta_0)$, and even cuts out all the unnecessary information about individual trajectories. But a single map is just the tip of a giant iceberg, and many more questions now follow.

Figure 3: Exit basins when the gap width gradually increases (top to bottom: $\ell = 0.07$, $0.09$, $0.11$, and $0.13$) and the plane of initial conditions $(x_0,\theta_0)$ in the left-hand column is $[-0.1,0.1] \times [-10°,10°]$.

We can see that the $(x_0,\theta_0)$ plane is divided into four coloured regions, known as exit basins, which identify a unique output for a given input—white for gap 1, red for gap 2, black for gap 3 (initial conditions giving rise to trajectories that never enter $\Omega$ are shown in grey). The boundaries of these basins form striated patterns that are intertwined in extremely complicated ways. An important feature of the map is its self-similarity, where zooming-in on any portion of a boundary region uncovers smaller-scale substructure which looks like the pattern as a whole. Self-similarity persists all the way down to arbitrarily-fine length scales, and that property is typically one of the signatures of a mathematical fractal.

It has been known since the mid 1990s that the basins can also possess the infinitely-mindbending Wada property which, in visual terms, means the boundary between two colours never quite forms. Amazingly, the remaining colours somehow always nestle in. Wada says that we can never jump from a black region to a red region without also jumping across a white region. Similar is true for other black-red-white permutations. More subtly, any jump over a boundary necessarily involves crossing all three colours an infinite number of times!

As an analogy to the Wada property, think about the points on a number line. Any number is either rational (expressible as a ratio of integers, like $1/2$ or $3/4$) or irrational ($\pi$ or $\sqrt{2}$ are the usual suspects). We cannot jump from $1/2$ to $3/4$ without necessarily jumping over all the numbers in between, both the (countably-infinite) rationals and the (uncountably-infinite) irrationals. The same idea would also hold if we were to jump between any two irrational numbers instead.

It turns out that the basins vary with the gap width $\ell$ (see below). For instance, the map takes up more of the $(x_0,\theta_0)$ plane as $\ell$ increases. That result makes physical sense: we would expect the scattering region to be accessible from a wider range of initial conditions as the gaps become bigger. We can also see, qualitatively, that the pattern complexity reduces as $\ell$ increases since the density of the striations (loosely speaking, the number of stripes per unit area of boundary) drops off as the gaps expand.

The map looks the same when rotated $180^\circ$ with black and white swapped.

The striations in the maps look quite linear, especially when magnified. However, some curving can be seen if we look over a wider range of the $(x_0,\theta_0)$ plane than shown here, particularly when the gap size increases.

There is also a fundamental physical principle determining how the stripes must fit together (perhaps you spotted it earlier). Look on the right: if the black and white areas are swapped over, the map still looks the same! This is a consequence of the fact that there is no difference between gaps 1 and 3. Our three-disc system has a line of mirror symmetry along the $y$-axis, and so the trajectories from $(-x_0,-\theta_0)$ and $(x_0,\theta_0)$ are necessarily mirror images of each other.

Uncertainty and unpredictability

So far, we have found that the butterfly effect can play a key role in scattering, and that basins become less finely-structured as the gaps increase in size. Far from being independent, these attributes are two sides of the same coin. One way to establish their connection is through the concept of uncertainty, thinking more carefully about what it means for a deterministic system to have an uncertain outcome. How can that apparent contradiction be allowed by physics?!

Let us start with an arbitrary initial condition, say $\boldsymbol{x}_0 \equiv (x_0,\theta_0)$, and from that point we can define two nearby initial conditions which are $\boldsymbol{x}_{0+\varepsilon} \equiv (x_0+\varepsilon,\theta_0)$ and $\boldsymbol{x}_{0-\varepsilon} = (x_0-\varepsilon,\theta_0)$. The small positive number $\varepsilon$ satisfies the inequality $\varepsilon \ll 1$. Equally, we could consider $(x_0,\theta_0+\varepsilon)$ and $(x_0,\theta_0-\varepsilon)$; it makes no difference to what follows. If the trajectories from all three starting points lead to the same outcome, then the input $\boldsymbol{x}_0$ is not susceptible to the butterfly effect when disturbances are of size $\varepsilon$.

We can quantify the uncertainty in a fixed region of the $(x_0,\theta_0)$ plane. Select at random a large number $N$ of $\boldsymbol{x}_0$ points within that region, and let $N_\varepsilon$ be the number of those points which exhibit the butterfly effect at the scale $\varepsilon$. Then, the fraction of our initial conditions with uncertain outcomes is given by the power law ${f_\varepsilon \equiv N_\varepsilon/N \sim \varepsilon^{2-D}}$. The pure number $D$ is the uncertainty fractal dimension. Smooth boundaries are associated with $D = 1$, a value that coincides exactly with the dimension of a typical line in Euclidean geometry (where points are 0D, lines are 1D, areas are 2D, and volumes are 3D). Accordingly, $f_\varepsilon$ scales with $\varepsilon^{2-1}=\varepsilon$, so that halving $\varepsilon$ halves the fraction of initial conditions with uncertain outcomes.

Fractal boundaries are those with $1 < D < 2$, where $D$ is non-integer and lies somewhere between the Euclidean dimensions of a line and an area. We can think of such boundaries as becoming increasingly rough and irregular (sometimes called area-filling) as $D$ approaches 2. In those cases, $f_\varepsilon$ scales with $\varepsilon$ to a power that drops towards zero as $D \rightarrow 2$. The crux of the matter is that for fractal boundaries, the proportion of uncertain points, $f_\varepsilon$, can fall off relatively slowly even for big reductions in $\varepsilon$.

Uncertain outcomes are associated with circle $\Sigma^\prime$ (which overlaps a basin boundary) but not with circle $\Sigma$.

An intuitive way to visualise unpredictability is to consider a small circle $\Sigma$ with radius $\varepsilon$ in the $(x_0,\theta_0)$ plane, as on the right. The centre of that circle of uncertainty is placed on the ‘ideal’ initial condition, $\boldsymbol{x}_0$. However, if we know $\boldsymbol{x}_0$ only to within an error $\varepsilon$, the ‘true’ initial condition may lie anywhere in the neighbourhood $\Sigma$ around $\boldsymbol{x}_0$. If $\Sigma$ contains only one colour, the outcome is independent of the finite accuracy. Uncertainty appears whenever $\Sigma$ impinges on a basin boundary, in which case all colours are contained and the outcome cannot be predicted. That is a recipe for a deterministic system to behave unpredictably, but in such a way that our faith in physics (thankfully) remains intact.

After a quick calculation, we find the following formula for relating $D$ to the way in which $f_\varepsilon$ varies across the scales: \[ D = 2 – \frac{\textrm{d}\;\textrm{log}_{10}f_\varepsilon}{\textrm{d}(\textrm{log}_{10}\varepsilon)}. \]

Computed log-log plots for the three-disc system using $N = 2^{20}$ initial conditions. For the five values of $\ell$ considered, straight-line fits have slopes corresponding to uncertainty dimensions $D \approx$ 1.92, 1.89, 1.86, 1.84, and 1.81, respectively.

The slope of the log-log graph is thus a crucial piece of information, and it needs to be obtained by curve fitting. A set of graphs is shown on the left for the centre panes of the exit basins in figures 2 and 3 when $\varepsilon$ varies across six decimal orders of scale. For $\ell = 0.05$, we find $D \approx 1.92$ while for $\ell = 0.13$, we find $D \approx 1.81$. The largest (smallest) uncertainty dimension occurs for arrangements with the narrowest (widest) gaps, and the conclusion we can rightly draw is that the smaller the gap, the greater the sensitivity to initial fluctuations.

But what does it all mean more generally? It means something quite profound: systems associated with larger values of $D$, irrespective of their physical nature, have more complicated basin boundaries. So in a sense, the fractal dimension becomes a convenient yardstick for comparing the susceptibility of different systems to the butterfly effect.

Concluding remarks

We’ve applied some very deep ideas—ideas that have come to play a pivotal role in modern understandings of physics—to a toy scattering problem. But the bigger picture to think about is that simplicity can very often be deceptive. Uncertain outcomes are present whenever finite precision in our knowledge of initial conditions becomes important, not just here but pretty much everywhere. Perhaps one of the most important scientific discoveries of the 20th century is that unpredictability, so beautifully encoded within fractal basin boundaries, is in the DNA of physical law.

If you’re still not convinced, or if you want to amaze people with all this stuff, take four silvered spheres such as Christmas tree decorations and a bit of Blu-Tack or glue (see How to make). Form a tetrahedron with the decorations, place it on top of a mirror, and surround it on three sides with three pieces of cardboard (each of a different colour). Now look into the gaps of the tetrahedron. What you’ll see is chaotic scattering in action, and an impressive fractal pattern made possible by the law of specular reflection. You can see an example in the banner image at the top, and read more about it in the article Christmas Chaos by Windell Oskay.

I always derive great pleasure from seeing simple systems behave in complicated ways (the greater the contrast, the better). And it is inevitably mathematics that allows us to unpack what is going on. The fact that even the most trivially-familiar laws of Nature can conspire to create such complexity should be a source of wonder and inspiration to us all—not just Chalkdust readers!