In conversation with Kat Phillips

Many undergraduates at some point consider doing a PhD after their degree—but it can be confusing when trying to work out what a PhD is really like, or even how to apply. That’s why Kat Phillips, a PhD student in fluid dynamics at the University of Bath, has taken it upon herself to demystify the whole experience. We sat down with Kat to talk about her outreach projects, Twitch streaming her PhD work, and how her local stadium came to be named after her.

Kat first became interested in doing a PhD during her undergrad degree at Cardiff University: “I think it was probably third year when I realised more concretely what direction I wanted to go in. I had a lecturer who taught me fluid dynamics, and she was incredible, just the way that she delivered a lecture was very clear. She was such a big inspiration and I thought: ‘that is what I want to do.’” This sparked her interest not only in fluid mechanics, but also in outreach. “It’s not just the research that I like. It’s the learning, and the teaching, and the communication. My dream job at the minute is something that lets me keep doing all my outreach stuff while letting me muck around at a university, sitting at a blackboard not talking to anyone.”

So what exactly is Kat researching? “I would say that I sit in classical fluid dynamics, on the bridge between analytical and numerical. The system that I’m looking at is bouncing droplets on a deep bath.” Droplets impacting on liquid surfaces are seen all the time in nature (eg when it rains), and understanding them has a number of important industrial applications too. In aerodynamics, droplet impacts with wet surfaces can cause large ice accretions on aircraft. In industrial spray painting or inkjet printing, it is important to understand droplet impacts to achieve an even coverage.

“Historically, when people have looked at impacting droplets, they say: the droplet falls. There is some pressure transfer that is possible, because there is an air layer keeping the droplet and the bath separate. The bath acts as a trampoline and kicks the droplet back up, and then it flies away.” We can see something like this happening in the picture below this paragraph: this is the droplet after impact, having been launched back from the liquid surface. “What I’m doing is looking at a single impact and trying to compute the air layer between the droplet and the liquid surface dynamically. The goal is, throughout the evolution, to constantly update what that pressure transfer does. And we do that using lubrication theory. So it’s a really cool coupled system between this droplet, which is deformable, and a deep liquid bath with an impacting point. And you’ve got a thin film that’s technically being deformed at both boundaries. All of that’s happening in one big Matlab code.”

A water droplet after impact. Image: Davide Restivo, CC BY-SA 2.0

A photo of a desk, with a laptop, headphones, and graphics tablet.

Kat’s first ever streaming setup

Stream big

Alongside her research, Kat has a Twitch channel where she livestreams about her PhD experiences, and about maths more generally. She started her channel in lockdown, inspired by the move to online learning.

While teaching undergraduate classes over Zoom, Kat encouraged her students to communicate with her however they felt most comfortable: “I think I told one of my kids that they could send a carrier pigeon if they could figure out how to do that—I will accept any form of communication.” This seemed to work for Kat’s students, who preferred to leave comments using the chat function rather than turning on their mic. For Kat, this sparked an idea: “in my spare time I was watching a lot of Twitch, and I realised that the way that I was teaching was exactly how I was watching my entertainment with streaming. So I had this mad idea: if I can do it in a Zoom call, I can do it on Twitch.” And so, Kat’s Twitch channel, KatDoesMaths, was born.

Kat’s livestreams are split into three categories: pomodoros, maths office hours, and gaming. For Chalkdust readers who are unfamiliar with the concept of pomodoro, this is a productivity technique developed in the 1980s by a (then) university student, Francesco Cirillo. Armed with a tomato-shaped timer from his kitchen, he would do 25-minute bursts of uninterrupted focused work, with five-minute breaks between them. This technique of splitting a period of time into several productive intervals, interspersed with short breaks to minimise distractions, is still a popular time management method used by students and workers around the world today.

Kat uses the pomodoro technique often, and livestreams some of these sessions. “During the sessions I am completely quiet. I share my screen, but it will be slightly blurred, so people can usually tell if I’ve got Matlab open, or if I’m writing in Overleaf, or just what sort of software I’m using.” The audience can work alongside Kat, and chat during the breaks: “I find that during the breaks is a good chance to chat, because you have a dedicated topic. You can explain what you were just doing, and a lot of people do like asking about what I’m up to.”

Through these sessions, Kat can interact with her audience, while also having time to work on her PhD: “it’s really good for me as well. I find that I do my best work when I’m streaming, because it forces me to get off Twitter and actually do work.” As she enters the final year of her PhD, her livestreams are mostly pomodoro sessions: “I do pomos the most, because work is a bit manic right now as I’m going into my final year. They also require the least amount of prep from me.”

Not all of Kat’s sessions follow this format. “The other streams that I do (I call them maths office hours) are more going through maths problems. A few times I’ve given talks on Twitch, depending on how much time I have to prep.” The audience can request topics for these office hours, and Kat will answer questions live. “Typically, I’ll be going through past exam questions or teaching a topic. But the goal is not to have a lesson: the whole point is that it’s a conversation.” The topic of the streams isn’t rigid: “it’s normally a war of attention with me going through a topic and then getting sidetracked by something like ‘hey, have you seen this cool way to divide by nine really quickly?’” And it’s not just maths students who join these streams: “oftentimes, I’ll get people from all different stages of careers and with different background experiences chatting.”

Kat also streams herself gaming, usually puzzle-related games, though these happen the least of the three. “Ideally, I aim for about one office hour a week, and then a game stream every now and then, when I’d want to or when I have something to play. Then I do pomodoros whenever I’m at my desk back at home and doing work.”

Through these streams, Kat has developed an audience of regular viewers. “I think one of the really nice things about Twitch is that it encourages that community. I have a Discord as well, where followers can chat to each other, as well as to me.” Although these conversations often centre around maths, most of Kat’s audience are not, in fact, currently engaged in formal maths education. “A lot of the people that will engage with me on my streams are people who are already in the workforce, but maybe were either interested in maths as a kid, or are doing something in a tangential field. Generally speaking, I kind of assume that the person that I’m talking to on the other end is just someone that kind of has a vague interest in maths, but not studying maths.” And this audience engages with the different streams in different ways: “I think there’s definitely a Venn diagram somewhere of my whole community, and different people enjoy different parts of it. The ones that are coming to the pomodoros typically will be studying something, or they’ll be working on a project.” While some of the audience may have initially been attracted to Kat’s Twitch because of one type of livestream, they often stay for the others. “The people that come to the game streams, I’ll normally end up trying to lure them into the office hours. Most of the audience are used to seeing games, and then you slowly move them into thinking about maths.” This has worked for Kat, with more people coming to view her streams: “it’s a nice growing community, I’d say.”

A screenshot of Kat's Twitch profile, showing thumbnails for two recent streams.

Some different streaming highlights

This community of viewers follows Kat’s journey through her PhD, gaining an insight into the highs and lows of the process. “I think it’s good for people to see a version of a PhD that isn’t just ‘we get it right all of the time’. My community have been with me for the last two years of my PhD, and it has been ups and downs.” One thing she has shared with her audience is her experience of navigating the process of writing a paper: “something that keeps coming up is the PhD dream of writing a paper. Every time that gets knocked back, I’ll tell my audience why and how, because it’s easy to feel like that’s a really bad thing. But it is what it is, and you just have to work with that.” Kat hopes that talking about this online will help destigmatise the PhD process, saying “especially if you’re far away from academia, you have this idea of what it’s supposed to be, but it’s different for everyone.”

A particular highlight of Kat’s streaming happened during an office hour, where Kat ran into a bit of a surprise. “There’s this functionality in Twitch where at the end of a stream, you can send all of your viewers on to watch someone else.” During an office hour where she was teaching herself hypothesis testing, a popular streamer sent his audience over to KatDoesMaths. “I’m not a statistician by trade. I was trying to teach myself, making an absolute mess of it… when he raided my channel with 1800 people. So not only was I fangirling over this big celebrity suddenly watching me do maths, but there were also 1800 people there watching me trying to do hypothesis testing. That is still one of the highlights of streaming this far.”

Behind the research

Outside of Twitch, Kat is heavily involved in several initiatives revolving around demystifying academia—from PhD applications to showcasing what day-to-day research actually looks like. “I think anyone that wants to get into academia should have the opportunity to. I think it’s a dangerous game we play when we say that everyone should have a certain level of qualification. I think that the ability to get the qualifications should not be the limiting factor. If you want to do a PhD, just because you really like learning, I think that’s a good enough reason to. Often people miss out on opportunities because they don’t know what’s available to them, or what’s possible for them. I think that’s such a shame because it favours people that don’t need the extra help already.” One thing Kat mentioned in particular was the lack of awareness that most students have about funding when applying for PhDs: “I think, at the bare minimum, everyone should be aware that there are funding opportunities and scholarships. The idea of paying for a PhD put me off, and I didn’t realise it was standard to get funding until I applied.”

A selfie of Kat and three of her colleagues, with their backs to the stage in a theatre

The Behind the research team

One of the initiatives Kat is involved with is Behind the research, based at the University of Bath. This is a toolkit for students in the maths department with the primary goal of equipping them with the skills and knowledge necessary to do outreach. The toolkit comprises both a physical kit (laptops, cameras and other technical equipment) and information (for instance, which software works well to edit outreach videos). “If anyone in the university department wants to do some outreach using online or video audio visual stuff, I want them to be able to do it. I don’t want the limiting factor to be ‘I can’t afford a camera’. You don’t need high tech equipment to do that sort of thing, but I think it’s nice that they don’t have to do it alone.” The tagline for the initiative is “meet the real people behind the research”, which the team do by producing YouTube videos and blog posts of their experiences.

Kat and the other organisers of Si Mon Sci Comm Con

Kat also co-organises PhD Your Way, a national open day focusing on maths PhDs, bringing together institutions from all over the UK to inform prospective PhD candidates about the process. “It’s independent of any one institution. The idea is that we bring all of the universities to one place, so that you know the information you have to get isn’t hidden on different websites under different names.” Current PhD students also come along, so attendees can ask them questions about their experiences: “you can actually ask them, not only all the admin stuff, but what is it actually like living there? What’s the social life like? Is there support if I have dependants? Or if I’m part time, how does that work? Is the mental health support good? All of that information.” This event has been successful in the past: “I met someone the other day that decided to do a PhD because of this event, so we have influenced at least one person. I’m very proud of that.”

Kat is also beginning to take a step into in-person outreach talks. Later this year, she will be touring with Education in Action, giving a series of talks for their Physics in Action sessions. “The goal is to show that fluids are everywhere, and that one set of equations can describe everything.” This will be Kat’s first experience of giving solo in-person outreach lectures, and we spoke about the jump from online outreach to in-person talks: “I think a lot of people are trying to go the other way, starting with doing more traditional maths communication, like talks and going into schools, and then working outwards to online content. Whereas I’m trying to jump the other way.” But Kat told us that “going this way has worked really well for me in terms of my public speaking skills”, and she’s really excited for the talks. “The talks in public, when you give a big lecture, are really cool. You get the energy off the crowd.”

Katie Phillips Stadium

Before saying goodbye to Kat, there was one more thing we wanted to chat to her about. Everyone knows the sudden pang of fear that comes when someone asks the dreaded icebreaker question: “so, what is a fun fact about yourself?” Everyone except for Kat, that is. How many people can say they had a rugby stadium named after themselves? Kat told us: “I’m from South Wales, and I have been a big fan of the Ospreys since I was a child, in part by choice, and in part because in South Wales, you get disowned if you don’t like rugby.” Kat was joking about being disowned of course: what reasonable person could disown someone for simply betraying their entire culture? Kat had been a season ticket holder from the start of the rugby club, and so on their tenth anniversary, she was entered into a celebratory raffle they were holding. “It just so happened that I won the raffle. So, when I was 17, they called me up and said ‘for one game against Glasgow Warriors, we’re changing absolutely everything in the stadium’.” And so on that fateful day, Kat arrived to watch the Ospreys v Glasgow Warriors game at the newly-named Katie Phillips Stadium. “They printed fake temporary boards, saying the Katie Phillips Stadium, and they put them up. The programme said Katie Phillips. All of the commentators had to go along with it. It was a really, really surreal experience.”

The future

So what will KatDoNext? In the immediate future, she plans to continue with her PhD and livestream the process: “a lot of my time is spent doing pomodoro sessions, especially coming into the final year of my PhD. As soon as I go into writing up status, I’m definitely going to be streaming for hours, I think.” But for the distant future, she hopes to stay within academia: “I want to keep going as far as I can. I’m still in love with my research, fundamentally, fluid dynamics is just really cool, and I will never not have that opinion. I think so. Getting to study fluids is pretty dope. So yeah, I’ll keep doing that for at least a little bit longer.”


The ninth Dedekind number

In April 2023, a new mathematical milestone was reached when scientists discovered the ninth Dedekind number:

\[286386577668298411128469151667598498812366.\]

To be pedantic, this newly-discovered number is actually the tenth in a sequence of rapidly increasing integers, first defined by mathematician Richard Dedekind in the 1890s. Like all sane people, Dedekind believed in indexing from zero. So, the first number in this sequence is the zeroth Dedekind number, the second is the first, and so on. Not confusing at all.

This ninth Dedekind number was, in fact, simultaneously and independently discovered by two separate research groups, both unaware of the other’s work. Christian Jäkel, mathematician at TU Dresden, published his algorithm and computation of the 42-digit number on 3 April. Three days later, Lennart Van Hirtum with a team from KU Leuven and Paderborn University published a paper, employing a very different technique. They got the same result—a relief for both, no doubt.

Before tackling the mathematical definition of Dedekind numbers, let’s rewind 192 years. Julius Wilhelm Richard Dedekind was born on 6 October 1831 in Braunschweig, Germany. Deciding that Julius Wilhelm Richard was too much of a mouthful, he dropped his first two names before starting his mathematics studies at the University of Göttingen in 1850. He finished his PhD just two years later, as Gauss’s last doctoral student, before moving to Berlin where he hung out with Riemann for a couple of years. Two years of hypothesising with Bernhard was enough for Richard. He moved back to Göttingen, where he taught courses on geometry and probability, and became good friends with Dirichlet. Later, after a brief stint in Zürich, he returned to his hometown in 1862, where he worked and taught for the rest of his life, until his death in 1916.

Dedekind gave the field of mathematics many breakthroughs. In 1888, he published Was sind und was sollen die Zahlen? (roughly translating to ‘What are numbers and what are they good for?’) which included his definition of an infinite set. The Dedekind cut is a standard definition of the real numbers. Dedekind groups are groups where every subgroup is normal.

And Dedekind numbers, which he first defined in 1897, are a sequence of integers describing the number of different monotonic Boolean functions of $n$ variables! Until March 2023, we only knew the first nine (up to $n=8$) of them. They are:

2, 3, 6, 20, 168, 7581, 7828354, 2414682040998, 56130437228687557907788.

Now, I know what you’re thinking. What the heck is a ‘MoNoToNiC BoOlEaN fUnCtIoN oF $n$ VaRiAbLeS?’


A Boolean variable is a variable whose value is either $1$ or $0$, or equivalently, true or false. A Boolean function is a function whose inputs are Boolean variables, and whose output is also a Boolean value. These are sometimes called switching functions, truth functions or logical functions.

A simple example of a Boolean function is one which is constant, always returning the same value no matter what the input is. For a Boolean variable $x$, the function
\[f(x) = 1\]
is a constant Boolean function. Another simple example of a Boolean function is the negation function:
\[f(x) = \begin{cases} \;0 & \text{if } x=1\\ \;1 & \text{if } x=0 \end{cases} \]
which negates the value of the input. In true/false terms, it returns false if $x$ = true, and true if $x$ = false. Another way of defining this function is with a truth table:

\[\begin{array}{c|c}
x & f(x) \\ \hline
0 & 1 \\
1 & 0
\end{array}\]

Truth tables are a standard way of defining Boolean functions. A truth table has one column for each input variable, plus one final column showing all possible results of the function. Each row is one possible configuration of input variable values, and the result of the function for those values.

In Boolean algebra, the negation function, or not operator, is represented by the notation: $\neg x$. You might recognise this symbol from mathematical logic. The notation for the and and or operators, $\land$ and $\lor$, may also be familiar. This leads nicely into defining simple Boolean functions for two variables.

Truth table for and:

\[\begin{array}{cc|c}
x & y & x \land y \\ \hline
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
1 & 1 & 1
\end{array}\]

Truth table for or:

\[\begin{array}{cc|c}
x & y & x \lor y \\ \hline
0 & 0 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 1
\end{array}\]

For and, both $x$ and $y$ must be true for the function to return true. For or, it is enough that at least one of $x$ and $y$ is true. In terms of ones and zeros, we can write these functions with ordinary arithmetic operations: $x \land y = xy$, $x \lor y = x + y-xy$, $\neg x = 1-x$.
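A quick way to convince yourself of these arithmetic forms is to compare them against the truth tables on every possible input. The following is a minimal Python sketch; the function names `and_arith`, `or_arith` and `not_arith` are invented for this illustration.

```python
from itertools import product

def and_arith(x, y):
    # x AND y written as ordinary multiplication
    return x * y

def or_arith(x, y):
    # x OR y written as x + y - xy
    return x + y - x * y

def not_arith(x):
    # NOT x written as 1 - x
    return 1 - x

# Compare against Python's bitwise operators on every 0/1 input
for x, y in product((0, 1), repeat=2):
    assert and_arith(x, y) == (x & y)
    assert or_arith(x, y) == (x | y)
for x in (0, 1):
    assert not_arith(x) == (x ^ 1)
```

The subtraction of $xy$ in the or formula is what stops $1 + 1$ from producing $2$ when both inputs are true.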


In maths, the term monotonic describes something that changes in a way such that it either never decreases or never increases. I like to think of it as meaning ‘always going in the same direction’. For example, the function $f(x) = \mathrm{e}^x$ is monotonic, but $f(x) = \sin(x)$ is not monotonic.

A Boolean function is monotonic if, for any combination of inputs, switching one of the inputs from 0 to 1 can only leave the output unchanged or switch it from 0 to 1, never from 1 to 0. More precisely, an $n$-variable Boolean function $f$ is monotonic if the following holds: for any two inputs $\boldsymbol{x}=(x_1, x_2, \dots, x_n)$, $\boldsymbol{y}=(y_1, y_2, \dots, y_n)$ with $x_i, y_i \in \{0,1\}$, if $x_i \leq y_i$ for all $i$, then $f(\boldsymbol{x}) \leq f(\boldsymbol{y})$.

This means, as eagle-eyed Chalkdust readers may have already spotted, that the or and and functions shown above are both examples of monotonic Boolean functions. But negation, $\neg x$, is a non-monotonic Boolean function.
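The definition of monotonicity translates almost word for word into a brute-force check. Below is a small Python sketch, assuming a Boolean function is represented as an ordinary Python function taking 0/1 arguments; `is_monotonic` is a name made up for this example.

```python
from itertools import product

def is_monotonic(f, n):
    """Return True if the n-variable Boolean function f is monotonic."""
    inputs = list(product((0, 1), repeat=n))
    for x in inputs:
        for y in inputs:
            # x <= y componentwise must imply f(x) <= f(y)
            if all(a <= b for a, b in zip(x, y)) and f(*x) > f(*y):
                return False
    return True

print(is_monotonic(lambda x, y: x & y, 2))  # and: True
print(is_monotonic(lambda x, y: x | y, 2))  # or: True
print(is_monotonic(lambda x: 1 - x, 1))     # not: False
```

Comparing every pair of inputs is wasteful, but for a handful of variables it keeps the code as close as possible to the definition.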

In fact, the monotonic Boolean functions are defined by expressions which combine inputs using only the and ($\land$) and or ($\lor$) operators. The not ($\neg$) operator is forbidden. If it appears in the simplest form of a Boolean function’s definition, then the function is not monotonic.

To get a little more notationally-spicy, when other operators in Boolean algebra appear in a function, namely implication ($x \rightarrow y$), bi-implication ($x \leftrightarrow y$) and exclusive or ($x \oplus y$), they are non-monotonic by construction.

This is because writing these operators with and and or requires the use of not. Implication, which means ‘if $x$ then $y$’, can be written as: $x \rightarrow y = \neg x \lor y$. Bi-implication, also called ‘equivalence’, is equivalent (haha) to: $x \leftrightarrow y = (x \lor \neg y) \land (\neg x \lor y)$. Exclusive or, which returns true if and only if $x$ and $y$ are different and so is sometimes called ‘non-equivalence’, can be written as: $x \oplus y = (x \land \neg y) \lor (\neg x \land y)$. So, if you come across a Boolean function with any of these operators, you’ll know it’s non-monotonic. Alright, that’s enough notation overload.
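For readers who would rather check than trust, the sketch below (illustrative Python, with names such as `operators` and `violation` chosen just for this example) confirms that each rewritten form matches the operator's meaning on all four inputs, and then searches for a pair of inputs that breaks monotonicity.

```python
from itertools import product

def NOT(a):
    return 1 - a

# Each operator defined directly from its meaning, paired with the
# and/or/not rewrite quoted in the text
operators = {
    "implies": (lambda x, y: int(x <= y), lambda x, y: NOT(x) | y),
    "iff": (lambda x, y: int(x == y), lambda x, y: (x | NOT(y)) & (NOT(x) | y)),
    "xor": (lambda x, y: int(x != y), lambda x, y: (x & NOT(y)) | (NOT(x) & y)),
}

def violation(f):
    """Return inputs p <= q with f(p) > f(q), or None if f is monotonic."""
    points = list(product((0, 1), repeat=2))
    for p in points:
        for q in points:
            if all(a <= b for a, b in zip(p, q)) and f(*p) > f(*q):
                return p, q
    return None

for name, (direct, rewritten) in operators.items():
    # The rewrite must agree with the direct definition everywhere
    assert all(direct(x, y) == rewritten(x, y)
               for x, y in product((0, 1), repeat=2))
    print(name, "is broken by", violation(direct))
```

Each operator turns out to have at least one pair of inputs where raising an input from 0 to 1 drops the output from 1 to 0.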

Let’s look at a more complex example of a monotonic Boolean function, this time for three variables. The function \[ ((x \land y) \lor (x \land z) \lor (y \land z))\] can be expressed simply as ‘at least two of $x$, $y$ and $z$ are true.’ We can see this function is monotonic by examining its truth table, but we can also see this by drawing a Hasse diagram. These are diagrams that crop up in order theory, but for our intents and purposes, there is simply a directed edge in the graph from one node to another when one of the inputs changes from 0 to 1. We colour the nodes in the graph to represent the output of the function. If the output of our Boolean function, $((x \land y) \lor (x \land z) \lor (y \land z))$, equals $0$, the node is orange, and if it equals $1$, the node is blue.

Hasse diagram and…

A Boolean function is monotonic when its Hasse diagram representation, sometimes referred to as an $n$-cube labelled with truth values, has no upward edge from true to false. In our case, all the blue nodes are above the orange, so this Boolean function is monotonic. It’s worth mentioning, because I’m apparently incapable of writing an article without encountering them, that this function is also commonly represented with… you guessed it, a Venn diagram (ta-da!).
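The Hasse diagram test can also be carried out without drawing anything. The Python sketch below is only an illustration (the helper `upward_edges` is a made-up name): it lists the edges of the 3-cube, each going from a node to the node with one more input set to 1, and looks for any edge that runs from true to false.

```python
from itertools import product

def majority(x, y, z):
    # The three-variable function from the text
    return (x & y) | (x & z) | (y & z)

# It really does mean "at least two of x, y, z are true"
for v in product((0, 1), repeat=3):
    assert majority(*v) == int(sum(v) >= 2)

def upward_edges(n):
    """Yield (lower, upper) node pairs differing by one input flipped 0 -> 1."""
    for v in product((0, 1), repeat=n):
        for i in range(n):
            if v[i] == 0:
                yield v, v[:i] + (1,) + v[i + 1:]

edges = list(upward_edges(3))
bad = [(lo, hi) for lo, hi in edges if majority(*lo) > majority(*hi)]
print(len(edges), bad)  # 12 []
```

Checking single edges is enough, because any larger jump between comparable inputs is a chain of single flips.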

… the corresponding Venn diagram of the function ‘at least two of $x, y, z$ are true’.

Speaking of cubes… this delivers us nicely to an alternative way of defining Dedekind numbers. The fun thing about Dedekind numbers is that they actually have multiple definitions, which you may find easier to digest depending on your preferred flavour of mathematics.

There are three common ways to define Dedekind numbers: in terms of Boolean functions, in set-theoretic terms, and in geometric terms by colouring the corners of an $n$-dimensional cube. Other definitions include: the number of antichains of subsets of an $n$-element set, the number of elements in a free distributive lattice with $n$ generators, or one more than the number of abstract simplicial complexes on a set with $n$ elements.
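To see one of these alternative definitions in action, the antichain count can be brute-forced for small $n$. The Python sketch below is a naive illustration, not an efficient method; it encodes each subset as a bitmask, and the name `count_antichains` is invented here. An antichain is a family of subsets in which no member contains another, and the empty family counts.

```python
from itertools import combinations

def count_antichains(n):
    """Count families of subsets of {0..n-1} in which no member contains another."""
    subsets = range(2 ** n)  # each subset encoded as a bitmask
    count = 0
    for family in range(2 ** (2 ** n)):  # bit s of family says whether subset s is in it
        members = [s for s in subsets if family >> s & 1]
        # (a & b) == a would mean a is contained in b
        if all((a & b) != a and (a & b) != b
               for a, b in combinations(members, 2)):
            count += 1
    return count

print([count_antichains(n) for n in range(4)])  # [2, 3, 6, 20]
```

The output agrees with the start of the Dedekind sequence, as the equivalence of the definitions says it should.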

I’m no pure mathematician, let alone a set theorist, so I’ll leave these scary-sounding wordy ones for the reader to go ahead and investigate. But cubes? Sure, I can picture one of those!

You can think of a monotone Boolean function as a game with an $n$-dimensional cube. Balance the cube on one corner and then colour each of its corners either blue or orange. The rule is this: you may never place an orange corner above a blue one. The number of different ways of doing this for $n$ dimensions is the $n$th Dedekind number.

It makes sense that this cube-colouring definition is equivalent to the Boolean function one. An $n$-dimensional cube has $2^n$ vertices. For a Boolean function of $n$ variables, there are $2^n$ possible input states since each variable can either be 1 or 0, and each state matches up with one of the vertices of the $n$-cube. For each possible input state, there are two possible outputs, again either 1 or 0, which corresponds to colouring each vertex of the $n$-cube in one of two colours. Thus, there are $2^{2^n}$ possible Boolean functions of $n$ variables, the same as the number of ways to colour the corners of an $n$-cube with two colours.
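The counting argument can be checked directly for small $n$ by generating every truth table. This is a minimal Python sketch, with `all_boolean_functions` a hypothetical helper that represents each function as a dictionary from inputs to outputs.

```python
from itertools import product

def all_boolean_functions(n):
    """Yield every n-variable Boolean function as a dict from input tuple to output."""
    inputs = list(product((0, 1), repeat=n))  # the 2^n corners of the n-cube
    for outputs in product((0, 1), repeat=len(inputs)):  # one colour per corner
        yield dict(zip(inputs, outputs))

for n in range(4):
    generated = sum(1 for _ in all_boolean_functions(n))
    print(n, 2 ** n, generated, 2 ** (2 ** n))
```

For each $n$, the number of generated functions matches $2^{2^n}$; even at $n=5$ that is already more than four billion, so enumerating this way runs out of steam quickly.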

$n$-dimensional cubes for $n = 0, 1, 2, 3, 4$


Let’s start with the trivial case, $n=0$. We’re dealing with a zero-dimensional cube, which is actually just a single point (a zero-dimensional anything is a single point). And since this single point can either be blue or orange, we see that the zeroth Dedekind number is 2. In terms of Boolean functions, the functions that have zero variables are the constant functions: $f=0$ and $f=1$, and both these are monotonic. This is another way of seeing that the zeroth Dedekind number is $2$. Hurrah!

Onto the next. When $n=1$, we’ve got one variable to play with. What could we possibly do with it? Well, we could either set it to a constant, $f(x) =0$, $f(x) = 1$, or we could return it via the identity function: $f(x)=x$. This identity function satisfies all our requirements for being a monotonic Boolean function. The only other possible thing we could do with one variable is negate it, $f(x)=\neg x$, but recall that negation is forbidden when monotonicity is required. Thus, the number of monotonic Boolean functions with one variable is $3$. Woohoo!

A one-dimensional cube is just a line connecting two vertices. There are only three ways to colour these two vertices such that an orange node is never above a blue one. So again, we see that the first Dedekind number is 3.

Now onto $n=2$, where we’ve got two variables to play with. Now is a good time to bring up the fact that any function that is a composition of monotone Boolean functions is itself monotone. In fancier words, the ‘class of all monotone Boolean functions is closed’, and the set of functions $\{0, 1, \lor, \land\}$ is a complete basis for this class (proof left as an exercise for the Boolean-algebra-enthusiast reader).

For $n=2$, we still have our constant and identity functions. But now we can also use the other functions in our basis. The monotone Boolean functions of two variables are: $f(x,y) = 0$, $f(x,y) = x$, $f(x,y) = y$, $f(x,y) = x \lor y$, $f(x,y) = x \land y$, and $f(x,y) = 1$. So the second Dedekind number is $6$!

Let’s revisit the Hasse diagram representation with these functions, keeping in mind that we now have our cube-based definition of the Dedekind numbers.

Hasse diagrams for monotonic Boolean functions of 2 variables, where orange nodes indicate the function returning false and blue nodes indicate true.

In all of these diagrams, there is never an orange node on a level above a blue one, so we know these are all monotonic. If we look at the Hasse diagram for a function that includes negation, we see that there are false nodes above true ones, so these cannot be monotonic.

Some Hasse diagrams for Boolean functions of 2 variables. These all have 𝑡𝑟𝑢𝑒 nodes above 𝑓𝑎𝑙𝑠𝑒 ones, so are all nonmonotonic.

If we think about these as 2-dimensional cubes (AKA ‘squares’) balancing on one corner, consider the question: how many different ways are there to colour the four corners with two colours? Well, it’s $2^4 = 16$. But only six of these possible ways correspond to monotonic Boolean functions.

Thinking about the problem in this way, it’s easy to see how the Dedekind numbers can become gigantic very quickly. How many different ways are there to colour the corners of an $n$-dimensional cube balancing on one vertex with two colours, such that there’s never an orange corner above a blue one? When $n=3$, the cube has eight corners, so there are $2^8 = 256$ different ways to colour the vertices, but how many of these correspond to monotonic Boolean functions?

I’ve lost count

It turns out, for $n=3$ the answer is $20$. And for $n=4$, out of a possible 65,536 Boolean functions, only 168 are monotonic. In 1897, when Dedekind first defined the sequence, this was as far as he got. It took over forty years, and the invention of computers, for the next numbers in the sequence to be determined. The last Dedekind number to be found, $n=8$, was discovered in 1991. Finding the ninth Dedekind number was an open challenge for 32 years.
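The first few Dedekind numbers are small enough to recover by brute force. The Python sketch below is purely illustrative and is not the method used by either research group: it stores each truth table as an integer bitmask and keeps those with no cube edge running from 1 up to 0. The function name `dedekind` is chosen for this example.

```python
def dedekind(n):
    """Count monotonic Boolean functions of n variables by brute force."""
    # Upward edges of the n-cube: flip one 0 bit of the input to 1
    edges = [(v, v | 1 << i)
             for v in range(2 ** n) for i in range(n) if not v >> i & 1]
    count = 0
    for table in range(2 ** (2 ** n)):  # bit v of table is the output on input v
        if all((table >> lo & 1) <= (table >> hi & 1) for lo, hi in edges):
            count += 1
    return count

print([dedekind(n) for n in range(5)])  # [2, 3, 6, 20, 168]
```

This approach stops being practical almost immediately: $n=5$ already means looping over $2^{32}$ truth tables, and $n=9$ would mean $2^{512}$, which is why cleverer formulas and serious hardware were needed.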

Until earlier this year! But how did they do it?

Christian Jäkel, at TU Dresden, developed a computer algorithm that used matrix multiplication and symmetries in a free distributive lattice (which appears in our list of equivalent definitions earlier). He developed a technique to ‘shift’ or ‘jump’ up to four spaces forward, and calculate the subsequent numbers in the sequence. With this algorithm, from a free distributive lattice with $168$ ($n=4$) elements, he could calculate the eighth Dedekind number in just three seconds. So, all he needed to do was run the same code on the $n=5$ lattice.

The $n=5$ lattice, however, has $7581$ elements. He rented eight graphics cards and ran the algorithm for 28 days, before the 42-digit number finally appeared on 3 April. Thank goodness he didn’t miss a minus sign.

Meanwhile, Lennart Van Hirtum, with the team at KU Leuven and Paderborn University, pursued an entirely different approach. Their approach started back in 2014, when Patrick De Causmaecker and Stefan De Wannemacker found a new formula for counting antichains. Seven years later, as a master’s student in Leuven, Lennart found a way to simplify the formula, priming it for numerical computation. With specifically-designed hardware and specialised parallel arithmetic units, they ran the calculation on the Paderborn supercomputer for three months, before the answer for the ninth Dedekind number was revealed to them.

Lennart’s code actually finished running on 8 March, almost a month before Christian’s preprint appeared on arXiv. They were, in fact, running the code for a second time to validate their answer when Christian’s paper came out. Since their number was the same, Lennart and the team hurried to share their own result a few days later. They published their paper on arXiv on 6 April.

To give you a sense of how big 286386577668298411128469151667598498812366 ($\approx 2.86 \times 10^{42}$) actually is, there are $7.5 \times 10^{18}$ grains of sand on Earth. There are an estimated $2 \times 10^{23}$ stars in the universe. These values are around $10^{20}$ times smaller than the ninth Dedekind number.


On the cover: flexahedron

Those who have spent as many hours as I have watching maths YouTube videos may well have come across the idea of a flexagon. Flexagons are origami-like models of 2D shapes, normally made out of strips of paper. As the name suggests, they can be folded (or flexed) to reveal new faces not previously seen. Flexagons hide their complexity in layers of faces, but the 3D analogue of a flexagon (a flexahedron, sometimes called a tri-kaleidocycle) puts that complexity on show.

As a lover of polyhedra, these 3D flexagons fascinated me and ever since I first heard the word flexahedron, I’ve had ambitions of designing my own. These shapes are so beautiful that I can’t understand why anyone wouldn’t want to!

The first flexahedron I ever saw was the (disgustingly named) infinity cube, which I came across when I was 11. Initially, it appears to consist of eight small cubes in a $2\times2\times2$ arrangement. On further inspection though, its components are connected together in such a way that twisting them morphs the shape from a large cube, to a cuboid, and back to a cube through a number of different transitions.

An infinity cube

The Yoshimoto cube—designed by Naoki Yoshimoto—builds on the ideas of the infinity cube, but the result is something even more beautiful. Each of the small cubes in the infinity cube is dissected through three of its diagonals, with the entire shape splitting into two parts that each move in the same way as the whole. What’s more, when folded up individually each of these creates a stellated dodecahedron.

One half of a Yoshimoto cube at various points in its flexing journey

Naoki Yoshimoto, whose first cube was unveiled in his From Cube to Space exhibition, also designed two further flexahedra based on the cube. These both consist of circles of polyhedra connected at their edges, able to form exciting shapes as they cycle and fold. Each of these flexahedra is made of a loop of identical (up to reflection) polyhedral modules joined together in a repetitive way, giving them a pleasing symmetry.

The two halves of a Yoshimoto cube

We can make all these models using nets. For the original Yoshimoto cube, we make eight modules from this net:

We can work out the relative lengths of the sides—every edge of the module is either an edge of the cube it dissects, or lies along a diagonal of the cube. A quick application of Pythagoras’ theorem tells us that the diagonal is $\sqrt3$ times the length of a side, so the equal sides of each isosceles triangular face are $\sqrt3/2$ times the cube side length.
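Written out for a cube of side length $s$ (a symbol introduced here only for illustration), the Pythagoras step goes through a face diagonal first and then the diagonal through the middle of the cube, which we call $d$. The final halving assumes, consistent with the $\sqrt3/2$ above, that each slanted module edge is half of that diagonal:
$$d = \sqrt{\left(\sqrt{s^2+s^2}\right)^2 + s^2} = \sqrt{3s^2} = \sqrt3\,s, \qquad \frac{d}{2} = \frac{\sqrt3}{2}\,s \approx 0.866\,s.$$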

Before designing my own flexahedron—which I’d decided would be an imitation of a Yoshimoto cube—I had to first consider the necessary qualities. Symmetry seems to be crucial to a good flexahedron: all those that we have encountered so far are made of either identical modules, or equal numbers of mirror-image modules. This both makes the shape feel purposeful and allows it to exploit the quirks of geometry. Another important aspect of all the flexahedra we have discussed is their ability to cycle: we can repeat the same mechanical moves without ever having to reverse them. For me this is vital, as it gives a certain freedom of movement, and contributes to the fluidity of the shape. One of the most impressive things about the original Yoshimoto cube is that it seems as if it achieves this with nothing to spare—we know the modules are arranged in a ring in terms of how they are connected, but there is never a gap between its parts as it folds. Finally, I want there to be something interesting about the shape: maybe it folds into many patterns or maybe it moves in a fascinating way similar to the original Yoshimoto cube.

I wanted the flexahedron to be able to take the form of a `nice’ polyhedron in the same way that the Yoshimoto cube is based around (to nobody’s surprise) a cube. I’ll call this our basis polyhedron. It made sense to begin my construction here, so naturally, I turned my attention to the platonic solids. Yoshimoto had already created a flexahedron with a cube as its basis polyhedron, so I wanted to choose one of the remaining four platonic solids to build my model from. Given the dual of the cube is an octahedron (ie joining the centres of the faces of a cube gives an octahedron and vice versa), I decided that it was the octahedron that seemed most promising.

Considering the symmetries of the basis polyhedron is important, as it suggests a natural dissection, namely cutting the shape along every plane of symmetry. This, however, generally creates more modules than is ideal for the final flexahedron. When a flexahedron has more than an arbitrary limit of, say, twenty modules, the final shape resembles a floppy loop more than anything else. Dissecting a cube or octahedron along its planes of symmetry splits each into 48 parts, far more than is ideal! We therefore seek structure amongst this dissection.

Splitting an octahedron into 48 parts along every plane of symmetry

After making the 48 sections of an octahedron and experimenting, I found two possible useful sets of modules. The first breaks the octahedron into 16 (non-regular) tetrahedra with this net:

The faces of each tetrahedron are right-angled triangles, each distinct from the others in the same tetrahedron. To form an octahedron, we need eight copies of this tetrahedron, and a further eight of its mirror image.

An octahedron with one of the 16 tetrahedra removed

The modules are joined together along their edges, in what I shall call a connection. To get the fluid loop that I want for my flexahedron, I need to ensure that no module face has more than one connection. The two module connections must therefore be on opposite edges, since the modules are tetrahedra. In order for adjacent modules to connect nicely, ie in order to ensure that equivalent faces on each module are adjacent after joining, we must have each shape connected to its mirror image. This means that after we connect the first two modules, we are forced to make the rest of the ring in a certain way, alternating mirror images and always connecting on opposite edges. In a tetrahedron, there are three sets of opposite edges, meaning there are three different ways to attempt to make a loop. Happily, all three work. Surprisingly, changing just the connection edge creates quite a different octa-flexahedron!

Three octa-flexahedra

This isn’t quite what’s on the cover though: the cover shows a net of a flexahedron made of 12 hexahedral modules, each of which includes an edge of the basis octahedron. Each triangular face of the octahedron is formed by three triangles, meeting at the centre of the face of the octahedron: each module has two of these triangles as faces, and four other faces that are on the interior of the octahedron.

Splitting the octahedron into 12 hexahedra

A net of a hexahedron, and a hexahedron

These modules are symmetric, but the connected edges are not symmetrically arranged—there is some chirality here! There are two ways that this arrangement can form an octahedron, and the curved pattern drawn on the surface on the cover distinguishes them. In one orientation, a circle is formed on a face, in the other the lines connect differently and no circle is formed.

While making these shapes module-by-module is practical, one can also create a giant net from which the whole thing can be folded. Doing this for the stretchy flexahedron produced the design on the cover. The thick cyan lines on the image show where the modules connect, and thus the position of the fabric hinges.

So have a go! I’ve achieved my aim of designing my own flexahedron; now I pass the baton over to you. You can cut out the cover and fold along the lines to recreate my flexahedron. Or if you want a real challenge, make your own!

The octa-flexahedra as seen on the cover


I’m counting on it

These days, abacuses might simply look like toys for teaching children to count, but before the invention of the calculator, there were few better ways to perform difficult calculations. Across the world, there have been many different designs for the abacus: let’s take a look at four of them.

Roman abacus

Roman numerals were not designed for computation, but to record numbers after the computation. Computations were done with either tokens (often pebbles) on a board with lines on it or with an abacus. The use of counting boards in shops persisted well into the Renaissance: today, we still refer to flat surfaces in shops as counters. If caught cheating customers in Renaissance Italy, a merchant’s counting board (or bank) was physically broken (ruptured) by the local authorities. This is where the word bankrupt comes from.

Long before the Renaissance, the ancient Romans used a metal sheet with grooves cut in it as a counting board. Small pebbles or beads sitting in the grooves represented numeric quantities. The Latin for pebbles is calculi: this is the root of many mathematical words, such as calculus and calculation. Perhaps the most interesting feature of the Roman abacus is the part that handles fractions: this could represent any fraction built from twelfths.

Chinese abacus

The Chinese abacus, or suan pan, has large beads, slightly rounded on the edges, strung on rods in a wooden frame. The wooden frame is divided into two sections horizontally. On the top, there are two beads on each rod. This section of the abacus is called heaven. Each bead in heaven represents five units in that column. Each column represents a position in a decimal number. In the larger section below the centre dividing bar, we are on earth and each bead represents one unit in that decimal position.

When a bead is pushed toward the dividing bar, it counts as a value in that decimal position. The extra beads are used for carrying values: If I bring all of the beads on earth up to the centre bar, there is a 5 in that column. The five beads on earth should be returned to their starting positions, and replaced with a single bead in heaven. Likewise, if you have two beads in heaven, you should return them and replace them with a single bead on earth in the column to the left. This is basically mechanical scratch paper!

A column carries over when it exceeds the value of nine. This is a mechanical version of the same thing you do when you are adding columns of digits and put a mark by the column to the left of the one in which you are working.

You may have noticed that the extra beads for carrying mean that each column on a suan pan can hold up to a value of 15 units. This means that you can represent hexadecimal numbers. This was very useful when working with early digital computers, particularly those from IBM. The suan pan became a tool for computer nerds in the 1960s and 1970s who were looking at core dumps before it was possible to get octal and hexadecimal options on calculators.

Modern programmers seldom have to work that close to the hardware, so debugging core dumps with a suan pan is probably unknown to them.
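To make the bead arithmetic concrete, here is a small sketch of how a number could be laid out on a suan pan, in decimal or in hexadecimal. The function names `to_columns` and `beads` are our own, and the rule of using heaven beads before earth beads is just one convenient convention for illustration.

```python
def to_columns(value, base=10):
    """Split a non-negative integer into digits, most significant first."""
    digits = []
    while True:
        value, digit = divmod(value, base)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def beads(digit):
    """Beads pushed to the bar on one rod, as (heaven, earth)."""
    if not 0 <= digit <= 15:
        raise ValueError("one rod holds values from 0 to 15")
    heaven = min(2, digit // 5)   # each heaven bead is worth 5; there are two
    earth = digit - 5 * heaven    # each earth bead is worth 1; there are five
    return heaven, earth


if __name__ == "__main__":
    for base in (10, 16):
        print(base, [beads(d) for d in to_columns(1957, base)])
```

A rod showing `(2, 5)` has every bead pushed to the bar, which is the value 15, or F in hexadecimal.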


In our standard decimal numbering system, we carry over into the next column once we exceed 9. In hexadecimal, we carry over once we exceed 15. Typically the digits 0 to 9 and the letters A to F are used to write the hexadecimal digits 0 to 15.

Every four bits (or nibble) of computer memory can store a value from 0 to 15, and so every nibble can be treated as a hexadecimal digit. The core dumps on early IBM computers essentially printed out all these hexadecimal digits.
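As a rough illustration of what such a printout involves, the sketch below splits each byte into its two nibbles and writes each one as a hexadecimal digit. The function name `hex_dump` and the sample bytes are made up for this example; real core dumps carried far more structure than this.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_dump(data):
    """Write each byte as two hexadecimal digits, one per nibble."""
    words = []
    for byte in data:
        high = byte >> 4      # the top four bits
        low = byte & 0x0F     # the bottom four bits
        words.append(HEX_DIGITS[high] + HEX_DIGITS[low])
    return " ".join(words)


if __name__ == "__main__":
    print(hex_dump(bytes([0, 255, 74])))
```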

Japanese abacus

A traditional soroban

The Japanese abacus or soroban is generally smaller than the suan pan, but usually has more rods: the standard design has around 17 to 20. The first thing to notice is that the beads have a different shape from the suan pan: they look more like two identical cones glued base to base. These sharp edges make moving the smaller beads easier: you can quickly slide the beads just by putting your finger between two of them. The extra rods also let you keep longer numbers and more of them when you’re working.

The next major difference is that heaven and earth do not look the same on the soroban as they do on the Chinese suan pan. At the start of the 20th century, the standard was to have one bead in heaven and five beads on earth: as in the suan pan, once five beads on earth are raised they should be replaced with one bead in heaven.

There are dots on the crossbar to let you know where decimal points occur when you keep more than one number for your calculation, such as a dividend and a divisor.

Later in the 20th century, the soroban configuration was changed to one bead in heaven and four beads on earth, so each rod could only represent the digits 0 to 9. There’s a totally untrue story about this change, which attributed it to the “great abacus bead shortage” of the second world war. The truth is the extra bead was never used, as students were already taught to do the carrying without the extra bead.

A more modern soroban

Russian abacus

The Russian abacus or schoty is not so well known as the Japanese and Chinese versions of this tool, although versions of it appear in Armenia and Turkey, where it was known as a choreba and a coulba. Its real problem is that it was invented in the 17th century, and strictly meant for commerce, not as a general calculating tool.

The schoty has a frame with wires that hold beads, but they are arranged horizontally instead of vertically. Most of the wires hold ten beads, with the fifth and sixth beads being coloured. The rods are slightly curved, so the beads will fall to the right or left side of the wire. The beads are moved right to left. The first beads of the thousand and million wires are also highlighted for easier identification.

The frame is made of wood or metal with 11 wires holding wooden beads. Unlike the other versions we’ve seen, there is no division within the frame.

The eighth row has only four beads on it, with two coloured beads in the centre. This allowed calculations to be performed using the quarter rouble coins that were in circulation. Some older models of schoty had two four-beaded wires (the first and eighth).


The maths before the scalpel

When an aneurysm forms on a patient’s blood vessel, a critical decision has to be made: should the aneurysm be surgically repaired or left alone? The primary concern is the aneurysm rupturing, as this can cause very serious health complications and possibly death. However, surgical intervention is risky and can cause blood haemorrhaging as well as a stroke or brain damage (if the aneurysm is on the brain). So, the million pound question: to repair or not to repair?

Mathematical and computational modelling can help us answer this question. By understanding the flow dynamics of the blood in and around the aneurysm we can make more informed decisions on whether or not surgical procedures should be carried out.

Patient-specific data can be collected through medical imaging techniques such as CT scans (which use a large number of X-rays from different angles to build a computational picture of blood vessels and blood flow), PET scans (which detect the gamma rays given off when positrons from an injected radioactive tracer annihilate) and MRIs (which collect images by causing protons in the body to align with a magnetic field). This data can then be fed into a mathematical model in order to predict the likely outcome of surgical intervention or leaving the aneurysm alone.

A CT scanner. Image: Tomáš Vendiš, CC BY-SA 3.0

An MRI scanner

First we need to consider some general blood flow rheology (ie fluid characteristics). Blood is a non-Newtonian fluid because its viscosity depends on the rate of shear. Imagine a bottle of tomato ketchup: when you tip it upside down the ketchup probably won’t flow out; but if you tap the side of the ketchup bottle, it suddenly begins to flow.

This is because ketchup is a shear-thinning fluid: the shear force you apply when you tap the side of the bottle reduces the viscosity of the ketchup and allows it to flow. Blood is also a shear-thinning fluid. Broadly speaking, the shear-thinning nature of blood is due to the fact that red blood cells tend to cluster together at low shear rates and spread out at high shear rates.

The graphs below show the flow profiles of a Newtonian fluid (eg water) flowing in a pipe and a shear-thinning fluid (eg blood) flowing through a vessel.

Flow profiles of water in a pipe…

…and blood in a vessel

We can clearly see a difference: the flow front is more truncated in the case of blood, which indicates a shear-thinning fluid. For fluids like blood and water, fluid shear stress obeys the relation
$$\tau = K\dot{\gamma}^n,$$
where $\tau$ is the shear stress, $K$ is a constant called the flow consistency index, $\dot{\gamma}$ is the shear rate, and $n$ is the flow behaviour index. This is called the power-law fluid equation. Meanwhile, the effective viscosity for these fluids is given by
$$\mu_{\text{eff}} = K\dot{\gamma}^{n-1}.$$
For $n=1$, shear stress and shear rate are in direct proportion: if this relationship holds, the fluid is Newtonian, and we see this means it has a constant viscosity. For values of $n<1$, we have a shear-thinning fluid: increased shear means increased $\dot{\gamma}$, which causes the viscosity to decrease. In the case of a shear-thickening fluid (meaning viscosity increases with shear) such as corn starch mixed with water, we would have $n>1$.
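The three regimes are easy to see numerically. The sketch below evaluates the two power-law formulas directly; the function names and the values of $K$, $n$ and $\dot{\gamma}$ are made up for illustration and are not measured properties of blood.

```python
def shear_stress(shear_rate, K, n):
    """Power-law fluid: tau = K * shear_rate**n."""
    return K * shear_rate ** n


def effective_viscosity(shear_rate, K, n):
    """Effective viscosity: mu_eff = K * shear_rate**(n - 1)."""
    return K * shear_rate ** (n - 1)


if __name__ == "__main__":
    K = 1.0  # illustrative flow consistency index
    for n in (0.5, 1.0, 1.5):  # shear-thinning, Newtonian, shear-thickening
        viscosities = [effective_viscosity(rate, K, n) for rate in (1.0, 4.0, 16.0)]
        print(n, viscosities)
```

For $n=0.5$ the printed viscosities fall as the shear rate rises, for $n=1$ they stay fixed at $K$, and for $n=1.5$ they rise.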

After a fancy derivation (based on conservation of momentum), we can use the power-law fluid equation to approximate the velocity of the flow at a specific point in the vessel
$$v = \left(\frac{\tau_\text{wall}}{KR}\right)^{1/n}\frac{R^{1+1/n}-r^{1+1/n}}{1+1/n},$$ where $\tau_\text{wall}$ is the shear stress at the wall, $R$ is the radius of the blood vessel and $r$ is the radial position of a given fluid particle.

This is how we get the profiles shown above. We can see from this formula that in the case of a shear-thinning fluid ($n<1$) the $1/n$ terms will be greater than 1, meaning that there will be smaller variation in fluid velocity at smaller radial positions and greater variation at higher radial values. This is what leads to the truncated profile.
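A quick way to see the truncation is to evaluate the velocity formula at a few radial positions. In the sketch below the function names are our own and the parameter values are arbitrary; dividing by the centreline velocity leaves a shape that depends only on $r/R$ and $n$.

```python
def velocity(r, R, tau_wall, K, n):
    """Power-law velocity at radial position r in a vessel of radius R."""
    exponent = 1 + 1 / n
    prefactor = (tau_wall / (K * R)) ** (1 / n)
    return prefactor * (R ** exponent - r ** exponent) / exponent


def normalised_profile(n, points=5):
    """v(r) / v(0) at evenly spaced values of r/R between 0 and 1."""
    return [1 - (i / (points - 1)) ** (1 + 1 / n) for i in range(points)]


if __name__ == "__main__":
    print("Newtonian      ", normalised_profile(1.0))
    print("shear-thinning ", normalised_profile(0.5))
```

Halfway to the wall, the Newtonian profile has dropped to 0.75 of its centreline value, while the $n=0.5$ profile is still at 0.875: the shear-thinning profile stays flatter for longer and then falls away sharply near the wall.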

The main takeaway from this is that in general, blood flow profiles are relatively simple. This is because most blood vessels are very narrow and so viscous forces will dominate over inertial forces, leading to a laminar (non-chaotic) flow. But this is not the case in an aneurysm.

In an aneurysm, we often see turbulent flows instead of laminar ones. Turbulent flows are chaotic in the sense that the velocity vectors throughout the fluid point in many different directions. The reason that aneurysms facilitate turbulent flows is that there is an abrupt change in cross-sectional area between the blood vessel and the aneurysm attached to it.

Turbulence creates two problems. First, there is the issue that turbulent flows are harder to analyse and predict. Secondly, the turbulence can cause an increase in pressure and shear stress on the aneurysm, leading to an increased chance of rupture.

Sketch of blood flow direction through a blood vessel with a large bulge out the side of it

A simulation of blood flow in an aneurysm might look like this

The blood flow around an aneurysm can be simulated. The flow regime within the aneurysm is more complex than the flow regime in other parts of the blood vessel. To quantify the risk of rupture, we can investigate the wall shear stress—the force exerted by the fluid on the wall. Unfortunately for us, turbulent flows are not nearly as straightforward as laminar ones and we can no longer use our power-law fluid equation.

Instead, we can estimate the wall shear stress from simulations—an approach known as computational fluid dynamics. We can integrate the quantity calculated over time in order to find a time averaged wall shear stress,
$$\overline{\tau_\text{wall}}=\frac{1}{T}\int^T_{0}\left\lvert \tau_\text{wall}\right\rvert\;\mathrm{d}t,$$
where $T$ is the duration of a cardiac cycle.

If the value of the time averaged shear stress is high (especially when localised) then risk of rupture is increased through mechanical weakening of the vessel wall. However, low wall shear stress can be bad news as well. Low wall shear stress promotes disorganisation of the endothelium (the inner lining of blood vessels) which in turn increases the risk of rupture. This is another reason why the problem is so complex, as both high and low wall shear stress can increase the likelihood of rupture. It seems that the distribution of wall shear stress is important, with localised areas of high stress along with larger areas of low stress increasing the likelihood of rupture.

It should be noted that cyclic stressing (stretching and compressing) of the aneurysm wall can weaken it. We therefore need a way to quantify the cyclic wall shear stress. We can do this using a quantity called the oscillatory shear index, which is a measure of the fluctuation of low and high shear stress defined by
$$\text{OSI}=\frac{1}{2}\left(1-\frac{\left\lvert\int^T_{0}\tau_\text{wall}\;\mathrm{d}t\right\rvert}{\int^T_{0}\left\lvert \tau_\text{wall}\right\rvert\;\mathrm{d}t}\right).$$
It has been suggested that when the OSI is high, the vessel walls will be weakened over time, further increasing the likelihood of rupture.
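Both quantities are simple to compute once a simulation has produced the wall shear stress at one point of the wall over a cardiac cycle. The sketch below uses the trapezium rule on evenly spaced samples and treats $\tau_\text{wall}$ as a signed number, as in the formulas above. The function names and the two test signals are invented for illustration and do not come from a real simulation.

```python
import math


def trapezium(values, dt):
    """Trapezium-rule integral of evenly spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def tawss_and_osi(tau, T):
    """Time averaged wall shear stress and oscillatory shear index.

    tau holds samples of the signed wall shear stress over one cycle of
    duration T, including both end points. It must not be zero throughout.
    """
    dt = T / (len(tau) - 1)
    integral_of_abs = trapezium([abs(t) for t in tau], dt)
    abs_of_integral = abs(trapezium(tau, dt))
    tawss = integral_of_abs / T
    osi = 0.5 * (1 - abs_of_integral / integral_of_abs)
    return tawss, osi


if __name__ == "__main__":
    T = 1.0
    steady = [2.0] * 1001
    reversing = [math.sin(2 * math.pi * i / 1000) for i in range(1001)]
    print(tawss_and_osi(steady, T))
    print(tawss_and_osi(reversing, T))
```

A stress that never changes direction gives an OSI of 0, while one that spends half the cycle pointing each way gives the maximum value of 0.5.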

Researchers have proposed that the direction of the wall shear stress may be important and should be part of the computations performed. We can measure the wall stress in all directions and then take the divergence in order to measure the effect of shear stress direction. The divergence quantifies whether the stress generally acts ‘outwards’ or ‘inwards’.

General process workflow of predictive EVAR modelling from model generation to the postprocessing of the simulation results

Now we need to consider whether this divergence has a positive value or a negative value and what this means. A positive value indicates stretching of the vessel wall while a negative value indicates compression. We can think of these regions as two propagating waves. Finding the wave centres of the positive and negative divergence regions can also be beneficial. If these centres are close together, then the stretching and compressing effects will compete with each other which can lead to further damage to the endothelium.

What we have shown here is that by approximating a flow we can calculate a number of different parameters which can help surgeons make key decisions.

So far we have talked about studying the risks associated with leaving an aneurysm alone. On the other side of the coin: once we have decided that intervention is necessary, what can mathematics tell us about surgical procedures?

For this we will consider an aneurysm repair procedure called endovascular aortic repair (EVAR). A key part of this procedure is the installation of a stent graft (a fabric tube supported by a metal mesh which can reinforce the aneurysm wall). This stent can lead to long term complications such as occlusion of blood flow (if a kink forms in the graft) or fracture of the graft itself. This means that the design of the graft as well as its position is vital. Do you want to take a guess at how we optimise these things? That’s right! More computational modelling!

Through the use of medical imaging, fluid dynamics and solid–fluid interaction, we can simulate the surgical process ahead of time and help surgeons to minimise complications, both in the operating theatre and in the long term.


Who needs differentiation?

Finding tangents, computing turning points and establishing extreme values of functions all seem like problems that require calculus for us to solve. But in fact there are often techniques which involve nothing more than some geometric thinking and elementary algebra. Imagine we were in a world where calculus hadn’t yet been invented (or that you haven’t learned any calculus yet)—how would we go about maximising and minimising functions, finding tangents and so on?

As a first problem, let’s find the minimum value of $x+y$ for points on the unit circle $x^{2}+y^{2}=1$.

One way of solving this problem is to consider lines with equation $x+y=c$ for different values of the constant $c$:

Since the circle lies to the right of $x+y=-2$, we can see that $x+y$ has got to be larger than $-2$, as otherwise $(x,y)$ cannot lie on the circle. It should also be clear that we need to choose a value of $c$ so that the line is a tangent to the circle.

We can do this by observing that if we try to find the value of $x$ (or $y$) for which the line is a tangent to the curve, there will only be one solution to the resulting equation (corresponding to the line ‘touching’ the circle at only one point). But this corresponds to the discriminant of the quadratic resulting from solving the two equations simultaneously being zero: substituting $y=c-x$ into $x^2+y^2=1$, then doing a little rearranging gives
$$2x^{2}-2cx+(c^{2}-1)=0.$$
The discriminant of this quadratic is
$$(-2c)^{2}-4\times2\times(c^{2}-1) = 8-4c^{2}.$$
This is zero when $c=\pm\sqrt{2}$, and so the minimum value of $x+y$ for points on the unit circle is $-\sqrt{2}$.
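If you would like some reassurance without calculus, a computer can check this two ways. Substituting $y=c-x$ into the circle gives the quadratic $2x^{2}-2cx+(c^{2}-1)=0$; the sketch below evaluates its discriminant, and separately walks around the circle looking for the smallest value of $x+y$. The function names and the number of sample points are arbitrary choices made for this illustration.

```python
import math


def discriminant(a, b, c):
    """Discriminant of a*x**2 + b*x + c."""
    return b * b - 4 * a * c


def line_circle_discriminant(c):
    """Discriminant of 2x^2 - 2cx + (c^2 - 1), from putting y = c - x into the circle."""
    return discriminant(2, -2 * c, c * c - 1)


def min_sum_on_circle(samples=100000):
    """Smallest value of x + y over evenly spaced points on the unit circle."""
    angles = (2 * math.pi * k / samples for k in range(samples))
    return min(math.cos(t) + math.sin(t) for t in angles)


if __name__ == "__main__":
    print(line_circle_discriminant(-math.sqrt(2)))  # zero, up to rounding
    print(min_sum_on_circle(), -math.sqrt(2))
```

The sign of the discriminant also tells us how the line meets the circle: positive for two crossings (try $c=0$), negative for none (try $c=-2$).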

Tangents and turning points

We can use the idea of finding a single intersection point (corresponding to a line being tangent to a curve) to solve some problems where differentiation would be the standard approach: finding tangents to curves and turning points on curves.

As an example of the former, let’s suppose that we want to find the tangent to the curve $y=2x^{2}-5x+5$ at the point $(1,2)$. We know that a line with gradient $m$ through the point $(1,2)$ has equation $y-2=m(x-1)$. The diagram below shows the curve with several lines through the point $(1,2)$:

If we rearrange the equation of the general line we get $y=mx+2-m$, which we can then substitute into the equation of the curve to give
$$mx+2-m = 2x^{2}-5x+5.$$
With a bit of rearranging, we can turn this into
$$2x^{2}-(5+m)x+(3+m)=0.$$
The discriminant of this quadratic is
$$(5+m)^{2}-8(3+m) = m^{2}+2m+1 = (m+1)^{2},$$
and so the value of $m$ for which the discriminant is zero is $-1$. So the tangent to the curve at $(1,2)$ is $y=-x+3$:

As an example of finding turning points, let’s consider the curve $y=(x^{2}+3)/(x-1)$:

From the diagram, we can see that there are two turning points—shown by blue crosses—where the curve has a local maximum or minimum value. It might be worth noting at this point that we would require the quotient rule if we were to find these points by differentiation—the single intersection method here is certainly more elementary! We start by considering the graph intersecting a horizontal line; some examples are shown on the graph: one intersects twice, the other two do not intersect at all. If a horizontal line touches the curve just once—and so is a tangent—it will do so at one of the turning points. For a single intersection, we again require a discriminant to be zero. A horizontal line has equation $y=c$: solving this simultaneously with the equation of the curve gives $$c = \frac{x^2+3}{x-1},$$ which rearranges to \[x^{2}-cx+(3+c)=0.\]
The discriminant of this quadratic is
$$c^{2}-4(3+c) = c^{2}-4c-12 = (c-6)(c+2),$$
so the values of $c$ that give a discriminant of zero are $-2$ and $6$. These correspond to the $y$-coordinates of the turning points, and it’s a quick task to find the corresponding $x$ values, giving the turning points as $(-1,-2)$ and $(3,6)$:

Cubics and beyond

So far, our problems have all boiled down to setting a quadratic’s discriminant to zero and working from there. Unfortunately, with many functions this approach won’t be possible. If we want to use our method above for finding a tangent to a cubic, we find it’s not so straightforward—when we substitute our straight line equation into our cubic we get another cubic rather than a quadratic, so there’s no way of setting the discriminant equal to zero. We can observe that when we reach this second cubic it must have a double factor at the point of tangency, so we can force it to be of the form $(x-a)^{2}(x-b)$ by equating coefficients.

There is, however, a lovely trick using polynomial division to find tangents to polynomial curves, which was brought to my attention by David T Williams on Twitter (and it was still Twitter then…), which is to use the fact that when a polynomial, $p(x)$, is divided by $(x-a)^2$, the remainder gives us the expression for the tangent at $x=a$.

To see why this is the case, write
$$p(x) = (x-a)^{2}q(x) + r(x),$$
where $q(x)$ is the quotient and $r(x)$ is the remainder.
When $x$ is close to $a$ we can see that $x-a$ is small and so $(x-a)^{2}$ is very small—in other words $p(x)\approx r(x)$. Since $(x-a)^{2}$ is a quadratic, the remainder will be a linear function, so we’re approximating $p(x)$ by a linear function $r(x)$ at $x=a$. But this is exactly what a tangent is! (For those who have studied Taylor series, our remainder $r(x)$ represents the first two terms of the Taylor series for $p(x)$ about $x=a$.) This method works nicely for polynomial functions only, where we can use long division to find the appropriate linear remainder.

As an example, let’s find the tangent to $y=x^{4}-7x^{3}+20x^{2}-37x+42$ at $x=3$.

All we need to do is divide the polynomial by $(x-3)^{2}=x^{2}-6x+9$:
$$x^{4}-7x^{3}+20x^{2}-37x+42 = (x^{2}-6x+9)(x^{2}-x+5) + (2x-3).$$

And so the required tangent is $y=2x-3$:

So it turns out there’s a lot we can do with curves without resorting to calculus. In the example above we don’t even need to calculate the $y$-value at $x=3$. If you’re fluent with polynomial division, it’s quicker than using differentiation.
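If you are less fluent with polynomial division, the remainder trick is also easy to hand to a computer. The sketch below carries out long division on lists of coefficients (highest power first) and reads the tangent off the remainder. The function names `poly_divmod` and `tangent_at` are our own, and the code assumes the polynomial has degree at least two.

```python
from fractions import Fraction


def poly_divmod(dividend, divisor):
    """Long division of polynomials given as coefficient lists, highest power first."""
    remainder = [Fraction(c) for c in dividend]
    divisor = [Fraction(c) for c in divisor]
    quotient = []
    # One quotient coefficient per step, exactly as in long division by hand.
    for i in range(len(dividend) - len(divisor) + 1):
        factor = remainder[i] / divisor[0]
        quotient.append(factor)
        for j, d in enumerate(divisor):
            remainder[i + j] -= factor * d
    # The leading entries are now zero; what is left over is the remainder.
    return quotient, remainder[len(quotient):]


def tangent_at(coefficients, a):
    """Gradient and intercept of the tangent at x = a, from dividing by (x - a)^2."""
    _, (gradient, intercept) = poly_divmod(coefficients, [1, -2 * a, a * a])
    return gradient, intercept


if __name__ == "__main__":
    quartic = [1, -7, 20, -37, 42]  # x^4 - 7x^3 + 20x^2 - 37x + 42
    print(poly_divmod(quartic, [1, -6, 9]))
    print(tangent_at(quartic, 3))
```

For the quartic above this returns a gradient of 2 and an intercept of $-3$, agreeing with the tangent $y=2x-3$. Using `Fraction` keeps the arithmetic exact when the coefficients or the point $a$ are not whole numbers.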

As a final teaser, try the following problem using (a) a discriminant method and (b) calculus:

If $x^{2}-2xy-4x+4y^{2}=12$, what’s the maximum value of $y-2x$?


How much hair?

The first Fermi problem I tackled was in an introductory astrophysics lecture. One of the questions on the first problem sheet was: Are there more mosquitoes in the world than stars in our galaxy? Questions like these are intriguing, easy to understand, and have a definite answer. However, with only limited information available, we need to use a creative way of guesstimating to answer them. In principle a mighty (but hypothetical) being like Maxwell’s demon or the flying spaghetti monster would be able to count all mosquitoes and stars at one instant in time (or, as physicists might put it, in one specific frame of reference), but this brute force approach might not be necessary to answer the question: educated guesses can do the job just as well.

Total Eclipse of the Hair-t. Bonnie Tyler would be proud.

The legendary Italian physicist Enrico Fermi was the master of these questions. He was a key player in the Manhattan project to build the first nuclear weapon—you might spot him in the background of Christopher Nolan’s Oppenheimer, which unfortunately doesn’t show how he estimated how much energy the bomb would release. During the first detonation of the bomb, the Trinity test on 16 July 1945, Fermi threw a piece of crumpled paper in the air and observed how far it was blown away by the shock waves released in the explosion. He came up with an instant estimate of the energy released by the bomb, which was later confirmed after the analysis of all the data obtained during the test. Fermi solved a Fermi problem in the most elegant way!

Proposing and solving Fermi questions is an essential skill not only for physicists. Companies use questions like How many golf balls would fit in a car? or How many leaves are on a tree? in job interviews to test applicants’ creativity and capability for abstraction.

For me, Fermi problems lurk around every corner and I think they make life more fun… like the other day when I was waiting at the hairdressers for a new hair cut. This is usually the time when you can think about the universe, and stumble across deep questions of major importance for humankind. Questions like: How quickly is my hair growing and how much hair do I have? As an astrophysicist, I wanted to think about them on a cosmic scale. Next time you’re struggling for small talk at the hairdressers, maybe you can ask them…

Are there more hairs or stars in the universe?

If we want to make an educated estimate, it’s often best to start with simple observations in our daily life. Looking in the mirror, my head is roughly a sphere, with diameter 16 cm. About half of this sphere is covered with hair, so the total hair-ea on my head is \[A_\text{head} = \frac12 \times 4\pi \times \frac{(16\,\text{cm})^2}{4} \approx 400\,\text{cm}^2\] or, to use traditional German units, $5.6 \times 10^{-6}$ standard football fields.

Hairy Earps: About half of the surface hair-ea of Mary’s head is covered in hair. Image: James Boyes, CC BY 2.0

If all the hairs were squashed up on top of each other, how many could fit on a head? Each hair is between 50 and $120\,\mu\text{m}$ wide (we’ll call it $80\,\mu\text{m}$), so the average cross section of a single hair is \[\overline{A}_\text{hair} = \pi \times \frac{(80\,\mu\text{m})^2}{4} \approx 5000\,\mu\text{m}^2.\] This is about $7\times 10^{-13}$ standard football fields. So if all the hairs were tightly packed, we’d get an estimate of $A_\text{head}/\overline{A}_\text{hair} \approx \text{8,000,000}$ hairs on our heads. However, scalp hair is not densely packed: there is a gap between neighbouring hairs. I had another look in the mirror, and observed that the average distance between two hairs is on the order of 1 mm. That means between one hair and the next one, about $1\,\text{mm}/80\,\mu\text{m} \approx 13$ hairs could fit.

Rubeus Hair-grid: it’s helpful to assume that hairs grow in a grid pattern with 1mm gaps between hairs. That’s a gap of around 13 hair-widths, so only 1 out of 196 `hair spaces’ are filled.


There’s only one hair spot occupied in each square of $14\times 14$ hair diameters, so we should divide our estimate by 196. This means that there are about $\text{8,000,000} / 196 \approx \text{40,000}$ hairs on each person’s head, at a density of roughly 100 hairs per $\text{cm}^{2}$. In reality, hairs don’t grow in a grid pattern, so the actual number is a bit higher: the average person has 100,000 hairs on their head. We’re out by a factor of about 2: pretty good going for a guesstimate!
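The estimate above is easy to replay on a computer. The following is a minimal sketch, assuming only the rough figures quoted in the text (a 16 cm head, 80 µm hairs, 1 mm gaps); the variable names are made up for illustration, and rounding the gap of 12.5 hair widths up to 13 mirrors the choice made above.

```python
import math

# All inputs are the article's rough guesses, not measured values.
HEAD_DIAMETER_CM = 16.0   # head modelled as a sphere
HAIR_WIDTH_UM = 80.0      # typical hair width in micrometres
HAIR_SPACING_MM = 1.0     # eyeballed gap between neighbouring hairs

# Half of a sphere's surface: (1/2) * 4 * pi * r^2, in cm^2.
head_area_cm2 = 0.5 * 4 * math.pi * (HEAD_DIAMETER_CM / 2) ** 2

# Cross-section of one hair, pi * d^2 / 4, converted from um^2 to cm^2.
hair_area_um2 = math.pi * HAIR_WIDTH_UM ** 2 / 4
hair_area_cm2 = hair_area_um2 * 1e-8   # 1 um = 1e-4 cm, so 1 um^2 = 1e-8 cm^2

# Number of hairs if they were packed with no gaps at all.
tightly_packed = head_area_cm2 / hair_area_cm2

# A 1 mm gap holds 12.5 hair widths, rounded up to 13 as in the text,
# so one hair sits in each 14 x 14 square of hair widths.
gap_in_hairs = math.ceil(HAIR_SPACING_MM * 1000 / HAIR_WIDTH_UM)
cell = (gap_in_hairs + 1) ** 2

hairs_on_head = tightly_packed / cell
density_per_cm2 = hairs_on_head / head_area_cm2

print(f"head area       ~ {head_area_cm2:.0f} cm^2")
print(f"tightly packed  ~ {tightly_packed:.2e} hairs")
print(f"grid estimate   ~ {hairs_on_head:.0f} hairs")
print(f"density         ~ {density_per_cm2:.0f} hairs per cm^2")
```

Changing the guessed spacing is a quick way to see how sensitive the answer is: the count scales with the inverse square of the gap.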

We just need two more facts to be able to answer our question. Firstly, how many stars are there? In the observable universe we believe there are $10^{11}$ galaxies, with $10^{11}$ stars in each galaxy (on average)—so there are about $10^{22}$ stars in total. Secondly, there are about $8\times 10^{9}$ humans in our solar system. We’ve just worked out that each one has about $10^5$ hairs on their heads, adding up to $8\times 10^{14}$ hairs, which we’ll round up to $10^{15}$. Thus we have more hairs in our galaxy than stars but there are more stars in the observable universe than hairs!

How long would it take for a single hair to grow to the moon?

All this talk of stars is lovely, but I wanted to focus on something a bit closer to home. I wondered: how long would it take for a single hair to grow long enough to reach the moon?

Hair-ston, we have a problem. Image: Gregory H Revera, CC BY-SA 3.0

Let’s start with the speed of hair growth. As a child I learned that my scalp hair grows at a rate of about $v_\text{hair} = 1\,\text{cm}$ per month. (As a grown-up I can check Wikipedia: the range is more or less $v_\text{hair} = 0.6$–$3.35\,\text{cm}/\text{month}$.) This seems like a natural unit to use for hair growth, but unfortunately it is not an SI unit, nor does it relate to football fields. As an astronomer, I’d rather work in more familiar units: \[v_\text{hair} = 3.86\times 10^{-12}\,\text{km}/\text{s}.\] To get a better feeling for how fast this is, we can compare the speed of hair growth with the speed of light, $c_\text{light} = \text{299,792.458}\,\text{km}/\text{s}$. It turns out that photons travel through space about $c_\text{light}/v_\text{hair}\approx 10^{17}$ times faster than hair grows!

The distance between the Earth and the moon is \[D_\text{moon} =\text{384,400}\,\text{km},\] which is about one light-second. Since hair grows $10^{17}$ times more slowly than light moves, our total growth time is $t=D_\text{moon}/v_\text{hair}\approx 10^{17}\,\text{s}$. In human-readable units this time is about $3\times 10^{9}\,\text{yrs}$—that’s pretty close to the age of Earth, or $4.5\times 10^{9}\,\text{yrs}$. Sending humans to the moon via spaceships based on human hair growth is thus not a very smart idea—not to mention the amount we’d be spending on shampoo.

Ah, but what if we all teamed up? Since we have about $8\times 10^{9}$ contributing people on the planet, the global human scalp hair production rate (or GHSHPR for short) is about $ 8\times 10^{9} \times \text{100,000} \times v_\text{hair} \approx 3000\,\text{km}/\text{s}$. This means that, if we all pitched in, we could grow enough hair to reach the moon in about two minutes!
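For anyone who wants to check the unit conversions, here is a short sketch of the moon calculation. It assumes a 30-day month (which is what reproduces the $3.86\times 10^{-12}\,\text{km}/\text{s}$ figure) and uses the same round numbers as the text; the variable names are illustrative only.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600       # assumes a 30-day month
SECONDS_PER_YEAR = 365.25 * 24 * 3600

v_hair_km_s = 1e-5 / SECONDS_PER_MONTH   # 1 cm per month, with 1 cm = 1e-5 km
c_light_km_s = 299_792.458
d_moon_km = 384_400.0

# How much faster light is than hair growth.
speed_ratio = c_light_km_s / v_hair_km_s

# Time for one hair to reach the moon.
single_hair_seconds = d_moon_km / v_hair_km_s
single_hair_years = single_hair_seconds / SECONDS_PER_YEAR

# Everyone pitching in: 8 billion heads with 100,000 hairs each.
people = 8e9
hairs_per_head = 1e5
ghshpr_km_s = people * hairs_per_head * v_hair_km_s
team_seconds = d_moon_km / ghshpr_km_s

print(f"v_hair        ~ {v_hair_km_s:.2e} km/s")
print(f"c / v_hair    ~ {speed_ratio:.1e}")
print(f"single hair   ~ {single_hair_years:.1e} years")
print(f"GHSHPR        ~ {ghshpr_km_s:.0f} km/s")
print(f"team effort   ~ {team_seconds / 60:.1f} minutes")
```

Swapping in the slowest or fastest growth rate from the quoted range changes each answer by at most a small factor, which is well within Fermi-problem tolerance.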

On the other hand, with 8 billion people and $100\,000$ hairs each, sticking all those hairs together into one mega-hair might be a problem. We’d need $8 \times 10^{14}$ pieces of tape, and even if we could stick ten hairs per second, it would take us $8\times 10^{14}\times 0.1\,\text{s}$, or about two and a half million years, to glue it all together.




What if we wanted to cover the Earth in hair?

Let’s admit it—one big downside of our planet is that it’s not very fluffy. But we can make it fluffy (like the Tribbles in Star Trek) by covering it with hair!

The Earth’s radius is $R_\text{Earth} = 6371\,\text{km}$, so its surface area is $4 \pi \times (6371\,\text{km})^2 \approx \text{510,000,000}\,\text{km}^2$. To make the Earth satisfyingly hairy, we’d need $4\pi R^2_\text{Earth} /\overline{A}_\text{hair} \approx 10^{23}$ hairs. This is about a sixth of a mole of hair, and it’s several orders of magnitude more than the hair we’ve got available. Even if we all teamed up, it would take absolutely ages to grow enough hair. According to the American Academy of Dermatology, most people lose about 50–100 hairs per day, and since we seem to have the same number of hairs from one day to the next, let’s assume we’re growing 100 hairs a day each. At 8 billion people on Earth, that’s $8 \times 10^{11}$ new hairs every day. It would take us about $10^{11}$ days, or roughly 300 million years, to reach our goal. This is probably the main reason why our proposed plan of creating the fluffiest planet in the universe will never get funding. On the other hand, we do have enough hair to turn Manchester into Mane-chester. If anyone wants to get in on my grant application, please do let me know.


A day in the life: engineering

Engineering: without it, we’d have a hard time applying all the brilliant maths people do to the real world, and an even harder time pronouncing “Stem”.

But what does a career in engineering really look like? How much maths does it involve? Is it really just emails?

Fear not: Chalkdust has the answers. In this issue’s edition of our day in the life series, we hear from three people who put engineering mathematics into practice in their daily lives:

  • Isobel Voysey is a PhD student at the Edinburgh Centre for Robotics.
  • David Fairbairn is a mathematician at Tharsus, and a PhD student in Durham University’s department of mathematical sciences.
  • Kirsten Ross is a design engineer, specialising in building service design.

From designing heating systems for schools to helping robots avoid crashing into each other, they’re all working on real-world problems every day. Read on to find your new dream job…



The Josephus problem

A group of children stand around in a circle. Each one puts a foot into the centre. One of the children starts tapping the feet around the circle in a clockwise direction and reciting one word per foot:

Ip dip dip,
My little ship,
Sailing on the water,
Like a cup and saucer,
But you are not in… it.

The child whose foot is tapped on the final ‘it’ is eliminated, then the rhyme starts again. Eventually, there is only one child left: the winner. The winner might get some chocolate, or might get to be ‘it’ in the next game of tag. Imagine there are 13 children to start with. They begin by standing in a circle.

In the first round, they count round starting from 1 until number 8 is eliminated.

They then continue from the next number, 9, until eventually number 4 is eliminated. In the next few rounds, 2, 3, 7, 13, 12, 6, 9, 10 and 5 are eliminated (in that order), leaving just 1 and 11. The next ‘ip’ starts on 11; 1 is ‘dip’; 11 is the second ‘dip’… this continues until finally 11 is ‘it’. The winner is 1. If you were one of the children, you might want to save a lot of time and work out who was in the winning position before the eliminations started. This is often called ‘the Josephus problem’.
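If you would rather let a computer do the tapping, the game can be simulated directly. The sketch below assumes the rhyme is 21 words long, which is the count that reproduces the eliminations described above; the function name `count_out` is made up for illustration.

```python
from collections import deque


def count_out(n_children, words_in_rhyme):
    """Return (elimination_order, winner) for children numbered 1..n."""
    circle = deque(range(1, n_children + 1))
    eliminated = []
    while len(circle) > 1:
        # Send the children tapped on all but the last word to the back,
        # so the child tapped on the final word ends up at the front.
        circle.rotate(-(words_in_rhyme - 1))
        eliminated.append(circle.popleft())
        # The next round now starts from the child at the front.
    return eliminated, circle[0]


order, winner = count_out(13, 21)
print(order)   # [8, 4, 2, 3, 7, 13, 12, 6, 9, 10, 5, 11]
print(winner)  # 1
```

The same function works for any number of children and any rhyme length, so it is a handy way to experiment before looking for a formula.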


In the example above, we worked out who would win if we started with 13 people and counted out the whole ‘ip dip dip’ rhyme. But we’d like to be able to work out the winner if we start with any number of people. To make it a little easier, we’re going to use a shorter rhyme with just two words (‘Ip it!’). We can start by doing some examples and looking for a pattern. We can choose some values for the number of children, $n$, and by following the rules of the Josephus problem find the location of the winner, $w$. Here is the table for the winner depending on the number of children:

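\[
\begin{array}{cc|cc|cc}
n & w & n & w & n & w \\ \hline
1 & 1 & 7 & 7 & 13 & 11 \\
2 & 1 & 8 & 1 & 14 & 13 \\
3 & 3 & 9 & 3 & 15 & 15 \\
4 & 1 & 10 & 5 & 16 & 1 \\
5 & 3 & 11 & 7 & 17 & 3 \\
6 & 5 & 12 & 9 & 18 & 5
\end{array}
\]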

From the table, we can notice a couple of patterns: the winner is always odd; if we start with $2^k$ players, the winner is always 1. We can prove that both of these patterns will also hold true for larger numbers. To show that the winner is always odd, we simply need to notice that all the even positions lose during the first set of rounds. We can show that the winner is always 1 when we start with $2^k$ players by induction: in this case, the $2^{k-1}$ even numbers are eliminated first, leaving $2^{k-1}$ numbers. We can then renumber the remaining players, and in the subsequent round of eliminations, the even numbers are eliminated again. The number 1 is never removed in these rounds of eliminations, so remains as the winner.

Modular arithmetic

Thinking about everyone standing in a circle suggests that modular arithmetic might shed some light on our problem. But what is modular arithmetic? Modular arithmetic uses the modulo function, which gives the remainder when dividing by a number $k$. For example, when 7 is divided by 2, the remainder is 1, and when 8 is divided by 2, the remainder is 0. We can write these facts as
\begin{align*} 7\text{ mod }{2}&= 1,\\ 8\text{ mod }{2}&=0. \end{align*}
Notice that when we divide by 2, the remainder is either 0 or 1. We can write \[ a\text{ mod }{2}=b, \] where $b$ is 0 or 1. In general, the remainder of $a\div k$ can be from 0 as a minimum to $k-1$ as a maximum. Modular arithmetic is often called clock arithmetic, as a 12-hour clock counts in mod 12. This way of thinking about counting around a circle like on a clock suggests that it might be useful for our problem. Let’s see if it can help us work out what happens if we start with a number of players that isn’t a power of 2.
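Most programming languages have the modulo function built in. As a small illustration (Python’s `%` operator is used here; the variable names are only for this example), we can check the two facts above, the range of possible remainders, and a bit of clock arithmetic.

```python
# The two examples from the text.
seven_mod_two = 7 % 2
eight_mod_two = 8 % 2

# Remainders on division by k always lie between 0 and k - 1.
k = 12
remainders = {a % k for a in range(100)}

# Clock arithmetic: 5 hours after 10 o'clock on a 12-hour clock.
hours_later = (10 + 5) % 12

print(seven_mod_two, eight_mod_two)
print(sorted(remainders))
print(hours_later)
```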

Finding a solution

We start by guessing that the solution for $n=2^k+r$ (with $r<2^k$) might be \[w = c + a(n\text{ mod }{2^k}),\] where $c$ and $a$ are positive constants. When $n=2^k$, we know from earlier that $w$ is equal to 1. We can deduce that \begin{align*} 1 &= c + a(2^k\text{ mod }{2^k})\\ &=c, \end{align*} and so if our guess was right, the solution is \[w = 1 + a(n \text{ mod }{2^k}).\] We can now try this formula using some values of $n \neq 2^k$. For example, when $n=11$ we know from the table that $w=7$. The largest power of 2 less than 11 is 8, which is $2^3$. So $k=3$ and our solution formula becomes \begin{align*} 7 &= 1 + a(11\text{ mod }{2^3})\\ &= 1 + a(11\text{ mod }{8})\\ &= 1 + 3a. \end{align*} Our formula works if $a=2$, and so our solution is \[ w=1+2(n\text{ mod }{2^k}). \]

If you try this formula out on the other numbers in the table, you’ll see that it always gives the right answer. But checking all of these examples isn’t enough to know that this formula will work for all values of $n$. For that, we need a proof.

Luckily, finding a proof isn’t too hard. First, we know from earlier that if we start with $2^k$ players then player 1 will win. From this we also know that if we’re in the middle of a game and have reached a point where there are $2^k$ players left, then the player who we are starting with next will win. So if we start with $n=2^k+r$ players, we can find the winner by working out who we’ll be pointing to once the extra $r$ players are eliminated. Noticing that $r = n\text{ mod }{2^k}$, and that we count out two people for each elimination, tells us that in eliminating these $r$ players, we’ll have counted on $2(n\text{ mod }{2^k})$ places from the 1 we started at. The player we’re pointing to, and hence the winner, is therefore in position $1+2(n\text{ mod }{2^k})$.

We have a general formula connecting the number of players, $n$, and the winning position, $w$, and we’re now certain that it’s right. But maybe you’re wondering if there’s a nice observation we could use to save ourselves time? For that, we’re going to need binary numbers.
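The formula is short enough to type in and compare against the table. This is a minimal sketch: the function name `winner_formula` is invented for the example, and it finds the largest power of 2 not exceeding $n$ using the length of $n$ in binary.

```python
def winner_formula(n):
    """Winner for the two-word rhyme: w = 1 + 2 * (n mod 2^k)."""
    # 2^k is the largest power of two not exceeding n,
    # so that n = 2^k + r with 0 <= r < 2^k.
    power = 1 << (n.bit_length() - 1)
    return 1 + 2 * (n % power)


# Winners copied from the table in the article, for n = 1, ..., 18.
table = [1, 1, 3, 1, 3, 5, 7, 1, 3, 5, 7, 9, 11, 13, 15, 1, 3, 5]
predicted = [winner_formula(n) for n in range(1, 19)]

print(predicted)
print(predicted == table)  # True
```

Agreement with the table is only a sanity check, of course; it is the counting argument above that shows the formula holds for every $n$.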

Binary numbers

In our usual decimal numbers, each digit represents the next power of 10. If instead, we use each digit to represent a power of 2, we have binary numbers. There are two digits in this number system: 0 and 1. For example, $1101_2$ is a binary number (we include the $_2$ to make it obvious that we’re using binary): it is equal to the decimal number 13 as it has ones in the 1, 4, and 8 positions. Every decimal number can be represented as a unique binary number: this is the key to our trick.

Let’s try writing some values of $n$ and $w$ from the table in binary. First up, let’s try $n=10$ and $w=5$: \begin{align*} n &= 1010_2 & w &= 0101_2. \end{align*} Another example: $n=14$ and $w=13$ becomes \begin{align*} n &= 1110_2 & w &= 1101_2. \end{align*} Notice that if we take the 1 at the front of the binary number $n$, then move it to the end, we get $w$. This works for the other numbers too, for example for $n=8$, where $w=1$: \begin{align*} n &= 1000_2 & w &= 0001_2. \end{align*}

This gives the correct value for $w$! You can prove that this version of the solution works for every value of $n$ using our formula from the previous section. I’ll leave you to think about how to do this. We’ve found two ways to work out who the winner is if we eliminate every second person. While I go away and enjoy all the chocolate I’ve won, you can try to work out the solution when we use a longer rhyme…
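Before attempting the proof, it can be reassuring to check the binary trick against the formula for lots of values of $n$. Here is a rough sketch; both function names are hypothetical, and the check below is evidence rather than a proof.

```python
def winner_by_rotation(n):
    """Move the leading 1 of n's binary form to the end."""
    bits = bin(n)[2:]              # e.g. 10 -> '1010'
    rotated = bits[1:] + bits[0]   # '1010' -> '0101'
    return int(rotated, 2)


def winner_formula(n):
    """w = 1 + 2 * (n mod 2^k), with 2^k the largest power of 2 <= n."""
    power = 1 << (n.bit_length() - 1)
    return 1 + 2 * (n % power)


for n in (8, 10, 14):
    print(n, bin(n)[2:], winner_by_rotation(n))

# The two methods agree for every n up to 1024.
agree = all(winner_by_rotation(n) == winner_formula(n) for n in range(1, 1025))
print(agree)  # True
```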


The magic of particle prediction

Do you ever find yourself wondering how the maths you learn helps us to understand and model small things we can’t even see? Yes, I thought so. I know you might hate how physicists mess with maths but bear with me on this one. Hold on tight as we take a joyride through the fusion of mathematics and subatomic wizardry—it’s time to unlock the secrets of perturbative unitarity and those elusive gravitons you may have heard about!

The gravity of the situation

Alright then, imagine this: particle physicists are the ultimate puzzle-solvers, working to understand the tiniest building blocks of the universe. But guess what? It’s like they’ve been handed a humongous jigsaw puzzle with no picture to refer to. The jigsaw pieces? Those are our fundamental particles, such as electrons, quarks and photons. The picture? Well, we aren’t fully sure yet, but the so-called standard model was a decent start and I will introduce it to you soon.

The standard model has some flaws: it does not include gravity, neutrino masses or dark matter, among other issues. It features the fundamental particles that exist in the universe, the smallest building blocks of matter and light. The standard model describes how atomic nuclei are held together via the strong force, and it also describes what matter consists of. These particles are our building blocks for almost everything else in the universe.

Quarks come in six different flavours, and they combine in different ways to form other, larger particles. The so-called bosons in the standard model mediate the fundamental forces: photons carry electromagnetism, which we see as light, while the W and Z bosons carry the weak nuclear force, which is responsible for radioactive decay. The standard model is basically the particle physics version of the periodic table.

The different regimes of physics

So far so particle-like, but I heard particles are waves too, I hear you cry. Imagine you’re looking at a nice pond, and you throw a stone into it; ripples will spread out from where the stone landed and they interact with each other. These ripples create waves on the water’s surface. Now, think of this pond as the universe, and those ripples as our particles.

Quantum field theory, or QFT for short, is a way physicists try to understand how these tiny particles work and interact with each other. Instead of treating particles like billiard balls, QFT treats all these different particles as vibrations or disturbances in quantum fields. Think of this field as an invisible substance filling all of space, and particles are like the ripples or waves in this field. Quantum field theory can successfully describe three of the four fundamental interactions: the weak interaction, electromagnetism and the strong interaction. However, it needs some extra work to describe quantum gravity.

Gravity at large distances

When we place a bowling ball in the centre of a trampoline, it creates a dent in the trampoline’s surface. This is analogous to how massive objects, like planets, warp spacetime according to Einstein’s theory of general relativity. Now imagine our trampoline as spacetime and some kids bouncing around chaotically as our particles. They move around and interact with the dent created by the bowling ball, which is like how gravity acts. This interaction between the kids and the trampoline is what we usually describe using general relativity. Typically, when we think of gravity what comes to mind is the force that pulls things with mass together. It is what causes apples to fall from trees and keeps the planets in our solar system orbiting around the sun, but how do we describe gravity at very small length scales?

In everyday life, we experience gravity in the continuous sense, but on small scales gravity acts like a particle. Individual particles are responsible for these forces: we quantise continuous waves into particles. Quantising is like turning something continuous and smooth into tiny, discrete chunks. It’s like you are filling a glass with water, but instead of a smooth stream, the water comes out in little drops—each drop is a quantum of water. You can’t have half a drop; it’s the smallest possible amount. Quantum gravity works in the same way. We try to break up this continuous force of gravity which acts everywhere in the universe by introducing the quanta of gravity: the graviton particles.

From particles to waves

Many people think that quantum mechanics and general relativity are incompatible, and that the solution is to find some form of a grand unified theory that can account for all the fundamental forces simultaneously. A grand unified theory, often abbreviated as GUT, is the idea that maybe there’s one big theory that explains everything: how electrons and quarks behave, how gravity works, and why the universe is the way it is. But this is a bit of an outdated idea. We all accept special relativity and classical mechanics to be different theories, each valid in its own regime (high speeds for SR). For speeds much less than that of light, special relativity gives the same results as Newtonian mechanics. The same situation applies to general relativity and quantum mechanics: we take quantum gravity as the high-energy, short-distance limit of general relativity. The overall point is that, because quantum mechanics and general relativity are not fully compatible when dealing with more extreme limits, we introduce a bridge between them, which we call effective field theories (EFTs).

The standard model in all its glory

The actual description of the universe is like a big orchestra performing some very complex music at a busy stadium with lots of musicians (the particles). But when this orchestra plays at a smaller venue, they might use a smaller group of musicians, and the music is simplified. This is how EFTs work: they simplify our theories of the universe to focus on just the important pieces at a time. With these EFTs we do not need to choose, per se, a specific theory for quantum gravity (string theory or loop quantum gravity, for example) but merely assume one exists. In the low-energy large-distance limit, the behaviour of quantum gravity is then described by general relativity, which is a classical theory, where spacetime curvature and gravity can be calculated using Einstein’s equations (as seen in Chalkdust’s iconic hall of fail). EFTs are only valid within their energy regime, but this is OK because you wouldn’t expect to model a ball rolling down a hill with special relativity (unless you are crazy or the ball is extremely fast).

Some HEFTy techniques: perturbative unitarity to predict the Higgs boson

Maybe you have seen the standard model Lagrangian, where there are so many terms just to describe how particles move, their masses and how they interact. EFTs take the terms in the Lagrangian density which are relevant at the required energy scale to form approximations of particle interactions and scattering amplitudes. Think about a magnetic material: in principle we could describe it in terms of individual atoms and their spins, but this would be incredibly complicated. Instead we zoom out and describe the collective behaviour of the atoms through the magnetisation, a measure of how the material responds to external magnetic fields. This is an effective field theory: the ‘effective’ part means averaging out unnecessary details. The aim is to average over short length scales to remove some of the intricacies of the theory, which the EFT does in terms of an action and a Lagrangian density.

In much of physics, the Lagrangian, like the energy, is a bookkeeping tool that helps us describe the behaviour of a system mathematically. Think of a mechanics problem like a ball rolling down a hill. The Lagrangian describing the ball’s motion is the difference between its kinetic and potential energies. Extremising the integral of this quantity over time leads to the familiar, Newtonian, equations describing the ball’s motion. The action is an integral over space and time of the Lagrangian density, a version of the Lagrangian for field theories, describing the motion of particles along different paths. Extremising the action leads to the equations of motion; it is an optimisation problem to find a particle’s most efficient trajectory.

In the quantum world all the paths that a particle can travel between two points, $A$ and $B$, are valid; those which extremise the action are just the most probable. These probabilities must add up to $1$, which is encoded in the principle of unitarity.

Perturbations and probabilities

Unfortunately for many QFTs, the description in terms of a Lagrangian is still very complicated, and to extract any meaningful information we need an approximation scheme. However, this approximation scheme, known as perturbation theory, can break unitarity as it involves throwing information away. We use a technique called perturbative unitarity to ensure the total probability of all possible outcomes in a particle interaction or collision—like scattering in different directions—sums to 1.

Imagine trying to solve a quadratic equation like $\varepsilon x^{2}-2x+1=0$ for $\varepsilon \ll 1$, but you didn’t know the quadratic formula. How would you go about solving this equation? We could perhaps guess that the solution is an expansion in powers of $\varepsilon$, $x=x_{0}+\varepsilon x_{1} +\varepsilon^{2}x_{2}+\cdots$, and then neglect terms of higher order. We can then approximate a solution of this equation by comparing coefficients of powers of $\varepsilon$. We will only get one root of the equation this way, but it’s still a good way to approximate a solution. This is the essence of perturbation theory: we pick a small parameter and expand around it.
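To see the expansion in action, substitute $x=x_{0}+\varepsilon x_{1}+\varepsilon^{2}x_{2}$ into the equation and collect powers of $\varepsilon$: the constant terms give $-2x_{0}+1=0$, the terms in $\varepsilon$ give $x_{0}^{2}-2x_{1}=0$, and the terms in $\varepsilon^{2}$ give $2x_{0}x_{1}-2x_{2}=0$, so $x_{0}=1/2$, $x_{1}=1/8$ and $x_{2}=1/16$. The sketch below compares this with the quadratic formula for an assumed value $\varepsilon = 0.01$; the function names are illustrative.

```python
import math


def exact_small_root(eps):
    """Smaller root of eps*x^2 - 2x + 1 = 0, from the quadratic formula."""
    return (1 - math.sqrt(1 - eps)) / eps


def perturbative_root(eps):
    """Second-order expansion x0 + eps*x1 + eps^2*x2 with 1/2, 1/8, 1/16."""
    return 0.5 + eps / 8 + eps ** 2 / 16


eps = 0.01  # an arbitrary small value chosen for illustration
exact = exact_small_root(eps)
approx = perturbative_root(eps)
error = abs(exact - approx)

print(f"exact  = {exact:.10f}")
print(f"approx = {approx:.10f}")
print(f"error  = {error:.1e}")
```

The root that the expansion misses is the one that grows like $2/\varepsilon$ as $\varepsilon$ shrinks, which is why no series in positive powers of $\varepsilon$ can capture it.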

In classical mechanics, we can think of a collision between billiard balls as being similar to the scattering of fundamental particles. Consider two balls colliding with initial momenta $\boldsymbol{p}_{1}$ and $\boldsymbol{p}_{2}$ respectively. After the collision the momenta of the individual particles will have changed; they have final momenta $\boldsymbol{p}_{3}$ and $\boldsymbol{p}_{4}$. We can use Newton’s second law to find the forces acting on these balls and predict where they will travel to after the collision. However, in quantum mechanics our particles are not modelled by solid billiard balls, and instead we describe them by wavefunctions that represent probabilities of finding the particle in different positions and states. Instead of predicting the exact path of a particle like you would with billiard balls, quantum mechanics provides probabilities. We can calculate the likelihood of a particle scattering in a certain direction or having a specific outcome, but we can’t predict precisely where it will go.

You can use classical mechanics to work out where the cue ball’s going

We can think of the incoming and outgoing particles as the billiard balls, and where the billiard balls collide and fuse is the part where the graviton or Higgs particle enters existence. The billiard balls then move away from each other, becoming the outgoing particles.

S is for scattering

When particles collide, we model them using a tool called the S-matrix (the scattering matrix). Essentially we know what goes into the collision and what comes out, but we treat the collision itself as a mystery box where we do not know all the details. The S-matrix is a way of describing what we do know about the interaction. The S-matrix is unitary: $\boldsymbol{\mathsf{SS}}^{\dagger}=\boldsymbol{\mathsf{I}}$, where $\boldsymbol{\mathsf{S}}^{\dagger}$ is the Hermitian conjugate (complex conjugate transpose) of the S-matrix. This property gives us an equation for the scattering amplitude $\mathcal{A}$, which describes how particles interact during a collision and encodes the probability of a scattering interaction occurring. This matrix helped us predict the existence of the Higgs boson particle, postulated in 1964, which was eventually detected at the Large Hadron Collider in 2012.

The compact muon solenoid (CMS) detector played a part in detecting the Higgs boson. Image: Wikimedia commons user Tighef, CC BY-SA 3.0

Imagine that you’ve got a particle, maybe a tiny electron, and it’s moving from left to right along one direction. But suddenly, it encounters a localised potential, we’ll call it $V(x)$, at $x=0$. Our electron wants to get through this barrier and enter the right hand side where the party is kicking off. Our particle has what we call a wavefunction, $\Psi$, which is determined by the Schrödinger equation,
$$-\frac{{\hbar}^{2}}{2m}\nabla^{2}\Psi+V(x)\Psi=\mathrm{i}\hbar \frac{\partial\Psi}{\partial t}.$$ What’s the deal with this wavefunction, you ask? This wavefunction is like our particle’s ID card, holding all the information about where the particle might be hanging out and how it’s vibing with its momentum. A wavefunction encodes information about a particle’s position and momentum and allows us to calculate the probability of finding the particle in different states or locations. Solving the Schrödinger equation we obtain
$$\Psi(x) = \begin{cases} A\mathrm{e}^{\mathrm{i}kx} + B\mathrm{e}^{-\mathrm{i}kx} & \text{if } x<0, \\ C\mathrm{e}^{\mathrm{i}kx} + D\mathrm{e}^{-\mathrm{i}kx} & \text{if } x>0. \end{cases}$$
If our particle is on the left side, $x<0$, it’s all about the coefficients $A$ and $B$, with $\Psi(x) = A\mathrm{e}^{\mathrm{i}kx} + B\mathrm{e}^{-\mathrm{i}kx}$. $A$ and $B$ are coefficients telling us how much of the wave is coming and going: $A$ represents the incoming wave and $B$ stands for the reflected wave. But if our particle is on the right side of the potential, $x>0$, it’s more about the coefficients $C$ and $D$, and we’ve got $\Psi(x) = C\mathrm{e}^{\mathrm{i}kx} + D\mathrm{e}^{-\mathrm{i}kx}$, with $C$ representing the outgoing wave and $D$ the wave coming in from the right. Focusing on a particle starting to the left of the potential barrier and moving to the right, we can set $D$ to zero. Then $C$ tells us how much of the wave is transmitted through the barrier, while $B$ tells us how much has been reflected. We also have a wavevector $k=\sqrt{2mE}/\hbar$, which is like the DJ at our party. Our $k$ is the tempo based on the particle’s energy and mass; if $k$ is large, the music is fast, and the particle is zipping along at high energy. On the flip side, when $k$ is small, it’s a slow jam, and the particle is moving at lower energy.

Wave function scattering off a localised potential

The S-matrix is like the party host that connects the vibes from the incoming guests to the outgoing ones. Then $\boldsymbol{\Psi}_{\mathrm{out}}$ is the guest list for the party, with $B$ and $C$ rocking the party. There is also $\boldsymbol{\Psi}_{\mathrm{in}}$ which is on the left side trying to enter our party; $A$ makes it in but $D$ is still chilling at home.
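These are related by
$$\boldsymbol{\Psi}_{\mathrm{out}}=\boldsymbol{\mathsf{S}}\,\boldsymbol{\Psi}_{\mathrm{in}},\quad\text{where}\quad\boldsymbol{\Psi}_{\mathrm{out}}=\begin{pmatrix} B \\ C \end{pmatrix},\; \boldsymbol{\Psi}_{\mathrm{in}}=\begin{pmatrix} A \\ D \end{pmatrix}.$$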
We use the unitarity of the S-matrix to form an upper bound on the real and imaginary parts of a scattering amplitude, which we can then convert to an upper bound on the energy, with the potential to violate unitarity for collisions above this energy. To preserve unitarity a new particle must be introduced to our theory!
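To make unitarity a little more concrete, here is a small numerical sketch for one specific choice of potential. The text leaves $V(x)$ general, so the delta-function barrier used here is purely an assumption for illustration, as are the function names and the strength `beta = 0.7`. For that barrier the standard textbook result gives a transmission amplitude $t = 1/(1+\mathrm{i}\beta)$ and a reflection amplitude $r = -\mathrm{i}\beta/(1+\mathrm{i}\beta)$, where $\beta$ is a dimensionless measure of the barrier strength.

```python
def delta_s_matrix(beta):
    """S-matrix for a delta-function barrier (an assumed example potential).

    beta is a dimensionless barrier strength. The matrix maps the incoming
    amplitudes (A, D) to the outgoing amplitudes (B, C).
    """
    t = 1 / (1 + 1j * beta)            # transmission amplitude
    r = -1j * beta / (1 + 1j * beta)   # reflection amplitude
    return [[r, t], [t, r]]


def times_dagger(s):
    """Return S S^dagger for a 2x2 matrix of complex numbers."""
    return [[sum(s[i][k] * s[j][k].conjugate() for k in range(2))
             for j in range(2)]
            for i in range(2)]


S = delta_s_matrix(beta=0.7)
product = times_dagger(S)

# A particle arriving from the left: A = 1, D = 0, so B and C are
# the first column of S.
B, C = S[0][0], S[1][0]
total_probability = abs(B) ** 2 + abs(C) ** 2

print(product)
print(total_probability)
```

Up to rounding error, the product is the identity matrix and the reflection and transmission probabilities sum to 1, whatever value of `beta` is chosen.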

Perturbative bound for gravity

For gravity, we now take on the challenge of finding the energy scale at which unitarity breaks down, indicating the existence of some new physics in the same way as for the Higgs. We need this new particle to stop unitarity from being violated at high energies. We use a trick called partial wave decomposition. Essentially we know that the scattering amplitude solves some equation, and we also know that this equation has a basis of solutions which we can expand in. Think of this as a fancy cousin of the Fourier series known as a partial wave decomposition. This expansion gives
$$\mathcal{A}(s,\theta)=16\pi\sum\limits_{n=0}^{\infty}(2n+1)a_{n}(s)P_{n}(\cos{\theta}).$$ Here $a_{n}(s)$ are the partial wave amplitudes, which are analogous to the transmission component of a wave in 1D scattering, determined by $C/A$ in our quantum mechanics example.

The same idea that worked for the Higgs case works again for gravity: we integrate the S-matrix unitarity constraint to find a bound for distinguishable (different) particles with scattering mediated by a graviton. In terms of the partial waves, the bound is
$$|a(s)|^{2}+\mathrm{Im}(a(s))=0.$$ This bound is not satisfied for every possible energy: there are values of the energy $s$ where it is violated. But we can’t have unitarity violated, so physicists introduced correction terms and the Higgs boson to mediate the scattering process. If we apply this bound to the amplitude that can be calculated for $W^{+}W^{+}\rightarrow W^{+}W^{+}$ scattering, we find that without the Higgs particle, unitarity would be violated at an energy around 1.7 TeV. This means that at energies above 1.7 TeV, our physics would stop working, and probabilities would go above 1. Now, we see the same problem with quantum gravity, except this time it is more difficult as gravity requires an infinite number of correction terms, whereas the Higgs case only needed finitely many.

We can visualise the bound on an Argand diagram to see where unitarity breaks down, and hence where we need to introduce a new particle. Let $x=\mathrm{Re}(a(s))$ and $y=\mathrm{Im}(a(s))$. Using the property of complex numbers that $|a(s)|^{2}=x^{2}+y^{2}$, we obtain a circle in the complex plane which bounds the scattering amplitude. In other words, for energies at which we are inside the circle, unitarity is preserved, while for other energies it is violated. Our circle in this case is $x^{2}+(y+1/2)^{2}=1/4$, which has radius $1/2$, and so the bound is $$|\mathrm{Re}(a(s))|\leq \frac{1}{2}.$$
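To spell out the step from the unitarity constraint to the circle, here is the completing-the-square calculation, written with $x=\mathrm{Re}(a(s))$ and $y=\mathrm{Im}(a(s))$:
$$|a(s)|^{2}+\mathrm{Im}(a(s))=x^{2}+y^{2}+y=x^{2}+\left(y+\tfrac{1}{2}\right)^{2}-\tfrac{1}{4}=0\quad\Longrightarrow\quad x^{2}+\left(y+\tfrac{1}{2}\right)^{2}=\tfrac{1}{4}.$$
Since $\left(y+\tfrac{1}{2}\right)^{2}\geq 0$, every point on this circle has $x^{2}\leq\tfrac{1}{4}$, which is exactly the statement $|\mathrm{Re}(a(s))|\leq\tfrac{1}{2}$.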

Adding new particles to our theory changes the scattering amplitude, as there are now more ways for the particles to interact, and this in turn changes the bound on the energies at which unitarity is preserved. If the new particle has the right mass, then the energy, $s$, can grow arbitrarily large without violating unitarity. The new physics that we need to preserve unitarity is the introduction of the Higgs and the graviton, one for each case. The new particle gives a new way, or channel in particle physics language, for particles to interact.

Argand diagram with a circle of radius $1/2$ and centre $-i/2$, bounding the real part of the partial wave amplitudes as $|\mathrm{Re}\{a_n(s)\}|\leq1/2$.


For quantum gravity, one new channel is not enough to preserve unitarity at every energy scale. Instead we get a new bound, and at energies above it unitarity would still be violated. Fixing this requires us to add more resonance terms to our amplitudes. The difference between the Higgs case and the graviton case is that gravitons require an infinite number of correction terms, while the Higgs does not. Physicists are taking what was learnt from the Higgs case and applying it to scattering mediated by gravity.

How do we go about adding an infinite number of correction terms? A related area of physics called string theory offers a perspective on the amplitudes of gravitons: there we can model graviton scattering using specific amplitudes, one of which is known as the Veneziano amplitude. Although our EFT did not at first assume any specific theory of quantum gravity, string theory becomes helpful in modelling graviton scattering amplitudes. Investigating these amplitudes and what they can tell us about new physics at higher energies is a current area of research.

To sum it all up, the standard model of particle physics has guided us in analysing the interactions of elementary particles, while effective field theories have opened doorways to explore the gaps in the standard model. We marvelled at the magic of perturbative unitarity, the safety net that constrains our probabilities, and at the S-matrix, which describes the interactions of particles, revealing the hidden secrets of the Higgs boson and paving the way for experimental discoveries at the LHC. This led on to gravitons, whose scattering amplitudes help us to bridge the gap between the quantum and the cosmic scales. The next step is to actually detect these gravitons (easier said than done)… Maybe check back in 100 years?