In conversation with… Cédric Villani

Early on a February morning, we’re standing outside one of the many trendy cafes in Fitzrovia. Down the street we spot a man striding our way, wearing a full suit, a hat, a giant spider brooch and hastily tying a cravat. It could only be superstar mathematician Cédric Villani.

Cédric is passing through London on his way back from the US, but this is no holiday. In his two days here, he is attending a scientific conference, giving a public lecture, and taking part in a political meeting. His packed schedule leaves the increasingly-busy Fields medallist just enough time to join us for breakfast.

Fields medal

One afternoon in early 2010, Cédric was in his office at the Henri Poincaré Institute in Paris, getting ready to pose for publicity photographs. The photographer, from a popular science magazine, was setting up his tripod when the office phone rang. Cedric leant over and picked it up. It was Lázló Lovász, president of the International Mathematical Union.

Fields medal ceremonies are held every four years, and six months before each ceremony, the winners are alerted by telephone about their success. During these six months, they are sworn to secrecy, but with the photographer in the room, Cédric suddenly realised that he might be in possession of the shortest-kept secret ever. By some miracle, the tripod had proved sufficiently interesting for the photographer, or perhaps he didn’t follow the English conversation, and the secret remained safe.

If you try too hard to win a Fields medal, you will fail.

Cédric had first realised that winning the Fields medal was a possibility at some point in 2004, when he was 31. Fields medals are only awarded to mathematicians under the age of 40, and until the phone call arrived, Cédric only placed his chances of winning at around 40%. “The prospect of winning the medal does put some pressure on you during your 30s. But everybody knows—it’s part of the common mythology—that if you try too hard to win it, you will fail.”

In August 2010, Cédric was officially awarded the medal at the International Congress of Mathematicians in front of 4000 mathematicians and journalists. Finally, he was allowed to celebrate: he did so by taking a dozen colleagues to a fancy restaurant in Germany, thereby relieving himself of half the CAN$15,000 prize money.

The Boltzmann equation and Landau damping


The Boltzmann equation can be applied up where the air is clear less dense. Image: public domain

While enjoying a hearty breakfast, Cédric explains his research to us. “In this room, we are surrounded by air. You can use the Navier–Stokes equations to describe this air. But at higher altitutes, where the atmosphere is more dilute, these equations do not work so well. Here, it is better to use the Boltzmann equation.” The Boltzmann equation describes the statistical behaviour of a gas, and Cédric has worked on two areas related to this equation: the influence of grazing collisions, where two particles pass very close to each other; and on the increase in entropy as time passes.

Cédric’s other work, completed with Clément Mouhot, looked at the mechanics of plasmas: high-energy soups of electrons and positively-charged ions which are formed by superheating gases. Roughly speaking, if a plasma is exposed to a brief electric field, then the electric field will become very small as time goes by. This decay effect is called Landau damping. In the 1940s, Lev Landau proved that this damping occurs for a linearised approximation of a plasma. Cédric and Clément proved this result for the full non-linear system of equations.

It was the work in these two areas that led to Cédric being awarded the Fields medal, although he has worked in other areas as well. Imagine you have a large pile of sand and a hole to fill (with the same volume as the sand). How should you go about moving the sand to fill the hole, while minimising the total work you have to do? This is an example of an optimal transport problem.

He used the ideas of entropy from his study of the Boltzmann equation and applied them to this problem, and used this to establish a link between the non-Euclidean curvature of a manifold and properties of the entropy. This led to a “whole bunch of research related to non-Euclidean geometry”.

Career choices

Academia is where my heart belongs.

If the young Cédric had had his way, his research life would be very different. “When I was a kid, I wanted to go into palaeontology. I recently had a great discussion with Jack Horner, the world’s most famous expert on the subject—‘Mr Dinosaur’, and it was like reconnecting with my youth.”

So is he happy in mathematics academia? “Academia is where my heart belongs. I like industry, and I sit on the advisory boards of several companies, but I’m an academic guy. My research has not had an application so far that I am aware of. But, with applications, when they come it will be much later.”

Traces of Cédric’s early passion can still be spotted though. He owns a cuddly toy dinosaur called Philibert, and leaves maths books open to keep him entertained. Years later, he found that Alan Turing, one of his greatest heroes, used to do the same with his teddy bear at university.

Grumpy Gauss

Grumpy Gauss, oil on canvas. Christian Albrecht Jensen (1840)

In fact, Turing is the hero in his recently-penned graphic novel,  Les Rêveurs Lunaires. Excited readers will be disappointed, however, as “even though England is everywhere in the book, English publishers have not yet been interested in making an translation.” This is a double-shame, as you will remember from Chalkdust issue 04 that comic books about maths are `hot’.

He is, however, less sure whether he would like to travel back in time to work with Turing or other mathematicians. “People like Gauss—so fascinating, so superhuman. But he was known for being rather grumpy; maybe it would not be so pleasant! Then take Riemann—a genius! But a bit depressive; maybe he was not so fun to work with. I’m not sure if he would want to see me.”

A day in the life of a Fields medallist

Life is rarely routine for Cédric. In a usual year he travels to 20–25 countries, and has roughly 30 different appointments each week. When he can, he enjoys a quiet family breakfast at home. The contents of this breakfast have not changed since he was a child, and include bread, jam and hot unpasteurised milk. For today, however, he makes do with a full English with scrambled eggs.

I never give fashion advice. I always tell people: “find your own way”, as I did find my own way.

Dairy products seem to feature heavily in Cédric’s day-to-day life. Impressively, he is able to visualise every shelf in his favourite cheese shop and name, in turn, every item on sale. This is very important to him, as otherwise he could return home from grocery shopping to find himself without one of his many favourite cheeses.

He is in London to give a lecture to the public, something that he spends a large amount of time doing these days, “much more so than to mathematicians. But both are good: different feelings, different preparations.” Overall, since winning the Fields medal and gaining fame, Cédric claims that his time for research has been “divided by hundreds”.

Indeed, the public lecture is not his only commitment in London. He is currently attending a meeting at the Royal Society about the numerical abilities of animals. This meeting included great revelations about the mathematical abilities of frogs—evidenced through their calls involving sounds of varying number and length—as well as fish, bees and chimpanzees. “One of the crazy things that emerged from this conference is that the tendency to put small numbers on the left and large numbers on the right is not merely a side effect of how we write numbers. You can also find this—in some sense—in newborn chicks and fish.”

When in France, Cédric is recognised everywhere he goes, and is (still) posing for photographs. He is regularly featured on the covers of science magazines, and is often confronted by giant billboards of his face. If you are planning on winning a Fields medal, do not panic: he assures us that you will quickly get used to this.


Cédric enjoying a popular maths magazine

Cédric enjoying a popular maths magazine. Image: Chalkdust

When we meet Cédric, the French election is in full flow. As part of his stay in London, he is attending a meeting for the candidate he describes as the “young, centrist guy”. He is one of seven scientists on a board that provides scientific policy advice to the European Commission. However, he doesn’t recommend becoming too involved in politics, as he thinks there is no way to find time to pursue both a serious research career and a serious political career.

“The current political climate is far from science in general. Science, as a field, is much more respected by society than politics. So there is reputation to be lost by going into politics. But the most popular politician in French history is Napoleon, and he was keen on mathematics, and a big protector of mathematicians and scientists. He was elected to the academy of sciences, attending when he could, and enjoyed discussions with many of the best mathematicians of his time. But he was always late…”

Keen not to be late himself, Cédric finishes his eggs and heads off to his next commitment. It would seem, however, that Cédric does not always listen to his own advice: in June he became an elected member of the French parliament, as a member of the young, centrist guy’s party.

TD, Scroggs and Yiannis The Undergrad enjoy Cédric's company

TD, Scroggs and Yiannis The Undergrad enjoy Cédric’s company. Image: Chalkdust


Cardioids in coffee cups

Picture the scene. It’s 1am and you’re up late working on some long-winded calculations. The room around you is dark, a desk lamp the only source of light. Your eyelids start to droop. But the work must get done! Time to fall back on the saviour of many a mathematician: coffee.

But as you sit back down at your desk, you notice something weird. The light from your lamp is reflecting oddly from the edges of the cup, creating bright arcs—and it looks suspiciously like a cardioid curve! Time to investigate…

Work forgotten, you pull out a clean sheet of paper and—well, dear reader, you may have been more sensible than me and just gone to bed at this point, or finished the work you were meant to be doing. For me, though… well, let’s just say that sleep would be impossible until this mystery was resolved.

So. We have a cup. We have a light. We have an enigmatic looking curve. What’s going on?

This coffee is clearly demanding that we make theorems from it

Let’s shed some light on this

The paths of two collimated light rays and their different angles of reflection

To keep things simple, we’re going to model the base of the cup as a perfectly reflecting two-dimensional circle, and limit the incoming light to the plane that the circle is in. We can also set the radius of the cup to be 1 without loss of generality. Since the cup and the lamp are reasonably far apart, a decent assumption to make is that the light coming in is collimated, which means that all the light beams are parallel to each other, as if from a point source at ‘infinity’.

If we look at a single light ray coming in parallel to the $x$-axis, we know that the angle of incidence and the angle of reflection are equal, as measured from a line normal to the circle. If a second ray comes in parallel to the first, it will hit the surface at a different angle, so must reflect off at a different angle to the first, and so the reflected rays will no longer be parallel. Instead, they will overlap as shown above.

As we build up light rays, the shape of the envelope begins to emerge. The only difference between this diagram and the physical system is that here, the overlap makes a dark envelope, whereas in the cup the overlap makes a bright envelope

A rule of ray tracing tells us that all light rays parallel to the axis will go through the focal point of a curved surface. This is the point where the two light rays cross in the diagram above. We can therefore expect that to be the brightest point of the pattern that we see in the coffee, since it is hit by the most light.

But what about the rest of the curve? If we draw in a few more light rays, as in the diagram to the left, we start to see areas where many different rays overlap and can build up a picture of the curve. Treating the incident light rays as a family of curves, the bright pattern seen is their envelope. The envelope is a curve that at every point is tangent to one of the incoming rays, and so by moving along its length we move between the different members of the family. Due to the tangent property, it is also the boundary of the most dense area ‘swept out’ by the curves, so in many cases this corresponds to the curve you’d get by joining up all the points of intersection. If we can find the equation of this envelope, we can find out exactly what shape is being formed in the bottom of the cup.

Calculus to the rescue!

A hint about how to find the equation we need comes from the definition: we need to find a curve that is tangent to our family of curves at every point. So, perhaps unsurprisingly, a good way to do this is to use calculus. For a smooth family of curves, we first find a general equation for the curves by expressing them in terms of some parameter, say $a$. We can then find the equation of all their tangents by differentiating with respect to $a$. Since we need these two equations to match at every point along the envelope, to find the equation of the envelope we solve them simultaneously. For example, say we wanted to find the envelope of straight lines that enclose equal area between them and the axes—picture this as a ladder propped against a wall, but sliding down it.

The equation of a straight line in terms of both axis intercepts has the form

$$ \frac{x}{a} + \frac{y}{b} = 1, $$

where $a$ and $b$ are the intercept points. These are both parameters that describe the family of curves, so we use the fact that we want to keep the area $A= ab/2$ constant to eliminate $b$:

$$ \frac{x}{a} + \frac{ay}{2A}= 1. \qquad (1) $$

Differentiating this with respect to $a$ gives
$$ -\frac{x}{a\hspace{1pt}^2} + \frac{y}{2A} = 0. \qquad (2)$$

So equations (1) and (2) are what we want to solve simultaneously. In this case, it’s possible to do this by eliminating $a$ between the two equations, giving the equation of the envelope as

The curves and envelope formed for $A=1$

$$\sqrt{xy\hspace{1pt}} = \sqrt{\frac{A}{2}},$$
or, if we limit ourselves to the first quadrant,
$$xy = \frac{A}{2}.$$

This is the equation of a hyperbola. An example case for $A=1$ is shown to the right. In this case, we were able to eliminate the parameter from the equation to leave it only in terms of $x$ and $y$, but as we will see later, it may be easier to leave envelopes in their parametric form.

But what about the coffee?

We can use basic geometry to express our family of curves in terms of $\theta$

Now we know how to find the equation(s) for an envelope, we can apply this method to our cup scenario. We know the equations of the lines coming in, since they’re all just straight lines parallel to the axis, but we need to find out the equation of the reflected light beams. It turns out that this can be done by taking advantage of the fact that the cup is circular and throwing some trigonometry at it.

Consider a light beam coming from the right and striking the cup at a point $(x,y)$. The beam is at an angle $\theta\hspace{0.2mm}$ to the normal of the surface, so using

angle of incidence  =  angle of reflection

we know it will reflect at the same angle, as shown in the top right diagram. Using the fact that the red triangle is isosceles, the point $(x,y)$ is therefore at an angle $\theta$ from the negative $x$-axis, so we can parameterise the point as $(-\cos\theta, \sin\theta)$ as the radius of the cup is 1.

Plot of equation (3) for a few different values of $\theta$. Note that these are just the reflected rays

The slope of the reflected ray is then $-\tan2\theta$ and the equation of the line, in terms of $\theta$, is
$$ y – \sin\theta = -\tan2\theta\,(x+\cos\theta),$$
or, after some identity jiggery-pokery,
$$x\hspace{1pt}\sin2\theta+y\cos2\theta = -\sin\theta. \qquad (3)$$
A plot of a few of these curves, with different values for $\theta$, is shown in the middle right diagram and looks similar to what we see in the cup. That’s a good sign!

Differentiating the above equation with respect to $\theta$ gives
$$2x\hspace{1pt}\cos2\theta -2y\hspace{1pt}\sin2\theta = -\cos\theta. \qquad (4)$$

A plot of equation (3) with the calculated envelope drawn on

Now, eliminating $\theta$ between these would be downright disgusting, so expressing (3) and (4) in matrix form and solving for $x$ and $y$ gives
\begin{pmatrix} x \\ y \end{pmatrix}
\cos3\theta- 3\cos\theta \\
3\sin\theta – \sin3\theta
Plotting this, it matches up very nicely on one side of the cup. But, unlike the real pattern in the coffee cup, this one has an extra bit of curve! And this equation, alas, doesn’t describe the cardioid we’d hoped for—this is the equation of a nephroid. To add insult to injury, by the time I’d worked all this through, my coffee was ice cold. Yuck.

In fact, the extra bit of curve appears due to all values of $\theta$ being allowed. As the sides of the cup will block about half the light, this imposes a restriction on $\theta$, so we only see half the curve in our coffee. And a nephroid is, actually, correct—shapes like these that form when light reflects off a curved surface are called ‘caustics’, from the Greek word for ‘burnt’, as they can be used to focus sunlight to start fires. Both the nephroid and the cardioid belong to a larger family of curves called epicycloids, which are categorised according to the number of ‘cusps’ (sharp bits) that they have.

But I wanted a cardioid, dammit!

The angular setup for a point on the rim. To keep the parameterisation the same as the last case, the angles of incidence and reflection have been defined differently

If nephroids aren’t your thing, it’s possible to get a cardioid caustic in a cup if we change the setup slightly. Instead of having a point source at infinity, let’s put the point source on the rim of the cup and see what happens. The geometry of the incoming and outgoing rays is shown to the right.
This gives the equation of the line as
y\hspace{1pt}(1+\cos3\theta\hspace{1pt})+x\hspace{1pt}\sin3\theta = \sin\theta- \sin2\theta,
and differentiating, we get
-3y\hspace{1pt}\sin3\theta + 3x\hspace{1pt}\cos3\theta = \cos\theta- 2\cos2\theta.
Solving these simultaneously is a tad more fiddly than before, but working through gives the envelope as
\begin{pmatrix} x \\ y \end{pmatrix}
\cos2\theta- 2\cos\theta \\
2\sin\theta – \sin2\theta
This is a cardioid. Yay!

Reflected light beams from a point source located where the rim touches the positive $x$-axis

Additional complexities

What we see in a cup is unlikely to be exactly one of the two previous cases. If the caustic is bright enough to be visible, the light source is probably not far enough away to be at ‘infinity’, and people don’t tend to go around putting point sources on the rims of their cups. If we have a light source that’s a finite distance from the cup edge, the incident rays will be at an angle somewhere between the nephroid and cardioid cases, so the curve seen is somewhere between the two.

There is another large assumption we’ve made here that renders the situation somewhat unphysical. Our cup is a two-dimensional circle! And although that makes the maths nicer, it’s not great for holding coffee. The physical principles are the same in 3D, with an extra angle to worry about, so what you actually see in a coffee cup is the intersection of the surface in the diagram on the right with the bottom of your cup.

This surface is called the ‘cusp catastrophe’, and can be found using catastrophe theory, which, among other things, looks at the behaviour of manifolds with singularities in them.

The cusp catastrophe. Rich Morris (, CC BY-NC-SA 4.0

A change of focus

Since we can focus light using almost any process that changes its direction, reflection caustics (or catacaustics) such as the ones we have been considering here are not the only type possible. A common refraction caustic is the rippling pattern of light seen on the bottom of bodies of water, and a rainbow is a caustic caused by a combination of reflection and refraction. More exotically, gravitational forces bend space-time and therefore the light travelling through it, which means that gravitational lensing can give rise to caustics of astronomical scale. The shape of the caustic gives key information about the astronomical object, and this method has been used to identify and analyse exoplanets around distant stars.

Although these physical systems look completely different at first glance, they’re linked by a single phenomenon. The same flavour of physics that describes how the light in your morning cuppa behaves also describes the behaviour of light on ridiculously huge scales in the universe. And that’s pretty cool, don’t you think?

So, the next time you sit down to enjoy a hot beverage, take a moment to appreciate the awesome things happening, quite literally, right under our noses.


Mathematics for the three-fingered mathematician

We’re all familiar with using a couple of different bases to represent integers. Base ten for almost all purposes when we do our own calculations, and base two, or binary, for getting computers to do them for us. But there’s nothing special about ten and two. We could equally well use any integer, $b$, greater than two, so that the string of digits

$$ d_n d_{n-1} d_{n-1} \ldots d_0, $$

where each $d_i$ is positive and less than $b$, represents the integer

$$ \sum_{i=0}^n d_i b^i.$$

Some bases are slightly more convenient than others for doing arithmetic. Bases eight and sixteen are both used in various computer applications, and there is an active society, the dozenal society, devoted to using and promoting the arithmetical advantages of base twelve. Much less common, but far more interesting, is base three.

With base three, the digits are all 0, 1 or 2. But I want to look at a variation on this. Instead of using 1 and 2, I’ll use 1 and -1; but it’s not convenient to have minus signs in the middle of our numbers, so because of this and for reasons of symmetry I’ll represent them with 1 (for 1) and 1 (for -1). Base three is ternary, and this variation of it is called balanced ternary.

Continue reading


The mathematics of Maryam Mirzakhani

Maryam Mirzakhani, the first woman Fields medallist and an explorer of abstract surfaces, left us in the prime of her life. Rightly, the world press mourned her passing, but what I hope to do here is to write about the beautiful and difficult mathematics she loved working on. As a pure mathematician, she was usually driven by in-depth understanding of the different complex structures on abstract surfaces, rather than the search for application. Nevertheless, her work has been used in solving real life problems. But what I personally find fascinating about her is the courage and creativity she had in attempting and solving long standing problems and the variety of areas within mathematics she worked on; from complex geometry and topology to dynamical systems. Here is a more intuitive exposition of some of her achievements.


A tasty surface. Image: Descubra Sorocaba, CC BY 2.0

I am sure that if I asked you to give me an example of a surface you would be able to do so straight away. You might say the surface of the Earth is obviously one, and you would be right. However, defining what we mean by a surface mathematically is a little bit trickier. Let’s give it a go. A geometrical object is called a surface if, when we zoom in very closely at the points on the shape, we can see overlapping patches of the plane. If we were to use mathematical language we would say that a surface is locally homeomorphic to the plane.

A genus 1 surface

You might not find this definition particularly helpful so let us consider a few more examples. Oranges, tomatoes, apples and, for more delicious alternatives, cakes, cupcakes, ring doughnuts and pretzels are all surfaces. Well, almost! In order for them to be surfaces we need to picture them hollow (or like a balloon), rather than solid. If we consider those objects geometrically, meaning that we differentiate between different angles and size lengths, we notice that there are infinitely many of them. This is why we consider them topologically. Using continuous deformations we can turn almost all of our examples into a sphere, except the ones that have holes in them. They are considered to be in a class of their own. Thus we can classify the surfaces up to deformations (topological equivalence) by the number of holes, which we call the genus. We can see that the sphere has genus 0, the torus (ring doughnut) genus 1 and the 3-fold torus (pretzel) genus 3, thus these are all inequivalent surfaces.

Mirzakhani’s work was on Riemann surfaces. To turn a surface into a Riemann surface we need to give it additional geometric structure. For example, we can give the surface geometric structure that allows us to measure angles, lengths and area. An example of such geometry is hyperbolic geometry. It is the first example of non-Euclidean geometry; the only way it differentiates from Euclidean geometry is that given a line $\ell$ and a point $P$, that is not on the line, we can draw at least 2 distinct lines through $P$ that are parallel to $\ell$.

Parallel lines and a triangle on a hyperbolic surface

One peculiar consequence of this new axiom is that rectangles do not exist in hyperbolic geometry. Moreover, the angle sum for a triangle is always less than $\pi$. Mirzakhani’s early work was on hyperbolic surfaces, which are Riemann surfaces with hyperbolic structure. The problem with hyperbolic surfaces is that we cannot really visualise them, because the hyperbolic structure on the Riemann surface can’t be embedded in $\mathbb{R}^3$. However, we can try and describe roughly how you put the structure on the surface. Imagine our surface is made out of rubber and we can bend it and fold it in all dimensions. Now we add the hyperbolic structure, but for that we need, according to John Nash, 17 dimensions. If we next dip it in cement it becomes solid, and we can no longer stuff it into 3 dimensions, hence we can no longer visualise it completely.

On these surfaces, Mirzakhani studied special objects called closed geodesics. Roughly speaking, a geodesic is a generalisation of the notion of a straight line that we have on the Euclidean plane. We can define a geodesic more rigorously as a path between two points on the surface, whose length cannot be shortened by deforming it. For example, on the sphere, the geodesics are called great circles. These are simply the intersections of a plane going through the origin and the sphere itself.

A sphere sliced in half along a great circle

A closed geodesic is a geodesic that starts and ends at the same point. The simplest example of a closed geodesic is a circle.We also allow intersections, that is, geodesics that look, for example, like a figure-of-eight and are much more complicated. Using the hyperbolic structure on the Riemann surface, we can compute the lengths of these closed geodesics. A natural question we can ask is how many such closed geodesics are there on any hyperbolic surface of length $\leq L$? The answer was established in the 1940s by Delsarte, Huber and Selberg and it was named the prime number theorem for hyperbolic surfaces, because of the resemblance to the prime number theorem. That is, the number of closed geodesics, denoted by $\pi$, satisfies\begin{align*} \pi(X, L)\sim \mathrm{e}^L/L,\end{align*}as $L\rightarrow \infty$. Roughly speaking, the number of closed geodesics on a hyperbolic surface $X$ of length $\leq L$ gets closer to $\mathrm{e}^{L}/L$ as $L$ becomes very big. We can see that their number grows exponentially, meaning very quickly, but more importantly we also see that the formula does not depend on the surface we are on.

The next question to consider is what would happen if we no longer allow our geodesics to intersect themselves? Would our formula change much? Will the growth rate be significantly different? That is, we wish to compute the number of simple closed geodesics (simple meaning no intersections are allowed) on a hyperbolic surface $X$ of length $\leq L$, denoted $\sigma(X, L)$. In 2004, Mirzakhani proved, as part of her PhD thesis, that \begin{align*} \sigma(X,L)\sim C_{X} L^{6g-6},\quad\text{as }L\to\infty,\end{align*}where $g$ denotes the genus of the surface $X$, and $C_{X}$ is some constant dependent on the geometry (hyperbolic structure) of the surface. It is important to make clear that surfaces of a given genus can be given many different hyperbolic structures. As a consequence the number grows much slower (polynomially) but it also depends on the surface we are on. The surface may be the same, but the different structure implies that we would have different geodesics and their lengths would also be different. Whilst she was computing $\sigma(X,L)$, she discovered formulae for the frequencies of different topological types of simple closed curves on $X$. The formulae are a bit too complicated to explain here, but let us consider an example: suppose $X$ is a surface of genus 2; there is a probability of 1/7 that a random simple closed geodesic will cut the surface into two genus 1 pieces. How cool is that?!

Flight paths follow geodesics on the Earth’s surface

Even though these results are for a given hyperbolic structure, Mirzakhani proved it by considering all structures at the same time. We know that we can continuously deform surfaces of the same genus $g$ and they will be topologically the same, however geometrically they may be different. These deformations depend on $6g-6$ parameters, which was known to Riemann. We call these parameters moduli and we can consider their space, the so-called moduli space of all hyperbolic structures on a given topological surface. By definition, a moduli space is a space of solutions of geometric classification problems up to deformations. This is a bit abstract, so let us illustrate it with a simple example. Suppose our geometric classification problem is to classify the circles in Euclidean space up to equivalence. We would say that two circles are equivalent if they have the same radius, no matter where their centre lies. That is, our modulus (parameter) is the radius $r$ of the circle, and we know that $r\in\mathbb{R}^{+}$. Hence the moduli space will be the positive real numbers. So what can we do with these new spaces? Greg McShane observed that you can add a new structure; a so-called symplectic structure which, roughly speaking, allows us to measure volumes on moduli spaces. Mirzakhani found a connection between volumes on moduli spaces and the number of simple closed geodesics on one surface. She computed some specific volumes on moduli spaces and her celebrated result followed.

Dynamical systems

“Pot as many balls as you can.” Image: Curtis Perry, CC BY-NC-SA 2.0

In recent years, Mirzakhani focused her attention on dynamical systems on moduli spaces. A dynamical system is simply a system that evolves with time. Originally, dynamical systems arose in physics by looking at the movements of particles in a chamber or planets in the solar system. It was observed that these large systems are similar to smaller ones, and by studying toy models we might shed some light on the actual physical dynamical systems.

One such toy model is the dynamical system of billiard balls on a polygonal table (not necessarily rectangular). Bear in mind that in this version of billiards we only use one ball and it can travel forever on a path as long as it doesn’t reach a corner. The billiard balls will take the shortest paths, thus they travel via geodesics, and this is where Mirzakhani’s research come into play. As we know by now, she studied surfaces rather than polygons, but if you orient the edges of the table in pairs and glue them together then you can turn it into a surface.

Even though billiard dynamics might seem simple, there are difficult problems that are still unsolved. One might ask if there are any periodic billiard paths, and if so, would the answer change if we change the shape of the table $T$? This problem has been solved: it is known that there is always at least one periodic billiard path for a rational polygonal table (by a rational polygon, we mean a polygon whose angles are rational multiples of $\pi$). But what if we now ask what is the number of such periodic billiard paths of length $\leq L$ on a table $T$, denoted by $N(T, L)$? It is conjectured that the following asymptotic formula holds.

\begin{align*} N(T, L)\sim \frac{C_T L^2}{\pi \text{ Area}(T)},\end{align*}where $C_T$ is some constant depending on the table. Alongside Alex Eskin and Amir Mohammadi, Mirzakhani made some progresstowards this result. They showed that $\lim_{n\to\infty}N(T,L)/L^2$ exists and is non-zero. Mirzakhani’s work unfortunately does not provide a solution, however she brought progress by showing that this number satisfies the following asymptotic formulaMoreover, she showed that for the asymptotic formula to even exist in the form above, there exist only countably many numbers $C_T$. Recently, Mirzakhani and Eskin’s work on billiard paths was applied to the sight lines of security guards in complexes of mirrored rooms.

Another example of her impact on dynamical systems is her work on Thurston’s earthquake flow. Suppose we have a Riemann surface $X$ of genus $g$, a simple closed geodesic $\gamma$ on $X$ and a real number $t$. Then we obtain a new Riemann surface $X_t= \text{tw}_{\gamma}^t(X)$ by cutting $X$ along $\gamma$, twisting it to the right by $t$ and re-gluing. Then we can define the flow at time $t$ to be

\begin{align*} \text{tw}^{t}(\lambda, X\hspace{0.3mm})=(\lambda, \text{tw}_{\lambda}^t(X\hspace{0.3mm}))\end{align*}

where $\lambda$ is geodesic lamination. The definition of geodesic lamination is quite technical, so in this article we can simply think of it as a disjoint collection of simple geodesics on $X$. Intuitively, we have a dynamical system like the movement of planets in time $t$, but in our case the objects that move with time are moduli spaces.

We get some sort of periodicity, because if $\gamma$ has length $L$, then $X_{L+t}=X_t$. Mirzakhani showed something truly remarkable: The earthquake flow is ergodic.
This means that if we follow the laminations along we would be very close to any point on the surface with probability 1. This came as a surprise, because until then there was not a single known example showing that the earthquake paths are dense.

Maryam Mirzakhani by Mehrzad Hatami

She might not have always wanted to be a mathematician, aspiring to be a novelist when she was younger, but she left a big mark on mathematics. As the first Iranian and first woman to win the Fields medal, I believe she has been an inspiration to many young girls and women, including me, to go into research and be optimistic when solving problems because the “beauty of mathematics only shows itself to the more patient followers”.


Biography of Sophie Bryant

In 1884, Sophie Bryant’s paper, On the ideal geometrical form of natural cell structure, was published by the London Mathematical Society (LMS). It was ambitious, logical and descriptive: it looked at the phenomenon of the honeycomb.

Her insight was that the complex and beautiful honeycomb shape was a product of the natural activity of bees. All that was needed was for each bee to excavate its own cell at approximately the same rate as the others, and to use the excavated material to build up the walls of its cell. Bryant’s conclusion, that elongated rhombic semi-dodecahedra are the natural form of honeycomb cells, had been observed by Kepler.

A rhombic semi-dodecahedron can be made by putting square-based pyramids on the faces of a cube

In the eighteenth century, it was believed that the honeycomb was the most efficient cell shape possible, but this is now known not to be the case. In 1964, the Hungarian mathematician Fejes Tóth observed in his paper, What the bees know and what they do not know, that there are in fact more efficient cell shapes which have yet to be determined.

Kepler conjectured in 1611 that no packing of balls of the same radius in three dimensions has density greater than the face-centred cubic packing—the cannonball packing—with a density of about 74%. Bryant’s paper assumed this conjecture to be true, as it had appeared obvious for centuries and many had attempted proofs.

Cannonball packing. Image: Wikimedia Commons user Greg L, CC BY-SA 3.0

The conjecture was eventually proved by Hales et al in 1998. Their computer-assisted proof was so huge that it took 12 referees to check
it. After five years, the referees said that they were 99% sure that the proof was correct. Unusually, Annals of Mathematics published the paper in 2005 without complete certification from the referees. It was finally accepted as proven in 2014, and then only with the aid of massive amounts of computer time.

Bryant’s approach to the subject was not unusual at that time. Abstract proofs, so essential to us now, were not as common as general discussion of mathematical phenomena. She wrote “The form of a natural structure is a logical result of its mode of genesis, and that form is ideal of which the mode of genesis is perfectly regular”. She states that there are only three possible arrangements without explaining why these are the only ones.

Bryant’s paper is notable since it is the first published paper written by a woman member of the LMS. However, she was not the first woman to be elected to membership, being preceded by two remarkable women, Charlotte Angas Scott and Christine Ladd Franklin.

Charlotte Angas Scott

Though Bryant was the first woman member of the LMS to publish a paper, she was not the first woman member. That honour goes to Charlotte Angas Scott (1858—1931), an algebraic geometer, who became a member in 1881.

Scott had been aided in her mathematical education by an enlightened father. This resulted in her obtaining a scholarship to Girton College, Cambridge. However, women in Cambridge were not granted degrees until 1948 and she had to be content with the accolades of her peers.

She was appointed a lecturer at Girton and received an external BSc degree from London University, and later a doctorate. Scott moved to the newly opened Bryn Mawr College for women in the USA where she was appointed head of mathematics, and remained for forty years.

Being first

Being the ‘first woman’ was not unusual for Bryant. She was the first woman to receive a DSc degree in England, studying what was then mental and moral philosophy, but today would be referred to as psychology and ethics. She was also one of the first three women to be appointed to a Royal Commission—the Bryce Commission on Secondary Education in 1894–95—and she was one of the first three women to be appointed to the senate of London University.

While on the senate she advocated setting up a day training college for teachers, which eventually became the Institute of Education. Later in 1904, when Trinity College, Dublin opened its degrees to women, Bryant was one of the first to be awarded an honorary doctorate. In Cambridge, she was also instrumental in setting up the Cambridge Training College for Women which eventually became Hughes Hall, the first postgraduate college for women in Cambridge.

She was also, it seems, one of the first women to own a bicycle.

Beginnings and early widowhood

Bryant was born in Ireland, and was fortunate to learn mathematics as well as other academic subjects with her five siblings in a very natural way from their father, the Rev WA Willock DD. A keen educationalist, he had been a fellow and tutor at Trinity College, Dublin and had gained high honours in mathematics and mental sciences.

When Bryant was about thirteen, her family moved to England and her family education continued until she attended Bedford College, where she was awarded the Arnott scholarship for science in 1866. She sat the Cambridge local examination for girls in 1867 and was the only one to be placed in the first class of the senior division.

In 1869, Bryant married the surgeon Dr William Hicks Bryant, only to be widowed the following year when he died of cirrhosis at the early age of 30.

Christine Ladd Franklin

The second woman member of the LMS, who also joined in 1881, was Christine Ladd Franklin (1847—1930), an American mathematical logician.

Though Johns Hopkins University was not open to women, UCL’s JJ Sylvester, then professor of mathematics, urged not only that she be admitted, but arranged for her to do graduate work under his supervision and to be granted a fellowship. As Johns Hopkins did not award degrees to women, she left without a PhD for her dissertation on symbolic logic.

She was finally awarded a PhD by Johns Hopkins forty-four years after she submitted her dissertation, when she was seventy-eight years old.

Schoolteacher and doctorate

After a short interval, Sophie Bryant returned to her studies. While she had been sitting her examinations, she was introduced to Frances Buss, the headmistress and founder of North London Collegiate School (NLCS), an excellent school then and still highly regarded today. It had been founded in 1850, the year of Bryant’s birth.

Bryant arranged to meet Buss who, in 1875, invited her to teach mathematics at NLCS and encouraged her to take a training course as well. Three years later, London University opened its degrees to women. As Bryant had not had a conventional education, she had to learn Latin and biology to matriculate before she could sit for her degree. In 1881, she earned a BSc degree, gaining a first class in mental and moral science and second in mathematics.

In 1884, she received a science doctorate. The NLCS, where she had continued to teach, presented her with scarlet doctoral robes. Bryant was influential in improving the education system and introduced a scheme of enlightened and serious study.

In 1885, Buss died and Bryant became the headmistress of NLCS until her retirement.


Bryant receiving her
doctorate. Image: North London Collegiate School

Meanwhile, she continued to publish ambitious papers. In her 1884 paper in Mind, The double effect of mental stimuli; a contrast of types, Bryant attempted to analyse the difference between reflex actions, which are performed without conscious thought, and consciously controlled actions.

She was grappling with a contemporary problem: the understanding of consciousness. Unfortunately, her arguments are too diffuse to shed much light on the problem.

In 1885, she published a paper in the Journal of the Anthropological Institute, Experiments in testing the characters of school children. This study, undertaken at the suggestion of Francis Galton, produced an early account of the use of open-ended psychometric tests to deduce personality types. Bryant claimed that her results agreed with the observations of teachers familiar with the children but did not provide any supporting evidence. Despite incomplete analysis, this was a pioneering study.

Later life

Bryant was interested in Irish politics, and wrote books on Irish history and ancient Irish law. She was an ardent Protestant Irish nationalist and was active in the Home Rule movement, which pressed for Irish self-government within the United Kingdom. She wrote on women’s suffrage in 1879 but later advocated postponement until women were better educated in politics.

She enjoyed mountain climbing and she summited the Matterhorn twice. Her death in 1922 was both tragic and unexpected. Only four years after retirement, she was on a mountain hike near Chamonix, in France, when she went missing. Her body was found thirteen days later with several injuries.


Although Bryant’s direct contribution to mathematical scholarship was not substantial, her influence as a teacher and educationalist was immense. The rising number of women mathematicians today is a lasting tribute to her work.


Geographic profiling

Imagine you’re a police officer working on a huge case of serial crime. You’ve been handed the list of suspects, but to your horror 268,000 names are on it! You need to come up with a way of working through this list as efficiently as possible to catch your criminal. Along with the thousands of names, you’re also given a map with the locations of where bodies have been found (the map above). Given these two pieces of intel, how exactly would you prioritise your list of suspects? Have a go! Where exactly would you search for the criminal? We will reveal the answer at the end of article!

Peter Sutcliffe, also known as the Yorkshire Ripper, was the name on a list of 268,000 suspects generated by this investigation in the late 1970s. But how were the team investigating these crimes meant to cope with such an overload of information? These are the fundamental problems that geographic profiling is trying to solve.

How exactly does geographic profiling work? This article will introduce you to the fundamental ideas behind the subject. We will also look at the various applications, just like the Yorkshire Ripper case, along the way. These examples aren’t just in criminology though. The applications span ecology and epidemiology too!

The first model

Geographic profiling uses the spatial relationship between crimes to try and find the most likely area in which a criminal is based; this can be a home, a work place or even a local pub. Collectively we refer to these as anchor points. The pioneer of the subject, Kim Rossmo, once a detective inspector but now director of geospatial intelligence/investigation at Texas State University, created the criminal geographic targeting model in his thesis in 1987. The criminal geographic targeting model aims to do exactly what we struggled with at the beginning of this article: prioritise a huge list of suspects.

A gridded-up map. Alistair Marshall, CC BY 2.0

It starts by breaking up your map, populated with crime, into a grid, much like on the left. We assume that each crime that occurs, does so independently from every other. We then score each grid cell; the one with the highest score is likeliest to contain the criminal’s potential anchor point.

How do we calculate this score? An important factor is the distance between crimes and anchor points. We choose to use the Manhattan metric as our measure of distance. In this metric, the distance between points $\boldsymbol{a}$ and $\boldsymbol{b}$ is the sum of the horizontal and vertical changes in distance. This is written as:
$$d(\boldsymbol{a},\boldsymbol{b}) = \lvert x_a-x_b \rvert + \lvert y_a-y_b\rvert, \qquad \boldsymbol{a} = (x_a, y_a), \quad \boldsymbol{b} = (x_b, y_b).$$

The Manhattan metric is so-called because it resembles the distance you have to travel to get between two points in a gridded city like Manhattan.

This is the most suitable metric for our work, but it’s worth noting there are more that can be used (depending on the system you’re studying). Now we could just start searching at the spatial mean of our crimes and work radially outward from that point, however one rogue crime occurring far away from the rest could easily throw a spanner in the works. Instead we use something called a buffer/distance decay function.

$$ f(d) =
\dfrac{k}{d^{h}}, & d > B \\
\dfrac{kB^{g-h}}{(2B-d)^g}, & d\leq B\\

A criminal isn’t likely to commit a crime close to an anchor point, out of fear of being recognised, so we place a buffer around it. In addition, to commit a crime far away from home is a lot of hassle, so the chance of a crime decays as we move away from the anchor point. This is why our buffer/decay function looks a bit like a cross-section of a volcano. The explicit function, $f(d)$, is written on the right, where $k$ is constant, $B$ is the buffer zone radius and $g$ and $h$ are values describing the criminal’s attributes, eg what mode of travel they use. With our distance metric, $d$, and buffer/decay function, $f$, we are now able to compute a score for each grid cell.

For $n$ crimes, the score we give to cell $\boldsymbol{p}$ is
$$ S(\boldsymbol{p}) = \sum_{i=1}^{n}f(d(\boldsymbol{p}, \boldsymbol{c}_i)), $$
where $\boldsymbol{c}_i$ is the location of crime $i$. So finally we have a score for each grid cell and we can prioritise our list!

An example of the geographic profile created using the criminal geographic targeting model

Plotting these scores on the $z$-axis produces a mountain range of values, like on the right. We can now prioritise by checking residencies at the peak of this mountain range and working our way down. Notice the collection of peaks around a particular area: this gives us an indication that perhaps the criminal uses more than one anchor point.

An important question: how can we be sure this even works? Does it really identify anchor points efficiently? What do we even mean by “efficient”? This is answered with a quantity called the hit score. This is

$$\text{hit score} = \frac{\text{number of grid cells searched before finding the criminal}}{\text{total number of grid cells}}. $$

So ironically, the lower our hit score, the better our model performs. This is sensible, since we want to search as little space as possible to catch our criminal.

The Gestapo case

Otto and Elise Hampel distributed hundreds of anti-Nazi postcards during the second world war. The Gestapo’s intuition on where the Hampel duo might live was based on themes almost exactly the same as geographic profiling. Inspired by a classic German novel, Alone in Berlin, our group revisited the Gestapo investigation and published our findings in a journal that is so highly classified we are not able to read it.

By analysing the drop-sites of the postcards and letters we were able to show that geographic profiling successfully prioritises the area where the Hampels lived in Berlin. Crucially, this study actually showed the importance of analysing minor terrorism related or subversive acts to identify terrorist bases before more serious crimes occur.

A statistical approach

The criminal geographic targeting model is an incredibly useful tool and is used to this day by the CIA, the Metropolitan Police and even the Canadian Mounted Police. Mike O’Leary, professor at Towson University, Maryland asked why the criminal geographic targeting model only produces a score, when we require a probability. So he developed a way of using geographic profiling under the laws of Bayesian probability.

Bayes’ rule is better in neon. Image: Wikimedia Commons user Mattbuck, CC BY-SA 3.0

O’Leary uses Bayes’ rule as seen on the right. How do we apply it to criminology? We want to know: what is the probability that an offender is based at an anchor point given the crimes they have committed? Using Bayes’ rule, instead we pretend we know where the anchor point is and ask; what is the probability of the crimes occurring given our anchor point? We use the formulation
$$\Pr(\boldsymbol{c}_1, \boldsymbol{c}_2, \boldsymbol{c}_3, \boldsymbol{c}_4\text{…}\;|\;\boldsymbol{p})\; = \;\prod_{i=1}^{n}\Pr(\boldsymbol{c}_i\;|\;\boldsymbol{p}),$$
where the equality derives from the assumption of independent crimes.

Below, we can see a comparison between Rossmo’s criminal geographic targeting model and O’Leary’s simple Bayesian model. The problem with O’Leary’s model is he assumes that a criminal only has one anchor point. Unfortunately this is rarely the case. As we mentioned earlier, an anchor point could be a home, a workplace, a local pub or even all of the above. So we obtain a probability surface, but we only consider one anchor point. The criminal geographic targeting model entertains the idea that multiple anchor points exist, but doesn’t give us an explicit probability. What we really need is a way of combining both methods. Does such a method exist?

(a) The criminal geographic targetting model

(b) The simple Bayesian model

Examples of the geographic profiles created using the criminal geographic targeting and simple Bayesian models

The elusive tarsiers

Image: Callum Pearson

South-east Asia, specifically Sulawesi, houses a huge number of endemic species. Often habitat assessments of cryptic and elusive animals such as the tarsier (right) are overlooked, primarily due to the difficulties of locating them in challenging habitats. Traditional assessment techniques are often limited by time constraints, costs and challenging logistics of certain habitats such as dense rainforest.

Using only the GPS location of tarsier vocalisations as input into the geographic profiling model we were able to identify the location of tarsier sleeping trees. The model found 10 of the 26 known sleeping sites by searching less than 5% of the total area (3.4 km$^2$). In addition, the model located all but one of the sleeping sites by searching less than 15% of the area. The results strongly suggest that this technique can be successfully applied to locating nests, dens or roosts of elusive animals, and as such be further used within ecological research.

The best of both worlds

The Dirichlet process mixture model is the best of both the criminal geographic targeting and the simple Bayesian models. So far we’ve only stated that we’re either working with one anchor point, or many. The beauty of the Dirichlet process mixture model is that we don’t need to specify the number of anchor points we are searching for. Instead, there is always some non-zero probability that each crime comes from a separate anchor point. So multiple anchor points can be identified while using a probabilistic framework. Introducing multiple anchor points is challenging since we need to know:

  1. How are all the crimes clustered together?
  2. In each cluster of crimes, where is the anchor point?

Actually, what would be really useful is if we knew the answer to just one of these questions. If we knew how the crimes were clustered, finding the anchor points is easy (we use the simple Bayesian model to find the source in each cluster). But also, if we knew where the anchor points were, allocating crimes to clusters is easy (and of course we know where our criminal lives!). The solution to this problem is to use something called a Gibbs sampler. We use a Gibbs sampler in cases where we want to sample a set of events that are conditional on one another. In our case, anchor point locations depend on the clustering of crimes, but the clustering of crimes also depends on the anchor point locations. The steps the Gibbs sampler will take are:

  1. Randomly assign each crime an anchor point (even though we don’t yet know where the anchor points are).
  2. Find each anchor point by using the simple Bayesian model on each assignment.
  3. Throw out the assignments of crimes to anchor points and now re-assign crimes but using the locations found in previous step. Throw out the old anchor point locations and find new ones using this new assignment.
  4. Repeat steps 3 and 4 many, many times.

This produces a new profile like on the right below. We can now compare this to our other two models on the left. We can see the Dirichlet process mixture model displays fewer peaks than the criminal geographic targeting model, but that these peaks are tighter. This in turn will reduce the hit score of our search.

(a) The criminal geographic targetting model

(b) The simple Bayesian model

(c) The Dirichlet process mixture model

A comparison of the three main geographic profiling models

The malaria case

Water bodies with mosquito larvae. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0

Throughout history, infectious diseases have been a major cause of death, with three in particular (malaria, HIV and tuberculosis) accounting for 3.9 million deaths a year. Targeted interventions are crucial in the fight against infectious diseases as they are more efficient and, importantly, more cost effective. They are even more crucial when the transmission rate is strongly dependent on particular locations. For example, we were tasked with finding the source(s) of malaria outbreaks in Cairo by considering the breeding site locations of mosquitos.

All accessible water bodies within the study area were recorded between April and September 2005, and 59 of these were harbouring at least one mosquito larva. Of these 59 sites, seven tested positive for An. sergentii, well-established as the most dangerous malaria vector in Egypt. Using only the spatial locations of 139 disease case locations as input into the model, we were able to rank six of these seven sites in the top 2% of the geoprofile.

Applying the method

The geoprofile associated with the Yorkshire Ripper body dump sites (black dots). The anchor points of Peter Sutcliffe are labelled as red squares. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0

We’ve done it! We now have a robust method for searching for our criminal. A list of 268,000 suspects is no longer so intimidating. Without this technique in 1975-1981, however, there was a lot more work for the team investigating the Yorkshire Ripper case. On top of a huge list of suspects, 27,000 houses were visited and 31,000 statements were taken during the investigation.

If we apply our model to the crime sites we were given at the start of this article, we produce the contour map on the right. In this case the areas in white describe the highest points on our probability surface, whilst areas in red describe the lowest. In addition to the contours, we also see two red squares right at the top of the map. These are the two homes Peter Sutcliffe resided at during the period of his crimes. The hit scores for his two residences are 24% and 5% respectively. So by searching only 24% of our total search area, we’ve managed to find both residences. This is far better than a random search which would find them after searching, on average, 50% of our area.

Peter Sutcliffe’s homes are clearly marked on this map but we must remember an important point about geographic profiling: that it is not an ‘X marks the spot’ kind of model, but rather a method of prioritisation.

Investigating an old case

Dramatic scenes covered the newspaper front pages, such as this from 1888. Image: The Illustrated Police News

We can’t talk about the Yorkshire Ripper without mentioning the notorious 1888 London serial killer, Jack the Ripper. The five locations around Whitechapel where bodies were dumped were studied using geographic profiling to try and gain a better idea of where Jack the Ripper may have lived.

The map overleaf shows us the associated geoprofile, with Jack’s suspected anchor point obtaining a hit score between 10-20%, much better than a non-prioritised search!

This is just one example of many cases where we can utilise our new model to study cases from the past where such tools were not available.

Geographic profiling began in criminology, but now spans ecology (catching invasive species) and epidemiology (identifying sources of infectious disease) too. This means saving a hefty chunk of time and money, as well as developing prevention strategies to minimise any negative impacts these problems may cause.

The geoprofile associated with the body dump sites (black dots) of Jack the Ripper. Jack’s anchor point (the red square) is suspected to be around Flower and Dean Street. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0


Pretty pictures in the complex plane

Some of the greatest works of art in history have been produced by mathematicians. One fascinating source of mathematical artwork is fractals: infinitely complex shapes, with similar patterns at different scales. Fractal geometry has dramatically altered how we see the world. Technology has many uses for fractals, one of which is the production of beautiful computer graphics. These pretty pictures are used to present a large amount of information about a function in a clear and comprehensible manner,  and the simplicity of the maths involved in producing these pictures is fascinating.

Pretty pictures in the $z$-plane are widely used as computer graphics, book covers and even sold as works of art.

Modern art studies have often been dismissive of the power of beauty in mathematics, with the idea that “beauty is not in itself sufficient to create a work of art”. Mathematics produces rigid and inflexible answers, whereas art is free-moving and open to interpretation. However, it is undeniable that these pretty pictures demonstrate true beauty, not only in the images but also in the mathematics behind them.

The mathematics behind pretty pictures

Extremely simple functions can be used to produce these pictures. For example, let’s consider the quadratic function $f\hspace{0.4mm}(x\hspace{0.3mm})=x\hspace{0.3mm}^2+c$, for some constant $c$. An iterative method is applied to the function. First, a seed (let’s call it $x_0$) is selected to be the initial value for iteration. The solution of the function is then subsequently recycled as the new input value, $x$. In this way:

x\hspace{0.3mm}_1&=x\hspace{0.3mm}_{0}^2 + c,\\
x\hspace{0.3mm}_2&=x\hspace{0.3mm}_{1}^2 + c = (x\hspace{0.3mm}_{0}^2 + c)^2 +c,\\
x\hspace{0.3mm}_3&=x\hspace{0.3mm}_{2}^2 + c = \cdots\\
\text{and in general, }x\hspace{0.3mm}_n&=x\hspace{0.3mm}_{n-1}^2+c.

We continue until the iteration either converges to a fixed point or cycle, or diverges to infinity. The orbit is the sequence of numbers generated during the process of iteration: $x_0,x_1,x_2,x_3,\ldots,x_n$. If we only apply real numbers to the quadratic function we limit the graphical representation of the iterations to a line. To produce pictures in the plane, we use complex numbers instead.

The abundant beauty in the plots is somehow increased when the simplicity of the mathematics is understood.

Through the process of iteration, each seed will either converge or diverge, and so for a given function we can divide the plane into an escaping set $E_c =\{ z_0 : |z_n| \rightarrow \infty \, \mbox{as} \, n \rightarrow \infty \}$ (that is, all the seeds that end up at infinity) and prisoner set, where the iteration tends to a point or becomes periodic.

The Julia set of a function

To go from the iterative procedure described above to the vivid images to the right, we need to introduce the idea of the Julia set of a function, named after the French mathematician Gaston Julia. Julia was an extraordinary man, who tragically lost his nose while fighting in the first world war. Despite the substantial injury, he made immense progress in the field of complex iteration and published the book Mémoire sur l’itération des fonctions rationnelles in 1918, which began the study of what we now call a Julia set.

The filled-in Julia set is the collection of points in the complex plane that form the prisoner set of a function, while the Julia set itself is the boundary of this region. The points within the filled-in Julia set remain bounded under the iteration since their orbits converge to an attracting point or cycle.

Connected and unconnected Julia sets of the quadratic function for different values of $c$

Conventionally, when pictures of the Julia set are shown, the filled-in Julia set is shaded black and varying colours are used to show the rate at which the escaping set diverges to infinity. The Julia set is therefore the edge of the black region. Maps 1–7 above show the Julia sets of the quadratic function for different values of $c$, with the escaping set colour-coded as follows: red areas represent points that slowly escape from the set, while blue areas signify points that quickly escape to infinity. The value of the complex constant $c$ influences the shape of the Julia set.

Maps 1, 4 and 5 all have black centres, which indicate that the Julia set is connected, while maps 3, 6 and 7 demonstrate unconnected sets. For these images, the Julia sets have no black regions and instead the pictures are just flurries of colour. It is not always easy to spot whether a Julia set is connected, however. In map 2, there is no obvious black region, but neither are there colourful individual flurries and instead we see a spiky line. In fact the set is connected, it is just that the filled in Julia set is so slender that the black line points are not visible in the image.

During the initial study of these sets, a fascinating criterion for connectivity was discovered concerning the critical point, $z_0=0$. If the critical point is used as the seed, we produce the critical orbit, which is bounded if and only if the Julia set is connected.

Fractal patterns appear in all plots, apart from when $c=0$ or $-2$. The picture below displays examples of magnified sections of the fractals, for $c=-0.7$ (maps 9–12), $c=-0.12 + 0.75 \,\mathrm{i}$ (maps 13–16), $c=0.1 + 0.7 \, \mathrm{i} $ (maps 17–20) and $c=-0.1 + 1 \, \mathrm{i} $ (maps 21–24). Each enhancement of a section produces what appears to be copies of the whole section, not just in overall shape but also with smaller embellishments on every “limb”. For connected plots, these fractals appear as loopy ovals and circles or thin, almost stick-like, sections. For disconnected plots, however, the fractals are grouped together in intricate floral patterns, revealing the same shape and pattern with each level of magnification.

Magnified sections of fractals for different values of $c$

Prior to computer technology, Julia had to rely on his imagination and manually carry out the iterations by hand. Fifty years later, another mathematician applied modern computing power to plot these pretty pictures, finally showing the sets in all their beauty…

The Mandelbrot set

The Mandelbrot set is named after the Polish mathematician Benoit B Mandelbrot, known for being the founder of fractal geometry. The word fractal is derived from the Latin fractus, which means broken, and describes the shape of a stone after it has been smashed.

Mandelbrot discovered that fractals appear not only in mathematics but also in nature, through crystal formation, the growth of plants and landscapes, as well as in the structure of the human body. In 1945, Mandelbrot read Julia’s 1918 book. He was fascinated and, with the aid of computer graphics, was able to show that Julia’s work contained some of the most beautiful fractals known today.

To create the Mandelbrot set, each complex value of $c$ is used as the constant term in the quadratic function $f\hspace{0.3mm}(z\hspace{0.2mm})=z\hspace{0.3mm}^2+c$ and iterated with the critical point $z_0=0$ as the seed. If the orbit escapes to infinity, the number of iterations taken for the modulus of the function to exceed a specified value is used to decide on the colour of the map at that point, $c$. Otherwise, when the orbit converges, the point is coloured black. The Mandelbrot set is the set of black points.

For example, if we let $c=-0.15+0.3 \, \mathrm{i}$ then we have the complex quadratic function $f\hspace{0.3mm}(z\hspace{0.2mm})=z\hspace{0.3mm}^2-0.15+0.3 \, \mathrm{i}$. We start with $z_0=0$ as the seed and the sequence of iteration (to 5 significant figures) is as follows:

z_1&={0}^2 -0.15 +0.3 \, \mathrm{i} &&\Rightarrow &z_1&= -0.15 +0.3 \, \mathrm{i},\\
z_2&=(-0.15+0.3 \, \mathrm{i})^2-0.15+0.3 \, \mathrm{i} &&\Rightarrow &z_2&= 0.2175 +0.21 \, \mathrm{i},\\
&&&&z_3&=-0.14679+0.20865 \, \mathrm{i},\\
&&&&z_4&=-0.17199+0.23874 \, \mathrm{i},\\
&&&&z_5&=-0.17742+0.21788 \, \mathrm{i}.

Continuing to 30 iterations, the orbit has not escaped to infinity and instead converges to the point   $z=-0.17082+0.22361\, \mathrm{i}$ (again to 5 significant figures). Therefore, $c=-0.15+0.3 \, \mathrm{i}$ is within the Mandelbrot set and is coloured black.

On the other hand, if we take $c=-1.85+1.2 \, \mathrm{i}$, and hence the complex quadratic function $f\hspace{0.3mm}(z\hspace{0.2mm})=z\hspace{0.4mm}^2-1.85+1.2 \, \mathrm{i}$, then the sequence of iterations (to 5 sf) is as follows:

z_1&={0}^2 -1.85 +1.2 \, \mathrm{i} &&\Rightarrow &z_1&= -1.85 +1.2 \, \mathrm{i},\\
z_2&=(-1.85 +1.2 \, \mathrm{i})^2-1.85 +1.2 \, \mathrm{i} &&\Rightarrow &z_2&= 0.1325 -3.24 \, \mathrm{i},\\
&&&&z_3&= -12.33+0.3414 \, \mathrm{i},\\
&&&&z_4&= 150.06 – 7.2189 \, \mathrm{i},\\
&&&&z_5&= 22465 – 2165.4 \, \mathrm{i}.

The characteristic segments of the Mandelbrot set

If the modulus of $z$ exceeds 100, then it has been proven that the orbit escapes to infinity. This occurs on the fourth iteration, so the colour chosen to represent the value of 4 would be plotted at the point $(-1.85,1.2)$ in the complex plane. The resulting image is shown in map 8, and also in the picture to the left.

The largest segment of the set is called the cardioid due to its heart-like shape. Attached to this are adornments called bulbs, upon closer inspection of which it is possible to see many smaller, somewhat similar, embellishments. The bulbs are not completely identical, although most exhibit a similar shape, and the main differences can be seen in their filaments. The filaments are the thin strings of bounded points that sprout like sticks from the tops of the bulbs. These sticks are extremely narrow and they appear to be coloured red, which would indicate they are not part of the set. However, if we were to zoom in closer on these regions, we would actually see black lines!

The self-similarity of the Mandelbrot set

The Mandelbrot set is self-similar, consisting of miniature Mandelbrot sets within the boundary of the largest set. By enhancing the filaments, smaller copies of the overall set appear in ‘Russian-doll’ like fashion, as seen in maps 26–30 above. Closer inspection of map 27 shows many more self-similar sets within the filaments around the perimeter of the Mandelbrot set. Magnifying the small copies of these Mandelbrot sets would yield infinite layers of self-similar sets.

Other fascinating and intricate shapes occur, for example the “seahorse valley” that is visible in maps 31–34 above. By enhancing the plot within this region we see two rows of seahorse shaped embellishments, each with “eyes” and “tails”. Further magnification of the “eyes” reveals spiral constellations of more “seahorses”.

Connection between Julia sets and the Mandelbrot set

The orbit of the critical point $z_0=0$ can be used to test the connectivity of the Julia set, and the Mandelbrot set shows the boundedness of these critical orbits. Hence, the Mandelbrot set itself indicates the connectivity of the Julia sets of all the different complex quadratics. The Mandelbrot set can be described as $M = \{ c \in \mathbb{C} \, | \, J_c \, \mbox{is connected}\} $, where $J_c$ is the Julia set of the function $z\hspace{0.3mm}^2+c$. The Julia set is a connected structure if $c$ is within the Mandelbrot set, and will be broken into an infinite number of pieces if $c$ lies outside the Mandelbrot set.

The cardioid-shaped main body contains all values of $c$ for which the Julia set is roughly a deformed circle (figure below: maps 35, 37, 38 and 40). The values of $c$ which lie in a bulb of the Mandelbrot set produce a Julia set consisting of multiple deformed circles surrounding the points of a periodic attractor. The number of subsections sprouting from a point on the Julia set is equal to the period of the bulb in the Mandelbrot set (below; maps 36, 39, 41–44).

A specific Julia set can be defined by a point in the Mandelbrot set

The nature of the convergence of points within the Mandelbrot set depends on the segment in which the point resides. Seeds within the cardioid converge to an attractive point, whereas orbits starting in the bulb lead to an attracting cycle.

Three particularly interesting cases of Julia sets are shown below. The first is when $c=0$, where the filled-in Julia set comprises of all the values within the unit circle (circle of radius 1, centred on the origin) and each of these points converges to $0$ when iterated. The Julia set is the boundary of the circle, the points of which, when iterated, remain on the boundary.

Three remarkable examples of the Julia set with $c=0$, $c=\mathrm{i}$ and $c=-2$

The second interesting case is when $c=\mathrm{i}$. Here, the Julia set is a dendrite, meaning there are no interior points. Instead, the set is just a branch of points. For this complex constant the dendrite is a single line in an almost lightning-bolt shape. The final case is $c=-2$, where the Julia set is a dendrite that lies directly on the horizontal axis between  $-2<x<2$.

Explore the sets yourself

I hope to have displayed the beauty behind these pictures by emphasising the extraordinary quantity of information contained in such a simple procedure, as well as through highlighting the complexity of each image, in the variety of fractals and colours visible, which further enhances the beauty.

If this article has sparked an interest in fractals, then why not try exploring these sets for yourself? You could do this by magnifying different sections of the Mandelbrot set to explore the countless shapes and patterns that exist within. You could also go deeper into exploring individual orbits.

All of these pictures are generated using simple quadratic formula. However, the Julia and Mandelbrot sets can be produced for a wide variety of functions in a similar manner to obtain countless pretty pictures.

These images are already becoming dated, having been taken for granted for so many years since they were first produced on the big bulky computers of the 1980s. The Julia set of the quadratic function, and the corresponding Mandelbrot set, could be inspiration for pretty pictures which are yet to be fully explored, or even discovered. Largely, the discoveries discussed here have been recorded in recent years. Furthermore, there could still be vast amounts of information within these sets that are yet to be discovered. Could you be the one to make a discovery?


  1. Franke, H. (1986). Refractions of Science into Art. In: H. Peitgen and P. Richter, ed., The Beauty of Fractals, 1st ed. Berlin: Springer-Verlag, pp.181-187.
  2. Mandelbrot, B. (2004). Fractals and chaos. 1st ed. New York: Springer.
  3. Peitgen, H., Jurgens, H. and Saupe, D. (1992). Fractals for the Classroom: Part 2: Complex Systems and Mandelbrot Set. 1st ed. New York: Springer-Verlag, pp.353-473.
  4. Hall, N. (1992). The New Scientist guide to chaos. 1st ed. London: Professional Books.
  5. Douady, A. (1986). Julia Sets and the Mandelbrot Set. In: H. Peitgen and P. Richter, ed., The Beauty of Fractals, 1st ed. Berlin: Springer-Verlag, pp.161-173.
  6. Fraser, J. (2009). An Introduction to Julia Sets. 1st ed. [ebook] Available at: [Accessed 4 Apr. 2017].
  7. Peitgen, H., Jurgens, H. and Saupe, D. (1992). Fractals for the Classroom: Strategic Activities Volume two. 1st ed. New York: Springer-Verlag.
  8. Devaney, R. (2006). Unveiling the Mandelbrot set | [online] Available at: [Accessed 4 Apr. 2017].
  9. Moler, C. (2011). Experiments with MATLAB. 1st ed. [ebook] MathWorks, p.Chapter 13 Mandelbrot Set. Available at: [Accessed 20 Apr. 2017].
  10. Peitgen, H. and Richter, P. (1986). The Beauty of Fractals. 1st ed. Berlin: Springer-Verlag.
  11. Mandelbrot, B. (1986). Fractals and the Rebirth of Iteration Theory. In: H. Petigen and P. Richter, ed., The Beauty of Fractals, 1st ed. Berlin: Spring-Verlag, pp.151-160.
  12. (2017). Illustrating Three Approaches to GPU Computing: The Mandelbrot Set – MATLAB & Simulink Example – MathWorks United Kingdom. [online] Available at: [Accessed 20 Apr. 2017].


Pretend numbers

Some years ago when one of my nieces was quite small, she loved to play a guessing game with the family. She would think of a number (for us that means a whole positive number) and then we we would ask her questions with yes/no answers, trying to guess what her number was. For example, we could ask ‘is it even?’ or ‘is it bigger than 100?’, etc.

Now this game was going on for some time and I was starting to wonder if she actually had a number or was she just pretending (this was by far my cheekiest niece). Pretending to have a number is not too hard when the questions are coming at you one at a time. You just need to make sure you don’t box yourself in. For example, if one of us asked ‘is it bigger than 100?’, then she would have to reply ‘yes’. Otherwise we could just go through all the numbers from 1 to 100 asking ‘is it 1?’, ‘is it 2?’ etc, and when she answered ‘no’ to all of them she would have been caught out.  My niece could definitely pull it off.

Pretend numbers

A much deeper question is whether she could have decided in advance the answer to every possible yes/no question, without actually having a number.  In order for her to do that her  answers would have to be consistent.  That is, for any possible finite sequence of questions that we might ask, her answers could not contradict themselves.  We will call such a consistent set of answers a pretend number (the technical term is a non-principal ultrafilter).

So do pretend numbers even exist? The answer turns out to be more philosophical than mathematical. A pretend number requires an infinite set of answers to every possible yes/no question. Thus their existence is tied up with the axiom of choice. The axiom of choice is traditionally explained in terms of pairs of identical socks. If I had $3$ such pairs and I wanted to pick one sock from each pair, then I would have $2  \times 2\times2 =8$ ways of doing so.  If I had $5$ pairs of socks then I would have $32$ ways of picking one from each pair. The more pairs of socks I have, the more ways of choosing one from each pair. So it seems reasonable that if I had infinitely many pairs there would certainly be at least one way of choosing one sock from each pair.

However, this would require infinitely many choices to be made and it is not clear that this is possible.  Note that the socks are identical, so I cannot for example just pick the larger one from each pair and explain all my choices in one sentence.  The axiom of choice says that infinitely many choices can be made. It cannot be proven or disproven mathematically, so we are free to believe it or not.  Whilst the sock example may make it sound reasonable, it has some pretty unintuitive consequences if you believe it.  For example, the freedom to make infinitely many choices would allow you to cut a solid object into such fuzzy and strange pieces, that you could put them back together and have twice the volume that you started with!

Moreover, if you believe that it is possible to make infinitely many choices then pretend numbers do exist.  In fact, for any consistent set of answers to some yes/no questions there is a pretend number with those properties.

So there is a pretend number $U$, with leading digit 2 such that for any number $n$, if I ask, as shown above, ‘Does $n+U$ have leading digit 2?’, the answer is ‘yes’.

Note that no actual number has this property!  If you had selected an actual number, say $2341$, then it would fail as $1000+2341$ does not have leading digit 2.  However these answers are consistent, as for any finite set of values of $n$ that I could ask about, there is always an actual number $m$ that $U$ could be.  That is, $n+m$ all have leading digit 2.  For example, if I say ‘yes, $1000+U$ has leading digit 2, and so does $123456+U$ and $7101202+U$’, then I haven’t contradicted myself as $U$ could be 20,000,000.

On the other hand, if we do not believe in the axiom of choice, then there need be no pretend numbers at all.  The surprising and somewhat controversial thing is that both answers are mathematically correct.  As my undergraduate lecturer put it, if you believe in a mind that can make infinitely many choices then you are more likely to believe in the axiom of choice, whereas if you only believe in the concrete and tangible, you may be more inclined not to.

My niece never told us what her number was.  Did she have a number?  Was she tricking us?  Had she found a way around the philosophical ambiguity of the axiom of choice and come up with a pretend number?  We will never know.

Properties of pretend numbers

Given the philosophical and even religious ambiguity that surrounds their existence, one can perform surprisingly concrete operations on pretend numbers, such as arithmetic.  If I want to add an actual number $n$ to a pretend number $U$ then I get a pretend number $n+U$.  But what is $n+U$?  Recall that a pretend number is just a collection of consistent answers to every yes/no question about it, so to say what $n+U$ is I just need to be able to answer yes/no questions about it.  But a question about $n+U$ is really just a question about $U$, and we know all the answers to yes/no questions about $U$, since $U$ is a pretend number.  For example, if you ask ‘Is $3+U$ a square number?’, then the answer is just the same as the answer to the question ‘Is $U$ three less than a square?’.

Some examples of numbers presented in a very colourful way (Mark Morgan, CC-BY 2.0)

Now for the mind-bending part.  How do I add two pretend numbers? What is $U+V$?  Again I just need to be able to answer yes/no questions about $U+V$. Consider a yes/no question: ‘Does the number have property $A$?’ (eg $A$ could be the property of being prime or being a square).  For any actual number $u$, we have just seen how to answer for $u+V$.  So we can ask ‘Is $U$ one of the actual numbers $u$, such that $u+V$ has property $A$?’  We know the answer, as by definition we know the answer to all yes/no questions about $U$.

So we say that $U+V$ has property $A$ precisely when $U$ has the property of actual numbers $u$ that $u+V$ has property $A$.

This takes a bit of getting your head around, but is not as bad as it seems. Let us look at an example. Suppose Umar is playing the guessing game and is thinking of the number $2$, and Vicki is playing the game and thinking of the number $3$.  Then if we add their numbers via the process just described we would hope to get $5$ (so our notion of addition agrees with addition of actual numbers).  Let’s see if that is indeed the case:

The sum has a property $A$ precisely when Umar’s number has the property of numbers $u$ that $u+V$ has property $A$ (where $V$ denotes Vicki’s number).  As Umar is thinking of the number $2$, he would answer yes precisely if $2+V$ has property $A$.  This is the case precisely when $V$ has the property of actual numbers $v$ that $2+v$ has property $A$.  As Vicki is thinking of $3$, she would answer yes precisely if $2+3$ has property $A$.

In summary, the sum of Umar and Vicki’s numbers has a property precisely when $2+3=5$ has that property.  In particular, if you ask if the sum is $5$, then the answer is yes.  Thus the sum of Umar and Vicki’s numbers is indeed $5$ as we had hoped.

Addition of pretend numbers behaves quite sensibly in many ways.  For pretend numbers $U, V, W$ we have $$(U+V)+W= U+(V+W)$$ as you would expect.  Also for an actual number $n$ we have $n+U=U+n$.  The big shock is that there exist pretend numbers $U$ and $V$ with $$U+V \neq V+U.$$
For example, we have already seen that we can have a pretend number $U$ with leading digit 2, such that for any actual number $n$ the leading digit of $n+U$ is still 2.  Similarly we can have a pretend number $V$ with leading digit 5, such that for any actual number $n$ the leading digit of $n+V$ is still 5. So what is the leading digit of $U+V$?

Once we accept the axiom of choice we are able to assume the existence of pretend numbers…

For any actual number $u$ the leading digit of $u+V$ is 5, so if you ask if $U$ has this property, I would have to say `yes’, or you would have caught me out as not having an actual number.  So the leading digit of $U+V$ is 5.  But by the same logic the leading digit of $V+U$ is 2, so they cannot be the same!

Once we accept the axiom of choice we are able to assume the existence of pretend numbers with desired properties and do calculations with them as above.  However we should bear in mind that we can never actually specify a pretend number.  We cannot specify consistent answers to all the infinitely many yes/no questions one might ask about it.  To actually produce a pretend number would contradict the fact that it is not mathematically wrong to say that there are no pretend numbers.


Given this intangibleness, it is natural to assume that pretend numbers are an intellectual curiosity, of no relevance to anything else. In fact nothing could be further from the truth.  They play a prominent role in many fields such as mathematical logic, non-standard analysis, point set topology, topological dynamics, geometric group theory, combinatorics and field extensions and products of algebras.

One especially amazing application is in the field of Ramsey theory.  Ramsey theory considers the painting of structures  (such as the connections in a network or whole numbers) with different colours. In particular, it determines which structures are guaranteed to be found in a part painted a single colour.  The usual idea is that if so little of the structure is painted one colour so that what you are looking for cannot be found in that colour, then enough must be painted in the remaining colours that what you are looking for can be found in one of them.  A simple example is that if you paint $11$ balls each either red or green, then $6$ of them must be the same colour.  You do not know if there are $6$ green balls or $6$ red balls, but if you don’t have one then you must have the other.

In order to appreciate the application of pretend numbers to Ramsey theory, I invite you to solve the following puzzle before reading on:

In the box below the numbers from 1 to 30 have been coloured with different colours.  Can you find three distinct numbers that are all the same colour and all the same colour as any sum of some of them?

Hopefully that wasn’t too hard.  Now suppose that the numbers from 1 to 1,000,000 are coloured red, green, blue and yellow.  Could you guarantee that no matter how they are coloured, you can still find three distinct numbers, all the same colour and whose sums are all the same colour?  It is much harder when you do not know how the numbers are coloured, as you must account for every possible colouring.

University of California, Berkley (Charlie Nguyen, CC-BY 2.0)

Now suppose that all the numbers, $1,2,3,\ldots,$ are coloured with a finite number of colours.  Instead of finding just three numbers, could you guarantee that there is an infinite set of numbers, so that if you add any finite combination of them together, the result is always the same colour?

In 1970 this was an open problem that combinatorialists were seeking a solution to.  Around this time a young mathematician, Steven Glazer of Berkeley, thought of a short and elegant proof that however all the numbers are coloured, such an infinite set can always be found.  However his proof would only work if there were a pretend number $U$ with $U+U=U$.

Not knowing if a pretend number with the required property existed, Steven Glazer did not publish his work.  In 1974 Neil Hindman finally proved that such an infinite set can always be found, via a much more complicated proof.  The result now bears his name: Hindman’s Theorem.

A year later Steven Glazer mentioned his proof to another mathematician, Fred Galvin.  Now, Fred Galvin knew about compact semigroups.  These structures arose in an area of mathematics known as dynamical systems, which grew out of the study of how physical systems evolve over time-far away from the world of colouring in whole numbers, one might think.

Don’t worry if you don’t know what a non-empty compact semicontinuous semigroup is.  What is important is that the pretend numbers are one (assuming the axiom of choice) and Fred knew it.  Further he knew that in  any  such semigroup, one can solve $U+U=U$.

Note no actual number has this property, as when you add an actual number to itself it gets bigger.  However, using mathematics from this very different field, one can deduce that there is a pretend number with this property. Thus Steven Glazer’s proof was valid.

Glazer’s proof

Suppose all the positive whole numbers have been coloured with some finite set of colours.  Let $U$ be a pretend number with the property that $U+U=U$.  We can ask if $U$ is red.  If not we can ask if it is green.  Proceeding through all the colours, the answer must at some point be yes (or the answers would not be consistent and the pretend number $U$ would be revealed as fake).

Suppose then that $U$ is blue.  As $U+U=U$, it is not saying anything extra to add that $U+U$ is blue.  Thus $U$ satisfies the property of actual numbers $u$ that ‘$u$ is blue and $u+U$ is blue’.  Thus there must actually be a number $x_1$ with this property (or $U$ is revealed as fake).
Now we know the following are blue:

$U$, $U+U$, $x_1$, $x_1+U$, $x_1+U+U$.

Rearranging we have the following all blue:

$U$, $U+U$, $x_1$, $U+x_1$, $U+ x_1+U$.

Let $FS$ of a sequence be the set of sums of distinct terms.  So we may make the above statement more succinctly by saying that the $FS(x_1,U,U)$ are all blue. So $U$, has the property of actual numbers $u$ that the following are all blue:

$u$, $u+U$, $x_1$, $u+x_1$, $u+x_1+U$.

Again there must actually be such a number $x_2$, or $U$ would be revealed as fake. Thus  $FS(x_1,x_2,U)$ are blue. We proceed in this way by induction.

Suppose we have established that $FS(x_1,\ldots, x_n,U)$ are all blue. As $U=U+U$, we know that  $FS(x_1,\ldots, x_n,U,U)$ are all blue.  Thus there must actually be a number $x_{n+1}$ so that  $$FS(x_1,\ldots, x_n,x_{n+1}) \qquad  {\rm and} \qquad  FS(x_1,\ldots, x_n,x_{n+1}+U)$$ are all blue.  We already knew that  $FS(x_1,\ldots, x_n,U)$ are all blue, so combining all three sets we get that  $FS(x_1,\ldots, x_n,x_{n+1},U)$ are all blue.

Thus we obtain an infinite sequence $x_1,x_2,x_3,\ldots$, whose sums of distinct terms are all blue.  Either we have a subsequence of infinitely many distinct $x_i$ or some value $x$ gets repeated infinitely often.  In the latter case all finite sums of distinct terms of the sequence $x,2x,3x,4x,\ldots$ are blue.

Either way we have infinitely many distinct numbers, with any sum of distinct numbers being blue and the proof is complete.

And so…

Steven Glazer found out that his solution works, one year after the problem had been solved by someone else.  Nonetheless his proof is much celebrated by mathematicians for its shortness, its elegance and how it demonstrates the importance of mathematicians sharing ideas from different fields.  In the words of George Bernard Shaw: “If you have an apple and I have an apple and we exchange apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”


Linear algebra… with diagrams

A succinct—if somewhat reductive—description of linear algebra is that it is the study of vector spaces over a field, and the associated structure-preserving maps known as linear transformations. These concepts are by now so standard that they are practically fossilised, appearing unchanged in textbooks for the best part of a century. 

While modern mathematics has moved to more abstract pastures, the theorems of linear algebra are behind a surprising number of world-changing technologies: from quantum computing and quantum information, through control and systems theory, to big data and machine learning. All rely on various kinds of circuit diagrams, eg electrical circuits, quantum circuits or signal flow graphs. Circuits are geometric/topological entities, but have a vital connection to (linear) algebra, where the calculations are usually carried out.

In this article, we cut out the middle man and rediscover linear algebra itself as an algebra of circuit diagrams. The result is called graphical linear algebra and, instead of using traditional definitions, we will draw lots of pictures. Mathematicians often get nervous when given pictures, but relax: these ones are rigorous enough to replace formulas.

Continue reading


Debugging insect dynamics

Social dynamics are complex and have evolved over many generations. One strategy that is used is that of altruism: the act of helping someone else at a cost to yourself. In some insects, this takes on an extreme case where workers sacrifice their own fertility to help raise the queen’s eggs instead. While this may seem to go against the idea that animals want to pass on as many of their genes as possible, we’ll see why this is actually a viable strategy.

Game theory examines how the frequencies of different strategies played in a game change over time. Evolutionary game theory looks at the special case where players cannot change the strategy they play: they’re stuck with the strategy they’ve inherited from the previous generation. In each round, players are randomly matched up and play a game, leading to an outcome dependent on their strategies. This outcome is called a payoff, and it affects their fitness and thus the number of offspring they produce. In the 1970s, John Maynard Smith and George Price developed evolutionary game theory to investigate ritualistic fighting behaviour in animals. A good example is how male stags will compete for territory during the mating season. Physical contact is actually unlikely to occur and the stags can spend hours staring and roaring at each other to determine who is the strongest. If things do escalate, many species of stags have branched antlers allowing them to wrestle rather than impale each other. Species with straighter antlers will tend not to use them in fights, but will resort to biting and kicking, which is far less dangerous. This strategy of assessing your opponent first and picking your fights carefully is clearly beneficial for the species as a whole, but it wasn’t originally clear why an aggressive strategy (where you kill your opponent and pass on your genes) isn’t more common in the animal kingdom.

Smith and Price sought to examine this and they devised the hawk–dove game. A population (of the same species) is split into two groups: hawks and doves. Hawks are aggressive and will play until they win or are seriously injured. On the other hand, a dove is a pacifist and will surrender if its opponent gets aggressive (so it will never get injured). In this game, two players are matched up and compete for a resource (eg food), and the outcome depends on the strategies they play. These dynamics are shown in the following payoff matrix:

meets hawk meets dove
If hawk $(v-c)/2$ $v$
If dove $0$ $v/2$

Here, $v$ is the value of the resource, while $c$ is the cost of injury (from a hawk losing to another hawk). Typically in nature, we find that $c$ is much larger than $v$. To explain the entries of the payoff matrix: when a hawk meets another hawk, there is a 50% chance it will win, gaining $v$; but a 50% chance it will lose, losing $c$. When a hawk meets a dove, it will always win, gaining $v$, while the dove always loses without injury, receiving 0. When a dove meets another dove, there is a 50% chance it will win, gaining $v$ and, if it loses, it does not get injured but receives 0.

These dynamics can be analysed by the replicator equations. The change in proportion of a strategy $i$ ($x_i$) is given by the fitness of the strategy ($f_i$), minus the average fitness of the population, all multiplied by the proportion of strategy $i$: \[
\frac{\mathrm{d}x_i}{\mathrm{d}t}=x_i\Big(f_i(\mathbf{x})- \sum_{j=1}^n x_j\hspace{2pt} f_j(\mathbf{x}) \Big).\]
The distribution of the population into the $n$ strategies is given by the vector $\mathbf{x}=(x_1,\ldots,x_n)$ which, since they are proportions, has entries summing to 1. Using the values from the payoff matrix above leads to a single differential equation (since $x_2=1-x_1$), with a globally stable steady state where the proportion of hawks is $v/c$, which is closer to 0 than 1. While this can explain the lack of aggressive strategies seen in nature, an extension of this is the hawk–dove–assessor game.

An assessor plays as a hawk if they are stronger than their opponent, and as a dove if they are weaker. This is precisely what we tend to see in ritualistic fights in nature. The payoff matrix is given below, which you can verify yourself.

meets hawk meets dove meets assessor
If hawk $(v-c)/2$ $v$ $(v-c)/2$
If dove $0$ $v/2$ $v/4$
If assessor $v/2$ $3v/4$ $v/2$

We find that the strategy of being an assessor is an evolutionary stable strategy. This means that if the entire population is playing as assessors, then any invasion by another strategy cannot succeed and spread. The assessors will always dominate eventually.

Many different strategies can be represented in game theory, including cooperation, spite, selfishness and altruism. Altruism is when a player does something beneficial to the recipient at a cost to itself. This can be seen in games with a memory or reputation system: if the cost is low but the benefit is high, then an individual may act altruistically in the hope of the other player returning the favour later on, leading to a net benefit for them both. Perhaps the most obvious reason for an altruistic act, though, is if the players are related, meaning that they have genes in common. This stems from the idea of inclusive fitness:\begin{align}\text{inclusive fitness}&=\text{individual fitness}+(\text{relatedness}\times\text{relative’s individual fitness}),\\
\omega_i&=f_i+\sum_{j \neq i} R_{ij}\hspace{2pt} f_j.\end{align}
This means that when examining the dynamics of gene frequencies, the fitness of an individual’s family should also play a role since some of the genes will be shared. Relatedness is defined to be the probability that a gene picked randomly from each individual at the same locus (position) is identical by descent. This works out intuitively: 0.5 between you and a sibling/parent/son/daughter (since each parent gives half of their genes to their offspring), 0.25 between you and an uncle/aunt, 0.125 between you and a cousin. Another name for inclusive fitness theory is selfish gene theory, popularised by Richard Dawkins’ book The Selfish Gene, which was influenced by ideas from fellow biologist George Williams. The term ‘selfish’ refers to how some genes may prioritise their own survival (over many generations) over that of the individual or even species. This gene-centred view of evolution helps explain altruism.

William Donald “Bill” Hamilton was an evolutionary biologist who completed his PhD at the London School of Economics and University College London. He claimed that altruistic acts are favoured (and as a strategy, can spread through the population) if the relatedness between the players is greater than the cost to benefit ratio of the act, ie \[R>\frac{C}{B}.\]
This became known as Hamilton’s rule and, while it can be hard to quantify and test, a simple example occurs in prairie dogs. When these rodents are above ground and spot a predator, an individual is more likely to sound an alarm call when relatives are close by: an action that is costly as the individual draws attention to itself.

Colony bee-hive-iour

Atta cephalotes castes (Gamekeeper, CC BY-SA 3.0)

An extreme case of altruism is eusociality:  the highest level of organisation in animal social structure. This structure is what you probably think of when you consider an insect colony: a queen laying eggs and thousands of workers maintaining the nest. Some other traits of this include cooperative brood care and overlapping generations; however, its most distinct trait is the division of labour into castes. These castes are specialised in that individuals from one caste lose the ability to perform the tasks of individuals from another caste. These castes don’t just vary physically, they can express different behaviours and even instincts! In the figure on the previous page, we can see many types of workers on the left, a soldier in the middle and two queens on the right. The soldier is larger than the workers and has a stronger jaw—interestingly this is mainly used for foraging heavy objects and not to defend the nest: that job is up to the workers. Most eusocial animals can be found in the third largest order of insects, called Hymenoptera, which includes bees, wasps and ants.

Hymenopterans are also haplodiploids, ie males have one set of chromosomes (haploids) and females have the usual two (diploids). This bizarre fact means that males actually hatch from unfertilised eggs and females from fertilised eggs. Some strange relationships can come from this. For example, males have no father or sons, but have a grandfather and can have grandsons! But the most intriguing fact is that the workers (who are all female and make up the vast majority of a colony) help raise new brothers and sisters produced by the queen instead of having their own offspring. This fact puzzled Charles Darwin, as it wasn’t clear how a trait that leads to an individual not reproducing and passing on their own genes can be so prevalent in a population. This would mean that their individual fitness is zero; however, going back to inclusive fitness theory, we can see that they can still benefit from their relatives. In fact, we’ll see that this benefit outweighs that of having their own offspring.

Fig. 1: Family tree of haplodiploid insects

A mated queen can produce fertilised and unfertilised eggs for the rest of her life. From the figure to the left, we can see that female workers are more related to their sisters than any other relative (including possible offspring). Explicitly, two female workers share exactly the same set of blue chromosomes (from their father), which is already a relatedness of $0.5$. The other set (in red) can either be exactly the same or different, depending on which was inherited from the queen, giving on average another $0.25$, and making their relatedness $R=0.75$. Following Hamilton’s rule, it indeed makes sense to help raise new sisters instead of offspring, who only have a relatedness of $0.5$. In most species of eusocial Hymenoptera, the queen is aggressive and releases pheromones to discourage workers from laying eggs and, in some cases, from even ovulating. However, it seems that not all workers are happy with this arrangement, and this can lead to several different types of conflict.

Conflict: sex allocation

The queen is equally related to male and female offspring (with a ratio of 1:1) but female workers prefer sisters to brothers (with a ratio of 3:1). Consequently, it is common for the female workers to eat or kill male larvae so that the colony’s resources are not used in raising them. This means that the queen has wasted energy producing and laying the male eggs. A common solution to this is to produce a brood of all the same sex, reducing the motivation for the males to be destroyed. In reality, the ratio is between the two ideals for the queen and female workers. For most of these species, males (AKA drones) do not help with work in the colony: their only role is to mate with a young queen—so, apart from the genetics, female workers may benefit even less from having brothers. For the queen, however, males are still important to help spread her genes even further: if the males are able to mate with a new queen who then starts her own colony, all of the new workers will carry one set of chromosomes from the original queen. In some species, it has been noticed that the queen will lay batches of male eggs when food supplies are low, as female workers’ fitnesses are more affected by food during their developmental stages. However, since males don’t even help with foraging for food, this strategy would only work in the short term.

Conflict: male rearing

Apis mellifera: queen and workers (Jessica Lawrence, CC BY 3.0)

In honey bees, 7% of male eggs are from workers but only 0.1% of adult males are a worker’s son. Why would workers lay eggs in the first place? Well, they are more related to their own offspring ($R=0.5$) than to any brothers ($R=0.25$) that the queen produces. However, laying workers are less hard-working and the queen would rather spend resources raising new children than grandchildren (to whom she is less related), so she tries to prevent this using pheromones. These pheromones inhibit the workers’ ability to lay eggs but are less effective in large colonies, especially if the queen is old.

Worker policing occurs through workers destroying other worker-laid eggs. We can calculate from the family tree (Fig. 1) that workers are more related to new sisters ($R=0.75$) than any nephews ($R=0.33$). The calculation for the second relation works as follows: comparing a female worker to her nephew, there are three possible cases involving the three different sets of chromosomes in the family tree (the blue set of chromosomes from the queen’s mate and the two red sets of chromosomes from the queen herself). One case is where the nephew has the blue set of chromosomes, which his aunt will also have, meaning she shares 50% of her genes with him. If the nephew has a red set of chromosomes, they can either be the same set as his aunt’s (again, 50% of her genes will be shared), or they can be the ‘other’ red set originally from the queen (so his aunt will share none of her genes in this case). Thus on average, a worker has a relatedness of $R=(0.5+0.5+0)/3$ to her nephew. Workers can determine whom the eggs belong to through chemical markers. You may think that there is a selection pressure (in evolution) towards workers who can lay eggs that mimic the queen’s… and you’d be right! These are called ‘anarchic workers’ and in the Cape honey bee, this trait is affected by multiple genes; but, for various reasons, the cost of having these altered genes outweighs their benefit. Note that this conflict does not appear in every eusocial hymenopteran: in some species, the castes are so specialised that workers aren’t even capable of reproducing in the first place.

Conflict: caste fate

Apis mellifera pupae (Waugsberg, CC BY-SA 3.0)

Why wouldn’t a worker choose to develop as a queen instead? Doing so would be selfish but it yields a much greater inclusive fitness. In many species, the determining factor as to whether a female larva develops into a queen or worker is the amount and type of food it receives. Too many queens could cause the colony to run inefficiently or break down, so to prevent this, some species raise their larvae in space-restricted cells where the food supply is controlled by the workers. This generally prevents the larvae from developing into queens. In species where this does not occur, excess queens are killed immediately after emerging.

A queen can actually mate with several males to produce offspring that may not be full siblings. A higher relatedness between workers, however, can reduce the incentive for such selfish acts. Focusing only on the genetics, an already-developed worker would benefit more from raising new sister workers than raising a new sister queen, as this could lead to the raising of nephews/nieces.

Conflict: matricide

There are three parties in favour of the queen’s death: laying workers, non-laying workers and, surprisingly, the queen herself. Going back to Hamilton’s rule, a model was developed by Bourke quantifying the cost–benefit ratio of the queen’s death. These are based on multiple variables including the number of offspring a laying worker can have between the queen’s hypothetical death and the death of the colony. The colony wouldn’t survive long without a queen, and raising new queens isn’t always successful due to the risky mating flight, where a virgin queen flies away from the safety of the nest to mate with males. Bourke’s model applies particularly well to annual colonies where eggs are laid together once a year, after which the colony members all die (apart from young mated queens).

After ordering the cost–benefit ratios of the queen’s death, it was found that laying workers have the most to gain, followed by non-laying workers, and then the queen herself. This is intuitive as, without the queen, laying workers can raise their own offspring. As we have seen in Fig. 1, this shouldn’t happen in the case where the queen has only mated once. However, if she has mated multiple times, the new workers may not be full sisters with the previous workers, and the relatedness would be less than $0.75$. Additionally, if the queen is old and laying fewer eggs (or unfertilised eggs due to food shortages), the benefit of a worker laying their own eggs increases. Once laying workers start attacking the queen, it creates a positive feedback loop: when the queen is injured, she will produce fewer eggs and the critical ratio for the non-laying workers can be surpassed, meaning that they start attacking the queen too. When the queen is seriously injured and no longer able to lay many eggs at all, she would gain a higher inclusive fitness from allowing the workers to lay eggs, since they would all be her grandsons. Thus, when the queen’s own critical ratio is met, she allows herself to be killed by her workers.

Female naked mole rat (Jedimentat44, CC BY 2.0)

Surprisingly, eusociality is also observed in mammals: specifically, two species of mole rat. The naked mole rat is an odd species, famous for its hairless appearance and high resistance to cancer. In captivity, they can live up to 31 years — an astonishing amount of time for a rodent! They live in underground nests under the rule of an aggressive queen who releases pheromones to discourage the workers from reproducing. Unlike the insects, the queen naked mole rat wasn’t born into her position, she had to fight for it! Her reign is also unstable: she will have to defend her crown from female workers. Mole rats (along with almost all mammals) are diploids, so at first glance, it seems that they are just as related to their offspring as to their siblings. However, naked mole rats are infamous for high rates of inbreeding, causing a high level of relatedness between the workers. A few of the males will have the role of mating with the queen, while the others function as workers. It still isn’t certain whether a high level of relatedness is required for eusociality to evolve, or whether it is a consequence of it, but clearly the two are linked.

Does altruism, the act of helping others at a cost to yourself, truly exist in the animal world? In the case of eusocial insects, sacrificing your own fertility to raise the queen’s eggs seems like a noble gesture, but we have seen that the workers only do this in order to pass on more of their genes. Once this benefit decreases, conflict occurs and the workers will continue to pursue their own optimal strategy — killing the queen to lay their own eggs. Can any of this theory be applied to humans? The majority of interactions occur between ‘unrelated’ individuals. For social dynamics, game theory would explain altruistic acts in terms of good karma, expecting others to return the favour later on (which is risky). Perhaps having this expectation can be selfish; however, a common assumption in game theory is that players act ‘rationally’ in the sense that they always try to maximise their payoff. I’d like to believe, though, that people don’t actually think like this and that acts of uncalculated kindness in humans occur more often than the theory would suggest.

[Banner image: Todd Huffman, CC BY 2.0;]