In conversation with Cédric Villani

Early on a February morning, we’re standing outside one of the many trendy cafes in Fitzrovia. Down the street we spot a man striding our way, wearing a full suit, a hat, a giant spider brooch and hastily tying a cravat. It could only be superstar mathematician Cédric Villani.

Cédric is passing through London on his way back from the US, but this is no holiday. In his two days here, he is attending a scientific conference, giving a public lecture, and taking part in a political meeting. His packed schedule leaves the increasingly-busy Fields medallist just enough time to join us for breakfast.


Cardioids in coffee cups

Picture the scene. It’s 1am and you’re up late working on some long-winded calculations. The room around you is dark, a desk lamp the only source of light. Your eyelids start to droop. But the work must get done! Time to fall back on the saviour of many a mathematician: coffee.

But as you sit back down at your desk, you notice something weird. The light from your lamp is reflecting oddly from the edges of the cup, creating bright arcs—and it looks suspiciously like a cardioid curve! Time to investigate…

Work forgotten, you pull out a clean sheet of paper and—well, dear reader, you may have been more sensible than me and just gone to bed at this point, or finished the work you were meant to be doing. For me, though… well, let’s just say that sleep would be impossible until this mystery was resolved.

So. We have a cup. We have a light. We have an enigmatic looking curve. What’s going on?

This coffee is clearly demanding that we make theorems from it

Let’s shed some light on this

The paths of two collimated light rays and their different angles of reflection

To keep things simple, we’re going to model the base of the cup as a perfectly reflecting two-dimensional circle, and limit the incoming light to the plane that the circle is in. We can also set the radius of the cup to be 1 without loss of generality. Since the cup and the lamp are reasonably far apart, a decent assumption to make is that the light coming in is collimated, which means that all the light beams are parallel to each other, as if from a point source at ‘infinity’.

If we look at a single light ray coming in parallel to the $x$-axis, we know that the angle of incidence and the angle of reflection are equal, as measured from a line normal to the circle. If a second ray comes in parallel to the first, it will hit the surface at a different angle, so must reflect off at a different angle to the first, and so the reflected rays will no longer be parallel. Instead, they will overlap as shown above.

As we build up light rays, the shape of the envelope begins to emerge. The only difference between this diagram and the physical system is that here, the overlap makes a dark envelope, whereas in the cup the overlap makes a bright envelope

A rule of ray tracing tells us that all light rays parallel to the axis will go through the focal point of a curved surface. This is the point where the two light rays cross in the diagram above. We can therefore expect that to be the brightest point of the pattern that we see in the coffee, since it is hit by the most light.

But what about the rest of the curve? If we draw in a few more light rays, as in the diagram to the left, we start to see areas where many different rays overlap and can build up a picture of the curve. Treating the incident light rays as a family of curves, the bright pattern seen is their envelope. The envelope is a curve that at every point is tangent to one of the incoming rays, and so by moving along its length we move between the different members of the family. Due to the tangent property, it is also the boundary of the most dense area ‘swept out’ by the curves, so in many cases this corresponds to the curve you’d get by joining up all the points of intersection. If we can find the equation of this envelope, we can find out exactly what shape is being formed in the bottom of the cup.

Calculus to the rescue!

A hint about how to find the equation we need comes from the definition: we need a curve that is tangent to a member of our family of curves at every point. So, perhaps unsurprisingly, a good way to do this is to use calculus. For a smooth family of curves, we first find a general equation for the family by expressing it in terms of some parameter, say $a$. Differentiating this equation with respect to $a$ tells us how the family changes as the parameter varies. Since the envelope must touch each member of the family, its equation must satisfy both of these equations at once, so to find the envelope we solve them simultaneously. For example, say we wanted to find the envelope of straight lines that enclose equal area between them and the axes—picture this as a ladder propped against a wall, but sliding down it.

The equation of a straight line in terms of both axis intercepts has the form

$$ \frac{x}{a} + \frac{y}{b} = 1, $$

where $a$ and $b$ are the intercept points. These are both parameters that describe the family of curves, so we use the fact that we want to keep the area $A= ab/2$ constant to eliminate $b$:

$$ \frac{x}{a} + \frac{ay}{2A}= 1. \qquad (1) $$

Differentiating this with respect to $a$ gives
$$ -\frac{x}{a\hspace{1pt}^2} + \frac{y}{2A} = 0. \qquad (2)$$

So equations (1) and (2) are what we want to solve simultaneously. In this case, it’s possible to do this by eliminating $a$ between the two equations, giving the equation of the envelope as

The curves and envelope formed for $A=1$

$$\sqrt{xy\hspace{1pt}} = \sqrt{\frac{A}{2}},$$
or, squaring both sides,
$$xy = \frac{A}{2}.$$

This is the equation of a hyperbola. An example case for $A=1$ is shown to the right. In this case, we were able to eliminate the parameter from the equation to leave it only in terms of $x$ and $y$, but as we will see later, it may be easier to leave envelopes in their parametric form.
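We can also check this numerically. Since the envelope is the boundary of the region swept out by the family, at each $x$ its height $A/(2x)$ should equal the greatest height reached there by any line in the family. A minimal sketch, taking $A=1$ and an illustrative grid of intercept values $a$:

```python
# Check numerically that the envelope y = A/(2x), ie xy = A/2, is the
# upper boundary of the region swept out by the lines x/a + a*y/(2A) = 1.
A = 1.0

def line_height(x, a):
    """Height at x of the family member with x-intercept a (enclosing area A)."""
    return (1 - x/a) * 2*A/a

a_grid = [i/1000 for i in range(1, 5001)]   # intercepts a from 0.001 to 5
for x in [0.3, 0.7, 1.5]:
    swept_max = max(line_height(x, a) for a in a_grid)
    assert abs(swept_max - A/(2*x)) < 1e-3  # agrees with the envelope height
```

For each $x$ the maximum is attained at $a = 2x$, the line that is tangent to the envelope there.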

But what about the coffee?

We can use basic geometry to express our family of curves in terms of $\theta$

Now we know how to find the equation(s) for an envelope, we can apply this method to our cup scenario. We know the equations of the lines coming in, since they’re all just straight lines parallel to the axis, but we need to find out the equation of the reflected light beams. It turns out that this can be done by taking advantage of the fact that the cup is circular and throwing some trigonometry at it.

Consider a light beam coming from the right and striking the cup at a point $(x,y)$. The beam is at an angle $\theta\hspace{0.2mm}$ to the normal of the surface, so using

angle of incidence  =  angle of reflection

we know it will reflect at the same angle, as shown in the top right diagram. Using the fact that the red triangle is isosceles, the point $(x,y)$ is therefore at an angle $\theta$ from the negative $x$-axis, so we can parameterise the point as $(-\cos\theta, \sin\theta)$ as the radius of the cup is 1.

Plot of equation (3) for a few different values of $\theta$. Note that these are just the reflected rays

The slope of the reflected ray is then $-\tan2\theta$ and the equation of the line, in terms of $\theta$, is
$$ y - \sin\theta = -\tan2\theta\,(x+\cos\theta),$$
or, after some identity jiggery-pokery,
$$x\hspace{1pt}\sin2\theta+y\cos2\theta = -\sin\theta. \qquad (3)$$
A plot of a few of these curves, with different values for $\theta$, is shown in the middle right diagram and looks similar to what we see in the cup. That’s a good sign!

Differentiating the above equation with respect to $\theta$ gives
$$2x\hspace{1pt}\cos2\theta -2y\hspace{1pt}\sin2\theta = -\cos\theta. \qquad (4)$$

A plot of equation (3) with the calculated envelope drawn on

Now, eliminating $\theta$ between these would be downright disgusting, so expressing (3) and (4) in matrix form and solving for $x$ and $y$ gives
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{4}\begin{pmatrix} \cos3\theta - 3\cos\theta \\ 3\sin\theta - \sin3\theta \end{pmatrix}.$$
Plotting this, it matches up very nicely on one side of the cup. But, unlike the real pattern in the coffee cup, this one has an extra bit of curve! And this equation, alas, doesn’t describe the cardioid we’d hoped for—this is the equation of a nephroid. To add insult to injury, by the time I’d worked all this through, my coffee was ice cold. Yuck.

In fact, the extra bit of curve appears due to all values of $\theta$ being allowed. As the sides of the cup will block about half the light, this imposes a restriction on $\theta$, so we only see half the curve in our coffee. And a nephroid is, actually, correct—shapes like these that form when light reflects off a curved surface are called ‘caustics’, from the Greek word for ‘burnt’, as they can be used to focus sunlight to start fires. Both the nephroid and the cardioid belong to a larger family of curves called epicycloids, which are categorised according to the number of ‘cusps’ (sharp bits) that they have.
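The envelope we obtain, $(x,y) = \tfrac14(\cos3\theta - 3\cos\theta,\; 3\sin\theta - \sin3\theta)$, is easy to sanity-check numerically: each point should satisfy both the reflected-ray equation (3) and its derivative (4). A quick sketch:

```python
import math

def nephroid(theta):
    """Envelope point (x, y) at parameter theta."""
    x = (math.cos(3*theta) - 3*math.cos(theta)) / 4
    y = (3*math.sin(theta) - math.sin(3*theta)) / 4
    return x, y

for theta in [0.3, 1.1, 2.5]:
    x, y = nephroid(theta)
    # Equation (3): x sin 2θ + y cos 2θ = -sin θ
    ray = x*math.sin(2*theta) + y*math.cos(2*theta) + math.sin(theta)
    # Equation (4): 2x cos 2θ - 2y sin 2θ = -cos θ
    tangency = 2*x*math.cos(2*theta) - 2*y*math.sin(2*theta) + math.cos(theta)
    assert abs(ray) < 1e-12 and abs(tangency) < 1e-12
```

Note that at $\theta = 0$ the parametrisation gives $(-1/2, 0)$: the focal point of the unit-radius mirror, halfway between the surface and the centre, as ray tracing predicts.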

But I wanted a cardioid, dammit!

The angular setup for a point on the rim. To keep the parameterisation the same as the last case, the angles of incidence and reflection have been defined differently

If nephroids aren’t your thing, it’s possible to get a cardioid caustic in a cup if we change the setup slightly. Instead of having a point source at infinity, let’s put the point source on the rim of the cup and see what happens. The geometry of the incoming and outgoing rays is shown to the right.
This gives the equation of the line as
$$y\hspace{1pt}(1+\cos3\theta\hspace{1pt})+x\hspace{1pt}\sin3\theta = \sin\theta - \sin2\theta,$$
and differentiating, we get
$$-3y\hspace{1pt}\sin3\theta + 3x\hspace{1pt}\cos3\theta = \cos\theta - 2\cos2\theta.$$
Solving these simultaneously is a tad more fiddly than before, but working through gives the envelope as
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{3}\begin{pmatrix} \cos2\theta - 2\cos\theta \\ 2\sin\theta - \sin2\theta \end{pmatrix}.$$
This is a cardioid. Yay!
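As before, it's worth checking numerically that the parametrisation $(x,y) = \tfrac13(\cos2\theta - 2\cos\theta,\; 2\sin\theta - \sin2\theta)$ satisfies both the line equation and its derivative; a short sketch:

```python
import math

def cardioid(theta):
    """Caustic point (x, y) for a point source on the rim of the unit cup."""
    x = (math.cos(2*theta) - 2*math.cos(theta)) / 3
    y = (2*math.sin(theta) - math.sin(2*theta)) / 3
    return x, y

for theta in [0.4, 1.3, 2.7]:
    x, y = cardioid(theta)
    # Line: y(1 + cos 3θ) + x sin 3θ = sin θ - sin 2θ
    line = y*(1 + math.cos(3*theta)) + x*math.sin(3*theta) \
           - (math.sin(theta) - math.sin(2*theta))
    # Derivative: -3y sin 3θ + 3x cos 3θ = cos θ - 2 cos 2θ
    deriv = -3*y*math.sin(3*theta) + 3*x*math.cos(3*theta) \
            - (math.cos(theta) - 2*math.cos(2*theta))
    assert abs(line) < 1e-12 and abs(deriv) < 1e-12
```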

Reflected light beams from a point source located where the rim touches the positive $x$-axis

Additional complexities

What we see in a cup is unlikely to be exactly one of the two previous cases. If the caustic is bright enough to be visible, the light source is probably not far enough away to be at ‘infinity’, and people don’t tend to go around putting point sources on the rims of their cups. If we have a light source that’s a finite distance from the cup edge, the incident rays will be at an angle somewhere between the nephroid and cardioid cases, so the curve seen is somewhere between the two.

There is another large assumption we’ve made here that renders the situation somewhat unphysical. Our cup is a two-dimensional circle! And although that makes the maths nicer, it’s not great for holding coffee. The physical principles are the same in 3D, with an extra angle to worry about, so what you actually see in a coffee cup is the intersection of the surface in the diagram on the right with the bottom of your cup.

This surface is called the ‘cusp catastrophe’, and can be found using catastrophe theory, which, among other things, looks at the behaviour of manifolds with singularities in them.

The cusp catastrophe. Image: Rich Morris, CC BY-NC-SA 4.0

A change of focus

Since we can focus light using almost any process that changes its direction, reflection caustics (or catacaustics) such as the ones we have been considering here are not the only type possible. A common refraction caustic is the rippling pattern of light seen on the bottom of bodies of water, and a rainbow is a caustic caused by a combination of reflection and refraction. More exotically, gravitational forces bend space-time and therefore the light travelling through it, which means that gravitational lensing can give rise to caustics of astronomical scale. The shape of the caustic gives key information about the astronomical object, and this method has been used to identify and analyse exoplanets around distant stars.

Although these physical systems look completely different at first glance, they’re linked by a single phenomenon. The same flavour of physics that describes how the light in your morning cuppa behaves also describes the behaviour of light on ridiculously huge scales in the universe. And that’s pretty cool, don’t you think?

So, the next time you sit down to enjoy a hot beverage, take a moment to appreciate the awesome things happening, quite literally, right under our noses.


Mathematics for the three-fingered mathematician

We’re all familiar with using a couple of different bases to represent integers: base ten for almost all purposes when we do our own calculations, and base two, or binary, for getting computers to do them for us. But there’s nothing special about ten and two. We could equally well use any integer $b$ greater than one, so that the string of digits

$$ d_n d_{n-1} d_{n-2} \ldots d_0, $$

where each $d_i$ is a non-negative integer less than $b$, represents the integer

$$ \sum_{i=0}^n d_i b^i.$$

Some bases are slightly more convenient than others for doing arithmetic. Bases eight and sixteen are both used in various computer applications, and there is an active society, the dozenal society, devoted to using and promoting the arithmetical advantages of base twelve. Much less common, but far more interesting, is base three.

With base three, the digits are all 0, 1 or 2. But I want to look at a variation on this. Instead of using 1 and 2, I’ll use 1 and $-1$; but it’s not convenient to have minus signs in the middle of our numbers, so because of this and for reasons of symmetry I’ll represent them with $1$ (for $1$) and $\bar{1}$ (for $-1$). Base three is ternary, and this variation of it is called balanced ternary.
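A short routine makes the conversion concrete. This is an illustrative sketch (the function name and the use of a list with $-1$ standing for the digit $\bar{1}$ are my own choices): repeatedly take the remainder mod 3, rewriting a remainder of 2 as $-1$ with a carry, since $2 = 3 - 1$.

```python
def to_balanced_ternary(n):
    """Return the balanced ternary digits of n, most significant first,
    using 1, 0 and -1 (the -1 digit is written as an overlined 1 in text)."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3          # remainder in {0, 1, 2}
        if r == 2:
            r = -1         # write 2 as -1 and carry 1 into the next place
        n = (n - r) // 3
        digits.append(r)
    return digits[::-1]

# 5 = 9 - 3 - 1, so its balanced ternary digits are 1, -1, -1
print(to_balanced_ternary(5))  # [1, -1, -1]
```

Because the remainder is adjusted symmetrically around zero, the same loop handles negative integers without any special casing.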



The mathematics of Maryam Mirzakhani

Maryam Mirzakhani, the first woman Fields medallist and an explorer of abstract surfaces, left us in the prime of her life. Rightly, the world press mourned her passing, but what I hope to do here is to write about the beautiful and difficult mathematics she loved working on. As a pure mathematician, she was usually driven by an in-depth understanding of the different complex structures on abstract surfaces, rather than by the search for applications. Nevertheless, her work has been used to solve real-life problems. What I personally find most fascinating about her, though, is the courage and creativity she showed in attempting and solving long-standing problems, and the variety of areas within mathematics she worked on, from complex geometry and topology to dynamical systems. Here is a more intuitive exposition of some of her achievements.


A tasty surface. Image: Descubra Sorocaba, CC BY 2.0

I am sure that if I asked you to give me an example of a surface you would be able to do so straight away. You might say the surface of the Earth is obviously one, and you would be right. However, defining what we mean by a surface mathematically is a little bit trickier. Let’s give it a go. A geometrical object is called a surface if, when we zoom in very closely at the points on the shape, we can see overlapping patches of the plane. If we were to use mathematical language we would say that a surface is locally homeomorphic to the plane.

A genus 1 surface

You might not find this definition particularly helpful so let us consider a few more examples. Oranges, tomatoes, apples and, for more delicious alternatives, cakes, cupcakes, ring doughnuts and pretzels are all surfaces. Well, almost! In order for them to be surfaces we need to picture them hollow (like a balloon), rather than solid. If we consider these objects geometrically, meaning that we distinguish between different angles and side lengths, we notice that there are infinitely many of them. This is why we consider them topologically instead. Using continuous deformations we can turn almost all of our examples into a sphere, except the ones that have holes in them, which are in classes of their own. Thus we can classify surfaces up to deformation (topological equivalence) by the number of holes, which we call the genus. The sphere has genus 0, the torus (ring doughnut) genus 1 and the 3-fold torus (pretzel) genus 3, so these are all inequivalent surfaces.

Mirzakhani’s work was on Riemann surfaces. To turn a surface into a Riemann surface we need to give it additional geometric structure: for example, structure that allows us to measure angles, lengths and areas. An example of such a geometry is hyperbolic geometry. It was the first example of a non-Euclidean geometry; the only way it differs from Euclidean geometry is that given a line $\ell$ and a point $P$ that is not on the line, we can draw at least two distinct lines through $P$ that are parallel to $\ell$.

Parallel lines and a triangle on a hyperbolic surface

One peculiar consequence of this new axiom is that rectangles do not exist in hyperbolic geometry. Moreover, the angle sum for a triangle is always less than $\pi$. Mirzakhani’s early work was on hyperbolic surfaces, which are Riemann surfaces with hyperbolic structure. The problem with hyperbolic surfaces is that we cannot really visualise them, because the hyperbolic structure on the Riemann surface can’t be embedded in $\mathbb{R}^3$. However, we can try and describe roughly how you put the structure on the surface. Imagine our surface is made out of rubber and we can bend it and fold it in all dimensions. Now we add the hyperbolic structure, but for that we need, according to John Nash, 17 dimensions. If we next dip it in cement it becomes solid, and we can no longer stuff it into 3 dimensions, hence we can no longer visualise it completely.

On these surfaces, Mirzakhani studied special objects called closed geodesics. Roughly speaking, a geodesic is a generalisation of the notion of a straight line on the Euclidean plane. More rigorously, a geodesic is a path between two points on the surface whose length cannot be shortened by deforming it. For example, on the sphere the geodesics are the great circles: the intersections of the sphere with planes passing through its centre.

A sphere sliced in half along a great circle

A closed geodesic is a geodesic that starts and ends at the same point. The simplest example of a closed geodesic is a circle. We also allow self-intersections, that is, geodesics that look, for example, like a figure-of-eight and are much more complicated. Using the hyperbolic structure on the Riemann surface, we can compute the lengths of these closed geodesics. A natural question to ask is: how many closed geodesics of length $\leq L$ are there on a hyperbolic surface? The answer was established in the 1940s by Delsarte, Huber and Selberg, and it was named the prime number theorem for hyperbolic surfaces because of its resemblance to the prime number theorem. That is, the number of closed geodesics of length $\leq L$ on a hyperbolic surface $X$, denoted by $\pi(X,L)$, satisfies \begin{align*} \pi(X, L)\sim \mathrm{e}^L/L,\end{align*} as $L\rightarrow \infty$. Roughly speaking, $\pi(X,L)$ gets closer to $\mathrm{e}^{L}/L$ as $L$ becomes very big. So the number of closed geodesics grows exponentially, meaning very quickly, but more importantly the formula does not depend on which hyperbolic surface we are on.

The next question to consider is what would happen if we no longer allow our geodesics to intersect themselves. Would our formula change much? Would the growth rate be significantly different? That is, we wish to count the simple closed geodesics (simple meaning no self-intersections are allowed) of length $\leq L$ on a hyperbolic surface $X$, a number denoted $\sigma(X, L)$. In 2004, Mirzakhani proved, as part of her PhD thesis, that \begin{align*} \sigma(X,L)\sim C_{X} L^{6g-6},\quad\text{as }L\to\infty,\end{align*}where $g$ denotes the genus of the surface $X$, and $C_{X}$ is some constant dependent on the geometry (hyperbolic structure) of the surface. It is important to make clear that surfaces of a given genus can be given many different hyperbolic structures. So the number of simple closed geodesics grows much more slowly (polynomially), and it also depends on the surface we are on: the underlying surface may be the same, but a different hyperbolic structure gives different geodesics with different lengths. While computing $\sigma(X,L)$, she discovered formulae for the frequencies of different topological types of simple closed curves on $X$. The formulae are a bit too complicated to explain here, but let us consider an example: if $X$ is a surface of genus 2, there is a probability of 1/7 that a random simple closed geodesic will cut the surface into two genus 1 pieces. How cool is that?!

Flight paths follow geodesics on the Earth’s surface

Even though these results are for a given hyperbolic structure, Mirzakhani proved them by considering all structures at the same time. We know that we can continuously deform surfaces of the same genus $g$ and they will be topologically the same, although geometrically they may be different. These deformations depend on $6g-6$ parameters, a fact already known to Riemann. We call these parameters moduli, and we can consider their space, the so-called moduli space of all hyperbolic structures on a given topological surface. By definition, a moduli space is a space of solutions of a geometric classification problem, up to deformations. This is a bit abstract, so let us illustrate it with a simple example. Suppose our geometric classification problem is to classify the circles in the Euclidean plane up to equivalence, where two circles are equivalent if they have the same radius, no matter where their centres lie. Then our modulus (parameter) is the radius $r$ of the circle, and since $r\in\mathbb{R}^{+}$, the moduli space is the positive real numbers. So what can we do with these new spaces? Greg McShane observed that you can add a new structure: a so-called symplectic structure which, roughly speaking, allows us to measure volumes on moduli spaces. Mirzakhani found a connection between volumes on moduli spaces and the number of simple closed geodesics on a single surface: she computed some specific volumes on moduli spaces, and her celebrated result followed.

Dynamical systems

“Pot as many balls as you can.” Image: Curtis Perry, CC BY-NC-SA 2.0

In recent years, Mirzakhani focused her attention on dynamical systems on moduli spaces. A dynamical system is simply a system that evolves with time. Originally, dynamical systems arose in physics by looking at the movements of particles in a chamber or planets in the solar system. It was observed that these large systems are similar to smaller ones, and by studying toy models we might shed some light on the actual physical dynamical systems.

One such toy model is the dynamical system of billiard balls on a polygonal table (not necessarily rectangular). Bear in mind that in this version of billiards we only use one ball, and it can travel forever on a path as long as it doesn’t reach a corner. The billiard ball takes the shortest path, so it travels along geodesics, and this is where Mirzakhani’s research comes into play. As we know by now, she studied surfaces rather than polygons, but if you orient the edges of the table in pairs and glue them together then you can turn it into a surface.

Even though billiard dynamics might seem simple, there are difficult problems that are still unsolved. One might ask if there are any periodic billiard paths, and if so, would the answer change if we change the shape of the table $T$? This problem has been solved: it is known that there is always at least one periodic billiard path for a rational polygonal table (by a rational polygon, we mean a polygon whose angles are rational multiples of $\pi$). But what if we now ask what is the number of such periodic billiard paths of length $\leq L$ on a table $T$, denoted by $N(T, L)$? It is conjectured that the following asymptotic formula holds.

\begin{align*} N(T, L)\sim \frac{C_T L^2}{\pi \text{ Area}(T)},\end{align*}where $C_T$ is some constant depending on the table. Alongside Alex Eskin and Amir Mohammadi, Mirzakhani made some progress towards this result. They showed that $\lim_{L\to\infty}N(T,L)/L^2$ exists and is non-zero, and moreover that for the asymptotic formula to hold in the form above, there are only countably many possible values of $C_T$. Recently, Mirzakhani and Eskin’s work on billiard paths was applied to the sight lines of security guards in complexes of mirrored rooms.

Another example of her impact on dynamical systems is her work on Thurston’s earthquake flow. Suppose we have a Riemann surface $X$ of genus $g$, a simple closed geodesic $\gamma$ on $X$ and a real number $t$. Then we obtain a new Riemann surface $X_t= \text{tw}_{\gamma}^t(X)$ by cutting $X$ along $\gamma$, twisting it to the right by $t$ and re-gluing. Then we can define the flow at time $t$ to be

\begin{align*} \text{tw}^{t}(\lambda, X\hspace{0.3mm})=(\lambda, \text{tw}_{\lambda}^t(X\hspace{0.3mm}))\end{align*}

where $\lambda$ is a geodesic lamination. The definition of a geodesic lamination is quite technical, so in this article we can simply think of it as a disjoint collection of simple geodesics on $X$. Intuitively, we have a dynamical system like the movement of the planets in time $t$, but in our case the objects that move with time are points of the moduli space.

We get some sort of periodicity because, if $\gamma$ has length $L$, then $X_{L+t}=X_t$. Mirzakhani showed something truly remarkable: the earthquake flow is ergodic. This means that, with probability 1, following the flow brings us arbitrarily close to every point of the space. This came as a surprise, because until then not a single earthquake path was even known to be dense.

Maryam Mirzakhani by Mehrzad Hatami

She might not have always wanted to be a mathematician, aspiring to be a novelist when she was younger, but she left a big mark on mathematics. As the first Iranian and first woman to win the Fields medal, I believe she has been an inspiration to many young girls and women, including me, to go into research and be optimistic when solving problems because the “beauty of mathematics only shows itself to the more patient followers”.


Biography of Sophie Bryant

In 1884, Sophie Bryant’s paper, On the ideal geometrical form of natural cell structure, was published by the London Mathematical Society (LMS). It was ambitious, logical and descriptive: it looked at the phenomenon of the honeycomb.

Her insight was that the complex and beautiful honeycomb shape was a product of the natural activity of bees. All that was needed was for each bee to excavate its own cell at approximately the same rate as the others, and to use the excavated material to build up the walls of its cell. Bryant’s conclusion, that elongated rhombic semi-dodecahedra are the natural form of honeycomb cells, had been observed by Kepler.

A rhombic semi-dodecahedron can be made by putting square-based pyramids on the faces of a cube

In the eighteenth century, it was believed that the honeycomb was the most efficient cell shape possible, but this is now known not to be the case. In 1964, the Hungarian mathematician Fejes Tóth observed in his paper, What the bees know and what they do not know, that there are in fact more efficient cell shapes which have yet to be determined.

Kepler conjectured in 1611 that no packing of balls of the same radius in three dimensions has density greater than the face-centred cubic packing—the cannonball packing—with a density of about 74%. Bryant’s paper assumed this conjecture to be true, as it had appeared obvious for centuries and many had attempted proofs.
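The "about 74%" figure is exactly $\pi/\sqrt{18}$, the density of the face-centred cubic packing, which a one-line computation confirms:

```python
import math

# Density of the face-centred cubic (cannonball) packing: pi / sqrt(18)
density = math.pi / math.sqrt(18)
print(round(density, 4))  # 0.7405
```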

Cannonball packing. Image: Wikimedia Commons user Greg L, CC BY-SA 3.0

The conjecture was eventually proved by Hales et al in 1998. Their computer-assisted proof was so huge that it took 12 referees to check it. After five years, the referees said that they were 99% sure that the proof was correct. Unusually, Annals of Mathematics published the paper in 2005 without complete certification from the referees. It was finally accepted as proven in 2014, and then only with the aid of massive amounts of computer time.

Bryant’s approach to the subject was not unusual at that time. Abstract proofs, so essential to us now, were not as common as general discussion of mathematical phenomena. She wrote “The form of a natural structure is a logical result of its mode of genesis, and that form is ideal of which the mode of genesis is perfectly regular”. She states that there are only three possible arrangements without explaining why these are the only ones.

Bryant’s paper is notable since it is the first published paper written by a woman member of the LMS. However, she was not the first woman to be elected to membership, being preceded by two remarkable women, Charlotte Angas Scott and Christine Ladd Franklin.

Charlotte Angas Scott

Though Bryant was the first woman member of the LMS to publish a paper, she was not the first woman member. That honour goes to Charlotte Angas Scott (1858–1931), an algebraic geometer, who became a member in 1881.

Scott had been aided in her mathematical education by an enlightened father. This resulted in her obtaining a scholarship to Girton College, Cambridge. However, women in Cambridge were not granted degrees until 1948 and she had to be content with the accolades of her peers.

She was appointed a lecturer at Girton and received an external BSc degree from London University, and later a doctorate. Scott moved to the newly opened Bryn Mawr College for women in the USA where she was appointed head of mathematics, and remained for forty years.

Being first

Being the ‘first woman’ was not unusual for Bryant. She was the first woman to receive a DSc degree in England, studying what was then mental and moral philosophy, but today would be referred to as psychology and ethics. She was also one of the first three women to be appointed to a Royal Commission—the Bryce Commission on Secondary Education in 1894–95—and she was one of the first three women to be appointed to the senate of London University.

While on the senate she advocated setting up a day training college for teachers, which eventually became the Institute of Education. Later in 1904, when Trinity College, Dublin opened its degrees to women, Bryant was one of the first to be awarded an honorary doctorate. In Cambridge, she was also instrumental in setting up the Cambridge Training College for Women which eventually became Hughes Hall, the first postgraduate college for women in Cambridge.

She was also, it seems, one of the first women to own a bicycle.

Beginnings and early widowhood

Bryant was born in Ireland, and was fortunate to learn mathematics as well as other academic subjects with her five siblings in a very natural way from their father, the Rev WA Willock DD. A keen educationalist, he had been a fellow and tutor at Trinity College, Dublin and had gained high honours in mathematics and mental sciences.

When Bryant was about thirteen, her family moved to England and her family education continued until she attended Bedford College, where she was awarded the Arnott scholarship for science in 1866. She sat the Cambridge local examination for girls in 1867 and was the only one to be placed in the first class of the senior division.

In 1869, Bryant married the surgeon Dr William Hicks Bryant, only to be widowed the following year when he died of cirrhosis at the early age of 30.

Christine Ladd Franklin

The second woman member of the LMS, who also joined in 1881, was Christine Ladd Franklin (1847–1930), an American mathematical logician.

Though Johns Hopkins University was not then open to women, JJ Sylvester, its professor of mathematics, urged not only that she be admitted, but arranged for her to do graduate work under his supervision and to be granted a fellowship. As Johns Hopkins did not award degrees to women, she left without a PhD, despite completing her dissertation on symbolic logic.

She was finally awarded a PhD by Johns Hopkins forty-four years after she submitted her dissertation, when she was seventy-eight years old.

Schoolteacher and doctorate

After a short interval, Sophie Bryant returned to her studies. While she had been sitting her examinations, she was introduced to Frances Buss, the headmistress and founder of North London Collegiate School (NLCS), an excellent school then and still highly regarded today. It had been founded in 1850, the year of Bryant’s birth.

Bryant arranged to meet Buss who, in 1875, invited her to teach mathematics at NLCS and encouraged her to take a training course as well. Three years later, London University opened its degrees to women. As Bryant had not had a conventional education, she had to learn Latin and biology to matriculate before she could sit for her degree. In 1881, she earned a BSc degree, gaining a first class in mental and moral science and second in mathematics.

In 1884, she received a science doctorate. The NLCS, where she had continued to teach, presented her with scarlet doctoral robes. Bryant was influential in improving the education system and introduced a scheme of enlightened and serious study.

In 1894, Buss died, and the following year Bryant became the headmistress of NLCS, a post she held until her retirement.


Bryant receiving her
doctorate. Image: North London Collegiate School

Meanwhile, she continued to publish ambitious papers. In her 1884 paper in Mind, The double effect of mental stimuli; a contrast of types, Bryant attempted to analyse the difference between reflex actions, which are performed without conscious thought, and consciously controlled actions.

She was grappling with a contemporary problem: the understanding of consciousness. Unfortunately, her arguments are too diffuse to shed much light on the problem.

In 1885, she published a paper in the Journal of the Anthropological Institute, Experiments in testing the characters of school children. This study, undertaken at the suggestion of Francis Galton, produced an early account of the use of open-ended psychometric tests to deduce personality types. Bryant claimed that her results agreed with the observations of teachers familiar with the children but did not provide any supporting evidence. Despite incomplete analysis, this was a pioneering study.

Later life

Bryant was interested in Irish politics, and wrote books on Irish history and ancient Irish law. She was an ardent Protestant Irish nationalist and was active in the Home Rule movement, which pressed for Irish self-government within the United Kingdom. She wrote on women’s suffrage in 1879 but later advocated postponement until women were better educated in politics.

She enjoyed mountain climbing and she summited the Matterhorn twice. Her death in 1922 was both tragic and unexpected. Only four years after retirement, she was on a mountain hike near Chamonix, in France, when she went missing. Her body was found thirteen days later with several injuries.


Although Bryant’s direct contribution to mathematical scholarship was not substantial, her influence as a teacher and educationalist was immense. The rising number of women mathematicians today is a lasting tribute to her work.


Geographic profiling

Imagine you’re a police officer working on a huge case of serial crime. You’ve been handed the list of suspects, but to your horror 268,000 names are on it! You need to come up with a way of working through this list as efficiently as possible to catch your criminal. Along with the thousands of names, you’re also given a map with the locations of where bodies have been found (the map above). Given these two pieces of intel, how exactly would you prioritise your list of suspects? Have a go! Where exactly would you search for the criminal? We will reveal the answer at the end of the article!

Peter Sutcliffe, also known as the Yorkshire Ripper, was one name on the list of 268,000 suspects generated by this investigation in the late 1970s. But how was the team investigating these crimes meant to cope with such an overload of information? These are the fundamental problems that geographic profiling tries to solve.

How exactly does geographic profiling work? This article will introduce you to the fundamental ideas behind the subject. We will also look at the various applications, just like the Yorkshire Ripper case, along the way. These examples aren’t just in criminology though. The applications span ecology and epidemiology too!

The first model

Geographic profiling uses the spatial relationship between crimes to try and find the most likely area in which a criminal is based; this can be a home, a work place or even a local pub. Collectively we refer to these as anchor points. The pioneer of the subject, Kim Rossmo, once a detective inspector but now director of geospatial intelligence/investigation at Texas State University, created the criminal geographic targeting model in his thesis in 1987. The criminal geographic targeting model aims to do exactly what we struggled with at the beginning of this article: prioritise a huge list of suspects.

A gridded-up map. Alistair Marshall, CC BY 2.0

It starts by breaking up your map of crime locations into a grid, much like the one on the left. We assume that each crime occurs independently of every other. We then score each grid cell; the cell with the highest score is the likeliest to contain the criminal’s potential anchor point.

How do we calculate this score? An important factor is the distance between crimes and anchor points. We choose to use the Manhattan metric as our measure of distance. In this metric, the distance between points $\boldsymbol{a}$ and $\boldsymbol{b}$ is the sum of the horizontal and vertical changes in distance. This is written as:
$$d(\boldsymbol{a},\boldsymbol{b}) = \lvert x_a-x_b \rvert + \lvert y_a-y_b\rvert, \qquad \boldsymbol{a} = (x_a, y_a), \quad \boldsymbol{b} = (x_b, y_b).$$

The Manhattan metric is so-called because it resembles the distance you have to travel to get between two points in a gridded city like Manhattan.

This is the most suitable metric for our work, but it’s worth noting that others can be used, depending on the system you’re studying. Now, we could just start searching at the spatial mean of our crimes and work radially outward from that point. However, one rogue crime occurring far away from the rest could easily throw a spanner in the works. Instead we use something called a buffer/distance decay function.

$$ f(d) = \begin{cases} \dfrac{k}{d^{h}}, & d > B, \\[1mm] \dfrac{kB^{g-h}}{(2B-d)^g}, & d\leq B. \end{cases} $$
A criminal isn’t likely to commit a crime close to an anchor point, out of fear of being recognised, so we place a buffer around it. In addition, to commit a crime far away from home is a lot of hassle, so the chance of a crime decays as we move away from the anchor point. This is why our buffer/decay function looks a bit like a cross-section of a volcano. The explicit function, $f(d)$, is written on the right, where $k$ is constant, $B$ is the buffer zone radius and $g$ and $h$ are values describing the criminal’s attributes, eg what mode of travel they use. With our distance metric, $d$, and buffer/decay function, $f$, we are now able to compute a score for each grid cell.

For $n$ crimes, the score we give to cell $\boldsymbol{p}$ is
$$ S(\boldsymbol{p}) = \sum_{i=1}^{n}f(d(\boldsymbol{p}, \boldsymbol{c}_i)), $$
where $\boldsymbol{c}_i$ is the location of crime $i$. So finally we have a score for each grid cell and we can prioritise our list!
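To make the recipe concrete, here is a minimal Python sketch of the scoring step. The parameter values chosen for $k$, $B$, $g$ and $h$ are illustrative placeholders, not values from any real investigation.

```python
def manhattan(a, b):
    """Manhattan distance between points a = (x_a, y_a) and b = (x_b, y_b)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def buffer_decay(d, k=1.0, B=2.0, g=2.0, h=2.0):
    """Buffer/distance decay function f(d): decay beyond the buffer radius B,
    and a rising 'volcano wall' inside it (parameters are illustrative)."""
    if d > B:
        return k / d**h
    return k * B**(g - h) / (2*B - d)**g

def score(p, crimes, **params):
    """CGT score of grid cell p: the sum of f(d(p, c_i)) over all crime sites."""
    return sum(buffer_decay(manhattan(p, c), **params) for c in crimes)
```

For instance, `score((0, 0), [(1, 3), (4, 0)])` sums the decay function evaluated at Manhattan distances 4 and 4; evaluating this for every cell in the grid gives the surface we prioritise.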

An example of the geographic profile created using the criminal geographic targeting model

Plotting these scores on the $z$-axis produces a mountain range of values, like on the right. We can now prioritise by checking residencies at the peak of this mountain range and working our way down. Notice the collection of peaks around a particular area: this gives us an indication that perhaps the criminal uses more than one anchor point.

An important question: how can we be sure this even works? Does it really identify anchor points efficiently? What do we even mean by “efficient”? This is answered with a quantity called the hit score. This is

$$\text{hit score} = \frac{\text{number of grid cells searched before finding the criminal}}{\text{total number of grid cells}}. $$

So ironically, the lower our hit score, the better our model performs. This is sensible, since we want to search as little space as possible to catch our criminal.
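The hit score of a ranked search is easy to compute: sort the cells by score, then count how far down the ranking the anchor point’s cell appears. This helper and its inputs are hypothetical illustrations.

```python
def hit_score(scores, anchor_index):
    """scores: one CGT score per grid cell; anchor_index: the cell holding
    the anchor point. Returns the fraction of cells searched, visiting cells
    in decreasing score order, up to and including the anchor's cell."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    searched = order.index(anchor_index) + 1
    return searched / len(scores)
```

If the anchor sits in the highest-scoring of four cells, the hit score is 0.25; if it sits in the lowest-scoring cell, the model has done no better than searching everything.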

The Gestapo case

Otto and Elise Hampel distributed hundreds of anti-Nazi postcards during the second world war. The Gestapo’s intuition on where the Hampel duo might live was based on themes almost exactly the same as geographic profiling. Inspired by a classic German novel, Alone in Berlin, our group revisited the Gestapo investigation and published our findings in a journal that is so highly classified we are not able to read it.

By analysing the drop-sites of the postcards and letters we were able to show that geographic profiling successfully prioritises the area where the Hampels lived in Berlin. Crucially, this study actually showed the importance of analysing minor terrorism related or subversive acts to identify terrorist bases before more serious crimes occur.

A statistical approach

The criminal geographic targeting model is an incredibly useful tool and is used to this day by the CIA, the Metropolitan Police and even the Canadian Mounted Police. Mike O’Leary, a professor at Towson University, Maryland, asked why the criminal geographic targeting model produces only a score, when what we really require is a probability. So he developed a way of doing geographic profiling under the laws of Bayesian probability.

Bayes’ rule is better in neon. Image: Wikimedia Commons user Mattbuck, CC BY-SA 3.0

O’Leary uses Bayes’ rule as seen on the right. How do we apply it to criminology? We want to know: what is the probability that an offender is based at an anchor point, given the crimes they have committed? Using Bayes’ rule, we instead pretend we know where the anchor point is and ask: what is the probability of the crimes occurring, given our anchor point? We use the formulation
$$\Pr(\boldsymbol{c}_1, \boldsymbol{c}_2, \boldsymbol{c}_3, \boldsymbol{c}_4\text{…}\;|\;\boldsymbol{p})\; = \;\prod_{i=1}^{n}\Pr(\boldsymbol{c}_i\;|\;\boldsymbol{p}),$$
where the equality derives from the assumption of independent crimes.
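Written out in our notation (with $\Pr(\boldsymbol{p})$ a prior over possible anchor points), Bayes’ rule turns this conditional probability back into the one we actually want:

$$\Pr(\boldsymbol{p}\;|\;\boldsymbol{c}_1,\ldots,\boldsymbol{c}_n) = \frac{\Pr(\boldsymbol{c}_1,\ldots,\boldsymbol{c}_n\;|\;\boldsymbol{p})\Pr(\boldsymbol{p})}{\Pr(\boldsymbol{c}_1,\ldots,\boldsymbol{c}_n)} \;\propto\; \Pr(\boldsymbol{p})\prod_{i=1}^{n}\Pr(\boldsymbol{c}_i\;|\;\boldsymbol{p}),$$

since the denominator does not depend on $\boldsymbol{p}$ and so does not affect which anchor points are ranked highest.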

Below, we can see a comparison between Rossmo’s criminal geographic targeting model and O’Leary’s simple Bayesian model. The problem with O’Leary’s model is that it assumes a criminal has only one anchor point. Unfortunately this is rarely the case. As we mentioned earlier, an anchor point could be a home, a workplace, a local pub, or even all of the above. So we obtain a probability surface, but we only consider one anchor point. The criminal geographic targeting model entertains the idea that multiple anchor points exist, but doesn’t give us an explicit probability. What we really need is a way of combining both methods. Does such a method exist?

(a) The criminal geographic targeting model

(b) The simple Bayesian model

Examples of the geographic profiles created using the criminal geographic targeting and simple Bayesian models

The elusive tarsiers

Image: Callum Pearson

South-east Asia, specifically Sulawesi, houses a huge number of endemic species. Often habitat assessments of cryptic and elusive animals such as the tarsier (right) are overlooked, primarily due to the difficulties of locating them in challenging habitats. Traditional assessment techniques are often limited by time constraints, costs and challenging logistics of certain habitats such as dense rainforest.

Using only the GPS location of tarsier vocalisations as input into the geographic profiling model we were able to identify the location of tarsier sleeping trees. The model found 10 of the 26 known sleeping sites by searching less than 5% of the total area (3.4 km$^2$). In addition, the model located all but one of the sleeping sites by searching less than 15% of the area. The results strongly suggest that this technique can be successfully applied to locating nests, dens or roosts of elusive animals, and as such be further used within ecological research.

The best of both worlds

The Dirichlet process mixture model combines the best of the criminal geographic targeting and simple Bayesian models. So far we’ve only stated that we’re either working with one anchor point, or many. The beauty of the Dirichlet process mixture model is that we don’t need to specify the number of anchor points we are searching for. Instead, there is always some non-zero probability that each crime comes from a separate anchor point. So multiple anchor points can be identified while using a probabilistic framework. Introducing multiple anchor points is challenging since we need to know:

  1. How are all the crimes clustered together?
  2. In each cluster of crimes, where is the anchor point?

Actually, what would be really useful is if we knew the answer to just one of these questions. If we knew how the crimes were clustered, finding the anchor points is easy (we use the simple Bayesian model to find the source in each cluster). But also, if we knew where the anchor points were, allocating crimes to clusters is easy (and of course we know where our criminal lives!). The solution to this problem is to use something called a Gibbs sampler. We use a Gibbs sampler in cases where we want to sample a set of events that are conditional on one another. In our case, anchor point locations depend on the clustering of crimes, but the clustering of crimes also depends on the anchor point locations. The steps the Gibbs sampler will take are:

  1. Randomly assign each crime an anchor point (even though we don’t yet know where the anchor points are).
  2. Find each anchor point by using the simple Bayesian model on each assignment.
  3. Throw out the assignments of crimes to anchor points and now re-assign crimes but using the locations found in previous step. Throw out the old anchor point locations and find new ones using this new assignment.
  4. Repeat step 3 many, many times.
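The alternating scheme above can be sketched in a few lines of Python. This is a heavily stripped-down toy, not the full Dirichlet process machinery: the number of anchors is fixed in advance, and the simple Bayesian step is replaced by a centroid estimate. All names and parameters here are hypothetical.

```python
import random

def toy_gibbs(crimes, n_anchors=2, n_iter=50, seed=0):
    """Toy alternating sampler: assign crimes to anchors, re-estimate anchors,
    repeat. crimes is a list of (x, y) tuples."""
    rng = random.Random(seed)
    # Step 1: randomly assign each crime an anchor label.
    labels = [rng.randrange(n_anchors) for _ in crimes]
    for _ in range(n_iter):
        # Step 2: estimate each anchor from its assigned crimes (centroid here,
        # standing in for the simple Bayesian model).
        anchors = []
        for k in range(n_anchors):
            cluster = [c for c, l in zip(crimes, labels) if l == k]
            if not cluster:  # empty cluster: re-seed at a random crime site
                cluster = [rng.choice(crimes)]
            anchors.append((sum(x for x, _ in cluster) / len(cluster),
                            sum(y for _, y in cluster) / len(cluster)))
        # Step 3: re-assign each crime to its nearest anchor (Manhattan distance).
        labels = [min(range(n_anchors),
                      key=lambda k: abs(c[0] - anchors[k][0]) + abs(c[1] - anchors[k][1]))
                  for c in crimes]
    return anchors, labels
```

The real model samples these steps probabilistically rather than deterministically, which is what lets the number of clusters, and hence anchor points, grow or shrink as the data demand.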

This produces a new profile like on the right below. We can now compare this to our other two models on the left. We can see the Dirichlet process mixture model displays fewer peaks than the criminal geographic targeting model, but that these peaks are tighter. This in turn will reduce the hit score of our search.

(a) The criminal geographic targeting model

(b) The simple Bayesian model

(c) The Dirichlet process mixture model

A comparison of the three main geographic profiling models

The malaria case

Water bodies with mosquito larvae. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0

Throughout history, infectious diseases have been a major cause of death, with three in particular (malaria, HIV and tuberculosis) accounting for 3.9 million deaths a year. Targeted interventions are crucial in the fight against infectious diseases as they are more efficient and, importantly, more cost effective. They are even more crucial when the transmission rate is strongly dependent on particular locations. For example, we were tasked with finding the source(s) of malaria outbreaks in Cairo by considering the breeding site locations of mosquitos.

All accessible water bodies within the study area were recorded between April and September 2005, and 59 of these were harbouring at least one mosquito larva. Of these 59 sites, seven tested positive for An. sergentii, well-established as the most dangerous malaria vector in Egypt. Using only the spatial locations of 139 disease case locations as input into the model, we were able to rank six of these seven sites in the top 2% of the geoprofile.

Applying the method

The geoprofile associated with the Yorkshire Ripper body dump sites (black dots). The anchor points of Peter Sutcliffe are labelled as red squares. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0

We’ve done it! We now have a robust method for searching for our criminal: a list of 268,000 suspects is no longer so intimidating. Without this technique, however, the team investigating the Yorkshire Ripper case between 1975 and 1981 had far more work to do. On top of a huge list of suspects, 27,000 houses were visited and 31,000 statements were taken during the investigation.

If we apply our model to the crime sites we were given at the start of this article, we produce the contour map on the right. In this case the areas in white describe the highest points on our probability surface, whilst areas in red describe the lowest. In addition to the contours, we also see two red squares right at the top of the map. These are the two homes Peter Sutcliffe resided at during the period of his crimes. The hit scores for his two residences are 24% and 5% respectively. So by searching only 24% of our total search area, we’ve managed to find both residences. This is far better than a random search which would find them after searching, on average, 50% of our area.

Peter Sutcliffe’s homes are clearly marked on this map but we must remember an important point about geographic profiling: that it is not an ‘X marks the spot’ kind of model, but rather a method of prioritisation.

Investigating an old case

Dramatic scenes covered the newspaper front pages, such as this from 1888. Image: The Illustrated Police News

We can’t talk about the Yorkshire Ripper without mentioning the notorious 1888 London serial killer, Jack the Ripper. The five locations around Whitechapel where bodies were dumped were studied using geographic profiling to try and gain a better idea of where Jack the Ripper may have lived.

The map overleaf shows us the associated geoprofile, with Jack’s suspected anchor point obtaining a hit score between 10% and 20%, much better than a non-prioritised search!

This is just one example of many cases where we can utilise our new model to study cases from the past where such tools were not available.

Geographic profiling began in criminology, but now spans ecology (catching invasive species) and epidemiology (identifying sources of infectious disease) too. This means saving a hefty chunk of time and money, as well as developing prevention strategies to minimise any negative impacts these problems may cause.

The geoprofile associated with the body dump sites (black dots) of Jack the Ripper. Jack’s anchor point (the red square) is suspected to be around Flower and Dean Street. Image: © OpenStreetMap contributors. Cartography CC BY-SA 2.0


Roots: Blaise Pascal

The influence of Blaise Pascal is most keenly felt in his work on probability and the binomial theorem, illustrated by the famous Pascal’s triangle. It cannot be denied that Pascal’s triangle is a thing of mathematical beauty. However, this array of numbers was not discovered by its namesake; rather, its applications and importance were highlighted by Pascal in his work, much as Pythagoras’ theorem was certainly not invented by Pythagoras himself. Yet Pascal’s legacy to mathematics goes further than the instantly recognisable triangle …   Continue reading


On the cover: Euclidean Egg III

Throughout my life I have made an informal study of natural phenomena, through drawing or just looking, in a spirit of curiosity. This long but unsystematic practice has given me an impression of the world around us as a dynamic and fertile system, driven by a ubiquitous tendency for spontaneous pattern formation (best understood in terms of the laws of physics) mitigated by an equally strong tendency for seemingly random variation.

It could be argued that the evolutionary process itself is driven by this tension between pattern and randomness, structure and chaos, order and disorder, theme and variation; without random mutation there would be stasis. Continue reading


Pretty pictures in the complex plane

Some of the greatest works of art in history have been produced by mathematicians. One fascinating source of mathematical artwork is fractals: infinitely complex shapes, with similar patterns at different scales. Fractal geometry has dramatically altered how we see the world. Technology has many uses for fractals, one of which is the production of beautiful computer graphics. These pretty pictures are used to present a large amount of information about a function in a clear and comprehensible manner, and the simplicity of the maths involved in producing these pictures is fascinating.

Pretty pictures in the $z$-plane are widely used as computer graphics, book covers and even sold as works of art.

Modern art studies have often been dismissive of the power of beauty in mathematics, with the idea that “beauty is not in itself sufficient to create a work of art”. Mathematics produces rigid and inflexible answers, whereas art is free-moving and open to interpretation. However, it is undeniable that these pretty pictures demonstrate true beauty, not only in the images but also in the mathematics behind them.

The mathematics behind pretty pictures

Extremely simple functions can be used to produce these pictures. For example, let’s consider the quadratic function $f(x)=x^2+c$, for some constant $c$. An iterative method is applied to the function. First, a seed (let’s call it $x_0$) is selected as the initial value for iteration. The output of the function is then recycled as the new input value. In this way:

$$\begin{aligned}
x_1&=x_{0}^2 + c,\\
x_2&=x_{1}^2 + c = (x_{0}^2 + c)^2 +c,\\
x_3&=x_{2}^2 + c = \cdots\\
\text{and in general, } x_n&=x_{n-1}^2+c.
\end{aligned}$$

We continue until the iteration either converges to a fixed point or cycle, or diverges to infinity. The orbit is the sequence of numbers generated during the process of iteration: $x_0,x_1,x_2,x_3,\ldots,x_n$. If we only apply real numbers to the quadratic function we limit the graphical representation of the iterations to a line. To produce pictures in the plane, we use complex numbers instead.

The abundant beauty in the plots is somehow increased when the simplicity of the mathematics is understood.

Through the process of iteration, each seed will either converge or diverge, and so for a given function we can divide the plane into an escaping set $E_c =\{ z_0 : |z_n| \rightarrow \infty \, \mbox{as} \, n \rightarrow \infty \}$ (that is, all the seeds that end up at infinity) and a prisoner set, where the iteration tends to a point or becomes periodic.
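The classification of a seed can be sketched in a few lines of Python, using its built-in complex numbers. The escape radius of 2 used here is a standard bound for the quadratic map when $|c| \leq 2$: once the orbit leaves that circle, it is guaranteed to diverge.

```python
def classify_seed(z0, c, max_iter=100, escape_radius=2.0):
    """Iterate z -> z**2 + c from seed z0. Returns ('escaped', n) with the
    iteration count at which |z| first exceeds the escape radius, or
    ('prisoner', max_iter) if the orbit stays bounded for max_iter steps."""
    z = z0
    for n in range(max_iter):
        if abs(z) > escape_radius:
            return ('escaped', n)
        z = z*z + c
    return ('prisoner', max_iter)
```

For example, with $c=0$ the seed $z_0=0$ is a prisoner, while the seed $z_0=1$ with $c=1$ escapes after only a couple of iterations; colouring each seed by its escape count is exactly how the pictures in this article are drawn.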

The Julia set of a function

To go from the iterative procedure described above to the vivid images to the right, we need to introduce the idea of the Julia set of a function, named after the French mathematician Gaston Julia. Julia was an extraordinary man, who tragically lost his nose while fighting in the first world war. Despite the substantial injury, he made immense progress in the field of complex iteration and published the book Mémoire sur l’itération des fonctions rationnelles in 1918, which began the study of what we now call a Julia set.

The filled-in Julia set is the collection of points in the complex plane that form the prisoner set of a function, while the Julia set itself is the boundary of this region. The points within the filled-in Julia set remain bounded under the iteration since their orbits converge to an attracting point or cycle.

Connected and unconnected Julia sets of the quadratic function for different values of $c$

Conventionally, when pictures of the Julia set are shown, the filled-in Julia set is shaded black and varying colours are used to show the rate at which the escaping set diverges to infinity. The Julia set is therefore the edge of the black region. Maps 1–7 above show the Julia sets of the quadratic function for different values of $c$, with the escaping set colour-coded as follows: red areas represent points that slowly escape from the set, while blue areas signify points that quickly escape to infinity. The value of the complex constant $c$ influences the shape of the Julia set.

Maps 1, 4 and 5 all have black centres, which indicate that the Julia set is connected, while maps 3, 6 and 7 demonstrate unconnected sets. For these images, the Julia sets have no black regions and instead the pictures are just flurries of colour. It is not always easy to spot whether a Julia set is connected, however. In map 2, there is no obvious black region, but neither are there colourful individual flurries; instead we see a spiky line. In fact the set is connected; it is just that the filled-in Julia set is so slender that its black points are not visible in the image.

During the initial study of these sets, a fascinating criterion for connectivity was discovered concerning the critical point, $z_0=0$. If the critical point is used as the seed, we produce the critical orbit, which is bounded if and only if the Julia set is connected.

Fractal patterns appear in all plots, apart from when $c=0$ or $-2$. The picture below displays examples of magnified sections of the fractals, for $c=-0.7$ (maps 9–12), $c=-0.12 + 0.75 \,\mathrm{i}$ (maps 13–16), $c=0.1 + 0.7 \, \mathrm{i} $ (maps 17–20) and $c=-0.1 + 1 \, \mathrm{i} $ (maps 21–24). Each enhancement of a section produces what appears to be copies of the whole section, not just in overall shape but also with smaller embellishments on every “limb”. For connected plots, these fractals appear as loopy ovals and circles or thin, almost stick-like, sections. For disconnected plots, however, the fractals are grouped together in intricate floral patterns, revealing the same shape and pattern with each level of magnification.

Magnified sections of fractals for different values of $c$

Prior to computer technology, Julia had to rely on his imagination and manually carry out the iterations by hand. Fifty years later, another mathematician applied modern computing power to plot these pretty pictures, finally showing the sets in all their beauty…

The Mandelbrot set

The Mandelbrot set is named after the Polish-born mathematician Benoit B Mandelbrot, known for being the founder of fractal geometry. The word fractal is derived from the Latin fractus, which means broken, and describes the shape of a stone after it has been smashed.

Mandelbrot discovered that fractals appear not only in mathematics but also in nature, through crystal formation, the growth of plants and landscapes, as well as in the structure of the human body. In 1945, Mandelbrot read Julia’s 1918 book. He was fascinated and, with the aid of computer graphics, was able to show that Julia’s work contained some of the most beautiful fractals known today.

To create the Mandelbrot set, each complex value of $c$ is used as the constant term in the quadratic function $f(z)=z^2+c$ and iterated with the critical point $z_0=0$ as the seed. If the orbit escapes to infinity, the number of iterations taken for the modulus of the function to exceed a specified value is used to decide on the colour of the map at that point, $c$. Otherwise, when the orbit converges, the point is coloured black. The Mandelbrot set is the set of black points.

For example, if we let $c=-0.15+0.3 \, \mathrm{i}$ then we have the complex quadratic function $f(z)=z^2-0.15+0.3 \, \mathrm{i}$. We start with $z_0=0$ as the seed and the sequence of iterations (to 5 significant figures) is as follows:

$$\begin{aligned}
z_1&={0}^2 -0.15 +0.3 \, \mathrm{i} &&\Rightarrow &z_1&= -0.15 +0.3 \, \mathrm{i},\\
z_2&=(-0.15+0.3 \, \mathrm{i})^2-0.15+0.3 \, \mathrm{i} &&\Rightarrow &z_2&= -0.2175 +0.21 \, \mathrm{i},\\
&&&&z_3&=-0.14679+0.20865 \, \mathrm{i},\\
&&&&z_4&=-0.17199+0.23874 \, \mathrm{i},\\
&&&&z_5&=-0.17742+0.21788 \, \mathrm{i}.
\end{aligned}$$

Continuing to 30 iterations, the orbit has not escaped to infinity and instead converges to the point $z=-0.17082+0.22361\, \mathrm{i}$ (again to 5 significant figures). Therefore, $c=-0.15+0.3 \, \mathrm{i}$ is within the Mandelbrot set and is coloured black.

On the other hand, if we take $c=-1.85+1.2 \, \mathrm{i}$, and hence the complex quadratic function $f(z)=z^2-1.85+1.2 \, \mathrm{i}$, then the sequence of iterations (to 5 sf) is as follows:

$$\begin{aligned}
z_1&={0}^2 -1.85 +1.2 \, \mathrm{i} &&\Rightarrow &z_1&= -1.85 +1.2 \, \mathrm{i},\\
z_2&=(-1.85 +1.2 \, \mathrm{i})^2-1.85 +1.2 \, \mathrm{i} &&\Rightarrow &z_2&= 0.1325 -3.24 \, \mathrm{i},\\
&&&&z_3&= -12.33+0.3414 \, \mathrm{i},\\
&&&&z_4&= 150.06 - 7.2189 \, \mathrm{i},\\
&&&&z_5&= 22465 - 2165.4 \, \mathrm{i}.
\end{aligned}$$

The characteristic segments of the Mandelbrot set

If the modulus of $z$ exceeds 100, then it has been proven that the orbit escapes to infinity. This occurs on the fourth iteration, so the colour chosen to represent the value of 4 would be plotted at the point $(-1.85,1.2)$ in the complex plane. The resulting image is shown in map 8, and also in the picture to the left.
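The escape-time colouring described above is a few lines of Python. This sketch uses the article’s threshold of 100 for the modulus; the iteration cap of 30 matches the worked example and is otherwise an arbitrary choice.

```python
def mandelbrot_colour(c, max_iter=30, threshold=100.0):
    """Iterate z -> z**2 + c from the critical seed z0 = 0. Returns the
    iteration number at which |z| first exceeds the threshold (used to pick
    a colour), or None if the orbit stays bounded (coloured black)."""
    z = 0
    for n in range(1, max_iter + 1):
        z = z*z + c
        if abs(z) > threshold:
            return n
    return None
```

Running this on the two worked examples reproduces them: `mandelbrot_colour(-0.15 + 0.3j)` returns `None` (inside the set, coloured black), while `mandelbrot_colour(-1.85 + 1.2j)` returns `4`, the fourth iteration, which determines the colour plotted at $(-1.85, 1.2)$.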

The largest segment of the set is called the cardioid due to its heart-like shape. Attached to this are adornments called bulbs, upon closer inspection of which it is possible to see many smaller, somewhat similar, embellishments. The bulbs are not completely identical, although most exhibit a similar shape, and the main differences can be seen in their filaments. The filaments are the thin strings of bounded points that sprout like sticks from the tops of the bulbs. These sticks are extremely narrow and they appear to be coloured red, which would indicate they are not part of the set. However, if we were to zoom in closer on these regions, we would actually see black lines!

The self-similarity of the Mandelbrot set

The Mandelbrot set is self-similar, consisting of miniature Mandelbrot sets within the boundary of the largest set. By enhancing the filaments, smaller copies of the overall set appear in ‘Russian-doll’ like fashion, as seen in maps 26–30 above. Closer inspection of map 27 shows many more self-similar sets within the filaments around the perimeter of the Mandelbrot set. Magnifying the small copies of these Mandelbrot sets would yield infinite layers of self-similar sets.

Other fascinating and intricate shapes occur, for example the “seahorse valley” that is visible in maps 31–34 above. By enhancing the plot within this region we see two rows of seahorse shaped embellishments, each with “eyes” and “tails”. Further magnification of the “eyes” reveals spiral constellations of more “seahorses”.

Connection between Julia sets and the Mandelbrot set

The orbit of the critical point $z_0=0$ can be used to test the connectivity of the Julia set, and the Mandelbrot set shows the boundedness of these critical orbits. Hence, the Mandelbrot set itself indicates the connectivity of the Julia sets of all the different complex quadratics. The Mandelbrot set can be described as $M = \{ c \in \mathbb{C} \, | \, J_c \, \mbox{is connected}\} $, where $J_c$ is the Julia set of the function $z\hspace{0.3mm}^2+c$. The Julia set is a connected structure if $c$ is within the Mandelbrot set, and will be broken into an infinite number of pieces if $c$ lies outside the Mandelbrot set.

The cardioid-shaped main body contains all values of $c$ for which the Julia set is roughly a deformed circle (figure below: maps 35, 37, 38 and 40). The values of $c$ which lie in a bulb of the Mandelbrot set produce a Julia set consisting of multiple deformed circles surrounding the points of a periodic attractor. The number of subsections sprouting from a point on the Julia set is equal to the period of the bulb in the Mandelbrot set (below; maps 36, 39, 41–44).

A specific Julia set can be defined by a point in the Mandelbrot set

The nature of the convergence of points within the Mandelbrot set depends on the segment in which the point resides. Seeds within the cardioid converge to an attracting point, whereas orbits starting in a bulb lead to an attracting cycle.

Three particularly interesting cases of Julia sets are shown below. The first is when $c=0$, where the filled-in Julia set consists of all the values within the unit circle (the circle of radius 1, centred on the origin) and each of these points converges to $0$ when iterated. The Julia set is the boundary of the circle, the points of which, when iterated, remain on the boundary.

Three remarkable examples of the Julia set with $c=0$, $c=\mathrm{i}$ and $c=-2$

The second interesting case is when $c=\mathrm{i}$. Here, the Julia set is a dendrite, meaning there are no interior points. Instead, the set is just a branch of points. For this complex constant the dendrite is a single line in an almost lightning-bolt shape. The final case is $c=-2$, where the Julia set is a dendrite that lies directly on the real axis between $-2$ and $2$.
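The three orbits are easy to trace, by hand or by machine. Here is a small sketch of the iteration $z \mapsto z^2+c$ applied to the three constants above (the helper name is mine):

```python
def orbit(z0: complex, c: complex, steps: int) -> list:
    """Return the first `steps` iterates of z -> z^2 + c, starting at z0."""
    zs = [z0]
    for _ in range(steps):
        zs.append(zs[-1] ** 2 + c)
    return zs

# c = 0: a seed inside the unit circle is squared repeatedly, so it
# rushes towards 0; a seed on the boundary stays on the boundary.
inside = orbit(0.5 + 0j, 0j, 10)

# c = i: the critical orbit 0, i, -1+i, -i, -1+i, ... becomes periodic,
# so it is bounded and the dendrite is connected.
lightning = orbit(0j, 1j, 6)

# c = -2: the critical orbit 0, -2, 2, 2, ... is trapped on the
# real interval between -2 and 2.
segment = orbit(0j, -2 + 0j, 6)
```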

Explore the sets yourself

I hope to have conveyed the beauty behind these pictures by emphasising the extraordinary quantity of information contained in such a simple procedure, and by highlighting the complexity of each image, with its variety of fractals and colours.

If this article has sparked an interest in fractals, then why not try exploring these sets for yourself? You could do this by magnifying different sections of the Mandelbrot set to explore the countless shapes and patterns that exist within. You could also go deeper into exploring individual orbits.

All of these pictures are generated from a single simple quadratic map. However, Julia and Mandelbrot sets can be produced for a wide variety of functions in a similar manner, to obtain countless pretty pictures.

These images have been taken for granted in the years since they were first produced on the big, bulky computers of the 1980s, yet many of the discoveries discussed here were only recorded recently. The Julia sets of the quadratic function, and the corresponding Mandelbrot set, could inspire pretty pictures which are yet to be fully explored, or even discovered, and there could still be vast amounts of information within these sets waiting to be found. Could you be the one to make a discovery?




Pretend numbers

Some years ago when one of my nieces was quite small, she loved to play a guessing game with the family. She would think of a number (for us that means a whole positive number) and then we would ask her questions with yes/no answers, trying to guess what her number was. For example, we could ask ‘is it even?’ or ‘is it bigger than 100?’, etc.

Now this game had been going on for some time and I was starting to wonder whether she actually had a number or was just pretending (this was by far my cheekiest niece). Pretending to have a number is not too hard when the questions are coming at you one at a time. You just need to make sure you don’t box yourself in. For example, if one of us asked ‘is it bigger than 100?’, then she would have to reply ‘yes’. Otherwise we could just go through all the numbers from 1 to 100 asking ‘is it 1?’, ‘is it 2?’, etc, and when she answered ‘no’ to all of them she would have been caught out.  My niece could definitely pull it off.

Pretend numbers

A much deeper question is whether she could have decided in advance the answer to every possible yes/no question, without actually having a number.  In order for her to do that her  answers would have to be consistent.  That is, for any possible finite sequence of questions that we might ask, her answers could not contradict themselves.  We will call such a consistent set of answers a pretend number (the technical term is a non-principal ultrafilter).

So do pretend numbers even exist? The answer turns out to be more philosophical than mathematical. A pretend number requires an infinite set of answers to every possible yes/no question. Thus their existence is tied up with the axiom of choice. The axiom of choice is traditionally explained in terms of pairs of identical socks. If I had $3$ such pairs and I wanted to pick one sock from each pair, then I would have $2  \times 2\times2 =8$ ways of doing so.  If I had $5$ pairs of socks then I would have $32$ ways of picking one from each pair. The more pairs of socks I have, the more ways of choosing one from each pair. So it seems reasonable that if I had infinitely many pairs there would certainly be at least one way of choosing one sock from each pair.

However, this would require infinitely many choices to be made and it is not clear that this is possible.  Note that the socks are identical, so I cannot for example just pick the larger one from each pair and explain all my choices in one sentence.  The axiom of choice says that infinitely many choices can be made. It cannot be proven or disproven mathematically, so we are free to believe it or not.  Whilst the sock example may make it sound reasonable, it has some pretty unintuitive consequences if you believe it.  For example, the freedom to make infinitely many choices would allow you to cut a solid object into such fuzzy and strange pieces, that you could put them back together and have twice the volume that you started with!

Moreover, if you believe that it is possible to make infinitely many choices then pretend numbers do exist.  In fact, for any consistent set of answers to some yes/no questions there is a pretend number with those properties.

So there is a pretend number $U$ with leading digit 2, such that for any number $n$, if I ask, as shown above, ‘Does $n+U$ have leading digit 2?’, the answer is ‘yes’.

Note that no actual number has this property!  If you had selected an actual number, say $2341$, then it would fail, as $1000+2341$ does not have leading digit 2.  However these answers are consistent: for any finite set of values of $n$ that I could ask about, there is always an actual number $m$ that $U$ could be, that is, one for which every $n+m$ has leading digit 2.  For example, if I say ‘yes, $1000+U$ has leading digit 2, and so does $123456+U$ and $7101202+U$’, then I haven’t contradicted myself, as $U$ could be 20,000,000.
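That these finitely many answers really are consistent is something we can check mechanically. A small sketch (the helper name is mine, not standard notation):

```python
def leading_digit(n: int) -> int:
    """First decimal digit of a positive whole number."""
    return int(str(n)[0])

# The actual number 2341 is caught out by the question about n = 1000...
assert leading_digit(1000 + 2341) == 3  # not 2!

# ...but m = 20,000,000 answers 'yes' to all three questions asked so
# far, so the answers given on behalf of U do not contradict each other.
for n in [1000, 123456, 7101202]:
    assert leading_digit(n + 20_000_000) == 2
```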

On the other hand, if we do not believe in the axiom of choice, then there need be no pretend numbers at all.  The surprising and somewhat controversial thing is that both answers are mathematically correct.  As my undergraduate lecturer put it, if you believe in a mind that can make infinitely many choices then you are more likely to believe in the axiom of choice, whereas if you only believe in the concrete and tangible, you may be more inclined not to.

My niece never told us what her number was.  Did she have a number?  Was she tricking us?  Had she found a way around the philosophical ambiguity of the axiom of choice and come up with a pretend number?  We will never know.

Properties of pretend numbers

Despite the philosophical and even religious ambiguity that surrounds their existence, one can perform surprisingly concrete operations on pretend numbers, such as arithmetic.  If I want to add an actual number $n$ to a pretend number $U$ then I get a pretend number $n+U$.  But what is $n+U$?  Recall that a pretend number is just a collection of consistent answers to every yes/no question about it, so to say what $n+U$ is I just need to be able to answer yes/no questions about it.  But a question about $n+U$ is really just a question about $U$, and we know all the answers to yes/no questions about $U$, since $U$ is a pretend number.  For example, if you ask ‘Is $3+U$ a square number?’, then the answer is just the same as the answer to the question ‘Is $U$ three less than a square?’.

Some examples of numbers presented in a very colourful way (Mark Morgan, CC-BY 2.0)

Now for the mind-bending part.  How do I add two pretend numbers? What is $U+V$?  Again I just need to be able to answer yes/no questions about $U+V$. Consider a yes/no question: ‘Does the number have property $A$?’ (eg $A$ could be the property of being prime or being a square).  For any actual number $u$, we have just seen how to answer for $u+V$.  So we can ask ‘Is $U$ one of the actual numbers $u$, such that $u+V$ has property $A$?’  We know the answer, as by definition we know the answer to all yes/no questions about $U$.

So we say that $U+V$ has property $A$ precisely when $U$ has the property of actual numbers $u$ that $u+V$ has property $A$.

This takes a bit of getting your head around, but is not as bad as it seems. Let us look at an example. Suppose Umar is playing the guessing game and is thinking of the number $2$, and Vicki is playing the game and thinking of the number $3$.  Then if we add their numbers via the process just described we would hope to get $5$ (so our notion of addition agrees with addition of actual numbers).  Let’s see if that is indeed the case:

The sum has a property $A$ precisely when Umar’s number has the property of numbers $u$ that $u+V$ has property $A$ (where $V$ denotes Vicki’s number).  As Umar is thinking of the number $2$, he would answer yes precisely if $2+V$ has property $A$.  This is the case precisely when $V$ has the property of actual numbers $v$ that $2+v$ has property $A$.  As Vicki is thinking of $3$, she would answer yes precisely if $2+3$ has property $A$.

In summary, the sum of Umar and Vicki’s numbers has a property precisely when $2+3=5$ has that property.  In particular, if you ask if the sum is $5$, then the answer is yes.  Thus the sum of Umar and Vicki’s numbers is indeed $5$ as we had hoped.
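One way to check that this recipe is sound is to model it directly: treat a number as nothing but its answers, ie as a function that takes a property (a predicate) and answers yes or no. The names below (`principal`, `add`) are my own labels, and only actual numbers are modelled, so this is a sanity check of the definition rather than a construction of pretend numbers:

```python
def principal(n: int):
    """The 'answers' of an actual number n: a function that takes a
    yes/no question (a predicate on numbers) and answers it for n."""
    return lambda prop: prop(n)

def add(U, V):
    """U + V has property A precisely when U has the property of
    actual numbers u that u + V has property A."""
    return lambda prop: U(lambda u: V(lambda v: prop(u + v)))

umar = principal(2)   # Umar is thinking of 2
vicki = principal(3)  # Vicki is thinking of 3
total = add(umar, vicki)

# 'Is the sum equal to 5?' -- yes.  'Is the sum even?' -- no.
assert total(lambda x: x == 5)
assert not total(lambda x: x % 2 == 0)
```

The definition unwinds exactly as in the Umar and Vicki example: the question about the sum is passed inward until it becomes a question about $2+3$.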

Addition of pretend numbers behaves quite sensibly in many ways.  For pretend numbers $U, V, W$ we have $$(U+V)+W= U+(V+W)$$ as you would expect.  Also for an actual number $n$ we have $n+U=U+n$.  The big shock is that there exist pretend numbers $U$ and $V$ with $$U+V \neq V+U.$$
For example, we have already seen that we can have a pretend number $U$ with leading digit 2, such that for any actual number $n$ the leading digit of $n+U$ is still 2.  Similarly we can have a pretend number $V$ with leading digit 5, such that for any actual number $n$ the leading digit of $n+V$ is still 5. So what is the leading digit of $U+V$?

Once we accept the axiom of choice we are able to assume the existence of pretend numbers…

For any actual number $u$ the leading digit of $u+V$ is 5, so if you ask if $U$ has this property, I would have to say ‘yes’, or you would have caught me out as not having an actual number.  So the leading digit of $U+V$ is 5.  But by the same logic the leading digit of $V+U$ is 2, so they cannot be the same!

Once we accept the axiom of choice we are able to assume the existence of pretend numbers with desired properties and do calculations with them as above.  However we should bear in mind that we can never actually specify a pretend number.  We cannot specify consistent answers to all the infinitely many yes/no questions one might ask about it.  To actually produce a pretend number would contradict the fact that it is not mathematically wrong to say that there are no pretend numbers.


Given this intangibility, it is natural to assume that pretend numbers are an intellectual curiosity, of no relevance to anything else. In fact nothing could be further from the truth.  They play a prominent role in many fields, such as mathematical logic, non-standard analysis, point set topology, topological dynamics, geometric group theory, combinatorics, field extensions and products of algebras.

One especially amazing application is in the field of Ramsey theory.  Ramsey theory considers the painting of structures (such as the connections in a network, or whole numbers) with different colours. In particular, it determines which structures are guaranteed to be found in a part painted a single colour.  The usual idea is that if so little of the structure is painted one colour that what you are looking for cannot be found in that colour, then enough must be painted in the remaining colours that what you are looking for can be found in one of them.  A simple example is that if you paint $11$ balls each either red or green, then $6$ of them must be the same colour.  You do not know if there are $6$ green balls or $6$ red balls, but if you don’t have one then you must have the other.

In order to appreciate the application of pretend numbers to Ramsey theory, I invite you to solve the following puzzle before reading on:

In the box below the numbers from 1 to 30 have been coloured with different colours.  Can you find three distinct numbers that are all the same colour and all the same colour as any sum of some of them?

Hopefully that wasn’t too hard.  Now suppose that the numbers from 1 to 1,000,000 are coloured red, green, blue and yellow.  Could you guarantee that no matter how they are coloured, you can still find three distinct numbers, all the same colour and whose sums are all the same colour?  It is much harder when you do not know how the numbers are coloured, as you must account for every possible colouring.
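For a small range, a computer can simply try every candidate. The sketch below searches the numbers 1 to 30 for such a triple; since the magazine’s coloured box is not reproduced here, the usage example tests a stand-in colouring by parity (the function name and the colouring are mine):

```python
from itertools import combinations

def monochromatic_triple(colour):
    """Find distinct a < b < c in 1..30, all the same colour, such that
    every sum of some of them (a+b, a+c, b+c, a+b+c) has that colour
    too.  `colour` maps each of 1..30 to a colour; sums must stay
    within 1..30 for the question to make sense.  Returns None if no
    such triple exists for this colouring."""
    for a, b, c in combinations(range(1, 31), 3):
        sums = [a + b, a + c, b + c, a + b + c]
        if max(sums) > 30:
            continue  # a sum falls outside the coloured range
        if len({colour[x] for x in [a, b, c] + sums}) == 1:
            return a, b, c
    return None

# Stand-in colouring: odds red, evens blue.  Any all-odd triple fails
# (its pairwise sums are even), but an all-even triple such as 2, 4, 6
# works, since 6, 8, 10 and 12 are all blue as well.
parity = {n: 'red' if n % 2 else 'blue' for n in range(1, 31)}
```

For this colouring the search returns the triple $(2, 4, 6)$, the first all-even triple it reaches.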

University of California, Berkeley (Charlie Nguyen, CC-BY 2.0)

Now suppose that all the numbers, $1,2,3,\ldots,$ are coloured with a finite number of colours.  Instead of finding just three numbers, could you guarantee that there is an infinite set of numbers, so that if you add any finite combination of them together, the result is always the same colour?

In 1970 this was an open problem that combinatorialists were seeking a solution to.  Around this time a young mathematician, Steven Glazer of Berkeley, thought of a short and elegant proof that however all the numbers are coloured, such an infinite set can always be found.  However his proof would only work if there were a pretend number $U$ with $U+U=U$.

Not knowing if a pretend number with the required property existed, Steven Glazer did not publish his work.  In 1974 Neil Hindman finally proved that such an infinite set can always be found, via a much more complicated proof.  The result now bears his name: Hindman’s Theorem.

A year later Steven Glazer mentioned his proof to another mathematician, Fred Galvin.  Now, Fred Galvin knew about compact semigroups.  These structures arose in an area of mathematics known as dynamical systems, which grew out of the study of how physical systems evolve over time, far away from the world of colouring in whole numbers, one might think.

Don’t worry if you don’t know what a non-empty compact semicontinuous semigroup is.  What is important is that the pretend numbers are one (assuming the axiom of choice) and Fred knew it.  Further, he knew that in any such semigroup, one can solve $U+U=U$.

Note no actual number has this property, as when you add an actual number to itself it gets bigger.  However, using mathematics from this very different field, one can deduce that there is a pretend number with this property. Thus Steven Glazer’s proof was valid.

Glazer’s proof

Suppose all the positive whole numbers have been coloured with some finite set of colours.  Let $U$ be a pretend number with the property that $U+U=U$.  We can ask if $U$ is red.  If not we can ask if it is green.  Proceeding through all the colours, the answer must at some point be yes (or the answers would not be consistent and the pretend number $U$ would be revealed as fake).

Suppose then that $U$ is blue.  As $U+U=U$, it is not saying anything extra to add that $U+U$ is blue.  Thus $U$ satisfies the property of actual numbers $u$ that ‘$u$ is blue and $u+U$ is blue’.  Thus there must actually be a number $x_1$ with this property (or $U$ is revealed as fake).
Now we know the following are blue:

$U$, $U+U$, $x_1$, $x_1+U$, $x_1+U+U$.

Rearranging we have the following all blue:

$U$, $U+U$, $x_1$, $U+x_1$, $U+ x_1+U$.

Let $FS$ of a sequence be the set of sums of distinct terms.  So we may make the above statement more succinct by saying that $FS(x_1,U,U)$ are all blue. So $U$ has the property of actual numbers $u$ that the following are all blue:

$u$, $u+U$, $x_1$, $u+x_1$, $u+x_1+U$.

Again there must actually be such a number $x_2$, or $U$ would be revealed as fake. Thus  $FS(x_1,x_2,U)$ are blue. We proceed in this way by induction.

Suppose we have established that $FS(x_1,\ldots, x_n,U)$ are all blue. As $U=U+U$, we know that  $FS(x_1,\ldots, x_n,U,U)$ are all blue.  Thus there must actually be a number $x_{n+1}$ so that  $$FS(x_1,\ldots, x_n,x_{n+1}) \qquad  {\rm and} \qquad  FS(x_1,\ldots, x_n,x_{n+1}+U)$$ are all blue.  We already knew that  $FS(x_1,\ldots, x_n,U)$ are all blue, so combining all three sets we get that  $FS(x_1,\ldots, x_n,x_{n+1},U)$ are all blue.

Thus we obtain an infinite sequence $x_1,x_2,x_3,\ldots$, whose sums of distinct terms are all blue.  Either we have a subsequence of infinitely many distinct $x_i$ or some value $x$ gets repeated infinitely often.  In the latter case all finite sums of distinct terms of the sequence $x,2x,3x,4x,\ldots$ are blue.

Either way we have infinitely many distinct numbers, with any sum of distinct numbers being blue and the proof is complete.
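The finite-sums set $FS$ used throughout the proof is straightforward to compute for an actual finite sequence; a minimal sketch:

```python
from itertools import combinations

def FS(xs):
    """FS(xs): every sum of a non-empty sub-collection of distinct
    terms of the finite sequence xs."""
    return {sum(combo)
            for r in range(1, len(xs) + 1)
            for combo in combinations(xs, r)}

# FS(1, 2, 4) is {1, 2, 3, 4, 5, 6, 7}: each non-empty subset of
# {1, 2, 4} picks out a distinct sum, by binary representation.
```

Hindman’s theorem says that however the whole numbers are finitely coloured, there is an infinite sequence whose entire $FS$ set, not just a finite portion like this, is monochromatic.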

And so…

Steven Glazer found out that his solution worked, one year after the problem had been solved by someone else.  Nonetheless his proof is much celebrated by mathematicians for its shortness, its elegance, and how it demonstrates the importance of mathematicians sharing ideas from different fields.  In the words of George Bernard Shaw: “If you have an apple and I have an apple and we exchange apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”