Surname extinction

Why do surnames die out? We take a look at the Galton-Watson process for modelling the extinction of surnames to answer the question: ‘When will we all be Smiths?’

Ever wondered if it’s possible for your surname to die out? Well, unfortunately, it is. In fact, since 1901 about 200,000 surnames have disappeared from England and Wales.

In 1873, Sir Francis Galton, a statistician, was working on the problem of how to predict surname extinction and submitted problem 4001 to the Educational Times:

Francis Galton. Image: public domain.

“A large nation of whom we will only concern ourselves with the adult males, $N$ in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, $a_0$ percent of the adult males have no male children who reach adult life; $a_1$ have one such male child; $a_2$ have two; and so on, up to $a_5$ who have five. Find (1) what proportion of the surnames will have become extinct after $r$ generations; and (2) how many instances there will be of the same surname being held by $m$ persons.”

He noted that a general solution was preferable as ‘he finds it a laborious matter to work it out numerically’. The Reverend Henry Watson replied to him in the next issue of the Educational Times with a solution and together in 1875 they published their paper ‘On the probability of the extinction of families’.


Henry Watson.
Image: public domain.

So what was Watson’s solution? Well, it formed the basis for the Galton-Watson process, a stochastic branching process that is used to model the transmission of Y-chromosome genes and, of course, surnames.

The Galton-Watson process

Note first that we are assuming a purely patriarchal society, so the surname can only be transferred through the male line (although the model is the same for matriarchal societies); as a by-product, female children are effectively ignored. Galton’s suggested set-up assumes that the generations are discrete and separate from one another, and that the maximum number of surviving male children each man may have is 5.

Let’s generalise this a bit more. Choose some number of sons, say $q$, such that it is highly unlikely that any man will have more than $q$ sons who survive to adulthood. Then $a_i$ is so small for every $i>q$ that it is effectively zero, and we don’t need to consider the possibility of there being more than $q$ surviving sons.

Now let’s address the question of distinct generations. Clearly the generations must overlap. It’s perfectly possible for a father to have a son and grandson in the same generation, so how do we tackle this? We may define $$t_i=\frac{a_i}{100}$$ to be the chance of any individual man having $i$ surviving sons in any generation. Because we are effectively saying a man can’t have more than $q$ surviving sons, we must have that $$\sum_{i=0}^{q} t_i = 1.$$

So if there are $p$ men with the surname Skywalker, consider the following polynomial: $$(t_0+t_1 x+t_2 x^2+\dots+t_q x^q)^p$$

If $p=1$ then clearly the chance of there being $n$ Skywalkers in the next generation is $t_n$. For general $p$, each man has sons independently of the others, so the chance of there being $n$ Skywalkers in the next generation is the coefficient of $x^n$ in the polynomial above. For neatness, let $$T=t_0+t_1 x+t_2 x^2+\dots+t_q x^q.$$
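As a quick check of the coefficient argument, here is a minimal sketch that expands the polynomial numerically. It assumes NumPy is available; the function name `next_generation_distribution` and the choice of equal weights $t_i=\frac{1}{6}$ are illustrative and not part of Watson’s solution.

```python
import numpy as np

def next_generation_distribution(t, p):
    """Coefficients of (t_0 + t_1 x + ... + t_q x^q)^p, lowest power first.

    Entry n is the chance that p men leave exactly n surviving sons in total.
    """
    coeffs = np.array([1.0])
    for _ in range(p):
        # Multiplying two polynomials is a convolution of their coefficients.
        coeffs = np.convolve(coeffs, t)
    return coeffs

# Illustrative assumption: 0 to 5 sons, all equally likely.
t = [1 / 6] * 6
dist = next_generation_distribution(t, p=2)

print(dist[0])     # chance that two Skywalkers leave no sons between them
print(dist.sum())  # the coefficients form a probability distribution
```

With two men the polynomial has degree 10, so `dist` has 11 entries, one for each possible number of Skywalkers in the next generation.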

But we’re not just interested in the number of individuals with a particular surname, we also want to know the likelihood of a surname dying out. Define $m_{rs}$ to be the fraction of $N$ surnames that have $s$ representatives in the $r$th generation. So the number of surnames with $s$ representatives in generation $r$ is $m_{rs}N$. Hence the number of surnames with $n$ representatives in generation $r+1$ will be the coefficient of $x^n$ in the polynomial $$(m_{r0}+m_{r1}T+m_{r2}T^2+\dots+m_{rk}T^k)N,$$ where $k=q^r$ is the largest number of representatives a surname can have in generation $r$.

We may define a sequence of functions $$ f_r(x)=f_{r-1}(t_0+t_1 x+t_2 x^2+\dots+t_q x^q) $$

with $$ f_1(x)=t_0+t_1 x+t_2 x^2+\dots+t_q x^q. $$

So the fraction of surnames with $s$ representatives in the $r$th generation is the coefficient of $x^s$ in $f_r(x)$. The total number can be found by multiplying this number by $N$.

So $N f_r(0)$ surnames have become extinct by the $r$th generation, and there will be $\frac{N}{s!}\frac{d^s f_r}{dx^s}(0)$ surnames with $s$ representatives in the $r$th generation.
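The repeated composition can be carried out directly on polynomial coefficients. The sketch below is one way to do it, assuming NumPy; the helper names `compose` and `generation_coefficients` are made up for illustration, and the equal weights are just an example. The coefficient of $x^s$ in $f_r$ is read straight off the array, so no derivatives are needed.

```python
import numpy as np

def compose(outer, inner):
    """Coefficients of outer(inner(x)) via Horner's rule, lowest power first."""
    result = np.array([outer[-1]], dtype=float)
    for c in reversed(outer[:-1]):
        result = np.convolve(result, inner)  # multiply by inner(x)
        result[0] += c                       # add the next coefficient
    return result

def generation_coefficients(t, r):
    """Coefficients of f_r, where f_1 = T and f_r(x) = f_{r-1}(T(x))."""
    t = np.asarray(t, dtype=float)
    f = t.copy()
    for _ in range(r - 1):
        f = compose(f, t)
    return f

# Illustrative assumption: N = 10 surnames, 0 to 5 sons equally likely.
N = 10
t = [1 / 6] * 6
f2 = generation_coefficients(t, r=2)

print(N * f2[0])  # expected number of surnames extinct by generation 2
print(N * f2[3])  # expected number of surnames with 3 representatives
```

The degree of $f_r$ is $q^r$, so the coefficient array grows quickly with $r$; this approach is only practical for a handful of generations.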

Using the Galton-Watson Process

Let’s consider a couple of examples. Take a population with $10$ surnames and let $q=5$, as in Galton’s original problem. First consider the case where a man is equally likely to have anywhere between 0 and 5 surviving sons, so $$t_0=t_1=t_2=t_3=t_4=t_5=\frac{1}{6}.$$

We may use the Galton-Watson process to set up an iterative procedure. Let $$f(x)=\frac{1}{6}\sum_{i=0}^{5} x^i,$$ so the first generation will have $10-10f(0)$ surnames, the second generation will have $10-10f(f(0))$, and so on. It is fairly simple to use Mathematica to find the number of surnames left after 10 generations.
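The same iteration is easy to reproduce outside Mathematica. Here is a minimal Python sketch of the procedure just described; the function name `surviving_surnames` is hypothetical, and the numbers are expected values, so they need not be whole surnames.

```python
def surviving_surnames(t, n_surnames, generations):
    """Expected number of surnames left after each generation.

    Iterates x -> f(x) from x = 0, where f(x) = sum of t_i * x^i, so that
    after r steps x is the extinct fraction f(f(...f(0)...)).
    """
    def f(x):
        return sum(t_i * x**i for i, t_i in enumerate(t))

    x = 0.0
    survivors = []
    for _ in range(generations):
        x = f(x)
        survivors.append(n_surnames * (1 - x))
    return survivors

uniform = [1 / 6] * 6  # 0 to 5 sons, all equally likely
result = surviving_surnames(uniform, n_surnames=10, generations=10)

for generation, left in enumerate(result, start=1):
    print(generation, round(left, 3))
```

The first entry is $10-10f(0)$, the second is $10-10f(f(0))$, and so on.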

Galton-Watson simulation with $N=10$, $q=5$ and equal likelihood of number of surviving sons.

We see a levelling off after the third generation, with roughly 2 out of 10 surnames lost. But what if the likelihood of having $i$ sons depends on $i$? Consider the case $$t_i=\frac{1}{5+i} \quad (i=1,\dots,5)$$ with $$t_0=1-t_1-t_2-t_3-t_4-t_5.$$ In this case we obtain the following.

Galton-Watson simulation with $N=10$, $q=5$ and varying likelihood of surviving sons.

This time roughly five surnames are lost, again with a plateau forming after the fifth generation.
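The plateau has a neat interpretation that the discussion above does not spell out: in standard branching-process theory the extinct fraction converges to the smallest non-negative solution of $f(x)=x$. The sketch below finds that value for the weights $t_i=\frac{1}{5+i}$ by iterating until the change is negligible. The function name `extinction_probability`, the tolerance and the iteration cap are assumptions made for illustration.

```python
def extinction_probability(t, tol=1e-12, max_iter=10_000):
    """Limit of x -> f(x) starting from 0, with f(x) = sum of t_i * x^i."""
    x = 0.0
    for _ in range(max_iter):
        x_next = sum(t_i * x**i for i, t_i in enumerate(t))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# t_i = 1/(5+i) for i = 1..5, with t_0 taking up the remaining probability.
weights = [1 / (5 + i) for i in range(1, 6)]
t = [1 - sum(weights)] + weights

plateau = extinction_probability(t)
print(plateau)       # limiting fraction of surnames that die out
print(10 * plateau)  # limiting number lost out of 10 surnames
```

The limit comes out just under one half, which matches the roughly five surnames lost in the plot.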

Now, let’s look at a more realistic simulation. Don’t forget that Galton was a statistician, so we’ll try a probability-based approach. Let’s suppose that the number of surviving sons follows a Poisson distribution (chosen as it conveniently doesn’t cover negative values) with a mean of 1 (i.e. the average number of surviving sons is 1). So $$t_i=P(X=i),$$ where $X$ is the number of surviving sons. In this case we obtain the following.

Galton-Watson simulation with $N=10$, $q=5$ and the likelihood of surviving sons dependent on the Poisson distribution.

This time we see less of a plateau occurring and about eight surnames having become extinct by the 10th generation.
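Another way to see the Poisson case is to simulate individual family lines instead of iterating $f$. The sketch below is a Monte Carlo version and rests on several assumptions that differ from the plot: it uses NumPy’s random generator with a fixed seed, it does not truncate the Poisson distribution at $q=5$, and it starts with 10,000 surnames instead of 10 so that the surviving fraction is not swamped by noise. The function name `simulate_surnames` is hypothetical.

```python
import numpy as np

def simulate_surnames(n_surnames, generations, mean_sons=1.0, seed=0):
    """Count surviving surnames per generation in a simulated population.

    Each surname starts with one man. A surname with k men has a total number
    of sons that is Poisson with mean k * mean_sons, because a sum of
    independent Poisson variables is again Poisson.
    """
    rng = np.random.default_rng(seed)
    counts = np.ones(n_surnames, dtype=np.int64)
    surviving = []
    for _ in range(generations):
        counts = rng.poisson(mean_sons * counts)
        surviving.append(int(np.count_nonzero(counts)))
    return surviving

history = simulate_surnames(n_surnames=10_000, generations=10)

for generation, alive in enumerate(history, start=1):
    print(generation, alive / 10_000)  # fraction of surnames still alive
```

Once a surname reaches zero men it stays at zero, so the surviving count can only fall from one generation to the next.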

But let’s face it, so far we’ve just been guessing numbers and trying them. To use the Galton-Watson process we really need to get the statistics on how many sons a man would have in each generation, and then fit an appropriate distribution.

But does surname extinction actually happen?

So in our simulations we saw that some surnames became extinct after 10 generations, depending on how we quantify the likelihood of surviving sons. But how do we know it actually works?

Well, surname extinction is a well-documented phenomenon in China. Here, surnames as we know them were in use by 220 BC (unlike in Britain where surnames came into use in the Middle Ages). Statisticians argue that there were originally around 4000 to 6000 surnames, while today 70% of the Chinese population have one of just 45 different surnames.

So is it likely that our descendants are going to have the surname Smith? Perhaps.

Is it possible that in the future your surname will have become extinct? Definitely.

Eleanor is a postdoctoral researcher at the University of Manchester. A mathematician by training, she works on developing mathematical models to improve our understanding of biological mechanisms in medicine, with particular interests in women’s health and autoimmune conditions. When not doing mathematics, she crochets, sews and reads everything and anything.
