In the days before email, mathematicians relied upon pen, paper and the postman to share ideas and communicate fiendish numerical taunts. An excited Dirichlet wrote to Kronecker in 1858:
… that sum, which I could only describe up to an error of order $\sqrt{x}$ at the time of my last letter, I’ve now managed to home in on significantly.
Dirichlet’s sum is associated to the divisor function $d(n)$: the number of distinct divisors of the natural number $n$. The number six is divisible by 1, 2, 3 and 6, so $d(6)=4$. Four has divisors 1, 2 and 4, so $d(4)=3$.
It might seem odd that a great mathematician was troubled by a quantity whose description amounts to a good grasp of ‘times tables’ – still odder that he was confessing to a distinguished contemporary that the damned thing had caused him grief.
It is not hard (although perhaps time-consuming) to work out the value of $d(n)$ for any given natural number $n$. But what if we want to provide a formula to describe it? Where to begin?
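Before any formulas, it may help to see the brute-force computation spelt out. Below is a minimal Python sketch (the helper name `d` is ours, purely for illustration) that counts divisors by pairing each divisor $i \leqslant \sqrt{n}$ with its partner $n/i$:

```python
def d(n):
    """Count the divisors of the natural number n by trial division."""
    count = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            # i and n // i form a divisor pair; count once if they coincide
            count += 1 if i == n // i else 2
        i += 1
    return count

print(d(6), d(4))  # 4 3, matching the examples above
```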
Mean values
The trouble is, the divisor function is not at all well-behaved. In fact, it jumps significantly in value infinitely often. The adjacent integers $d(200560490130)=2^{11}$ and $d(200560490131) = 2$ provide one such example, but you can happily construct as large a jump as you fancy. Try it.
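For the curious, the jump can be checked with the divisor-counting sketch above: the first number is the product of the first eleven primes, $2\times 3\times 5\times\cdots\times 31$, while its successor happens itself to be prime.

```python
print(d(200560490130))  # 2048 = 2**11: each divisor picks a subset of the 11 primes
print(d(200560490131))  # 2: this neighbour is prime
```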
The divisor function is one of many interesting arithmetic functions; understanding their asymptotic behaviour as $n\to\infty$ is a key question in multiplicative number theory. However, as we saw above, the value of these functions can be erratic.
To try to smooth this wildness, you could instead consider certain types of ‘average’. Dirichlet’s sum was one such average.
He began by accumulating a sum from the divisor function on adjacent naturals:
\begin{equation}
D(x) = d(1) + d(2) + \cdots + d([x]) = \sum\limits_{n \leqslant x} d(n),
\end{equation}
where $[x]$ indicates the largest integer less than or equal to $x$.
Upon dividing the sum $D(x)$ by $x$, we get the mean value of the divisor function up to $x$.
Taking the mean has the effect of ‘smoothing out’ the jumps in the divisor function. But just how well-behaved is this average? Do we get some sort of regular behaviour as $x$ becomes very large?
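As a quick numerical sanity check (a sketch reusing the hypothetical d from earlier), the mean creeps upward slowly and smoothly, in contrast with the jumpy behaviour of $d(n)$ itself:

```python
def D(x):
    """Dirichlet's sum D(x) = d(1) + d(2) + ... + d([x])."""
    return sum(d(n) for n in range(1, int(x) + 1))

for x in (10, 100, 1000, 10000):
    print(x, D(x) / x)  # mean value of d(n) over n <= x: grows gently
```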
Hyperbolæ
Estimating the divisor mean comes more naturally if we take a different perspective on the divisor function. Instead of viewing $d(n)$ as the number of divisors of $n$, we think of it as the number of natural-valued pairs $(a,b)$ such that $n=ab$. The number 6 may be written 1 × 6 = 2 × 3 = 3 × 2 = 6 × 1, giving us the four pairs (1,6), (2,3), (3,2) and (6,1). If we wanted to see the divisors, we could just look at the first number in each pair. For 4 we have the three pairs: (1,4), (2,2) and (4,1); again showing $d(4)=3$.
Why not treat these natural-valued pairs, from here on called lattice points, as coordinates in the plane? The question of counting the number of divisors of $n$ turns into that of counting the lattice points lying on the hyperbola $XY=n$. If $n$ is not a natural number, then the hyperbola contains no lattice points. If $n$ is a natural, then the hyperbola contains exactly $d(n)$ lattice points.
Let us picture the $XY=1$ hyperbola given by setting $n=1$. If we now let $n$ be real-valued and increase it continuously, the hyperbola sweeps through the positive quadrant of the plane. The only time that the curve passes through lattice points is when $n$ hits a natural number.
By the reasoning above $d(1) + d(2) + \cdots + d([x])$ is a count of all of the pairs that the hyperbola sweeps
through as we run $n$ from 1 through to $x$. Dirichlet’s sum $D(x)$ counts the number of lattice points lying below the hyperbola $XY=x$.
Perhaps the simplest way of summing up these lattice points is to group them into vertical towers. For each natural-valued $X$-coordinate $n$ we look vertically, counting the $\lbrack \frac{x}{n}\rbrack$ lattice points lying directly above $X=n$ but below the hyperbola. So
\begin{equation}
D(x)= \sum\limits_{n \leqslant x}\left\lbrack \frac{x}{n}\right\rbrack.
\end{equation}
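This tower count is easy to test against the direct definition; a small sketch, assuming the d and D helpers from earlier:

```python
def D_towers(x):
    """Count lattice points under XY = x in vertical towers: sum of [x/n]."""
    return sum(int(x) // n for n in range(1, int(x) + 1))

# agrees with summing the divisor function directly
assert all(D_towers(x) == D(x) for x in (10, 100, 1000))
```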
Keeping track of errors: big-O notation
Our count above is still exact and depends upon the precise value of the divisor function, which we know to jump suddenly in value. If we are happy to settle for trying to determine the dominant behaviour of $D(x)$, then we can afford to be a little less precise: allowing for some small deviation can make it easier to provide a more useful formula whose value is easy to calculate. This deviation from the precise value of the function is usually called error, and provided that we don’t introduce too much, we can get a good idea of the dominant behaviour.
For instance, say we want to get an idea of the size of the function $f(x) = [x^2]$ for large $x$. It makes sense to resign ourselves to an error of at most 1, taking just the larger and larger $x^2$ term. To keep track of our neglected error we write $f(x) = [x^2] = x^2 + O(1)$. The symbol $O(1)$ means that we are allowing an error that is at most constant with respect to $x$.
If our function were more complicated, we might wish to keep track of an error that varies with respect to $x$, instead of remaining constant. Big-O notation gives us this flexibility: we write $r(x) = O(g(x))$, for some positive function $g(x)$, to mean that there is a constant $K$ and a positive number $x_0$ such that whenever $x \geqslant x_0$ we have that $|r(x)| \leqslant K g(x)$.
Let us lend flavour to this slightly knotty definition. For the longevity of this article and for our younger readership, we define an analogue radio to be an obsolete cuboidal object with a knob, which ingests radio waves and—subject to correct knob position and alignment of the clouds—excretes a melody of questionable taste. One day—not too cloudy—you twizzle the knob and your favourite tune blares out. At least, you think it is your favourite tune… There is some distractingly apparent white noise, rendering it a challenge to make out the precise melody. You try turning up the volume, but this only helps up to a certain point. From some volume onwards, the white noise remains proportionally loud.
We can return to big-O notation by thinking of the $x$ variable in the definition above as the volume of the radio’s excretion, the white noise as the error $r(x)$ from your favourite tune. Then $r(x) = O(g(x))$ is to say that: whenever the volume $x$ is greater than some $x_0$, the radio’s deviation $r(x)$ from your favourite tune is at most proportional to some function $g(x)$.
Now for a more mathematical example: $f(x) = x^7 + 5x + \sin(x)$. If $x$ is large then $x^7$ is very large. If you were to plot a graph of $f(x)$, the $5x$ and $\sin(x)$ terms would seem less and less significant. We could make use of big-O notation to indicate the dominant behaviour: $f(x) = x^7 + O(x)$.
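The definition's $K$ and $x_0$ can be exhibited concretely for this example. A small sketch, with our own choice of witnesses $K=6$ and $x_0=1$ (which works since $|\sin(x)| \leqslant 1 \leqslant x$ from there on):

```python
import math

# the neglected part r(x) = 5x + sin(x) satisfies |r(x)| <= 6x for all x >= 1,
# witnessing f(x) = x**7 + O(x) with K = 6 and x0 = 1
for x in (1.0, 10.0, 100.0, 1000.0):
    r = 5 * x + math.sin(x)
    print(x, abs(r) <= 6 * x)  # True each time
```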
For Dirichlet’s sum we can now write
\begin{equation}
D(x) = \sum\limits_{n \leqslant x}\left(\frac{x}{n} + O(1)\right) = x\sum\limits_{n \leqslant x}\frac{1}{n} + O(x).\label{D1}\tag{1}
\end{equation}
The sum $\sum_{n \leqslant x}\frac{1}{n}$ is (up to a constant) the same as the integral of $\frac{1}{t}$ over the interval $[1,x]$, and this integral is $\log(x)$. The constant in question, namely the difference $\sum_{n \leqslant x}\frac{1}{n} - \log(x)$, approaches the Euler–Mascheroni constant, $\gamma \approx 0.577$, at a rate of $O\left(\frac{1}{x}\right)$. As $x$ gets large this error vanishes, so that the sum $\sum_{n \leqslant x}\frac{1}{n}$ is close in value to $\log(x) + \gamma$. Substituting into \eqref{D1} gives
\begin{equation}
D(x) = x\left(\log x + \gamma\right) + O(x) = x\log x + O(x).
\end{equation}
Phrased differently, the mean value of the divisor function does behave well for large $x$: it grows like $\log(x)$, up to a bounded error:
\begin{equation}
\frac{1}{x} \sum\limits_{n\leqslant x} d(n) = \log x + O(1).
\end{equation}
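Both approximations lend themselves to a quick numerical look. In the rough sketch below (D_towers as before), the first print checks that $\sum_{n\leqslant x}\frac{1}{n} - \log(x)$ settles down towards $\gamma \approx 0.577$, and the loop shows $D(x)/x - \log(x)$ staying bounded:

```python
import math

x = 10**6
print(sum(1 / n for n in range(1, x + 1)) - math.log(x))  # ~0.5772..., gamma

for x in (10**3, 10**4, 10**5):
    # bounded, and in fact settling towards 2*gamma - 1 ~ 0.154
    print(x, D_towers(x) / x - math.log(x))
```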
Exploiting symmetry
This was not good enough for Dirichlet. He wanted to pin down the constant—that number floating around in the $O(1)$ envelope above. With some rather clever counting he exploited a natural symmetry of the hyperbola $XY=x$ so as to make fewer than the $x$ lots of $O(1)$ approximations we made in \eqref{D1}.
If we swap the $X$ and $Y$ axes the hyperbola is left unchanged: there is a reflective symmetry about the line $X=Y$. Counting the lattice points lying below the hyperbola but vertically above the $X$-axis coordinates $1, 2,\ldots, [\sqrt{x}]$ is exactly the same as counting the lattice points lying below the hyperbola but horizontally to the right of the $Y$-axis coordinates $1, 2, \ldots, [\sqrt{x}]$. Combined, these counts cover all of the lattice points under the hyperbola. Except that we have over-counted the $[\sqrt{x}]\times[\sqrt{x}]$ lattice points lying in the square below the hyperbola which is sent to itself under the symmetry. In symbols:
\begin{align*}
D(x) = 2 \sum\limits_{n \leqslant \sqrt{x}} \left\lbrack \frac{x}{n} \right\rbrack - \left\lbrack \sqrt{x} \right\rbrack ^2.
\end{align*}
Arithmetic along the lines of \eqref{D1} transforms this into:
\begin{align*}
D(x) = x \log x + (2\gamma - 1)x + O(\sqrt{x}).
\end{align*}
Dirichlet had pinned down the constant. Upon dividing both sides above by $x$, the $O(\sqrt{x})$ turns into $O(\frac{1}{\sqrt{x}})$, which vanishes as $x$ becomes large. We have shown that the mean value of the divisor function $d(n)$ taken over $n\leqslant x$, for large $x$, is $\log(x) + 2\gamma - 1$.
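In code, the symmetry trick looks like this (a hedged sketch; D_towers as before). The assertion confirms the reflected count, and the loop suggests that the remaining error really is of size about $\sqrt{x}$:

```python
import math

GAMMA = 0.5772156649015329  # the Euler-Mascheroni constant

def D_hyperbola(x):
    """Dirichlet's symmetric count: 2*sum([x/n], n <= sqrt(x)) - [sqrt(x)]^2."""
    s = math.isqrt(int(x))
    return 2 * sum(int(x) // n for n in range(1, s + 1)) - s * s

assert all(D_hyperbola(x) == D_towers(x) for x in (10, 100, 1000, 12345))

for x in (10**4, 10**6, 10**8):
    err = D_hyperbola(x) - x * math.log(x) - (2 * GAMMA - 1) * x
    print(x, err / math.sqrt(x))  # stays bounded, consistent with O(sqrt(x))
```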
Dirichlet’s divisor problem
Well, I feel content. Don’t you? But not Dirichlet. Never Dirichlet. Despite a decent amount of work, a rather sneaky argument and answering all of the questions that he had asked, he wanted to further dissect the $O(\sqrt{x})$. He was convinced that he was being far more imprecise than he needed to be: like approximating $[x^2]$ by $x^2+O(x)$, when in fact we can be sure that it is $x^2 + O(1)$.
Writing
\begin{equation}
\mathit{\Delta}(x) = D(x) - x\log x - (2\gamma - 1)x,
\end{equation}
our efforts so far have shown that $\mathit{\Delta}(x) = O(\sqrt{x})$. In his letter to Kronecker, Dirichlet hinted that he could replace $O(\sqrt{x})$ by something significantly more precise. Nothing in all of Dirichlet’s notes has been found to back up this claim. Perhaps it was simply a taunt aimed at Kronecker? Perhaps he made a mistake. In any case it gave birth to the Dirichlet divisor problem:
\begin{align*}
\text{Determine the smallest } \alpha \text{ such that } \mathit{\Delta}(x) = O\left(x^{\alpha + \varepsilon}\right) \text{ for every } \varepsilon > 0.
\end{align*}
The “ε condition” is a neat way of saying that we are only interested in determining the smallest possible $\alpha$ up to a
fixed power of $\log(x)$. The logarithm grows more slowly than any positive power of $x$. So for instance $x^2(\log(x))^7 = O(x^{2+\varepsilon})$ for every $\varepsilon> 0$.
The other end up
The first fifty years after Dirichlet’s letter saw little improvement. Progress was slow and the going was tough. Together with Littlewood, Hardy decided to take a different tack from the ‘O’ approach. Noting the difficulty of containing the error in a big-O envelope, their idea was to “attack the problem from the other end”, introducing the notion of ‘Ω’ results. They sought to display positive functions $g(x)$, for which there exists a constant $K$ such that the multiple $Kg(x)$ is exceeded by $|\mathit{\Delta}(x)|$ for arbitrarily large values of $x$.
By finding such a $g$ of as large an order of magnitude as they could, they were proving that the divisor sum error $\mathit{\Delta}(x)$ was at best $O(g(x))$. In this way, they could install the lower bound of an interval for possible $\alpha$. The year 1915 saw Hardy prove an ‘Ω’ result showing that $\alpha \geqslant \frac{1}{4}$. When read alongside Dirichlet’s hyperbola result we surmise that $\alpha \in \lbrack \frac{1}{4}, \frac{1}{2}\rbrack$.
Hot oscillations
Since around 1922, the sharp end of research has involved playing with a daunting representation of the error:
\begin{equation}
\mathit{\Delta}(x) = \frac{x^{1/4}}{\mathrm{\pi}\sqrt{2}}\sum\limits_{n=1}^{\infty}\frac{d(n)}{n^{3/4}} \cos \left(4\mathrm{\pi}\sqrt{nx} – \frac{1}{4}\mathrm{\pi} \right).
\end{equation}
Deriving such an expression is no mean feat. As illustration: we can encode the divisor function in the Dirichlet series $\sum_{n \geqslant 1} d(n)n^{-s}$, which turns out to equal the square of the Riemann zeta function. Since this is a meromorphic function, the techniques of complex analysis can be put to work. Mellin transforms, asymptotic formulæ for Bessel functions and contour integration all feature in the derivation of the equation for $\mathit{\Delta}(x)$ above.
Because of the oscillatory nature of the cosine, researchers have used this handy expression to provide both lower bounds and upper bounds for $\alpha$, turning that infamous phrase: “it’s not the size, it’s what you do with it” on its head. Here, it’s what you do with it that indicates its size: ‘Ω’ or ‘O’.
In the form above, the representation is not very useful. Far better is a truncation of the sum at some carefully chosen $n = N(x)$, together with a big-O error for the remainder of the terms. There is an art to deciding the truncation point so as to reveal dominant behaviour of the expression whilst keeping the error in check.
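To get a feel for this, here is a rough numerical sketch; the cut-off $N$ and the sample point $x$ are arbitrary choices of ours, not the carefully chosen truncation of the experts, so expect ballpark agreement rather than perfection:

```python
import math

GAMMA = 0.5772156649015329  # the Euler-Mascheroni constant

def d_sieve(N):
    """Tabulate d(1), ..., d(N) by sieving over divisors."""
    d = [0] * (N + 1)
    for i in range(1, N + 1):
        for j in range(i, N + 1, i):
            d[j] += 1
    return d

def delta_exact(x):
    """Delta(x) = D(x) - x*log(x) - (2*gamma - 1)*x, via the hyperbola identity."""
    s = math.isqrt(int(x))
    D = 2 * sum(int(x) // n for n in range(1, s + 1)) - s * s
    return D - x * math.log(x) - (2 * GAMMA - 1) * x

def delta_series(x, N):
    """The cosine representation of Delta(x), truncated at n = N."""
    d = d_sieve(N)
    pref = x ** 0.25 / (math.pi * math.sqrt(2))
    return pref * sum(d[n] * n ** -0.75 *
                      math.cos(4 * math.pi * math.sqrt(n * x) - math.pi / 4)
                      for n in range(1, N + 1))

x = 10**5 + 0.5  # a non-integer x keeps us away from the jumps at integers
print(delta_exact(x), delta_series(x, 5000))  # roughly comparable in size and sign
```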
Closing in on α
Where does the problem stand? Thanks to Hardy we have a lower bound $\alpha \geqslant \frac{1}{4}$. In fact, from computational evidence and heuristic arguments this is expected to be exactly on the money—we believe that $\alpha = \frac{1}{4}$. If true, the first person to show an upper bound that is also $\frac{1}{4}$ will have solved Dirichlet’s divisor problem.
However, 150 years of dedicated work on the upper bound can be read in the decelerating sequence of improvements: $\frac{1}{2}$, $\frac{1}{3}$, $\frac{33}{100}$, $\frac{27}{82}$, $\frac{15}{46}$, $\frac{12}{37}$, $\frac{346}{1067}$, $\frac{35}{108}$, $\frac{7}{22}$. The current best has stood since 2003, when a contribution by Martin Huxley showed that $\alpha \leqslant \frac{131}{416} \approx 0.3149$.
We have confined $\alpha$ to the interval [0.25, 0.3149]. At the current rate of progress, we are still a long way from squeezing this interval down to a single number, thus pinning down the precise value of $\alpha$. Well over a century on, Dirichlet’s challenge is still to be met.