# Normal numbers

Imagine if we sample numbers from the unit interval and count the occurrences of each digit in every number’s infinite decimal representation. For instance, in the number , every non-zero digit appears exactly once, but appears infinitely often (trailing zeros). On the other hand, for the number , the digit occurrences (after the “dot”) is equal for every digit. As the number of samples increases, one may expect that the number of occurrences for each digit in to be close to one another. In fact, the Normal Number Theorem, first formulated by Borel, not only formalizes this intuition but goes a step further: for almost every number in , each digit appears equally likely. Below we discuss this theorem in detail.

We first recall a few basic facts about the representation of a number. Rather than working entirely in base , we work with an arbitrary base .

Let with . We say that has a base- representation if there is some such that the series converges to , with .

In other words, is the integer component of and the fractional component. With , our familiar decimal representation, we know that every real number has a decimal representation, and by identifying as , the representation is unique. Analogously, for every , every real number has a base- representation as well.

(Normal numbers for base b) Let with . We say that is normal with respect to base and digit if , and is normal with respect to base if for every , it is normal with respect to base and digit .

Note that in the definition above, , the digits in the integer component of , are completely ignored. The reason is that for every , its integer component is bounded, so in the above definition is fixed and does not affect the limiting value of digit frequencies.

We say that is normal if for every , is normal with respect to .

We now state the Normal Number Theorem formally:

There is a null set such that every , is normal.
The theorem is non-constructive — while one is almost guaranteed to find a normal number, it does not construct specific normal numbers. Furthermore, it is not known if famous irrational numbers such as and are normal.
The power of the theorem lies in aggregation by 1) non-constructively examining all numbers at once instead of a individual number and 2) considering all digit precision instead of a specific precision. For instance, the frequency of the leading digit often obeys Benford’s Law, where the digits and in a decimal representation tend to appear more frequently than and .

Let us first consider a truncated version of the theorem. Consider tossing a fair coin independently times. From a combinatorial argument, most sequences of coin tosses have about heads and tails. By interpreting a head as and a tail as , most -bit numbers are normal. The passage from bits to infinite bits is the crux of the theorem, since the combinatorial process of selecting a random -bit number no longer extends. The proof illustrates how a measure-theoretic abstraction can circumvent this difficulty. In particular, we will apply the following “-trick” repeatedly.

Let be any probability space. Let be a countable collection of events with each . Then .
Let . By assumption, can be bounded above by . Then

Since was arbitrary, must equal .

(Normal Number Theorem)
First consider the interval , where . By the previous proposition, it suffices to show that the set of non-normal numbers in is null. Now fix . By the proposition again, it suffices to show that the set of numbers in that are not normal with respect to is null. Now fix . By the proposition again, it suffices to show that the set of numbers in that are not normal with respect to base and digit is null. By definition, for , is normal with respect to base and digit iff is normal with respect to base and digit . Thus, without loss of generality, we may further assume that .

Now consider the probability space (, where is the set of Borel sets on , and the Lebesgue (probability) measure. For , define to be , where is the -th digit of the base expansion .

is a sequence of i.i.d. Bernoulli random variable with probability .
(of Claim)
As defined, is an indicator function, and for , the set is just a union of intervals:

which is a Borel set, implying that is a Bernoulli random variable with .

To show independence, it suffices to show that for every , the family of random variables is independent. This follows directly from the observation that (since the interval has length ) and the previous observation that .

Given our setup, we now apply the Strong Law of Large Numbers to conclude: there exists a null set so that for every , converges to , i.e., is normal with respect to base and digit .

As a corollary, we conclude that for every continuous random variable, its output is most likely normal.

Let be a probability space, and a continuous random variable with a probability density function. Then , where is the set of non-normal numbers.
By the Normal Number Theorem, the set of non-normal numbers, denoted , is null, so for every , there is some open set with . Let be the probability density function of , then

finishing the proof.

A continuous random variable having a probability density function is equivalent to the cumulative distribution function of being absolutely continuous. (And recall the Cantor function as an example where may not even have a probability density function!) Alternatively, the above corollary also follows by observing that can be approximated by a finite number of small intervals in (e.g., Littlewood’s First Principle as discussed in a previous blog post), which being absolutely continuous must map into another null set.