Normal numbers

Imagine if we sample numbers from the unit interval [0,1] and count the occurrences of each digit in every number’s infinite decimal representation. For instance, in the number 0.0123456789, every non-zero digit appears exactly once, but 0 appears infinitely often (trailing zeros). On the other hand, for the number 0.\overline{0123456789}, the digit occurrences (after the “dot”) is equal for every digit. As the number of samples increases, one may expect that the number of occurrences for each digit in \{0,1,\ldots,9\} to be close to one another. In fact, the Normal Number Theorem, first formulated by Borel, not only formalizes this intuition but goes a step further: for almost every number in [0,1], each digit appears equally likely. Below we discuss this theorem in detail.

We first recall a few basic facts about the representation of a number. Rather than working entirely in base 10, we work with an arbitrary base b\geq 2.

Let b\in\mathbb{N} with b\geq2. We say that x has a base-b representation if there is some M\in\mathbb{N} such that the series \sum_{m=0}^{M}z_{m}b^{m}+\sum_{n=1}^{\infty}x_{n}b^{-n} converges to x, with z_{m},x_{n}\in\mathbb{Z}_{b}.

In other words, \sum_{m=0}^{M}x_{m}b^{m} is the integer component of x and \sum_{n=1}^{\infty}x_{n}b^{-n} the fractional component. With b=10, our familiar decimal representation, we know that every real number has a decimal representation, and by identifying 0.999... as 1, the representation is unique. Analogously, for every b\geq2, every real number has a base-b representation as well.

(Normal numbers for base b) Let x\in\mathbb{R} with x=\sum_{m=1}^{M}z_{m}b^{m}+\sum_{n=1}^{\infty}x_{n}b^{-n}. We say that x is normal with respect to base b and digit d if \lim_{N\rightarrow\infty}\frac{1}{N}\#\{n:x_{n}=d,1\leq n\leq N\}=\frac{1}{b}, and x\in\mathbb{R} is normal with respect to base b if for every d\in\mathbb{Z}_{b}, it is normal with respect to base b and digit d.

Note that in the definition above, z_{m}, the digits in the integer component of x, are completely ignored. The reason is that for every x\in\mathbb{R}, its integer component is bounded, so M in the above definition is fixed and does not affect the limiting value of digit frequencies.

We say that x\in\mathbb{R} is normal if for every b\geq2, x is normal with respect to b.

We now state the Normal Number Theorem formally:

There is a null set E\subset\mathbb{R} such that every x\notin E, x is normal.
The theorem is non-constructive — while one is almost guaranteed to find a normal number, it does not construct specific normal numbers. Furthermore, it is not known if famous irrational numbers such as \pi and e are normal.
The power of the theorem lies in aggregation by 1) non-constructively examining all numbers at once instead of a individual number and 2) considering all digit precision instead of a specific precision. For instance, the frequency of the leading digit often obeys Benford’s Law, where the digits 1 and 2 in a decimal representation tend to appear more frequently than 8 and 9.

Let us first consider a truncated version of the theorem. Consider tossing a fair coin independently N times. From a combinatorial argument, most sequences of coin tosses have about N/2 heads and N/2 tails. By interpreting a head as 1 and a tail as 0, most N-bit numbers are normal. The passage from N bits to infinite bits is the crux of the theorem, since the combinatorial process of selecting a random N-bit number no longer extends. The proof illustrates how a measure-theoretic abstraction can circumvent this difficulty. In particular, we will apply the following “\epsilon2^{-n}-trick” repeatedly.

Let (\Omega,\mathcal{F},P) be any probability space. Let \{E_{n}\}_{n\in\mathbb{N}} be a countable collection of events E_{n}\in\mathcal{F} with each P(E_{n})=0. Then P(\cup_{n\in\mathbb{N}}E_{n})=0.
Let \eps>0. By assumption, P(E_{n}) can be bounded above by \eps2^{-n}. Then

    \begin{eqnarray*}   P(\cup_{n\in\mathbb{N}}E_{n}) & \leq & \sum_{n=1}^{\infty}P(E_{n})\\                                 & \leq & \sum_{n=1}^{\infty}\eps2^{-n}\\                                 & \leq & \eps. \end{eqnarray*}

Since \eps was arbitrary, P(\cup_{n\in\mathbb{N}}E_{n}) must equal 0.

(Normal Number Theorem)
First consider the interval [R,R+1], where R\in\mathbb{N}. By the previous proposition, it suffices to show that the set of non-normal numbers in [R,R+1] is null. Now fix b\geq2. By the proposition again, it suffices to show that the set of numbers in [R,R+1] that are not normal with respect to b is null. Now fix d\in\mathbb{Z}_{b}. By the proposition again, it suffices to show that the set of numbers in [R,R+1] that are not normal with respect to base b and digit d is null. By definition, for x\in[R,R+1], x is normal with respect to base b and digit d iff x-R is normal with respect to base b and digit d. Thus, without loss of generality, we may further assume that R=0.

Now consider the probability space ([0,1],\mathcal{B},\mathbb{P}), where \mathcal{B} is the set of Borel sets on [0,1], and \mathbb{P} the Lebesgue (probability) measure. For n\in\mathbb{N}, define X_{n}:[0,1]\rightarrow\{0,1\} to be X_{n}(x)=\mathbb{I}_{\{x_{n}=d\}}, where x_{n} is the n-th digit of the base b expansion x=\sum_{n=1}^{\infty}x_{n}b^{-n}.

\{X_{n}\}_{n\in\mathbb{N}} is a sequence of i.i.d. Bernoulli random variable with probability \frac{1}{b}.
(of Claim)
As defined, X_{n} is an indicator function, and for x_{n}\in\{0,1\}, the set \{x:X_{n}(x)=x_{n}\} is just a union of intervals:

    \[ \bigcup_{x_{1}\ldots,x_{n-1}\in\mathbb{Z}_{b}}\left[\sum_{i=1}^{n}x_{i}b^{-i},\sum_{i=1}^{n}x_{i}b^{-i}+b^{-n}\right), \]

which is a Borel set, implying that X_{n} is a Bernoulli random variable with \mathbb{P}(X_{n}=x_{n})=b^{n-1}\cdot b^{-n}=\frac{1}{b}.

To show independence, it suffices to show that for every N\in\mathbb{N}, the family of random variables \{X_{n}\}_{n=1}^{N} is independent. This follows directly from the observation that \mathbb{P}(X_{1}=x_{1},\ldots,X_{N}=x_{N})=b^{-N} (since the interval [\sum_{n=1}^{N}x_{n}b^{-n},\sum_{n=1}^{N}x_{n}b^{-n}+b^{-N}) has length \frac{1}{b^{N}}) and the previous observation that \mathbb{P}(X_{n}=x_{n})=\frac{1}{b}.

Given our setup, we now apply the Strong Law of Large Numbers to conclude: there exists a null set E so that for every x\not\notin E, \lim_{N\rightarrow\infty}\frac{1}{N}\sum_{n=1}^{N}X_{n}(x) converges to E[X_{1}(x)]=\frac{1}{b}, i.e., x is normal with respect to base b and digit d.

As a corollary, we conclude that for every continuous random variable, its output is most likely normal.

Let (\Omega,\mathcal{F},P) be a probability space, and X:\Omega\rightarrow \mathbb{R} a continuous random variable with a probability density function. Then P(X\in E)=0, where E is the set of non-normal numbers.
By the Normal Number Theorem, the set of non-normal numbers, denoted E, is null, so for every n, there is some open set V_{n}\subset\mathbb{R} with P(V_{n})\leq1/n. Let f_X be the probability density function of X, then

    \begin{eqnarray*} P(X\in E) & \leq & P(X\in V_{n})\\ & = & \int f_{X}(x)\mathbb{I}_{V_{n}}(x)dx. \end{eqnarray*}

By the Dominated Convergence Theorem,

    \[ \lim_{n\rightarrow\infty}\int f_{X}(x)\mathbb{I}_{V_{n}}(x)dx=\int f_{X}(x)\lim_{n\rightarrow\infty}\mathbb{I}_{V_{n}}(x)dx=0, \]

finishing the proof.

A continuous random variable X having a probability density function is equivalent to the cumulative distribution function F_{X} of X being absolutely continuous. (And recall the Cantor function as an example where X may not even have a probability density function!) Alternatively, the above corollary also follows by observing that E can be approximated by a finite number of small intervals in \mathbb{R} (e.g., Littlewood’s First Principle as discussed in a previous blog post), which F_{X} being absolutely continuous must map into another null set.

Leave a Reply

Your email address will not be published. Required fields are marked *