# Normal numbers

Imagine if we sample numbers from the unit interval and count the occurrences of each digit in every number’s infinite decimal representation. For instance, in the number , every non-zero digit appears exactly once, but appears infinitely often (trailing zeros). On the other hand, for the number , the digit occurrences (after the “dot”) is equal for every digit. As the number of samples increases, one may expect that the number of occurrences for each digit in to be close to one another. In fact, the Normal Number Theorem, first formulated by Borel, not only formalizes this intuition but goes a step further: for almost every number in , each digit appears equally likely. Below we discuss this theorem in detail.

We first recall a few basic facts about the representation of a number. Rather than working entirely in base , we work with an arbitrary base .

Let with . We say that has a base- representation if there is some such that the series converges to , with .

In other words, is the integer component of and the fractional component. With , our familiar decimal representation, we know that every real number has a decimal representation, and by identifying as , the representation is unique. Analogously, for every , every real number has a base- representation as well.

(Normal numbers for base b) Let with . We say that is normal with respect to base and digit if , and is normal with respect to base if for every , it is normal with respect to base and digit .

Note that in the definition above, , the digits in the integer component of , are completely ignored. The reason is that for every , its integer component is bounded, so in the above definition is fixed and does not affect the limiting value of digit frequencies.

We say that is normal if for every , is normal with respect to .

We now state the Normal Number Theorem formally:

There is a null set such that every , is normal.
The theorem is non-constructive — while one is almost guaranteed to find a normal number, it does not construct specific normal numbers. Furthermore, it is not known if famous irrational numbers such as and are normal.
The power of the theorem lies in aggregation by 1) non-constructively examining all numbers at once instead of a individual number and 2) considering all digit precision instead of a specific precision. For instance, the frequency of the leading digit often obeys Benford’s Law, where the digits and in a decimal representation tend to appear more frequently than and .

Let us first consider a truncated version of the theorem. Consider tossing a fair coin independently times. From a combinatorial argument, most sequences of coin tosses have about heads and tails. By interpreting a head as and a tail as , most -bit numbers are normal. The passage from bits to infinite bits is the crux of the theorem, since the combinatorial process of selecting a random -bit number no longer extends. The proof illustrates how a measure-theoretic abstraction can circumvent this difficulty. In particular, we will apply the following “-trick” repeatedly.

Let be any probability space. Let be a countable collection of events with each . Then .
Let . By assumption, can be bounded above by . Then

Since was arbitrary, must equal .

(Normal Number Theorem)
First consider the interval , where . By the previous proposition, it suffices to show that the set of non-normal numbers in is null. Now fix . By the proposition again, it suffices to show that the set of numbers in that are not normal with respect to is null. Now fix . By the proposition again, it suffices to show that the set of numbers in that are not normal with respect to base and digit is null. By definition, for , is normal with respect to base and digit iff is normal with respect to base and digit . Thus, without loss of generality, we may further assume that .

Now consider the probability space (, where is the set of Borel sets on , and the Lebesgue (probability) measure. For , define to be , where is the -th digit of the base expansion .

is a sequence of i.i.d. Bernoulli random variable with probability .
(of Claim)
As defined, is an indicator function, and for , the set is just a union of intervals:

which is a Borel set, implying that is a Bernoulli random variable with .

To show independence, it suffices to show that for every , the family of random variables is independent. This follows directly from the observation that (since the interval has length ) and the previous observation that .

Given our setup, we now apply the Strong Law of Large Numbers to conclude: there exists a null set so that for every , converges to , i.e., is normal with respect to base and digit .

As a corollary, we conclude that for every continuous random variable, its output is most likely normal.

Let be a probability space, and a continuous random variable with a probability density function. Then , where is the set of non-normal numbers.
By the Normal Number Theorem, the set of non-normal numbers, denoted , is null, so for every , there is some open set with . Let be the probability density function of , then

finishing the proof.

A continuous random variable having a probability density function is equivalent to the cumulative distribution function of being absolutely continuous. (And recall the Cantor function as an example where may not even have a probability density function!) Alternatively, the above corollary also follows by observing that can be approximated by a finite number of small intervals in (e.g., Littlewood’s First Principle as discussed in a previous blog post), which being absolutely continuous must map into another null set.

# Littlewood’s Three Principles

In measure theory, the notion of measurability restricts sets and functions so that limit operations are sensible. With non-measurabilty manifesting only through the Axiom of Choice, measurability is a semantically rich class, and in particular, Littlewood’s Three Principles specify precisely how measurability relates back to more elementary building blocks as follows:

1. Every measurable set is nearly open or closed.
2. Every measurable function is nearly continuous.
3. Every convergent sequence of measurable functions is nearly uniformly convergent.

Here in this post, we flesh out these principles in detail, with an emphasis on how these various concepts approximate one another. Each section corresponds to one principle and is independent from another. We primarily focus on the Lebesgue measure on Euclidean space before discussing the abstract generalization. While there are many equivalent definitions of measurability, for simplicity we will only use these definitions in this discussion:

A set is measurable if for every , there is some open set containing such that , where is the Lebesgue measure.
A function is measurable if for every open interval , the preimage of under , , is a measurable set.

We will use these facts about the Lebesgue measure throughout.

Let be the Lebesgue measure on . Let be a collection of measurable sets.
1. (Monotonicity) If , then .
3. (Additivity) If are disjoint, .
4. (Upward monotone convergence) If , .
5. (Downward monotone convergence) If with for some , then .

## First Principle

By definition, every measurable set is already arbitrary close in “volume” to an open set. Nonetheless, we can say a lot more about its structure: in , a measurable set can be approximated by a disjoint collection of cubes. In the visual setting where , a measurable set can be approximated by a “pixelation” of squares. We first define the notion of cubes.

We say that is a closed cube of length left-aligned at if .

The structure of measurable sets actually inherit from the structure of open sets, as stated in the following proposition:

Let be open. Then is a countable union of closed cubes, and every pair of these cubes is disjoint except at the boundary.

We first state and prove the First Principle before proving the proposition.

Let be a measurable set with finite measure. For every , there exist disjoint cubes such that , where .
(of theorem) Fix . It suffices to show that there exist disjoint cubes such that and . Since is measurable, there is an open set containing such that . In particular, since . From the above proposition, , where is a closed cube and pairwise disjoint except at the boundary. For each closed cube , there exists a cube such that , and is pairwise disjoint. Now, by the upward monotone convergence of measure, there is some such that . Let . Then by the monotonicity of . For the other case,

where the lines follow from the monotonicity of , additivity of , disjointness of , construction of , sub-additivity of , and construction of , respectively.

(of proposition) Note that can be divided into cubes of length :

with . Let be an open set in , and define

In other words, contains all cubes of length for some integer that reside entirely within . Since is countable and is a subset of countable union of , is countable. By construction . For the other direction, since is open, for every , there is an open ball centered at of radius , , that is contained inside . By Pythagorean Theorem, there exists a cube with length centered at that is contained inside . For every , let . By maximality, , with and at a distance less than from . Thus, is inside the cube of length left-aligned at , implying . Lastly, to see that cubes are almost disjoint except at the boundary, note we may modify so that if a cube is selected from , then we can remove its sub-cubes from for every . By this modification, a selected cube in is never contained in another.

The First Principle can also be extended into any abstract measure space , where is the underlying set, is the set of measurable subsets of , and is the measure for . In the concrete setting, let be a bounded subset of , the set of all measurable sets contained in , and the Lebesgue measure. The First Principle effectively states that every measurable set is equivalent (up to a difference of measure zero) to a countable union of closed sets, and a direct generalization of what we have just proved is that every bounded element of the -algebra “generated” from closed sets can be approximated by closed sets themselves. More precisely,

Let be a measure space with . Suppose is an algebra. Then for every , , where is the intersection of all -algebra containing , there exists such that .
Fix and let . Since contains , it suffices to show that is itself a -algebra. It is easy to see that is closed under complement since is. For countable union, for every , let , . By definition, for every , there is some such that , where will be chosen later. By upward monotone convergence, there exists some such that . Write and , and note that is in . Then

of , and our choice of , respectively. Similarly,

Thus, 3 and setting finishes the proof.

## Second Principle

(Lusin’s Theorem) Let be a finite, measurable function. For every , there exists such that and , the restriction of to , is continuous.

As an example, consider the indicator function where is the set of rationals. When restricted to the domain of , the function becomes identically zero. However, the function remains discontinuous at the points in . Thus, the continuity asserted in Lusin’s Theorem applies only to the restriction of the original function.

Before proving Lusin’s Theorem, we first prove that the continuous functions are dense in absolutely integrable functions.

Let be a measurable function. Suppose . Then for every , there exists a continuous function such that .
We proceed in stages, where in each stage is restricted to an elementary, but more general class of functions than the previous stage. Note that in all cases, the hypothesis that is measurable and always applies.

Case 1: , where is a cube in (as defined in the First Principle).
When , is an interval, and has two points of discontinuity at the endpoints of this interval. A continuous function can be defined so that it agrees with everywhere, except when near each endpoint, is a nearly vertical line dropping from to . The case for larger can be similarly constructed.

Case 2: , where is a finite collection of cubes.
By Case 1, each has a continuous function such that . is continuous, and by triangle inequality, .

Case 3: , where is a measurable set.
From the First Principle, can be approximated by a finite union of disjoint cubes, and the result follows by Case 2 and applying triangle inequality.

Case 4: is simple, i.e., , where is a finite collection of measurable sets.
Follows from a similar argument as in Case 2.

Case 5: .
By definition, . Since , there exists some simple with . Result follows from Case 4 and triangle inequality.

Case 6: General case.
Write , where and . Then Case 5 applies to and , and the general case follows from Case 5 and triangle inequality.

(Lusin’s Theorem) We proceed in stages by proving the statement for progressively more general classes of functions. In all casees, the hypothesis that is measurable and finite always applies.

Case 1: .
Fix . By the previous proposition, for every positive integer , (to be specified), there is a continuous function such that . Define

By Markov’s Inequality, . Then , which is at most if we specify . Let . Then by construction, converges to uniformly inside . The restriction of to remains continuous, and since continuity is perserved under uniform limit, is continuous as well.

Case 2: is a bounded function.
Consider the “vertical” truncation of : . Since is bounded, is bounded and . By Case 1, there exists some such that , with continuous when restricted to . Let , then , thus every vertical truncation is continuous when restricted to . Since continuity is a local property, for every , must agree with some on a neighborhood of , so is continuous on as well.

Case 3: General case.
Consider the “horizontal” truncation of : . Since is bounded, by Case 2, there is some such that , with continuous when restricted to . Similarly as in Case 2, we define , with , and every truncation is continuous when restricted to . Since is finite, for every , must agree for some on a neighborhood of and is continuous on as well.

One may extend Lusin’s Theorem to an abstract measure space , where is Radon so that its measure is also compatibile with its topology so that continuity can be defined appropriately.

## Third Principle

(Egorov’s Theorem) , with . Let be a sequence of measurable functions with converging to pointwise as . Then for every , there exists a measurable subset with , where converging to uniformly within .
Egorov’s Theorem also implies Lusin’s Theorem, the Second Principle. If is measurable, by an alternate definition of measurability, it is also the pointwise limit of simple functions. These simple functions are also approximated by continuous functions. By Egorov’s Theorem, these continuous functions uniformly converge to except on a small subset . Since continuity is preserved under uniform convergence, is continuous on a domain without .
By definition, converges to uniformly on iff for every , for every positive integer , there is some positive integer with . Let

We see that converges to uniformly on , where will be chosen later. First note that is measurable: the limit of measurable functions is measurable, so and thus is meaurable, implying that , the pre-image of the open interval under , is measurable. Since is a countable intersection of , is also measurable. Second, note that is an increasing sequence, and its complement is a decreasing sequence. Since converges to , as . Since , by the downward monotone convergence of measure, as . Thus, there exists some such that .

Then

where the lines follow by the definition of , sub-additivity of measure, construction of , and geometric series, respectively.

It is easy to extend the above setting directly to an abstract one, with the proof following verbatim as before, except replacing the Lebesgue measure by an arbitrary measure :

(Abstract Egorov’s Theorem) Let be a measure space with . Let be a sequence of measurable functions, with converging to pointwise as . Then for every , there exists a measurable subset with , where converges to uniformly within .
When the measure space is a probability space, the finiteness of measure is trivially true. Since random variables are measurable by definition, Egorov’s Theorem states that a sequence of converging random variables is then almost uniformly convergent, except on a subset with small probability.

Lastly, we can lift the restriction that the domain of has finite measure, but uniform convergence must still be local.

(Egorov’s Theorem II) Let be a sequence of measurable functions with converging to pointwise as . Then for every , there exists a measurable subset with , where converges to locally uniform within , i.e., for every bounded subset , converges to uniformly on .
We will simply describe how to modify the preceding proof. For every positive integer , let be the ball of radius of centered at the origin. Define . Then following the same argument applied to , there exists some such that . Then letting , has measure at most . By definition,

so any bounded must be inside for some , so , i.e., converges to uniformly on .

In both variations of Egorov’s Theorem, the boundedness condition is necessary. Otherwise, when , consider , which converges to the zero function pointwise, but can never converge uniformly to zero. One can see that the mass of this function “escapes” to horizontal infinity. This is exemplified in the proof above, where convergence is not uniform on : a sequence of points with can all reside in but remain outside of .