
Normal numbers

Imagine sampling numbers from the unit interval [0,1] and counting the occurrences of each digit in every number’s infinite decimal representation. For instance, in the number 0.0123456789, every non-zero digit appears exactly once, but 0 appears infinitely often (trailing zeros). On the other hand, for the number 0.\overline{0123456789}, every digit after the “dot” occurs with the same frequency. As the number of samples increases, one may expect the occurrence counts of the digits \{0,1,\ldots,9\} to be close to one another. In fact, the Normal Number Theorem, first formulated by Borel, not only formalizes this intuition but goes a step further: for almost every number in [0,1], every digit appears with the same limiting frequency. Below we discuss this theorem in detail.

We first recall a few basic facts about the representation of a number. Rather than working entirely in base 10, we work with an arbitrary base b\geq 2.

Let b\in\mathbb{N} with b\geq2. We say that x has a base-b representation if there is some M\in\mathbb{N} such that the series \sum_{m=0}^{M}z_{m}b^{m}+\sum_{n=1}^{\infty}x_{n}b^{-n} converges to x, with z_{m},x_{n}\in\mathbb{Z}_{b}.

In other words, \sum_{m=0}^{M}z_{m}b^{m} is the integer component of x and \sum_{n=1}^{\infty}x_{n}b^{-n} the fractional component. With b=10, our familiar decimal representation, we know that every real number has a decimal representation, and by identifying 0.999\ldots with 1, the representation is unique. Analogously, for every b\geq2, every real number has a base-b representation as well.

(Normal numbers for base b) Let x\in\mathbb{R} with x=\sum_{m=0}^{M}z_{m}b^{m}+\sum_{n=1}^{\infty}x_{n}b^{-n}. We say that x is normal with respect to base b and digit d if \lim_{N\rightarrow\infty}\frac{1}{N}\#\{n:x_{n}=d,1\leq n\leq N\}=\frac{1}{b}, and x\in\mathbb{R} is normal with respect to base b if for every d\in\mathbb{Z}_{b}, it is normal with respect to base b and digit d.

Note that in the definition above, z_{m}, the digits in the integer component of x, are completely ignored. The reason is that the integer component of any x\in\mathbb{R} has only finitely many digits, so M in the above definition is fixed and cannot affect the limiting digit frequencies.

We say that x\in\mathbb{R} is normal if for every b\geq2, x is normal with respect to b.
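
To make the limiting frequency in the definition concrete, here is a quick Python sketch that estimates \frac{1}{N}\#\{n:x_{n}=d,1\leq n\leq N\} for the two decimal examples above (using exact rational arithmetic; the choices d=3 and N=100000 are arbitrary):

    from fractions import Fraction

    def digit_frequency(x, b=10, d=3, N=100000):
        """Estimate (1/N) * #{n <= N : x_n = d} for the base-b expansion of x in [0, 1)."""
        frac = Fraction(x) % 1          # exact arithmetic avoids floating-point digit noise
        count = 0
        for _ in range(N):
            frac *= b                   # shift the next digit x_n into the integer part
            digit = int(frac)
            frac -= digit
            count += (digit == d)
        return count / N

    # 0.0123456789 (terminating): the digit 3 appears once, then only 0s follow
    print(digit_frequency(Fraction(123456789, 10**10)))        # ~ 0.0
    # 0.\overline{0123456789}: every digit occurs with frequency exactly 1/10
    print(digit_frequency(Fraction(123456789, 9999999999)))    # ~ 0.1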

We now state the Normal Number Theorem formally:

There is a null set E\subset\mathbb{R} such that every x\notin E is normal.
The theorem is non-constructive: while one is almost guaranteed to find a normal number, it does not exhibit specific normal numbers. Furthermore, it is not known whether famous irrational numbers such as \pi and e are normal.
The power of the theorem lies in aggregation: 1) it non-constructively examines all numbers at once instead of an individual number, and 2) it considers all digit positions instead of a fixed finite precision. For instance, the frequency of the leading digit often obeys Benford’s Law, where the leading digits 1 and 2 in a decimal representation tend to appear more frequently than 8 and 9.
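
As a quick illustration of this contrast, the leading digits of the powers of 2 are a classical sequence obeying Benford’s Law rather than the uniform distribution; a small Python sketch tabulating them:

    from collections import Counter
    from math import log10

    counts = Counter(str(2**n)[0] for n in range(1, 10001))   # leading digit of 2^n
    for d in "123456789":
        empirical = counts[d] / 10000
        benford = log10(1 + 1 / int(d))    # Benford's Law: P(leading digit = d)
        print(d, round(empirical, 3), round(benford, 3))
    # The empirical frequencies track log10(1 + 1/d), not the uniform 1/9 = 0.111...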

Let us first consider a truncated version of the theorem. Consider tossing a fair coin independently N times. From a combinatorial argument, most sequences of coin tosses have about N/2 heads and N/2 tails. By interpreting a head as 1 and a tail as 0, most N-bit numbers have roughly balanced digit counts. The passage from N bits to infinitely many bits is the crux of the theorem, since the combinatorial process of selecting a random N-bit number no longer extends. The proof illustrates how a measure-theoretic abstraction can circumvent this difficulty. In particular, we will apply the following “\epsilon2^{-n}-trick” repeatedly.
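
A quick simulation of this truncated statement (with arbitrary choices of N and of the number of samples): the fraction of 1-bits in a random N-bit number concentrates around 1/2.

    import random

    def fraction_of_ones(N):
        """Fraction of heads (1-bits) in N independent fair coin tosses."""
        return sum(random.getrandbits(1) for _ in range(N)) / N

    N = 10000
    samples = [fraction_of_ones(N) for _ in range(200)]
    print(min(samples), max(samples))   # both close to 0.5; the spread shrinks as N grows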

Let (\Omega,\mathcal{F},P) be any probability space. Let \{E_{n}\}_{n\in\mathbb{N}} be a countable collection of events E_{n}\in\mathcal{F} with each P(E_{n})=0. Then P(\cup_{n\in\mathbb{N}}E_{n})=0.
Let \eps>0. By assumption, P(E_{n}) can be bounded above by \eps2^{-n}. Then

    \begin{eqnarray*}   P(\cup_{n\in\mathbb{N}}E_{n}) & \leq & \sum_{n=1}^{\infty}P(E_{n})\\                                 & \leq & \sum_{n=1}^{\infty}\eps2^{-n}\\                                 & \leq & \eps. \end{eqnarray*}

Since \eps was arbitrary, P(\cup_{n\in\mathbb{N}}E_{n}) must equal 0.

(Normal Number Theorem)
First consider the interval [R,R+1], where R\in\mathbb{N} (negative numbers are handled symmetrically, with the digits of a negative x taken to be those of -x). By the previous proposition, it suffices to show that the set of non-normal numbers in [R,R+1] is null. Now fix b\geq2. By the proposition again, it suffices to show that the set of numbers in [R,R+1] that are not normal with respect to b is null. Now fix d\in\mathbb{Z}_{b}. By the proposition again, it suffices to show that the set of numbers in [R,R+1] that are not normal with respect to base b and digit d is null. By definition, for x\in[R,R+1], x is normal with respect to base b and digit d iff x-R is normal with respect to base b and digit d. Thus, without loss of generality, we may further assume that R=0.

Now consider the probability space ([0,1],\mathcal{B},\mathbb{P}), where \mathcal{B} is the set of Borel sets on [0,1], and \mathbb{P} the Lebesgue (probability) measure. For n\in\mathbb{N}, define X_{n}:[0,1]\rightarrow\{0,1\} to be X_{n}(x)=\mathbb{I}_{\{x_{n}=d\}}, where x_{n} is the n-th digit of the base b expansion x=\sum_{n=1}^{\infty}x_{n}b^{-n}.

\{X_{n}\}_{n\in\mathbb{N}} is a sequence of i.i.d. Bernoulli random variables with success probability \frac{1}{b}.
(of Claim)
As defined, X_{n} is an indicator function, and the event \{X_{n}=1\}=\{x:x_{n}=d\} is just a union of intervals:

    \[ \bigcup_{x_{1},\ldots,x_{n-1}\in\mathbb{Z}_{b}}\left[\sum_{i=1}^{n}x_{i}b^{-i},\sum_{i=1}^{n}x_{i}b^{-i}+b^{-n}\right), \]

where x_{n}=d is fixed in the inner sums. This is a Borel set (and so is its complement \{X_{n}=0\}), implying that X_{n} is a Bernoulli random variable with \mathbb{P}(X_{n}=1)=b^{n-1}\cdot b^{-n}=\frac{1}{b}.

To show independence, it suffices to show that for every N\in\mathbb{N}, the family of random variables \{X_{n}\}_{n=1}^{N} is independent. This follows from the stronger observation that the digits themselves are independent: for any prescribed digits x_{1},\ldots,x_{N}\in\mathbb{Z}_{b}, the set of x whose first N digits are x_{1},\ldots,x_{N} is the interval [\sum_{n=1}^{N}x_{n}b^{-n},\sum_{n=1}^{N}x_{n}b^{-n}+b^{-N}), which has measure b^{-N}=\prod_{n=1}^{N}\mathbb{P}(\text{the }n\text{-th digit equals }x_{n}). Since each X_{n} is a function of the n-th digit alone, the family \{X_{n}\}_{n=1}^{N} is independent.

Given our setup, we now apply the Strong Law of Large Numbers to conclude: there exists a null set E so that for every x\notin E, \frac{1}{N}\sum_{n=1}^{N}X_{n}(x) converges to \mathbb{E}[X_{1}]=\frac{1}{b} as N\rightarrow\infty, i.e., x is normal with respect to base b and digit d.
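
As a rough numerical sanity check of this setup (a Monte Carlo sketch with b=10 and the arbitrary digit d=7), sampling x uniformly from [0,1] suggests that the digit indicators indeed behave like independent Bernoulli(1/b) variables:

    import random

    b, d, trials = 10, 7, 200000

    def first_two_digits(x):
        """The digits x_1 and x_2 of the base-b expansion of x in [0, 1)."""
        return int(x * b) % b, int(x * b * b) % b

    hits1 = hits12 = 0
    for _ in range(trials):
        d1, d2 = first_two_digits(random.random())
        hits1 += (d1 == d)
        hits12 += (d1 == d and d2 == d)

    print(hits1 / trials)    # ~ 1/b   = 0.1,  matching P(X_1 = 1)
    print(hits12 / trials)   # ~ 1/b^2 = 0.01, matching P(X_1 = 1, X_2 = 1) under independence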

As a corollary, we conclude that for every continuous random variable, its value is almost surely normal.

Let (\Omega,\mathcal{F},P) be a probability space, and X:\Omega\rightarrow \mathbb{R} a continuous random variable with a probability density function. Then P(X\in E)=0, where E is the set of non-normal numbers.
By the Normal Number Theorem, the set of non-normal numbers, denoted E, is null, so for every n, there is some open set V_{n}\subset\mathbb{R} containing E with m(V_{n})\leq1/n, where m denotes the Lebesgue measure. Let f_{X} be the probability density function of X; then

    \begin{eqnarray*} P(X\in E) & \leq & P(X\in V_{n})\\ & = & \int f_{X}(x)\mathbb{I}_{V_{n}}(x)dx. \end{eqnarray*}

Replacing V_{n} by V_{1}\cap\cdots\cap V_{n} (still open, still containing E, still of measure at most 1/n), we may assume the V_{n} are decreasing, so \mathbb{I}_{V_{n}} converges pointwise to \mathbb{I}_{\cap_{n}V_{n}}, which vanishes almost everywhere since m(\cap_{n}V_{n})=0. By the Dominated Convergence Theorem (with dominating function f_{X}),

    \[ \lim_{n\rightarrow\infty}\int f_{X}(x)\mathbb{I}_{V_{n}}(x)dx=\int f_{X}(x)\lim_{n\rightarrow\infty}\mathbb{I}_{V_{n}}(x)dx=0, \]

finishing the proof.

A continuous random variable X having a probability density function is equivalent to the cumulative distribution function F_{X} of X being absolutely continuous. (And recall the Cantor function as an example where X may not even have a probability density function!) Alternatively, the above corollary also follows by observing that E can be covered by intervals of arbitrarily small total length (e.g., Littlewood’s First Principle as discussed in a previous blog post), which the absolute continuity of F_{X} maps to a set of arbitrarily small probability.

Littlewood’s Three Principles

In measure theory, the notion of measurability restricts sets and functions so that limit operations are sensible. With non-measurability manifesting only through the Axiom of Choice, the class of measurable objects is semantically rich, and in particular, Littlewood’s Three Principles specify precisely how measurability relates back to more elementary building blocks as follows:

  1. Every measurable set is nearly open or closed.
  2. Every measurable function is nearly continuous.
  3. Every convergent sequence of measurable functions is nearly uniformly convergent.

Here in this post, we flesh out these principles in detail, with an emphasis on how these various concepts approximate one another. Each section corresponds to one principle and is independent of the others. We primarily focus on the Lebesgue measure m on Euclidean space before discussing the abstract generalization. While there are many equivalent definitions of measurability, for simplicity we will use only the following definitions in this discussion:

A set E\subset\mathbb{R}^{d} is measurable if for every \eps>0, there is some open set V\subset\mathbb{R}^{d} containing E such that m(V\backslash E)\leq\eps, where m is the Lebesgue measure.
A function f:\mathbb{R}^{d}\rightarrow\mathbb{R} is measurable if for every open interval I\subset\mathbb{R}, the preimage of I under f, f^{-1}(I), is a measurable set.

We will use these facts about the Lebesgue measure throughout.

Let m be the Lebesgue measure on \mathbb{R}^d. Let \{E_n\}_{n\in \mathbb{N}} be a collection of measurable sets.
1. (Monotonicity) If E_1 \subset E_2, then m(E_1) \leq m(E_2).
2. (Sub-additivity) m(\bigcup_n E_n) \leq \sum_n m(E_n).
3. (Additivity) If E_n are disjoint, m(\bigcup_n E_n) = \sum_n m(E_n).
4. (Upward monotone convergence) If E_1 \subset E_2 \subset \ldots, m(\bigcup_n E_n) = \lim_{n\rightarrow\infty} m(E_n).
5. (Downward monotone convergence) If E_1 \supset E_2 \supset \ldots with m(E_n)<\infty for some n, then m(\bigcap_n E_n) = \lim_{n\rightarrow\infty} m(E_n).

First Principle

By definition, every measurable set is already arbitrarily close in “volume” to an open set. Nonetheless, we can say a lot more about its structure: in \mathbb{R}^{d}, a measurable set can be approximated by a finite collection of disjoint cubes. In the visual setting where d=2, a measurable set can be approximated by a “pixelation” of squares. We first define the notion of cubes.

We say that Q\subset\mathbb{R}^{d} is a closed cube of length \ell left-aligned at (a_{1},\ldots,a_{d})\in\mathbb{R}^{d} if Q=\Pi_{i=1}^{d}[a_{i},a_{i}+\ell].

The structure of measurable sets is actually inherited from the structure of open sets, as stated in the following proposition:

Let V\subset\mathbb{R}^{d} be open. Then V is a countable union of closed cubes, and every pair of these cubes is disjoint except at the boundary.

We first state and prove the First Principle before proving the proposition.

Let E\subset\mathbb{R}^{d} be a measurable set with finite measure. For every \eps>0, there exist N disjoint cubes Q_{1},\ldots,Q_{N} such that m(E\bigtriangleup Q)\leq\eps, where Q=\cup_{n=1}^{N}Q_{n}.
(of theorem) Fix \eps>0. It suffices to show that there exist N disjoint cubes Q_{1},\ldots,Q_{N} such that m(E\backslash\bigcup_{n=1}^{N}Q_{n})\leq\eps/2 and m(\bigcup_{n=1}^{N}Q_{n}\backslash E)\leq\eps/2. Since E is measurable, there is an open set V containing E such that m(V\backslash E)\leq\eps/2. In particular, m(V)<\infty since m(E)<\infty. From the above proposition, V=\bigcup_{n}Q'_{n}, where the Q'_{n} are closed cubes, pairwise disjoint except at the boundary. For each closed cube Q'_{n}, there exists a slightly smaller cube Q_{n}\subsetneq Q'_{n} such that m(Q_{n})\geq m(Q'_{n})-\frac{\eps}{4}\cdot 2^{-n}, and the shrunken cubes Q_{n} are pairwise disjoint. Now, by the upward monotone convergence of measure, there is some N such that m(\bigcup_{n=1}^{N}Q'_{n})\geq m(V)-\eps/4. Let Q=\bigcup_{n=1}^{N}Q_{n}. Then m(Q\backslash E)\leq m(V\backslash E)\leq\eps/2 by the monotonicity of m. For the other direction,

    \begin{align*} m(E\backslash Q) & \leq m(V\backslash Q)\\  & = m(V)-m(Q)\\  & = m(V)-\sum_{n=1}^{N}m(Q_{n})\\  & \leq m(V)-\sum_{n=1}^{N}m(Q'_{n})+\frac{\eps}{4}\\  & \leq m(V)-m\left(\bigcup_{n=1}^{N}Q'_{n}\right)+\frac{\eps}{4}\\  & \leq \eps/2, \end{align*}

where the lines follow from the monotonicity of m, additivity of m, disjointness of the Q_{n}, construction of the Q_{n} (summing \frac{\eps}{4}\cdot2^{-n} over n), sub-additivity of m, and the choice of N, respectively.

(of proposition) Note that \mathbb{R}^{d} can be divided into cubes of length 2^{-n}:

    \[ G_{n}=\{ \Pi_{i=1}^{d}[a_{i}2^{-n},(a_{i}+1)2^{-n}]:a_{1},\ldots,a_{d}\in\mathbb{Z}\}, \]

with \mathbb{R}^{d}=\bigcup_{B\in G_{n}}B. Let V be an open set in \mathbb{R}^{d}, and define

    \[ Q=\bigcup_{n\in\mathbb{N}}\{B\in G_{n}:B\subset V\}. \]

In other words, Q collects all dyadic cubes of length 2^{-n}, over every integer n, that reside entirely within V. Since each G_{n} is countable and Q is a subset of the countable union \bigcup_{n}G_{n}, Q is countable. By construction, every cube in Q is contained in V. For the other direction, since V is open, for every v\in V there is an open ball B(v,r) centered at v of radius r that is contained inside V. Choose n large enough that a cube of length 2^{-n} has diameter \sqrt{d}\cdot2^{-n}<r (by the Pythagorean Theorem). For every i, let a_{i}=\max\{z\cdot2^{-n}:z\in\mathbb{Z},\ z\cdot2^{-n}\leq v_{i}\}. By maximality, v_{i}\in[a_{i},a_{i}+2^{-n}], so the cube of length 2^{-n} left-aligned at (a_{1},\ldots,a_{d}) contains v; having diameter less than r, it is contained in B(v,r)\subset V and hence belongs to Q. Thus V is the union of the cubes in Q. Lastly, to see that the cubes are disjoint except at the boundary, we may modify Q so that whenever a cube is selected from G_{n}, its sub-cubes are removed from G_{m} for every m>n. After this modification, no selected cube is contained in another, and two dyadic cubes that are not nested can only intersect along their boundaries.
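
The construction in this proof is essentially an algorithm. Here is a small sketch for d=1, assuming V is given as a finite union of open intervals inside [0,1]: it collects the dyadic intervals contained in V, skipping sub-intervals of intervals already selected at a coarser generation.

    from fractions import Fraction

    # V as a finite union of open intervals inside [0, 1]
    V = [(Fraction(1, 10), Fraction(1, 3)), (Fraction(1, 2), Fraction(3, 4))]

    def dyadic_decomposition(V, max_gen=10):
        """Almost-disjoint closed dyadic intervals contained in V, up to generation max_gen."""
        selected = []
        for n in range(max_gen + 1):
            step = Fraction(1, 2**n)
            lo = Fraction(0)
            while lo < 1:
                hi = lo + step
                covered = any(slo <= lo and hi <= shi for slo, shi in selected)
                inside = any(a < lo and hi < b for a, b in V)    # closed [lo, hi] inside open (a, b)
                if inside and not covered:
                    selected.append((lo, hi))
                lo = hi
        return selected

    intervals = dyadic_decomposition(V)
    print(float(sum(hi - lo for lo, hi in intervals)))   # approaches m(V) = 7/30 + 1/4 as max_gen grows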

The First Principle can also be extended to any abstract measure space (X,\mathcal{B},\mu), where X is the underlying set, \mathcal{B} is the set of measurable subsets of X, and \mu is the measure for (X,\mathcal{B}). In the concrete setting, let X be a bounded subset of \mathbb{R}^{d}, \mathcal{B} the set of all measurable sets contained in X, and \mu the Lebesgue measure. The First Principle effectively states that every measurable set can be approximated, up to an arbitrarily small difference in measure, by finite unions of cubes, and a direct generalization of what we have just proved is that every element of a \sigma-algebra “generated” from an algebra of simpler sets can be approximated by elements of that algebra. More precisely,

Let (X,\mathcal{B},\mu) be a measure space with \mu(X)<\infty. Suppose \mathcal{A}\subset\mathcal{B} is an algebra. Then for every \eps>0 and every E\in\sigma(\mathcal{A}), where \sigma(\mathcal{A}) is the intersection of all \sigma-algebras containing \mathcal{A}, there exists F\in\mathcal{A} such that \mu(E\triangle F)\leq\eps.
Let \mathcal{P}=\{E\in\mathcal{B}:\forall\eps>0\ \exists F\in\mathcal{A}\text{ with }\mu(E\triangle F)\leq\eps\}. Since \mathcal{P} contains \mathcal{A}, it suffices to show that \mathcal{P} is itself a \sigma-algebra. It is easy to see that \mathcal{P} is closed under complement since \mathcal{A} is (and E^{c}\triangle F^{c}=E\triangle F). For countable union, fix \eps>0, let E_{n}\in\mathcal{P} for every n\in\mathbb{N}, and set E=\bigcup_{n}E_{n}. By definition of \mathcal{P}, for every n\in\mathbb{N}, there is some F_{n}\in\mathcal{A} such that \mu(E_{n}\triangle F_{n})\leq\eps'\cdot2^{-n}, where \eps' will be chosen later. By upward monotone convergence, there exists some N\in\mathbb{N} such that \mu(\bigcup_{n=1}^{N}E_{n})\geq\mu(E)-\eps'. Write E^{N}=\bigcup_{n=1}^{N}E_{n} and F^{N}=\bigcup_{n=1}^{N}F_{n}, and note that F^{N} is in \mathcal{A}. Then

    \begin{align*} \mu(F^{N}\backslash E) & \leq\mu(F^{N}\backslash E^{N})\\  & \leq\sum_{n=1}^{N}\mu(F_{n}\backslash E_{n})\\  & \leq\eps', \end{align*}

where the inequalities follow from the monotonicity of \mu, sub-additivity
of \mu, and our choice of F_{n}, respectively. Similarly,

    \begin{align*} \mu(E\backslash F^{N}) & =\mu(E^{N}\backslash F^{N})+\mu((E\backslash E^{N})\backslash F^{N})\\  & \leq\sum_{n=1}^{N}\mu(E_{n}\backslash F_{n})+\mu(E\backslash E^{N})\\  & \leq\eps'+\eps'. \end{align*}

Thus, \mu(E\triangle F^{N})\leq 3\eps', and setting \eps'=\eps/3 finishes the proof.

Second Principle

(Lusin’s Theorem) Let f:\mathbb{R}^{d}\rightarrow\mathbb{R} be a finite, measurable function. For every \eps>0, there exists a measurable set A\subset\mathbb{R}^{d} such that m(\mathbb{R}^{d}\backslash A)<\eps and f_{|A}, the restriction of f to A, is continuous.

As an example, consider the indicator function 1_{\mathbb{Q}}:[0,1]\rightarrow\{0,1\}, where \mathbb{Q} is the set of rationals. When restricted to the domain [0,1]\backslash\mathbb{Q} (whose complement in [0,1] is null), the function becomes identically zero, hence continuous. However, the original function remains discontinuous at every point of [0,1]\backslash\mathbb{Q}. Thus, the continuity asserted in Lusin’s Theorem applies only to the restriction of the original function.

Before proving Lusin’s Theorem, we first prove that the continuous functions are dense in the space of absolutely integrable functions.

Let f:\mathbb{R}^{d}\rightarrow\mathbb{R} be a measurable function. Suppose \int|f|<\infty. Then for every \eps>0, there exists a continuous function g such that \int|f-g|\leq\eps.
We proceed in stages, where in each stage f belongs to an elementary, but progressively more general, class of functions. Note that in all cases, the hypothesis that f is measurable and \int |f| < \infty always applies.

Case 1: f=1_{Q}, where Q is a cube in \mathbb{R}^{d} (as defined in the First Principle).
When d=1, Q is an interval, and f has two points of discontinuity at the endpoints of this interval. A continuous function g can be defined so that it agrees with f everywhere, except near each endpoint, where g is a nearly vertical line dropping from 1 to 0. The case for larger d can be constructed similarly. (A numerical sketch of this construction appears after Case 6 below.)

Case 2: f=\sum_{Q\in\mathcal{C}}1_{Q}, where \mathcal{C} is a finite collection of cubes.
By Case 1, each 1_{Q} has a continuous function g_{Q} such that \int|1_{Q}-g_{Q}|\leq\eps/|\mathcal{C}|. \sum_{Q\in\mathcal{C}}g_{Q} is continuous, and by triangle inequality, \int|f-\sum_{Q\in\mathcal{C}}g_{Q}|\leq\eps.

Case 3: f=1_{E}, where E is a measurable set.
From the First Principle (applicable since m(E)=\int|f|<\infty), E can be approximated by a finite union of disjoint cubes, and the result follows from Case 2 and the triangle inequality.

Case 4: f is simple, i.e., f=\sum_{E\in\mathcal{E}}c_{E}1_{E}, where \mathcal{E} is a finite collection of disjoint measurable sets and c_{E}\in\mathbb{R}.
Follows from a similar argument as in Case 2.

Case 5: f:\mathbb{R}^{d}\rightarrow[0,\infty).
By definition, \int f=\sup_{g\leq f,\,g\text{ simple}}\int g. Since \int f<\infty, there exists some simple g with 0\leq g\leq f and \int(f-g)\leq\eps. The result follows from Case 4 and the triangle inequality.

Case 6: General case.
Write f=f_{+}-f_{-}, where f_{+}=\max(f,0) and f_{-}=\max(-f,0). Then Case 5 applies to f_{+} and f_{-}, and the general case follows from Case 5 and the triangle inequality.
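
To illustrate Case 1 numerically for d=1 (with arbitrary choices of the interval [a,b] and the ramp width \delta), the following builds the piecewise-linear ramp approximation to 1_{[a,b]} and checks that the L^{1} error is about \delta:

    import numpy as np

    a, b, delta = 0.2, 0.7, 1e-3

    def g(x):
        # 1 on [a, b], 0 outside [a - delta, b + delta], linear ramps in between
        return np.clip(np.minimum((x - (a - delta)) / delta, ((b + delta) - x) / delta), 0.0, 1.0)

    xs = np.linspace(0.0, 1.0, 1000001)
    f = ((xs >= a) & (xs <= b)).astype(float)
    l1_error = np.abs(f - g(xs)).mean()    # Riemann estimate of the integral of |f - g| over [0, 1]
    print(l1_error)                        # ~ delta: two triangles of area delta/2 each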

(Lusin’s Theorem) We proceed in stages by proving the statement for progressively more general classes of functions. In all cases, the hypothesis that f is measurable and finite always applies.

Case 1: \int|f|<\infty.
Fix \eps>0. By the previous proposition, for every positive integer n there is a continuous function g_{n} such that \int|f-g_{n}|\leq\eps_{n}, where \eps_{n}>0 will be specified later. Define

    \[ E_{n}=\{x\in\mathbb{R}^{d}:|f(x)-g_{n}(x)|\geq1/n\}. \]

By Markov’s Inequality, m(E_{n})\leq n\int|f-g_{n}|\leq n\eps_{n}. Then m(\bigcup_{n}E_{n})\leq\sum_{n}n\eps_{n}, which is at most \eps if we specify \eps_{n}:=\frac{\eps}{n2^{n}}. Let A:=\mathbb{R}^{d}\backslash\bigcup_{n}E_{n}. Then by construction, |f-g_{n}|<1/n on A for every n, i.e., g_{n} converges to f uniformly on A. The restriction of each g_{n} to A remains continuous, and since continuity is preserved under uniform limits, f_{|A} is continuous as well.

Case 2: f is a bounded function.
Consider the “horizontal” truncation of f: f_{N}(x):=f(x)1_{\{|x|\leq N\}}, which cuts f off outside the ball of radius N. Since f is bounded, f_{N} is bounded and \int|f_{N}|<\infty. By Case 1, there exists some A_{N} such that m(\mathbb{R}^{d}\backslash A_{N})\leq\eps2^{-N}, with f_{N} continuous when restricted to A_{N}. Let A=\bigcap_{N}A_{N}; then m(\mathbb{R}^{d}\backslash A)\leq\eps, and every truncation f_{N} is continuous when restricted to A. Since continuity is a local property and f agrees with f_{N} on the open ball \{|x|<N\}, every x\in A has a neighborhood on which f agrees with some f_{N}, so f is continuous on A as well.

Case 3: General case.
Consider the “vertical” truncation of f: f^{(N)}(x):=\max(\min(f(x),N),-N), which clips the values of f to [-N,N]. Since f^{(N)} is bounded, by Case 2, there is some A_{N} such that m(\mathbb{R}^{d}\backslash A_{N})\leq\eps2^{-N}, with f^{(N)} continuous when restricted to A_{N}. As in Case 2, we define A=\bigcap_{N}A_{N}, with m(\mathbb{R}^{d}\backslash A)\leq\eps, so that every truncation f^{(N)} is continuous when restricted to A. Now fix x\in A. Since f(x) is finite, pick N>|f(x)|. By the continuity of f^{(N)}_{|A} at x, every y\in A sufficiently close to x satisfies |f^{(N)}(y)-f(x)|<N-|f(x)|, hence |f^{(N)}(y)|<N, which forces f(y)=f^{(N)}(y). Thus f agrees with f^{(N)} near x within A, so f_{|A} is continuous at x; as x\in A was arbitrary, f is continuous on A as well.

One may extend Lusin’s Theorem to an abstract setting where X is equipped with a Radon measure, so that the measure is compatible with the topology of X and continuity can be defined appropriately.

Third Principle

(Egorov’s Theorem) Let X\subset\mathbb{R}^{d} be measurable with m(X)<\infty. Let f_{n}:X\rightarrow\mathbb{R} be a sequence of measurable functions with f_{n} converging to f pointwise as n\rightarrow\infty. Then for every \eps>0, there exists a measurable subset A\subset X with m(X\backslash A)\leq\eps such that f_{n} converges to f uniformly on A.
Egorov’s Theorem also implies Lusin’s Theorem, the Second Principle: if f is measurable, then by an alternate definition of measurability, f is the pointwise limit of simple functions, and simple functions can in turn be approximated by continuous functions. By Egorov’s Theorem, a suitable sequence of these continuous approximations converges to f uniformly outside a subset E of small measure. Since continuity is preserved under uniform convergence, f is continuous when restricted to the complement of E.
By definition, f_{n} converges to f uniformly on A iff for every positive integer m, there is some positive integer N_{m} such that for every n\geq N_{m} and every x\in A, |f_{n}(x)-f(x)|<1/m. Let

    \[ A_{m,N}=\{x\in X:\forall n\geq N,\ \big|f_{n}(x)-f(x)\big|<1/m\}. \]

We see that f_{n} converges to f uniformly on A:=\bigcap_{m}A_{m,N_{m}}, where N_{m} will be chosen later: indeed, for every m and every n\geq N_{m}, every x\in A lies in A_{m,N_{m}}, so |f_{n}(x)-f(x)|<1/m. First note that A_{m,N} is measurable: the pointwise limit of measurable functions is measurable, so f and thus each f_{n}-f is measurable, implying that A_{m,N}=\bigcap_{n\geq N}(f_{n}-f)^{-1}((-1/m,1/m)) is a countable intersection of measurable sets, hence measurable. Since A is a countable intersection of the A_{m,N_{m}}, A is also measurable. Second, note that A_{m,1}\subset A_{m,2}\subset\ldots is an increasing sequence, so the complements X\backslash A_{m,1}\supset X\backslash A_{m,2}\supset\ldots form a decreasing sequence. Since f_{n} converges to f pointwise, \bigcap_{n}(X\backslash A_{m,n})=\emptyset. Since m(X)<\infty, by the downward monotone convergence of measure, m(X\backslash A_{m,n})\rightarrow0 as n\rightarrow\infty. Thus, there exists some N_{m} such that m(X\backslash A_{m,N_{m}})<\eps\cdot2^{-m}.

Then

    \begin{align*} m(X\backslash A) & =m\left(\bigcup_{m}X\backslash A_{m,N_{m}}\right)\\  & \leq\sum_{m}m(X\backslash A_{m,N_{m}})\\  & \leq\sum_{m}\frac{\eps}{2^{m}}\\  & =\eps, \end{align*}

where the lines follow by the definition of A, sub-additivity of measure, construction of N_{m}, and geometric series, respectively.
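
A numerical sketch of the theorem, using the standard example f_{n}(x)=x^{n} on [0,1]: the sequence converges pointwise (to 0 on [0,1) and to 1 at x=1), uniform convergence fails on all of [0,1], but it holds after discarding a set of measure \delta, e.g., on A=[0,1-\delta].

    import numpy as np

    xs = np.linspace(0.0, 1.0, 100001)
    delta = 0.01
    A = xs[xs <= 1 - delta]               # discard a set of measure about delta
    for n in (10, 100, 1000):
        fn = xs**n
        print(n, fn[:-1].max(), (A**n).max())
        # second column: sup over grid points of [0, 1), stays near 1 (the true sup is 1 for every n)
        # third column:  sup over A = [0, 1 - delta], tends to 0, i.e., uniform convergence on A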

It is easy to extend the above setting directly to an abstract one, with the proof following verbatim as before, except replacing the Lebesgue measure m by an arbitrary measure \mu:

(Abstract Egorov’s Theorem) Let (X,\mathcal{B},\mu) be a measure space with \mu(X)<\infty. Let f_{n}:X\rightarrow\mathbb{R} be a sequence of measurable functions, with f_{n} converging to f pointwise as n\rightarrow\infty. Then for every \eps>0, there exists a measurable subset A\subset X with \mu(X\backslash A)\leq\eps such that f_{n} converges to f uniformly on A.
When the measure space is a probability space, the finiteness of the measure holds trivially. Since random variables are measurable by definition, Egorov’s Theorem states that a pointwise convergent sequence of random variables converges uniformly outside an event of arbitrarily small probability, i.e., it is almost uniformly convergent.

Lastly, we can lift the restriction that the domain of f_{n} has finite measure, but uniform convergence must still be local.

(Egorov’s Theorem II) Let f_{n}:\mathbb{R}^{d}\rightarrow\mathbb{R} be a sequence of measurable functions with f_{n} converging to f pointwise as n\rightarrow\infty. Then for every \eps>0, there exists a measurable subset A\subset\mathbb{R}^{d} with m(\mathbb{R}^{d}\backslash A)\leq\eps such that f_{n} converges to f locally uniformly within A, i.e., for every bounded subset B\subset A, f_{n} converges to f uniformly on B.
We will simply describe how to modify the preceding proof. For every positive integer m, let B_{m} be the ball of radius m centered at the origin. Define E_{m,n}=B_{m}\bigcap(\mathbb{R}^{d}\backslash A_{m,n}). The sets E_{m,n} decrease to the empty set as n\rightarrow\infty and have finite measure (they sit inside B_{m}), so the same argument as before yields some N_{m} such that m(E_{m,N_{m}})\leq\eps2^{-m}. Then letting E=\bigcup_{m}E_{m,N_{m}}, E has measure at most \eps, and we take A=\mathbb{R}^{d}\backslash E. By definition,

    \[ \mathbb{R}^{d}\backslash E=\bigcap_{m}\left(A_{m,N_{m}}\cup\left(\mathbb{R}^{d}\backslash B_{m}\right)\right), \]

so any bounded B\subset A must be inside B_{m_{0}} for some m_{0}, and hence B\subset A_{m,N_{m}} for every m\geq m_{0}: for such m, the points of B lie in B_{m}, so membership in the right-hand side forces membership in A_{m,N_{m}}. In particular, for every m\geq m_{0}, every n\geq N_{m}, and every x\in B, |f_{n}(x)-f(x)|<1/m, i.e., f_{n} converges to f uniformly on B.

In both variations of Egorov’s Theorem, the finiteness or boundedness restriction is necessary. Otherwise, when d=1, consider f_{n}(x)=1_{\{x\in[n,n+1)\}}, which converges to the zero function pointwise, but does not converge uniformly to zero on any set whose complement has measure less than 1, since every interval [n,n+1) must intersect such a set. One can see that the mass of this function “escapes” to horizontal infinity. This is exemplified in the proof above, where convergence need not be uniform on \mathbb{R}^{d}\backslash E=\bigcap_{m}\left(A_{m,N_{m}}\cup(\mathbb{R}^{d}\backslash B_{m})\right): a sequence of points x_{n}\in[n,n+1) can all reside in \mathbb{R}^{d}\backslash E yet remain outside of \bigcap_{m}A_{m,N_{m}}.
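
A tiny sketch of this escaping-mass phenomenon: at any fixed point the sequence is eventually 0, yet for every n there is a point of [n,n+1) where f_{n} equals 1, so uniform convergence fails off any set of measure less than 1.

    def f(n, x):
        """The indicator of [n, n+1) evaluated at x."""
        return 1.0 if n <= x < n + 1 else 0.0

    x0 = 3.5
    print([f(n, x0) for n in range(8)])        # 1.0 only at n = 3: f_n(x0) -> 0 pointwise
    print([f(n, n + 0.5) for n in range(8)])   # all 1.0: sup_x |f_n(x) - 0| = 1 for every n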