Suppose in an election with two candidates, we are certain of the final tally. However, as the votes are counted sequentially, the lead may switch between the candidates during the process. This was certainly the case in the 2020 U.S. presidential election, where in several swing states, different types of votes (mail, early-voting, in-person) were counted at different stages. While the final tally does not change in a recount, the lead change may. One may ask: How likely will there be lead changes in a recount? Can the winning candidate always lead from start to finish? The Ballot Theorem provides a quantitative answer to this question. (Informal) Suppose candidates and each have […]

More## Normal numbers

Imagine if we sample numbers from the unit interval and count the occurrences of each digit in every number’s infinite decimal representation. For instance, in the number , every non-zero digit appears exactly once, but appears infinitely often (trailing zeros). On the other hand, for the number , the digit occurrences (after the “dot”) is equal for every digit. As the number of samples increases, one may expect that the number of occurrences for each digit in to be close to one another. In fact, the Normal Number Theorem, first formulated by Borel, not only formalizes this intuition but goes a step further: for almost every number in , each digit appears […]

More## Littlewood’s Three Principles

In measure theory, the notion of measurability restricts sets and functions so that limit operations are sensible. With non-measurabilty manifesting only through the Axiom of Choice, measurability is a semantically rich class, and in particular, Littlewood’s Three Principles specify precisely how measurability relates back to more elementary building blocks as follows: Every measurable set is nearly open or closed. Every measurable function is nearly continuous. Every convergent sequence of measurable functions is nearly uniformly convergent. Here in this post, we flesh out these principles in detail, with an emphasis on how these various concepts approximate one another. Each section corresponds to one principle and is independent from another. We primarily focus on […]

More## Dirichlet’s Theorem for arithmetic progressions with difference 4

From antiquity, Euclid’s Theorem asserts that the number of primes is infinite. Since all primes except are odd, a re-phrasement of Euclid’s theorem is that the set of odd integers contains an infinite number of primes. A natural generalization, proved by Dirichlet, is that every arithmetic progression of the form contains an infinite number of primes, provided that and are relatively prime. (Dirichlet’s Theorem) If , then the arithmetic progression contains infinitely many primes, where denotes the greatest common divisor of and . The proof is one of the first application of Fourier analysis on finite abelian groups. Here in this post, we prove Dirichlet’s theorem when , i.e., both the arithmetic […]

More## Exponential sum and the equidistribution theorem

Consider the sequence , where gives the fractional part of a real number. The equidistribution theorem states that is irrational iff for any sub-interval of the unit interval, for sufficiently large , roughly of the numbers fall inside . More precisely, (Equidistributed sequence) A sequence with is equidistributed mod 1 if for every , (Equidistribution theorem) The sequence is equidistributed mod 1 iff is irrational. Let us try to understand why the sequence behaves differently depending on whether is rational or not. If for some integers and , then roughly of the first multiples of are integers, i.e., . In fact, it is not difficult to see that for , […]

More## Summability, Tauberian theorems, and Fourier series

Consider a periodic function . Its Fourier series always exists formally, but questions of whether its Fourier series converges to itself can be rather subtle. In general, may be different from its Fourier series: a Fourier coefficient is the evaluation of an integral, so can be changed pointwise without affecting the value of any Fourier coefficient. It may be natural to believe that convergence holds when is continuous. However, an example constructed by du Bois-Reymond showed that continuity is not sufficient. The most general result in this area is Carleson’s theorem, showing that -functions have convergent Fourier series almost everywhere. Here we will prove that the Fourier series of converges to in […]

More## The Basel problem via Fourier series

The Basel problem asks to compute the summation of the reciprocal of the natural numbers squared, and it is known that While there are numerous known proofs (our presentation resembles Proof 9 most closely), we give a self-contained one that motivates the usage of Fourier analysis for periodic functions. Recall that a function is -periodic if for every , . From Fourier analysis, every such function can be expressed as a series of cosines and sines. More precisely, If is continuous and -periodic, then where , and is the -th Fourier coefficient equal to . We say that is the Fourier series of . When we define the […]

More## Auction algorithm for the assignment problem

The maximum bipartite matching is a well-known problem in graph theory, operations research, and economics. Also known by the name assignment problem, it models a marketplace with buyers and items, where every buyer has a valuation for each item, and we want to match (assign) each buyer to an item, with no item being shared. For social utility, an assignment that maximizes the total valuation is ideal. In graph theory, the Hungarian algorithm by Kuhn produces a matching in polynomial time maximizing the total weight of the edges. Independently, Bertsekas from operations research and Demange, Gale, and Sotomayor from the economics perspective both use an underlying auction to solve the same problem. […]

More## Data analysis workflow

Motivation In data analysis, a REPL (read-evaluate-print-loop) is great for asking questions quickly about a dataset. As opposed to writing code using a compiler, an interactive environment lets us hone in on the right questions, which is of great importance for exploratory work. A typical REPL session — e.g., using R, Pig, Hive, Python — involves loading the dataset and performing a combination of filtering, projection, and transformation. Finding the appropriate combination may require many iterations, such as tweaking the various parameters used in filtering. The process of repeating almost the same commands can be quite tedious, especially dealing with wraparound lines. Some potential workflows are: 1) Write script in a text […]

More## SVD and low-rank approximation

The singular value decomposition (SVD) factorizes a matrix where are unitary and is diagonal with nonnegative entries. Unlike the eigenvalue decomposition, every (real or complex) matrix is guaranteed to have a SVD. Geometrically, the SVD implies that every linear transformation is equivalent to a rotation, a scaling, then another rotation. In data analysis, the SVD can be computed to perform principal component analysis (PCA), where an ellipsoid is used to “fit” a data matrix. In other words, once the SVD is computed, a partial SVD, can be used to approximate the matrix, where and are the left and right singular vectors, and are the singular values in decreasing order. Equivalently, a partial […]

More