
# Exponential sum and the equidistribution theorem

Consider the sequence $(\{n\alpha\})_{n \ge 1}$, where $\{x\}$ gives the fractional part of a real number $x$. The equidistribution theorem states that $\alpha$ is irrational iff for any sub-interval $(a, b)$ of the unit interval, for sufficiently large $N$, roughly $(b - a)N$ of the numbers $\{\alpha\}, \{2\alpha\}, \ldots, \{N\alpha\}$ fall inside $(a, b)$. More precisely,

(Equidistributed sequence) A sequence $(x_n)_{n \ge 1}$ with $x_n \in [0, 1)$ is equidistributed mod 1 if for every $0 \le a < b \le 1$,

$$\lim_{N \to \infty} \frac{\#\{1 \le n \le N : x_n \in (a, b)\}}{N} = b - a.$$

(Equidistribution theorem) The sequence $(\{n\alpha\})_{n \ge 1}$ is equidistributed mod 1 iff $\alpha$ is irrational.

Let us try to understand why the sequence behaves differently depending on whether $\alpha$ is rational or not. If $\alpha = p/q$ for some integers $p$ and $q$, then roughly $1/q$ of the first $N$ multiples of $\alpha$ are integers, i.e., $\{n\alpha\} = 0$. In fact, it is not difficult to see that for each $r \in \{0, 1, \ldots, q - 1\}$, a $1/q$ fraction of the sequence equals $\{rp/q\}$. So equidistribution does not occur when $\alpha$ is rational.

Now for $\alpha = p/q$, consider the exponential function $e^{2\pi i x}$, which has period $1$. As discussed above, roughly $1/q$ of the sequence $(e^{2\pi i n \alpha})_{n \ge 1}$ equals $1$, so the exponential sum $\sum_n e^{2\pi i n \alpha}$ contains an infinite number of ones. For example, if $\alpha = 1/2$, the exponential sum evaluates to $-1 + 1 - 1 + \cdots$, which is Grandi’s series (up to sign) and diverges. For an arbitrary rational $\alpha = p/q$, the exponential sum is a repeating sum of all $q$-th roots of unity, and one can use the same technique showing that Grandi’s series diverges to establish that this exponential sum diverges too. However, the case when $\alpha$ is irrational differs: every nonzero integer multiple of $\alpha$ is a non-integer, and thus $e^{2\pi i n \alpha} \neq 1$ for any $n \neq 0$. Writing $z = e^{2\pi i \alpha}$, the partial sums form a geometric series $\sum_{n=1}^{N} z^n = \frac{z(1 - z^N)}{1 - z}$, which stays bounded by $\frac{2}{|1 - z|}$ since $z \neq 1$.
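These averaged exponential sums are easy to probe numerically. The sketch below is our own illustration (the helper name `weyl_average` is ours, not from the text): for rational $\alpha = 3/7$ the average at frequency $k = 7$ is stuck at $1$, while for $\alpha = \sqrt{2}$ it decays for every nonzero $k$.

```python
import cmath
import math

def weyl_average(alpha, k, N):
    """(1/N) * sum_{n=1}^{N} exp(2*pi*i*k*n*alpha), the average in Weyl's criterion."""
    total = sum(cmath.exp(2j * math.pi * k * n * alpha) for n in range(1, N + 1))
    return total / N

# Rational alpha = 3/7 at frequency k = 7: every term is exp(2*pi*i*3n) = 1,
# so the average is identically 1 and never tends to 0.
rational = abs(weyl_average(3 / 7, 7, 2000))

# Irrational alpha = sqrt(2): the geometric-sum bound 2 / (N * |1 - e^{2*pi*i*k*alpha}|)
# forces the average toward 0 for every nonzero frequency k.
irrational = abs(weyl_average(math.sqrt(2), 7, 2000))
```

Increasing $N$ drives the irrational-case average toward zero at rate $O(1/N)$, matching the bound above.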

In fact, the above intuition can be summarized into the following criterion due to Weyl:

(Weyl’s criterion) A sequence $(x_n)_{n \ge 1}$ is equidistributed mod 1 iff for all integers $k \neq 0$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n} = 0.$$

From the above discussion, Weyl’s criterion immediately implies the equidistribution theorem.

(of equidistribution theorem) Suppose $\alpha$ is rational, i.e., $\alpha = p/q$ for some integers $p$ and $q$. Without loss of generality, we assume $q \ge 2$. Then for all integers $n$, by Euclid’s algorithm, $\{n\alpha\}$ is in $\{0, 1/q, \ldots, (q-1)/q\}$. So the sequence is not equidistributed, e.g., no multiples of $\alpha$ fall inside $(0, 1/q)$.

Now suppose $\alpha$ is irrational. Since $e^{2\pi i k \alpha} \neq 1$ for every nonzero integer $k$, we have

$$\left| \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k n \alpha} \right| = \frac{1}{N} \left| \frac{e^{2\pi i k \alpha} (1 - e^{2\pi i k N \alpha})}{1 - e^{2\pi i k \alpha}} \right| \le \frac{2}{N \, |1 - e^{2\pi i k \alpha}|},$$

which tends to $0$ as $N \to \infty$.

Before proving Weyl’s criterion, it is instructive to state the notion of equidistribution analytically. For an interval $[a, b] \subseteq [0, 1]$, let $\chi_{[a,b]}(x) = 1$ iff $x \in [a, b]$, and $0$ otherwise. Note that, extended by $\chi_{[a,b]}(x) = \chi_{[a,b]}(\{x\})$, it is a periodic function on all of $\mathbb{R}$.
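This interval-counting formulation is easy to check empirically; the following sketch (our own, with assumed helper names) estimates the fraction of the points $\{n\sqrt{2}\}$ falling in $(0.2, 0.5)$:

```python
import math

def fractional_parts(alpha, N):
    """The points {n * alpha} for n = 1..N."""
    return [(n * alpha) % 1.0 for n in range(1, N + 1)]

a, b, N = 0.2, 0.5, 100000
xs = fractional_parts(math.sqrt(2), N)
# average of the interval function chi_[a,b] along the sequence;
# equidistribution predicts it approaches b - a = 0.3
frac = sum(a < x < b for x in xs) / N
```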

(Weyl’s criterion restated) Let $(x_n)_{n \ge 1}$ be a sequence in $[0, 1)$. The following two statements are equivalent:
1) For every interval $[a, b] \subseteq [0, 1]$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_{[a,b]}(x_n) = \int_0^1 \chi_{[a,b]}(x)\,dx.$$

2) For all integers $k \neq 0$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n} = 0 = \int_0^1 e^{2\pi i k x}\,dx.$$

Written in this form, Weyl’s criterion asserts that if the average of every interval function along a sequence converges to its integral, then the average of every exponential function converges to its integral as well, and vice versa. The reason is that the space of interval functions and the space of exponential functions can approximate one another. More precisely, an exponential function is clearly periodic and continuous (and thus uniformly continuous), and every uniformly continuous periodic function can be approximated by a finite sum of interval functions. For the other direction, a periodic interval function can be approximated by a periodic, continuous function, which by Fejér’s Theorem can in turn be approximated by a finite sum of exponential functions. Now we have all the machinery to write down a proof.

(of Weyl’s criterion) For the forward direction (1) => (2), fix $k \neq 0$ and $\epsilon > 0$. Since the exponential function $e^{2\pi i k x}$ is continuous on the closed interval $[0, 1]$, it is also uniformly continuous, i.e., there exists some $\delta > 0$ such that $|e^{2\pi i k x} - e^{2\pi i k y}| < \epsilon$ for every $x, y$ with $|x - y| < \delta$. Let $g$ be a step function that agrees with $e^{2\pi i k x}$ at the left endpoint of each sub-interval of length $\delta$. By construction, $g$ is a finite sum of interval functions within $\epsilon$ of $e^{2\pi i k x}$ in the sup norm.

By linearity, we have

By taking $N$ sufficiently large and repeatedly applying the triangle inequality, we have

implying that $\frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n}$ converges to $\int_0^1 e^{2\pi i k x}\,dx = 0$ as $N \to \infty$.

Now we consider the reverse direction (2) => (1). Fix an interval $[a, b] \subseteq [0, 1]$ and let $\epsilon > 0$. Define

Note that this function is continuous, periodic, and dominates $\chi_{[a,b]}$. By Fejér’s Theorem, there exists a finite sum of exponential functions approximating it to within $\epsilon$ in the sup norm. By linearity, we have

By taking $N$ sufficiently large and repeatedly applying the triangle inequality, we have

implying that

Since this function dominates $\chi_{[a,b]}$, we have

One can similarly define a periodic, continuous function that approximates $\chi_{[a,b]}$ from below, showing that $\frac{1}{N} \sum_{n=1}^{N} \chi_{[a,b]}(x_n)$ is correspondingly bounded below in the limit, which together with the preceding argument shows that the limit must exist and converge to $b - a$.

# Summability, Tauberian theorems, and Fourier series

Consider a periodic function $f$. Its Fourier series always exists formally, but the question of whether the Fourier series converges back to $f$ can be rather subtle. In general, $f$ may be different from its Fourier series: a Fourier coefficient is the evaluation of an integral, so $f$ can be changed pointwise without affecting the value of any Fourier coefficient. It may be natural to believe that convergence holds when $f$ is continuous. However, an example constructed by du Bois-Reymond showed that continuity is not sufficient. The most general result in this area is Carleson’s theorem, which shows that $L^2$ functions have Fourier series converging almost everywhere.

Here we will prove that the Fourier series of $f$ converges to $f$ in the sense of Cesàro (or Abel), and we will highlight through a Tauberian-type theorem that convergence holds when $f$ is “smooth”. In particular, the main theorem we will show is that for a periodic, continuous function, its Fourier series converges pointwise if its Fourier coefficients decay at a linear rate.

If $f$ is periodic with Fourier coefficients $\hat f(n) = o(1/|n|)$ and $f$ is continuous at $x$, then $S_N(f)(x) \to f(x)$ as $N \to \infty$.
The sufficient condition in the above theorem can be improved to $\hat f(n) = O(1/|n|)$, but we won’t prove the stronger statement here.

What types of functions are characterized by decaying Fourier coefficients?
A simple calculation with integration by parts shows that if $f$ is differentiable, then $\hat f(n) = \frac{\widehat{f'}(n)}{in}$. Since $\widehat{f'}(n) \to 0$ by the Riemann–Lebesgue lemma, we have the following:
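This decay is visible numerically. The sketch below is our own (it approximates coefficients by an equally spaced Riemann sum under the $2\pi$-periodic convention, on the smooth test function $e^{\sin x}$) and watches $n\,|\hat f(n)|$ shrink, consistent with $\hat f(n) = o(1/n)$ for differentiable $f$:

```python
import numpy as np

def fourier_coeff(f, n, samples=4096):
    """Approximate (1/2pi) * int_0^{2pi} f(x) e^{-inx} dx by an equally spaced
    Riemann sum, which is very accurate for smooth periodic f."""
    x = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    return np.mean(f(x) * np.exp(-1j * n * x))

f = lambda x: np.exp(np.sin(x))  # smooth and 2*pi-periodic
# n * |c_n| decreases rapidly for a differentiable (here, smooth) function
decay = [n * abs(fourier_coeff(f, n)) for n in (5, 10, 20)]
```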

If $f$ is periodic and differentiable, then its Fourier series converges pointwise to $f$.

In general, convergence also holds for other types of smoothness conditions.

## Summability

We first take a detour (without the Fourier analysis setting) to investigate the different notions of summability for a sequence. The standard and familiar notion of summability captures the intuitive notion that as one adds more numbers from a sequence, the sum gets closer to its limit. More precisely,

A sequence $(a_n)_{n \ge 1}$ is summable to $s$ if its sequence of partial sums $(s_N)$, where $s_N = a_1 + \cdots + a_N$, converges to $s$.

Under this definition, the partial sums of Grandi’s series $1 - 1 + 1 - 1 + \cdots$ oscillate between $1$ and $0$, and thus the series diverges. However, one could argue that this series “hovers” around $1/2$, since the partial sums oscillate between $1$ and $0$. In fact, the averages of the partial sums converge to $1/2$. This motivates the following alternate definition of summability:

A sequence $(a_n)$ is Cesàro-summable to $s$ if its sequence of Cesàro means $(\sigma_N)$ converges to $s$, where $\sigma_N = \frac{s_1 + \cdots + s_N}{N}$ and $s_n$ is the $n$-th partial sum $a_1 + \cdots + a_n$.
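The Cesàro means of Grandi’s series are simple to compute directly; here is a small sketch (the helper name is our own):

```python
def cesaro_means(terms):
    """Running Cesàro means sigma_N = (s_1 + ... + s_N) / N of the partial sums s_n."""
    means, s, acc = [], 0, 0
    for n, a in enumerate(terms, start=1):
        s += a        # partial sum s_n
        acc += s      # running total s_1 + ... + s_n
        means.append(acc / n)
    return means

# Grandi's series 1 - 1 + 1 - 1 + ...: partial sums oscillate between 1 and 0,
# but the Cesàro means settle at 1/2.
grandi = [(-1) ** n for n in range(10000)]
m = cesaro_means(grandi)[-1]
```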

It is instructive to rewrite the Cesàro mean as

$$\sigma_N = \sum_{n=1}^{N} \left( 1 - \frac{n-1}{N} \right) a_n.$$

We can see that the Cesàro mean is a weighted average of the $a_n$’s, where the later terms are discounted by a linear factor. With this insight, it is intuitive to conclude that if a sequence is summable, then it is also Cesàro-summable: a summable sequence must have a vanishing “tail”, implying that its Cesàro means form a Cauchy sequence and thus converge.

We can consider another definition of summability where the discount factor is exponential:

A sequence $(a_n)$ is Abel-summable to $s$ if the series $A(r) = \sum_n a_n r^n$ converges for every $r \in (0, 1)$ and $A(r) \to s$ as $r \to 1^-$.
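Abel sums can be probed by truncating $\sum_n a_n r^n$ for $r$ close to $1$ (our own sketch below); for Grandi’s series, $A(r) = \sum_{n \ge 0} (-r)^n = \frac{1}{1 + r}$, so the values head toward $1/2$:

```python
def abel_partial(a, r, terms=100000):
    """Truncation of A(r) = sum_n a(n) * r^n; the Abel sum is the limit as r -> 1-."""
    return sum(a(n) * r ** n for n in range(terms))

grandi = lambda n: (-1) ** n  # 1 - 1 + 1 - ...
# A(r) = 1/(1 + r), which tends to 1/2 as r -> 1-
vals = [abel_partial(grandi, r) for r in (0.9, 0.99, 0.999)]
```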

Intuitively, Abel summability ought to capture a larger class of sequences. In fact, one can summarize this entire section in the following:

If a sequence is summable to $s$, then it is Cesàro-summable to $s$. If a sequence is Cesàro-summable to $s$, then it is also Abel-summable to $s$.
Suppose first that $(a_n)$ is summable to $s$. For every $\epsilon > 0$, there exists $N_0$ such that for every $n \ge N_0$, $|s_n - s| < \epsilon$. Then for every $N \ge N_0$, we have

and when is sufficiently large, ,
implying that as .

Now suppose that converges to . Define . Then we can write

Since , using a similar calculation as above, we can write

Since converges (and is thus bounded), converges for every and so does .

Now it remains to show that . Let , then there exists such that for each , . We can split into two sums:

Since one can exchange limit with finite sums, is . So

A similar argument shows that the other sum is small as well, and hence the sequence is Abel-summable to $s$.

## A Tauberian-type theorem

In the previous section, we showed that the class of summable sequences is a subset of the Cesàro-summable sequences, which in turn is a subset of the Abel-summable sequences. It is not difficult to see that these containments are strict. However, by imposing certain conditions, these containments can also be reversed, and statements of this type are called “Tauberian-type theorems”. Here we prove a simple version where the magnitude of the sequence decays at a linear rate.

(Tauberian) Suppose the sequence $(a_n)$ satisfies $n a_n \to 0$ as $n \to \infty$.
(a) If $(a_n)$ is Cesàro-summable to $s$, then it is also summable to $s$.
(b) If $(a_n)$ is Abel-summable to $s$, then it is also summable to $s$.
(a) Fix $\epsilon > 0$. Let $\sigma_N$ and $s_N$ be the Cesàro mean and the partial sum of $(a_n)$, respectively. By assumption, when $N$ is sufficiently large, $|\sigma_N - s| < \epsilon$, and by the triangle inequality, $|s_N - s|$ is at most $|s_N - \sigma_N| + \epsilon$, so it suffices to prove that $|s_N - \sigma_N|$ is small.

Recall that in the previous section, we proved that

Then we have

By assumption, there is some $N_1$ such that for every $n \ge N_1$, $n |a_n| \le \epsilon$, implying $|s_N - \sigma_N| \le \frac{1}{N} \sum_{n < N_1} (n-1) |a_n| + \epsilon$. Then when $N$ is sufficiently large, the first term is at most $\epsilon$, proving that $|s_N - \sigma_N| \le 2\epsilon$ as desired.

(b) Let . Define . By assumption, when is sufficiently large, , and by triangle inequality, is at most , so it suffices to prove that . By triangle inequality again, we have

By assumption, there is some such that for every , . Again we can break the above sum into two, and for the later terms, we have

For the initial terms, we have

where the last inequality is due to Bernoulli’s inequality, and is bounded above by when is sufficiently large. Together we have as desired.

## Fourier series

We now come back to the Fourier setting. First recall that

For a periodic function $f$, we define its $N$-th partial Fourier sum to be $S_N(f)(x) = \sum_{|n| \le N} \hat f(n) e^{inx}$, with $N \ge 0$, and $\hat f(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int}\,dt$ the $n$-th Fourier coefficient.

We now introduce the Dirichlet and Fejér kernels; convolving them with a function $f$ yields the partial sums and Cesàro means of the Fourier series of $f$, letting us invoke the Tauberian theorem.

$D_N(x) = \sum_{n=-N}^{N} e^{inx}$ is the $N$-th Dirichlet kernel, and $F_N(x) = \frac{1}{N} \sum_{n=0}^{N-1} D_n(x)$ is the $N$-th Fejér kernel.
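With these formulas, the kernel properties used below can be verified numerically (our own sketch): the Fejér kernel averages to $1$ over a period and is nonnegative, while the Dirichlet kernel dips below zero.

```python
import numpy as np

def dirichlet(N, x):
    """D_N(x) = sum over |n| <= N of e^{inx} (real-valued by symmetry)."""
    return sum(np.exp(1j * n * x) for n in range(-N, N + 1)).real

def fejer(N, x):
    """F_N(x) = (1/N) * (D_0(x) + ... + D_{N-1}(x))."""
    return sum(dirichlet(n, x) for n in range(N)) / N

x = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
FN = fejer(8, x)
mean_FN = np.mean(FN)             # approximates (1/2pi) * integral over a period; equals 1
min_FN = FN.min()                 # Fejér kernel is nonnegative everywhere
min_DN = dirichlet(8, x).min()    # Dirichlet kernel is not
```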
(Tauberian theorem restated) Let $f$ be a periodic function with $\hat f(n) = o(1/|n|)$. Then for any $x$, if $(F_N * f)(x) \to s$ for some $s$ as $N \to \infty$, then $S_N(f)(x) \to s$ as $N \to \infty$.
For any nonnegative integer $n$, let $a_n = \hat f(n) e^{inx} + \hat f(-n) e^{-inx}$ (with $a_0 = \hat f(0)$). Note that by assumption $n a_n \to 0$. By the Convolution Theorem, the $N$-th partial sum of $\sum_n a_n$ is simply

and similarly, the -th Cesàro mean of is

where the second line follows from the linearity of convolution. Thus, by the Tauberian theorem, if $(F_N * f)(x)$ converges, then $S_N(f)(x)$ converges to the same value as well.

To prove the main theorem, it now suffices to show that $(F_N * f)(x)$ approaches $f(x)$ at every point of continuity $x$. We make one last definition:

Let $(K_N)_{N \ge 1}$ be a family of periodic functions. We say that $(K_N)$ is a family of good kernels if the following three conditions all hold:
(a) for each $N$, $\frac{1}{2\pi} \int_{-\pi}^{\pi} K_N(x)\,dx = 1$,
(b) there is a constant $M$ such that for each $N$, $\int_{-\pi}^{\pi} |K_N(x)|\,dx \le M$, and
(c) for every $\delta > 0$, $\int_{\delta \le |x| \le \pi} |K_N(x)|\,dx \to 0$ as $N \to \infty$.

The idea is that a family of good kernels essentially approximates the Dirac delta function, which places a single point of infinite mass at $0$ and vanishes everywhere else.

The Fejér kernels form a family of good kernels.

We will skip the proof of this fact, as it is a straightforward calculation once one establishes the trigonometric identity $F_N(x) = \frac{1}{N} \cdot \frac{\sin^2(Nx/2)}{\sin^2(x/2)}$. (However, the Dirichlet kernels cannot be a family of good kernels; otherwise every Fourier series would converge to its function, which is not true.) Intuitively, since $(K_N * f)(x)$ is a weighted average of $f$ with the highest weights around $x$ (by the definition of a family of good kernels), we can expect that this value is close to $f(x)$, as long as $f$ does not change its value too much around $x$. Now we formalize and prove this intuition, which will finish the proof of our main theorem.

Suppose $(K_N)$ is a family of good kernels. Let $f$ be a periodic, integrable function that is continuous at $x$. Then $(K_N * f)(x) \to f(x)$ as $N \to \infty$.
We first write

where the first through third lines follow from the definition of convolution, Condition (a) of a family of good kernels, and the triangle inequality, respectively.

The idea is that when integrating over $y$, the kernel $K_N(y)$ is small if $y$ is bounded away from $0$, while $|f(x - y) - f(x)|$ is small if $y$ is close to $0$. Formally, let $\epsilon > 0$. Since $f$ is continuous at $x$, there is some $\delta > 0$ such that

for some constant $M$ by Condition (b) of a family of good kernels. Since $\epsilon$ can be arbitrary, the above integral is $O(\epsilon)$.

Now for the other values of $y$, note that since $f$ is integrable, the function $|f(x - y) - f(x)|$ is bounded by some constant $B$. Then

which tends to $0$ as $N \to \infty$ by Condition (c) of a family of good kernels, finishing the proof.

In the previous section, we showed that the Abel means can play a similar role to the Cesàro means. Similarly, one can define the Poisson kernel and show that it is a family of good kernels, so as to prove the main theorem through Abel summation instead.

# The Basel problem via Fourier series

The Basel problem asks to compute the sum of the reciprocals of the squares of the natural numbers, and it is known that

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

While there are numerous known proofs (our presentation resembles Proof 9 most closely), we give a self-contained one that motivates the usage of Fourier analysis for periodic functions.
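As a quick numeric check of the identity before the proof (our own sketch): the tail of $\sum 1/n^2$ beyond $N$ is roughly $1/N$, so a long partial sum approaches $\pi^2/6$ from below.

```python
import math

N = 200000
partial = sum(1 / n ** 2 for n in range(1, N + 1))
target = math.pi ** 2 / 6
# the tail sum_{n > N} 1/n^2 is about 1/N = 5e-6, so partial is that close to pi^2/6
```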

Recall that a function $f$ is $2\pi$-periodic if for every $x$, $f(x + 2\pi) = f(x)$. From Fourier analysis, every such function can be expressed as a series of cosines and sines. More precisely,

If $f$ is continuous and $2\pi$-periodic, then

$$f \sim \sum_{n=-\infty}^{\infty} c_n e_n,$$

where $e_n(x) = e^{inx}$, and $c_n$ is the $n$-th Fourier coefficient, equal to $\frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$.

We say that $\sum_n c_n e_n$ is the Fourier series of $f$. When we define the inner product of two functions to be $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \overline{g(x)}\,dx$, it is easily seen that the “characters” $e_n$ form an orthonormal system, and the $n$-th Fourier coefficient can be viewed as the projection $\langle f, e_n \rangle$ of $f$ along the $n$-th character. The Fourier theorem asserts that $f$ can be approximated in $L^2$-norm by its Fourier series. In fact, by applying the Weierstrass M-test, we can make the following statement with a stronger convergence guarantee:

If $f$ is continuous and $2\pi$-periodic and $\sum_n |c_n|$ is convergent, then its Fourier series converges uniformly to $f$.

Furthermore, the Fourier transform itself encodes many of the properties of the original function . In particular, we note that the symmetry of is preserved in its Fourier transform.

If $f$ is even, then for every $n$, $c_n = c_{-n}$.
If $f$ is odd, then for every $n$, $c_n = -c_{-n}$.

Now we have the machinery to compute .

(Of Basel) Define to be . We compute the Fourier coefficient of . First note that

For , we have

Using integration by parts, we have for every ,

Then

Thus,

From Euler’s formula, the Fourier series of can be written as

Since is even, by Proposition, . From our calculation, the Fourier series of is explicitly

Note that $\sum_n |c_n|$ is convergent by the Cauchy condensation test. Thus, by the Convergence Theorem, we conclude that

where the convergence is uniform (and in particular, pointwise). Setting , we see that

Lastly, observe that

finishing the proof.

# Auction algorithm for the assignment problem

Maximum bipartite matching is a well-known problem in graph theory, operations research, and economics. Also known as the assignment problem, it models a marketplace with buyers and items, where every buyer has a valuation for each item, and we want to match (assign) each buyer to an item, with no item being shared. For social utility, an assignment that maximizes the total valuation is ideal.

In graph theory, the Hungarian algorithm by Kuhn produces a matching in polynomial time maximizing the total weight of the edges. Independently, Bertsekas from operations research and Demange, Gale, and Sotomayor from the economics perspective both used an underlying auction to solve the same problem. The auction provides additional intuition for this problem that is not transparent when using only graph-theoretic techniques. Here we describe a simple auction algorithm, largely following the presentation from this blog post.

To formulate this problem, we consider a bipartite graph $G = (B \cup I, E)$, where $B$ represents buyers and $I$ represents items, with each edge $(i, j) \in E$ having a nonnegative weight $v_{ij}$ representing the value Buyer $i$ places on Item $j$.

A matching from $B$ to $I$ is a function $M: B \to I \cup \{\emptyset\}$ such that for all $i \neq i'$, $M(i) \neq M(i')$ or $M(i) = M(i') = \emptyset$.

The assignment problem is simply to find a matching $M$ such that its valuation $\sum_{i \in B} v_{i, M(i)}$ is maximized. (Since it is possible that some buyers may not be assigned an item, we write $M(i) = \emptyset$ for notational convenience, with $v_{i, \emptyset} = 0$.)

Below is the pseudocode of the auction algorithm. Intuitively, the algorithm simulates a multi-item auction, where we put a price tag, initialized to zero, on each item. The queue $Q$ represents the list of unmatched buyers and is initialized to all of $B$. In each iteration, a buyer is greedily matched with his most valued item net of the price tag. However, each time an item is matched, it becomes slightly more expensive, making it less valuable to other buyers in future rounds.

```
AUCTION(epsilon):
    Queue Q <- B
    for j in I:
        p[j] <- 0

    BidirectionalMap M
    while !Q.isEmpty():
        i <- Q.pop()
        j <- item in I maximizing v[i][j] - p[j]

        if v[i][j] - p[j] >= 0:
            if M.ContainsValue(j):
                Q.push(M.KeyOf(j))
            M(i) <- j
            p[j] <- p[j] + epsilon

    Output M, p
```

Consider the case when $|I| = 1$, where a group of buyers competes for a single item. Clearly whoever values the item the most will win the auction, but at what price? From the condition `v[i][j] - p[j] >= 0` in the above pseudocode, it is readily seen that a buyer drops out when the price exceeds his value. Thus, the final sale price is at most the second-highest valuation among all buyers (plus $\epsilon$). Hence, the above algorithm can be seen as a generalization of a second-price auction to multiple items.
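The pseudocode translates nearly line for line into Python. The sketch below is our own (the data layout `values[buyer][item]` for edge weights is an assumption), run on the two-buyer, one-item example just discussed: the higher bidder wins, and the price ends slightly above the second-highest valuation.

```python
from collections import deque

def auction(values, items, eps):
    """Auction algorithm; values[i][j] is Buyer i's value for Item j (absent pair = no edge)."""
    prices = {j: 0.0 for j in items}
    owner = {}    # item -> buyer currently holding it
    matched = {}  # buyer -> item
    queue = deque(values)  # all buyers start out unmatched
    while queue:
        i = queue.popleft()
        # greedily bid on the item maximizing value minus current price
        j = max(values[i], key=lambda j: values[i][j] - prices[j], default=None)
        if j is None or values[i][j] - prices[j] < 0:
            continue  # buyer is priced out of every item he is connected to
        if j in owner:  # outbid and requeue the previous holder
            prev = owner[j]
            del matched[prev]
            queue.append(prev)
        owner[j], matched[i] = i, j
        prices[j] += eps
    return matched, prices

# Buyers 1 and 2 value the single item 'a' at 10 and 8: buyer 1 wins,
# and the bidding war stops just past the second-highest valuation.
matched, prices = auction({1: {'a': 10}, 2: {'a': 8}}, ['a'], eps=0.25)
```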

Complexity. Before analyzing the running time, we first observe that the algorithm must terminate in a finite number of steps. The key insight is that a buyer will never overpay for an item. In particular, writing $v_{\max} = \max_{(i,j) \in E} v_{ij}$, we have

AUCTION($\epsilon$) terminates within $O(|B| + |I| \cdot v_{\max}/\epsilon)$ iterations.

To see this, fix an item $j$. Note that before each increment to $p_j$, the inequality $v_{ij} - p_j \ge 0$ must hold for the bidding buyer $i$. So the final price must satisfy $p_j \le v_{\max} + \epsilon$. The number of iterations of the loop that increment a price is therefore at most the total number of price increments, which is $O(|I| \cdot v_{\max}/\epsilon)$.

A trivial implementation that finds the item of maximum net value by a linear scan takes $O(|I|)$ time per iteration. A faster algorithm can be achieved by maintaining one priority queue per buyer, as follows: each time the price of Item $j$ is incremented, $d_j$ priority queues are updated, where $d_j$ is the number of buyers connected to Item $j$ in the bipartite graph. Then the total number of priority queue updates is

By the handshaking lemma, the degrees $d_j$ sum to $|E|$, the number of edges in the bipartite graph. Thus, a priority-queue-based implementation takes time at most proportional to $|E| \cdot v_{\max}/\epsilon$, up to a logarithmic factor for the priority queue operations.

Now we analyze the valuation of the matching $M$ produced by the algorithm. The proof is short and self-contained but utilizes $\epsilon$-complementary slackness, a concept related to the primal-dual method in optimization.

Let $M$ be the matching produced by AUCTION($\epsilon$) and $M'$ be any matching from $B$ to $I$. Then

$$\sum_{i \in B} v_{i, M(i)} \ge \sum_{i \in B} v_{i, M'(i)} - |B| \cdot \epsilon.$$

Since we can choose $\epsilon$ to be arbitrarily small, the theorem implies that the output matching is arbitrarily close to optimal. In particular, if all the values are binary, setting $\epsilon < 1/|B|$ implies that the algorithm terminates with a maximum matching, and its time complexity is cubic on dense graphs, on par with the Hungarian algorithm.

We say that Buyer $i$ is $\epsilon$-happy under $M$ if he is matched with an item that is $\epsilon$-close to optimal, i.e., for all $j \in I$, $v_{i, M(i)} - p_{M(i)} \ge v_{ij} - p_j - \epsilon$. For notational convenience, we will write $v_{i, \emptyset} = p_{\emptyset} = 0$, so a buyer who is completely out-priced (matched with the “empty” item) is also $\epsilon$-happy.

The key insight is that during each iteration, the algorithm preserves the loop invariant that all buyers in $B \setminus Q$ are $\epsilon$-happy. To see this, note that the invariant holds vacuously true initially. During each iteration, whenever a buyer is removed from the queue, he is always matched with an item whose price (after the update) is $\epsilon$-close to being optimal. Then the previous holder of the item, if any, is added back to the queue. Since all buyers are $\epsilon$-happy when the algorithm terminates, we can observe that for any matching $M'$,

Re-arranging, it is easy to see that

Thus, it suffices to show that $\sum_{j \in M(B)} p_j \ge \sum_{j \in M'(B)} p_j$. If Item $j$ is in the range of $M'$ but not of $M$, then the algorithm must have never incremented its price, so $p_j = 0$. Thus,

which is nonnegative.

The theorem also implies that in any marketplace, there is a set of market-clearing prices, i.e., prices under which each buyer has at most one preferred item. If all the valuations are integers, then a variant of the above algorithm from the DGS paper produces a price vector that is pointwise minimal among all market-clearing prices.

# Data analysis workflow

## Motivation

In data analysis, a REPL (read-evaluate-print loop) is great for asking questions about a dataset quickly. As opposed to writing code through a compiler, an interactive environment lets us home in on the right questions, which is of great importance for exploratory work.

A typical REPL session — e.g., using R, Pig, Hive, or Python — involves loading the dataset and performing a combination of filtering, projection, and transformation. Finding the appropriate combination may require many iterations, such as tweaking the various parameters used in filtering. The process of repeating almost the same commands can be quite tedious, especially when dealing with wraparound lines.

Some potential workflows are:
1) Write script in a text editor. Execute it on the command line.
During development, lines are typically appended to an existing script, so each script execution overlaps significantly with previous executions. This becomes a bottleneck when loading the input or running a particular transformation takes time. Executing a whole script also makes it harder to try out various transformations on the fly.

2) Write script in a text editor. Then copy and paste commands as needed into a REPL.
While this allows many iterations, the physical act of copy-and-paste becomes the dominant part of script writing, overshadowing the thought process. Furthermore, copy-and-paste from one window to another is somewhat challenging in an ssh session when working remotely.

3) Write script in a text editor. Execute the current line of the cursor in a REPL in a separate window.
This workflow allows the flexibility of experimental work. Certain IDEs, such as RStudio, already have this functionality. However, if one is looking for a uniform experience across various languages, a simple shell-based setup can be achieved with the vim+tmux+slime combination, described next.

## vim + tmux + slime

The basic idea is to split a shell into two panes divided by a horizontal line, where vim is in the top pane and a REPL session at the bottom. A keyboard shortcut then sends the current line under the cursor in vim to the REPL. (Selecting a block of lines to be evaluated is also an option.)

1) tmux
tmux has many features. Here we just focus on our particular use case. Once installed, create a new session and split the current window into two panes:

```shell
tmux new -s session-name
tmux split-window
```

2) vim-slime
Originally an Emacs plugin, vim-slime is the tool that sends a block of lines in vim directly to another tmux pane. Once installed, adding these lines to .vimrc enables this powerful trick:

```vim
let mapleader="\<Space>"
map <Leader>t :SlimuxREPLSendLine<CR>
vmap <Leader>t :SlimuxREPLSendSelection<CR>
```

In this example, in vim’s normal mode, the keystroke “space” followed by “t” sends either the current line or the highlighted block of lines, if one exists, to the other pane. (For the first usage, there will be a prompt asking which pane should receive the signal from vim.)

Lastly, the vim-tmux-navigator plugin allows seamless navigation among vim split windows and tmux panes.

Here is an animated screenshot that sends three lines in vim to be evaluated in R:

# SVD and low-rank approximation

The singular value decomposition (SVD) factorizes a matrix $A$ as $A = U \Sigma V^*$, where $U$ and $V$ are unitary and $\Sigma$ is diagonal with nonnegative entries. Unlike the eigenvalue decomposition, every (real or complex) matrix is guaranteed to have an SVD. Geometrically, the SVD implies that every linear transformation is equivalent to a rotation, a scaling, then another rotation.

In data analysis, the SVD can be computed to perform principal component analysis (PCA), where an ellipsoid is used to “fit” a data matrix. In other words, once the SVD is computed, a partial SVD, $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$, can be used to approximate the matrix, where $u_i$ and $v_i$ are the left and right singular vectors, and $\sigma_1 \ge \sigma_2 \ge \cdots$ are the singular values in decreasing order. Equivalently, a partial SVD is obtained by replacing the smaller singular values with zero in the factorization. Intuitively, with the length of each of the ellipsoid’s axes equal to its corresponding singular value, keeping only the components with non-trivial length makes the ellipsoid a good fit to the data. Formally, the Eckart–Young–Mirsky Theorem states that a partial SVD provides the best approximation to $A$ among all low-rank matrices.

Let $A$ and $B$ be any matrices, with $B$ having rank at most $k$. Then
$\|A - A_k\|_2 \le \|A - B\|_2$,
$\|A - A_k\|_F \le \|A - B\|_F$.
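The theorem is easy to check numerically; in this sketch (our own, using NumPy), a random rank-$k$ competitor never beats the partial SVD in either norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values s in decreasing order

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k partial SVD

best_spec = np.linalg.norm(A - Ak, 2)       # equals sigma_{k+1}
best_fro = np.linalg.norm(A - Ak, 'fro')    # equals sqrt(sigma_{k+1}^2 + ...)

# any other rank-k matrix (here, a random one) approximates A no better
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
```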

The theorem has a short proof for either norm, but here we describe a slightly longer proof using the spectral method. Before proceeding, since we will work with singular values of different matrices, we will write $\sigma_i(M)$ to denote the $i$-th largest singular value of a matrix $M$. We first state the following lemma, which provides an estimate on how much a low-rank perturbation of a matrix can affect its singular values.

Let $A$ and $B$ be any matrices, with $B$ having rank at most $k$. Then $\sigma_{k+i}(A) \le \sigma_i(A - B)$ for every $i \ge 1$.

The theorem is then an easy application of the above lemma.

(Of theorem) Recall that the $2$-norm of a matrix is equal to its largest singular value. So

$$\|A - A_k\|_2 = \sigma_1(A - A_k) = \sigma_{k+1}(A),$$

which by the lemma (setting $i = 1$) is at most $\sigma_1(A - B) = \|A - B\|_2$.

For the Frobenius norm, recall that the square of the Frobenius norm of a matrix is equal to the sum of the squares of its singular values. Applying the lemma above, we have

$$\|A - A_k\|_F^2 = \sum_{i \ge 1} \sigma_{k+i}(A)^2 \le \sum_{i \ge 1} \sigma_i(A - B)^2 = \|A - B\|_F^2,$$

finishing the proof.

The lemma is almost a re-statement of one of Weyl’s inequalities: for matrices $X$ and $Y$ of the same dimensions,

$$\sigma_{i+j-1}(X + Y) \le \sigma_i(X) + \sigma_j(Y).$$

To see this, let $X = A - B$, $Y = B$, and $j = k + 1$, and note that $\sigma_{k+1}(B)$ is zero because $B$ has rank at most $k$. Instead of invoking this inequality directly, we will prove the lemma from first principles.

(Of lemma) We first prove the case when $i = 1$, i.e., $\sigma_{k+1}(A) \le \sigma_1(A - B)$.
Let $V$ be the vector space spanned by the top $k + 1$ right singular vectors $v_1, \ldots, v_{k+1}$ of $A$, and let $W$ be the null space of $B$. By a dimension argument, there must exist a non-zero unit vector $w$ in $V \cap W$. Then bounding from above, we have

Bounding from below and using the facts that $w$ is orthogonal to $v_j$ when $j > k + 1$, the Pythagorean Theorem, the decreasing magnitude of singular values, and that $w$ lies in the span of the orthonormal set $\{v_1, \ldots, v_{k+1}\}$, respectively, we obtain

Combining both sides yields $\sigma_{k+1}(A) \le \sigma_1(A - B)$.

Now we prove the general case. Using the case, we know that for any matrix with rank at most ,

Writing , we can write Since (expressing as the inner product and applying Cauchy-Schwarz, where and are the top left and right singular vectors of , respectively) , we have that

Since the above inequality holds for arbitrary with rank at most , we can choose and . Then we can conclude that

completing the proof since is zero.