Suppose in an election with two candidates, we are certain of the final tally. However, as the votes are counted sequentially, the lead may switch between the candidates during the process. This was certainly the case in the 2020 U.S. presidential election, where in several swing states, different types of votes (mail, early-voting, in-person) were counted at different stages. While the final tally does not change in a recount, the lead change may. One may ask: How likely will there be lead changes in a recount? Can the winning candidate always lead from start to finish? The Ballot Theorem provides a quantitative answer to this question.








The theorem matches our intuition that if the election is close, i.e., the count difference is small, then it is very likely that in a recount,
will be winning at some point, even though
will eventually overtake
‘s lead and win it all at the end. Let us formalize the statement in terms of random variables.






To see that the formal statement captures the semantics of ballot counting, imagine that all votes are in a box, and we randomly sample a ballot from a box without replacement. At time
,
if the
-th sample is for Candidate
and
if the
-th sample is for Candidate
. Then
. In the rest of this post, we will prove the Ballot Theorem using two commonly used techniques in probability, the reflection principle and martingale.
Reflection Principle
Reflection Principle is a common technique used in Brownian motion, which is a continuous stochastic process that can model the movement of particles or the trajectory of stock prices over time. The idea is that if at any time, the stock price goes up or down equally likely, then any upward trajectory of the stock price has a corresponding downward trajectory (“reflected” along the x-axis). This symmetry allows us to reduce certain questions about a sequence of random variables to just one. For instance, the Ballot Theorem shows the probability that all random variables
are positive is reduced to the single constant random variable
.
Before writing the proof formally, let us first describe the Ballot Theorem at a high level. Imagine the ballots are sorted, and we start counting the votes one by one. Suppose the first vote belongs to Candidate . Since
, at some point during counting,
must relinquish its lead and tie with
, say with
votes each. Now pause the counting (temporarily!), and pair up each of the counted votes for
with a unique counted vote for
. With this pairing, we can re-create another recount by just swapping every counted
vote with its paired
vote. Then in this other recount, the first vote belongs to Candidate
and afer
votes,
is tied with
. In other words, whenever a recount has
losing its initial lead, there is another recount where
initially loses its lead as well!
Imagine plotting the vote counts. The scenario that has an initial lead is represented by a curve that is below the x-axis and eventually hits the x-axis. The curve for the imagined recount described above is simply obtained by reflecting the original curve along the x-axis.
The probability that the first vote is for and
relinquish this lead is
(since
does not have enough votes to win in the final tally). By reflection, the probability that the first vote is for
and
relinquishes this lead is also
. So the probability that
never falls behind is
.



![Rendered by QuickLaTeX.com T=\begin{cases} \inf\{t\in[N]:\overline{X_{t}}=0\} & \text{if the set is non-empty}\\ N & \text{otherwise.} \end{cases}](https://www.victorchen.org/wp-content/ql-cache/quicklatex.com-531f649d213052cf0a58022fcabea29f_l3.png)
Note that , so it suffices to compute
. To this end, note that if
, by the minimality of
, either
or
.
Now here is the reflection principle at work.







Again by the minimality of ,
Note that . To see this, if
is such that
, then
. Conversely, if
and
, there must be some
such that
.
Thus, putting the above together, we have
And finally, since the random variables are i.i.d.,
finishing the proof.

















Martingale
In a betting system, a gambler often formulates a strategy at time based on previous observations, say the first
card draws. For example, if we see a sequence of coin tosses being head, it is tempting to bet big on the 11th toss being a tail. A martingale is essentially a strategy where the expectation valuation at time
given the observations up to
does not differ from previous valuation. Continuing the coin example, if the coin tosses are fair, then the expected winning on the 11th toss is the same as previous coin tosses.


At time , a gambler may have access to other information computed from
, not just the values of these previous random variables. Thus, we use a larger set, a
-algebra
that semantically captures all the information available up to time
(as opposed to just
. More generally,










Semantically, the first criterion just says that information at time is cumulative: it includes information available previous time. For the second criterion, by definition
,
, so it guarantees that
actually captures the information at time
. Lastly, the third criterion enforces that the betting system is fair as discussed.
In probability, many problems have a hidden martingale, and finding it often illuminates the problem more. Here is roughly how martingale is embedded inside the Ballot Theorem. Imagine we have randomly sorted ballots for a recount. At time
(the negative indices will be explained soon), we have
ballots uncounted. Since we knew the final tally, and we have just counted
ballots, we know how many of the remaining
ballots are for Candidates
and
, just not the specific order that they will come out. Thus, if we draw a random ballot from the remaining
ballots, its distribution is exactly
, implying that
is a martingale (precise calculation within the proof below).




We do not need the full machinery of backward martingale since in our application, , the number of ballots, is bounded. But for pedagogical purposes we will retain the use of negative indices to remind ourselves that
is actually a backward martingale with stronger tools should we need to avail ourselves to them.
Similar to the reflection principle, we will also make use of stopping time, but this time we will use it more extensively, so we define it carefully now.
Semantically, when represents the cumulative information at time
, any stopping time criterion (bet after five heads are tossed, fold when money is below a threshold, etc) can only depend on the information available up to time
. Stopping time is often used in conjunction with martingales. Often it is useful to stop a sequence of random variables, say the first time that
hits
. If the original sequence forms a martingale, then the sequence up to the stopping time — which itself is a random variable — remains a martingale if
is bounded. There are more general statements, but the following suffices for our purpose.









The first condition is satisfied by definition. For the second, by telescoping, we have
For each term, we have
since and
is a martingale with respect to
.
![Rendered by QuickLaTeX.com t\in[N]](https://www.victorchen.org/wp-content/ql-cache/quicklatex.com-1176f67e048c92aa8e6db5b30bb6f40e_l3.png)







On the other hand, by i.i.d. again, we have
Now let be the stopping time when
first becomes
, i.e.,
Since is a stopping time, in the else-case above,
must be
since
. Since
form a martingale, by the Martingale Optional Stopping Time Theorem,
we have
Since , by the Law of Total Expectation, we have
Evaluating both sides, and writing , we have
where the third through fifth line follow from if
, the definition of
, and that
if
, respectively.