Predicting the Future
Everyone claims they can’t predict the future. But mathematically, we can do better than guessing. The real problem isn’t the lack of predictive tools, it’s that most people never learned to use them.
Let’s say we’re trying to predict if it’ll rain tomorrow. Last year, during this same month, it rained 5 out of 30 days. A simple way to calculate the probability is by dividing the number of rainy days by the total number of days:
$$ \begin{aligned} &P(\text{event happening next}) = \frac{w}{n} = \frac{5}{30} \approx 17\% \\ &\text{where}\\ &\hspace{1em}w = \text{times an event happened} \\ &\hspace{1em}n = \text{total events} \end{aligned} $$

Laplace’s Rule of Succession
Laplace’s rule estimates the probability of an event happening next, given past outcomes. It adds one imaginary success and one imaginary failure to calm overconfidence. Using this method, we can predict whether it’ll rain using:
$$ \begin{aligned} &P(\text{event happening next}) = \frac{w+1}{n+2} = \frac{5+1}{30+2} \approx 19\% \\ &\text{where}\\ &\hspace{1em}w = \text{times an event happened} \\ &\hspace{1em}n = \text{total events} \end{aligned} $$

Laplace’s estimate is a simple way to deal with scarce data. The fixed addition of imaginary outcomes has a large impact when the data is small and little impact when it is large. For a low-probability event, the rule nudges the estimate up; for a high-probability event, it nudges it down. Either way, the pull is toward 50%.
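Both estimates are easy to check in code. A minimal Python sketch using the 5-of-30 rainy-day counts from above (the function names are mine, just for illustration):

```python
def raw_estimate(w, n):
    """Relative frequency: successes divided by total trials."""
    return w / n

def laplace_estimate(w, n):
    """Laplace's rule of succession: add one imaginary success
    and one imaginary failure to temper extreme estimates."""
    return (w + 1) / (n + 2)

w, n = 5, 30  # rainy days, total days observed
print(f"raw:     {raw_estimate(w, n):.1%}")      # 16.7%
print(f"laplace: {laplace_estimate(w, n):.1%}")  # 18.8%

# With no data at all, the raw estimate is undefined (division by
# zero), but Laplace's rule still gives a sensible 50%.
print(f"no data: {laplace_estimate(0, 0):.1%}")  # 50.0%
```

Notice how the imaginary outcomes matter here, with only 30 observations, but would barely move an estimate built from 30,000.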
Bayes’ Rule
Bayes’ rule tells you how to update a belief when new information arrives. It weighs prior probability (what you believed before) and likelihood (how consistent the new evidence is with that belief).
Let’s look at the cloud coverage to add another piece of data to the calculation. Let’s say the probability of clouds is 40%, and the probability of clouds given there’s rain is 50%. You might think that if clouds are out, then there’s a 50% chance of rain, but that’s not true. We can calculate the probability of rain given that clouds are out:
$$ \begin{aligned} &P(R|C) = \frac{P(C|R) P(R)}{P(C)} = \frac{50\% * 17\%}{40\%} \approx 21\%\\ &\text{where}\\ &\hspace{1em}P(R) = \text{Probability of rain}\\ &\hspace{1em}P(C) = \text{Probability of clouds}\\ &\hspace{1em}P(R|C) = \text{Probability of rain given clouds are out}\\ &\hspace{1em}P(C|R) = \text{Probability of clouds given there’s rain}\\ \end{aligned} $$

Bayes’ rule calibrates the probability to the data we have so far. Each new piece of evidence adjusts the estimate, but the prior probability still anchors the result.
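The update itself is one line of arithmetic. A minimal sketch using the numbers above (the function name is mine):

```python
def bayes_update(p_rain, p_clouds, p_clouds_given_rain):
    """Bayes' rule: P(R|C) = P(C|R) * P(R) / P(C)."""
    return p_clouds_given_rain * p_rain / p_clouds

# Prior for rain is the 5-of-30 baseline; cloud numbers are from above.
p_rain_given_clouds = bayes_update(p_rain=5 / 30,
                                   p_clouds=0.40,
                                   p_clouds_given_rain=0.50)
print(f"{p_rain_given_clouds:.0%}")  # 21%
```

The clouds raise the estimate from about 17% to about 21%, but the low baseline keeps the answer far from the naive 50%.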
The Copernican Principle
The Copernican Principle assumes you’re observing a phenomenon at a random moment in its lifetime, not at its birth or death. If it’s lasted a certain time, it’ll probably last about that long again. For example, if a theater company has operated for 20 years, it’s statistically more likely to last another 20 than to fold tomorrow or run forever. You’re seeing it mid-career, not at a unique moment. The best single guess is that its remaining lifespan equals its age so far. Things rarely vanish immediately, but endless persistence is equally rare.
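The principle fits in a few lines. A sketch of the point estimate, plus Gott’s confidence interval for the remaining lifetime (the function names are mine):

```python
def copernican_prediction(age_so_far):
    """Best single guess: total lifespan is twice the current age,
    since a random observer probably arrives near the middle."""
    return 2 * age_so_far

def copernican_interval(age_so_far, confidence=0.95):
    """Range for the *remaining* lifetime at the given confidence
    (Gott's delta-t argument): between (1-c)/(1+c) and (1+c)/(1-c)
    times the age so far.  At 95%, that's 1/39 to 39 times."""
    c = confidence
    return (age_so_far * (1 - c) / (1 + c),
            age_so_far * (1 + c) / (1 - c))

print(copernican_prediction(20))  # the 20-year-old theater company: 40 years total
lo, hi = copernican_interval(20)
print(f"remaining: {lo:.2f} to {hi:.0f} years")  # 0.51 to 780 years
```

The interval is enormous because one observation is all we have; the sections below show how prior data narrows it.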
Predicting the future using prior data
To more accurately predict how long until an event happens, we can combine the Copernican Principle with Bayes’ rule. The first thing we need is prior data, so we can see what kind of distribution the phenomenon follows.
Multiplicative Rule (Power-Law Prior)
When outcomes follow a heavy-tailed distribution, like box-office sales, we have a power-law prior and the odds multiply. If something’s already large or old, its expected lifespan scales by a factor, not by a fixed increment. A movie with $10 million in sales so far is more likely to reach $15 million than a movie with $1 million so far. The strong get stronger and often survive longer; duration and scale feed each other in power-law systems.
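As a sketch, the multiplicative rule just scales the observation. The factor of 2 below corresponds to the uninformative case (the Copernican doubling); it is an illustrative assumption, not a value fitted to real box-office data:

```python
def multiplicative_prediction(observed, factor=2.0):
    """Multiplicative rule for power-law priors: predict the final
    value by scaling what's been seen so far by a constant factor.
    factor=2.0 is the uninformative (Copernican) case; heavier or
    lighter tails would call for a different factor."""
    return observed * factor

# $10M so far -> predict $20M total; $1M so far -> predict $2M total.
print(multiplicative_prediction(10_000_000))
print(multiplicative_prediction(1_000_000))
```

The key property is that the prediction grows with the observation: a bigger present implies a proportionally bigger future, never a fixed increment.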
Average Rule (Normal Prior)
When phenomena cluster around a known average, we have a normal prior, and you can predict near that cluster. Outliers exist but are rare, and uncertainty follows a bell curve, like life expectancy. We know the average is about 75 years. A 6-year-old is likely to live to about 76, since they made it past infancy. A 90-year-old is likely to live to about 94; once past the average, the predicted total age keeps creeping up with the current age, even as the expected years remaining shrink.
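We can reproduce those numbers by computing the expected lifespan conditional on having survived to the current age, under a bell-curve prior. The mean of 75 and standard deviation of 8 here are illustrative assumptions, not actuarial data:

```python
import math

def conditional_mean_lifespan(current_age, mu=75.0, sigma=8.0,
                              grid=0.1, max_age=120.0):
    """Expected total lifespan given survival to current_age, under a
    Normal(mu, sigma) prior, by simple numeric integration over the
    ages above current_age."""
    def pdf(x):  # unnormalized normal density
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    steps = int((max_age - current_age) / grid)
    ages = [current_age + i * grid for i in range(steps)]
    weights = [pdf(a) for a in ages]
    return sum(a * w for a, w in zip(ages, weights)) / sum(weights)

print(round(conditional_mean_lifespan(6)))   # ~75: predict near the mean
print(round(conditional_mean_lifespan(90)))  # ~93: a few years past 90
```

Below the average, the prediction barely moves from the mean; above it, the prediction tracks the current age plus a shrinking margin.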
Additive Rule (Erlang Prior)
Some events are memoryless: the chance of occurrence stays constant over time, no matter how much time has passed, like a fair roulette spin. No matter how many times the wheel lands on red, the chance of landing on black is still about 50%, so the best prediction is always a fixed amount beyond the current state, regardless of history. Not everything has a pattern, and that’s hard to see when we’re zoned in on one day at the casino. Zoom out to the big picture, and it’s clear the results are just chance.
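Memorylessness is easy to check with a quick simulation (a simplified red/black wheel with no green zeros): the chance of black after a streak of reds is the same as the chance of black overall.

```python
import random

random.seed(0)  # make the run repeatable

def spin():
    """One simplified roulette spin: red or black, 50/50."""
    return random.choice(["red", "black"])

spins = [spin() for _ in range(100_000)]

# Overall frequency of black vs. frequency of black right after
# three reds in a row.
overall = spins.count("black") / len(spins)
after_streak = [spins[i] for i in range(3, len(spins))
                if spins[i - 3:i] == ["red"] * 3]
conditional = after_streak.count("black") / len(after_streak)

print(f"overall: {overall:.2f}, after three reds: {conditional:.2f}")
# both values land close to 0.50
```

The streak carries no information; believing otherwise is the gambler’s fallacy.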
Miscalibrated Priors and the News
Predictive reasoning collapses when your priors are wrong. That’s what modern media does best: distort the baseline. When every broadcast fixates on violence, you subconsciously boost your prior for danger. Plane crashes, shark attacks, lightning strikes: rare events get amplified until they feel routine. Prediction isn’t about seeing the future; it’s about correcting how we see the present. Personal experience will always dominate our worldview, but data can help set our expectations in a realistic direction.