Traditionally, the story of the study of probability begins in French gambling houses in the mid-seventeenth century. But we can start it earlier than that.

The Italian polymath Gerolamo Cardano had attempted to quantify the mathematics of dice gambling in the sixteenth century. What, for instance, would the odds be of rolling a six on four rolls of a die, or a double six on twenty-four rolls of a pair of dice?

His working went like this. The probability of rolling a six is one in six, or 1/6, or about 17 percent. Normally, in probability, we don’t give a figure as a percentage, but as a number between zero and one, which we call p. So the probability of rolling a six is p = 0.17. (Actually, 0.1666666… but I’m rounding it off.)

Cardano, reasonably enough, assumed that if you roll the die four times, your probability is four times as high: 4/6, or about 0.67. But if you stop and think about it for a moment, that can’t be right, because it would imply that if you rolled the die six times, your chance of getting a six would be one-sixth times six, or one: that is, certainty. But obviously it’s possible to roll six times and have none of the dice come up six.

What threw Cardano is that *the average number of sixes* you’ll see on four dice is 0.67. But sometimes you’ll see three, sometimes you’ll see none. The odds of seeing *at least one* six are a different thing.
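A quick simulation makes the gap concrete (a sketch in Python; the trial count and the seed are arbitrary choices of mine):

```python
import random

# Roll four dice many times: the *average* number of sixes per trial
# is about 0.67, but the share of trials with *at least one* six is lower.
random.seed(0)  # fixed seed so the run is repeatable
trials = 100_000
total_sixes = 0
trials_with_a_six = 0
for _ in range(trials):
    rolls = [random.randint(1, 6) for _ in range(4)]
    sixes = rolls.count(6)
    total_sixes += sixes
    if sixes > 0:
        trials_with_a_six += 1

print(total_sixes / trials)        # close to 0.67
print(trials_with_a_six / trials)  # close to 0.52
```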

In the case of the one die rolled four times, you’d get it badly wrong—the real answer is about 0.52, not 0.67—but you’d still be right to bet, at even odds, on a six coming up. If you used Cardano’s reasoning for the second question, though, about how often you’d see a double six on twenty-four rolls, it would lead you seriously astray in a gambling house. His math would suggest that, since a double six comes up one time in thirty-six (p ≈ 0.03), then rolling the dice twenty-four times would give you twenty-four times that probability, twenty-four in thirty-six or two-thirds (p ≈ 0.67, again).

This time, though, his reasonable but misguided thinking would put you on the wrong side of the bet. The probability of seeing a double six in twenty-four rolls is 0.49, slightly less than half. You’d lose money betting on it. What’s gone wrong?

A century or so later, in 1654, Antoine Gombaud, a gambler and amateur philosopher who called himself the Chevalier de Méré, was interested in the same questions, for obvious professional reasons. He had noticed exactly what we’ve just said: that betting that you’ll see at least one six in four rolls of a die will make you money, whereas betting that you’ll see at least one double six in twenty-four rolls of two dice will not. Gombaud, through simple empirical observation, had got to a much more realistic position than Cardano. But he was confused. Why were the two outcomes different? After all, six is to four as thirty-six is to twenty-four. He recruited a friend, the mathematician Pierre de Carcavi, but together they were unable to work it out. So they asked a mutual friend, the great mathematician Blaise Pascal.

The solution to this problem isn’t actually that complicated. Cardano had got it exactly backward: the idea is not to multiply the chance that something *would* happen by the number of goes you take, but to look at the chance it *wouldn’t* happen.

In the case of the four rolls of a single die, your chance of *not* seeing a six on any one throw is 5/6, or p ≈ 0.83. If you roll it again, your chance of not seeing a six on either throw is 0.83 times 0.83, or just shy of 0.7. Each time you roll the die, you *reduce* the chance of *not* seeing a six by 17 percent.

If you roll the die four times, your chance of *not* seeing a six is 0.83 × 0.83 × 0.83 × 0.83 ≈ 0.48. (To save time, we can say “0.83 to the power 4,” or “0.83 ^ 4.”) So your chance of *seeing* a six is 1 minus 0.48, or 0.52, or 52 percent. If you bet at even odds one hundred times, you’d expect to win fifty-two times, and you’d be in profit.
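That arithmetic is easy to check in code; a sketch in Python, working with the complement just as described:

```python
# Chance of at least one six in four rolls of a fair die,
# via the complement: the chance of missing on every single roll.
p_no_six_per_roll = 5 / 6                  # about 0.83
p_no_six_in_four = p_no_six_per_roll ** 4  # about 0.48
p_at_least_one_six = 1 - p_no_six_in_four  # about 0.52

print(round(p_at_least_one_six, 4))  # 0.5177
```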

But look what happens when we do it with the two dice, looking for a double six. Your chance of seeing a double six on one roll of two dice is 1/36, or p ≈ 0.03, as we said earlier. So your chance of *not* seeing a double six is 35/36, or about 0.97.

If you roll your dice twenty-four times, your chance of *not* seeing a double six is 35/36 multiplied by itself twenty-four times ((35/36) ^ 24). If you do that sum, you end up with about 0.51. So the chance of *seeing* a double six is about 0.49. If you bet at even odds, you’d expect to see it forty-nine times in a hundred, and you’d lose money.

(We should take a moment, here, to recognize the absolutely heroic amount of gambling that Gombaud must have been doing in order to be able to tell that his 52 percent bet was coming off, but his 49 percent bet wasn’t. Apparently, he had deduced, correctly, that you need twenty-five rolls of the dice, not twenty-four, for it to be a good bet. Gombaud was a man who enjoyed his dice-rolling.)
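The same complement trick, wrapped in a function, confirms Gombaud’s twenty-four-versus-twenty-five observation (a sketch in Python; the function name is mine):

```python
# Chance of at least one double six in n rolls of a pair of dice.
# A double six has probability 1/36 per roll, so the complement is 35/36.
def p_double_six(n):
    return 1 - (35 / 36) ** n

print(round(p_double_six(24), 4))  # 0.4914 -- a losing bet at even odds
print(round(p_double_six(25), 4))  # 0.5055 -- a winning one
```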

This led Gombaud to raise another question with Pascal. Imagine two people are playing a game of chance—cards or dice. Their game is interrupted halfway through, with one player in the lead. What’s the fairest way to divide the pot? It seems wrong to simply split it down the middle, since one person is winning; but it’s also unfair to give it all to the player in the lead, since they haven’t actually won yet.

Pascal found this fascinating, and exchanged a series of letters discussing the problem with his contemporary, Pierre de Fermat, of Last Theorem fame.

Again, this problem goes back a few centuries. The Italian monk Luca Pacioli had a go at solving something like it in 1494, in his work *Summa de arithmetica, geometria, proportioni et proportionalità*.

He imagines that two players are playing a ball game in which you win ten points for each goal, and the winner is the first person to get to sixty points. One of the players has reached fifty points, and the other has reached twenty, before the game is interrupted. How should the winnings be split?

Pacioli reasons that, since one player has scored five-sevenths of all the points so far scored, that player should win five-sevenths of the pot. Forty-five years later, the aforementioned Cardano—he who’d got the math backward on the dice problem, so could perhaps have shown a little more humility—scoffed that Pacioli’s solution was “absurd.” He imagined a slightly different scenario, where two players play a game of first to ten. One has seven points, and one has nine. In that situation, by Pacioli’s system, the first player should get nearly half the pot—seven-sixteenths—and the second player only slightly more, nine-sixteenths. But that seems obviously unfair, since one player only needs one point to win, while the other needs three.

Cardano suggested a better route. “His major insight,” writes Prakash Gorroochurn, “was that the division of stakes should depend on how many rounds each player had yet to win, not how many rounds they had already won.”

But Cardano didn’t get all the way there. He suggests using the ratio of the “progressions” of the two players’ still-required scores. The progression of a number, in his jargon, is that number, plus that number minus one, plus that number minus two, and so on down to one. So the progression of five would be 5 + 4 + 3 + 2 + 1 = 15.

In the example Cardano gave, the first player has three points still to win. The progression of three is six (3 + 2 + 1 = 6). The second player has one point still to win, and the progression of one is one (1 = 1). So, for Cardano, the pot should be divided six parts to one in favor of the second player.
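Cardano’s “progression” is just the sum of the whole numbers down from the points still needed; a sketch in Python (the function name is mine):

```python
# Cardano's "progression" of n: n + (n - 1) + ... + 1.
def progression(n):
    return sum(range(1, n + 1))

# First player still needs three points, second player needs one:
# Cardano would split the pot 6 to 1 in the second player's favor.
print(progression(3), ":", progression(1))  # 6 : 1
```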

This is better than Pacioli’s system, or at least gets you closer to the true answer. But it’s still wrong.

This is where Pascal and Fermat come into the picture. They realized the key point: It’s not how close to the finish you are, or how far from the start you’ve come, that matters. It’s *the number of possible outcomes that remain*, and how many of those outcomes favor one player over the other. Pascal, in a letter to Fermat, imagined a simple situation: two gamblers are playing a game of first to three points. They have each bet thirty-two pistoles (a gold coin used in currency at the time), so the total pot is sixty-four pistoles.

Let’s say it’s all square at two points each, and they suddenly have to end the game. In that case, reasons Pascal, it’s easy enough to divide. You just split it in half, thirty-two each.

But what if they’d had to end it one turn before, when one player had two points and the other player had one? Pascal extends the reasoning. They would have split it evenly had the game gone to two points each, so the first player is sure of at least half the pot—even if that player were to lose the next throw, they would still have that. The other half is still a going concern. “Perhaps I will have them and perhaps you will have them,” Pascal imagines the first player saying. “The risk is equal. Therefore let us divide the thirty-two pistoles in half, and give the thirty-two of which I am certain besides.” So the first player will take 32 + 16 = 48, or three-quarters of the pot.

Another way to look at it is to say that there are four possible ways the game could have gone, had it continued. Player One could have won the first throw and the second; they could have won the first throw but lost the second; they could have lost the first throw but won the second; and they could have lost the first throw and lost the second.

Only in the fourth scenario does Player Two win the pot. If Player One wins the first throw, the second throw is irrelevant: Player One has made it to three points. So half the outcomes are wins for Player One without even going to the last throw. And even if they lose that first throw, they’re still in with a fifty-fifty chance of winning.

So the fair distribution of the pot, if the two players have to stop playing with one player up two to one, is three to one, just as Pascal said.

You can expand this, and Pascal does. Imagine that Player One was winning two–nil, not two–one. If they win the next throw, they win outright. But if they lose, the score is back to two–one. And we’ve just seen that, from that point, their chance of winning the pot is 75 percent. In Pascal’s example, Player One would say: “If I win, I shall gain all; that is sixty-four. If I lose, forty-eight will legitimately belong to me. Therefore give me the forty-eight that are certain to be mine, even if I lose, and let us divide the other sixteen in half, because there is as much chance that you will gain them as that I will.”

So now Player One has a seven-eighths, or 87.5 percent, chance of winning, and the fair division is that they take fifty-six pistoles out of the sixty-four.

But how about if Player One only has *one* point, and Player Two zero? Then you extend it one step further back, said Pascal. If Player Two wins the first throw, then it’s one–all, and each player has an equal chance of winning. But if Player One wins the first throw, then it’s two–nil, and we know that situation: they have a seven-eighths chance. Out of a possible sixteen outcomes, Player One wins in eleven, so they should win eleven-sixteenths of sixty-four pistoles, or forty-four.
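Pascal’s step-by-step argument amounts to a recursion: at any score, a player’s fair share of the pot is the average of their shares after winning and after losing the next throw. A sketch in Python, assuming evenly matched players and the sixty-four-pistole pot (the function name is mine):

```python
# Pascal's division of the pot, as a recursion: a player's fair share
# at any score is the average of their shares after winning and after
# losing the next throw (each equally likely for matched players).
def share(need_one, need_two, pot=64):
    if need_one == 0:   # Player One has already won
        return pot
    if need_two == 0:   # Player Two has already won
        return 0
    return (share(need_one - 1, need_two, pot)
            + share(need_one, need_two - 1, pot)) / 2

print(share(1, 2))  # 48.0 -- leading 2-1: three-quarters of the pot
print(share(1, 3))  # 56.0 -- leading 2-0: seven-eighths
print(share(2, 3))  # 44.0 -- leading 1-0: eleven-sixteenths
```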

This is the great insight of probability theory: that we should look at the possible outcomes from a given situation, not what has gone before. But laboriously counting out the number of possible outcomes as we have above takes quite a long time, so Pascal and Fermat worked on ways of making it quicker.

When you’re trying to work out how likely something is, what you need to count is the number of outcomes. You can work it out as a sum, but it’s complicated if you have large numbers of rounds left to play. You need to work out the maximum possible number of remaining throws—that is, the number of points Player One needs to win, plus the number Player Two needs to win, minus one. If someone’s one–nil up in a first-to-three game, that’s four. (The highest score the game could reach is 3–2, five points in total, and one point has already been played.) Four remaining rounds means sixteen remaining possible outcomes—that is, two multiplied by itself four times. And *then* you need to work out which of those outcomes correspond to a win for Player One, which involves a lot of superscript and Greek letters and would just tire us all out.

Luckily, Pascal came up with a cheat. He wasn’t the first to use what we now call Pascal’s triangle—it was known in ancient China, where it is named after the mathematician Yang Hui, and in second-century India. But Pascal was the first to use it in problems of probability.

It starts with 1 at the top, and fills out each layer below with a simple rule: on every row, add the number above and to the left to the number above and to the right. If there is no number in one of those places, treat it as zero.
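That rule translates directly into code; a sketch in Python (the function name is mine):

```python
# Build row n of Pascal's triangle, counting the solitary 1 as row zero.
def pascal_row(n):
    row = [1]
    for _ in range(n):
        # Each new entry is the sum of the number above-left and the
        # number above-right; missing neighbours count as zero.
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

print(pascal_row(4))  # [1, 4, 6, 4, 1]
print(pascal_row(7))  # [1, 7, 21, 35, 35, 21, 7, 1]
```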

Pascal realized that he could use the triangle to solve the problem of points. Take our example from before, where Player One leads one–nil in a first-to-three game. There are a maximum of four rounds left to play, so you count down four rows from the top (counting the very top row, the solitary 1, as row zero). Player One needs two more points to win, so take off the first two numbers from the left. Add the remaining numbers together, divide them by the total value of that row, and you get Player One’s chance of winning.

In this case, count down four rows from the 1, and you find you’re on a row that goes: 1 4 6 4 1. Take the first two numbers away, and you’re left with 6 4 1, which add up to 11. The whole row adds up to 16. That is, an 11/16, or 68.75 percent, chance: p = 0.6875.

Try it for the other examples we’ve looked at. If Player One has 2 points and Player Two has 1, then there are a maximum of two possible goes left, and Player One only needs to win one of them. So you count down two rows to the 1 2 1 row, you remove the 1, and you’re left with 3/4, or p = 0.75. It’s astonishingly neat, and saves you lots of time. It works for any event that has two equally likely outcomes, like coin-flipping or games between equally matched opponents. For a given number of goes, X, you look at row X (again, with the very top line being row zero). That gives you the total number of possible outcomes. So if you flipped a coin seven times, you’d count down to row seven, the one starting 1 7 21, add those outcomes up, and you find that it equals 128. So there are 128 possible outcomes.
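The entries of row n of the triangle are the binomial coefficients, so Pascal’s shortcut can be sketched in Python with the standard library’s `math.comb` (the function name `win_chance` is mine):

```python
from math import comb  # binomial coefficients: the triangle's entries

# Pascal's trick: take row `rounds_left` of the triangle, drop the
# first `needed` entries, and divide the rest by the row's total.
def win_chance(rounds_left, needed):
    row = [comb(rounds_left, k) for k in range(rounds_left + 1)]
    return sum(row[needed:]) / sum(row)

print(win_chance(4, 2))  # 0.6875 -- leading 1-0 in first-to-three
print(win_chance(2, 1))  # 0.75   -- leading 2-1
```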

Now, if you want to know what the probability is of seeing *exactly* Y outcomes, say heads, on those seven flips:

It’s possible that you’ll see no heads at all, but that requires every single coin coming up tails. Of all the possible combinations of heads and tails that could come up, only one—tails on every single coin—gives you zero heads and seven tails.

There are seven combinations that give you one head and six tails. Of the seven coins, one needs to come up heads, but it doesn’t matter which one. There are twenty-one ways of getting two heads. (I won’t enumerate them all here; I’m afraid you’re going to have to trust me, or check.) And thirty-five of getting three.

You see the pattern? 1 7 21 35—it’s row seven of the triangle.

So if you want to know the chance of getting exactly Y heads on X flips, you count down X rows from row zero and look at the number that’s Y along from the left (again, counting the 1 at the left as zero). Then you divide that number by the total of the row. Say you want to know the odds of getting exactly five heads: you look at row seven—that’s the 1 7 21 35 35 21 7 1—and, starting from zero, you count five along. That’s the second twenty-one. So 21/128 ≈ 0.164, or about a one-in-six chance.

To find the chance of getting *at least *five heads, you just add the number of possible ways of getting six heads or seven heads to the ways of getting five heads: 21 + 7 + 1 = 29. Then you divide it by 128 as we did before. That’s what Pascal was doing to work out the fairest way to split the pot.
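The “exactly” and “at least” calculations can both be sketched the same way with `math.comb`:

```python
from math import comb  # binomial coefficients: the triangle's entries

flips = 7
total = 2 ** flips  # 128 possible head/tail sequences

# Exactly five heads: entry five of row seven, over the row total.
p_exactly_five = comb(flips, 5) / total                             # 21/128
# At least five heads: the ways of getting five, six or seven heads.
p_at_least_five = sum(comb(flips, k) for k in range(5, 8)) / total  # 29/128

print(round(p_exactly_five, 3))   # 0.164
print(round(p_at_least_five, 3))  # 0.227
```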

Pascal’s triangle is only one way of working out the probability of seeing some number of outcomes, although it’s a very neat way. The resulting spread of probabilities—in situations where each go has two possible outcomes, like flipping a coin—is called a “binomial distribution.”

But the point is that when you’re trying to work out how likely something is, what you need to count is outcomes—the number of outcomes that result in whatever it is you’re talking about, and the total number of possible outcomes. This was, I think it’s fair to say, the first real formalization of the idea of “probability.”

__________________________________

*Excerpted from* Everything Is Predictable: How Bayesian Statistics Explain Our World *by Tom Chivers. Copyright © 2024. Available from Atria/One Signal Publishers, an imprint of Simon & Schuster.*