IMPROBABLE PROBABILITIES
Assorted comments on some uses and misuses of probability
theory
First posted on June 22, 1999, updated September 2001, last update November 2006
By Mark Perakh
Las Vegas is arguably one of the most famous (or infamous) places on the
globe. The glittering towers of its hotels rising to the sky in the middle of
the desert have been reproduced on millions of photographs distributed all over
the world. Since its first hotel/casino named The Flamingo sprang up from the
sandy ground of the Nevada flatland, uncounted billions of dollars flowed
through its cash registers, inevitably in only one direction, from the gambling
customers to the bank accounts of the casinos' owners.
What is the basis of the casinos' unbreakable sequence of immense
profitability? There are two. One is a human psychological frailty ensuring an
uninterrupted supply of fools hoping to catch Lady Luck's attention. The other
is a science.
The science in question is mathematical statistics. Of course, the casinos'
operators are by and large scientifically illiterate. They don't need to know
mathematical statistics any more than an old lady driving her motorized jalopy
needs to know the physical chemistry of oil's oxidation in the cylinders of
her engine. All she needs is some primitive skill in pushing certain pedals and
rotating the steering wheel. However, it is the physical chemistry of the oil's
oxidation which makes her driving possible. Likewise, even though the casinos'
operators normally have hardly any knowledge of mathematical statistics, it is
mathematical statistics which makes their business so immensely profitable.
Another field where mathematical statistics is the basis of success is the
insurance industry, where professional statisticians are routinely employed to
determine the values of premiums and payoffs necessary to maintain the insurer's
profitability.
In both cases, that of casinos and that of insurance, the success is based
on the proper (even if sometimes not quite conscious) use of mathematical statistics.
There are, however, situations where mathematical statistics is being used
in a way contradicting its own main concepts. When that happens, it often results
in claims that may look statistically sound but are actually meaningless and
often misleading.
At the core of mathematical statistics is probability theory. Besides
mathematical statistics, probability theory is also the foundation of
statistical physics. It deals with the quantity called probability. While the
concept of probability may seem to be rather simple for laymen, probability
theory reveals that that quantity is multifaceted and its use must follow
certain precautions. When those precautions are not adhered to, the result is
often a meaningless conclusion.
While an incorrect application of mathematical statistics may involve any
part of that science, a large portion of the errors in question occur already at
the stage when its seminal quantity, probability, is miscalculated or
misinterpreted.
One example of an incorrect application of the probability concept is the
attempt by the proponents of the so-called Bible code to calculate the
probability of occurrence of certain letter sequences in various texts. Another
example is the often-proposed calculation of the probability of the spontaneous
emergence of life on earth. There are, of course, many other examples of
improper uses of the probability calculation.
There are many good textbooks on probability theory. Usually they make use
of a rather sophisticated mathematical apparatus. This article is not meant to
be one more discussion of probabilities on a rigorously mathematical level. In
this article I will discuss the concept of probability mostly without resorting to
mathematical formulas or to the axiomatic foundation of probability theory. I will rather try to clarify the concept in question by considering examples of
various situations in which different facets of probability manifest themselves
and can be viewed in as simple a way as possible. Of course, since probability
theory is essentially a mathematical discipline, it is only possible to discuss
probability, without resorting to some mathematical apparatus, to a very limited
extent. Hence, this paper will stop at the point where the further discussion
without mathematical tools would become too crude.
Calculation of probabilities is sometimes a tricky task even for qualified mathematicians, not to mention laymen. Here are two examples of rather simple probabilistic problems whose solution often escaped even some experienced scientists.
The first problem is as follows. Imagine that you watch buses arriving at a certain stop. After watching them for a long time, you have determined that the interval between the arrivals of any two consecutive buses is, on average, one minute. The question you ask is: how long should you expect to wait for the next bus if you start waiting at an arbitrary moment? Many people asked that question would confidently assert that the average waiting time is thirty seconds. This answer would be correct if all the buses arrived at exactly the same interval of one minute. However, one minute is only the average interval between consecutive arrivals. This number (one minute) is the mean of a distribution over a range from zero to a maximum which is larger than one minute. Therefore the average waiting time, which for a constant interarrival interval equals half that interval, is for a varying interval always larger than half the average interarrival interval. While this result can be proven in a rigorously mathematical way [1], it can easily be understood intuitively. If the interarrival intervals vary, some being shorter than others, then, on average, more passengers start waiting within the longer intervals than within the shorter ones. Therefore the average waiting time is longer than half the average interarrival interval (which is what it would be for constant intervals). For certain interarrival distributions (for example, for the so-called Poisson process, studied in mathematical statistics, which corresponds to perfectly random arrivals), the average waiting time exactly equals the average interarrival interval (in our example, one minute).
For an arbitrary distribution of arrivals the average waiting time is at least half the average interarrival interval, and may even be larger than the entire average interarrival interval.
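The bus result can be checked numerically. What follows is only a minimal sketch under stated assumptions: the two schedules (perfectly regular gaps, and exponentially distributed gaps as in a Poisson process), the sample size, the seed, and the function name mean_wait are all chosen here for illustration. It uses the standard fact that a passenger arriving at a random moment lands in a gap of length g with probability proportional to g and then waits g/2 on average.

```python
import random

def mean_wait(intervals):
    """Average wait for a passenger arriving at a uniformly random time.

    A gap of length g is hit with probability proportional to g, and the
    average wait inside it is g / 2, so the mean wait is
    sum(g * g / 2) / sum(g).
    """
    return sum(g * g / 2 for g in intervals) / sum(intervals)

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
n = 200_000              # number of simulated gaps (an assumption)

regular = [1.0] * n                                  # every gap is exactly 1 minute
poisson = [rng.expovariate(1.0) for _ in range(n)]   # random gaps, mean 1 minute

wait_regular = mean_wait(regular)
wait_poisson = mean_wait(poisson)
print(f"regular schedule: {wait_regular:.3f} min")
print(f"Poisson arrivals: {wait_poisson:.3f} min")
```

With regular gaps the computed wait is half a minute; with the random gaps it should come out close to a full minute, in line with the Poisson case described above.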
The second problem was used in a popular TV game show hosted by Monty Hall, wherein the players were offered a choice among three closed doors. Behind one of the doors there was a valuable prize, while behind the two other doors there was nothing. Obviously, whichever door a player chose, the probability of winning the prize would be 1/3. However, after the player chose a certain door, the compere, who knew where the prize was, would open one of the two doors not chosen by the player and show that there was nothing behind it. At that point, the player would be given a choice: either to stick to the door he had already chosen, or to choose instead the remaining closed door. The problem a player faced was to estimate whether or not changing his original choice would provide a better chance of winning. Most people, including some trained mathematicians, answered that the probability of winning is exactly the same regardless of whether the participant sticks to the originally chosen door or switches to the other yet unopened door. Indeed, at first glance it seems that the chance of the prize being behind either of the two yet unopened doors is the same. Such a conclusion would be correct only if the compere chose at random which door to open. In Monty Hall’s actual game, however, he knew precisely where the prize was hidden and chose which door to open not at random, but with confidence that the door he opened hid no prize. In this case changing the choice from the door originally chosen by the player to the other yet unopened door actually doubles the probability of winning.
To see why this is so, note that at the beginning of the game, there was only one “winning” door and two “losing” doors. Hence, when a player chose arbitrarily one of the doors, the probability of his choosing the “winning” door was 1/3 while the probability of his choosing a “losing” door was 2/3, i.e., twice as large. Now, if the player luckily chose the “winning” door, he would win if he did not change his choice. This situation happens, on average, in 1/3 of the games, if the game is played many times. If, though, the player happened to choose a “losing” door, he had to change his choice in order to win. This situation happens, on average, in 2/3 of the games if they are played many times. Hence, to double his chance of winning, the player should change his original choice.
I can suggest a simple semiformal proof of the above conclusion.
Denote the doors A, B, and C. P(X) is the probability of X being the winning door. Obviously P(A)=P(B)=P(C)=1/3 and P(A)+P(B)+P(C)=1. Let A be the door chosen by the player, so that P(A)=1/3 and P(~A)=P(B)+P(C)=2/3. Assume the compere opened door B and showed that it did not hide the prize (as the compere already knew with 100% certainty). Now we see that P(B)=0, hence P(A)+P(C)=1. Since P(A)=1/3, P(C)=2/3. QED.
Instead of 3, any number N of doors can be used. The calculation (left to readers) shows that in such a case, changing the choice from the originally chosen door to some other (specified) one of the N-2 not originally chosen and still closed doors increases the probability of winning (N-1)/(N-2) times (while the probability of the originally chosen door losing, that is, the probability that some (unspecified) one of the originally not chosen doors wins, is N-1 times larger than the probability of winning with the originally chosen door).
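The N-door claim can be verified by exact enumeration. The sketch below is one possible way to do it, not the calculation the text leaves to readers; the function name switch_win_probability is hypothetical. It assumes the compere knows where the prize is, opens exactly one empty unchosen door, and picks at random when several such doors are available. The player then switches to one of the other closed doors at random, which by symmetry gives the same chance as switching to any specified one.

```python
from fractions import Fraction

def switch_win_probability(n_doors):
    """Exact chance of winning by switching to one remaining closed door.

    Assumes an informed host who always opens one empty door that the
    player did not choose (an assumption of this sketch).
    """
    player = 0                      # by symmetry the player's door can be fixed
    total = Fraction(0)
    for prize in range(n_doors):    # each prize position has probability 1/N
        openable = [d for d in range(n_doors) if d != player and d != prize]
        for opened in openable:     # host picks among the openable doors at random
            targets = [d for d in range(n_doors) if d != player and d != opened]
            wins = sum(1 for d in targets if d == prize)
            total += (Fraction(1, n_doors)
                      * Fraction(1, len(openable))
                      * Fraction(wins, len(targets)))
    return total

for n in (3, 4, 10):
    stay = Fraction(1, n)           # chance of winning without switching
    switch = switch_win_probability(n)
    print(n, stay, switch, switch / stay)
```

For each N the last printed value is the ratio of the switching chance to the staying chance, which can be compared with (N-1)/(N-2).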
Comment 1: I refer to the above proof as “semiformal” because it is simplified for the sake of readers not well versed in probability theory; a more detailed proof would use conditional probabilities; the result, however, would not change. The most rigorous but substantially more complicated proof can be performed using the so-called Bayesian approach.
Comment 2: the above simple proof is based on the fact that the compere knew precisely where the prize was and which door, B or C, was empty. In the absence of such knowledge, that is, were Monty Hall to choose at random which door (B or C) to open, the above proof would become invalid; indeed, in such a case it does not matter whether the player sticks to the originally chosen door or switches to the alternative door: the chance of winning in both cases will be the same.
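The difference between an informed and an uninformed compere can also be seen in a simulation. This is a rough sketch under assumptions of my own choosing: the function names play and switch_rate, the number of games, and the seed are illustrative, and games in which a blindly chosen door happens to reveal the prize are simply discarded.

```python
import random

def play(rng, host_knows):
    """Play one three-door game; return (valid, stay_wins, switch_wins)."""
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    others = [d for d in range(3) if d != choice]
    if host_knows:
        # The host deliberately opens an empty door.
        opened = rng.choice([d for d in others if d != prize])
    else:
        # The host opens one of the other two doors blindly.
        opened = rng.choice(others)
    if opened == prize:
        return False, False, False      # prize revealed: the game is discarded
    remaining = next(d for d in others if d != opened)
    return True, choice == prize, remaining == prize

def switch_rate(host_knows, games=200_000, seed=1):
    """Fraction of valid games won by a player who always switches."""
    rng = random.Random(seed)
    valid = switch_wins = 0
    for _ in range(games):
        ok, _stay, switch = play(rng, host_knows)
        valid += ok
        switch_wins += switch
    return switch_wins / valid

print("informed host:", round(switch_rate(True), 3))
print("random host:  ", round(switch_rate(False), 3))
```

With the informed host the switching player should win in roughly two thirds of the games; with the random host, in roughly half of the games that remain valid.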
If many people, including trained mathematicians, are sometimes confused by the above rather simple probabilistic situations, it is no surprise that probabilities are quite often misused in more complex cases, which shows the extreme caution necessary when probabilities are used to arrive at important conclusions.
The above section was substantially improved due to discussions with several colleagues. Specifically, the part related to the “buses” problem was clarified thanks to comments by Brendan McKay and Peter Olofsson; the part related to the Monty Hall game was likewise edited thanks to comments by Brendan McKay, Peter Olofsson, Jason Rosenhouse, and Douglas Theobald.
Consider a game with a coin. Each time we toss a coin it can result in
either tails or heads facing up. If we toss a die, it can result in any of six
numbers facing up, namely 1, 2, 3, 4, 5 and 6. If we want to choose one card out
of fifty cards scattered on a table, face down, and turn its face up, it can
result in any one of fifty cards facing up.
Let us now introduce certain terms. Each time we toss a coin or a die, or
turn over a card, this will be referred to as a trial. In the case
of a coin, the trial can have either of two possible outcomes, tails (T)
or heads (H). In the game with dice, each trial can result in any of six
possible outcomes, 1, 2, 3, 4, 5, or 6. In the case of 50 cards, each
trial can result in any one of fifty possible outcomes, such as five of
spades, or seven of diamonds, etc.
Now assume we conduct the game in sets of several trials. For example, one
of the players tosses a die five times in a row, resulting in a set of five
outcomes, for example 5, 3, 2, 4, and 4. Then his competitor also tosses the die
five times resulting in some other combination of five outcomes. The player
whose five trials result in a larger sum of numbers wins. The set of 5 (or 10,
or 100, or 10,000 etc) trials constitutes a test. The combination
of 5 (or 10, or 100, or 10,000 etc) outcomes obtained in a test
constitutes an event. Obviously, if each test comprises only a single
trial, terms trial and test as well as terms outcome and event
become interchangeable.
For the further discussion we have to introduce the concept of an
"honest coin" (also referred to as a "fair coin"). It means
we postulate that the coin is perfectly round and its density is uniform all
over its volume, and that in no trial do the players consciously attempt to
favor either of the two possible outcomes. If our postulate conforms to reality,
what is our estimate of the probability that the outcome of an arbitrary
trial will be, for example T (or H)?
First, it seems convenient to assign to something that is certain, a
probability of 1 (or, alternatively, 100%). It is further convenient to assign
to something that is impossible a probability of zero. Then the probability of
an event that is not certain, will always be between 0 and 1 (or between
0% and 100%).
Now we can reasonably estimate the actual value of some probabilities in
the following way. For example, if we toss a fair coin, the outcomes H and T
actually differ only in the names we give them. Thus, in a long sequence of coin
tosses, H and T can be expected to happen almost equally often. In other words,
outcomes H and T can be reasonably assumed to have the same probability. This is
only possible if each has probability of 1/2 (or 50%).
Since we use the concept of probability, which by definition is not
certainty, it means that we do not predict the precise outcome of a
particular trial. We expect, though, that in a large number of trials the
number of occurrences of H will be roughly equal to that of T. For
example, if we conduct a million trials, we expect that in approximately one
half of trials (i.e. in close to 500,000 trials) the outcome will be T
and in about the same number of trials it will be H.
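This expectation is easy to try out on a computer. The following is a minimal sketch, assuming that a pseudo-random number generator is an acceptable stand-in for an honest coin; the function name heads_fraction and the seed are illustrative choices.

```python
import random

def heads_fraction(n_trials, seed=0):
    """Fraction of simulated fair-coin tosses that come up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

for n in (10, 1_000, 1_000_000):
    print(n, heads_fraction(n))
```

For small numbers of trials the fraction may stray noticeably from 1/2; for a million trials it should be very close to it.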
Was our postulate of an honest coin correct? Obviously it could not be
absolutely correct. No coin is perfect. Each coin has certain imprecision of
shape and mass distribution, which may make the T outcome slightly more
likely than the H outcome, or vice versa. A player may inadvertently
favor a certain direction, which may be due to some anatomical peculiarities of
his/her arm. There may be occasional wind affecting the coin's fall, etc.
However, for our theoretical discussion we will ignore the listed possible
factors and assume an honest coin. Later we will return to the discussion
of the above possible deviations from a perfectly honest coin.
We see that our postulate of an honest coin led to another postulate, that
of the equal probability of possible outcomes of trials. In the case of a
coin there were two different possible outcomes, T and H, equally
probable. In some other situation there can be any number of possible outcomes.
In some situations those possible outcomes can be assumed to be all equally
probable, while in some other situations the postulate of their equal
probability may not hold. In each specific situation it is necessary to
clearly establish whether or not the postulate of equal probability of all
possible outcomes is reasonably acceptable or whether it must be dismissed. Ignoring
this requirement has been the source of many erroneous considerations of
probabilities. We will discuss specific examples of such errors later on.
Now consider one more important feature of probability. Suppose we have
conducted a trial and the outcome was T. Suppose we proceed to conduct
one more trial, tossing the coin once more. Can we predict the outcome of the
second trial given the known outcome of the first trial? Obviously, if we accept
the postulate of an honest coin and the postulate of equal probability of
outcomes, the outcome of the first trial has no effect on the second trial.
Hence, the postulates of an honest coin and of equal probability of outcomes
lead us to a third postulate, that of independence of tests. The
postulate of independence of tests is based on the assumption that in
each test the conditions are exactly the same, which means that after each test
the initial conditions of the first test are exactly restored. The applicability
of the postulate of tests' independence must be ascertained before any
conclusions can be made in regard to the probabilities' estimation. If the
independence of tests cannot be ascertained, the probability must be calculated
differently from the situation when the tests are independent.
We will discuss situations with independent and not independent tests in
more detail later on.
A discussion analogous to that of the honest coin can be also applied to
those cases where the number of possible outcomes of a trial is larger than two,
be this number three, six, or ten million. For example, if, instead of a coin,
we deal with a die, the postulate of an honest coin has to be replaced with the
similar postulate of an honest die, while the postulates of equal
probability of all possible outcomes (of which there now are six instead of two)
and of independent tests have to be verified as well before calculating the
probability.
The postulate of an honest coin or its analogs are conventionally implied
when probabilities are calculated. Except for some infrequent situations, this
postulate is usually reasonably valid. However, some writers who calculate
probabilities do not verify the validity of the postulates of equal probability
and of independence of tests. This is not an uncommon source of erroneous
estimation of probabilities. Pertinent examples will be discussed later on.
Suppose we conduct our coin game in consecutive sets of 10 trials each.
Each set of 10 trials constitutes a test. In each ten-trial test the
result is a set of 10 outcomes, constituting an event. For example,
suppose that in the first test the event comprised the following 10 outcomes: H,
H, T, H, T, T, H, T, H, and H. Hence, the event in question included 6 heads
and 4 tails. Suppose that the next event comprised the following
outcomes: T, H, T, T, H, H, H, T, T, and T. This time the event included 6 tails
and 4 heads. In neither of the two events was the number of T
equal to the number of H, and, moreover, the ratio of H to T
was different in the two tests. Does this mean that, first, our estimate of the
probability of, say, H, as 1/2 was wrong, and, second, that our postulate
of equal probabilities of H and T was wrong? Of course not.
We realize that the probability does not predict the exact outcome of each
trial and hence does not predict particular events. What is, then, the meaning
of probability?
If we accept the three postulates introduced earlier (honest coin, equal
probability of outcomes and independence of tests) then we can define
probability in the following manner. Let us suppose that the probability of a
certain event A is expressed as 1/N, where N is a positive
number. For example, if the event in question is the combination of two outcomes
of tossing a coin, the probability of each such event is 1/4, where N=4. It
means that in a large number X of tests event A will occur, on
the average, once in every N tests. For this prediction to hold, X
must be much larger than N. The larger the ratio X/N, the
closer the frequency of occurrence of event A will be to the probability
value, i.e. to 1 occurrence in every N tests.
For example, as we concluded earlier, in a test comprising two consecutive
tosses of a coin the probability of each of the four possible events is the same
1/4, so N=4. It means that if we repeat the described test X times, where X is
much larger than 4 (say, one million times) each of the four possible events,
namely HH, HT, TT, and TH will happen, on the average once in every four
tests.
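The role of the ratio X/N can be illustrated with a short simulation of the two-toss test. This is a sketch only; the name event_frequencies, the chosen values of X, and the seed are assumptions made for illustration.

```python
import random
from collections import Counter

EVENTS = ("HH", "HT", "TH", "TT")

def event_frequencies(n_tests, seed=0):
    """Relative frequency of each two-toss event over n_tests tests."""
    rng = random.Random(seed)
    counts = Counter(rng.choice("HT") + rng.choice("HT") for _ in range(n_tests))
    return {event: counts[event] / n_tests for event in EVENTS}

# X = 8 is barely larger than N = 4; X = 400,000 is much larger.
for x in (8, 400, 400_000):
    print(x, event_frequencies(x))
```

With X = 8 the four frequencies can be quite uneven, whereas with X = 400,000 each should be close to 1/4.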
We have now actually introduced (not quite rigorously) one more postulate,
sometimes referred to as the law of large numbers. The gist of that law
is that an observed frequency can be some accidental number unless it is
determined over a large number of tests. The value of probability does not
predict the outcome of any particular test, but in a certain sense we can say
that it "predicts" the results of a very large number of tests in
terms of the values averaged over all the tests.
If any one of the four postulates does not hold (the number of tests is not
much larger than N, the "coin" or "die" etc. is not
"honest," the outcomes are not equally probable, or the
tests are not independent), the value of probability calculated as 1/N has no
meaningful interpretation.
Ignoring the last statement is often the source of unfounded conclusions
from probability calculations.
Later we will also discuss situations when some of the above postulates do
not hold (in particular, the postulate of independence) but nevertheless the
probabilities of events comprising several trials each can be reasonably
estimated.
The above descriptive definition of probability is sometimes referred to
as the "classical" one.
There are in probability theory also some other definitions of
probability. They overcome certain logical shortcomings of the classical
definition and generalize it. In this paper we will not use explicitly (even
though they may be sometimes implied) those more rigorous definitions since the
above offered classical definition is sufficient for our purpose.
Let us now discuss the calculation of the probability of an event.
Remember that event is defined as the combination of outcomes in a set of
trials. For example, what is the probability that in a set of two trials with a
coin the event will be "T, H" i.e. that the outcome of the
first trial will be T and of the second trial will be H? We know
that the probability of T in the first trial was 1/2. This conclusion
stemmed from the fact that there were two equally probable outcomes. The
probability of 1/2 was estimated dividing 1 by the number (which was 2) of all
possible equally probable outcomes. If the trials are conducted twice in a row,
how many possible equally probable events can be imagined? Here is the obvious
list of all such events: 1) T, T; 2) T, H; 3) H, H; 4) H, T. The
total of 4 possible results, all equally probable, covers all possible events.
Obviously, the probability of each of those four events is the same 1/4. We see
that the probability of the event comprising the outcomes of two consecutive
trials equals the product of probabilities of each of the sequential outcomes.
This is one more postulate, based on the independence of tests: the
rule of multiplication of probabilities. The probability of an event is the
product of the probabilities of the outcomes of all sequential trials constituting that event. As we will see
later, this rule has certain limitations.
(In textbooks on probability theory the independence of tests is often
treated in the opposite way, namely establishing that if the probability of a
combination of events equals the product of the probabilities of the individual
events, then these individual events are independent).
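The multiplication rule can be checked by listing all equally probable events and counting. The sketch below assumes independent trials with equally probable outcomes, as in the discussion above; the function name event_probabilities is hypothetical.

```python
from fractions import Fraction
from itertools import product

def event_probabilities(outcomes, n_trials):
    """Probability of each event, found by counting equally probable sequences."""
    events = list(product(outcomes, repeat=n_trials))
    return {event: Fraction(1, len(events)) for event in events}

coin = event_probabilities("HT", 2)
p_single = Fraction(1, 2)                  # probability of T (or H) in one trial

# Counting gives the same value as multiplying the single-trial probabilities.
print(coin[("T", "H")], p_single * p_single)

die = event_probabilities("123456", 3)
print(die[("4", "4", "4")], Fraction(1, 6) ** 3)
```

In both cases the probability obtained by counting events coincides with the product of the probabilities of the individual outcomes.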
Let us discuss certain aspects of probability calculations which have been
a pivotal point in the dispute between "creationists" (who assert that
life could not have emerged spontaneously but only via a divine act
by the Creator) and "evolutionists" (who adhere to a theory asserting
that life emerged as a result of random interactions between chemical compounds
in the primeval atmosphere of our planet or of some other planet).
In particular, the creationists maintain that the probability of life's
spontaneous emergence was so negligibly low that it must be dismissed as
improbable.
To support their view, the creationists use a number of various
arguments. Lest I be misunderstood, I would like to point out that I am not
discussing here whether the creationists or evolutionists are correct in their
assertions in regard to the origin of life. This question is very complex and
multifaceted and the probabilistic argument often employed by the creationists
is only one aspect of their view. What I will show is that the probabilistic
argument itself, as commonly used by many creationists, is unfounded and cannot
be viewed as a proof of their views, regardless of whether those views are
correct or incorrect.
The probabilistic argument often used by the creationists is as follows.
Imagine tossing a die with six facets. Repeat it 100 times. There are many
possible combinations of the six numbers (we would say there are many possible
events, each comprising 100 outcomes of individual trials). The probability of
each event is exceedingly small (about one over 10^{77}) and is the same for
each combination of numbers, including, say, a combination of 100
"fours," that is 4,4,4,4,4,4,4,4,4,4 … etc, ("four"
repeated 100 times in a row). However, say the creationists, the probability
that the set of 100 numbers will be some random combination is much larger than
the probability of 100 "fours," which is a unique, or
"special" event. Likewise, the spontaneous emergence of life is a
special event whose probability is exceedingly small, hence it could not happen
spontaneously. Without discussing the ultimate conclusion about the origin of
life, let us discuss only the example with the die.
Indeed, the probability that the event will be some random collection of
numbers is much larger than the probability of "all fours." It does
not mean anything. The larger probability of random sets of numbers is simply
due to the fact that it is a combined probability of many events, while
for "all fours" it is the probability of only one particular event.
From the standpoint of the probability value, there is nothing special about
"all fours" event; it is an event which is exactly as probable as any
other individual combination of numbers, be it "all sixes," "half
threes + half fives" or any arbitrary disordered set of 100 numbers made
up of six symbols, like 2, 5, 3, 6, 1, 3, 3, 2, etc. The probability that 100 trials
result in any particular set of numbers is always less than the combined
probability of all the rest of the possible sets of numbers, exactly to the same
extent as it is for "all fours." For example, the probability that 100
consecutive trials will result in the following disordered set of numbers: 2, 4,
1, 5, 2, 6, 2, 3, 3, 4, 4, 6, 1…etc., which is not a "special"
event, is less than the combined probability of all other about 10^{77} possible combinations of outcomes, including the "all fours" event, to
the same extent as this is true for the "special" event of "all
fours" itself.
The crucial fault of the creationists' probabilistic argument is their
next step. They proceed to assert that the "special" event whose
probability is extremely small, simply did not happen. However, this argument
can be equally applied to any competing event whose probability is equally
extremely small. In the case of a set of 100 trials, every one of about 10^{77} possible events has the same
exceedingly small probability. Nevertheless, one
of them must necessarily take place. If we accept the probabilistic argument of
the creationists, we will have to conclude that none of the 10^{77} possible events could have happened, which is an obvious absurdity.
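The numbers involved can be computed exactly, since Python integers and fractions have arbitrary precision. This is a small sketch added for illustration; the variable names are my own. The exact count of possible events is 6 to the power 100, which has the same order of magnitude as the rounded figure quoted above.

```python
from fractions import Fraction

n_events = 6 ** 100                  # number of distinct 100-toss sequences
p_single = Fraction(1, n_events)     # probability of any one specific sequence

print(f"number of possible events: {n_events:.2e}")
print(f"probability of 'all fours' (or of any other single sequence): "
      f"{float(p_single):.2e}")

# The probabilities of all the events add up to exactly 1:
# one of these exceedingly improbable events must take place.
print(p_single * n_events)
```

Every individual sequence, ordered or disordered, gets the same tiny probability, and yet their total is exactly one.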
Of course, nothing in probability theory forbids any event to be
"special" in some sense, and spontaneous emergence of life qualifies
very well for the title of a "special" event. Being
"special" from our human viewpoint in no way makes this or any other
event stand alone from the standpoint of probability estimation. Therefore
probabilistic arguments are simply irrelevant when the spontaneous emergence of
life is discussed.
Let us look once more, by way of a very simple example, at the argument
based on the very small probability of a "special" event versus
"non-special" ones. Consider a case when the events under discussion
are sets of three consecutive tosses of a coin. The possible events are as
follows: HHH, HHT, HTT, HTH, TTT, TTH, THH, THT. Let us say that for some reason
we view events HHH and TTT as "special" while the rest of the possible
events are not "special." If we adopt the probabilistic arguments of
creationists, we can assert that the probability of a "special" event,
say, HHH (which is in this case 1/8) is less than the probability of event
"Not HHH" (which is 7/8). This assertion is true. However, it does not
at all mean that event HHH is indeed special from the standpoint of probability.
Indeed, we can assert by the same token that the probability of any other of the
eight possible events, for example of event HTH (which is also 1/8) is less than
the probability of event "Not HTH" (which is 7/8). There are no
probabilistic reasons to see event HHH as happening by miracle. Its probability
is not less than that of any other of the eight possible events. This conclusion
is equally applicable to situations in which not eight but billions of billions
of alternative events are possible.
The "all fours" type of argument has no bearing whatsoever on
the question of the spontaneous emergence of life.
I will return to the discussion of supposedly "special" vs.
"non-special" events in a subsequent section of this essay.
Now I will discuss situations in which the probabilities calculated
before the first trial cannot be directly multiplied to calculate the
probability of an event.
Again, consider an example. Imagine a box containing six balls identical
in all respects except for their colors. Let one ball be white, two balls, red,
and three balls, green. We randomly pull out one ball. (The term
"randomly" in this context is equivalent to the previously introduced
concepts of an "honest coin" and an "honest die"). What is
the probability that the randomly chosen ball is of a certain color? Since all
balls are otherwise identical and are chosen randomly, each of the six balls has
the same probability of 1/6 to be chosen in the first trial. However, since the
number of balls of different colors varies, the probability that a certain color
is chosen, is different for white, red, and green. Since there is only one white
ball available, the probability that the chosen ball will be white is 1/6. Since
there are two red balls available, the probability of a red ball to be chosen is
2/6=1/3. Finally, since there are three green balls available, the probability
that the chosen ball happens to be green is 3/6=1/2.
Assume first that the ball chosen in the first trial happens to be red.
Now, unlike in the previous example, let us proceed to the second trial without
replacing the red ball. Hence, after the first trial there remain only five
balls in the box, one white, one red and three green. Since all these five balls
are identical except for their color, each of them has the same probability of
1/5 to be randomly chosen in the second trial. What is the probability that the
ball chosen in the second trial is of a certain color? Since there is still only
one white ball available, the probability of that ball to be randomly chosen is
1/5. There is now only one red ball available, so the probability of a red ball
to be randomly chosen is also 1/5. Finally, for a green ball the probability is
3/5. So if in the first trial a red ball was randomly chosen, the probabilities
of balls of different colors to be randomly chosen in the second trial are 1/5 (W),
1/5 (R), and 3/5 (G).
Assume now that in the first trial not a red, but a green ball was
randomly chosen. Again, adhering to the "no replacement" procedure, we
proceed to the second trial without replacing the green ball in the box. Now
there remain again only five balls available, one white, two red and two green.
What are the probabilities that in the second trial balls of specific colors
will be randomly chosen? Each of the five balls available has the same
probability to be randomly chosen, 1/5. Since, though, there are only one white
ball, two red and two green balls available, the probability that the ball
randomly chosen in the second trial happens to be white is 1/5, while for both
red and green balls it is 2/5.
Hence, if the ball chosen in the first trial happened to be red, then the
probabilities to be chosen in the second trial would be 1/5 (W), 1/5 (R)
and 3/5 (G). If, though, the ball chosen in the first trial happened to
be green, then the probabilities in the second trial would change to 1/5 (W),
2/5 (R) and 2/5 (G).
The conclusion: in the case of trials without replacement, the
probabilities of outcomes in the second trial depend on the actual outcome of
the first trial, hence in this case the tests are not independent.
When the tests are not independent, the probabilities calculated
separately for each of the sequential trials cannot be directly multiplied.
Indeed, the probabilities calculated before the first trial were as follows: 1/6
(W), 2/6=1/3 (R) and 3/6=1/2 (G). If we multiplied the
probabilities as in the case of independent tests, we would obtain, for
example, the probability of the event (RR) as 1/3 times 1/3, which
equals 1/9. Actually, though, the probability of that event is 1/3 times 1/5,
which is 1/15. Of course, probability theory provides an excellent way to deal
with the "no replacement" situation, using the concept of so-called
"conditional probabilities." However, some writers utilizing
probability calculations seem to be unaware of the distinction between
independent and nonindependent tests. Ignoring that distinction has been a
source of crude errors.
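The difference between the two calculations can be checked by brute force. The following is only a minimal sketch of the box example above: it enumerates every equally likely ordered pair of draws. The function names are illustrative and not part of the original discussion.

```python
from fractions import Fraction
from itertools import permutations

# Box from the example: one white, two red, three green balls.
BALLS = ["W", "R", "R", "G", "G", "G"]

def prob_pair_without_replacement(first, second, balls=BALLS):
    """Exact probability of drawing `first` and then `second`, no replacement."""
    # Every ordered pair of two distinct balls is equally likely.
    pairs = list(permutations(range(len(balls)), 2))
    hits = sum(1 for i, j in pairs if balls[i] == first and balls[j] == second)
    return Fraction(hits, len(pairs))

def prob_pair_with_replacement(first, second, balls=BALLS):
    """Product of the two pre-trial probabilities (valid only for independent trials)."""
    n = len(balls)
    return Fraction(balls.count(first), n) * Fraction(balls.count(second), n)

p_rr_no_repl = prob_pair_without_replacement("R", "R")
p_rr_repl = prob_pair_with_replacement("R", "R")
print(p_rr_no_repl, p_rr_repl)  # prints: 1/15 1/9
```

Multiplying the pre-trial probabilities gives 1/9, while counting the actual pairs gives 1/15, the value obtained from the conditional probability 1/5 of a second red ball.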
One example of such erroneous calculations of probabilities is how some
proponents of the so-called Bible code estimate the probability of the
appearance in a text of certain letter sequences.
The letter sequences in question are the so-called ELS, which stands for
"equidistant letter sequences." For example, in the preceding sentence
the word "question" includes the letter "s" as the fourth
letter from the left. Skip the preceding letter "e" and there is the
letter "u." Skip again the preceding letter "q" and the
space between the words (which is to be ignored), and there is the letter
"n." The three letters, s, u, and n, separated by "skips" of
2, constitute the word "sun" if read from right to left. This is an
ELS with a negative "skip" of –2. There are many such ELS, both read
from right to left and from left to right in any text.
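To make the notion of an ELS concrete, here is a small sketch of a search routine. It is an illustration only, not the procedure used by any of the "code" proponents; the function name and its parameters are assumptions made for this example. Spaces and punctuation are ignored, as in the description above.

```python
def find_els(text, word, skip):
    """Return start positions (in the letters-only text) where `word` is an ELS.

    `skip` is the step between consecutive letters of the word; a negative
    skip means the word is read from right to left.
    """
    if skip == 0:
        raise ValueError("skip must be nonzero")
    # Keep letters only, so spaces and punctuation are ignored.
    letters = [c.lower() for c in text if c.isalpha()]
    word = word.lower()
    hits = []
    for start in range(len(letters)):
        end = start + skip * (len(word) - 1)
        if not 0 <= end < len(letters):
            continue  # the word would run off the text
        if all(letters[start + skip * k] == ch for k, ch in enumerate(word)):
            hits.append(start)
    return hits

sentence = "The letter sequences in question are the so-called ELS"
hits = find_els(sentence, "sun", -2)
print(hits)  # prints: [23]
```

Position 23 is the letter "s" of "question" (counting letters from zero), from which "u" and "n" are reached by stepping back two letters at a time.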
There are people who are busy looking for arrays of ELS in the Hebrew
Bible believing these arrays had been inserted into the text of the Bible by the
divine Creator and constitute a meaningful "code." As one of the
arguments in favor of their beliefs, the proponents of the "code"
attempt to show that the probability of such arrays of ELS happening in a text
by sheer chance is exceedingly small and therefore the presence of those arrays
of ELS must be attributed to the divine design.
There are a few publications in which attempts have been made to apply an
allegedly sound statistical test to the question of the Bible code. In
particular, D. Witztum, E. Rips, and Y. Rosenberg (WRR) described such an attempt
in a paper published in 1994 in "Statistical Science" (v. 9, No 3, 429
– 438). The methodology by WRR has been thoroughly analyzed in a number of
critical publications and shown to be deficient. This methodology goes further
than the application of probability theory, making use of some tools of
mathematical statistics, and therefore is not discussed here since this paper is
only about probability calculations. However, besides the paper by WRR and some
other similar publications, there are many publications where no real
statistical analysis is attempted but only "simple" calculations of
probabilities are employed. There are common errors in those publications, one
being the multiplication of probabilities in cases when the tests are not
independent. (There are also many web publications in which a supposedly deeper
statistical approach is utilized to prove the existence of the Bible code. These
calculations purport to determine the probability of appearance in the text not
just of individual ELS, but of whole clusters of such. Such analysis usually
starts with the same erroneous calculation of probabilities of individual words
as examined in the following paragraphs.)
Usually the calculations in question start by choosing a word whose
possible appearance as an ELS in the given text is being explored. When such a
word has been selected, its first letter is thereby determined. The next step
is estimating the probability of the letter in question to appear at arbitrary
locations in a text. The procedure is repeated for every letter of the chosen
word. After having allegedly determined the probabilities of occurrence of each
letter of a word constituting an ELS, the proponents of the "code"
then multiply the calculated probabilities, thus supposedly finding the
probability of the occurrence of the given ELS.
Such multiplication is illegitimate. Indeed, a given text comprises a
certain set of letters. When the first letter of an ELS has been chosen (and the
probability of its occurrence anywhere in the text has been calculated)
this makes all the sites in the text occupied by that letter inaccessible to any
other letter. Let us assume that the first letter of the word in question is X,
and it happens x times in the entire text, whose total length is N
letters. The proponents of the code calculate the probability of X
occurring at any arbitrary site as x/N. This calculation would be correct
only for a random collection of N letters, among which letter X
happens x times. For a meaningful text this calculation is wrong.
However, since we wish at this time to address only the question of the tests'
independence, let us accept the described calculation for the sake of
discussion. As soon as letter X has been selected, and the probability of
its occurrence at any location in the text allegedly determined, the
number of sites accessible for the second letter in the chosen word decreases
from N to N-x. Hence, even if we accept the described calculation,
the probability of the second letter (let us denote it Y) appearing
at an arbitrary still-accessible site is now y/(N-x), where y is
the number of occurrences of letter Y in the entire text. It is well
known that the frequencies of various letters in meaningful texts are different.
For example, in English the most frequent letter is e, whose frequency
(about 12.3%) is about 180 times larger than that of the least frequent letter,
which is z (about 0.07%).
Hence, depending on which letter is the first one in the chosen word,
i.e., on what the value of x is, the probability of the occurrence of the
second letter, estimated as y/(N-x), will differ.
Therefore we have in the described case a typical situation "without
replacement" where the outcome of the second trial (the probability of Y)
depends on the outcome of the preceding trial (which in its turn depends on the
choice of X). Therefore the multiplication of calculated probabilities
performed by the code proponents as the second (as well as the third, the
fourth, etc.) step of their estimation of ELS probability is illegitimate and
produces meaningless numbers of alleged probabilities.
The probabilities of various individual letters appearing at an arbitrary
site in a text are not very small (mostly between about 1/8 and 1/100). If a
word consists of, say, six letters, the multiplication of six such fractions
results in a very small number which is then considered to be the probability of
an ELS but is actually far from the correct value of the probability in
question.
Using y/(N-x) instead of y/N, and thus correcting one of the errors of
such calculations, would not suffice to make the estimation of the probability
of an ELS reliable. The correct probability of an ELS could be calculated based
on certain assumptions regarding the text's structure, which distinguishes
meaningful texts from random conglomerates of letters. There is no mathematical
model of meaningful texts available, and therefore the estimations of the ELS
probability, even if calculated accounting for interdependence of tests, would
have little practical meaning until such a mathematical model is developed.
Finally, the amply demonstrated presence of immense numbers of various ELS
in both biblical and any other texts, in Hebrew as well as in other languages,
is the simplest and also the most convincing proof that the allegedly very small
probabilities of ELS appearance, as calculated by the proponents of the
"code," are indeed of no evidential merit whatsoever.
So far I have discussed the quantitative aspects of probability. I will
now discuss probability from a different angle, namely analyzing its cognitive
aspects. This discussion will be twofold. One side of the cognitive meaning of
probability is that it essentially reflects the amount of information available
about the possible events. The other side of the probability's cognitive
aspect is the question of what the significance of this or that value of
probability essentially is.
I will start with the question of the relationship between the calculated
probability and the level of information available about the subject of the
probability analysis. I will proceed by considering certain examples
illustrating that feature of probability.
Imagine that you want to meet your friend who works for a company with offices in a
multistory building downtown. Close to 5 pm you are on the opposite side of the
street, waiting for your friend to come out of the building. Let us imagine that
you would like to estimate the probability that the first person coming out will
be male. You have never been inside that building, so you have no knowledge of
the composition of the people working there. Your estimate will necessarily be
that the probability of the first person coming out being male is 1/2, and the
same for female. Let us further imagine that your friend who works in that
building knows that among the people working there about 2/3 are female and about
1/3 are male. His estimate will obviously be that the probability of the first
person coming out being male is 1/3 rather than 1/2. The objective likelihood of
a male coming out first does not depend on who makes the estimate. It is 1/3. The
different estimates of probability are due to something that has no relation to
the subject of the probability estimation. They are due to the different levels
of information about the subject possessed by you and your friend. Because of
your very limited knowledge about the subject, you have to assume that the two
possible events, a male or a female coming out first, are equally probable. Your
friend knew more; in particular, he knew that the probability of a female coming
out first was larger than that of a male coming out first.
This example illustrates an important property of the calculated
probability. It reflects the level of knowledge about a subject. If we possess
the full knowledge about the subject we know exactly, in advance, the outcome of
a test, so instead of probability we deal with certainty.
A common situation in which we have full knowledge is when an event has actually
occurred. In such a situation the question of the probability of the event is
meaningless. After the first person has actually come out of the building, the
question of the probability of that event becomes moot. Of course, we can still
calculate the probability of that event, but in doing so we necessarily deal with
an imaginary situation, assuming the event has not yet actually occurred.
Being a reflection of the level of knowledge about a subject is the ubiquitous
and most essential feature of probability from the viewpoint of its cognitive
essence.
What about the examples with a coin or a die, where we thought we
possessed the full knowledge of all possible outcomes and all those possible
outcomes definitely seemed to be equally probable?
We did not possess such knowledge! Our assumption of the equal probability
of either heads or tails, or of the equal probability of each of the six
possible outcomes of a trial with a die was due to our limited knowledge about
the actual properties of the coin or of the die. No coin and no die are perfect.
Therefore, in the tests with a coin, either heads or tails may have a
slightly better chance of occurring. Likewise, in the test with a die, some of
the six facets of the die may have a slightly better chance to face upward. In
tests conducted by K. Pearson with a coin (1921), after it was tossed 24,000
times, heads occurred in 12,012 trials and tails in 11,988 trials. Generally
speaking, the slight difference between the numbers of heads and tails is
expected in a large sequence of truly random tests. On the other hand, we cannot
exclude that the described result was due, at least partially, to a certain
imperfection in the coin used, or in the procedure employed.
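How "slight" the difference in the tests just described is can be gauged with a short calculation. The sketch below assumes a perfectly fair coin and independent tosses (an idealization, as discussed above) and compares the observed excess of heads with the spread expected from chance alone. The variable names are illustrative.

```python
import math

tosses = 24_000
heads = 12_012  # count reported for the 24,000 tosses cited above

# For a fair coin the number of heads is binomial with p = 1/2.
expected = tosses * 0.5
std_dev = math.sqrt(tosses * 0.5 * 0.5)  # binomial standard deviation
z = (heads - expected) / std_dev         # deviation in units of std_dev

print(f"expected {expected:.0f} heads, std dev {std_dev:.1f}, z = {z:.2f}")
```

The excess of 12 heads is only a small fraction of one standard deviation (about 77 tosses), which is why such a result is fully consistent with a fair coin while not ruling out a small imperfection.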
Since we have no knowledge of the particular subtle imperfections of a
given coin or die, we have to postulate the equal probability of all possible
outcomes.
In the tests with a die or a coin, we at least know all possible outcomes.
There are many situations in which we have no such knowledge. If that is the
case, we have to assume the existence of some supposedly possible events which
actually are impossible, but we simply cannot rule them out.
For example, assume we wish to estimate the probability that upon entering
a property at 1236 Honey Street, the first person we meet will be an adult man.
We have no knowledge of that property. In particular, we don't know that it is
occupied by a group of monks, so there are no women and no children at that
address. If we also don't know the percentage of women, men, male children and
female children in that town, we have to guess that the probability of
encountering each of the four supposedly possible types of person is the same,
1/4. If we knew more about that location, we would estimate the probability of
encountering an adult female, or children, as being very small. Then the
probability of meeting an adult man would be calculated as close to 1, i.e. to
certainty.
Quite often the very small calculated probabilities of certain events are
due to the lack of information and hence to an exaggerated number of supposedly
possible events many of which are actually impossible. One example of such a
greatly underestimated probability of an event is the alleged estimation of the
probability of life's spontaneous emergence. The calculations in question are
based on a number of arbitrary assumptions and deal with a situation whose
details are largely unknown. Therefore, in such calculations the number of
possible events is greatly exaggerated, and all of them are assumed to be
equally probable, which leads to extremely small values of calculated
probability. Actually, many of the allegedly possible paths of chemical
interactions may be impossible, and those possible are by no means equally
probable. Therefore (and for some other reasons as well) the extremely small
probability of life's spontaneous emergence must be viewed with the utmost
skepticism.
Of course, it is equally easy to give an example of a case in which
insufficient knowledge of the situation results not in an increased but rather
in a decreased number of supposedly possible outcomes of a test. Imagine that
you made an appointment over the phone to meet John Doe at the entrance to his
residence. You have never before seen his residence. When you arrive at his
address you discover that he lives in a large apartment house which seems to
have two entrances at the opposite corners of the building. You have to watch
both entrances. Your estimate of the probability that John would exit from the
eastern door is 1/2, as it is also that he would exit from the western door. The
estimated number, 1/2, results from your assumption of equal probability of John's
choosing either of the exits and from your knowledge that there are two exits.
However, what if you don't know that the building has also one more exit in
the rear? If you knew that fact, your estimated probability would drop to 1/3
for each of the doors. Insufficient knowledge (you knew only about two possible
outcomes) led you to an increased estimated probability compared with that
calculated with a more complete knowledge of the situation, accounting for all
three possible outcomes.
The two described situations, one when the number of possible outcomes is
assumed to be larger than it actually is, and the other when the number of
supposedly possible outcomes is smaller than the actual number, may result
in two different types of judgment, leading respectively to an underestimated
or to an exaggerated probability for the event in question.
Now let us discuss the other side of the probability's cognitive aspect.
What is the real meaning of probability's calculated value if it happens to be
very small?
Consider first the situation when all possible outcomes of trials are
supposedly equally probable. Assume the probability of an event A was
calculated as 1/N where N is a very large number so the
probability of the event is very low. Often, such a result is interpreted as an
indication that the event in question should be considered, to all intents and
purposes, as practically impossible. However, such an interpretation, which may
be psychologically attractive, has no basis in probability theory. The actual
meaning of that value of 1/N is just that – the event in question is
one of N equally probable events. If event A has not occurred it
simply means that some other event B has occurred instead. But event B
had the same very low probability of occurring as event A. So why could
the low-probability event B actually occur, but event A, which had
the same probability as B, could not?
An extremely low value for a calculated probability has no cognitive
meaning in itself. Whichever one of N possible events has actually
occurred, it necessarily had the same very low probability as the others, but
has occurred nevertheless. Therefore the assertion of impossibility of such
events as the spontaneous emergence of life, based on its calculated very low
probability, has no merit.
If the possible events are actually not equally probable, which is a more
realistic approach, a very low calculated probability of an event has even less
of a cognitive meaning, since its calculation ignored the possible existence of
preferential chains of outcomes which could ensure a much higher probability for
the event in question.
The above discourse may produce in the minds of some readers an impression
that my thesis was to show that the concept of probability is really not very
useful since its cognitive content is very limited. This was by no means my
intention. When properly applied and if not expected to produce unrealistic
predictions, the concept of probability may be a very potent tool for shedding
light on many problems in science and engineering. When applied improperly and
if expected to be a magic bullet to produce predictions, it often becomes
misleading and a basis for a number of unfounded and sometimes ludicrous
conclusions. The real power of properly calculated and interpreted
probability lies, however, not in calculating the probability of this or that
event, where it is indeed of limited value, but in its use
as an integrated tool within the much more sophisticated framework of
either mathematical statistics or statistical physics.
Scientific theories often seem to contradict common sense. When this
is the case, it is the alleged common sense that is deceptive, while the
assertions of science are correct. The whole science of quantum mechanics, which
is one of the most magnificent achievements of the human mind, seems to be
contrary to the "common sense" based on the everyday experience of
men.
One good example of the above contradiction is related to the motion of
spacecraft in orbit about a planet. If there are two spacecraft moving in the
same orbit, one behind the other, what should the pilot of the craft that is
behind do if he wishes to overtake the one ahead? "Common sense" tells
us that the pilot in question has to increase the speed of his craft along the
orbital path. Indeed, that is what we do when we wish to overtake a car that is
ahead of us on a road. However, in the case of an orbital flight
"common sense" is wrong. To overtake a spacecraft that is ahead in the
orbit, the pilot of the craft that lags behind must decrease rather than
increase his speed: braking drops the craft into a lower orbit with a shorter
period, so it gains on the craft ahead. This theoretical conclusion of the
science of mechanics has been decisively confirmed in multiple flights of
spacecraft and artificial satellites, despite seemingly contradicting the normal
experience of car drivers, pedestrians, runners, and horsemen, and the
"common sense" based on that experience. Likewise, many conclusions of
probability theory may seem to contradict common sense, but nevertheless
probability theory is correct while "common sense" in those cases is wrong.
Consider an experiment with a die, where events in question are sets of 10
trials each. Recall that we assume an "honest die" and in addition the
independence of outcomes. If we toss the die once, each of the six possible
outcomes has the same chance of happening, the probability of each of the six
numbers to face up being the same 1/6. Assume that in the first trial the
outcome was, say, 3. Then we toss the die the second time. It is the same die,
tossed in the same way, with the same six equally probable outcomes. To get an
outcome of 3 is as probable as any of the five other outcomes. The tests are
independent, so the outcome of each subsequent trial does not depend on the
outcomes of any of the preceding trials.
Now toss the die in sets of 10 trials each. Assume that the first event is
as follows: A (3, 5, 6, 2, 6, 5, 6, 4, 1, 1). We are not surprised in the
least since we know that there are 6^10 (which is 60,466,176)
possible, equally probable events. Event A is just one of them and does
not stand alone in any respect among those over sixty million events, so it
could have happened in any set of 10 trials as well as any other of those sixty
million variations of numbers. Let us assume that in the second set of 10 trials
the event is B (6, 5, 4, 2, 6, 2, 3, 2, 1, 6). Again, we have no reason
to be surprised by such a result since it is just another of those millions of
possible events and there is no reason whatsoever for it not to happen. So far
probability theory seems to agree with common sense.
Assume now that in the third set of 10 trials the event is C (4, 4,
4, 4, 4, 4, 4, 4, 4, 4). I am confident that in such a case everybody would be
amazed and the immediate explanation of that seemingly "improbable"
event would be the suspicion that either the die has been tampered with or that
it was tossed using some sleight of hand.
While cheating cannot be excluded, the event with all ten
"fours" does not necessarily require the assumption of cheating.
Indeed, what was the probability of event A? It was one in over
sixty million. Despite the exceedingly small probability of A, its
occurrence did not surprise anybody. What was the probability of event B?
Again only one in over sixty million but we were not amazed at all. What was the
probability of event C? The same one in over sixty million, but this time
we are amazed.
From the standpoint of probability theory there is no difference
whatsoever between any of the over sixty million possible events, including events A,
B and C, and all other (60,466,176 – 3 = 60,466,173) possible variations
of a ten-number combination.
Is the tenfold repeat of "four" extremely unlikely? Yes, it is.
Indeed, its probability was only 1 in over sixty million! However, we should
remember that any other combination of ten numbers is as unlikely (or as likely)
to occur as the "all fours" combination. The occurrence of 10
identical outcomes in a row is very unlikely, but not less likely than the
occurrence of any other specific sequence of ten numbers.
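The equality of these probabilities is easy to verify. The following sketch assumes, as the text does, an "honest die" and independent tosses; the helper name is illustrative.

```python
from fractions import Fraction

SIDES = 6
TOSSES = 10

def sequence_probability(sequence, sides=SIDES):
    """Probability of one specific ordered sequence of independent fair tosses."""
    return Fraction(1, sides) ** len(sequence)

event_a = (3, 5, 6, 2, 6, 5, 6, 4, 1, 1)
event_b = (6, 5, 4, 2, 6, 2, 3, 2, 1, 6)
event_c = (4,) * TOSSES  # the "all fours" sequence

total = SIDES ** TOSSES
print(total)  # prints: 60466176
for event in (event_a, event_b, event_c):
    print(event, sequence_probability(event))
```

All three events print the same probability, 1/60466176; the calculation depends only on the length of the sequence, not on which numbers it contains.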
The theory of probability asserts that if we repeat this ten-toss
test, say, a billion billion billion times, then each of the about sixty million
possible combinations of ten numbers will happen, on average, approximately once
in every 60,466,176 ten-toss tests. This is true equally for the "all-fours"
combination and for any other of the over sixty million competing combinations.
Why does the "all-fours" event seem amazing? Only for
psychological reasons. It seems easier to assume cheating on the part of the
dice-tossing player than the never-before-seen occurrence of "all
fours" in ten trials. What is not realized is that the overwhelming
majority of events other than "all fours" have never been seen, either.
There are so many possible combinations of ten numbers, each drawn from six
different values, that each of them occurs extremely rarely. The set of
10 identical numbers seems psychologically to be "special" among
combinations of different numbers. For probability theory, though, the set of
"all fours" is not special in any respect.
Of course, if the actual event is highly favorable to one of the players,
it justifies a suspicion of cheating. The reason for that is our experience
which tells us that cheating is rather highly probable when a monetary or other
award is in the offing. However, the probability of cheating is actually
irrelevant to our discussion. Indeed, the probability of cheating is just a
peculiar feature of the example with a game of dice. This example is used,
however, to illustrate the question of the spontaneous emergence of life where
no analog of cheating is present. Therefore, the proper analogy is one in which
cheating is excluded. When the possibility of cheating is
excluded, only the mathematical probability of any of the over sixty million
possible events has to be considered. In such a case, every one of those over
sixty million events is equally probable. Therefore the extremely low
probability of any of those events, including an ordered sequence of "all
fours," is of no cognitive significance. However special this ordered
sequence may be from a certain viewpoint, it is not special at all from the
standpoint of probability. The same must be said about the probability of
spontaneous emergence of life. However small it is, it is not less than the
probability of any of the competing events and therefore its extremely small
probability in no way means it could not have happened.
Probability theory is a part of science and has been overwhelmingly
confirmed to be a good theory of great power. There is no doubt that the
viewpoint of probability theory is correct. The psychological reaction to ten
identical outcomes in a set is as wrong as the suggestion that the pilot of a
spacecraft lagging behind increase his speed if he wishes to overtake a craft
ahead in orbit.
Another example of an erroneous attitude to an "improbable" event,
based on psychological reasons, is the case of multiple wins in a lottery.
Consider a simple raffle in which there are only 100 tickets on sale. To
determine the winner, numbers from 1 to 100 are written on small identical
pieces of paper, the pieces are rolled up and placed in a rotating cylinder.
After the cylinder has been rotated several times, a child whose eyes are
covered with a piece of cloth pulls one of the pieces out of the cylinder. This
procedure seems to ensure as complete an absence of bias as humanly possible.
Obviously each of the tickets has the same probability of winning, namely
1/100. Let us assume John Doe is the lucky one. We congratulate him but nobody
is surprised by John's win. Out of the hundred tickets one must necessarily
win, so why shouldn't John be the winner?
Assume now that the raffle had not 100 but 10,000 tickets sold. In this
case the probability of winning was the same for each ticket, namely 1/10,000.
Assume Jim Jones won in that lottery. Are we surprised? Of course not. One
ticket out of 10,000 had to win, so why shouldn't it be that of Jim?
The same discussion is applicable to any big lottery where there are
hundreds of thousands or even millions of tickets. Regardless of the number of
tickets available, one of them, either sold or unsold, must necessarily win, so
why shouldn't it be that of Jim or John?
Now let us return to the small raffle with only 100 tickets sold. Recall
that John Doe won it. Assume now that, encouraged by his win, John decides to
play once again. John has already won once; the other 99 players have not yet
won at all. What is the probability of winning in the second run? For every one
of the 100 players, including John, it is again the same 1/100. Does John's
previous win provide him with any advantages or disadvantages compared to the
other 99 players? None whatsoever. All one hundred players are in the same
position, including John.
Assume now that John wins again. It is as probable as any of the
other 99 players winning this time, so why shouldn't it be John? However, if
John wins the second time in a row, everybody is amazed by his luck. Why the
amazement?
Let us calculate the probability of a double win, based on the assumption
that no cheating was possible. The probability of winning in the first run was
1/100. The probability of winning in the second run was again 1/100. The events
are independent, therefore the probability of winning twice in a row is 1/100
times 1/100 which is 1 in 10,000. It is exactly the same probability as it was
in the raffle with 10,000 tickets played in one run. When Jim won that raffle,
we were not surprised at all, despite the probability of his win being only 1 in
10,000, nor should we have been. So why should we be amazed at John's double
win whose probability was exactly the same 1 in 10,000?
Let us clarify the difference between the cases of a large raffle played
only once and a small raffle played several times in a row.
If a raffle is played only once and N tickets have been distributed, covering
all N possible versions of numbers, of which each one has the same chance to
win, then the probability that a particular player wins is p(P)=1/N, while the
probability that someone out of the N players (whoever he or she might be) wins
is p(S)=1 (i.e. 100%).
If, though, the raffle is played k times, and each time n players participate,
where n^k=N, the probability that a particular player wins k times in a row is
again the same 1/N. Indeed, in each game the probability of winning for a
particular player is now 1/n. The games are independent of each other. Hence the
probability of winning k times equals the product of the probabilities of
winning in each game, i.e. it is (1/n)^k=1/N.
However, the probability that someone (whoever
he/she happens to be) wins k times in a row is now not 1, but not more
than n/N, that maximum value corresponding to the situation in which the
same n players play in all k games. Indeed, for each particular
player the probability of winning k times in a row is 1/N. Since there are n players, each with the same chance
to win k times, the probability of someone in that group winning k
times in a row is n times 1/N i.e. n/N . In
other words, in a big raffle played only once somebody necessarily wins (p=1). On the other hand, in a small raffle played k times, it is likely
that nobody wins k times in a row, as the probability of such a multiple
win is small.
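The two probabilities for the repeated raffle can be checked by brute force on a toy case. The sketch below is only an illustration: the function name and the small values n=5, k=3 are assumptions chosen so that all n^k outcomes can be listed, and the same n players are assumed to play in every game.

```python
from fractions import Fraction
from itertools import product


def repeated_raffle_odds(n: int, k: int):
    """Enumerate every outcome of k independent games with the same n players."""
    # Each outcome is a tuple naming the winner of game 1, game 2, ..., game k.
    outcomes = list(product(range(n), repeat=k))
    total = len(outcomes)  # n**k, i.e. N in the text
    # Outcomes in which one particular player (player 0) wins every game.
    particular = sum(1 for o in outcomes if all(w == 0 for w in o))
    # Outcomes in which some single player, whoever it is, wins every game.
    someone = sum(1 for o in outcomes if len(set(o)) == 1)
    return Fraction(particular, total), Fraction(someone, total)


p_particular, p_someone = repeated_raffle_odds(n=5, k=3)
print(p_particular)  # 1/125, which is 1/N with N = 5**3
print(p_someone)     # 1/25, which is n/N
```

Since all n^k outcomes are equally probable in a fair game, counting outcomes is enough; no random sampling is needed.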
Here is a numerical example. Let the big raffle be such that
N=1,000,000. If all N tickets are distributed, the probability that John
Doe wins is one in a million. However, the probability that somebody (whoever
he/she happens to be) wins is 1 (i.e. 100%).
If the raffle is small, such that only n=100 tickets are
distributed, the probability of any particular player winning in a given
game is 1/100. If k=3 games
are played, the probability that a particular John Doe wins 3 times in a row is
(1/100)^3, which is again one in a million, exactly as it was in the one-game raffle
with N=1,000,000 tickets.
However,
the probability that someone wins three times in a row, whoever he or she
happens to be, is now not 100% but at most n/N=100/1,000,000 (less
if the composition of the group of players changes from game to game), which
is 0.0001, i.e. 10,000 times less
than in a one-game raffle with one million tickets. Hence, such a raffle may be
played time after time without anybody winning k times in a
row. Such a multiple win
must be very rare.
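The arithmetic of this example can be reproduced with exact fractions. This is a minimal sketch; the variable names are illustrative, and the small raffle is assumed to have the same 100 players in all three games, which gives the upper bound n/N.

```python
from fractions import Fraction

N = 1_000_000        # tickets in the big raffle played once
n, k = 100, 3        # tickets per game and number of games in the small raffle
assert n**k == N

# Big raffle played once.
big_particular = Fraction(1, N)   # a particular player wins
big_someone = Fraction(1)         # somebody wins (a certain event)

# Small raffle played k times with the same n players.
small_particular = Fraction(1, n) ** k       # a particular player wins k times in a row
small_someone_max = n * small_particular     # somebody wins k times in a row (upper bound)

print(small_particular == big_particular)    # True
print(small_someone_max)                     # 1/10000
print(big_someone / small_someone_max)       # 10000
```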
When John Doe wins three times in a row, we are amazed
not because the probability of that event was one in a million (which is the
same as for a single win in a big one-game raffle) but because the probability
of anyone winning three times in a row
is ten thousand times less than it is in a big one-game raffle.
Hence, while in the big raffle
played just once the fact that somebody won is 100% probable (i.e. is a certain
event), in the case of a small raffle played three times a triple win is a rare
event of low probability (in our example 1 in 10,000).
However, if we adhere to the postulate of a
fair game, a triple win is not a special event despite its low probability. It
is as probable as any other combination of three winning tickets,
namely, in our example, one in a million. To
suspect fraud means to abandon the postulate of a fair game. Indeed, if we know
that fraud is possible, we intuitively compare the probability of an honest
triple win with the probability of fraud. Our estimate is that the probability
of an honest triple win (in our case 1 in 10,000) is less than the probability
of fraud (which in some cases may be quite high).
The
above discussion related only to a raffle-type lottery. If the lottery is what
is sometimes referred to as an Irish-type lottery, the situation is slightly
different. In this type of lottery, the players themselves choose a set of
numbers for their tickets. For example, I believe that in the California State lottery each
player has to choose any 6 numbers between 1 and 50. There are about 16 million possible combinations of such
sets of 6 numbers. This means that there can be no more than 16 million
tickets with differing sets of the chosen 6 numbers. However, nothing prevents
two or more players from coincidentally choosing
the same set of 6 numbers. (Such a coincidence is impossible in a raffle, where
all the tickets distributed among the players have unique numbers.) If more
than one player has chosen the same set of 6 numbers, this
diminishes the probability that somebody will win in the drawing in
question. The
probability of someone (not a particular player) winning in this type of
lottery is calculated in the Appendix to this article.
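The figure of about 16 million can be checked directly. The sketch below assumes the format recalled above (6 distinct numbers out of 50, with the order of the numbers irrelevant); the actual rules of any real lottery may differ.

```python
import math
from fractions import Fraction

# Number of distinct sets of 6 numbers that can be chosen from 1..50,
# assuming the order of the chosen numbers does not matter.
combinations = math.comb(50, 6)
print(combinations)                 # 15890700, i.e. about 16 million

# Probability that one ticket with one chosen set matches the winning set.
p_single_ticket = Fraction(1, combinations)
print(float(p_single_ticket))
```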
From the above we can conclude that when a particular player wins more
than once in consecutive games, we are amazed not because the probability of
winning for that particular player is very low, but because the probability of
anybody (whoever he/she happens to be) winning consecutively in more than one
game is much less than the probability of someone winning only once, even in a
much larger lottery. We intuitively
estimate the difference between the two situations. However, the important
point is that what impresses us is not
the sheer smallness of the probability of a given player winning against enormous odds. This
probability is equally small in the case of winning only once in a big lottery,
but in that case we are not amazed. This
illustrates the psychological aspect of probability.
Let us briefly discuss the meaning of the term "special event."
When stating that none of the N possible, equally probable events was in
any way special, I only meant to say that it was not special from the standpoint
of its probability. Any event, while not special in the above sense, may be very
special in some other sense.
Consider an example. Let us imagine a die whose six facets bear, instead
of numbers from 1 to 6, six letters A, B, C, D, E, and F. Let us imagine further
that we toss the die in sets of six trials each. In such a case there are 6^6
= 46,656 possible, equally probable events. Among those events are the following
three: ABCDEF, AAAAAA, and FDCABE. Each of these three events has the same
probability of 1 in 46,656. Hence, from the standpoint of probability none of
these three events is special in any sense.
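The equal probability of the three sequences can be verified in a few lines. This is a sketch that assumes a fair die with independent throws; the helper name is illustrative.

```python
from fractions import Fraction

faces = "ABCDEF"
throws = 6
total_sequences = len(faces) ** throws   # 6^6 possible, equally probable sequences
print(total_sequences)                   # 46656


def sequence_probability(seq: str) -> Fraction:
    """Probability of one exact sequence of throws of a fair six-letter die."""
    assert len(seq) == throws and all(c in faces for c in seq)
    # Independent throws: multiply 1/6 once per throw, whatever the letters are.
    return Fraction(1, len(faces)) ** len(seq)


for seq in ("ABCDEF", "AAAAAA", "FDCABE"):
    print(seq, sequence_probability(seq))   # each prints 1/46656
```

The function never looks at which letters appear, only at how many there are, which is the point of the argument: the pattern a reader sees in a sequence has no effect on its probability.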
However, each of these three events may be special in a sense otherwise
than probabilistic. Indeed, for somebody interested in alphabets the first of
the three events may seem to be very special since the six outcomes are in the
alphabetical order. Of course, alphabetical order in itself has no intrinsic
special meaning, thus a person whose language is, for example, Chinese, would
hardly see anything special in that particular order of symbols. The second
event, with its six identical outcomes may seem miraculous to a person inclined
to see miracles everywhere and to attach some special significance to
coincidences many of which happen all the time. The third event seems to be not
special but rather just one of the large number of possible events. However,
imagine a person whose first name is Franklin, middle name is Delano, and whose
last name is (no, not Roosevelt!) Cabe. For this person the six trials resulting
in the sequence FDCABE may look as if his name, F. D. Cabe, was miraculously
produced by six throws of the die, despite the probability of such a coincidence
being only 1 in 46,656.
Whatever special significance this or that person may be inclined to
attribute to any of the possible, equally probable events, none of them is
special from the standpoint of probability. This conclusion is equally valid
regardless of the value of the probability, however small it happens to be.
In particular, the extremely small probability of the spontaneous
emergence of intelligent life, as calculated (usually not quite correctly) by
the opponents of the hypothesis of life's spontaneous emergence, by no means
indicates that the spontaneous emergence of life must be ruled out. (There are
many non-probabilistic arguments both in favor of creationism and against it
which we will not discuss in this essay). The spontaneous emergence of life was
an extremely unlikely event, but all other alternatives were extremely unlikely
as well. One out of N possible events did occur, and there is nothing special in
that from the standpoint of probability, even though it may be very special from
your or my personal viewpoint.
In this
appendix, I will calculate the probability of more than one player
simultaneously winning an Irish-type lottery.
Let N be the number of possible
combinations of numbers to be chosen by players (of which one combination, chosen
randomly, will be the winning set). Let
T <= N be the number of tickets sold.
Now calculate p(L), the
probability that exactly L players select the winning combination.
The number of ways to choose L
tickets out of T is given by the binomial coefficient:
bin(T,L) = T!/(L!(T-L)!)
For
those L tickets to be the only winners, they must all select the winning
combination. The probability of that is (1/N)^L. All the other T-L players must
select a non-winning combination, the probability of that being (1-1/N)^(T-L).
We multiply those three
quantities, which yields the formula
p(L) = bin(T,L) (1/N)^L (1-1/N)^(T-L).
This formula can be simplified,
preserving good precision. Since
usually N and T are very large and L is very small, we can use the following
approximations:
T!/(T-L)! approx= T^L; (1-1/N)^(T-L) approx= exp(-T/N).
Now the formula becomes
p(L) approx= (T/N)^L exp(-T/N) / L!
This approximate (but quite accurate) formula is the
Poisson distribution with mean T/N. In
the case when T=N (i.e. when all available tickets are sold) we have a simpler
formula:
p(L) approx= exp(-1) / L!.
(A complication in practice may
be that when one person buys more than one ticket he/she certainly makes sure
that all the combinations of numbers he/she chooses are different. However, the approximate
formula will still be very accurate unless someone is buying a large fraction of
all tickets, which is unlikely).
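The accuracy of the approximation can be checked numerically by comparing the exact binomial expression bin(T,L) (1/N)^L (1-1/N)^(T-L) with the Poisson form (T/N)^L exp(-T/N) / L!. The sketch below is illustrative only: the function names are assumptions, N is taken as the number of 6-of-50 combinations, every ticket is assumed to be an independent random pick, and T=N is used as in the simplified formula.

```python
import math


def exact_p(L: int, T: int, N: int) -> float:
    """Binomial probability that exactly L of T tickets carry the winning set."""
    return math.comb(T, L) * (1 / N) ** L * (1 - 1 / N) ** (T - L)


def poisson_p(L: int, T: int, N: int) -> float:
    """Poisson approximation with mean T/N."""
    mean = T / N
    return mean ** L * math.exp(-mean) / math.factorial(L)


N = math.comb(50, 6)   # number of possible combinations (assumed 6-of-50 format)
T = N                  # as many tickets sold as there are combinations

for L in range(4):
    # Columns: number of winners, exact value, Poisson approximation.
    print(L, exact_p(L, T, N), poisson_p(L, T, N))
```

With T=N the values for L=0 and L=1 both come out close to exp(-1), about 0.368, so under these assumptions a drawing with no winner at all is about as likely as a drawing with exactly one winner.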
The probability that exactly one player wins
in this type of
lottery is therefore less than 100%. Setting L=1 and T=N gives p(1)=1/e, approximately 0.368, or
close to 37%, which is still thousands of times greater than the probability of the
same player winning consecutively in more than one drawing.
1. Peter Olofsson. Probability, Statistics, and Stochastic Processes. (John Wiley & Sons, Hoboken
NJ) 2005.
Mark Perakh's main page:
http://members.cox.net/marperak.