Stephen Meyer's Bogus Information Theory
By Jeffrey Shallit
Posted on January 18, 2010
A couple of months ago, I finished a first reading of Stephen Meyer's new book, Signature in the Cell. It was very slow going because there is so much wrong with it, and I tried to take notes on everything questionable.

Two
things struck me as I read it: first, its essential dishonesty, and
second, Meyer's significant misunderstandings of information theory.
I'll devote a post to the book's many misrepresentations another day, and
concentrate on information theory today. I'm not a biologist, so I'll
leave a detailed discussion of what's wrong with his biology to others.

In Signature in the Cell,
Meyer talks about three different kinds of information: Shannon
information, Kolmogorov information, and a third kind that has been
invented by ID creationists and has no coherent definition. I'll call
the third kind "creationist information".

Shannon's theory is
a probabilistic theory. Shannon equated information with a reduction in
uncertainty. He measured this by computing the reduction in entropy,
where entropy is given by -log₂ p and p is a
probability. For example, if I flip two coins behind my back, you don't
know how either of them turned out, so your information about the
results is 0. If I now show you one coin, then I have reduced your
uncertainty about the results by -log₂ 1/2 = 1 bit. If I show you both, I have reduced your uncertainty by -log₂ 1/4 = 2 bits.

Shannon's theory is completely dependent
on probability; without a well-defined probability distribution on the
objects being discussed, one cannot compute Shannon information. If one
cannot realistically estimate the probabilities, any discussion of the
relevant information is likely to be bogus.
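To make the coin example concrete, here is a minimal Python sketch (my illustration, not Meyer's) computing the reduction in uncertainty, in bits, for the two-coin scenario above:

```python
import math

def surprisal_bits(p: float) -> float:
    """Reduction in uncertainty (in bits) from learning an outcome of probability p."""
    return -math.log2(p)

# Two fair coins flipped behind my back: four equally likely outcomes.
print(surprisal_bits(1/2))  # revealing one coin:   1.0 bit
print(surprisal_bits(1/4))  # revealing both coins: 2.0 bits
```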
In contrast, Kolmogorov's theory of information makes no reference to probability
distributions at all. It measures the information in a string relative
to some universal computing model. Roughly speaking, the Kolmogorov
information in (or complexity of) a string x of symbols
is the length of the shortest program P and input I such that P outputs
x on input I. For example, the Kolmogorov complexity of a bit string of
length n that starts 01101010001..., where bit i is 1 if i is a prime
and 0 otherwise, is bounded above by log2 n + C, where C is a constant that takes into account the size of the program needed to test primality. Neither Shannon's nor Kolmogorov's theory has anything to do with meaning.
Neither Shannon's nor Kolmogorov's theory has anything to do with meaning.
For example, a message can be very meaningful to humans, and yet have
little Kolmogorov information (such as the answer "yes" to a marriage
proposal), and have little meaning to humans, yet have much Kolmogorov
information (such as most strings obtained by 1000 flips of a fair
coin).
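One way to see this distinction concretely is with compression, which is a crude, computable stand-in for Kolmogorov complexity (a sketch, not a rigorous measure):

```python
import os
import zlib

meaningful = b"yes" * 300      # highly meaningful, highly repetitive
random_bits = os.urandom(125)  # 1000 random bits: little meaning to anyone

print(len(zlib.compress(meaningful)))   # compresses to a few dozen bytes
print(len(zlib.compress(random_bits)))  # barely compresses at all (~125+ bytes)
```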
Both Shannon's and Kolmogorov's theories are well-grounded mathematically, and there are thousands of papers
explaining them and their consequences. Shannon and Kolmogorov
information obey certain well-understood laws, and the proofs are not
in doubt.

Creationist information, as discussed by Meyer, is an
incoherent mess. One version of it has been introduced by William
Dembski, and criticized in detail by Mark Perakh, Richard Wein, and
many others (including me). Intelligent design creationists love to
call it "specified information" or "specified complexity" and imply
that it is widely accepted by the scientific community, but this is not
the case. There is no paper in the scientific literature that gives a
rigorous and coherent definition of creationist information; nor is it
used in scientific or mathematical investigations.

Meyer
doesn't define it rigorously either, but he rejects the
well-established measures of Shannon and Kolmogorov, and wants to use a
common-sense definition of information instead. On page 86 he
approvingly quotes the following definition of information: "an
arrangement or string of characters, specifically one that accomplishes
a particular outcome or performs a communication function". For Meyer,
a string of symbols contains creationist information only if it communicates or carries out some function. However, he doesn't say explicitly how much
creationist information such a string has. Sometimes he seems to
suggest the amount of creationist information is the length of the
string, and sometimes he suggests it is the negative logarithm of the
probability. But probability with respect to what? Its causal history,
or with respect to a uniform distribution of strings? Dembski's
definition has the same flaws, but Meyer's vague definition introduces
even more problems. Here are just a few.

Problem 1: there is no universal way to communicate, so Meyer's definition is completely
subjective. If I receive a string of symbols that says "Uazekele?", I
might be tempted to ignore it as gibberish, but a Lingala speaker would
recognize it immediately and reply "Mbote". Quantities in mathematics
and science are not supposed to depend on who is measuring them.

Problem 2: If we measure creationist information solely by the length of the
string, then we can wildly overestimate the information contained in a
string by padding. For example, consider a computer program P that
carries out some function, and the identical program P', except that n no-op instructions have been added. If he uses the length measure, then Meyer would have to claim that P' has something like n more bits of creationist information than P. (In the Kolmogorov theory, by contrast, P' would have at most order log n more bits of information.)
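A hypothetical Python sketch of the padding trick: P and P' below behave identically, but P' is longer, so a pure length measure assigns it more "information":

```python
def P(x):
    """A program that carries out some function."""
    return x * x

def P_prime(x):
    """The same function, padded with n = 4 no-op instructions."""
    pass  # no-op
    pass  # no-op
    pass  # no-op
    pass  # no-op
    return x * x

# Identical behavior, different lengths -- the padding added nothing real.
assert P(3) == P_prime(3) == 9
```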
Problem 3: If we measure creationist information with respect to the uniform
distribution on strings, then Meyer's claim (see below) that only
intelligence can create creationist information is incorrect. For
example, any transformation that maps a string to the same string
duplicated 1000 times creates a string that, with respect to the
uniform distribution, is wildly improbable; yet it can easily be
produced mechanically.
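In Python, the mechanical transformation is one line; its output is astronomically improbable under the uniform distribution on strings of its length, yet no intelligence is involved:

```python
def duplicate_1000(s: str) -> str:
    """A purely mechanical transformation: repeat the input 1000 times."""
    return s * 1000

out = duplicate_1000("01101010001")
# Under the uniform distribution on binary strings of this length, this
# exact string has probability 2 ** -len(out): "wildly improbable".
print(len(out))  # 11000
```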
Problem 4: If we measure creationist information with respect to the causal history of the object in
question, then we are forced to estimate these probabilities. But since
Meyer is interested in applying his method to phenomena that are
currently poorly understood, such as the origin of life, all he's
really doing (since his creationist information is sometimes the
negative log of the probability) is estimating the probability of these
events -- something we can't reasonably do, precisely because we don't
know that causal history. In this case, all the talk about
"information" is a red herring; he might as well say "Improbable -
therefore designed!" and be done with it. Problem 5: All Meyer
seems interested in is whether the string communicates something or has
a function. But some strings communicate more than others, despite
being the same length, and some functions are more useful than others.
Meyer's measure doesn't take this into account. The strings "It will rain tomorrow" and "Tomorrow: 2.5 cm rain" have the same length, but
clearly one is more useful than the other. Meyer, it seems to me, would
claim they have the same amount of creationist information.

Problem 6: For Meyer, information in a computational context could refer to,
for example, a computer program that carries out a function. The longer
the program, the more creationist information. Now consider a very long
program that has a one-letter syntax error, so that the program will
not compile. Such a program does not carry out any function, so for
Meyer it has no information at all! Now a single "point mutation" will
magically create lots more creationist information, something Meyer
says is impossible.
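A hypothetical Python illustration of Problem 6: the first source text below differs from the second by a single character, fails to compile, and so carries out no function at all; one "point mutation" restores a working program:

```python
broken = "def f(x)): return 2 * x"  # one-character syntax error: extra ')'
fixed = "def f(x): return 2 * x"    # one "point mutation" away

def compiles(src: str) -> bool:
    """Does this source text compile, i.e., carry out any function at all?"""
    try:
        compile(src, "<string>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(broken))  # False: by Meyer's criterion, no information at all
print(compiles(fixed))   # True: one mutation "creates" lots of information
```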
Even if we accept Meyer's informal definition of information with all its flaws, his claims about
information are simply wrong. For example, he repeats the following
bogus claim over and over:

p. 16: "What humans recognize as information certainly originates
from thought - from conscious or intelligent human activity... Our
experience of the world shows that what we recognize as information
invariably reflects the prior activity of conscious and intelligent
persons." p. 291: "Either way, information in a computational
context does not magically arise without the assistance of the computer
scientist." p. 341: "It follows that mind -- conscious, rational
intelligent agency -- what philosophers call "agent causation," now
stands as the only cause known to be capable of generating large
amounts of specified information starting from a nonliving state."

p. 343: "Experience shows that large amounts of specified complexity or information (especially codes and languages) invariably originate from an intelligent source -- from a mind or personal agent."

p. 343: "...both common experience and experimental evidence affirms intelligent design as a necessary condition (and cause) of information..."

p. 376: "We are not ignorant of how information arises. We know from
experience that conscious intelligent agents can create informational
sequences and systems." p. 376: "Experience teaches that
whenever large amounts of specified complexity or information are
present in an artifact or entity whose causal story is known,
invariably creative intelligence -- intelligent design -- played a role
in the origin of that entity."

p. 396: "As noted previously, as
I present the evidence for intelligent design, critics do not typically
try to dispute my specific empirical claims. They do not dispute that
DNA contains specified information, or that this type of information
always comes from a mind..."

I have a simple counterexample to
all these claims: weather prediction. Meteorologists collect huge
amounts of data from the natural world: temperature, pressure, wind
speed, wind direction, etc., and process this data to produce accurate
weather forecasts. So the information they collect is
"specified" (in that it tells us whether to bring an umbrella in the
morning), and clearly hundreds, if not thousands, of these bits of
information are needed to make an accurate prediction. But these bits
of information do not come from a mind - unless Meyer wants to claim
that some intelligent being (let's say Zeus) is controlling the
weather. Perhaps intelligent design creationism is just Greek
polytheism in disguise!

Claims about information are central to
Meyer's book, but, as we have seen, many of these claims are flawed.
There are lots and lots of other problems with Meyer's book. Here are
just a few; I could have listed dozens more.

p. 66: "If the
capacity for building these structures and traits was something like a
signal, then a molecule that simply repeated the same signal (e.g.,
ATCG) over and over again could not get the job done. At best, such a
molecule could produce only one trait."

That's not clear at all. The number
of repetitions also constitutes information, and indeed, we routinely
find that different numbers of repetitions result in different
functions. For example, Huntington's disease has been linked to
different numbers of repetitions of CAG.

p. 91: "For this reason, information scientists often say that Shannon's theory measures
the "information-carrying capacity," as opposed to the functionally
specified information or "information content," of a sequence of
characters or symbols."

Meyer seems quite confused here. The term "information-carrying capacity" in Shannon's theory refers to a channel,
not a sequence of characters or symbols. Information scientists don't
talk about "functionally specified information" at all, and they don't
equate it with "information content".

p. 106: (he contrasts two different telephone numbers, one randomly chosen, and one that reaches someone) "Thus, Smith's number contains specified information or functional information, whereas Jones's does not; Smith's number has information content, whereas Jones' number has only information-carrying capacity (or Shannon information)."

This
is pure gibberish. Information scientists do not speak about "specified
information" or "functional information", and as I have pointed out,
"information-carrying capacity" refers to a channel, not a string of
digits.

p. 106: "The opposite of a complex sequence is a
highly ordered sequence like ABCABCABCABC, in which the characters or
constituents repeat over and over due to some underlying rule,
algorithm, or general law."

This is a common misconception
about complexity. While it is true that in a string with low Kolmogorov
complexity, there is an underlying rule behind it, it is not
true that the "characters or constituents" must "repeat over and over".
For example, the string of length n whose i-th bit is 1 or 0 depending on whether i is a prime number (for i from 1 to n) has low Kolmogorov
complexity, but does not "repeat over and over".

p. 201: "Building a living cell not only requires specified information; it requires a vast amount of it -- and the probability of this amount of specified information arising by chance is 'vanishingly small.'"

Pure
assertion. "Specified information" is not rigorously defined. How much
specified information is there in a tornado? A rock? The arrangement of
the planets?

p. 258: "If a process is orderly enough to be described by a law, it does not, by definition, produce events complex enough to convey information."

False.
We speak all the time about statistical laws, such as the "law of large
numbers". Processes with a random component, such as
mutation+selection, can indeed generate complex outcomes and
information.
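To see how a process with a random component can generate a string meeting a functional specification, here is a minimal mutation-plus-selection sketch in Python (my illustration, in the spirit of Dawkins's "weasel" program, not a model of any biological process):

```python
import random

TARGET = "TOMORROW: 2.5 CM RAIN"  # a functional "specification"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .:0123456789"

def fitness(s: str) -> int:
    """Number of positions at which s matches the specification."""
    return sum(a == b for a, b in zip(s, TARGET))

random.seed(0)
current = "".join(random.choice(ALPHABET) for _ in TARGET)
while fitness(current) < len(TARGET):
    i = random.randrange(len(TARGET))  # random point mutation
    mutant = current[:i] + random.choice(ALPHABET) + current[i + 1:]
    if fitness(mutant) >= fitness(current):  # selection keeps improvements
        current = mutant

print(current)  # mutation + selection reach the target every time
```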
p. 293: "Here's my version of the law of conservation of information: 'In a nonbiological context, the amount of specified information initially present in a system S_i will generally equal or exceed the specified information content of the final system, S_f.'
This rule admits only two exceptions. First, the information content of
the final state may exceed that of the initial state, S_i, if
intelligent agents have elected to actualize certain potential states
while excluding others, thus increasing the specified information
content of the system. Second, the information content of the final
system may exceed that of the initial system if random processes, have,
by chance, increased the specified information content of the system.
In this latter case, the potential increase in the information content
of the system is limited by the "probabilistic resources" available to the system."

Utterly
laughable. The weasel word "generally" means that he can dismiss
exceptions when they are presented. And what does "in a nonbiological
context" mean? How does biology magically manage to violate this "law"?
If people are intelligent agents, they are also assemblages of matter
and energy. How do they magically manage to increase information?

p. 337: "Neither
computers by themselves nor the processes of selection and mutation
that computer algorithms simulate can produce large amounts of novel
information, at least not unless a large initial complement of
information is provided." Pure assertion. "Novel
information" is not defined. Meyer completely ignores the large
research area of artificial life, which routinely accomplishes what he
claims is impossible. The names John Koza, Thomas Ray, Karl Sims, and the term "artificial life" appear nowhere in the book's index.

p. 357: "Dembski
devised a test to distinguish between these two types of patterns. If
observers can recognize, construct, identify, or describe a pattern
without observing the event that exemplifies it, then the pattern
qualifies as independent from the event. If, however, the observer
cannot recognize (or has no knowledge of) the pattern apart from
observing the event, then the event does not qualify as independent."

And Dembski's claim to have given a meaningful definition of "independence" is false, as shown in detail in my paper with Elsberry -- not referenced by Meyer.

p. 396:
"As noted previously, as I present the evidence for intelligent design,
critics do not typically try to dispute my specific empirical claims.
They do not dispute that DNA contains specified information, or that
this type of information always comes from a mind..."

Critics
know that "specified information" is a charade, a term chosen to sound
important, with no rigorous coherent definition or agreed-upon way to
measure it. Critics know that information routinely comes from other
sources, such as random processes. Mutation and selection do just fine.

In summary, Meyer's claims about information are incoherent in places and wildly wrong in others. The people who have endorsed this book, from Thomas Nagel to Philip Skell to J. Scott Turner, uncritically accepting Meyer's claims about information and not even hinting that he might be wrong, should be ashamed.