Stephen Meyer's Bogus Information Theory
By Jeffrey Shallit
Posted on January 18, 2010
A couple of months ago, I finished a first reading of Stephen Meyer's new book, Signature in the Cell. It was very slow going because there is so much wrong with it, and I tried to take notes on everything questionable.

Two
things struck me as I read it: first, its essential dishonesty, and
second, Meyer's significant misunderstandings of information theory.
I'll devote a post to the book's many misrepresentations another day, and
concentrate on information theory today. I'm not a biologist, so I'll
leave a detailed discussion of what's wrong with his biology to others.

In Signature in the Cell,
Meyer talks about three different kinds of information: Shannon
information, Kolmogorov information, and a third kind that has been
invented by ID creationists and has no coherent definition. I'll call
the third kind "creationist information".

Shannon's theory is
a probabilistic theory. Shannon equated information with a reduction in
uncertainty. He measured this by computing the reduction in entropy,
where entropy is given by -log₂ p and p is a
probability. For example, if I flip two coins behind my back, you don't
know how either of them turned out, so your information about the
results is 0. If I now show you one coin, then I have reduced your
uncertainty about the results by -log₂ 1/2 = 1 bit. If I show you both, I have reduced your uncertainty by -log₂ 1/4 = 2 bits.

Shannon's theory is completely dependent
on probability; without a well-defined probability distribution on the
objects being discussed, one cannot compute Shannon information. If one
cannot realistically estimate the probabilities, any discussion of the
relevant information is likely to be bogus.
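To make the coin example concrete, here is a minimal Python sketch (my illustration, not Meyer's) computing the reduction in uncertainty, in bits, for the two-coin scenario above:

```python
import math

def surprisal_bits(p: float) -> float:
    """Reduction in uncertainty (in bits) from learning an outcome of probability p."""
    return -math.log2(p)

# Two fair coins flipped behind my back: four equally likely outcomes.
print(surprisal_bits(1/2))  # revealing one coin:   1.0 bit
print(surprisal_bits(1/4))  # revealing both coins: 2.0 bits
```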
In contrast, Kolmogorov's theory of information makes no reference to probability
distributions at all. It measures the information in a string relative
to some universal computing model. Roughly speaking, the Kolmogorov
information in (or complexity of) a string x of symbols
is the length of the shortest program P and input I such that P outputs
x on input I. For example, the Kolmogorov complexity of a bit string of
length n that starts 01101010001..., where bit i is 1 if i is a prime
and 0 otherwise, is bounded above by log2 n + C, where C is a constant that takes into account the size of the program needed to test primality. Neither Shannon's nor Kolmogorov's theory has anything to do with meaning.
Neither Shannon's nor Kolmogorov's theory has anything to do with meaning.
For example, a message can be very meaningful to humans, and yet have
little Kolmogorov information (such as the answer "yes" to a marriage
proposal), and have little meaning to humans, yet have much Kolmogorov
information (such as most strings obtained by 1000 flips of a fair
coin).
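One way to see this distinction concretely is with compression, which is a crude, computable stand-in for Kolmogorov complexity (a sketch, not a rigorous measure):

```python
import os
import zlib

meaningful = b"yes" * 300      # highly meaningful, highly repetitive
random_bits = os.urandom(125)  # 1000 random bits: little meaning to anyone

print(len(zlib.compress(meaningful)))   # compresses to a few dozen bytes
print(len(zlib.compress(random_bits)))  # barely compresses at all (~125+ bytes)
```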
Both Shannon's and Kolmogorov's theories are well-grounded mathematically, and there are thousands of papers
explaining them and their consequences. Shannon and Kolmogorov
information obey certain well-understood laws, and the proofs are not
in doubt.

Creationist information, as discussed by Meyer, is an
incoherent mess. One version of it has been introduced by William
Dembski, and criticized in detail by Mark Perakh, Richard Wein, and
many others (including me). Intelligent design creationists love to
call it "specified information" or "specified complexity" and imply
that it is widely accepted by the scientific community, but this is not
the case. There is no paper in the scientific literature that gives a
rigorous and coherent definition of creationist information; nor is it
used in scientific or mathematical investigations.

Meyer
doesn't define it rigorously either, but he rejects the
well-established measures of Shannon and Kolmogorov, and wants to use a
common-sense definition of information instead. On page 86 he
approvingly quotes the following definition of information: "an
arrangement or string of characters, specifically one that accomplishes
a particular outcome or performs a communication function". For Meyer,
a string of symbols contains creationist information only if it communicates or carries out some function. However, he doesn't say explicitly how much
creationist information such a string has. Sometimes he seems to
suggest the amount of creationist information is the length of the
string, and sometimes he suggests it is the negative logarithm of the
probability. But probability with respect to what? Its causal history,
or with respect to a uniform distribution of strings? Dembski's
definition has the same flaws, but Meyer's vague definition introduces
even more problems. Here are just a few.

Problem 1: there is no universal way to communicate, so Meyer's definition is completely
subjective. If I receive a string of symbols that says "Uazekele?", I
might be tempted to ignore it as gibberish, but a Lingala speaker would
recognize it immediately and reply "Mbote". Quantities in mathematics
and science are not supposed to depend on who is measuring them.

Problem 2: If we measure creationist information solely by the length of the
string, then we can wildly overestimate the information contained in a
string by padding. For example, consider a computer program P that
carries out some function, and the identical program P', except that n no-op instructions have been added. If he uses the length measure, then Meyer would have to claim that P' has something like n more bits of creationist information than P. (In the Kolmogorov theory, by contrast, P' would have at most order log n more bits of information.)
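A hypothetical Python sketch of the padding trick: P and P' below behave identically, but P' is longer, so a pure length measure assigns it more "information":

```python
def P(x):
    """A program that carries out some function."""
    return x * x

def P_prime(x):
    """The same function, padded with n = 4 no-op instructions."""
    pass  # no-op
    pass  # no-op
    pass  # no-op
    pass  # no-op
    return x * x

# Identical behavior, different lengths -- the padding added nothing real.
assert P(3) == P_prime(3) == 9
```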
Problem 3: If we measure creationist information with respect to the uniform
distribution on strings, then Meyer's claim (see below) that only
intelligence can create creationist information is incorrect. For
example, any transformation that maps a string to the same string
duplicated 1000 times creates a string that, with respect to the
uniform distribution, is wildly improbable; yet it can easily be
produced mechanically.
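In Python, the mechanical transformation is one line; its output is astronomically improbable under the uniform distribution on strings of its length, yet no intelligence is involved:

```python
def duplicate_1000(s: str) -> str:
    """A purely mechanical transformation: repeat the input 1000 times."""
    return s * 1000

out = duplicate_1000("01101010001")
# Under the uniform distribution on binary strings of this length, this
# exact string has probability 2 ** -len(out): "wildly improbable".
print(len(out))  # 11000
```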
Problem 4: If we measure creationist information with respect to the causal history of the object in
question, then we are forced to estimate these probabilities. But since
Meyer is interested in applying his method to phenomena that are
currently poorly understood, such as the origin of life, all he's
really doing (since his creationist information is sometimes the
negative log of the probability) is estimating the probability of these
events -- something we can't reasonably do, precisely because we don't
know that causal history. In this case, all the talk about
"information" is a red herring; he might as well say "Improbable -
therefore designed!" and be done with it. Problem 5: All Meyer
seems interested in is whether the string communicates something or has
a function. But some strings communicate more than others, despite
being the same length, and some functions are more useful than others.
Meyer's measure doesn't take this into account. The strings "It will rain tomorrow" and "Tomorrow: 2.5 cm rain" have the same length, but
clearly one is more useful than the other. Meyer, it seems to me, would
claim they have the same amount of creationist information.

Problem 6: For Meyer, information in a computational context could refer to,
for example, a computer program that carries out a function. The longer
the program, the more creationist information. Now consider a very long
program that has a one-letter syntax error, so that the program will
not compile. Such a program does not carry out any function, so for
Meyer it has no information at all! Now a single "point mutation" will
magically create lots more creationist information, something Meyer
says is impossible.
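A hypothetical Python illustration of Problem 6: the first source text below differs from the second by a single character, fails to compile, and so carries out no function at all; one "point mutation" restores a working program:

```python
broken = "def f(x)): return 2 * x"  # one-character syntax error: extra ')'
fixed = "def f(x): return 2 * x"    # one "point mutation" away

def compiles(src: str) -> bool:
    """Does this source text compile, i.e., carry out any function at all?"""
    try:
        compile(src, "<string>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(broken))  # False: by Meyer's criterion, no information at all
print(compiles(fixed))   # True: one mutation "creates" lots of information
```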
Even if we accept Meyer's informal definition of information with all its flaws, his claims about
information are simply wrong. For example, he repeats the following
bogus claim over and over:

p. 16: "What humans recognize as information certainly originates
from thought - from conscious or intelligent human activity... Our
experience of the world shows that what we recognize as information
invariably reflects the prior activity of conscious and intelligent
persons." p. 291: "Either way, information in a computational
context does not magically arise without the assistance of the computer
scientist." p. 341: "It follows that mind -- conscious, rational
intelligent agency -- what philosophers call "agent causation," now
stands as the only cause known to be capable of generating large
amounts of specified information starting from a nonliving state."

p. 343: "Experience shows that large amounts of specified complexity or information (especially codes and languages) invariably originate from an intelligent source -- from a mind or personal agent."

p. 343: "...both common experience and experimental evidence affirms intelligent design as a necessary condition (and cause) of information..."

p. 376: "We are not ignorant of how information arises. We know from
experience that conscious intelligent agents can create informational
sequences and systems." p. 376: "Experience teaches that
whenever large amounts of specified complexity or information are
present in an artifact or entity whose causal story is known,
invariably creative intelligence -- intelligent design -- played a role
in the origin of that entity."

p. 396: "As noted previously, as
I present the evidence for intelligent design, critics do not typically
try to dispute my specific empirical claims. They do not dispute that
DNA contains specified information, or that this type of information
always comes from a mind..."

I have a simple counterexample to
all these claims: weather prediction. Meteorologists collect huge
amounts of data from the natural world: temperature, pressure, wind
speed, wind direction, etc., and process this data to produce accurate
weather forecasts. So the information they collect is
"specified" (in that it tells us whether to bring an umbrella in the
morning), and clearly hundreds, if not thousands, of these bits of
information are needed to make an accurate prediction. But these bits
of information do not come from a mind - unless Meyer wants to claim
that some intelligent being (let's say Zeus) is controlling the
weather. Perhaps intelligent design creationism is just Greek
polytheism in disguise!

Claims about information are central to
Meyer's book, but, as we have seen, many of these claims are flawed.
There are lots and lots of other problems with Meyer's book. Here are
just a few; I could have listed dozens more.

p. 66: "If the
capacity for building these structures and traits was something like a
signal, then a molecule that simply repeated the same signal (e.g.,
ATCG) over and over again could not get the job done. At best, such a
molecule could produce only one trait."

That's not clear at all. The number
of repetitions also constitutes information, and indeed, we routinely
find that different numbers of repetitions result in different
functions. For example, Huntington's disease has been linked to
different numbers of repetitions of CAG.

p. 91: "For this reason, information scientists often say that Shannon's theory measures
the "information-carrying capacity," as opposed to the functionally
specified information or "information content," of a sequence of
characters or symbols."

Meyer seems quite confused here. The term "information-carrying capacity" in Shannon's theory refers to a channel,
not a sequence of characters or symbols. Information scientists don't
talk about "functionally specified information" at all, and they don't
equate it with "information content".

p. 106: (he contrasts two different telephone numbers, one randomly chosen, and one that reaches someone) "Thus, Smith's number contains specified information or functional information, whereas Jones's does not; Smith's number has information content, whereas Jones' number has only information-carrying capacity (or Shannon information)."

This
is pure gibberish. Information scientists do not speak about "specified
information" or "functional information", and as I have pointed out,
"information-carrying capacity" refers to a channel, not a string of
digits.

p. 106: "The opposite of a complex sequence is a
highly ordered sequence like ABCABCABCABC, in which the characters or
constituents repeat over and over due to some underlying rule,
algorithm, or general law."

This is a common misconception
about complexity. While it is true that in a string with low Kolmogorov
complexity, there is an underlying rule behind it, it is not
true that the "characters or constituents" must "repeat over and over".
For example, the string of length n whose i-th bit is 1 or 0 depending on whether i is a prime number (for i from 1 to n) has low Kolmogorov
complexity, but does not "repeat over and over".

p. 201: "Building a living cell not only requires specified information; it requires a vast amount of it -- and the probability of this amount of specified information arising by chance is 'vanishingly small.'"

Pure
assertion. "Specified information" is not rigorously defined. How much
specified information is there in a tornado? A rock? The arrangement of
the planets?

p. 258: "If a process is orderly enough to be described by a law, it does not, by definition, produce events complex enough to convey information."

False.
We speak all the time about statistical laws, such as the "law of large
numbers". Processes with a random component, such as
mutation+selection, can indeed generate complex outcomes and
information.
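To see how a process with a random component can generate a string meeting a functional specification, here is a minimal mutation-plus-selection sketch in Python (my illustration, in the spirit of Dawkins's "weasel" program, not a model of any biological process):

```python
import random

TARGET = "TOMORROW: 2.5 CM RAIN"  # a functional "specification"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .:0123456789"

def fitness(s: str) -> int:
    """Number of positions at which s matches the specification."""
    return sum(a == b for a, b in zip(s, TARGET))

random.seed(0)
current = "".join(random.choice(ALPHABET) for _ in TARGET)
while fitness(current) < len(TARGET):
    i = random.randrange(len(TARGET))  # random point mutation
    mutant = current[:i] + random.choice(ALPHABET) + current[i + 1:]
    if fitness(mutant) >= fitness(current):  # selection keeps improvements
        current = mutant

print(current)  # mutation + selection reach the target every time
```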
p. 293: "Here's my version of the law of conservation of information: 'In a nonbiological context, the amount of specified information initially present in a system S_i will generally equal or exceed the specified information content of the final system, S_f.'
This rule admits only two exceptions. First, the information content of
the final state may exceed that of the initial state, S_i, if
intelligent agents have elected to actualize certain potential states
while excluding others, thus increasing the specified information
content of the system. Second, the information content of the final
system may exceed that of the initial system if random processes, have,
by chance, increased the specified information content of the system.
In this latter case, the potential increase in the information content
of the system is limited by the "probabilistic resources" available to the system."

Utterly
laughable. The weasel word "generally" means that he can dismiss
exceptions when they are presented. And what does "in a nonbiological
context" mean? How does biology magically manage to violate this "law"?
If people are intelligent agents, they are also assemblages of matter
and energy. How do they magically manage to increase information?

p. 337: "Neither
computers by themselves nor the processes of selection and mutation
that computer algorithms simulate can produce large amounts of novel
information, at least not unless a large initial complement of
information is provided." Pure assertion. "Novel
information" is not defined. Meyer completely ignores the large
research area of artificial life, which routinely accomplishes what he
claims is impossible. The names John Koza, Thomas Ray, Karl Sims, and the term "artificial life" appear nowhere in the book's index.

p. 357: "Dembski
devised a test to distinguish between these two types of patterns. If
observers can recognize, construct, identify, or describe a pattern
without observing the event that exemplifies it, then the pattern
qualifies as independent from the event. If, however, the observer
cannot recognize (or has no knowledge of) the pattern apart from
observing the event, then the event does not qualify as independent."

And Dembski's claim to have given a meaningful definition of "independence" is false, as shown in detail in my paper with Elsberry -- not referenced by Meyer.

p. 396:
"As noted previously, as I present the evidence for intelligent design,
critics do not typically try to dispute my specific empirical claims.
They do not dispute that DNA contains specified information, or that
this type of information always comes from a mind..."

Critics
know that "specified information" is a charade, a term chosen to sound
important, with no rigorous coherent definition or agreed-upon way to
measure it. Critics know that information routinely comes from other
sources, such as random processes. Mutation and selection do just fine.

In summary, Meyer's claims about information are incoherent in places and wildly wrong in others. The people who have endorsed this book, from Thomas Nagel to Philip Skell to J. Scott Turner, uncritically accepting Meyer's claims about information and not even hinting that he might be wrong, should be ashamed.