Application of the Letter Serial Correlation test to the Voynich Manuscript

Part 1. Experimental data

By Mark Perakh

Posted on October 20, 2009

1. Introduction

2. Experimental procedure

3. Experimental data on measured total LSC sums in VMS

(References are listed in Part 2 of this paper - see http://www.talkreason.org/articles/voynich2.cfm)

1. Introduction

This paper could not have been written without the contribution of Dr. Brendan McKay, who was the first to suggest the LSC test, developed the computer program for its measurement, conducted the measurements, and critically discussed with me all aspects of this research.

The Voynich manuscript (VMS) is named after the rare book dealer W. Voynich who, in 1912, purchased the manuscript in Italy. At present the manuscript is held in the rare book library at Yale University. It is sometimes referred to [1] as the most mysterious manuscript in the world. It is also sometimes referred to as the Roger Bacon manuscript, since one of the theories is that it was written in England (in Latin?) by Roger Bacon in the 13th century.

VMS is a little less than 300 pages long, contains many illustrations in color, and is written using a script not seen anywhere else.

There is a group of enthusiasts who have devoted considerable time and effort to trying to interpret VMS. Among the members of that group are highly qualified linguists, some of them also well versed in mathematics and/or cryptology. Besides that group, many other individuals have applied their skills in attempts to unearth the contents of VMS. Several of those people have claimed to have solved the puzzle of VMS. Among the suggested solutions there are some (for example [2]) marked by the impressive ingenuity and erudition of their authors. However, with a few exceptions (for example see [3]), these alleged solutions have been largely rejected by other scholars of VMS. Some other alleged solutions of the VMS puzzle (for example, see [4]) seem to be arbitrary, often quite fantastic concoctions not based on any factual evidence.

There is not even a commonly accepted opinion as to whether VMS contains a meaningful text or is gibberish, the result of a hoax, perhaps undertaken by some inventive medieval crook for the sake of a monetary reward. Of course, there are many examples of very ingenious hoaxes perpetrated by talented, even if not very scrupulous, adventurers, who sometimes spent enormous time and effort, for example, on creating paintings with forged signatures of famous artists of the past. It is known that VMS was indeed purchased, in the 16th century, by the Bohemian king Rudolph II for 600 gold ducats, which at that time was a very large sum of money. Therefore, the suggestion that VMS is just a quasi-random conglomerate of characters which, either by design or by fluke, looks like a meaningful text cannot be dismissed out of hand. On the other hand, the Voynich manuscript displays many features testifying to a rather distinctive presence of order, which is normally absent in random texts but common in meaningful texts.

Much information about VMS can be found in [5]. Briefly, the main points of that information are as follows. As indicated by Currier [6] and accepted by many other explorers of VMS, the manuscript in question is a mix of two distinctive components, which often are referred to as "languages" A and B (we will refer to these two parts as VMS-A and VMS-B, or as Voynich-A and Voynich-B). Both components A and B are written in the same script, whose symbols are different from any alphabets known to be used anywhere in the world.

The text of VMS consists of groups of symbols separated by spaces. Usually these groups of symbols are referred to as words. There is no certainty, however, that these spaces have no other function besides (or instead of) separating words.

An observation has been made [6] that each line of VMS seems to be a self-contained unit, whose function, though, has never been explained. Of course, lines of text being distinctive units is not unknown, since in rhymed (and sometimes in non-rhymed) poetry the lines of a text often do constitute certain semantic, or even grammatical, units. This may give rise to a hypothesis that VMS is written in the form of a poem. Ever since Lucretius' famous "De rerum natura," presenting scientific treatises in poetic form had been a venerable tradition. Therefore a hypothesis that VMS is actually a poem is not as outlandish as it may seem. However, there are certain features of VMS which seem to make this hypothesis not very likely, even if not completely out of the question. One argument against it may be the length of lines in VMS, which may seem too long to be lines of poetry. There are, though, known works of poetry where the line length is comparable to that in VMS. Poems written in hexameter (where a line contains six metrical feet) could well comprise over forty letters per line, which is close to what we see in VMS. Another argument against the hypothesis of VMS being a poem is that some portions of VMS (for example the so-called "recipe part") just do not look like poetry at all. This argument, though, may only mean that these parts of VMS are different from the rest of VMS, which still can be in the form of a poem.

I am neither a linguist, nor a cryptologist, nor an expert in VMS. Since, though, the Letter Serial Correlation effect described in detail in [7] may be a rather effective tool for analyzing some specific statistical properties of texts [8,10,12,13], it seemed reasonable to apply an LSC test to VMS and to see if such a test can shed light on some aspects of the VMS puzzle. In particular, the LSC test enables one to distinguish between meaningful texts and truly random collections of symbols [8] and is also rather sensitive to differences between languages. Applying the LSC test could presumably clarify whether VMS is a meaningful text or gibberish, and whether or not A and B are indeed two different languages, and if not, then what the difference between them is.

One of the features of the LSC effect is that it treats a text on the level of letters and is therefore indifferent to such properties of the text as the existence, size, and location of whole words, paragraphs, chapters, etc. This feature may be advantageous for analyzing VMS, since, for example, the question of the meaning of spaces between "words" in VMS is moot if the LSC test is conducted.

In Part 2, the discussion of the data and a possible interpretation of the Voynich manuscript's nature are offered. Since Parts 1 and 2 constitute one paper, the sections, graphs and tables are numbered consecutively throughout Parts 1 and 2. To facilitate the navigation through both parts, hyperlinks are supplied where appropriate.

Understanding the following sections requires familiarity with the Letter Serial Correlation effect, as it has been laid out and discussed in [7,8,10-13].

2. Experimental procedure

There are several systems offered for the transliteration of the Voynich script into a Latin set of characters. One such set of symbols was suggested by P. Currier [6]. We have used the text of VMS transliterated into Latin characters and given in [9]. Excluding extraneous characters, such as periods and the comments in English inserted into VMS in [9], we determined that VMS consists of a total of 70445 symbols (from now on referred to as letters), of which 26852 letters are in the parts using "language" A and 43593 letters are in the parts using "language" B. The number of distinct "letters" (tokens) in VMS was found to be 37, represented in Currier's scheme by the letters of the Latin alphabet plus the asterisk * and the numerals from 0 to 9. (To the best of our knowledge, the shortest known alphabet consists of only 11 letters, whereas the longest one has 71 letters; hence the number of characters in the "alphabet" of VMS is well within the range for alphabetic systems of writing, while being too small for a syllabic system.)
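The counting step described above can be illustrated with a short sketch. It is not the program actually used for this paper. The symbol set is taken from the description above (26 Latin letters, the asterisk, ten numerals), while the use of uppercase letters and the assumption that inserted comments are enclosed in braces are hypothetical choices made only for this illustration; the real layout of the file in [9] may differ.

```python
import re
import string
from collections import Counter

# 26 Latin letters + asterisk + digits 0-9 = 37 symbols, as stated in the text.
# ASSUMPTION: the letters are uppercase in the transliteration file.
ALPHABET = set(string.ascii_uppercase) | {"*"} | set(string.digits)


def letter_stats(raw):
    """Return (total letters, number of distinct letters, frequency table)."""
    # ASSUMPTION: inserted English comments look like {this}; strip them first.
    without_comments = re.sub(r"\{[^}]*\}", "", raw)
    # Everything outside the 37-symbol set (periods, spaces, line breaks) is dropped.
    letters = [ch for ch in without_comments if ch in ALPHABET]
    freq = Counter(letters)
    return len(letters), len(freq), freq


if __name__ == "__main__":
    total, distinct, freq = letter_stats("AB.C {note} 4O*\nAB")
    print(total, distinct)
```

The totals quoted above are mutually consistent: 26852 + 43593 = 70445.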

The Letter Serial Correlation test was conducted separately for the following three versions of the text: a) the Voynich manuscript as a whole; b) all parts of VMS that are written in "language" A, concatenated consecutively; and c) all parts of VMS that are written in "language" B, concatenated consecutively.
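As a minimal illustration of how the three versions can be assembled, assume the cleaned text is available as a list of (label, text) pairs in manuscript order, where the label is "A" or "B". This data layout and the function name are hypothetical and serve only the example.

```python
def build_versions(segments):
    """Build the three tested texts from (label, text) pairs in manuscript order.

    ASSUMPTION: every segment carries the label "A" or "B".
    """
    whole = "".join(text for _, text in segments)
    part_a = "".join(text for label, text in segments if label == "A")
    part_b = "".join(text for label, text in segments if label == "B")
    return whole, part_a, part_b


if __name__ == "__main__":
    toy = [("A", "4OF"), ("B", "SC8"), ("A", "AM")]
    print(build_versions(toy))
```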

3. Experimental data on measured "total LSC sums" in VMS

The results of the measurement of the LSC sum, together with the calculated "expected" sum, are shown in Figs. 1-4 as functions of the chunk size n [7,10]. The "expected" sum [7] is calculated for a "randomized" text which has the same length as the actual text and contains the same letters of the alphabet in the same numbers as the actual text, but in which, unlike in the actual text, these letters are distributed in a random fashion.

In Fig. 1 the measured LSC sums are shown for all three versions of VMS over the entire range of chunk sizes (from n=1 to n=10000). In Figs. 2, 3 and 4, zoomed-in graphs are shown for each of the three versions separately, the red curves representing the expected sums Se, and the blue curves the measured sums Sm.

In Table 1 the characteristic points [10] on the LSC curves are gathered for all three texts in question. These characteristic points are: 1) the Downcross Point (DCP), 2) the Primary Minimum Point (PMP), 3) the Upcross Point (UCP), and finally 4) the Peak Point (PKP).
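For readers who want to experiment, the two quantities can be sketched in a few lines. The sketch assumes the definition of the LSC sum as I understand it from [7]: the text is cut into consecutive chunks of n letters, and the squared differences between the counts of each letter in adjacent chunks are summed over all letters and all adjacent pairs. Dropping an incomplete last chunk is a simplification made here. The expected sum is estimated by repeatedly shuffling the text, which follows the description of the randomized text above but replaces the calculation used in [7] with a Monte Carlo average; all function names are hypothetical.

```python
import random
from collections import Counter


def lsc_sum(text, n):
    """Measured LSC sum for chunk size n (definition assumed from [7])."""
    k = len(text) // n  # complete chunks only; a shorter tail is ignored
    counts = [Counter(text[i * n:(i + 1) * n]) for i in range(k)]
    total = 0
    for left, right in zip(counts, counts[1:]):
        for letter in left.keys() | right.keys():
            total += (left[letter] - right[letter]) ** 2
    return total


def expected_lsc_sum(text, n, trials=200, seed=0):
    """Monte Carlo estimate of the sum for a randomly permuted text."""
    rng = random.Random(seed)
    letters = list(text)  # same letters in the same numbers as the actual text
    acc = 0
    for _ in range(trials):
        rng.shuffle(letters)
        acc += lsc_sum(letters, n)
    return acc / trials


if __name__ == "__main__":
    sample = "aabb"
    print(lsc_sum(sample, 2))           # chunks "aa" and "bb" give 4 + 4 = 8
    print(expected_lsc_sum(sample, 2))  # fluctuates around 8/3
```

Calling both functions for every n of interest yields the two curves that are compared in the figures.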
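The exact definitions of the characteristic points are given in [10] and are not repeated here. The sketch below only illustrates one plausible way of locating them on sampled curves, under assumptions inferred from the names of the points: the DCP is taken as the first chunk size at which the ratio Sm/Se falls below 1, the UCP as the first later size at which it returns to 1 or above, the PMP as the minimum of the ratio between them, and the PKP as its maximum from the UCP onward. These working definitions and the function name are hypothetical and may differ from those in [10].

```python
def characteristic_points(sizes, measured, expected):
    """Locate DCP, PMP, UCP and PKP on sampled curves, or return None.

    ASSUMPTION: the points are defined on the ratio Sm/Se as described in
    the accompanying text; all expected sums must be positive.
    """
    ratio = [m / e for m, e in zip(measured, expected)]
    idx = range(1, len(ratio))
    # Downcross: the ratio passes from >= 1 to < 1.
    dcp = next((i for i in idx if ratio[i - 1] >= 1 > ratio[i]), None)
    if dcp is None:
        return None
    # Upcross: after the downcross, the ratio passes from < 1 back to >= 1.
    later = range(dcp + 1, len(ratio))
    ucp = next((i for i in later if ratio[i - 1] < 1 <= ratio[i]), None)
    if ucp is None:
        return None
    pmp = min(range(dcp, ucp), key=ratio.__getitem__)
    pkp = max(range(ucp, len(ratio)), key=ratio.__getitem__)
    return {"DCP": sizes[dcp], "PMP": sizes[pmp],
            "UCP": sizes[ucp], "PKP": sizes[pkp]}


if __name__ == "__main__":
    sizes = [1, 2, 3, 5, 10, 20, 50]
    se = [10.0] * 7                 # made-up numbers, not data from VMS
    sm = [12, 9, 7, 8, 11, 15, 13]
    print(characteristic_points(sizes, sm, se))
```

The numbers in the usage example are invented solely to exercise the function and have no relation to the measured data.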

Table 1. Characteristic points on LSC graphs for Voynich texts.


In Part 2 of this paper the discussion of the above data will be presented with the aim of producing arguments in favor of certain hypotheses regarding the nature of VMS as a whole and of its parts A and B.

Originally posted to Mark Perakh's website on July 2, 1999.