Application of the Letter Serial Correlation test to the Voynich Manuscript
Part 1. Experimental data
By Mark Perakh
Posted on October 20, 2009
2. Experimental procedure
3. Experimental data on measured total LSC sums in VMS
(References are listed in Part 2 of this paper.)
This paper could not have been written without the
contribution of Dr. Brendan McKay, who was the first to suggest the LSC test, developed the
computer program for its measurement, conducted the measurements, and critically discussed
with me all aspects of this research.
The Voynich manuscript (VMS) is named after the rare book dealer W. Voynich
who, in 1912, purchased the manuscript in Italy. At present the manuscript
is held in the rare book library at Yale University. It is sometimes referred
to as the most mysterious manuscript in the world. It is
also sometimes referred to as the Roger Bacon manuscript, since one of the theories
is that it was written in England (in Latin?) by Roger Bacon in the 13th century.
VMS is a little less than 300 pages long, contains many illustrations in color,
and is written using a script not seen anywhere else.
There is a group of enthusiasts who have devoted considerable time and effort
to interpreting VMS. Among the members of that group are highly qualified
linguists, some of them also well versed in mathematics and/or cryptology. Besides that
group, many other individuals have applied their skills in attempts to unearth the contents of
VMS. Several of those people have claimed to have solved the puzzle of VMS. Among the
suggested solutions there are some (for example ) marked by the impressive ingenuity and
erudition of their authors. However, with a few exceptions (for example see
) these alleged solutions have been largely rejected by other scholars of VMS.
Some other alleged solutions of the VMS puzzle (for example, see ) seem to be
arbitrary, often quite fantastic concoctions not based on any factual evidence.
There is not even a commonly accepted opinion as to whether VMS contains a
meaningful text or is gibberish, the result of a hoax perhaps undertaken by some
inventive medieval crook for the sake of a monetary reward. Of course, there are many
examples of very ingenious hoaxes perpetrated by talented, even if not very scrupulous,
adventurers, who sometimes spent enormous time and effort, for example, creating paintings
with forged signatures allegedly belonging to famous artists of the past. It is
known that VMS was indeed purchased, in the 16th century, by the Bohemian king Rudolph II
for 600 gold ducats, which at that time was a very large sum of money.
Therefore, the suggestion that VMS is just a quasi-random conglomerate of characters
which, either by design or by fluke, looks like a meaningful text cannot be
dismissed out of hand. On the other hand, the Voynich manuscript displays many
features testifying to a rather distinctive presence of order, normally absent in
random texts but common in meaningful texts.
Much information about VMS can be found in . Briefly, the main points
of that information are as follows. As indicated by Currier and accepted by many other
explorers of VMS, the manuscript is a mix of two distinctive components,
which are often referred to as "languages" A and B (we will
refer to these two parts as VMS-A and VMS-B, or as Voynich-A and Voynich-B). Both
components A and B are written in the same script, whose symbols
differ from those of any alphabet known to be used anywhere in the world.
The text of VMS consists of groups of symbols separated by spaces. Usually
these groups of symbols are referred to as words. There is no certainty, however,
that the above-mentioned spaces have no other possible function besides (or instead of)
separating words.
An observation has been made that each line of VMS seems to be a
self-contained unit, although the function of such units has never been explained. Of course, lines of
text being distinctive units is not unknown, since in rhymed (and sometimes in non-rhymed)
poetry the lines of a text often do constitute certain semantic, or even grammatical,
units. This may give rise to a hypothesis that VMS is written in the form of a
poem. Since Lucretius' famous "De rerum natura," offering
scientific treatises in poetic form has been a venerable tradition. Therefore the
hypothesis that VMS is actually a poem is not as outlandish as it may seem. However,
certain features of VMS seem to make this hypothesis not very
likely, even if not completely out of the question. One argument against it
may be the length of the lines in VMS, which may seem too long to be lines of
poetry. There are, though, known works of poetry in which the line length is comparable to
that in VMS. Poems written in hexameter (where a line contains six metrical feet) could
well comprise over forty letters per line, which is close to what we see in VMS.
Another argument against the hypothesis of VMS being a poem is that some portions of VMS
(for example the so-called "recipe part") just do not look like poetry at
all. This argument, though, may only mean that these parts of VMS are different from
the rest of VMS, which may still be in the form of a poem.
I am neither a linguist, nor a cryptologist, nor an expert in VMS. However, since
the Letter Serial Correlation (LSC) effect described in detail in  may be a rather
effective tool for analyzing some specific statistical properties of texts
[8,10,12,13], it seemed reasonable to apply an LSC test to VMS and to see if such a test can
shed light on some aspects of the VMS puzzle. In particular, the LSC test enables one
to distinguish between meaningful texts and truly random collections of
symbols and is also rather sensitive to differences between languages. Applying the LSC
test could presumably clarify whether VMS is a meaningful text or gibberish, and whether
or not A and B are indeed two different languages and, if not, what
the difference between them is.
One of the features of the LSC effect is that it treats a text on the level of
letters and is therefore indifferent to such properties of a text as the existence, size,
and location of whole words, paragraphs, chapters, etc. This feature may be
advantageous for analyzing VMS, since, for example, the question of the meaning of spaces
between "words" in VMS is moot if the LSC test is conducted.
In Part 2 the discussion of the data
and a possible interpretation of the Voynich manuscript's nature are offered. Since parts
1 and 2 constitute one paper, the sections, graphs and tables are numbered
consecutively throughout parts 1 and 2. To facilitate navigation through both
parts, hyperlinks are supplied where appropriate.
Understanding the following sections requires familiarity
with the Letter Serial Correlation effect, as it has been laid out and discussed in
There are several systems for the transliteration of the Voynich script
into a set of Latin characters. One such set of symbols was suggested by P. Currier
. We have used the text of VMS transliterated into Latin characters and given in
. Excluding extraneous characters, such as periods, comments in English inserted
into VMS in , and so on, we determined that VMS consists of a total of 70445 symbols (from
now on referred to as letters), of which 26852 letters are in the parts using
"language" A and 43593 letters are in the parts using "language" B.
The number of distinct "letters" (tokens) in VMS was found to be 37,
represented in Currier's scheme by the letters of the Latin alphabet plus the asterisk * and the
numerals from 0 to 9. (To the best of our knowledge, the shortest known alphabet
consists of only 11 letters and the longest one of 71 letters; hence the number of
characters in the VMS "alphabet" is well within the range for alphabetic
systems of writing, while being too small for a syllabic system.)
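The letter-counting step described above can be sketched as follows. This is only a minimal illustration, not the program actually used for the measurements: the name `CURRIER_ALPHABET`, the helper `keep_letters`, the choice of uppercase Latin letters, and the toy input string are all assumptions introduced here. Note also that a simple character filter of this kind would not remove English comments, which consist of the same Latin letters; those would have to be stripped by other means before counting.

```python
import string
from collections import Counter

# Assumed 37-symbol alphabet: 26 Latin letters, the asterisk, and digits 0-9.
# Using uppercase letters is an assumption about the transliteration file.
CURRIER_ALPHABET = frozenset(string.ascii_uppercase + "*" + string.digits)


def keep_letters(raw: str, alphabet=CURRIER_ALPHABET) -> str:
    """Drop every character (spaces, periods, line breaks) not in the alphabet."""
    return "".join(ch for ch in raw if ch in alphabet)


# Toy input, hypothetical and NOT actual VMS text.
sample = "4ODAM. SOE*9\n8AM"
letters = keep_letters(sample)

print(len(CURRIER_ALPHABET))      # 26 + 1 + 10 = 37 symbols
print(len(letters))               # number of letters kept from the toy input
print(Counter(letters).most_common(3))

# Consistency check on the totals quoted in the text.
assert 26852 + 43593 == 70445
```

Because the filter discards spaces along with punctuation, the resulting string is exactly the kind of uninterrupted letter sequence on which a letter-level test operates.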
The Letter Serial Correlation test was conducted separately for the following
three versions of the text: a) the Voynich manuscript as a whole; b) all parts of VMS that
are in "language" A, concatenated consecutively; and c) all parts of VMS that
are written in "language" B, concatenated consecutively.
The results of the measurement of the LSC sum, together with the calculated
"expected" sum, are shown in Figs. 1-4 as functions of
the chunk size n [7,10]. The "expected" sum is calculated for a
"randomized" text which has the same length as the actual text and contains the
same letters of the alphabet as the actual text, these letters being present in the same
numbers as in the actual text but, unlike in the actual text, being distributed in the
text in a random fashion.
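To make the comparison between the measured and the "expected" sum concrete, here is a minimal sketch. It rests on two assumptions that are not spelled out in this part of the paper. First, the LSC sum is assumed to be the sum, over all letters and over all pairs of adjacent chunks of n letters, of the squared difference between the letter's counts in the two chunks. Second, where the paper calculates the expected sum, this sketch instead estimates it by repeatedly shuffling the text, which is a substitute technique chosen for simplicity. The function names are hypothetical.

```python
import random
from collections import Counter


def lsc_sum(text: str, n: int) -> int:
    """Assumed LSC sum for chunk size n.

    The text is cut into consecutive chunks of n letters (an incomplete
    trailing chunk is discarded); squared differences of per-letter counts
    are summed over all adjacent chunk pairs and all letters.
    """
    chunks = [Counter(text[i:i + n]) for i in range(0, len(text) - n + 1, n)]
    total = 0
    for a, b in zip(chunks, chunks[1:]):
        for letter in a.keys() | b.keys():
            total += (a[letter] - b[letter]) ** 2
    return total


def randomized(text: str, rng: random.Random) -> str:
    """Same length and same letter counts as text, but in random order."""
    letters = list(text)
    rng.shuffle(letters)
    return "".join(letters)


def estimated_expected_sum(text: str, n: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo stand-in for the calculated expected sum (an assumption)."""
    rng = random.Random(seed)
    return sum(lsc_sum(randomized(text, rng), n) for _ in range(trials)) / trials


# Toy text, hypothetical and NOT actual VMS data.
toy = "abab" * 50
print(lsc_sum(toy, 2), estimated_expected_sum(toy, 2))
```

In the toy text every chunk of two letters has the same composition, so its measured sum is zero, whereas a shuffled version generally gives a positive sum. The sign and size of such deviations from the randomized baseline are what the curves compare at each chunk size.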
In Fig. 1 the measured LSC sums are shown for all three versions of the VMS over
the entire range of chunk sizes (from n=1 to n=10000). In Figs.
2, 3 and 4, zoomed-in graphs are shown for each of the three versions separately, the
red curves representing the expected sums Se
and the blue curves the measured sums Sm.
In Table 1 the characteristic points on the LSC curves are gathered for
all three texts in question. These characteristic points are 1) the Downcross Point (DCP), 2)
the Primary Minimum Point (PMP), 3) the Upcross Point (UCP) and finally 4) the Peak Point (PKP).
Table 1. Characteristic points on LSC graphs for Voynich texts.
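This part of the paper names the four characteristic points without defining them, so the sketch below relies on definitions guessed from the names and should be read as an assumption rather than as the method behind Table 1. Here the DCP is taken to be the first chunk size at which the measured sum drops below the expected one, the UCP the first later size at which it rises above it again, the PMP the size with the smallest ratio Sm/Se between the two crossings, and the PKP the size with the largest ratio from the UCP onward. The function name and the toy numbers are hypothetical, and expected sums are assumed to be positive.

```python
def characteristic_points(sizes, measured, expected):
    """Locate DCP, PMP, UCP and PKP on sampled curves (assumed definitions).

    Each point is reported as the sampled chunk size at or just after the
    event; no interpolation between samples is attempted.
    """
    ratios = [m / e for m, e in zip(measured, expected)]
    dcp = ucp = None
    for i in range(1, len(ratios)):
        if dcp is None and ratios[i - 1] >= 1 > ratios[i]:
            dcp = i                      # measured sum falls below expected
        elif dcp is not None and ratios[i - 1] <= 1 < ratios[i]:
            ucp = i                      # measured sum rises above expected
            break

    points = {"DCP": None, "PMP": None, "UCP": None, "PKP": None}
    if dcp is not None:
        points["DCP"] = sizes[dcp]
        end = ucp if ucp is not None else len(ratios)
        points["PMP"] = sizes[min(range(dcp, end), key=ratios.__getitem__)]
    if ucp is not None:
        points["UCP"] = sizes[ucp]
        points["PKP"] = sizes[max(range(ucp, len(ratios)), key=ratios.__getitem__)]
    return points


# Toy curves, hypothetical and NOT the measured VMS data.
sizes = [1, 2, 3, 5, 10, 20, 50, 100]
measured = [12, 9, 7, 8, 11, 15, 13, 10]
expected = [10] * len(sizes)
print(characteristic_points(sizes, measured, expected))
```

Working with the ratio rather than the raw sums keeps the comparison meaningful across chunk sizes, since both sums change by orders of magnitude between n=1 and n=10000.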
In Part 2 of this paper the
above data are discussed, with the aim of producing arguments in
favor of certain hypotheses regarding the nature of VMS as a whole and of its parts A
and B.
Originally posted to Mark Perakh's website on July 2, 1999.