VMS text specifics

Text specific statistics

After many (too many?) hours of research, I have collected objective, hard data about the text. Since I do not know where else to put it, I created this page to hold the raw data.


For the definitions of “paragraph textpages” (= “text paragraph pages”) and “label pages”, see here.

There you can also find downloadable Excel sheets.



Most common letters used in the VMS (the percentage is given in the rows):
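Below is a minimal Python sketch of how such letter percentages can be computed. It assumes an EVA-style transcription in which words are separated by dots or whitespace and every remaining character counts as one letter (so ‘ch’ counts as two letters). The function name and sample string are illustrative only.

```python
from collections import Counter


def letter_percentages(text: str) -> dict[str, float]:
    """Percentage of each letter among all letters, most common first."""
    # Word separators (dots and whitespace) are not counted as letters.
    letters = [ch for ch in text if ch != "." and not ch.isspace()]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {letter: 100 * n / total for letter, n in counts.most_common()}


for letter, pct in letter_percentages("qokeedy.qokedy.daiin").items():
    print(f"{letter}: {pct:.1f}%")
```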



Letter positions

If you want to know more about the letter positions, have a look here.



Gallow characters

In the CAB total text, a gallow character occurs 20314 times.
absolute gallow count

If we look at the “paragraph textpages”, there are 3905 paragraphs in 203 pages.

If we look at the first words of those paragraphs (in other words, the first words of lines), the first letter is a gallow character in 890 cases and the second letter in 479 cases.

This adds up to 1369, which means that 35% of all paragraphs have a gallow in their first word.
Note again that these 1369 gallows only cover the first 2 letters of the first word of a paragraph.
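The tally above can be sketched in Python as follows. This assumes each line is an EVA string with dot-separated words and that the gallows are the EVA letters k, t, p and f. A word with a gallow in both of its first two letters is counted once, under the first letter; the text does not say how such words were handled, so that is an assumption. The helper names are hypothetical.

```python
from typing import Iterable, Optional

GALLOWS = set("ktpf")  # the four EVA gallow characters


def gallow_slot(word: str) -> Optional[int]:
    """Return 1 or 2 if a gallow is the 1st or 2nd letter of `word`, else None."""
    for i, ch in enumerate(word[:2], start=1):
        if ch in GALLOWS:
            return i
    return None


def tally_first_words(lines: Iterable[str]) -> tuple[int, int, int]:
    """Return (gallow as 1st letter, gallow as 2nd letter, lines counted)."""
    first = second = total = 0
    for line in lines:
        words = [w for w in line.split(".") if w]
        if not words:
            continue  # skip empty lines
        total += 1
        slot = gallow_slot(words[0])
        if slot == 1:
            first += 1
        elif slot == 2:
            second += 1
    return first, second, total


# The reported figures: 890 + 479 = 1369 out of 3905 paragraphs.
print(f"{(890 + 479) / 3905:.0%}")  # 35%
```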

If we only look at the first word of every paragraph textpage, 193 out of 203 pages have a gallow as the first or second character. That is 95%.

In 12 of those cases, the gallow character is in the second position of that word:

Columns: locus, gallow, first word, 2nd char
f8v.P.1 t cthod 1
f30r.P.1 k okchesy 1
f30v.P.1 t cthscthain 1
f35r.P.1 t cthoo 1
f38v.P.1 k okchop 1
f56r.P.1 t otchal 1
f65v.P.1 p cphy 1
f66v.P.1 k okeodof 1
f70r2.P.1 t otchey 1
f85r2.P.1 k okees 1
f90v2.P.1 p cphdaithy 1
f100v.P.1 t cthdeecthy 1


Here you see the positions of all gallow characters in all lines of the “paragraph textpages”:

(horizontal: pages, vertical: line position)

gallow positions per line


Gallow character ligatures

Looking at the special gallow ligatures, they occur 1832 times in the total text:
ckh (779 times), cph (185 times), cth (805 times) and cfh (63 times). Compared to the total gallow count of 20314, 1832 is only 9%.
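Here is a sketch of counting the four ligatures and comparing them with the total gallow count. It assumes the ligatures are written as the EVA strings ckh, cph, cth and cfh in a plain transcription string. The function name is hypothetical; the reported figures are the ones given above.

```python
LIGATURES = ("ckh", "cph", "cth", "cfh")


def count_ligatures(text: str) -> dict[str, int]:
    """Occurrences of each gallow ligature in an EVA transcription string."""
    # str.count is safe here: none of these strings can overlap with itself.
    return {lig: text.count(lig) for lig in LIGATURES}


# Reported totals for the whole text, compared with the 20314 gallows.
reported = {"ckh": 779, "cph": 185, "cth": 805, "cfh": 63}
total = sum(reported.values())
print(total, f"{total / 20314:.0%}")  # 1832 9%
```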

Here you see the positions of all gallow ligatures in all lines of the “paragraph textpages”:

(horizontal: pages, vertical: line position)

gallow ligature positions (series: ckh, cph, cth, cfh)

Of these 1832 occurrences in the total text, 1694 are in the “paragraph text” and 138 (up to 141, as some are disputable) in the “label text”. For the “paragraph text” the breakdown is: cfh 63, cph 176, ckh 708, cth 747.
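The label-text breakdown per ligature is not listed, but it follows from subtracting the “paragraph text” figures from the totals. Here is a short check in Python, assuming both sets of counts come from the same transcription. Since the label total of 138 is itself uncertain by up to 3, this breakdown is approximate.

```python
total = {"ckh": 779, "cph": 185, "cth": 805, "cfh": 63}      # whole text
paragraph = {"ckh": 708, "cph": 176, "cth": 747, "cfh": 63}  # paragraph text

# Whatever is not in the paragraph text must be in the label text.
label = {lig: total[lig] - paragraph[lig] for lig in total}
print(label)  # {'ckh': 71, 'cph': 9, 'cth': 58, 'cfh': 0}
print(sum(paragraph.values()), sum(label.values()))  # 1694 138
```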

Of the 193 first words of a page that have a gallow, only 10 contain a gallow ligature. They are:

(the number in the last column gives the position of the ligature in the word)

Columns: page, first sentence of the page (gallow on 1st or 2nd character), first word, position of the ligature
f8v cthod.soocth.sol.shol.otol.chol.opcheaiin.opydaiin.saiin. cthod 1
f10r pchocthy.shor.octhody.chorchy.pchodol.chopchal.ypch.kom. pchocthy 5
f30v cthscthain.shosaiin.chocthey.sho.chepchy.shor.sheaiin. cthscthain 1
f35r cthoo.r.choly.cthy.choty.char.dy. cthoo 1
f37r tocphol.shaiin.qotor.ofchor.oty.chory.daiin.otod.or. tocphol 3
f46r pcheocphy.qotedy.chety.dy.chepchx.yfcheky.osaiin.shee.qoteol.daiin.shee.dy.daly. pcheocphy 6
f52r tdokchcfhy.ycphko.ytair.shar.qofydaiin.ypchy.otchol.das.yty. tdokchcfhy 7
f65v cphy.fchecfhy.dy.dchepain.shety.qopy.fol.chpdy. cphy 1
f90v2 cphdaithy.qocfhey.opol.raiin.ofchedol.rs.shese.shodaiin.sheos. cphdaithy 1
f100v cthdeecthy.sheocphy.qoteody.ckhoor.ar.chor.oteey.daiin.qokomo. cthdeecthy 1
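The position in the last column can be reproduced with a small helper that returns the 1-based position where the first of the four ligatures starts. This is a sketch with a hypothetical function name, checked against the words in the table:

```python
from typing import Optional

LIGATURES = ("ckh", "cph", "cth", "cfh")


def ligature_position(word: str) -> Optional[int]:
    """1-based position where the first gallow ligature starts, or None."""
    hits = [i for i in (word.find(lig) for lig in LIGATURES) if i >= 0]
    return min(hits) + 1 if hits else None


for w in ["cthod", "pchocthy", "tocphol", "pcheocphy", "tdokchcfhy"]:
    print(w, ligature_position(w))
```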


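If we combine these two lists (the pages whose first word has a normal gallow in the second position, and the pages whose first word has a gallow ligature), we see that they are almost the same:

(left: gallow ligature in the first word; right: gallow in the second position)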

f8v f8v.P.1
f10r f30r.P.1
f30v f30v.P.1
f35r f35r.P.1
f37r f38v.P.1
f46r f56r.P.1
f52r  (tdokchcfhy)
f65v f65v.P.1
f90v2 f90v2.P.1
f100v f100v.P.1
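To see which folios appear in both lists exactly, the loci can be reduced to their folio names and intersected. Here is a minimal sketch using the 12 loci from the second-position table and the 10 pages from the ligature table:

```python
ligature_pages = ["f8v", "f10r", "f30v", "f35r", "f37r",
                  "f46r", "f52r", "f65v", "f90v2", "f100v"]
second_pos_loci = ["f8v.P.1", "f30r.P.1", "f30v.P.1", "f35r.P.1",
                   "f38v.P.1", "f56r.P.1", "f65v.P.1", "f66v.P.1",
                   "f70r2.P.1", "f85r2.P.1", "f90v2.P.1", "f100v.P.1"]

# "f8v.P.1" -> "f8v": keep only the folio part of each locus.
second_pos_pages = {locus.split(".")[0] for locus in second_pos_loci}
common = [p for p in ligature_pages if p in second_pos_pages]
print(common)
```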


Unique words


Unique words in text paragraph pages: 6920, of which 4695* occur only once
Unique words in label pages: 2492, of which 1954* occur only once
Unique words in the full combined text: 8302
Overlapping words in text and labels: 1110
Words in label pages that do not occur in text paragraph pages (so these are ms unique): 1382* (2492-1110)

*: in this search a word is also considered a ‘vord’, which means that if a word is found as part of another word, that counts as 1 hit. For example, if we search for ‘kain’ and find okain and qokain, we count 2 occurrences of kain.
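The ‘vord’ counting rule can be expressed as a substring test: every word that contains the search string counts as one hit. Here is a sketch with a hypothetical function name:

```python
def vord_hits(vord: str, words: list[str]) -> int:
    """Count words that contain `vord` anywhere (each word counts once)."""
    return sum(1 for w in words if vord in w)


print(vord_hits("kain", ["okain", "qokain", "daiin"]))  # 2
```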

Thus, 1110 label words also occur in the (other) text pages. That is 45% of all unique words in the label pages, and 16% of the unique words in the text pages could refer to the label pages. That seems reasonable.
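The figures in the table are consistent with simple set arithmetic (inclusion-exclusion). In the sketch below, the set-based function is hypothetical and the numeric check uses the counts given above:

```python
def vocabulary_overlap(text_words, label_words):
    """Sizes of the combined, shared and label-only vocabularies."""
    text_set, label_set = set(text_words), set(label_words)
    return {
        "combined": len(text_set | label_set),
        "overlap": len(text_set & label_set),
        "label_only": len(label_set - text_set),
    }


# Check against the reported counts.
text_unique, label_unique, overlap = 6920, 2492, 1110
print(text_unique + label_unique - overlap)  # 8302
print(label_unique - overlap)                # 1382
print(f"{overlap / label_unique:.0%}, {overlap / text_unique:.0%}")  # 45%, 16%
```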

On the other hand, there are 1382 words in the label pages that do not occur anywhere else in the manuscript. That is strange, unless those words are so unique that they represent a unique number, such as a catalogue or reference number. It is also possible that those words are genuinely unique names. Thirdly, these words could in essence be the same as other words (same stem) but conjugated, changed by the context or made more specific. For example: carr_ in the text and carrota in the label pages.
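The third explanation (same stem, different ending) could be tested by looking, for each label-only word, for text words that share a prefix with it. Below is a rough sketch. It assumes that a shared prefix of at least 3 letters hints at a common stem; that threshold is arbitrary and hypothetical, and the whole approach is a strong simplification.

```python
def stem_matches(label_only, text_words, min_stem=3):
    """Map each label-only word to the text words it shares a prefix with."""
    text_set = set(text_words)
    result = {}
    for lw in label_only:
        # One word must be a prefix of the other, and the shorter of the
        # two (the shared "stem") must have at least `min_stem` letters.
        matches = sorted(
            tw for tw in text_set
            if min(len(tw), len(lw)) >= min_stem
            and (lw.startswith(tw) or tw.startswith(lw))
        )
        if matches:
            result[lw] = matches
    return result


print(stem_matches(["carrota"], ["carr", "daiin"]))  # {'carrota': ['carr']}
```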


Reverse word lookup on specific pages 

<..something to do perhaps..>



Unique words without first or last letter

If we remove the LEFT (first) letter of every word (I simply took all words found),
the most frequent resulting word occurs 15 times in the entire list of found words.

ol 15
ar 14
aiin 13
chedy 13
y 12
or 12
oiin 12
chy 11

If we remove the RIGHT (last) letter of every word in that list,
the most frequent resulting word occurs only 12 times.

Top words from that exercise:

o 12
cho 12
che 12
chee 12
she 10
d 10
ch 10
l 10
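Both exercises can be sketched with Python's `collections.Counter`. This assumes that the input is the list of unique words found, so duplicates are removed first. Words that become empty (single-letter words) are skipped. The function name is hypothetical.

```python
from collections import Counter


def trimmed_counts(words, side="left", top=8):
    """Drop the first ('left') or last ('right') letter of every unique
    word and return the most common results with their counts."""
    unique = set(words)
    if side == "left":
        trimmed = (w[1:] for w in unique)
    elif side == "right":
        trimmed = (w[:-1] for w in unique)
    else:
        raise ValueError("side must be 'left' or 'right'")
    return Counter(t for t in trimmed if t).most_common(top)


words = ["dol", "sol", "qol", "dar", "dol", "y"]
print(trimmed_counts(words, "left"))
print(trimmed_counts(words, "right"))
```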


Unique words, repeats

Unique words were analyzed for the entire text and per section (the “words unique” column).

The column “of which with 1 repeat” then shows how many of those unique words occur only once. All other words are the “repeated uniques”: they occur more than once.


Below, only the sections of the VMS are shown, not the entire text and/or the label pages:

(the last, rightmost columns are the repeated uniques)


As you can see, there is something different about the bio section:

only in that section are the words that are repeated more than once the larger group. In all other sections, the words that occur only once are the larger group.
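The per-section split into words that occur once and words that repeat can be sketched as follows. It assumes a mapping from section name to that section's list of word tokens; the section names and tokens in the example are illustrative only.

```python
from collections import Counter


def once_vs_repeated(tokens_by_section):
    """Per section: distinct words, words seen once, words seen more often."""
    result = {}
    for section, tokens in tokens_by_section.items():
        counts = Counter(tokens)
        once = sum(1 for n in counts.values() if n == 1)
        result[section] = {
            "unique": len(counts),
            "once": once,
            "repeated": len(counts) - once,
        }
    return result


example = {
    "bio": ["qokedy", "qokedy", "shedy", "shedy", "ol"],
    "herbal": ["daiin", "chol", "chor", "daiin"],
}
print(once_vs_repeated(example))
```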

For comparison, take these other sources:


Single letter words

Single letter words are stand-alone letters: for example, not ‘daiin’ but only ‘y’.

Sometimes such a letter may really belong to a nearby word, but since we cannot tell for sure, those were also counted:


rep = number of times the letter occurs (repeat count)
avg word dist = average word distance, counted in words (in this case the word itself is only one letter long)
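Here is a sketch of both columns. It assumes the text is a flat list of word tokens and reads “avg word dist” as the average number of words between consecutive occurrences of the same single-letter word; that reading is an interpretation, not confirmed by the text. The function name is hypothetical.

```python
from collections import defaultdict


def single_letter_stats(tokens):
    """For every one-letter word: (rep, avg word dist or None if seen once)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if len(tok) == 1:
            positions[tok].append(i)
    stats = {}
    for letter, pos in positions.items():
        # Gap in words between each occurrence and the next one.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        avg = sum(gaps) / len(gaps) if gaps else None
        stats[letter] = (len(pos), avg)
    return stats


print(single_letter_stats(["y", "daiin", "ol", "y", "s", "chol", "y"]))
```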