Text specific statistics
After many (too many?) hours of research, I have collected information about the text that is objective, hard data. Since I did not know where else to put it, I created this page, which will contain this raw data.
For the definitions of “paragraph textpages” (= “text paragraph pages”) and “label pages”, see here.
There you can also find downloadable Excel sheets.
Most common letters used in the VMS (the percentages are given in the rows):
If you want to know more about the letter positions, have a look here.
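A letter percentage table like the one above can be computed with a few lines of Python. This is only a minimal sketch: the function name `letter_percentages` and the toy word list are my own assumptions, not the tool or the data behind the table.

```python
from collections import Counter

def letter_percentages(words):
    """Return {letter: percentage of all letters}, most common letter first."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}

# Hypothetical toy input, not real manuscript data.
sample = ["daiin", "okain", "qokain", "y"]
for letter, pct in letter_percentages(sample).items():
    print(f"{letter}: {pct:.1f}%")
```

The percentages always add up to 100, which is a quick sanity check for any such table.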
If we look at the “paragraph textpages”, there are 3905 paragraphs in 203 pages.
If we look at the first words of those paragraphs (in other words, the first words of lines), there are 890 cases where the first letter is a gallow character and 479 cases where the second letter is.
This adds up to 1369, so 35% of all paragraphs have a gallow in the first or second position of the first word.
Again, note that these 1369 gallows only cover the first 2 letters of the first word of a paragraph.
If we only look at the first word of every paragraph textpage, 193 out of 203 pages have a gallow as the first or second character. That is 95%.
In 12 of those cases the gallow character occurred in the second position of that word:
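The counting above could be sketched as follows. The assumptions are mine: an EVA-style transcription in which the gallows are the letters k, t, p and f, and words separated by whitespace. How a ligature such as ckh is treated (its second letter is a gallow) is not specified in the counts above; this sketch simply counts it as a gallow in position 2.

```python
GALLOWS = set("ktpf")  # assumption: EVA gallow letters

def gallow_position_in_first_word(line):
    """Return 1 or 2 if the first word has a gallow as 1st or 2nd letter, else None."""
    words = line.split()
    if not words:
        return None
    for pos, ch in enumerate(words[0][:2], start=1):
        if ch in GALLOWS:
            return pos
    return None

def summarize(lines):
    """Return (count in position 1, count in position 2, percentage of lines)."""
    positions = [gallow_position_in_first_word(line) for line in lines]
    first = positions.count(1)
    second = positions.count(2)
    share = 100.0 * (first + second) / len(lines) if lines else 0.0
    return first, second, share

# Checking the percentages quoted in the text:
print(round(100 * (890 + 479) / 3905))  # 35
print(round(100 * 193 / 203))           # 95
```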
Here you see the positions of all gallow characters in all lines, in the “paragraph textpages”:
(horizontal: pages, vertical: line position)
Gallow character ligatures
Looking at the special ligatures (they occur 1832 times in the total text):
ckh (779 times), cph (185 times), cth (805 times), cfh (63 times). Compared to the 20314 gallow count, 1832 is only 9%.
Here you see the positions of all gallow ligatures in all lines, in the “paragraph textpages”:
(horizontal: pages, vertical: line position)
The 1832 occurrences in the total text consist of 1694 in the “paragraph text” and 138 in the “label text” (up to 141, since some are disputable). For the “paragraph text” the counts are: cfh 63, cph 176, ckh 708, cth 747.
Of the 193 first words of a page that have a gallow, only 10 have a gallow ligature. They are:
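As a cross-check, the ligature totals can be recomputed from the individual counts. The helper `count_ligatures` is a hypothetical illustration of how such counts could be taken from a transcription string; the numbers in the two dictionaries are the ones quoted on this page.

```python
LIGATURES = ("ckh", "cph", "cth", "cfh")

def count_ligatures(text):
    """Count each gallow ligature as a plain substring of the transcription."""
    return {lig: text.count(lig) for lig in LIGATURES}

# Counts quoted above for the total text and for the paragraph text.
reported_total = {"ckh": 779, "cph": 185, "cth": 805, "cfh": 63}
reported_paragraph = {"ckh": 708, "cph": 176, "cth": 747, "cfh": 63}

print(sum(reported_total.values()))                       # 1832
print(sum(reported_paragraph.values()))                   # 1694
print(sum(reported_paragraph.values()) + 138)             # 1832
print(round(100 * sum(reported_total.values()) / 20314))  # 9
```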
(the number in the column gives the position in the word)
First sentence of the page, if there is a gallow in the 1st or 2nd position
If we combine those two, the normal gallow in the second position of the first page word and the gallow ligature in the first page word, we see that they are almost the same.
|unique words||text paragraph pages||6920||of which||4695||occur only once *|
|unique words||label pages||2492||of which||1954||occur only once *|
|unique words||full combined text||8302|
|overlapping words in text and labels||1110|
|words in label pages that do not occur in text paragraph pages (so these are unique within the MS)||1382|
* : in this search a word is also treated as a ‘vord’, which means that if it is found as part of another word, that is counted as 1 hit. For example, if we search for ‘kain’ and we find okain and qokain, we count 2 occurrences for kain.
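The ‘vord’ rule from the footnote can be sketched like this. It is one possible reading, under my assumption that hits are counted over the list of unique words rather than over every token in the text; `count_vord` is a hypothetical name.

```python
def count_vord(fragment, vocabulary):
    """Count the unique words that contain `fragment` as a substring."""
    return sum(1 for word in set(vocabulary) if fragment in word)

# The example from the footnote:
words = ["okain", "qokain", "daiin"]
print(count_vord("kain", words))  # 2
```

Under this reading a word “occurs only once” when the only hit is the word itself, so `count_vord` returns 1 for it.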
Thus, 1110 label words also occur in the (other) text pages. That is 45% of all words in the label pages, and 16% of the words in the text pages could refer to the label pages. That seems fair.
On the other hand, there are 1382 words in the label pages that do not occur anywhere else in the entire manuscript. That is strange, unless those words are so unique that they represent a unique number, such as a catalogue or reference number. It is also possible that those words are real unique names. A third option is that these words are in essence the same as other words (the same stem) but have been conjugated, changed by the context, or perhaps made more specific. For example: carr_ in the text and carrota in the label pages.
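The numbers in the table are consistent with plain set arithmetic, which can be checked as below. The function `vocabulary_overlap` is a hypothetical sketch that uses exact word matches; it does not apply the ‘vord’ rule from the footnote.

```python
def vocabulary_overlap(text_words, label_words):
    """Compare the unique words of the text pages with those of the label pages."""
    text_vocab, label_vocab = set(text_words), set(label_words)
    return {
        "text": len(text_vocab),
        "labels": len(label_vocab),
        "combined": len(text_vocab | label_vocab),
        "shared": len(text_vocab & label_vocab),
        "labels_only": len(label_vocab - text_vocab),
    }

# Consistency check on the figures quoted on this page:
text, labels, shared = 6920, 2492, 1110
print(text + labels - shared)        # 8302 unique words in the combined text
print(labels - shared)               # 1382 words found only on label pages
print(round(100 * shared / labels))  # 45
print(round(100 * shared / text))    # 16
```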
Reverse word lookup on specific pages
<..something to do perhaps..>
Unique words without first or last letter
If we remove the LEFT letter of every word (I simply took all words found),
the most repeated word would occur 15 times in the entire list of found words.
If we remove the RIGHT letter of every word from that list,
the most repeated word would occur only 12 times in it.
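This exercise could be reproduced roughly as follows. I assume here that the starting point is the list of unique words and that one-letter words are skipped, because they would become empty; neither detail is stated above, and `most_repeated_after_strip` is a hypothetical name.

```python
from collections import Counter

def most_repeated_after_strip(words, side):
    """Strip one letter from each unique word and return (word, count) of the top result."""
    unique = set(words)
    if side == "left":
        stripped = [w[1:] for w in unique if len(w) > 1]
    elif side == "right":
        stripped = [w[:-1] for w in unique if len(w) > 1]
    else:
        raise ValueError("side must be 'left' or 'right'")
    counts = Counter(stripped)
    return counts.most_common(1)[0] if counts else None

# Hypothetical toy list, not real manuscript data.
sample = ["okain", "qokain", "kain", "dkain", "okaiin"]
print(most_repeated_after_strip(sample, "left"))  # ('kain', 2)
```

A high count means that many different words share the same remainder and differ only in their first (or last) letter.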
Top words from that exercise:
Unique words, repeats
Unique words were analyzed for the entire text and per section.
The column “of which with 1 repeat” shows the number of unique words that occur only once. All the other words are the “repeated uniques”, and they have a higher count.
Below only the sections of the VMS are shown, not the entire text and/or the label pages:
(the last, rightmost columns are the repeated uniques)
As you can see, there is something different in the bio section:
only in that section do the words that are repeated more than once outnumber the words with only 1 repeat. In all other sections the words that have only 1 repeat are in the majority.
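The split between words that occur once and “repeated uniques” could be computed per section as in this sketch. The section names and word lists below are hypothetical toy data, and `unique_word_profile` is my own helper, not the spreadsheet used for the table.

```python
from collections import Counter

def unique_word_profile(words):
    """Split the unique words of a section into those seen once and those repeated."""
    counts = Counter(words)
    once = sum(1 for n in counts.values() if n == 1)
    return {"unique": len(counts), "once": once, "repeated": len(counts) - once}

# Hypothetical toy sections, not real manuscript data.
sections = {
    "herbal": ["daiin", "chol", "shol", "daiin", "cthy"],
    "bio": ["qokedy", "shedy", "qokedy", "shedy", "ol"],
}
for name, words in sections.items():
    profile = unique_word_profile(words)
    flag = "repeated > once" if profile["repeated"] > profile["once"] else "once >= repeated"
    print(name, profile, flag)
```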
Take for example these other sources:
Single letter words
Single letter words are stand-alone letters. The text does not say, for example, ‘daiin’ but only ‘y’.
Sometimes such a letter really belongs to a nearby word, but since we cannot tell for sure, those were also counted:
avg word dist = average word distance, counted in words (in this instance each word is 1 letter long)
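One way to compute such a figure is sketched below. I read “average word distance” as the mean number of word positions between consecutive occurrences of the same single letter word; that reading is an assumption on my part, and both function names are hypothetical.

```python
def single_letter_positions(words):
    """Map each single letter word to the list of word positions where it occurs."""
    positions = {}
    for index, word in enumerate(words):
        if len(word) == 1:
            positions.setdefault(word, []).append(index)
    return positions

def average_word_distance(positions):
    """Mean gap, in words, between consecutive occurrences (None if fewer than 2)."""
    if len(positions) < 2:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical toy text, not real manuscript data.
text = ["daiin", "y", "chol", "s", "okain", "y", "shol", "y"]
for letter, where in single_letter_positions(text).items():
    print(letter, len(where), average_word_distance(where))
```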