Greek text attempt (2015)
February 2015
Orthographic transcription is a transcription method that employs the standard spelling system of each target language. Transcription as a mapping from sound to script must be distinguished from transliteration, which creates a mapping from one script to another that is designed to match the original script as directly as possible.
Transliteration is the conversion of a text from one script to another.
The Voynich manuscript (VMS) has been transliterated (mapped) into the EVA alphabet, in which, for example, one Voynich glyph is written as ‘f’. My goal is to translate that EVA text into a plain language, which can finally be translated into a modern language such as English so that we can understand the text (this is called transcription).
The path I follow is interesting and could be useful in future efforts to transliterate similar texts. I do not claim this method is the best or that it is new; I simply write about my effort and use this blog to free my mind of my thoughts. Writing this blog is satisfying in itself, even when I do not reach a good final result.
For now I assume the text is written in ancient Greek, because that language gives the best language DNA match.
Step 1. Choose a piece of text that is representative of the VMS as a whole.
I took the entire text and removed the labels that belong to the drawings, as well as the very small text fragments that stand on their own. The resulting working text is called VMS CAB NST (NST = no small text).
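A minimal sketch of this filtering step. It assumes a hypothetical record layout of (kind, text) pairs and an assumed cut-off of 3 words for a "very small" fragment; the post does not state either, and real transliteration files use their own format.

```python
# Hypothetical record layout: (kind, text), where kind is "paragraph" or
# "label". Real transliteration files use their own locus format.
MIN_WORDS = 3  # assumed cut-off; the post does not say what "very small" means

def build_nst(records, min_words=MIN_WORDS):
    """Keep running text only: drop drawing labels and very short fragments."""
    kept = []
    for kind, text in records:
        if kind == "label":
            continue  # labels belonging to drawings are left out
        if len(text.split()) < min_words:
            continue  # small stand-alone fragments are left out
        kept.append(text)
    return "\n".join(kept)

records = [
    ("paragraph", "fachys ykal ar ataiin shol shory"),
    ("label", "otol"),
    ("paragraph", "dar al"),
]
print(build_nst(records))  # fachys ykal ar ataiin shol shory
```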
A step later I found that the text can be improved by rewriting the glyph transcribed as ‘sh’ as the three characters ‘c6h’. The glyph looks more like the two characters ‘c’ and ‘h’ with something added than like the EVA letter ‘s’ with an ‘h’ attached.
After comparing the word and language DNA with and without this change, the change seemed logical to me. The changed text will be referred to as CAB NST2.
After further analysis I also made a new text to compare with CAB NST2. It is called CAB NST3 and uses the following replacements:
sh => c6h => 5
ch => 7
ai => 3
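The replacements above can be sketched as ordered substring substitutions. This assumes CAB NST3 is derived from CAB NST2 (so ‘sh’ has already become ‘c6h’); the rule lists and the function name apply_rules are illustrative, not the author's actual tooling.

```python
# Replacement rules from the post. Assumption: CAB NST3 is derived from
# CAB NST2, so "sh" has already become "c6h" before the NST3 rules run.
NST2_RULES = [("sh", "c6h")]
NST3_RULES = [("c6h", "5"), ("ch", "7"), ("ai", "3")]

def apply_rules(text, rules):
    """Apply plain substring replacements, in list order."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

nst2 = apply_rules("shedy chol qokaiin", NST2_RULES)
nst3 = apply_rules(nst2, NST3_RULES)
print(nst2)  # c6hedy chol qokaiin
print(nst3)  # 5edy 7ol qok3in
```

Each rule only sees the output of the rules before it, so the order of the list is part of the definition of the text.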
Step 2. Identify the correct transliteration of the VMS into EVA.
Remove false character combinations in 2grams, 3grams, 4grams and possibly 5grams.
Let us call these “defects”.
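Finding defects starts from n-gram counts. A minimal sketch, assuming words are separated by whitespace and that n-grams are counted inside words only (the post does not say how word boundaries were handled); ngram_counts and ngram_tables are hypothetical names.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count overlapping character n-grams inside each word.

    Assumption: words are separated by whitespace and n-grams do not cross
    word boundaries.
    """
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def ngram_tables(text, sizes=(2, 3, 4, 5)):
    """One count table per n-gram size, the raw material for spotting defects."""
    return {n: ngram_counts(text, n) for n in sizes}

tables = ngram_tables("daiin aiin")
print(tables[3].most_common())  # [('aii', 2), ('iin', 2), ('dai', 1)]
```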
Step 3. Try to match the characters of the VMS to those of the chosen language, using:
- language DNA match
- 1 letter word match based on repeats
- 2grams match based on repeats and AVG (average repeat distance)
- etc.
The difficulty is that a match that fits one part may contradict another part. This is because the mapping cannot be followed one-to-one; one has to skip between and mix letter paths.
After many iterations of these steps, the language with the best match was chosen: Greek.
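One possible reading of the "1 letter word match based on repeats" heuristic is to rank the 1-letter words of each text by frequency and pair them by rank. This is an assumption about the method, sketched with hypothetical function names and toy texts; a pairing found this way can easily be contradicted by the 2gram match, which is exactly the difficulty described above.

```python
from collections import Counter

def ranked_single_letter_words(text, top=5):
    """1-letter words of a text, most frequent first."""
    counts = Counter(w for w in text.split() if len(w) == 1)
    return [w for w, _ in counts.most_common(top)]

def rank_match(vms_text, lang_text, top=5):
    """Pair 1-letter words of both texts by frequency rank (a rough first guess)."""
    return list(zip(ranked_single_letter_words(vms_text, top),
                    ranked_single_letter_words(lang_text, top)))

print(rank_match("y s y r y s", "a e a a e o"))
# [('y', 'a'), ('s', 'e'), ('r', 'o')]
```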
The 2grams in the VMS were compared with the 2grams in Greek.
I also measured how often each 2gram is repeated in the text, and its average repeat distance.
For each 2gram, the number of repeats can be expressed as a % of the total repeats of all 2grams, next to its average repeat distance (AVG).
In any normal language (we take Modern Greek in this example) there is a simple inverse relation between the AVG and the % of the total: the higher the %, the shorter the AVG. That is logical: if a 2gram repeats itself at a high frequency, then its % of the total repeats of all 2grams must be equally high. If that is not the case, we call it a faulty 2gram (a “defect” in the 2grams).
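Why AVG and the % move together can be written out, as a sketch assuming that AVG is the mean gap between consecutive occurrences of a gram, counted in n-gram positions, and that the occurrences are spread over the whole text. The symbols N, n_g and p_g are introduced here for illustration.

```latex
% N   : total number of n-gram positions in the text
% n_g : number of times the n-gram g is repeated
% p_g : repeats of g as a % of the total
p_g = 100 \cdot \frac{n_g}{N}, \qquad
\mathrm{AVG}_g \approx \frac{N}{n_g}
\quad\Longrightarrow\quad
\mathrm{AVG}_g \approx \frac{100}{p_g}, \qquad
p_g \cdot \mathrm{AVG}_g \approx 100
```

For example, the che row of the CAB NST2 3gram table further down gives 4,242403 × 23,65 ≈ 100,3. A gram whose occurrences cluster in a few lines has a much shorter AVG than this predicts, so its product falls well below 100.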
In the following screen view of the CAB NST text, the rows are sorted first on AVG (low to high) and then on ‘repeated’ (high to low).
The faulty 2grams are:
rr, eh, s*, l*
‘eh’ is remarkable because it occurs only 4 times, all within 3 consecutive lines.
Ah, I did investigate that before!
See this page; this method now confirms that finding.
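A minimal sketch of the defect test in Python. Assumptions: AVG is the mean gap between consecutive occurrences, a gram counts as faulty when its % × AVG differs from about 100 by more than a chosen tolerance, and the names repeat_stats and faulty and the 0.5 tolerance are hypothetical, not the post's actual procedure. The toy sequence mimics ‘eh’: two occurrences right next to each other.

```python
from collections import defaultdict

def repeat_stats(grams):
    """Return {gram: (percent_of_total, repeated, avg_distance)}.

    grams is the list of n-grams in text order. Distances are counted in
    n-gram positions; grams seen only once have no distance and are skipped.
    """
    positions = defaultdict(list)
    for i, gram in enumerate(grams):
        positions[gram].append(i)
    total = len(grams)
    stats = {}
    for gram, pos in positions.items():
        if len(pos) < 2:
            continue
        avg = (pos[-1] - pos[0]) / (len(pos) - 1)  # mean gap between repeats
        stats[gram] = (100.0 * len(pos) / total, len(pos), avg)
    return stats

def faulty(stats, tolerance=0.5):
    """Flag grams whose % * AVG strays far from the expected ~100.

    tolerance is an assumed cut-off, not a value from the original analysis.
    """
    return sorted(
        gram for gram, (pct, _, avg) in stats.items()
        if abs(pct * avg / 100.0 - 1.0) > tolerance
    )

# Toy sequence: "ab" and "cd" alternate evenly, "eh" appears twice in a row.
grams = ["ab", "cd"] * 5 + ["eh", "eh"]
print(faulty(repeat_stats(grams)))  # ['eh']
```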
3grams
The 3grams in the VMS were analyzed the same way.
For each 3gram, the number of repeats is again expressed as a % of the total and compared with its average repeat distance (AVG). The faulty 3grams are those whose repeats are irregular compared to their % of the total.
But to get the right pattern, we first have to remove the 3grams with very low repeat counts, because otherwise the Greek language would be reduced to almost nothing:
The removed VMS faulty 3grams are:
3gram | repeated |
ehd | 2 |
htl | 2 |
keh | 3 |
ehy | 2 |
fyc | 2 |
ot* | 2 |
ate | 2 |
ey* | 3 |
ol* | 3 |
cod | 2 |
tlo | 2 |
kil | 2 |
kyr | 2 |
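A simplified sketch of the removal step that filters purely on repeat count. The threshold of 4 is an assumption chosen because every removed 3gram in the table above has 2 or 3 repeats; the post judged these 3grams as faulty, not only as rare, and drop_low_repeats is a hypothetical name.

```python
from collections import Counter

# Assumed threshold: every removed 3gram listed above has 2 or 3 repeats,
# so this sketch keeps n-grams with at least 4.
MIN_REPEATS = 4

def drop_low_repeats(counts, min_repeats=MIN_REPEATS):
    """Split n-gram counts into (kept, removed) by a minimum repeat count."""
    kept = {g: c for g, c in counts.items() if c >= min_repeats}
    removed = {g: c for g, c in counts.items() if c < min_repeats}
    return kept, removed

counts = Counter({"che": 4850, "keh": 3, "ehd": 2})
kept, removed = drop_low_repeats(counts)
print(kept)     # {'che': 4850}
print(removed)  # {'keh': 3, 'ehd': 2}
```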
Then I removed the low-frequency letters and used the CAB NST2 (!) text.
This behaves very well straight away: there are no defects left in the text.
4grams
Comparing the 3grams and 4grams, I stumbled upon a big problem: the occurrence percentages are much too high!
On the right you see Latin. In the VMS, n-grams such as ‘aiin’ occur 4 to 6 times more often than the Latin n-grams of the same rank (column x is the VMS % divided by the Latin %). How can we lower that?
3grams, CAB NST2 vs Latin:
CAB NST2 | graph | % | delta % | repeated | avg dist. (in graphs) | delta avg | x | Latin | % | delta % | repeated | avg dist. (in graphs)
1 | che | 4,242403 | 0,43 | 4850 | 23,65 | | 4,15 | que | 1,022 | 0,23 | 858 | 98,11
2 | c6h | 3,81204 | 0,24 | 4358 | 26,34 | 2,69 | 4,79 | ter | 0,7957 | 0,10 | 668 | 126,1
3 | iin | 3,57324 | 0,00 | 4085 | 28,07 | 1,73 | 5,17 | est | 0,6908 | 0,06 | 580 | 145,35
4 | edy | 3,569742 | 0,03 | 4081 | 25,19 | -2,88 | 5,68 | qua | 0,6289 | 0,03 | 528 | 159,5
5 | aii | 3,534753 | 0,82 | 4041 | 28,38 | 3,19 | 5,89 | ili | 0,6003 | 0,01 | 504 | 164,76
6 | qok | 2,711639 | 0,45 | 3100 | 36,7 | 8,32 | 4,58 | ent | 0,592 | 0,02 | 497 | 169,41
7 | cho | 2,262032 | 0,05 | 2586 | 44,32 | 7,62 | 3,93 | qui | 0,5753 | 0,05 | 483 | 173,98
8 | 6he | 2,213047 | 0,12 | 2530 | 45,35 | 1,03 | 4,18 | unt | 0,53 | 0,01 | 445 | 188,54
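The x column can be reproduced from the two % columns. A small check in Python; factor_x is a hypothetical helper name, and the values are copied from the first two rows of the Latin 3gram table, with decimal points instead of commas.

```python
def factor_x(vms_pct, other_pct):
    """Column x: VMS % divided by the % of the same-rank n-gram in the other language."""
    return round(vms_pct / other_pct, 2)

# First two rows of the 3gram table (CAB NST2 vs Latin).
rows = [("che", 4.242403, "que", 1.022), ("c6h", 3.81204, "ter", 0.7957)]
for vms_gram, vms_pct, latin_gram, latin_pct in rows:
    print(vms_gram, latin_gram, factor_x(vms_pct, latin_pct))
# che que 4.15
# c6h ter 4.79
```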
Even against Greek the factor is high, although the pattern is different.
3grams, CAB NST2 vs Greek:
CAB NST2 | graph | % | delta % | repeated | avg dist. (in graphs) | delta avg | x | Greek | % | delta % | repeated
1 | che | 4,242403 | 0,43 | 4850 | 23,65 | | 1,19 | καὶ | 3,5751 | 1,70 | 3009
2 | c6h | 3,81204 | 0,24 | 4358 | 26,34 | 2,69 | 2,03 | αὐτ | 1,8737 | 0,01 | 1577
3 | iin | 3,57324 | 0,00 | 4085 | 28,07 | 1,73 | 1,91 | τοῦ | 1,8677 | 0,89 | 1572
4 | edy | 3,569742 | 0,03 | 4081 | 25,19 | -2,88 | 3,64 | ὐτο | 0,9814 | 0,21 | 826
5 | aii | 3,534753 | 0,82 | 4041 | 28,38 | 3,19 | 4,57 | τὸν | 0,7735 | 0,06 | 651
6 | qok | 2,711639 | 0,45 | 3100 | 36,7 | 8,32 | 3,79 | των | 0,7153 | 0,03 | 602
7 | cho | 2,262032 | 0,05 | 2586 | 44,32 | 7,62 | 3,29 | σεν | 0,6867 | 0,09 | 578
8 | 6he | 2,213047 | 0,12 | 2530 | 45,35 | 1,03 | 3,70 | εὶπ | 0,5976 | 0,03 | 503
9 | hed | 2,090586 | 0,02 | 2390 | 43,07 | -2,28 | 3,67 | πεν | 0,5703 | 0,00 | 480
10 | oke | 2,070468 | 0,21 | 2367 | 48,21 | 5,14 | 3,65 | της | 0,5667 | 0,01 | 477
4grams, CAB NST2 vs Latin:
CAB NST2 | graph | % | delta % | repeated | avg dist. (in graphs) | delta avg | x | Latin | % | delta % | repeated | avg dist. (in graphs)
1 | aiin | 4,650814 | 1,49 | 3726 | 21,92 | | 6,77 | fili | 0,6872 | 0,17 | 433 | 23,65
2 | c6he | 3,15796 | 0,70 | 2530 | 32,3 | 10,38 | 6,14 | terr | 0,5142 | 0,02 | 324 | 26,34
3 | hedy | 2,456469 | 0,54 | 1968 | 37,35 | 5,05 | 4,99 | tque | 0,492 | 0,01 | 310 | 28,07
4 | ched | 1,917244 | 0,08 | 1536 | 47,92 | 10,57 | 4 | erra | 0,4793 | 0,01 | 302 | 25,19
5 | daii | 1,836111 | 0,11 | 1471 | 55,5 | 7,58 | 3,93 | omin | 0,4666 | 0,07 | 294 | 28,38
6 | qoke | 1,722524 | 0,13 | 1380 | 58,76 | 3,26 | 4,39 | ixit | 0,392 | 0,00 | 247 | 36,7
7 | okee | 1,593959 | 0,10 | 1277 | 63,25 | 4,49 | 4,12 | itqu | 0,3873 | 0,00 | 244 | 44,32
8 | eedy | 1,489109 | 0,19 | 1193 | 61,61 | -1,64 | 3,86 | ibus | 0,3857 | 0,01 | 243 | 45,35
9 | cheo | 1,298134 | 0,02 | 1040 | 78,07 | 16,46 | 3,42 | domi | 0,3793 | 0,01 | 239 | 43,07
10 | okai | 1,279411 | 0,02 | 1025 | 79,53 | 1,46 | 3,43 | erun | 0,373 | 0,00 | 235 | 48,21
4grams, CAB NST2 vs Greek:
CAB NST2 | graph | % | delta % | repeated | avg dist. (in graphs) | delta avg | x | Greek | % | delta % | repeated | avg dist. (in graphs)
1 | aiin | 4,650814 | 1,49 | 3726 | 21,92 | | 3,02 | αυτο | 1,541 | 0,23 | 854 | 67,61
2 | c6he | 3,15796 | 0,70 | 2530 | 32,3 | 10,38 | 2,41 | υτου | 1,3082 | 0,54 | 725 | 79,64
3 | hedy | 2,456469 | 0,54 | 1968 | 37,35 | 5,05 | 3,18 | ιπεν | 0,7723 | 0,01 | 428 | 135,76
4 | ched | 1,917244 | 0,08 | 1536 | 47,92 | 10,57 | 2,53 | ειπε | 0,7578 | 0,09 | 420 | 137,94
5 | daii | 1,836111 | 0,11 | 1471 | 55,5 | 7,58 | 2,75 | αυτω | 0,6676 | 0,09 | 370 | 156,32
6 | qoke | 1,722524 | 0,13 | 1380 | 58,76 | 3,26 | 2,99 | ησεν | 0,5756 | 0,07 | 319 | 182,56
7 | okee | 1,593959 | 0,10 | 1277 | 63,25 | 4,49 | 3,14 | αυτη | 0,507 | 0,05 | 281 | 198,4
8 | eedy | 1,489109 | 0,19 | 1193 | 61,61 | -1,64 | 3,26 | τους | 0,4565 | 0,04 | 253 | 229,85
9 | cheo | 1,298134 | 0,02 | 1040 | 78,07 | 16,46 | 3,1 | κυρι | 0,4186 | 0,01 | 232 | 244,74
10 | okai | 1,279411 | 0,02 | 1025 | 79,53 | 1,46 | 3,1 | πρὸς | 0,4132 | 0,02 | 229 | 246,33