basic (crypto)analysis V: Bacon
BACON cipher.
Francis Bacon (1605), not to be confused with Roger Bacon (13th century),
used an alphabet of 24 letters and replaced each letter with
a block of 5 characters containing only ‘a’ or ‘b’.
For example, c encodes to aaaba.
An encoding example:
Chedy => aaaba aabbb aabaa aaabb babba
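As a minimal sketch of the 24-letter encoding described above: the code below assumes the usual convention that i/j share one code and u/v share another, and that each letter's position is written as a 5-digit binary number with 0 as ‘a’ and 1 as ‘b’. The names BACON_LETTERS, bacon_code and bacon_encode are my own, for illustration only.

```python
# 24-letter Bacon alphabet (assumption: i/j and u/v each share one code).
BACON_LETTERS = "abcdefghiklmnopqrstuwxyz"  # 24 letters, no j and no v


def bacon_code(index):
    """Return the 5-character a/b group for a letter position 0..23."""
    return format(index, "05b").replace("0", "a").replace("1", "b")


ENCODE = {letter: bacon_code(i) for i, letter in enumerate(BACON_LETTERS)}
ENCODE["j"] = ENCODE["i"]
ENCODE["v"] = ENCODE["u"]


def bacon_encode(text):
    """Encode text letter by letter; characters outside the alphabet are skipped."""
    return " ".join(ENCODE[ch] for ch in text.lower() if ch in ENCODE)


print(bacon_encode("Chedy"))  # aaaba aabbb aabaa aaabb babba
```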
We can now either follow different rules for decoding or,
and that is the method I chose, change the encoding where necessary.
Decoding always follows the Bacon rules.
Going from transcription to encoding, we are free to use other rules.
We could use, for example:
an a becomes aaaaa, and so on,
or an a becomes a single Bacon ‘a’ as an abbreviation for ‘aaaaa’, but only as an exception,
or only o, e, c become a Bacon ‘a’ and all other letters become a ‘b’,
and many other variations.
For software to search for a good combination, these would be the characteristics
of a good Bacon-encoded text:
For Latin: about 60-70% of the characters are ‘a’, and the rest are ‘b’.
The average distance between one ‘a’ and the next is 1.5, and for ‘b’ it is 3.
The maximum number of ‘a’s in a row is 17, and for ‘b’ the maximum is 5.
The occurrence rate of ‘bb’ is 10%, with an average distance of 9.
The occurrence rate of ‘aa’ is 40%, with an average distance of 2 to 3.
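A rough sketch of how software could measure these characteristics on a candidate a/b stream. Two things here are my assumptions, not definitions from the text: "distance" is taken as the average gap between the start positions of successive occurrences, and the ‘aa’ / ‘bb’ rates are counted over overlapping pairs. The function names are illustrative.

```python
import re


def mean_gap(stream, pattern):
    """Average gap between start positions of successive (overlapping) matches."""
    positions = [m.start() for m in re.finditer(f"(?={pattern})", stream)]
    if len(positions) < 2:
        return None
    return (positions[-1] - positions[0]) / (len(positions) - 1)


def longest_run(stream, symbol):
    """Length of the longest unbroken run of one symbol."""
    return max((len(run) for run in re.findall(f"{symbol}+", stream)), default=0)


def ab_profile(text):
    """Collect the measures listed above for an a/b stream (other characters ignored)."""
    s = re.sub(r"[^ab]", "", text)
    pairs = max(len(s) - 1, 1)  # number of overlapping two-character windows
    return {
        "share_a": s.count("a") / len(s) if s else 0.0,
        "gap_a": mean_gap(s, "a"),
        "gap_b": mean_gap(s, "b"),
        "longest_a_run": longest_run(s, "a"),
        "longest_b_run": longest_run(s, "b"),
        "share_aa": len(re.findall("(?=aa)", s)) / pairs,
        "share_bb": len(re.findall("(?=bb)", s)) / pairs,
        "gap_aa": mean_gap(s, "aa"),
        "gap_bb": mean_gap(s, "bb"),
    }


print(ab_profile("aaaba aabbb"))
```

A candidate encoding whose profile is far from the values listed above could then be discarded without trying to read it.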
Using First or Last position letters for Bacon Encoding
If we took the first letter of every word in the entire text cAB
or, for that matter, the last letter of every word,
we would have 37,905 characters.
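Collecting those letters is simple. The sketch below assumes the transcription is available as a string with whitespace between words; the function name edge_letters and the sample string are only for illustration.

```python
def edge_letters(text, last=False):
    """Return the first (or last) letter of every whitespace-separated word."""
    index = -1 if last else 0
    return "".join(word[index] for word in text.split())


sample = "chedy kcoocca chedy"  # illustrative sample, not a real line of text
print(edge_letters(sample))             # ckc
print(edge_letters(sample, last=True))  # yay
```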
There are then 2 obvious possibilities if Bacon had been used
on these letters:
all letters represent a linear Bacon group and thus form words
and must be encoded into Bacon code and then decoded.
The first case is the most obvious, because the letters would form a Bacon series.
For example:
kcoocca => would become (using the 24-letter alphabet) =>
abaabaaabaabbababbabaaabaaaabaaaaaa*
decoding this would not form
kcoocca
because the length of that word is not a multiple of 5
But why would we encode and decode if it still comes down to the same thing?
Yes, that is a good point.
We could have the same problem: which VMS letter represents which Latin letter?
The second Bacon approach would mean that we do not use the standard
Bacon alphabet but another encoding principle, for example:
only ‘y’, ‘o’, ‘c’ in the VMS would form a Bacon ‘b’ and
all other letters would form a Bacon ‘a’.
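A small sketch of this second approach, under the same assumptions as before (24-letter alphabet, 5-digit binary positions with a = 0 and b = 1). Leftover characters that do not fill a complete group of 5 are dropped, and groups above the 24th code are shown as ‘?’; both choices are mine, as are the names to_ab and bacon_decode.

```python
BACON_LETTERS = "abcdefghiklmnopqrstuwxyz"  # 24 letters, i/j and u/v merged


def to_ab(letters, b_letters="yoc"):
    """Map every letter to 'b' if it is in b_letters, otherwise to 'a'."""
    return "".join("b" if ch in b_letters else "a" for ch in letters)


def bacon_decode(stream):
    """Decode an a/b stream in groups of 5; an incomplete last group is ignored."""
    decoded = []
    usable = len(stream) - len(stream) % 5
    for i in range(0, usable, 5):
        group = stream[i:i + 5]
        value = int(group.replace("a", "0").replace("b", "1"), 2)
        decoded.append(BACON_LETTERS[value] if value < 24 else "?")
    return "".join(decoded)


ab = to_ab("kcoocca")
print(ab)                # abbbbba
print(bacon_decode(ab))  # q
```

Seven letters give only one complete group of 5, which is why the decoded text is so much shorter than the input.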
Although all that is a bit far-fetched, everything is possible of course.
Let us look at the possible length of the text then.
Decoding 37,905 characters would give a text that is 5 times shorter,
that is 7,581 characters.
If we also leave out the text in the drawings and tags, this becomes even smaller.
These 7,500 or so characters would represent the entire message, and that
is about 8 to 9 pages of written VMS text.
All that trouble for 8 pages of real text? Very unlikely.
That is the reason I leave this option where it is, and also because once I have programmed
all Bacon permutation possibilities for the whole text, I can always run them on a part of the text.
I also looked at the DNA of those letters, and it is almost the same as the DNA of cAB.
Words with 5 characters only
I also investigated the possibility of using only words that are 5 characters long, because Bacon encodes in groups of 5 characters.
The DNA is the same as that of the entire VMS.
There are 9,629 such words in cAB.
Decoding each of those words into one readable character would give the same number of characters.
With an average of 850 characters on a page, that is about 11 pages of written text.
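The page estimates for both cases follow from the figures given in the text (37,905 edge letters, 9,629 five-character words, 850 characters per page). A quick check, with a helper name of my own choosing:

```python
CHARS_PER_PAGE = 850  # average taken from the text


def pages(chars, per_page=CHARS_PER_PAGE):
    """Number of written pages needed for a given number of characters."""
    return chars / per_page


edge_letter_count = 37_905
decoded_from_edges = edge_letter_count // 5   # 5 a/b characters per decoded letter
decoded_from_words = 9_629                    # one decoded letter per 5-character word

print(decoded_from_edges, round(pages(decoded_from_edges), 1))  # 7581 8.9
print(decoded_from_words, round(pages(decoded_from_words), 1))  # 9629 11.3
```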
Is that worth all the programming and CPU time?
At this point I hesitate. Once I can successfully rule out the Bacon 5-character theory,
I can use the exact same routine to search for other combinations.
And in a few weeks' time I could have processed every combination.
First I have to filter out the label texts, and then I would have to write
a permutation table of the 17 VMS letters against a Bacon ‘a / b / null’.
Each letter has to be combined with ‘a / b / nothing’ until everything has been tried.
If that does not produce any readable text, this Bacon option can be ruled out!
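To get a feeling for the size of that permutation table: with 17 letters and 3 choices each there are 3^17 = 129,140,163 assignments. The sketch below generates them lazily, so they never all sit in memory at once. The 17 VMS letters themselves are not listed here, so the demonstration uses only ‘y’, ‘o’, ‘c’; the function names are illustrative, and letters missing from a mapping are treated as null (my assumption).

```python
from itertools import product

SYMBOLS = ("a", "b", "")  # "" stands for null / nothing


def assignments(letters):
    """Lazily yield every mapping of the given letters to 'a', 'b' or null."""
    for combo in product(SYMBOLS, repeat=len(letters)):
        yield dict(zip(letters, combo))


def apply_mapping(mapping, text):
    """Turn a text into an a/b stream; unmapped letters are treated as null."""
    return "".join(mapping.get(ch, "") for ch in text)


print(3 ** 17)                              # 129140163 assignments for 17 letters
print(sum(1 for _ in assignments("yoc")))   # 27 assignments for 3 letters
print(apply_mapping({"y": "b", "o": "b", "c": "b"}, "kcoocca"))  # bbbbb
```

Each generated a/b stream would then be decoded in groups of 5 and checked for readable text.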