Breaking the Vigenère cipher…

Whenever I encounter a problem that interests me, I get some kinda obsession to solve it. Happens all the time! And this time, it was in a course on “Cryptography” (in Coursera). Well, it’s been an year since I pursued online courses, because I’m simply more occupied by my own works, and I also got bored of those submissions, deadlines, quizzes, etc.

One fine day, I got a mail advertising this course, which got me curious. I wanna know how it’s gonna be. After all, cryptography is the one that got me into some serious programming in Python & Javascript (about a year ago). Anyways, there was this problem, where we’re expected to decipher a “ciphertext” encrypted using a variant of the Vigenère cipher. I got interested as soon as I saw that…

Even though they expect people to solve the problem in C/C++, I (being a Pythonist) decided to get it done with Python! The Vigenère cipher is nothing but a major update to the Caesar cipher. I presume you already know about the Caesar cipher has a keyspace of “1” – just a number to say how much to shift all the characters (so, anyone with a pen & paper can crack it, as it contains only 26 possibilities), but Vigenère extended it, giving additional security.

The Vigenère cipher

Put it simply, it’s just a byte-wise Caesar cipher. That means, every character in the plaintext (phrase) is shifted according to every character in the key[*] (if the key is of length n, then the ciphertext is gonna have 26^n possibilities!). Here’s a sample plaintext (based on uppercase English), say “UNICORNS” being encrypted with the key “CATS” (because I like both!)…

[*] Newbies: For example, “U” is the 20th alphabet, and “C” is the second one (assuming “A” is the zeroth one). Now, 20+2 = 22, which means the 22nd alphabet is “W”.

As you can see, the thing returns a gibberish horror because each character is shifted based on Caesar cipher. Now, recovering the plaintext from it is impossible! Because, you gotta check each & every character with a key, until some reasonable message comes out of it (???). But, what’s a reasonable message? There are a hell lot of reasonable messages in that thing. Think about it for a while. While shifting “WNBUQRGK” according to “CATS” returns “UNICONRS” back, there should be a hell lot of keys other than “CATS” which could return a reasonable message. For instance, if we use the key “KJKIQJDS” for decrypting, we get “MERMAIDS”. Well, doesn’t it seem reasonable?

So, this scheme survive for a few centuries as a result of this confusion & delicacy in the resulting ciphertext, until someone came up with a nice idea. Now, our problem is slightly complicated in the sense that it makes use of XOR and hexadecimals instead of usual addition (because it’s way cooler than the alternatives, that every crypto scheme makes use of both of them!). And, it also uses all 256 ASCII characters instead of the 26 English alphabets, leading to 256^n possibilities! While this scheme ultimately renders the brute-forcing thing impossible, it’s still crack’able!

My job is to find the hole, crack it, and decrypt the monstrous thing!

Where to start?

The idea is to make use of known letter frequencies. In every spoken/written language (like English), there are readable words, which means some characters (like vowels) are repeated and so there’s a greater chance to find them in a sentence compared to rare ones like “z”. When I first began the problem, I mistakenly thought that the frequencies for every single character are required for the analysis. So, I went ahead in the lookout for the data…

After some exhaustive search, I found a few websites (like this one), where the people had published their analysis of letter frequencies. I wasn’t satisfied with those, because the statistics varied between websites. Some say that “e” is the most used character (some say “a”), but they never mention whether it’s the uppercase or lowercase species. When they do include it, they never mention their work, like how they did it, what text corpus they had chosen, etc. Plus, I had a reasonable doubt on this, because logically speaking, “space” should be the mostly frequent character in English (and, that was also the basis of my solution to a Project Euler problem)…

I also encountered Google’s Ngram viewer along the way. Although it’s beautiful, searching through its data would take a lifetime! To get a gist of it, the database of how frequent “a” has appeared in Google’s book-scanner occupies more than 300 MB, because there’s a timestamp which answers the “when and where it’s been found” in the database. It’s simply too much for my work! So, these failures induced an urge to find the data all by myself! That became my obsession… (Obsession within an obsession)

Now, I needed a corpus (just a fancy name for the bunch of texts containing our usual written English – say, novels). In the meantime, I found an awesome website, where a guy had analyzed the letter frequencies, which was an interesting read. It would’ve been useful for me, had he not converted everything to uppercase! Anyways, he had mentioned the source of his corpus (a website named “Project Gutenberg”, which owns a library of old books whose copyright has expired).

Frequencies of characters…

I surfed through the website and found a page which listed the top downloaded books of all time. I was happy about two things. Firstly, the site had an option to provide text files. And second, there was a specific format to how I can obtain the texts, which means that I don’t have to go around and download one by one. There’s Python with its “url-lib” and “lxml.html” packages which can dismantle the whole webpage. In fact, I use them to download stuff whenever I find a specific format to accessing the contents. Anyways, I downloaded about 60 books for my analysis.

One needs to be careful about the load of requests one sends to their server, because they block the particular IP for 24 hours if they get a lot of requests. I don’t really have to worry about that, because my internet provider insists me to make use of dynamic IP. So, whenever I get a “403-Forbidden” page (which you can notice, because all of a sudden, one starts getting ~9 kB files), I stop the download, reset my IP and start it allover again… (Easy PC Python-easy)

Now, these files (so called “corpus”), have some rubbish things at the starts & ends, which needs to be clipped off! For instance, there’s a shitty phrase like, “START OF PROJECT GUTENBERG…” and a similar one at the end (along with a license) in every file, which is definitely gonna affect our frequency analysis. So, I wrote a few lines to revise the code, and finally count the frequencies in a given ASCII range. For our analysis, we don’t really need tabs, spaces and probably any character with ASCII value less than 32, or greater than 127. So, my helper script was finally complete. Now, I do something like…

get()       # Downloads the books
revise()    # Though it makes a rough revision, it needs some hand-checkup
stats()     # Counts the characters, formats and prints to an output file

And, I’m done. This gives an interesting output like this…

ASCII Character Frequencies: Scanned a total of 56951127 characters!

Most frequently found character: ' '
Least frequently found character: '~'

ASCII      Char     Average (%)               Occurrences
=====      ====     ===========               ===========
32                 17.9291043705 %            10210827
101        e        9.5346366017 %            5430083
116        t        6.8514974252 %            3902005
97         a        5.9772039279 %            3404085
111        o        5.8514469784 %            3332465
110        n        5.2419893289 %            2985372
104        h        4.9938941507 %            2844079
105        i        4.8321150870 %            2751944
115        s        4.7474495105 %            2703726
114        r        4.4186447794 %            2516468
100        d        3.3209983711 %            1891346
108        l        3.0210798111 %            1720539
117        u        2.1814967771 %            1242387
109        m        1.9031827061 %            1083884
99         c        1.7308331756 %            985729
102        f        1.7051058533 %            971077
119        w        1.6535423434 %            941711
44         ,        1.5712349292 %            894836
121        y        1.4854912353 %            846004
103        g        1.4438291976 %            822277
112        p        1.1966611302 %            681512
98         b        1.0497263733 %            597831
46         .        0.8777631389 %            499896
118        v        0.7431038195 %            423206
107        k        0.5683627648 %            323689
73         I        0.4847261407 %            276057
34         "        0.3529447275 %            201006
84         T        0.3004066978 %            171085
39         '        0.2820225138 %            160615
65         A        0.2626480069 %            149581
45         -        0.2614557566 %            148902
59         ;        0.1900331138 %            108226
83         S        0.1882701988 %            107222
72         H        0.1573243669 %            89598
69         E        0.1528942526 %            87075
79         O        0.1382799677 %            78752
77         M        0.1337199174 %            76155
58         :        0.1316146035 %            74956
78         N        0.1189493581 %            67743
87         W        0.1182241749 %            67330
82         R        0.1160538930 %            66094
67         C        0.1144174021 %            65162
120        x        0.1123577414 %            63989
76         L        0.1113516156 %            63416
66         B        0.1093692141 %            62287
68         D        0.0974045694 %            55473
63         ?        0.0942035791 %            53650
80         P        0.0891325645 %            50762
33         !        0.0820159362 %            46709
49         1        0.0790888651 %            45042
71         G        0.0776279634 %            44210
70         F        0.0776209398 %            44206
113        q        0.0719336072 %            40967
106        j        0.0668819074 %            38090
50         2        0.0499375544 %            28440
89         Y        0.0495056050 %            28194
122        z        0.0399430199 %            22748
85         U        0.0388929968 %            22150
74         J        0.0383346233 %            21832
51         3        0.0359343196 %            20465
75         K        0.0296692987 %            16897
86         V        0.0276166616 %            15728
95         _        0.0276149057 %            15727
52         4        0.0272479243 %            15518
48         0        0.0242049644 %            13785
53         5        0.0238994393 %            13611
54         6        0.0215974655 %            12300
56         8        0.0212252165 %            12088
55         7        0.0200435015 %            11415
57         9        0.0181278239 %            10324
41         )        0.0174798999 %            9955
40         (        0.0173604993 %            9887
91         [        0.0139768964 %            7960
93         ]        0.0139470462 %            7943
81         Q        0.0100647701 %            5732
88         X        0.0083878937 %            4777
61         =        0.0052202654 %            2973
42         *        0.0051166678 %            2914
90         Z        0.0042703281 %            2432
124        |        0.0031518252 %            1795
62         >        0.0008164895 %            465
60         <        0.0008147337 %            464
38         &        0.0006830418 %            389
43         +        0.0004530200 %            258
123        {        0.0003423988 %            195
125        }        0.0003388871 %            193
47         /        0.0002370454 %            135
36         $        0.0001246683 %            71
35         #        0.0000193148 %            11
96         `        0.0000158030 %            9
94         ^        0.0000105353 %            6
37         %        0.0000035118 %            2
126        ~        0.0000017559 %            1

So, “space” is the most frequent character (which I already know). I was happy with the output, because I have the statistics for all ASCII letter frequencies. Now, it’s time to crack that ciphertext!

Finding the key length…

This is the necessary part before doing anything. We wanna know where the key ends. That’s what makes this scheme vulnerable. Because, once the key ends, the key is repeated throughout the remaining text. In the above example, you can notice that once “CATS” is used twice, because the key-length is less than that of the plaintext. If that’s equal, then the scheme is invulnerable. This gives the opportunity to peek inside, because some characters would be shifted using the same key.

Take the above example. Let’s split the text into sequences which have characters that are shifted using the same key character. This is the weak spot!

To find the length of the key, I used the “Index of Coincidence“. It gives the chance of picking two similar characters in a phrase. If the language is English, then we can expect a greater chance. If it’s gibberish, then the result is gonna be negligible. So, our ciphertext (even though it’s encrypted), is liable to have a lot of similar characters, since it capsules English.

Obviously, the key won’t be of length 1, as it’d then be just a Caesar cipher. So, I began with a keyspace of 2. I split every second byte, bunch it into a sequence, and we get two sequences. Then, assume a keyspace of 3, split every third byte and so on…

In our example, the keyspace is 4, and so we get four sequences.

This is just a simple example. Actually, the larger the ciphertext, the more vulnerable it is, because we have a lot of data to carry out our statistics. Once this splitting is done, we keep on calculating the IC for all possible sequences until we find the hole! When we do that to the 470-characters long ciphertext, we get something like this…

Assuming [keyspace=2]...
Sequence 1 (length=235): 0.009638
Sequence 2 (length=235): 0.013966
Average IC: 0.011802

Assuming [keyspace=3]...
Sequence 1 (length=157): 0.012821
Sequence 2 (length=157): 0.008901
Sequence 3 (length=156): 0.011497
Average IC: 0.011073

Assuming [keyspace=4]...
Sequence 1 (length=118): 0.008982
Sequence 2 (length=118): 0.013038
Sequence 3 (length=117): 0.008252
Sequence 4 (length=117): 0.013852
Average IC: 0.011031

Assuming [keyspace=5]...
Sequence 1 (length=94): 0.013956
Sequence 2 (length=94): 0.009380
Sequence 3 (length=94): 0.013269
Sequence 4 (length=94): 0.010981
Sequence 5 (length=94): 0.009380
Average IC: 0.011393

Assuming [keyspace=6]...
Sequence 1 (length=79): 0.007141
Sequence 2 (length=79): 0.010386
Sequence 3 (length=78): 0.010989
Sequence 4 (length=78): 0.019314
Sequence 5 (length=78): 0.006993
Sequence 6 (length=78): 0.009657
Average IC: 0.010747

Assuming [keyspace=7]...
Sequence 1 (length=68): 0.072432
Sequence 2 (length=67): 0.143826
Sequence 3 (length=67): 0.056536
Sequence 4 (length=67): 0.060606
Sequence 5 (length=67): 0.057440
Sequence 6 (length=67): 0.087743
Sequence 7 (length=67): 0.048847
Average IC: 0.075347

As you can see, the IC is much higher when the length of key is 7 (about 7.5%, which we can safely assume as the perfect split!). If we proceed, we get a higher value for other multiples of 7 for obvious reasons. Next stop – crack the cipher!

Revealing the key!

When I did some googling, I noticed people talking about something called “Chi-squared statistic“, which makes use of the frequency analysis that I had just created. It just felt like it’s gonna consume too much of my time. So, I decided to abandon it!

My idea was simple. It makes use of those sequences again. So, I take a sequence, and XOR with all 256 ASCII characters (Yep, we get 256 sequences from all seven sequences). Now, we make sure that only readable characters exist in the recovered sequences (which means, only those with an ASCII value above 31, and below 128 – that’s readable English). I found the answer already.

Just to make this task automated, I added another snippet which turns all alphabets into lowercase, counts the ASCII values of the alphabets and neglects the other ASCII values. This gives a score for each sequence, and our text is the one which has the highest score, because our text should have the most English alphabets, which means highest possible score of all the other garbage!

When I run my script, I get something like this, which shows the potential sequence along with the key used to obtain it (from counting the spaces and so forth, which is far enough to filter the plaintext sequences logically). There’s also the alphabet score (which filters out the key automatically).


    SEQUENCE 1 - analysis results...
Crsr utu, t  i  ctstyedu,taf,rnpeen taycre,i thsheeeatn,endees,ibe .   BA     5579
M|}|.{z{".z..g..mz}zwkj{"zoh"|`~kk`.zowm|k"g.zf}fkkkoz`"k`jkk}"glk.    B4     3956

    SEQUENCE 2 - analysis results...
3 a  %$$a.)2""(1$5o.a$a/aa/aa  5,a$$$aaao9a&$($a5aa"'a7aa.a9a5a5$/$    5E     2356
7$e$$!  e*-6&&,5 1k*e e+ee+ee$$1(e   eeek=e" , e1ee&#e3ee*e=e1e1 +     5A     2436
1"c""'&&c,+0  *3&7m,c&c-cc-cc""7.c&&&cccm;c$&*&c7cc %c5cc,c;c7c7&-&    5C     2396
6%d%% !!d+,7''-4!0j+d!d*dd*dd%%0)d!!!dddj<d#!-!d0dd'"d2dd+d<d0d0!*!    5B     2412
4'f''"##f).5%%/6#2h)f#f(ff(ff''2+f###fffh>f!#/#f2ff% f0ff)f>f2f2#(#    59     2452
5&g&&#""g(/4$$.7"3i(g"g)gg)gg&&3*g"""gggi?g "."g3gg$!g1gg(g?g3g3")"    58     2476
:)h)),--h' ;++!8-<f'h-h&hh&hh))<%h---hhhf0h/-!-h<hh+.h>hh'h0h<h<-&-    57     2492
;(i((-,,i&!:** 9,=g&i,i'ii'ii((=$i,,,iiig1i., ,i=ii*/i?ii&i1i=i=,',    56     2516
</n//*++n!&=--'>+:`!n+n nn nn//:#n+++nnn`6n)+'+n:nn-(n8nn!n6n:n:+ +    51     2420
=.o..+**o '<,,&?*;a o*o!oo!oo..;"o***oooa7o(*&*o;oo,)o9oo o7o;o;*!*    50     2636
?,m,,)((m"%>..$=(9c"m(m#mm#mm,,9 m(((mmmc5m*($(m9mm.+m;mm"m5m9m9(#(    52     2596
ra aadee ohsccipet.o e n  n  aatm eee   .x geie t  cf v  o x t tene    1F     4563
|o.oojkk.af}mmg~kz a.k.`..`..oozc.kkk... v.ikgk.z..mh.x..a.v.z.zk`k    11     4038
$7v77233v9>%55?&3"x9v3v8vv8vv77";v333vvvx.v13?3v"vv50v vv9v.v"v"383    49     2836
*9x99<==x70+;;1(=,v7x=x6xx6xx99,5x===xxxv x?=1=x,xx;>x.xx7x x,x,=6=    47     2876
"1p11455p?8#339 5$~?p5p>pp>pp11$=p555ppp~(p7595p$pp36p&pp?p(p$p$5>5    4F     2464
 3r33677r=:!11;"7&|=r7r<rr<rr33&?r777rrr|*r57;7r&rr14r$rr=r*r&r&7<7    4D     2508
&5t55011t;<'77=$1 z;t1t:tt:tt55 9t111tttz,t31=1t tt72t"tt;t,t t 1:1    4B     2796
!2s22766s<; 00:#6'}<s6s=ss=ss22'>s666sss}+s46:6s'ss05s%ss<s+s's'6=6    4C     2530

    SEQUENCE 3 - analysis results...
5<8/"5/?-8%)#-">l-l+$"*(%8(5.( #?((:(*-!l-8)"#!8l?*-8;).";8)(>8lll-    DD     756
 )-:7 :*8-0<687+y8y>17?=0-= ;=56*==/=?84y8-<764-y*?8-.<;7.-<=+-yyy8    C8     847
+"&1<+1!3&;7=3< r3r5:<46;&6+06>=!66$643?r3&7<=?&r!43&%70<%&76 &rrr3    C3     798
*#'0=*0 2':6<2=!s2s4;=57:'7*17?< 77%752>s2'6=<>'s 52'$61=$'67!'sss2    C2     805
) $3>)3#1$95?1>"p1p78>649$4)24<?#44&461=p1$5>?=$p#61$'52>'$54"$ppp1    C1     784
/&"58/5%7"?3978$v7v1>802?"2/42:9%22 207;v7"389;"v%07"!348!"32$"vvv7    C7     826
.'#49.4$6#>2869%w6w0?913>#3.53;8$33!316:w6#298:#w$16# 259 #23%#www6    C6     833
-$ 7:-7'5 =1;5:&t5t3<:20= 0-608;'00"0259t5 1:;9 t'25 #16:# 10& ttt5    C5     812
yptcnycsatieoanr a ghnfditdybdlosddvdfam atenomt sfatwebnwtedrt   a    91     6441
7>:- 7-=/:'+!/ <n/n)& (*':*7,*"!=**8*(/#n/:+ !#:n=(/:9+, 9:+*<:nnn/    DF     770
6?;,!6,<.;&* .!=o.o('!)+&;+6-+# <++9+)."o.;*! ";o<).;8*-!8;*+=;ooo.    DE     777
4=9.#4.>,9$(",#?m,m*%#+)$9)4/)!">));)+, m,9(#" 9m>+,9:(/#:9()?9mmm,    DC     763
09=*'0*:(= ,&(';i(i.!'/- =-0+-%&:--?-/($i(=,'&$=i:/(=>,+'>=,-;=iii(    D8     735
18<+&1+;)<!-')&:h)h/ &.,!<,1*,$';,,>,.)%h)<-&'%<h;.)<?-*&?<-,:<hhh)    D9     728
>73$)>$4&3."(&)5g&g /)!#.3#>%#+(4##1#!&*g&3")(*3g4!&30"%)03"#53ggg&    D6     721
?62%(?%5'2/#)'(4f'f!.( "/2"?$"*)5""0" '+f'2#()+2f5 '21#$(12#"42fff'    D7     714
<51&+<&6$1, *$+7e$e"-+#!,1!<'!)*6!!3!#$(e$1 +*(1e6#$12 '+21 !71eee$    D4     707
=40'*='7%0-!+%*6d%d#,*" -0 =& (+7  2 "%)d%0!*+)0d7"%03!&*30! 60ddd%    D5     700
:37 -: 0"7*&,"-1c"c$+-%'*7':!'/,0''5'%".c"7&-,.7c0%"74&!-47&'17ccc"    D2     693
;26!,;!1#6+'-#,0b#b%*,$&+6&; &.-1&&4&$#/b#6',-/6b1$#65' ,56'&06bbb#    D3     686
815"/8"2 5($. /3a a&)/'%(5%8#%-.2%%7%' ,a 5$/.,5a2' 56$#/65$%35aaa     D0     679

    SEQUENCE 4 - analysis results...
phhtd h mhncmt eocCra orfhseui s e a adaFmhncnehteodeanuo,hreahcbvs    B2     6073

    SEQUENCE 5 - analysis results...
tyei onfoegumitsfkrasure o,attcywsalii nopeer  oocrersttw icmtaarei    53     6527
$)59p?>6?57%=9$#6;"1#%"5p?|1$$3)'#1<99p>? 55"pp??3"5"#$$'p93=$11"59    03     672
&+7;r=<4=75'?;&!49 3!' 7r=~3&&1+%!3>;;r<="77 rr==1 7 !&&%r;1?&33 7;    01     684
!,0<u:;3:02 8<!&3>'4& '0u:y4!!6,"&49<<u;:%00'uu::6'0'&!!"u<68!44'0<    06     823
 -1=t;:2;13!9= '2?&5'!&1t;x5  7-#'58==t:;$11&tt;;7&1&'  #t=79 55&1=    07     816
#.2>w891820":>#$1<%6$"%2w8{6##4. $6;>>w98'22%ww884%2%$## w>4:#66%2>    04     714
- <0y67?6<>,40-*?2+8*,+<y6u8--: .*8500y76)<<+yy66:+<+*--.y0:4-88+<0    0A     843
=0, i&'/&,.<$ =:/";(:<;,i&e(==*0>:(%  i'&9,,;ii&&*;,;:==>i *$=((;,     1A     731
;6*&o !) *(:"&;<)$=.<:=*o c.;;,68<.#&&o! ?**=oo  ,=*=<;;8o&,";..=*&    1C     765
94($m"#+"(*8 $9>+&?,>8?(m"a,99.4:>,!$$m#"=((?mm"".?(?>99:m$. 9,,?($    1E     751
:7+'n! (!+);#':=(%</=;<+n!b/::-79=/"''n !>++<nn!!-<+<=::9n'-#://<+'    1D     758
85)%l#"*#)+9!%8?*'>-?9>)l#`-88/5;?- %%l"#<))>ll##/>)>?88;l%/!8-->)%    1F     648
7:&*c,-%,&$6.*70%(1"061&c,o"77 :40"/**c-,3&&1cc,, 1&10774c* .7""1&*    10     705
58$(a./'.$&4,(52'*3 243$a.m 55"862 -((a/.1$$3aa.."3$32556a(",5  3$(    12     691
2?#/f)( )#!3+/25 -4'534#f)j'22%?15'*//f()6##4ff))%4#45221f/%+2''4#/    15     718
3>".g()!(" 2*.34!,5&425"g(k&33$>04&+..g)(7""5gg(($5"54330g.$*3&&5".    14     725
1< ,e*+#* "0(,16#.7$607 e*i$11&<26$),,e+*5  7ee**&7 76112e,&(1$$7 ,    16     711
?2."k$%-$.,>&"?8- 9*8>9.k$g*??(2<8*'""k%$;..9kk$$(9.98??<k"(&?**9."    18     745

    SEQUENCE 6 - analysis results...
*ee&6#,*+7670*- e <5e6e!+0e7e,76 ,+0+7-+7)e7<620e0e6ee ee$6,* 1+*7)    88     1414
+dd'7"-+*6761+,!d!=4d7d *1d6d-67!-*1*6,*6(d6=731d1d7dd!dd%7-+!0*+6(    89     1400
#ll/?*%#">?>9#$)l)5<l?l("9l>l%>?)%"9">$"> l>5?;9l9l?ll)ll-?%#)8"#>     81     1512
8ww4$1>89%$%"8?2w2.'w$w39"w%w>%$2>9"9%?9%;w%.$ "w"w$ww2ww6$>82#98%;    9A     1666
<ss0 5:<=! !&<;6s6*#s s7=&s!s:! 6:=&=!;=!?s!* $&s&s ss6ss2 :<6'=<!?    9E     1610
?pp3#69?>"#"%?85p5) p#p4>%p"p9"#59>%>"8>"<p")#'%p%p#pp5pp1#9?5$>?"<    9D     1568
=rr1!4;=< ! '=:7r7+"r!r6<'r r; !7;<'< :< >r +!%'r'r!rr7rr3!;=7&<= >    9F     1596
o  csfionrsruohe eyp s dnu r irseinunrhnrl ryswu u s  e  asioetnorl    CD     5853
;tt7'2=;:&'&!;<1t1-$t't0:!t&t=&'1=:!:&<:&8t&-'#!t!t'tt1tt5'=;1 :;&8    99     1624
:uu6&3<:;'&' :=0u0,%u&u1; u'u<'&0<; ;'=;'9u',&" u u&uu0uu4&<:0!;:'9    98     1638
6yy:*?067+*+,61<y< )y*y=7,y+y0+*<07,7+17+5y+ *.,y,y*yy<yy8*06<-76+5    94     1694
)ff%5 /)(4543).#f#?6f5f"(3f4f/45#/(3(4.(4*f4?513f3f5ff#ff'5/)#2()4*    8B     1428
.aa"2'(./3234.)$a$81a2a%/4a3a(32$(/4/3)/3-a38264a4a2aa$aa 2(.$5/.3-    8C     1358
 oo,<)& !=<=: '*o*6?o<o+!:o=o&=<*&!:!='!=#o=6<8:o:o<oo*oo.<& *;! =#    82     1554
!nn-=('! <=<;!&+n+7>n=n* ;n<n'<=+' ; <& <"n<7=9;n;n=nn+nn/='!+: !<"    83     1540
,cc 0%*,-1016,+&c&:3c0c'-6c1c*10&*-6-1+-1/c1:046c6c0cc&cc"0*,&7-,1/    8E     1386
&ii*:/ &';:;<&!,i,09i:i-'<i;i ;:, '<';!';%i;0:><i<i:ii,ii(: &,='&;%    84     1470
'hh+;.!'&:;:=' -h-18h;h,&=h:h!:;-!&=&: &:$h:1;?=h=h;hh-hh);!'-<&':$    85     1456

    SEQUENCE 7 - analysis results...
gipet qrg ,ennenarphbehsosostoytrgda loe eVepcagbrd iidwkn sns  kyy    3E     6240

    POSSIBLE KEY: ['BA', '1F', '91', 'B2', '53', 'CD', '3E']

And, that was my key. When I decrypted the ciphertext using that, I got the wonderful plaintext…

Cryptography is the practice and study of techniques for, among other things, secure communication in the presence of attackers. Cryptography has been used for hundreds, if not thousands, of years, but traditional cryptosystems were designed and evaluated in a fairly ad hoc manner. For example, the Vigenere encryption scheme was thought to be secure for decades after it was invented, but we now know, and this exercise demonstrates, that it can be broken very easily.

Pfft, looks like that frequency analysis was of no use, but well I liked its results. And, I really enjoyed the problem last week that I’m now ready to find my next obsession…

C̻͇͔̪̤̠̤͉̭͜r̬͉̖̠̼̹͚͔͚ͅy̢̤̗̟̣͚̗̭͓̺p̧̨͎̩͓̻̜̞ͅͅt̨̨͎̞͙̦̠̯̬͜ǫ̧̖͓̤̟͇͇̪̤g̘͓̰̤͇̘̦̟͍̫r̨̧̨̢̥͇̮̱̖ͅą̧̭̮͖͓̼̮̺͇p͙͚̥͓͉̮̙̝̜͜ḩ̜͍͍̹̻̥̱̬̣y͎͓͕̟̘̬̟̥̗̜ ̢̧̘͇̬͎̯̖̜̼r̼͇̝͕̝̞̞͎̖̤o͍͔̭͇͕̗̦̟͍̣c̨̡̡͎̼̩̜͎̰̳ķ̨͙̪̗̦̥̜͇͍s̪̪̠̲̥͔̝̠͇̲!̧̢̺͓̯̰̘̪͔̫

Advertisements

Tagged: , ,

One thought on “Breaking the Vigenère cipher…

  1. Sergey October 14, 2016 at 4:25 PM Reply

    hello! do you have some automatic program to decrypt message without key (it looks like 256 alphabet-characters)?

Wanna Reply?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: