Invented by Lester S. Hill in 1929, the Hill cipher is a polygraphic substitution cipher based on linear algebra.
Hill used matrices and matrix multiplication to mix up the plaintext.
To counter charges that his system was too complicated for day to day use, Hill constructed
a cipher machine for his system using a series of geared wheels and chains. However, the machine never really sold.
Hill's major contribution was the use of mathematics to design and analyse cryptosystems. It is important to note that
the analysis of this algorithm requires a branch of mathematics known as 'number theory'. Many elementary number theory
text books deal with the theory behind the hill cipher, with several talking about the cipher in detail (e.g. "Elementary
Number Theory and its applications", Rosen, 2000). It is advisable to hire a book such as this one from the library and
trying to learn a bit if you want to understand this algorithm in any depth.
Example
This example will rely on some linear algebra and some number theory. The 'key' for a hill cipher is
a matrix e.g.
|2 4 5|
|9 2 1|
|3 8 7|
In the above case, we have taken the size to be 3x3, however it can be any size (as long as it is square).
We now take 3 characters from our plaintext e.g. 'DEF' and create a vector that corresponds to the letters
(replace 'A' with 1, 'B' with 2 etc.) to get: [4 5 6] (this is ['D' 'E' 'F']).
To get our ciphertext we perform a matrix multiplication
This process is performed for all 3 letter blocks in the plaintext. The plaintext may have to be padded with some
extra 'x's to make sure that there is an even number of blocks.
Now for the tricky part, the decryption. We need to find an inverse matrix modulo 26 to use as our 'decryption key'.
i.e. we want something that will:
|x x x|| 6| |4|
|x x x||22| mod 26 = |5| = 'DEF'
|x x x||16| |6|
To do this, we must use a bit of maths. It turns out that the x matrix above can be calculated from our key.
A lengthy discussion will not be included here, as it is all just abstract maths, which can be found in most number theory
textbooks or on the internet. The important things to know
are inverses (mod m), determinants of matrices, and adjoints of matrices.
Let K be the key matrix. Let d be the determinant of K. We wish to find Ki, or the inverse of K (modulo 26). i.e. the matrix Ki
such that K*Ki = I (mod 26), where I is the identity matrix.
Ki = di*adj(K)
where Ki = inv(K)(mod 26), di = inv(d)(mod 26), adj(K) is the adjoint matrix of K.
d is calculated normally for K. The inverse, di, is found by finding a number such that d*di = 1 (mod 26). The simplest way
of doing this is to loop through the numbers 1..25 and find the one such that the equation is satisfied. There is no solution
(i.e. chose a different key) if gcd(d,26) != 1 (if this is the case K can not be inverted).
That is it. Once Ki is found, decryption can be performed.
JavaScript Example of the Hill Cipher
This is a JavaScript implementation of the Hill Cipher. The case here is restricted to 2x2 case of the hill cipher
for now, it may be expanded to 3x3 later. coming soon ..
The 'key' should be input as 4 numbers, e.g. '3 4 19 11'. These numbers will form the key (top row, bottom row).
Plaintext
key =
Ciphertext
Cryptanalysis
Cryptanalysis is the art of breaking codes and ciphers.
When attempting to crack a Hill cipher, frequency analysis will be practically useless, especially
as the size of the key block increases. For very long ciphertexts, frequency analysis may be useful when applied
to bigrams (for a 2 by 2 hill cipher), but for short ciphertexts this will not be practical.
The basic Hill cipher is vulnerable to a
known-plaintext attack, however,(if you know the plaintext and corresponding ciphertext the key can be recovered)
because it is completely linear. An opponent who intercepts several plaintext/ciphertext character pairs can
set up a linear system which can (usually) be easily solved; if it happens that this system is indeterminate,
it is only necessary to add a few more plaintext/ciphertext pairs[1]. The known ciphertext attack is the best
one to try when trying to break the hill cipher, if no sections of the plaintext are known, guesses can be made.
For the case of a 2 by 2 hill cipher, we could attack it by measuring the frequencies of all the digraphs that occur in
the ciphertext. In standard english, the most common digraph is TH, followed by HE. If we know the hill cipher has been employed
and the most common digraph is KX, followed by VZ (for example), we would guess that KX and VZ correspond to TH and HE, respectively.
This would mean 19 7 and 7 4 are sent to 10 23 and 21 25 respectively. If A was the encrypting matrix, we would have:
which gives us a possible key. After attempting to decrypt the ciphertext with
inv(A) = |2 9|
|5 23|
we would know whether our guess was correct. If it is not, we could try other combinations of common ciphertext
digraphs until we get something that is correct.
In general, the hill cipher will not be used on its own, since it is not all that secure. It is,
however, still a useful step when combined with other non-linear operations, such as S-boxes (in modern ciphers).
It is generally used because matrix multiplication provides good diffusion (it mixes things up nicely).
Some modern ciphers use a matrix multiplication step to provide diffusion e.g. AES and Twofish use matrix multiplication
as a part of their algorithms.
Code
I have included here some C code that does encryption and decryption of
the Hill cipher. It is only meant to show the working of the
algorithm, not be a final polished solution. C Implementation of Hill cipher
References
[1] Wikipedia
has a good description of the encryption/decryption process, history
and cryptanalysis of this algorithm
Rosen, K (2000). Elementary Number Theory and its applications. Addison Wesley Longman,
ISBN 0-321-20442-5.