Reply to Club Topic #3: Encoding methods

Cryptography is a study of encryption.

Encryption is converting an original object(number, string, picture, file, anything transferable) called “plaintext” into a special form, so that only a few, allowed people can access to the information. And then those who has the encoded plaintext, called “ciphertext”, will decode in a particular way to retrieve plaintext.

Ⅰ. Single Substitution method(a.k.a. Caesar password)

The oldest encryption method is called the Caesar password, which encodes alphabetic string by “pushing” the alphabets certain times. This kind of encryption method is called single-substitution method because the parameter related to encryption is independent to each characters.

Ⅱ. Matrix, and multiple substitution method

After the Caesar password, people needed more complicated method. This dream was realized by discovery of matrix.

Each characters correspond to numbers(ex. A=1, B=2, …) and plain texts are transformed into particularly shaped arrays of numbers, called a matrix. This matrix is multiplied by another constant matrix, called a “Encoding Matrix”. Without this matrix, decoding the ciphertext is nearly impossible because matrix has a lot more number than a integer, and range of every entry in the matrix is unlimited, so figuring out the encoding matrix is nearly impossible.

With this method, those who has encoding matrix can multiply inverse matrix to the ciphertext, and retrieve plaintext. However, this cryptogram is not that complex because character-encoding mechanism is so simple that figuring out matrix is possible. I’ll explain it with simple math problem.

Let plain text be “CAKE”, and we will encrypt this with 2 by 2 matrix X. This will be encoded into two 2 X 1 row matrices [3  1], [11  5]. If cipher text is found to be [5 9] and [21 37], then we can set two 2-dimensional systems of equations by setting entries of matrix X by a, b, c, and d.

For a11 = a, a12 = b, a21 = c, a22 =d (entry of mth row and nth column of matrix X is denoted as amn),

3a + 1c = 5, 11a + 5c = 21

3b+2d = 9, 11b + 5d = 37

Then by solving these systems, we can obtain encoding matrix X which has (a,b,c,d)=(1,2,2,3) as entries.

Mathematically, obtaining n*n encoding matrix requires at least n pairs of cipher text and plain text, and solving n sets of n-dimensional regular system of equations.

# Regular system of equations is a system which has same number of variables and equations.

To make this encoding method more intricate, a more complex character encoding method is required. For example, a cryptogram used in the WW2, which used encoding matrix, encoded 26 alphabets and 10 punctuation marks or white spaces by corresponding to 2 pairs of alphabets among six choices. (Choosing two alphabets pairs among A,B,C,D,E,F produces 6*6=36 choices.)

Ⅲ. RSA Encryption System

However, thanks to faster multi-processor computers, we can easily solve these kind of passwords. Because of this, a new encryption system was invented in 1978, by Ron Rivest, Adi Shamir, and Leonard Adleman. It was the “RSA password”. RSA password is based on two mechanisms; difficulty of dividing a big composite number into its prime factors, and asymmetrical encryption system.

#2*3*7=42. This is multiplication, a simple math. But dividing 42 by every prime number smaller than 42(2,3,5,7,11,13,17,19,…) and figuring out if 42 is divided or not, requires long amount of time.

  1. Symmetric and Asymmetric encryption

Symmetric encryption is the traditional method of encryption. When Alice wants to send Bob a message, then she can give a message in a box locked with a padlock. If Bob has a key for the lock, he can unlock and see what’s inside. This method is simple, but has a problem. It requires too many keys. If 20 people is using this system, then 20C2(number of cases of choosing two people without order, among 20) = 190 keys are needed. And each person should keep 19 keys to open 19 other people’s passwords.

Keeping a number of keys, was the biggest problem of traditional symmetric encryption methods.

2. Asymmetric Encryption method

To overcome the problem of symmetric encryption, asymmetric encryption uses two keys; public key and private key.

Public key and private key can both encrypt plain text, but only private key can decipher cipher text. And mathematical algorithm(the algorithm will be explained later) prevents obtaining private key from public key.

In analogy, let’s call Alice and Bob again. They both has each other’s own padlock. So if Alice wants to send message to Bob, she can lock the box with message inside, with Bob’s padlock that she has. And then Bob will unlock the box with his own key. However, obtaining Bob’s key with Bob’s padlock that Alice has, must be very hard.

This is the mechanism of asymmetric encryption. Each others’ padlock, or the public key is open to everyone. Everyone can send message by encrypting plain text with other’s public key. But they can’t snatch others’ message, and decrypt with public key, because you can’t unlock a box with padlock.

And let’s see how public and private keys are generated in RSA algorithm.

(1) RSA algorithm starts with two prime numbers p, and q.

(2) Obtain a number e(for “e”ncryption), which is relatively prime to (p-1) and (q-1).

# If a and b is relatively prime, then a and b doesn’t have any common factors other than 1. In other words, greatest common divisor of a and b (GCD(a,b)) is 1.

(3)Obtain a number d(for “d”ecryption), which remainder of ed divided by (p-1)(q-1) is 1.

#Positive integer d can be easily obtained by using Bezout’s identity.

Bezout’s identity means there exists integer x, y such that ax + by = gcd(a,b), and it’s able to find them with simple calculations. However bezout’s identity and its process is too long and slightly off the topic, we will just skip the detailed method.

About Bezout’s identity: https://en.wikipedia.org/wiki/Bézout’s_identity

Calculation Process:

KUNG-WEI YANG, Euclid’s Algorithm=Reverse Gaussian Elimination, Mathematics Magazine Vol.63(1990), pp163-164

 

(4) Public key is N=p*q, e. Private key is d. As p , q, (p-1)(q-1) is now unnecessary, and may cause security problems, they are automatically dumped.

Each individual only needs to keep private key d.

-Encryption: encrypt plain text a by  x≡a​^e mod N.

-Decryption: decrypt cipher text by a​≡x​^d mod N.

a, original plain text, and a’, decoded message, is identical. This will be proven at Appendix.

# a ≡ x mod N means “a and x has same remainder divided by N”. N is called ‘modulo’, a dividing number. This statement is read “a is equivalent to x by mod(modulo) N.”

 

As RSA password was the first password able to make website license, a lot of websites are using RSA-2048 key. This key uses 2048 bit (617 decimal) number as public key. This number is still not factorized yet, and anyone who factorize this will receive US$200,000.

 

Appendix: Mechanism of RSA encryption [number theory

RSA Encryption is based on Fermat’s Little Theorem.

For arbitrary integer a and N, following always holds.
a^(φ(N)) ≡  1 (mod N)
φ(N) is called the Euler-pi function, which returns number of natural numbers relatively prime to N, and not bigger than N.

We won’t deal with general form of euler pi function. However, with N=pq where p and q are both prime numbers, φ(N) = (p-1)(q-1).

This can be easily proven; there are pq numbers from 1 to N, and there are q numbers that are multiple of p (p,2p,3p,…pq) and p numbers that are multiple of q. And since pq is counted twice, we should subtract 1 from the counting. So there are p+q-1 numbers that are not relatively prime with N.

So, numbers not bigger to N that are relatively prime to n, or euler pi function is calculated by φ(N)=φ(pq)=pq-p-q+1 = (p-1)(q-1).

With Fermat’s little theorem, you can recognize that mathematical algorithm of RSA encryption is quite easy. We will prove orignal plain text a is identical to decoded cipher text a’.

(1) According to RSA algorithm, ed = b(p-1)(q-1) +1 (because d is defined as an integer which remainder of ed divided by (p-1)(q-1) is 1)

where b is an integer representing quotient.

(2)According to simple properties of congruence(equations dealing with properties of remainders expressed by modulo notation.), ab mod N = (a mod N)(b mod N), a^b mod N = (a mod N)^b

where a mod N is remainder of a divided by N.

By these two properties:

a’ ≡ x^d mod N ≡ (a^e mod\ N)^d  mod N ≡ (a^e)^d mod N ≡ a^ed  mod N
≡ a^{b φ(N) + 1}   mod N

≡{a^φ(N)}^b * a  ≡ 1^b  *a mod N   [By Fermat’s little theorem]

= a (Original Plain Text)

Example. Set p = 11, q = 37

N=pq = 437, e = 7 (because 7 is relatively prime with both 10 and 36.)

This pair, (437, 7) will be public key.

Then let’s encrypt 240 with this simple-RSA code.

Remainder of 7(e)-th power of 240 divided by 437 is 235.

And this cipher text 235 will be send to someone who knows euler pi-function of 407. (In this case euler function is easily calculated, but in real RSA code, composite number N used is about several thousand digits.)

Euler pi function of 407 is (11-1)(37-1)=360=φ(N)

d such that ed = φ(N)*b + 1 is the decryption code.

since e = 7 and φ(N)=360, we can find d from Bezout’s identity.

7*103 = 360*2 + 1, decryption code d is 103.

So we can retrieve plain text by calculating remainder of 103(d)th power of 235(cipher text) divided by 407(N), and it’s 240, same as original plain text.

#Python language Implementation

transform

Encoding a plain text into a long, long,long natural number, and decoding it backwards.

encrypt

RSA password implementation. Using two prime numbers 10037 and 10061, we can generate Public and Private keys. We can encrypt text “LOVE” into integer 26169002 with public key N and encryption key e.

And with decryption key d, we can decrypt cipher text into original text, ‘LOVE’.

Those who want to figure out how exactly this code works:

def isPrime(n):
if n == 1:
return False
for i in range(2, n/2 + 1):
if n%i==0:
return False
return True

def gcd(x, y):
while y != 0:
(x, y) = (y, x % y)
return x

def rel_prime(x,y):
return gcd(x,y)==1

def bezout(a, b):
x,y, u,v = 0,1, 1,0
while a != 0:
q, r = b//a, b%a
m, n = x-u*q, y-v*q
b,a, x,y, u,v = a,r, u,v, m,n
gcd = b
return x

def power_mod(a,b,c):
x = 1;
for i in xrange(0,b):
x = (x*a)%c
return x

#####################################################

characters = [‘=’,’=’,’=’,’=’,’=’,’=’,’=’,’=’,’=’,’=’,’=’,’A’,’B’,’C’,’D’,’E’,’F’,’G’,’H’,’I’,’J’,’K’,’L’,’M’,’N’,’O’,’P’,’Q’,’R’,’S’,’T’,’U’,’V’,’W’,’X’,’Y’,’Z’,’a’,’b’,’c’,’d’,’e’,’f’,’g’,’h’,’i’,’j’,’k’,’l’,’m’,’n’,’o’,’p’,’q’,’r’,’s’,’t’,’u’,’v’,’w’,’x’,’y’,’z’,’ ‘,’.’,’,’,’!’,’?’,’1′,’2′,’3′,’4′,’5′,’6′,’7′,’8′,’9′,’0′,’-‘,’@’,’#’,’$’,’%’,’^’,’&’,’*’,’\n’,”‘”,'”‘,’\r’]
def char_num(string):
encoded = ”
for letter in string:
assert letter in string
encoded += str(characters.index(letter))
return int(encoded)

def num_char(num):
decoded = ”
num = str(num)
for i in range(0,len(num),2):
decoded += characters[int(num[i])*10+int(num[i+1])]
return decoded

#####################################################

def set_encryption():
p = input(“First prime number”)
q = input(“Second prime number”)
p, q = min(p,q), max(p,q)
#Without loss of Generality, suppose p <= q.
assert isPrime(p)
assert isPrime(q)
N = p*q
#Encryption code e
e = p-1
while e <= p-1:
if rel_prime(e,p-1) and rel_prime(e,q-1):
break
else:
e -= 1
#Decryption code d
d = bezout(e,(p-1)*(q-1))
return [N,e,d]

def public_key():
return set_encryption()[0:2]

def private_key():
return set_encryption()[2]

#####################################################
def encode(message,N,e):
x = power_mod(message,e,N)
return x

def decode(cipher,N,d):
a = power_mod(cipher,d,N)
return a

def encrypt():
strVar = raw_input(“Input message to encrypt.”)
message = char_num(strVar)
N = input(“Input public key N.”)
e = input(“Input encryption key e.”)
return encode(message,N,e)

def decrypt():
ciphertext = input(“Input integer ciphertext to decrypt.”)
N = input(“Input public key N”)
d = input(“Input decrpytion key d.”)
plain_int = decode(ciphertext,N,d)
return num_char(plain_int)

 

LEGENDS

  • encode and decode deal with integers, and encrypt and decrypt deal with strings(with alphabets and punctuation marks).
  • set_encryption() generates N, e, d at once by getting two prime numbers. public_key() generates N,e and private_key() generates d.
  • isPrime returns if an integer is a prime number.
  • gcd(a,b) returns Greatest Common Divisor between a and b.
  • rel_prime(a,b) returns if a and b are relatively prime(or gcd(a,b)=1)
  • bezout(a,b) returns an integer x such that ax+by=gcd(a,b), or the Bezout’s Identity. This is used for obtaining decryption code for RSA encryption.
  • power_mod(a,b,c) returns remainder of a^b divided by c.
  • char_num encrypts character into integer, and num_char decodes it. A=11, B=12,….
  • # mark is for nothing! Actually, python only ignores # mark; others such as % or $ ends up with syntax error… It’s just to draw a border so never mind 😀

 

Advertisement

One thought on “Reply to Club Topic #3: Encoding methods

  1. Ah! the Caesar method! I am very thankful how you had introduced an ancient, yet a very simple method that will significantly help others have a basic understanding of cryptography. Thank you so much for a helpful post! 🙂

    P.S. great emphasis on “nearly impossible”!

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s