Department of Computer Science
Helsinki University of Technology
Janne.Frosen@hut.fi
2.11.1995
Some general terms used in cryptography
Plaintext or Cleartext | The original data. This may be a text message, file or a stream of communication |
Encryption | Encoding a message so that hides it hides the contents from outsiders |
Ciphertext | The encrypted message |
Decryption | Retrieving the plaintext from ciphertext |
Cipher | A method of encryption and decryption |
Key | A key is usually used in encrypting, decrypting is only possible by knowing the key |
Cryptography | Art or science of encoding data and keeping the message secret |
Cryptoanalysis | Art of breaking ciphertext without knowing the key |
Cryptology | Branch of mathemathics that studies cryptographic methods |
Some US instances linked with cryptology
NSA | National Security Agency |
NIST | National Institute of Standards and Technology |
Cryptography is formally the art of encoding data in a way that only the intended recipient can decode it, and know that the message is authentic and unchanged. Cryptography means different things to different people. Small children play with simple ciphers and substitution secret languages, bigger children play with crypto puzzles. Some people are concerned with privacy for various reasons and use different methods to encrypt sensitive data, with standard unix "crypt" or "rot-13" variants. None of these have anything to do with strong encryption and real data security.
Strong encryption is not a technical standard, it means your encryption cannot be broken by current known methods within feasible time without the data being outdated. Strong encryption can be used to protect your sensitive data against organized crime, government and multinational corporations - all instances with virtually unlimited resources. That has been a cause to recent tries to outlaw strong encryption.
Strong encryption brings many possible applications in daily life. Different applications that require privacy, trust and access control should all use strong encryption methods when possible. Applications include things like electronic money [AUK95] [SAA95], secure communications [PES95] [PUL95] [HEI95], passwords, and many others. It is in people's own interest that different legal/medical/personal data about their person stays confidential to the instances that have a permit to collect the databases (finnish Tietoturvalaki).
Some cryptograhic methods rely on the secrecy of the algorithms used in the cipher (security by obscurity). These ciphers are only of historical interest and are not adequate for a real-world situation.
All modern algorithms use a key to control the encryption and decryption. The message can only be decrypted if the key matches the one it was encrypted with. The key used for decryption can be different from the key used in encryption, and this divides the algorithms in symmetric (or secret-key) and asymmetric (or public-key) classes.
Modern cryptographic algorithms are meant to be executed by computers or specialized hardware devices for which there are several different cryptographic algorithms and methods. This paper concentrates on those commonly used in data encryption today.
Different symmetric algorithms use different length keys, usually a longer key means higher security.
Symmetric algorithms can be further divided into two categories: stream ciphers which take and encrypt one bit of plaintext at a time, and block ciphers which take a number of bits (typically 64) and encrypt them as a single block. Most ciphers belong to the block cipher class.
Symmetric algorithms are generally faster than asymmetric ones and use a much shorter key.
DES stands for "The Data Encryption Standard". DES has been certified by NIST (National Institute of Standards and Technology) for use as an official US Government encryption standard for less-than-top-secret secret material. It is recertified every five years. DES was first certified for government use in 1977, and most recently recertified in 1993, but may not be recertified again [FAQ95].
DES is a strong cipher which encrypts a block of 64 bits at a time using a 56 bit key (56 + 8 parity checks = 64) resulting in 64 encrypted bits.
DES encryption itself consists of many rounds of different transformations and permutations, which are linear and easy to reverse. The critical encryption is done using S-boxes. The S-boxes, or substitution boxes, are a set of highly non-linear functions, implemented in DES as a set of lookup tables (of 4 rows and 16 columns). The S-boxes encrypt 4 bits at a time, so encrypting is done in 16 rounds. After the S-boxes the results are still permutated.
The complicated substituting, permutating, XORs and shifts were chosen to have some useful properties:
DES is normally used in cipher block chaining (CBC) or cipher feedback (CFB) mode. In CBC mode a plaintext block is first XORed with the previous ciphertext block and then encrypted to obtain the ciphertext. In CFB mode the previous ciphertext block is encrypted and then XORed with the plaintext to get the ciphertext.
There are numerous software applications and C libraries with the DES encryption routines widely available. Exporting DES from US is regulated (by NSA - National Security Agency).
Speed
According to RSA Laboratories, when implemented entirely in software, DES is at least 100 times faster than RSA. Implemented in hardware, it may outperform the RSA algorithm by 1000 or even 10000 times. This is due primarily to the fact that the DES S-boxes are simple table-lookup functions, while RSA depends on very-large-integer arithmetic.
A fast 486 PC can encrypt about 400 kb per second using DES. [YLO95]
Security
There are two known ways to 'break' DES. The first requires an exhaustive search of the keyspace, which consists of 2^56 (about 7.2*10^16) possible keys ("brute force"). If you can test one million keys every second, it should only take about 2000 years to go though the keyspace. With special hardware and/or networked machines testing can be done magnitudes faster, a chip could be designed that does a billion tests per second, reducing the needed time to 2 years. It is said that you can buy a dedicated machine (with special decrypting hardware) that can break DES in a couple of hours by brute force, by spending 1 million US dollars [FAQ95].
The other more recent method is called differential cryptanalysis. This method reduces the number of keys that must be tested, but it requires that you have 2^47 chosen plaintexts encrypted with the key you are trying to recover. Since it is highly unlikely that anyone will agree to encrypt 2^47 chosen plaintexts with their secret DES key, this attack is unfeasible in practice [FAQ95].
Because the DES algorithm is based on the "mysterious" S-boxes, which are just constants without any known connections, it has lead to some rumors that there was a trapdoor in the algorithm [SCH95].
The overall consensus is that DES, when used properly, is secure against all but the most powerful organizations. (like NSA, governments, mafia, big supercomuting/parallel computing companies, etc.). Proper use means avoiding known "weak" keys. (weak keys are result of the key being split to 16 pieces, one for each round of encryption). It's simple to avoid the 4 weak, 12 semi-weak keys and 48 possibly weak keys.
The short key is the main known risk in DES. Given all these points, using simple DES for top-secret data is not a good idea anymore, but is sufficient for everyday use.
If DES with a single key is not sufficiently secure for a given application, it can be made more secure by encrypting more than once with different keys. It has been proven that multiple encryptions do actually improve the security of DES (DES is not an algebraic group!), and it is thought that triple-encryption with DES is about equivalent to single-encryption with a 112-bit key [SCH95].
3DES usually uses 2 different keys. First the data is DES encrypted with first key, then decrypted with the second key, and then encrypted with the first key again.
Speed
Triple DES is almost 3 times slower than DES. A fast 486 PC can encrypt about 150 kb per second [YLO95].
Security
At a rate of one million keys per second, an exhaustive search of 2^112 keys would require about 1.65*10^20 years to complete. Since the universe is estimated to be only about 10^10 years old, that is probably long enough for most purposes.
3DES has been proven much more secure than conventional DES, and is a good alternative for current designs. There still remains the same rumors that concern DES (trapdoor?). But as is, it's a better alternative to DES and with current machine technology makes brute force attacks not viable. Applications using DES can easily be converted to use 3DES instead, if the slight effect on speed is not critical.
IDEA (International Data Encryption Algorithm) is an algorithm developed at ETH Zurich in Switzerland. IDEA is considered one of the best and most secure algorithms available today. It uses a 128 bit key to encrypt 64 bit blocks, and the same algorithm is used for encryption and decryption. IDEA is a fairly new algorithm, but it has already been around for several years, and no practical attacks on it have been published despite of numerous attempts to analyze it.
IDEA mixes algorithms from three algebraic groups: XOR, Addition (modulo 2^16), Multiplication (modulo 2^16 + 1). The 64-bit data is divided into 4 16-bit sub-blocks on which the operations (8 total rounds of swapping, XORring, addition, multiplication).
IDEA is normally used in CBC and CFB modes like DES, and many software implementations are publicly available (PGP being most famous). IDEA is patented in the United States and in most of the European countries. The patent is held by Ascom-Tech. Non-commercial use of IDEA is free. Commercial licenses can be obtained by contacting idea@ascom.ch. (Patent not valid in Finland).
Speed
Encryption speed is somewhat faster than 3DES, but slower than DES. A fast 486 PC can encrypt about 200 kb per second [YLO95].
Security
It cannot practically be broken by brute force (even a billion tests/second chip would require 10^13 years .. an array of 10^24 such chips would be needed to find the key in a day, but there is not enough silicon atoms in the known universe to build them). No other methods are known (at least yet). It seems to be safe to use, authorities such as Bruce Schneier [SCH95] consider it the best symmetric algorithm today.
A Triple-IDEA (similar to 3DES) cipher should be sufficient for even the most paranoid people.
The main risk in IDEA is its relatively short age (in cryptography an algorithm needs to be around a couple of decades before it starts to get believed truly unbreakable). Only time will prove the strength of IDEA, authorities today think it is safe.
Skipjack is an NSA-developed encryption algorithm for the Clipper (and Capstone) chips. The algorithm is classified as secret. What is known about it is that it's symmetric, uses a 80-bit key and has 32 rounds of processing per each encrypt or decrypt operation.
The Clipper-chip is a commercial chip made by NSA for encryption, using the Skipjack algorithm. At least AT&T will be using the Clipper for encrypted voice phonelines.
Security
NSA is using Skipjack to encrypt it's own messaging system, so that leads to think the algorithm itself is secure.
Clipper uses Skipjack with two keys. The idea behind it is that anyone who knows the chip's "master key" can decrypt all messages encrypted with it. So, basically NSA could decrypt Clipper-encrypted messages with this backdoor. This method of tampering with the algorithms is called key escrow.
There are many movements and foundations campaigning against the Clipper-chip, concerned about privacy. There are rumors that a similar chip nicknamed EuroClipper could be introduced within EU, but the latest more direct news convince this to be just a rumor.
RC4 is a cipher designed by RSA Data Security, Inc. It used to be a trade secret, until someone posted source code for it in Usenet News. The main good point in RC4 is it's speed, the algorithm is very fast. Several hundred kb per second can be encrypted on a fast 486 PC. It is unknown if RC4 is a strong cipher, but at least there hasn't been confirmed proof against it [YLO95].
It is interesting to know that the exportable version of SSL (Netscape's Secure Socket Layer), which uses RC4-40, was recently broken by at least two independent groups. Breaking it took about eight days [BRE95].
Some other more or less known symmetric ciphers are:
An algorithm developed by Bruce Schneier (author of "Applied Cryptographics") [SCH95].
Enigma, used by Germans in WW2, and broken during it.
Vigenere, historical and easy to solve cipher
cipher | security | speed (486 pc) | key length |
DES | low | 400 kb/s | 56 bits |
3DES | good | 150 kb/s | 112 bits |
IDEA | good* | 200 kb/s | 128 bits |
3IDEA | very good* | ~100 kb/s | 256 bits |
Skipjack | good* | ~400 kb/s | 80 bits |
CLIPPER chip | "good"** | - | 80 bits |
* the algorithm is believed to be strong
** the algorithm itself is good, but it has a built-in weakness
The speed at which computing power has been increasing (16 times more powerful now than 10 years ago) compared to the key-lengths of symmetric algorithms leads at a rough formula of adding one bit to key-length for every year the algorithm is wanted to still hold against brute force attacks, keeping the normal 56-bit key of DES at today's standard. That means a 112 bit 3DES key should be expected to hold at least for about 50 next years, unless any extremely dramatic technical breakthroughs are achieved.
RSA is a public-key cryptosystem for both encryption and authentication, it was invented in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman (RSA). RSA is the most widely used public-key cryptosystem today and has often been called a de facto standard.
RSA works as follows: take two large primes, p and q, and find their product n = pq. Choose a number, e, less than n and relatively prime to (p-1)(q-1), and find its inverse, d, mod (p-1)(q-1), which means that ed = 1 mod (p-1)(q-1); e and d are called the public and private exponents, respectively. The public key is the pair (n,e); the private key is d. The factors p and q must be kept secret, or destroyed.
It is difficult (presumably) to obtain the private key d from the public key (n,e). If one could factor n into p and q, however, then one could obtain the private key d. Thus the entire security of RSA is predicated on the assumption that factoring is difficult; an easy factoring method would break RSA.
There are many applications today using RSA, most notably PGP and Ssh.
Encryption
Suppose Alice wants to send a private message, m, to Bob. Alice creates the ciphertext c by exponentiating: c = m^e mod n, where e and n are Bob's public key. To decrypt, Bob also exponentiates: m = c^d mod n, and recovers the original message m; the relationship between e and d ensures that Bob correctly recovers m. Since only Bob knows d, only Bob can decrypt.
Authentication
Suppose Alice wants to send a signed document m to Bob. Alice creates a digital signature s by exponentiating: s = m^d mod n, where d and n belong to Alice's key pair. She sends s and m to Bob. To verify the signature, Bob exponentiates and checks that the message m is recovered: m = s^e mod n, where e and n belong to Alice's public key.
Encryption and authentication take place without any sharing of private keys: each person uses only other people's public keys and his or her own private key. Anyone can send an encrypted message or verify a signed message, using only public keys, but only someone in possession of the correct private key can decrypt or sign a message.
Speed
RSA operations are all based on series of multiplications. In practical applications, it is common to choose a small public exponent for the public key. Entire groups of users can use the same public exponent. This makes encryption faster than decryption and verification faster than signing. Algorithmically, public-key operations take O(k^2) steps, private key operations take O(k^3) steps, and key generation takes O(k^4) steps, where k is the number of bits in the modulus.
There are many commercially available hardware implementations of RSA, and there are frequent announcements of newer and faster chips. The fastest current RSA chip has a throughput greater than 600 Kbits per second with a 512-bit modulus, implying that it performs over 1000 RSA private-key operations per second. It is expected that RSA speeds will reach 1 Mbit/second within a year or so.
By comparison, DES is much faster than RSA. In software, DES is generally at least 100 times as fast as RSA. In hardware, DES is between 1,000 and 10,000 times as fast, depending on the implementations. RSA will probably narrow the gap a bit in coming years, as it finds growing commercial markets, but will never match the performance of DES.
Security
The security of RSA depends of factoring being difficult. There are several methods to try factoring, but as long as the keys are long enough there is small risk of having your RSA encoded message broken. 384 bits can be broken relatively easily, 512 bits is probably insecure and breakable by major governments, 768 bits is probably relatively safe, 1024 bits should be secure for decades according to today's information. 2048 bits will most probably remain safe for a long time.
Another way to break RSA is to find a technique to compute e-th roots mod n. Since c = m^e, the e-th root of c is the message m. This attack would allow someone to recover encrypted messages and forge signatures even without knowing the private key. This attack is not known to be equivalent to factoring. No methods are currently known that attempt to break RSA in this way.
RSA is very vulnerable to chosen-plaintext attacks, and a good guess can reveal the used key. It is also advisable to include some random data (at least 64 bits) to the encrypted plaintext.
Diffie-Hellman is a commonly used public-key algorithm for key exchange. It is generally considered to be secure when sufficiently long keys and proper prime generators are used. The security of Diffie-Hellman relies on the difficulty of the discrete logarithm problem (which is believed to be computationally equivalent to factoring large integers).
Security
Diffie-Hellman is sensitive to the choice of the strong prime and the generator. The size of the secret exponent is also critical to the security. Conservative advice is to make the random exponent twice as long as the intended session key.
DSS stands for United States government Digital Signature Standard. It's security is based on the discrete logarithm problem (like Diffie-Hellman), altho it's design has not been made public.
Security
Some problems have been found with DSS. Secret data can be leaked with the signature and if the same random number is used twice in generating the signature the secret key will be revealed.
DSS shouldn't be used since both Diffie-Hellman and RSA methods are better.
Some other symmetric ciphers include:
Another cipher based on discrete logarithm problem
Based on "Lucas Sequences", which is a sequence based on factoring two large primes.
Elliptic curve public key cryptosystems are heavy computationally, but believed to be secure.
cipher | security | speed | key length |
RSA | good | fast | varies(1024 safe) |
Diffie-Hellman | good | slower than RSA | varies(1028 safe) |
DSS | low | - | 512 bits |
A cryptographic hash function generates a fixed-size hash value from a message of any length. The idea is to generate a hash value that cannot be used to trace back the original message. The typical applications include things like secret numbers on ATM cards etc.
Message Digest Algorithm 5 (MD5) is a secure hash algorithm developed at RSA Data Security, Inc. It can be used to hash an arbitrary length byte string into a 128 bit value. MD5 is in wide use, and is considered reasonably secure.
MD5 processes the input text in 512-bit blocks, divided into 16 32-bit sub-blocks. The output is a set of 4 32-bit blocks, which are concatenated to a single 128-bit hash value.
Security
It has been reported recently that MD5 was potential weaknesses in it, and that it's breakable in some cases. Also it is also said that one could build a special-purpose machine costing a few million dollars to find a plaintext matching given hash value in a few weeks.
Still, MD5 is considered to be relatively secure and good enough for most purposes.
MD2 and MD4 are earlier attempts of RSA Inc. They have more flaws and it's recommended to use MD5 instead.
Secure Hash Algorithm (SHA), also Secure Hash Standard (SHS) was published by the US Government. It produces an 160 bit hash value from an arbitrary length string.
SHS is structurally similar to MD4 and MD5. It is roughly 25% slower than MD5 but may be more secure, because it produces message digests that are 25% longer than those produced by the MD functions.
Security
Since SHA has a longer (160 bit) hash value it is more resistant to brute force attacks than MD5.
Abstractly, a zero knowledge proof is an interactive proof with a prover and a verifier, where the prover convinces the verifier of a statement (with high probability) without revealing any information about how to go about proving that statement. A mathematical proof is found at [ZER95].
Zero knowledge proofs can be used for authentication before exchanging key information. Details about the zero knowledge systems can be found in Hannu Aronsson's paper [HAA95].
The best applications combine different cryptosystems. The primary advantage of public-key cryptography is increased security: the private keys do not ever need to transmitted or revealed to anyone. In a secret-key system, by contrast, there is always a chance that an enemy could discover the secret key while it is being transmitted.
Another major advantage of public-key systems is that they can provide a method for digital signatures and authentication. Secret key systems would require a third party for this.
A disadvantage of using public-key cryptography for encryption is speed.
For encryption, the best solution is to combine public- and secret-key systems in order to get both the security advantages of public-key systems and the speed advantages of secret-key systems. The public-key system can be used to encrypt a secret key which is then used to encrypt the bulk of a file or message.