The Essential Guide to Cryptography: Hashing, Encryption, and Signing

Cryptography primarily studies how to protect information security through mathematical and computer science methods. It is one of the cornerstones of internet security.

The term “Cryptography” originates from the Greek words “kryptós” (hidden) and “gráphein” (writing).

Information Security comprises the following five fundamental attributes:

  • Integrity
  • Authenticity
  • Non-Repudiation
  • Confidentiality
  • Availability

This article will explain, through specific applications, how cryptographic hashing, encryption, and signing effectively safeguard these fundamental attributes.

#️⃣ Hashing

Cryptographic Hash Function

A Cryptographic Hash Function (CHF) is a one-way mathematical algorithm used to map data of any size to a fixed-size bit string. Common cryptographic hash functions include MD5, SHA256, and others. Cryptographic hash functions typically have the following characteristics:

  • Deterministic: The same input always produces the same output.
  • Efficiency: The hash value can be calculated quickly.
  • Avalanche effect: Minor changes in the input lead to significant changes in the output.
  • Collision Resistance: It is computationally infeasible to find two different inputs that produce the same output.
  • Preimage Resistance: Knowing the output, it is computationally infeasible to deduce the input.
  • Second Preimage Resistance: Knowing one input and its corresponding output, it is computationally infeasible to find another input that produces the same output.
SHA256('hi') = '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4'
SHA256('ho') = 'a821c62e8104f8519d639b4c0948aece641b143f6601fa145993bb2e2c7299d4'
  |     |                                   |
 CHF Plain Text                       Hash or Digest

Cryptographic Hash Functions (CHF) differ from Non-Cryptographic Hash Functions (NCHF), which don’t require security features like resistance to preimage attacks, making them faster and more resource-efficient. Common NCHF example: FNV-1.

Application: Password Storage

From the perspectives of both attack and defense, let’s explore how cryptographic hashing, key stretching, and salting can be used to counteract brute force attacks, dictionary attacks, and rainbow table attacks, thereby securely storing passwords.

  • 🛡️ Defense
    • Cryptographic hashing
    • Key stretching
    • Salt & Pepper
  • ⚔️ Offense
    • Brute-force attack
    • Dictionary attack
    • Rainbow table

🛡️ Suppose a user chooses a simple password, 123456. Service backend hashes it using SHA256 and stores the hash in the database. On login, input is rehashed and checked against the database hash. If they match, login succeeds.

SHA256('123456') = '8d969e...'

⚔️ Attackers use brute-force attacks to test passwords one by one, seeking the string whose hash value is 8d969e.... After some time, they successfully discover that 123456 is the password.

SHA256('1')      != '8d969e...' # failed attempt
SHA256('12')     != '8d969e...' # failed attempt
...
SHA256('123456') == '8d969e...' # success

🛡️ To increase the cost of brute-force attacks for attackers, the backend can implement key stretching. This method boosts the number of hash iterations for each password, requiring attackers to expend more resources for each attempt.

SHA256(SHA256(SHA256('123456'))) = '9ebf7e...'

⚔️ Attackers can use a list of common passwords, and start cracking with the most likely ones. Since users often choose these common passwords, this significantly increases the success rate. This method is known as a dictionary attack. Dictionary files are typically large but save time compared to brute-force attacks, effectively trading space for time.

SHA256('password') != '5e8848...' # failed attempt
SHA256('123456')   == '8d969e...' # success

⚔️ Attackers can further optimize time complexity using lookup tables, which store the hash values of common passwords from the dictionary as keys, significantly speeding up the search process. This strategy further trades space for time.

'5e8848...': 'password'
'8d969e...': '123456'   # success

⚔️ However, lookup tables require vast storage to cover a large password space. To address this, rainbow tables were developed. Rainbow tables are precomputed using cryptographic hash functions (CHF) and reduction functions. The construction process involves selecting a set of initial data as head nodes, alternately applying the hash function (H) and the reduction function (R) to reach the tail nodes. Only the heads and tails of each chain are stored, significantly reducing storage size compared to lookup tables.

'password' -> H    ->    R    ->    H    ->    R    ->    H    ->  R -> 'goodday'  
              |          |          |          |          |          
        '5e8848...'  '123456'  '8d969e...' 'whatsup' '6a8k12...'  
              
'newton'   -> H    ->    R    ->    H    ->    R    ->    H    ->  R -> 'supernova'
'gaming'   -> H    ->    R    ->    H    ->    R    ->    H    ->  R -> 'blackwell'

only stores('password', 'goodday'),('newton', 'supernova'),('gaming', 'blackwell'

The reduction function maps hash values back to a value within the password space. Its goal is to evenly cover the password space to minimize collisions. The password space refers to the set of all possible password combinations.

To crack a password, alternate between R and H on the target hash to locate the chain containing it. Then, starting from the head node, alternate H and R to find the original value of the target hash. While rainbow tables have slightly worse query performance compared to lookup tables, they save significant space, allowing the same storage to cover a larger password space.

- Alternate R and H operations on the target hash value '8d969e...'
- After each R operation, check if the result matches a tail node
    - R('8d969e...') = 'whatsup' != any tail node
    - H('whatsup') = '6a8k12...'
    - R('6a8k12...') = 'goodday' IN ('password', 'goodday')
- Calculate further and find R(H(R('8d969e...'))) = 'goodday'
- This indicates the target hash's preimage is in the chain ('password', 'goodday')
- Starting from 'password', alternate R and H until you find the hash '8d969e...' which is '123456'

🛡️ To counter rainbow table attacks, the backend can use salt. A salt is a random string generated by a cryptographically secure pseudorandom number generator (CSPRNG). When hashing a user’s password, the password and salt are combined to create the final hash. The salt is stored in plaintext alongside the hash result. Each user has a unique salt.

SHA256('123456'+'sda8%D') = '264e1e...'
          |        |           |
        Secret    Salt       Digest
                   |-----------|
                         |
                    Store in DB

The purpose of the salt is to introduce randomness. With salt protection, attackers must calculate a dedicated rainbow table for each salt value, significantly increasing the cost of cracking.

Pepper, like salt, is a string. The difference is that pepper is not stored in plaintext in the database; it is usually stored in the configuration and is globally unique.

SHA256('123456' + 'd9^d7x' + 'sda8%D') = '36df42...'
          |          |          |           |
        Secret     Pepper      Salt       Digest
                     |          |-----------|
                     |                |
              Store in Config    Store in DB

Pepper’s role is to make it difficult for attackers to crack user passwords even if they obtain the plaintext salt values and hashes from a database breach.

Despite these security measures, attackers can still crack passwords. Users should also enhance security awareness by using strong passwords, changing them regularly, enabling 2FA, and recognizing phishing links.


🔒 Encryption

Encryption refers to using an algorithm and a key to convert plaintext data into ciphertext. Common encryption algorithms fall into two categories: symmetric-key encryption and public-key encryption.

Encryption vs. Encoding: These are two different concepts. Encryption aims to protect data privacy, while encoding converts data into a specific format for easier storage, transmission, or processing, such as Base64.

Symmetric-key Encryption

Symmetric-key encryption uses the same key for both encryption and decryption.

K('hi') = 'ab0af2...'    ->    K('ab0af2...') = 'hi'
|          |                   |                 |
Key        Cipher Text         Key               Plain Text                 

Common symmetric encryption algorithms include DES, 3DES, and AES. Among them, AES (Advanced Encryption Standard) is the most widely used.

AES supports three different key lengths:

  1. AES-128: 128-bit key, 10 encryption rounds (better performance)
  2. AES-192: 192-bit key, 12 encryption rounds
  3. AES-256: 256-bit key, 14 encryption rounds (more secure)

AES supports various encryption modes:

  1. ECB (Electronic Codebook): The earliest and simplest encryption mode.
  2. CBC (Cipher Block Chaining): Requires an IV, the most commonly used encryption mode.
  3. GCM (Galois/Counter Mode): Requires an IV, the most common mode for encryption and authentication.

ECB mode is not recommended because it lacks randomness; identical input blocks always produce the same output, making it insecure.

The AES-256 CBC mode encryption process is as follows:

  1. Specify a Password and use CSPRNG to generate a Salt.
  2. Use PBKDF2 to derive a 128-bit Initialization Vector (IV). The IV ensures the ciphertext is random each time.
  3. Use PBKDF2 to derive a 256-bit Key.
  4. Encrypt the plaintext using the IV and Key to obtain the ciphertext.
  5. Transmit the ciphertext along with the Salt.
  6. The receiver uses the Password and Salt with PBKDF2 to regenerate the IV and Key for decryption.
Password -|         |-> IV(128 bits)  -|
          |-PBKDF2->|                  |-Encrypt-> Cipher
Salt -----|         |-> Key(256 bits) -|

Below is a demonstration of using openssl for AES-256 CBC mode encryption and decryption:

> cat plaintext
Let's eat ShakeShack!

# Encrypt
> openssl enc -aes-256-cbc -salt -pbkdf2 -in plaintext -out encrypted -pass pass:123456

# Decrypt
> openssl enc -aes-256-cbc -d -salt -pbkdf2 -in encrypted -out decrypted -pass pass:123456

> cat decrypted
Let's eat ShakeShack!

One of the biggest challenges of symmetric encryption is the key distribution problem: how to securely transmit the key from the sender to the receiver.

Public-key Encryption

Public-key encryption uses different keys for encryption and decryption.

K-('hi') = 'v9we6k...'    ->    K+('v9we6k...') = 'hi'
|                               |                  |
Private Key                     Public Key         Plain Text                 

Common public-key encryption algorithms include RSA and ECC. Below is a simple example of RSA key generation, encryption, and decryption.

RSA Key Generation:

Select two prime numbers: p=5, q=11
Calculate n: n = p * q = 5 * 11 = 55
Calculate Euler's totient function: ϕ(n) = (5-1)*(11-1) = 40
Choose public exponent e, a positive integer less than ϕ(n): e = 3
Calculate private exponent d: d * e mod ϕ(n) = 1

Public Key: (n, e) = (55, 3)
Private Key: (n, d) = (55, 27)

RSA Encryption:

Encrypt plaintext 12: 12 ^ 3 mod 55 = 23

RSA Decryption:

Decrypt ciphertext 23: 23 ^ 27 mod 55 = 12

Public-key encryption is more resource-intensive compared to symmetric encryption, making it unsuitable for encrypting large amounts of data. It is commonly used for key distribution and digital signatures.

Application: SSL/TLS

SSL/TLS is a classic application of both symmetric and public-key encryption. SSL/TLS uses public-key encryption for key exchange and symmetric encryption for subsequent communication. This approach avoids the performance issues of public-key encryption and the key distribution problems of symmetric encryption.

Client <-Cert(Ks+)- Server
Client --Ks+(Key)-> Server
Client <-Enc(Msg)-> Server

✍🏼 Signing

Digital Signature

Message Digest guarantees integrity.

Alice                                        Bob
Msg: 'I\'m buying everyone ShakeShack' -|    |- Msg: 'I\'m buying everyone ShakeShack'
          |                             |    |            |
          | SHA256(Msg)                 | -> |            | SHA256(Msg)
          V                             |    |            V
HashA = 'ac6...f67' --------------------|    |- HashB = 'ac6...f67' == HashA

Message Authentication Code, MAC guarantees integrity and authenticity.

Alice                                        Bob
Msg: 'I\'m buying everyone ShakeShack' -|    |- Msg: 'I\'m buying everyone ShakeShack'
          |                             |    |            |
          | CalcMAC(Key+Msg)            | -> |            | CalcMAC(Key+Msg)
          V                             |    |            V
MAC A = '182...807' --------------------|    |- MAC B = '182...807' == MAC A

Digital Signature guarantees integrity and authenticity, and non-repudiation.

Alice                                        Bob
Msg: 'I\'m buying everyone ShakeShack' -|    |- Msg: 'I\'m buying everyone ShakeShack'
          |                             |    |            |
          | K-(SHA256(Msg))             | -> |            | SHA256(Msg)
          V                             |    |            V
Sig A = '8gk...f06' --------------------|    |- K+(Sig A) = SHA256(Msg)

Application: Digital Certificate

A digital certificate is an electronic document used to prove the ownership of a public key.

# Application
User                           CA
CSR = (User, K+)          -->  SigCA = Kca-(Hash(User, K+))  # Signing
Cert = (User, K+, SigCA)  <--  (User, K+, SigCA)             # Digital Certificate
# Verification
User                           Verifier
(K+, Cert)                -->  Kca+(SigCA) == Hash(User, K+) # K+ Verified

The CA’s public key (Kca+) is usually pre-installed on the user’s device. These public keys are part of the root certificates, forming the foundation of the trust chain.