Structural weakness
The main problem of the hash functions based on Merkle-Damgärd construction is that they’re all vulnerable to length extension attacks. As the message x is split into k blocks x=x1,x2,…,xk (whereas xk contains the padding) and then hashed to the value h(x)=Hk, we can choose a message x′=x,xk+1=x1,x2,…,xk,xk+1. Since the first k blocks of x′ and x are identical, the final hash value of x, i.e., H(x)=Hk, is identical to the internal hash value of H(x′) after k blocks. Because of the iterated formula Hi=g(Hi−1,xi), we can conclude that H(x′)=g(Hk∥xk+1)=g(H(x)∥xk+1).
The fact that H(x) provides direct information about the inner state of H(x′) after k blocks are highly problematic. Suppose Alice and Bob communicate with each other, whereas Alice sends the message x to Bob and wants to authenticate it by sending H(K∣∣x), where K is a secret key that is only known by Alice and Bob, and x is a known message. If H was an ideal hash function, H(K∣∣x) would be a proper authentication system. But since H is prone to length extensions, Eve is now able to extend the message to x∥x′ and forge the authentication code to H(K∥x∥x′) without knowing the secret K.
In order to make length extension attacks impossible, Ferguson and Schneier (2003) proposed the following fix for the hash function:
Iterative hash function
Let H be an iterative hash function. The hash function Hd is defined by Hd=H(H(x)) and has the security level of min(k,n/2), where k is the security level of H and n is the size of the hash result.
Bitcoin uses this fix as they use double-SHA256
as their Proof-of-Work function.
Security of real-life dedicated hash functions
Today’s practical implementations of hash functions are based on dedicated hash functions, which are widely in use, such as MD5
or SHA-2
. In this section, we’ll give a brief overview of the most common hash functions and their security (Mohammad A. AlAhmad et al. (2013)Mohammad A. AlAhmad and Imad Fakhri Alshaikhli. Broad view of cryptographic hash functions, 2013.).
The MD4
hash function was designed by Rivest in 1990Ronald L. Rivest. The MD4 message-digest algorithm. In Proceedings of the 10th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO’90, pages 303-11, London, UK, 1991. Springer-Verlag.. The first collision was found by Dobbertin (1996)Hans Dobbertin. Cryptanalysis of MD4. In Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings, pages 53-69, 1996. with a complexity of 220 MD4
hash operations. This result was further improved by Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag., who reduced the complexity to 28 hash computations such that collisions can be found in a few microseconds. Because of these attacks, the MD4
is no longer collision-resistant. Leurent (2008)Gaëtan Leurent. Fast software encryption. Chapter MD4 is Not One-Way, pages 41228. Berlin, Heidelberg, 2008. Springer-Verlag. demonstrated the first preimage attack on MD4 with a complexity of 2102, which is far below the expected 2128 computations (see this definition
:Security_Goal_of_Hash_Function
).
MD5
is a further development of MD4
and was presented by Rivest in 1992Ronald L. Rivest. The MD5 message-digest algorithm, 1992. Available at https://www. rfc-editor.org/rfc/rfcl321.txt.. Den Boer and Bosselaers (1993)Bert den Boer and Antoon Bosselaers. Collisions for the compression function of MD5. In Tor Helleseth, editor, Advances in Cryptology - EUROCRYPT '93, pages 293-304, Berlin, Heidelberg, 1994. Springer. have shown collisions in the underlying compression function, whereas Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg, 2005. Springer-Verlag. introduced the first efficient practical collision attack on the full MD5
hash function. Further improvements by Klíma (2005)Vlastimil Klíma. Finding MD5 collisions - a toy for a notebook. IACR Cryptology ePrint Archive, 2005:75, 2005. have shortened the search time for full collisions significantly, i.e., down to less than 4 hours using an ordinary notebook PC (Intel Pentium 1.6GHz ). This attack was developed by Stevens (2006)Marc Stevens. Fast collision attack on MD5. IACR Cryptology ePrint Archive, 2006:104, 2006., such that collisions could be found in only minutes using a 3GHz Intel Pentium 4. A further improvement by Stevens (2007)Marc Stevens. On collisions for MD5. Master’s thesis, Eindhoven University of Technology, 62007 . allowed them to find collisions in about 224 compressions such that an attack takes approximately six seconds on a 2.6GHz Intel Pentium 4. Furthermore, Sasaki and Aoki (2009) proposed a preimage attack with a computational complexity O(2123.4).
The most popular dedicated hash functions from the Secure Hash Algorithm (SHA) family were published by the National Institute of Standard and Technology (NIST) as a U.S. Federal Information Processing Standard FIPS PUB 180−X, including SHA-0
, SHA-1
, SHA-2
, and SHA-3
. SHA-0
, SHA-1
, and SHA-2
rely on the Merkle-Damgård construction and were designed by the United States National Security Agency (NSA). SHA-0
was released in 1993 but was later replaced by SHA-1 in 1995 because of a significant weakness due to a design flaw, which is why SHA-0 plays no role in practice.
SHA-1
was released in 1995 as a 160-bit message digest function. However, many standards recommend it to be replaced after 2010 by SHA-2
or SHA-3
, as Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag. proposed the first theoretical collision attack on SHA-1
that required 269, and thus less than the expected 280 hash computations.
In 2017, Google and the Cryptology Group at Centrum Wiskunde & Informatica (CWI) Amsterdam announced the first practical collision by generating a hash collision for two different PDF documents, as shown in the figure given below. The attack requires roughly 263.1 hash evaluations, which equals approximately 6500 single-CPU years. The corresponding PDF files can be found on the official project homepage.
Figure 1
The two PDF documents yield a collision for SHA-1
, but not for SHA-256