We defined the security requirements of hash functions in this definition:Security_Requirements_of_Hash_Functions
, where we formulated three properties that should be computationally infeasible to attack. In this section, we outline what this means in view of complexity. We’ll see that the size of a hash function’s message digest is a crucial security parameter as a hash function whose message-digest range is constrained by n bits contains 2n distinct hash values.
As a result, a brute force attack can find a preimage or second preimage for a general n-bit hash function in approximately 2n hash computations, and collisions in approximately 2n/2 computations, respectively. The 2n/2 hash computations in case the collision is possible due to the birthday attack are shown in this theorem. Therefore, Kelsey and Schneier (2004) give the following definition of the security goal of a hash function:
The security goal of a hash function
A hash function H with a message digest of bit-length n is expected to have the following minimal security properties:
An adversary should not be able to find a preimage or second preimage in less than 2n hash computations.
An adversary should not be able to find a collision in less than 2n/2 hash computations.
Furthermore, Kelsey and Schneier state that “a collision attack on an n bit hash function with less than 2n/2 work, or a preimage or second preimage attack with less than 2n work, is formally a break of the hash function.” (John Kelsey et al. (2005)John Kelsey and Bruce Schneier. Second preimages on n-bit hash functions for much less than 2Power{n} work. In Advances in Cryptology - EUROCRYPT 2005, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22-26, 2005, Proceedings, pages 474-90, 2005.).
A formally broken hash function
A hash function is said to be formally broken if a preimage, second preimage, or a collision can be generated faster than via brute force.
We now want to prove that the birthday attack yields an upper bound of O(2n/2) hash computations in order to find a collision.
Lemma 1
The exponential function e−x is defined by the power series
e−x=k=1∑∞k!(−x)k=1−x+2!x2−3!x3+…
Theorem 1
A collision of a hash function H with message digest bit length n can be found with probability P( collision )>0.5inO(2n/2) hash computations.
Proof
Let H:{0,1}∗→{0,1}n a hash function and the set {x1,x2,…,xk} arbitrary input values. A collision is found if we’re able to determine two different xi=xj such that yi=H(xi)=H(xj)=yj. When we pick two values x1=x2, the probability of not getting a collision (i.e., to get two different message digests) is
P(H(x1)=H(x2))=2n2n−1=1−2n1.
If this holds, the probability to find a third value x3 such that H(x3)=H(x1) and H(x3)=H(x2) is given by
P(y3=y1∧y3=y2)=2n2n−2=1−2n2.
As a consequence, the probability that y1,y2, and y3 are pairwise different is given by
P(y1=y2∧y1=y3∧y2=y3)=(1−2n1)(1−2n2).
This argument can be continued, and thus the probability for no collisions among k message digests is
P( no collision )=(1−2n1)(1−2n2)⋯(1−2nk−1)=i=1∏k−1(1−2ni).
As discussed earlier, it follows that the approximation
e−x≈1−x
holds for x≪1, and therefore
(1−2ni)≈e−2ni
since 2ni≪1. Thus, we can approximate the probability as
P( no collision )≈i=1∏k−1e−2ni=e−2n1+2+3+…+k−1.
The exponent contains the arithmetic series
1+2+3+…+k−1=2k(k−1)
that yields a probability
P( no collision )≈e−2⋅2nk(k−1)
that no collision occurs. Since the probability P of at least one collision is given by P=P( collision )=1−P( no collision ), we achieve an approximated probability of