It’s Your Birthday

… Want an Easy Bet?

Photo by Rebecca Wiggins on Unsplash

It’s Your Birthday

… Want an Easy Bet?

Well they say that should never take a bet from someone who says “I’ll bet you …”. So let’s take an easy bet for you…

If we have classes of 30 students, and I take the money if there is at least two students in the class with the same birthday, otherwise you can have the money.

Do you take the bet? Surely it’s 1-in-365, or perhaps it’s 30-in-365 (less than a one-in-ten chance that I will win) ? Either way you win at least 10 times more than me. It looks a great bet, and you surely can’t lose.

Unfortunately for you, this is the classic Birthday problem, and its a bad bet, as there is a greater than 70% chance that I will win each time:

http://asecuritysite.com/encryption/birthday?val=365,30

If fact if we take a class of 60, there is 99.4% chance of me winning. So if we sampled 1,000 classes, I would win 994 times and you would win just six times … not a good bet!

If you are interested the equation is:

The 365! part stands for 365x364x364…x2x1.

So I reckon that a fair bet is with classes of 23 students, and we both have roughly an evens chance of winning. Here are the estimates:

  • Same birthday with 1 person should give 0%. Calc
  • Same birthday with 10 people should give 11.7%. Calc
  • Same birthday with 20 people should give 41.14%. Calc
  • Same birthday with 23 people should give 50.73%. Calc
  • Same birthday with 30 people should give 70.63%. Calc
  • Same birthday with 60 people should give 99.41%. Calc

What has this go to do with anything?

Moore’s Law predicted that computing power doubles every 18 months or so, so if we have a code which takes 100 years to crack, within 18 months, with the equivalent cost of a system, it will only take 50 years. To simplify things we can estimate that computing power doubles every year, so we find that a code which takes 100 years to crack, will, after 10 years, only takes a matter of weeks to crack (7 weeks). But the trend of improving hardware is now being overtaken by the Cloud, and the standard cryptography we have been using for years is now being push off-the-shelf.

The first to feel the heat is MD5, created by Ron Rivest, and has been a standard method for creating a digital fingerprint of data. It is used extensively in checking that data has not been changed and in providing identity. In the past it has been used to store hashed values of passwords, but its application in this area is reducing fast, as many of the common hashed MD5 values for words have been resolved. One of the key things that is important for MD5 is that the different data does not produce a collision — where different data, especially in the same type of context does not produce the same hash signature.

Recently, Mat McHugh used the Birthday attack to produce the same hash signature for different images, using hashclash, and for just 65 cents on the Amazon GPU Cloud, and took just 10 hours to process.

For 10 hours of computing on the Amazon GPU Cloud, Mat created these two images which generate the same hash signature (Figure 1). If we check the hash signatures we get:

C:\openssl>openssl md5 hash01.jpg
MD5(hash01.jpg)= e06723d4961a0a3f950e7786f3766338

C:\openssl>openssl md5 hash02.jpg
MD5(hash02.jpg)= e06723d4961a0a3f950e7786f3766338

Figure 1: Images

Hashing

With MD5 we use a 128-bit digital fingerprint for our data, and a change of one bit should change the complete signature. For example:

  • “The quick brown fox jumps over the lazy dog.”. Try!, which should give a MD5 of: E4D909C290D0FB1CA068FFADDF22CBD0
  • “The quick brown fox jumps over the lazy dog”. Try!, which should give a MD5 of: 9E107D9D372BB6826BD81D3542A419D6

The method has been shown that it has flaws, where a change in a few of the bits, does not change the output. A collision occurs when there are two different values that produce the same hash signature. In the following example we use a hex string to define the data element (as the characters would be non-printing).

  • Try!, which should give a MD4 hash of: 79054025255FB1A26E4BC422AEF54EB4.
  • Try!, which should give a MD4 hash of: 79054025255FB1A26E4BC422AEF54EB4

Programs too can be modified to give the same hash value, for example these files (goodbye.exe and hello.exe).

C:\openssl>openssl md5 erase.exe
MD5(erase.exe)= cdc47d670159eef60916ca03a9d4a007

C:\openssl>openssl md5 hello.exe
MD5(hello.exe)= cdc47d670159eef60916ca03a9d4a007

C:\openssl>erase.exe
This program is evil!!!
Erasing hard drive…1Gb…2Gb… just kidding!
Nothing was erased.

(press enter to quit)

C:\openssl>hello.exe
Hello, world!

(press enter to quit)

To see how easy it is to crack hashed passwords, see a demo here:

Cracking the Cloud

The Cloud is becoming the largest (and most inexpensive) super computer ever created, and when GPU (Graphical Processing Units) are added it becomes superfast in cracking ciphers. With this we can parallelise the processing for the cracking so that a range of hash values can be allocated to processing elements running in parallel, so that a 1,000 processing element array will take approximately 1/1,000th of the time.

There are two ways to defeat a computer with cryptography. The first is to give it a problem that it really struggles with, such as: “If I take a value and raise it to the power of a secret number and then divide by a certain number, and give you the remainder, you’ll not be able to find what my secret number is within a reasonable amount of time” — basis of El Gamal and Diffie-Hellman. The other method is to have too many of something — such as keys — so that it will take too long for the computer to find the right key that fits.

Most of the cryptography methods are thus defined to make it too difficult to find the right fit. Unfortunately the benchmarking for many of the methods is built around fair old computers, running with just a few cores, and non-optimized for the task. With GPUs in the Cloud, the task becomes so much easier.

An extension of this is to customise for the method used, such as using ASICs to crack Bitcoins, but, compared with running the cracking on the Cloud, this is more expensive in costs.

It’s your birthday

Mat used the birthday attack which is one of the methods used for a brute-force attack, and is based on the birthday problem in probability theory. It defines that we take a set of n randomly chosen people, and there will be a certain percentage that will have the same birthday. A group size of only 70 people results in a 99.9% chance of two people sharing the same birthday. Using this method, if we take an m-bit output there are 2^m messages, and the same hash value would only require 2^(m/2) random messages.

Conclusion

In terms of someone standing up in court now and being asked: “Could these two images produced the same hash signature?” … the answer is “Yes!”. With hashed values for passwords, the only was forward for them is to a salt to their generation, with at least scrambles them a bit, where the same word will produce different hashed values each time they are run.

In conclusion .. don’t bet with a Professor!