Getting Past “Snake-Oil” And To Be Trusted As a Profession — The Major Issue in Cyber Security?

I was contacted by someone asking for advice, as they had been told that all of their SHA-1 signed digital certificates could be cracked, and their consultant was advising them to renew all of them. For me, it was bad advice, and I’ll outline the reasons later on.

Until we, as an industry, get rid of this type of simplistic advice and build a consensus on our viewpoints, we risk always being seen as a snake-oil industry, one that scares people rather than educating and informing them.

I say “industry”, but NIST identifies more than 50 job roles in Cyber Security, so perhaps that is the core of the problem: we are giving advice outside our knowledge domains, while most people still think there is such a thing as a “Cyber Security Professional”?

On the BBC and Sky News, we see commentators introduced as a “Cyber Security Professional”, rather than as a “Security Operations Analyst”, a “Cryptography Specialist”, or a “Business Risk Analyst”. In some cases, it’s a bit like asking a chip designer (an electronic engineer) about 33 kV transmission systems (the domain of an electrical engineer).

So let me explain why the SHA-1 advice was imprecise and poorly defined.

The real story

Overall, SHA-1 is a hashing method, which creates a digital thumbprint for our data. If we change one bit of the data, the hash signature changes completely. Unfortunately, hashes can have collisions, where different data gives the same hash.
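As a minimal sketch of this property (often called the avalanche effect), the Python below uses the standard hashlib module to flip a single bit of a message and count how many of the 160 SHA-1 output bits change. The message and helper names are illustrative, not from the original; for a well-behaved hash, roughly half of the output bits would be expected to flip.

```python
import hashlib


def sha1_int(data: bytes) -> int:
    """Return the SHA-1 digest of data as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit flipped (bit 0 = lowest bit of byte 0)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def bits_changed(a: bytes, b: bytes) -> int:
    """Count how many of the 160 SHA-1 output bits differ between a and b."""
    return bin(sha1_int(a) ^ sha1_int(b)).count("1")


original = b"Hello, world"
modified = flip_bit(original, 0)  # 'H' (0x48) becomes 'I' (0x49)
print(hashlib.sha1(original).hexdigest())
print(hashlib.sha1(modified).hexdigest())
print("output bits changed:", bits_changed(original, modified), "of 160")
```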

The longer the hash, the less likely it is that a collision will be found. For MD5, the time to find a collision is relatively low, so we cannot trust it. A researcher, for example, used the AWS GPU Cloud and managed to get three photographs to have the same signature. MD5 produces a 128-bit signature, and SHA-1 increases this to 160 bits. This gives much greater security, as the number of possible signatures doubles 32 times, giving 2^32 (around 4.3 billion) times as many as MD5 has. So while we see fake Adobe updates signed with a valid MD5 signature, it is highly unlikely, at the current time, that we will see SHA-1 ones.
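The arithmetic can be checked directly. This Python sketch computes the ratio between the two output spaces and, for comparison, the rough work a generic birthday-style search would need: about 2^(m/2) hashes for an m-bit output.

```python
# Compare the number of possible MD5 (128-bit) and SHA-1 (160-bit) hash values.
md5_bits, sha1_bits = 128, 160

ratio = 2**sha1_bits // 2**md5_bits  # the space doubles 32 times
print(f"SHA-1 has 2^{sha1_bits - md5_bits} = {ratio:,} times as many hash values as MD5")

# A generic birthday search needs roughly 2^(m/2) hashes for an m-bit output.
for name, bits in (("MD5", md5_bits), ("SHA-1", sha1_bits)):
    print(f"{name}: about 2^{bits // 2} = {2 ** (bits // 2):,} hashes for a generic collision")
```

These are generic bounds only: practical attacks on MD5, and the SHAttered attack on SHA-1, use cryptanalytic shortcuts and need far fewer than 2^(m/2) hashes.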

Overall, SHA-1 is on the “at risk” list and will be deprecated, in the same way that MD5 has become untrusted. Last year, Google created a collision in the SHA-1 hashing function.

But wait … it took Google two years to create a single collision. Considering the resources at Google’s fingertips, cracking SHA-1 is well out of budget for most organisations [here]. The answer for many is to move to SHA-2/SHA-256 [here], but this can be costly, especially when replacing digital certificates which use SHA-1. Basically, we have evolved from MD5 (a 128-bit hash signature) to SHA-1 (a 160-bit hash signature), and SHA-256 (a 256-bit hash signature) is now being recommended. And just in case SHA-2 gets cracked, there’s a new method defined with SHA-3 (Keccak) [here].

The research team produced two PDF files which have the same SHA-1 hash signature (PDF 1 and PDF 2). This type of collision is known

Overall, SHAttered took 9,223,372,036,854,775,808 (2^63) computations, the equivalent of 6,500 years of single-CPU computation (such as from a desktop computer) and 110 years of single-GPU computation. With 110 GPUs, of course, it would take only one year. So, if you can afford lots of GPUs, you could find a collision, but it’s going to be costly in terms of energy. SHAttered is around 100,000 times faster than a brute-force attack that uses the birthday paradox, which would take 12,000,000 GPUs one year.
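The figures hang together, as a quick calculation shows. All of the values below are taken from the text; the only addition is the check that 9,223,372,036,854,775,808 is exactly 2^63.

```python
# SHAttered figures quoted in the text.
shattered_computations = 9_223_372_036_854_775_808
print(shattered_computations == 2**63)  # the figure is exactly 2^63

gpu_years_shattered = 110            # single-GPU years for SHAttered
gpu_years_brute_force = 12_000_000   # single-GPU years for a birthday brute force

# Spreading the work over more GPUs divides the wall-clock time.
for gpus in (1, 110, 1_000):
    print(f"{gpus:>5} GPUs: {gpu_years_shattered / gpus:g} years")

speedup = gpu_years_brute_force / gpu_years_shattered
print(f"speed-up over brute force: about {speedup:,.0f}x")
```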

So how did they manage to beat brute force? Well, they used a few tricks in the PDF format, stuffing bytes in certain ways, which allowed them to produce the same SHA-1 hash.

Conclusions

Two years is a long time, and, as far as Google knows, no one has implemented that crack in the wild. The general advice is that SHA-1 is to be deprecated, but the advice to replace all of the SHA-1 certificates at once was wrong, for several reasons:

The crack targeted specific elements of the PDF document format, and not digital certificates.
You do not crack the certificate and release the private key. You can only create a certificate which looks as if it is properly signed with the private key of the organisation.
You need the resources, and budget, of the NSA to find a SHA-1 collision. While computing resources are improving, there’s no way that we will see a major roll-out of fake SHA-1 certificates any time soon. We thus have time to replace them as we renew.
Certificates are the foundation of trust on the Internet, and a wide-scale change to them could open up so many issues.
The faking of a digital certificate is a great deal more difficult than finding a collision for a PDF document.
An intruder is going to need a great deal of money (millions of dollars) and millions of GPUs to crack just one certificate.
You must understand the business risk in everything that you do; each business has its own risks, and these need to be fully understood before making changes.
There are two types of certificate: one which holds both the public and the private key, and another which holds just the public key. Replacing certificates involves either re-issuing the certificates with an improved signature, or changing all of the certificates. Changing certificates on a large scale should only be done when a private key has been leaked. The revocation process for digital certificates is flaky at the best of times.

You trust a doctor’s advice, as the medical profession has typically built a consensus on its viewpoints. Within Cyber Security, we are not yet a coherent professional body, and the public cannot easily know who will give them the best advice. In the UK, the BCS and the IET, for example, have failed to create a coherent focus for Cyber Security professionals. New proposals from the NCSC might address this problem, though many also see the IISP as the true home for building a consensus.

Unfortunately, perhaps, the gap between businesses and the general public on one side, and technical specialists on the other, is still a wide one. We need to find better ways to reach agreement on our viewpoints, and to articulate them better to the general public, rather than just scaring them with imprecise advice.

Background

Here’s a bit of a background on hashing and collisions.

MD5 problems

Moore’s Law predicted that computing power doubles every 18 months or so, so if we have a code which takes 100 years to crack, within 18 months, for the equivalent cost of a system, it will only take 50 years. To simplify things, if we project that computing power doubles every year, a code which takes 100 years to crack will, after 10 years, take only a matter of weeks to crack (100/2^10 years, or about 5 weeks). But the trend of improving hardware is now being overtaken by the Cloud, and the standard cryptography we have been using for years is now being pushed off the shelf.
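The halving argument is easy to compute. In this sketch, crack_time is a hypothetical helper that assumes computing power per unit cost doubles every doubling_period years; it runs both the one-year simplification and the 18-month figure.

```python
def crack_time(initial_years: float, years_elapsed: float, doubling_period: float) -> float:
    """Years needed to crack after `years_elapsed`, if computing power per unit
    cost doubles every `doubling_period` years."""
    doublings = years_elapsed / doubling_period
    return initial_years / 2**doublings


WEEKS_PER_YEAR = 52.18  # approximate

# Doubling every year: after a decade, 100 years becomes 100 / 2^10 years.
yearly = crack_time(100, 10, 1.0)
print(f"doubling yearly: {yearly:.4f} years = {yearly * WEEKS_PER_YEAR:.1f} weeks")

# Doubling every 18 months (1.5 years), as in the Moore's Law figure.
moore = crack_time(100, 10, 1.5)
print(f"doubling every 18 months: {moore:.2f} years")
```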

The first to feel the heat is MD5, which was created by Ron Rivest and has been a standard method for creating a digital fingerprint of data. It is used extensively to check that data has not been changed and to provide identity. In the past it has been used to store hashed values of passwords, but its use in this area is reducing fast, as the MD5 hashes of many common words have already been resolved. A key requirement for MD5 is that different data, especially in the same type of context, does not produce the same hash signature (a collision). Recently, though, Mat McHugh showed that he could produce the same hash signature for different images using hashclash, for just 65 cents on the Amazon GPU Cloud and 10 hours of processing.

With 10 hours of computing on the Amazon GPU Cloud, Mat created these two images which generate the same hash signature (Figure 1). If we check the hash signatures we get:

C:\openssl>openssl md5 hash01.jpg

MD5(hash01.jpg)= e06723d4961a0a3f950e7786f3766338

C:\openssl>openssl md5 hash02.jpg

MD5(hash02.jpg)= e06723d4961a0a3f950e7786f3766338
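Without OpenSSL to hand, the same check can be sketched in Python with hashlib. The script name md5sum.py is hypothetical, and the output line simply imitates the openssl md5 format shown above; reading in chunks keeps memory use flat for large files.

```python
import hashlib
import sys


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical script name): python md5sum.py hash01.jpg hash02.jpg
    for name in sys.argv[1:]:
        print(f"MD5({name})= {file_md5(name)}")
```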

Figure 1: Images

Hashing

With MD5 we use a 128-bit digital fingerprint for our data, and a change of one bit should change the complete signature. For example:

“The quick brown fox jumps over the lazy dog.” should give an MD5 of: E4D909C290D0FB1CA068FFADDF22CBD0
“The quick brown fox jumps over the lazy dog” (without the full stop) should give an MD5 of: 9E107D9D372BB6826BD81D3542A419D6
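These two values can be reproduced with Python's standard hashlib module; md5_hex is just an illustrative helper that upper-cases the digest to match the style above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 of a UTF-8 string, as upper-case hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


for text in ("The quick brown fox jumps over the lazy dog.",
             "The quick brown fox jumps over the lazy dog"):
    # Dropping the final full stop changes the whole 128-bit digest.
    print(md5_hex(text), repr(text))
```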

The method has been shown to have flaws, where a change in a few carefully chosen bits does not change the output. A collision occurs when two different values produce the same hash signature. In the following example, each data element is defined as a hex string (as the characters would be non-printing).

Data element 1 should give an MD5 hash of: 79054025255FB1A26E4BC422AEF54EB4
Data element 2 should give an MD5 hash of: 79054025255FB1A26E4BC422AEF54EB4
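The data elements themselves do not survive in the text above. As an illustration, the sketch below uses the widely reproduced 128-byte MD5 collision pair usually attributed to Xiaoyun Wang and colleagues (2004), which gives the MD5 value quoted; the hex is copied from public sources, so it is worth cross-checking before relying on it. The two inputs differ only in the top bit of six bytes, yet MD5 cannot tell them apart, whereas SHA-256 can.

```python
import hashlib

# Widely published MD5 collision pair (128 bytes each), copied from public sources.
M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

# Positions where the two inputs differ (each differs only in its top bit).
differing = [i for i, (a, b) in enumerate(zip(M1, M2)) if a != b]
print("bytes that differ:", differing)
print("MD5:    ", hashlib.md5(M1).hexdigest(), hashlib.md5(M2).hexdigest())
print("SHA-256:", hashlib.sha256(M1).hexdigest()[:16], hashlib.sha256(M2).hexdigest()[:16])
```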

Programs, too, can be modified to give the same hash value, for example these files (erase.exe and hello.exe):

C:\openssl>openssl md5 erase.exe

MD5(erase.exe)= cdc47d670159eef60916ca03a9d4a007

C:\openssl>openssl md5 hello.exe

MD5(hello.exe)= cdc47d670159eef60916ca03a9d4a007

C:\openssl>erase.exe

This program is evil!!!

Erasing hard drive...1Gb...2Gb... just kidding!

Nothing was erased.

(press enter to quit)

C:\openssl>hello.exe

Hello, world!

(press enter to quit)

Cracking the Cloud

The Cloud is becoming the largest (and most inexpensive) supercomputer ever created, and when GPUs (Graphics Processing Units) are added, it becomes superfast at cracking ciphers. With this, we can parallelise the cracking, so that ranges of hash values are allocated to processing elements running in parallel; an array of 1,000 processing elements will then take approximately 1/1,000th of the time.
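A minimal sketch of the divide-and-conquer idea, assuming the target is the unsalted MD5 hash of a three-letter lowercase word. The keyspace is split into equal contiguous ranges, one per worker; candidate, search_range and parallel_search are hypothetical names, and the tiny keyspace is chosen so the example runs instantly.

```python
import hashlib
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

ALPHABET = string.ascii_lowercase
LENGTH = 3  # hypothetical tiny keyspace: 26^3 = 17,576 candidates


def candidate(index: int) -> str:
    """Map an integer in [0, 26**LENGTH) to a fixed-length lowercase word."""
    chars = []
    for _ in range(LENGTH):
        index, r = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def search_range(target: str, start: int, stop: int) -> Optional[str]:
    """One 'processing element': hash every candidate in [start, stop)."""
    for i in range(start, stop):
        word = candidate(i)
        if hashlib.md5(word.encode()).hexdigest() == target:
            return word
    return None


def parallel_search(target: str, workers: int) -> Optional[str]:
    """Split the keyspace into `workers` contiguous ranges and search each one."""
    total = len(ALPHABET) ** LENGTH
    step = -(-total // workers)  # ceiling division, so every index is covered
    ranges = [(s, min(s + step, total)) for s in range(0, total, step)]
    # Threads keep the sketch simple, but in CPython the pure-Python loop holds
    # the GIL, so real speed-ups need processes, machines or GPUs instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: search_range(target, *r), ranges))
    return next((w for w in results if w is not None), None)


target = hashlib.md5(b"zip").hexdigest()
print(parallel_search(target, workers=8))
```

Doubling the number of workers halves the size of each range, which is the 1/1,000th argument above in miniature.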

There are two ways to defeat a computer with cryptography. The first is to give it a problem that it really struggles with, such as: “If I take a value, raise it to the power of a secret number, divide by a certain number and give you the remainder, you’ll not be able to find my secret number within a reasonable amount of time.” This is the basis of El Gamal and Diffie-Hellman. The other method is to have too many of something, such as keys, so that it will take the computer too long to find the right key.
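The first approach can be seen with toy numbers. In this sketch, the "easy" direction is Python's built-in three-argument pow, while the "hard" direction is a naive search over every exponent; the parameters p = 23 and g = 5 are illustrative only.

```python
def brute_force_dlog(g: int, h: int, p: int) -> int:
    """Find x with g^x mod p == h by trying every exponent in turn."""
    value = 1
    for x in range(p - 1):
        if value == h:
            return x
        value = (value * g) % p
    raise ValueError("no solution: h is not a power of g modulo p")


p, g = 23, 5                # toy public parameters; real moduli are 2048 bits or more
secret = 6                  # the private exponent
public = pow(g, secret, p)  # easy direction: fast modular exponentiation
print(public, brute_force_dlog(g, public, p))
```

With only 22 possible exponents the search is instant; with a 2048-bit modulus, the same loop would face an astronomically larger number of candidates, which is the point of the method.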

Most cryptography methods are thus designed to make it too difficult to find the right fit. Unfortunately, the benchmarking for many of the methods is built around fairly old computers, running with just a few cores and not optimised for the task. With GPUs in the Cloud, the task becomes so much easier.

An extension of this is to build hardware customised for the method used, such as the ASICs used for Bitcoin mining, but this is more expensive than running the cracking on the Cloud.

It’s your birthday

Mat used the birthday attack, which is one of the methods used for a brute-force attack and is based on the birthday problem in probability theory. This takes a set of n randomly chosen people and asks how likely it is that at least two of them share the same birthday. A group of only 70 people results in a 99.9% chance of two people sharing the same birthday. Applied to hashing, an m-bit output has 2^m possible values, but finding two messages with the same hash value only requires around 2^(m/2) random messages.
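The 2^(m/2) effect can be demonstrated without a cloud bill by using a deliberately weak hash. This sketch keeps only the first 24 bits of SHA-256 (an assumption made purely for the demo) and hashes distinct messages until two collide; expect a few thousand tries, on the order of 2^12 = 4,096, rather than the roughly 2^24 needed to match one specific hash value.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256, giving a deliberately weak hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def birthday_collision(bits: int):
    """Hash distinct messages until two share a truncated hash value."""
    seen = {}
    counter = 0
    while True:
        msg = f"message-{counter}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1  # two messages and number of hashes tried
        seen[h] = msg
        counter += 1


m1, m2, tries = birthday_collision(24)
print(m1, m2, f"{tries} hashes (2^12 = {2**12})")
```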

Here are some examples:

Same birthday with 1 person should give 0%.
Same birthday with 10 people should give 11.7%.
Same birthday with 20 people should give 41.14%.
Same birthday with 23 people should give 50.73%.
Same birthday with 30 people should give 70.63%.
Same birthday with 60 people should give 99.41%.
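These percentages come from the standard calculation: multiply the probabilities that each successive person misses all of the earlier birthdays, then subtract from one. A short Python sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


for n in (1, 10, 20, 23, 30, 60, 70):
    print(f"{n:>3} people: {100 * shared_birthday_probability(n):.2f}%")
```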

Finally, is this a good bet?

We have classes of 30 students: I take the money if there are at least two students in the class with the same birthday, otherwise you can have the money.
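Rather than take the odds on trust, the bet can be simulated. This Monte Carlo sketch makes the same simplifying assumptions as the figures above (365 equally likely birthdays, no leap years); the function names and the seed are arbitrary choices.

```python
import random


def has_shared_birthday(class_size: int, rng: random.Random, days: int = 365) -> bool:
    """Simulate one class and report whether any two students share a birthday."""
    birthdays = [rng.randrange(days) for _ in range(class_size)]
    return len(set(birthdays)) < class_size


def estimate_win_rate(class_size: int = 30, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated classes in which the 'same birthday' side wins."""
    rng = random.Random(seed)
    wins = sum(has_shared_birthday(class_size, rng) for _ in range(trials))
    return wins / trials


print(f"Same-birthday side wins about {100 * estimate_win_rate():.1f}% of the time")
```

The estimate should land close to the 70.63% figure for 30 people, which is why the bet favours the side backing a shared birthday.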
