A Practical Use for a SHA1 Collision
[This is a guest diary by Paul Bolton]
First I would like to start by stating what this article is not; it is not a new attack against sha1.
When Google announced a sha1 collision in February (here) it reminded me of a detour I took in Nov 2015 when downloading some software, which I think is relevant when we consider these types of announcements.
We talk about things that the most people can relate to, as this is the most effective way to communicate the issue. When talking about hashing algorithms such as sha1 this naturally tends towards “is my Internet banking” or other browsing is still safe.
In these cases at a technical level not only do we have the challenge of finding a collision but a collision in a highly structured format such as an SSL cert or the issue of time, which make the attack less feasible.
However, standard algorithms tend to be used for more than one thing, and in the case of hashing algorithms, if we can find a more malleable format to abuse or better still one that is also not time bound like a web page loading is, we have a better chance of success.
And this is where the story moves onto ISO images.
As I'm sure many people reading this know, ISO images are usually accompanied by two further files, a file containing the hash of the image (the hash file) and a file containing the PGP signature validating the hash file.
In this case, you already have (or obtain via other channels) the PGP public key of the organization attesting to the validity of the image. They sign the hash file with their PGP key, which you can then validate. Given that, you can then validate the ISO image by computing the hash of it and comparing it with the validated hash file. If they match then all is good.
This allows you to download the software from (mirror) sites you don't necessarily fully trust, or over insecure links, as the validation chain is independent and the theory is mathematically sound.
So, whilst playing with some tools at home I went to download an ISO image. Being a security conscious individual I also successfully validated the file using this process; but wait, the ISO image was odd – literally – it had an odd number of bytes. I was curious....
From a sysadmin background, I knew that disks in a filesystem can be of one size, but the partition table that defines what the OS is interested in, can be of another size. The rest of the data on the disk is irrelevant, whether at the end or on partitions that the OS is not using. Likewise, the treatment of unused blocks within a valid filesystem can vary as well.
After a bit of experimenting, I discovered I could append arbitrary random data to an ISO image and it would still mount on my Linux box, mount via VMWare and could be burnt on my Windows box and the CD/DVD read without error.
At the time I thought it best to play with Kali Linux 2.0, which included sha1 hashes for validation; plus I liked the nice symmetry of use.
Let's take the Kali mini iso:
[paul@pandora sans]$ openssl sha1 kali-linux-mini-2.0-amd64.iso
SHA1(kali-linux-mini-2.0-amd64.iso)= 5639928a1473b144d16d7ca3b9c71791925da23c
[paul@pandora sans]$ fgrep kali-linux-mini-2.0-amd64 SHA1SUMS
5639928a1473b144d16d7ca3b9c71791925da23c kali-linux-mini-2.0-amd64.iso
Let's now generate some random data.
[paul@pandora sans]$ dd bs=64 if=/dev/
urandom
of=./padding-pseudo-random.tmp count=16384
16384+0 records in
16384+0 records out
1048576 bytes (1.0 MB) copied, 0.138081 s, 7.6 MB/s
Appending this to a new copy of the ISO generates a new “untrusted” file which passes a simple iso file integrity check.
[paul@pandora sans]$ cp kali-linux-mini-2.0-amd64.iso fake-kali.iso
[paul@pandora sans]$ cat padding-pseudo-random.tmp >> fake-kali.iso
[paul@pandora sans]$ ls -l kali-linux-mini-2.0-amd64.iso padding-pseudo-random.tmp fake-kali.iso
-rw-r--r--. 1 paul paul 31457280 Apr 1 11:04 fake-kali.iso
-rw-r--r--. 1 paul paul 30408704 Apr 1 10:58 kali-linux-mini-2.0-amd64.iso
-rw-r--r--. 1 paul paul 1048576 Apr 1 11:02 padding-pseudo-random.tmp
[paul@pandora sans]$ isovfy fake-kali.iso
Root at extent 14, 6144 bytes
[0 0]
No errors found
Now, the litmus test. Can we mount it without error:
[root@pandora sans]# mount -o ro `pwd`/fake-kali.iso /mnt
[root@pandora sans]# cd /mnt/.disk && cat info
Debian GNU/Linux 2.0 (sana) amd64 - netboot mini.iso 20150422+kalinext2
How about booting it as a DVD in VMWare Workstation 12.5?
Booting it:
Copying over to windows and then burning a real CD works as well. Nice!
As an attacker, this is really good news. We can append arbitrary data to an ISO file and it will still function as expected.
In other words, we can take an ISO of our choosing (in the example the mini iso), append random data so it is of the same size as one we would wish to impersonate (in the example the full-size iso), and then there is just that annoying matter of matching hashes.
So, how easy could it be to find a collision? Well no harder than finding a collision in a small file.
The performance improvements rely on two facts:
- You have the original ISO image you wish to impersonate, and
- The fake ISO is smaller than the original.
First, a simplistic review of SHA1. You have a set of registers that are initialized towell-knownn values. The data you wish to hash is padded with a defined sequence of data and, importantly, the encoded size of the original data so that it is an exact multiple of the block size of 512 bits. You compute each block which modifies the registers, which then feeds into the next block. The hash you get are the resulting “registers” from completing the sha1 computation.
What this infers is that providing the size of the data is the same and everything after a specified block is also the same, then if you can modify that block to result in the same register settings at that point you will get the same sha1 hash for the whole file. i.e. you only need to complete the computation for one block for the main part of such an attack (512 bit block and 160 bit hash infers that a good hashing algorithm will have an even distribution and therefore multiple collisions if you could test all 2^512 combinations; though probabilities and other cryptanalysis suggest the number of tests is much less for finding just one collision, which is all we need).
So, as in the demonstration earlier, if you can append arbitrary data without error to the given format, you have the ideal situation to find a collision with a real-life and common use case.
More explicitly, as you have the original ISO (I) you can extract the registers at any part of the computation (before the padding). Therefore you can append to your impersonated ISO (I') random data up to block I'n, say. This would have the registers set to R'n. In your original ISO at In the registers would be Rn.
In the original ISO the processing of block n will change the registers to Rn+1. Therefore, providing any appended data in I' for blocks after block n are the same as in I, if we can set block n in I' to contain data that yields R'n+1 = Rn+1 we have a collision. (Remember they are the same size)
As we can append arbitrary data this constraint on later blocks is trivial to meet.
Other than looking at the image itself, would there be any clues as to the fact that something is not right? I'm sure readers will be able to mention a few ideas. One which could stand out is the size of the mounted partition if the fake is very much smaller:
[paul@pandora sans]$ cp kali-linux-mini-2.0-amd64.iso fake-kali-2.iso
[paul@pandora sans]$ cat kali-linux-mini-2.0-amd64.iso >> fake-kali-2.iso
[paul@pandora sans]$ ls -l kali-linux-mini-2.0-amd64.iso fake-kali-2.iso
-rw-r--r--. 1 paul paul 60817408 Apr 1 13:56 fake-kali-2.iso
-rw-r--r--. 1 paul paul 30408704 Apr 1 10:58 kali-linux-mini-2.0-amd64.iso
[root@pandora sans]# mount -o ro `pwd`/fake-kali-2.iso /mnt && df -k /mnt
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 22662 22662 0 100% /mnt
As you can see here, the mounted filesystem is half the size of the iso image; suggesting something isn't quite right if it isn't multi-session.
Finally, I would love to have finished with fanfare to an end result; a collision. Alas, my pockets are nowhere near as deep nor my crypto-fu as strong as Google.
So on that note, I think the main take for me is-
If the security of a particular algorithm or protocol is shown to be less than ideal then you should consider how all your use cases could potentially be abused. You may find that whilst the headline examples offer some comfort, they may hide some other use cases that are more attractive to an adversary.
Here we have a practical example how we can potentially take advantage of a weak algorithm, and thus further evidence we should move away from sha1 as a hashing function.
Comments