# substr md5 collision

I need a 4-character hash. At the moment I am taking the first 4 characters of a `md5()` hash. I am hashing a string which is 80 characters long or less. Will this lead to collision? or, what is the chance of collision, assuming I'll hash less than 65,536 (164) different elements?

Well, each character of `md5` is a hex bit. That means it can have one of 16 possible values. So if you're only using the first 4 "hex-bits", that means you can have `16 * 16 * 16 * 16` or `16^4` or 65536 or `2^16` possibilities.

So, that means that the total available "space" for results is only 16 bits wide. Now, according to the Birthday Attack/Problem, there are the following chances for collision:

• `50%` chance -> `300` entries
• `1%` chance -> `36` entries
• `0.0000001%` chance -> `2` entries.

So there is quite a high chance for collisions.

Now, you say you need a 4 character hash. Depending on the exact requirements, you can do:

• 4 hex-bits for `16^4` (65,536) possible values
• 4 alpha bits for `26^4` (456,976) possible values
• 4 alpha numeric bits for `36^4` (1,679,616) possible values
• 4 ascii printable bits for about `93^4` (74,805,201) possible values (assuming ASCII 33 -> 126)
• 4 full bytes for `256^4` (4,294,967,296) possible values.

Now, which you choose will depend on the actual use case. Does the hash need to be transmitted to a browser? How are you storing it, etc.

I'll give an example of each (In PHP, but should be easy to translate / see what's going on):

4 Hex-Bits:

```\$hash = substr(md5(\$data), 0, 4); ```

4 Alpha bits:

```\$hash = substr(base_convert(md5(\$data), 16, 26)0, 4); \$hash = str_replace(range(0, 9), range('S', 'Z'), \$hash); ```

4 Alpha Numeric bits:

```\$hash = substr(base_convert(md5(\$data), 16, 36), 0, 4); ```

4 Printable Assci Bits:

```\$hash = hash('md5', \$data, true); // We want the raw bytes \$out = ''; for (\$i = 0; \$i < 4; \$i++) { \$out .= chr((ord(\$hash[\$i]) % 93) + 33); } ```

4 full bytes:

```\$hash = substr(hash('md5', \$data, true), 0, 4); // We want the raw bytes ```

Surprisingly high indeed. As you can see from this graph of an approximate collision probability (formula from the wikipedia page), with just a few hundred elements your probability of having a collision is over 50%.

Note, of course, if you're facing the possibility of an attacker providing the string, you can probably assume that it's 100% - scanning to find a collision in a 16-bit search space can be done almost instantaneously on any modern PC. Or even any modern cell phone, for that matter.

4 first characters contains 4*4 = 16 bits of data, so collision will be definitely at 65536 elements, and, due to birthday attack, it will be found much faster. You should use more bits of hash.

Category:hash Views:1 Time:2011-01-13

## Related post

• Create your own MD5 collisions 2009-06-01

I'm doing a presentation on MD5 collisions and I'd like to give people any idea how likely a collision is. It would be good to have two blocks of text which hash to the same thing, and explain how many combinations of [a-zA-Z ] were needed before I h

• md5 collision how is it possible? 2011-01-13

i don't understand how one can create a rough Certificate just by making a MD5 collision. Even if you were able to find another string whose hash matches the original how would you sign it ? You do not have access to the Certificate authority's priva

• why MD5 collision failed? 2009-11-27

The two data blocks are from this site,but failed to produce the collision : var_dump(md5('d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c

• What's the shortest pair of strings that causes an MD5 collision? 2010-01-04

Up to what string length is it possible to use MD5 as a hash without having to worry about the possibility of a collision? This would presumably be calculated by generating an MD5 hash for every possible string in a particular character set, in incre

• HTTP Digest Authentication MD5 Collision 2011-07-03

MD5 hashes are now considered broken, because collision might happen. Is this problematic for HTTP digest authentication? --------------Solutions------------- MD5 is known to be vulnerable to collision attacks. HTTP Digest does not require collision

• Will using a substring of an MD5 hash like this be unique enough? 2010-02-14

What I am trying to do is create a 12 character id for articles on my website similar to how youtube handles their video id (http://www.youtube.com/watch?v=53iddd5IcSU). Right now I am generating an MD5 hash and then grabbing 12 characters of it like

• How many random elements before MD5 produces collisions? 2008-10-14

I've got an image library on Amazon S3. For each image, I md5 the source URL on my server plus a timestamp to get a unique filename. Since S3 can't have subdirectories, I need to store all of these images in a single flat folder. Do I need to worry a

• md5 hash collisions. 2011-07-30

If counting from 1 to X, where X is the first number to have an md5 collision with a previous number, what number is X? I want to know if I'm using md5 for serial numbers, how many units I can expect to be able to enumerate before I get a collision.

• Represent MD5 hash as an integer 2009-09-14

In my user database table, I take the MD5 hash of the email address of a user as the id. Example: email([email protected]) = id(d41d8cd98f00b204e9800998ecf8427e) Unfortunately, I have to represent the ids as integer values now - in order to be able

• shorter php cipher than md5? 2010-09-21

For a variety of stupid reasons, the maximum length of a given form variable that we are posting to an external server is 12 characters. I wanted to obscure that value with md5, but obviously with 12 characters that isn't going to work. Is there a ci

• Update old stored md5 passwords in PHP to increase security 2012-01-12

At the moment I have a database with md5 passwords stored, a few years back this was considered a little more secure than it is now and it's got to the point where the passwords need to be more secure. I've read a lot of posts on here about crypt, md

• Hash and salt collision 2009-03-09

I remember a guy telling me that if I let him change 4 bytes he can make a file have any checksum he wants (CRC-32). I heard mention of salting a hash. I am wondering if someone had his file match my file would salting the MD5 or SHA-1 hash change th

I have never checked and compared the MD5 hash to the real MD5 hash at programs homepages. Programs which I have downloaded have always worked. Is it possible that someone can put their own code during downloading? --------------Solutions------------

• Does any published research indicate that preimage attacks on MD5 are imminent? 2009-05-04

I keep on reading on SO that MD5 is broken, bust, obsolete and never to be used. That angers me. The fact is that collision attacks on MD5 are now fairly easy. Some people have collision attacks down to an art and can even us use them to predict elec

• How do I assess the hash collision probability? 2009-05-14

I'm developing a back-end application for a search system. The search system copies files to a temporary directory and gives them random names. Then it passes the temporary files' names to my application. My application must process each file within

• Importing MD5+Salt Passwords to MD5 2009-07-10

I'm moving my site from an oscommerce store to a commercial application. The new application stores its passwords using straight MD5 encryption. Oscommerce stores the password using MD5, but also adds a random 2 digit number (provided in plaintext) t

• Can I prevent duplicate content using md5? 2009-07-13

I would like to prevent duplicate content. I do not want to keep a copies of content, so I decided to keep just the md5 signatures. I read that md5 collisions do happen, different content could give in the same md5 signature. Do you think md5 is enou

• Examples of Hash-Collisions? 2009-08-03

For demonstration-purposes, what are a couple examples of strings that collide when hashed? MD5 is a relatively standard hashing-option, so this will be sufficient. --------------Solutions------------- This page provides these examples of 128 byte va

• Can two different strings generate the same MD5 hash code? 2009-11-18

For each of our binary assets we generate a MD5 hash. This is used to check whether a certain binary asset is already in our application. But is it possible that two different binary assets generate the same MD5 hast. So is it possible that two diffe