An analysis of the Yahoo! passwords
Last month the biggest security news in the mainstream press was about the password (hash) "breaches" at LinkedIn, eHarmony, and last.fm. Last week, it was a bunch of passwords that were leaked via a Yahoo! service. These passwords were for a particular Yahoo! service, but the e-mail addresses being used were for quite a few domains. There has been some discussion of whether, for example, the passwords for Google accounts were also exposed. The short answer is, if the user committed one of the cardinal sins of passwords and reused the same one for multiple accounts, then, yes, some Google (or other) passwords may also have been exposed. Having said all of that, that isn't primarily what I wanted to look at today. I also don't plan to spend too much time on the password policy (or lack thereof) or the fact that the passwords were apparently stored in the clear, both of which most security folks would probably agree are bad ideas.
The domains
First, I did a quick analysis of the domains. I should note that some of the e-mail addresses were clearly invalid (misspelled domains, etc.). There were a total of 35008 domains represented. The top 20 domains (after converting all to lower case) are shown in the table below.
137559 yahoo.com
106873 gmail.com
55148 hotmail.com
25521 aol.com
8536 comcast.net
6395 msn.com
5193 sbcglobal.net
4313 live.com
3029 verizon.net
2847 bellsouth.net
2260 cox.net
2133 yahoo.co.in
2077 ymail.com
2028 hotmail.co.uk
1943 earthlink.net
1828 yahoo.co.uk
1611 aim.com
1436 charter.net
1372 att.net
1146 mac.com
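For readers who want to try a similar tally on their own data, here is a minimal Python sketch of a case-insensitive domain count. It is not the method used to build the table above; `domain_counts` is a hypothetical helper and the sample addresses are made up.

```python
from collections import Counter

def domain_counts(addresses):
    """Count e-mail domains case-insensitively, skipping entries without a domain."""
    counts = Counter()
    for address in addresses:
        # rpartition splits on the last '@'; sep is empty when there is none.
        _, sep, domain = address.strip().rpartition("@")
        if sep and domain:
            counts[domain.lower()] += 1
    return counts

# Made-up sample addresses, not taken from the leak.
sample = ["alice@Yahoo.com", "bob@yahoo.com", "carol@GMAIL.com", "not-an-address"]
counts = domain_counts(sample)
for domain, n in counts.most_common(20):
    print(n, domain)
```

Note that this only drops entries with no domain at all; misspelled domains would still be counted as separate domains.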
The passwords
I saw an interesting analysis of the eHarmony passwords by Mike Kelly at the Trustwave SpiderLabs blog and thought I'd do a similar analysis of the Yahoo! passwords (and I didn't even need to crack them myself, since the Yahoo! ones were posted in the clear). I pulled out my trusty install of pipal and went to work. As an aside, pipal is an interesting tool for those of you that haven't tried it. As I was preparing this diary, I noted that Mike says the Trustwave folks used PTJ, so I may have to take a look at that one, too.
The first thing to note is that of the 442,836 passwords, there were 342,508 unique passwords, so over 100,000 of them were duplicates.
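The duplicate figure is simply the total minus the number of distinct values. A small Python sketch of that calculation follows; `duplicate_summary` is a hypothetical helper and the sample list is made up.

```python
from collections import Counter

def duplicate_summary(passwords):
    """Return (total, unique, duplicates), where duplicates = total - unique."""
    counts = Counter(passwords)
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique

# Made-up sample, not taken from the leak.
print(duplicate_summary(["123456", "password", "123456", "ninja"]))

# The same subtraction applied to the two figures quoted above.
print(442_836 - 342_508)
```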
Looking at the top 10 passwords and the top 10 base words, we note that some of the worst possible passwords are right there at the top of the list. 123456 and password are always among the first passwords that the bad guys guess because for some reason we haven't trained our users well enough to get them to stop using them. It is interesting to note that the base words in the eHarmony list seemed to be somewhat related to the purpose of the site (e.g., love, sex, luv, ...). I'm not sure what the significance of ninja, sunshine, or princess is in the list below.
Top 10 passwords
123456 = 1667 (0.38%)
password = 780 (0.18%)
welcome = 437 (0.1%)
ninja = 333 (0.08%)
abc123 = 250 (0.06%)
123456789 = 222 (0.05%)
12345678 = 208 (0.05%)
sunshine = 205 (0.05%)
princess = 202 (0.05%)
qwerty = 172 (0.04%)
Top 10 base words
password = 1374 (0.31%)
welcome = 535 (0.12%)
qwerty = 464 (0.1%)
monkey = 430 (0.1%)
jesus = 429 (0.1%)
love = 421 (0.1%)
money = 407 (0.09%)
freedom = 385 (0.09%)
ninja = 380 (0.09%)
sunshine = 367 (0.08%)
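The percentages in the two lists above are each count divided by the total number of passwords (for example, 1667 / 442836 is about 0.38%). The Python sketch below shows one way to produce such lists. The base-word rule used here (lowercase, strip non-letters from both ends, keep words longer than 3 characters) is an assumption about how a tool like pipal derives base words, not a statement of its actual implementation; `base_word` and `top_n` are hypothetical helpers and the sample is made up.

```python
import re
from collections import Counter

def base_word(password):
    """Lowercase and strip non-letters from both ends (assumed definition)."""
    return re.sub(r"^[^a-z]+|[^a-z]+$", "", password.lower())

def top_n(items, total, n=10):
    """Return (item, count, percent of total) for the n most common items."""
    return [(item, c, round(100.0 * c / total, 2))
            for item, c in Counter(items).most_common(n)]

# Made-up sample, not taken from the leak.
sample = ["password", "Password1", "123456", "password!", "ninja99", "123456"]
print(top_n(sample, len(sample), 3))

# Keep only base words longer than 3 characters (assumed threshold).
bases = [b for b in map(base_word, sample) if len(b) > 3]
print(top_n(bases, len(sample), 3))
```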
Next, I looked at the lengths of the passwords. They ranged from 1 (117 users) to 30 (2 users). Who thought allowing 1 character passwords was a good idea?
Password length (count ordered)
8 = 119135 (26.9%)
6 = 79629 (17.98%)
9 = 65964 (14.9%)
7 = 65611 (14.82%)
10 = 54760 (12.37%)
12 = 21730 (4.91%)
11 = 21220 (4.79%)
5 = 5325 (1.2%)
4 = 2749 (0.62%)
13 = 2658 (0.6%)
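A count-ordered length table like the one above can be sketched in a few lines of Python. `length_histogram` is a hypothetical helper and the sample is made up; the output format only imitates the table.

```python
from collections import Counter

def length_histogram(passwords):
    """Return (length, count, percent) tuples, most common length first."""
    total = len(passwords)
    if total == 0:
        return []
    counts = Counter(len(p) for p in passwords)
    return [(length, c, round(100.0 * c / total, 2))
            for length, c in counts.most_common()]

# Made-up sample, not taken from the leak.
sample = ["123456", "password", "sunshine", "a", "qwerty", "princess"]
for length, c, pct in length_histogram(sample):
    print(f"{length} = {c} ({pct}%)")
```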
We security folks have long preached (and rightly so) the virtues of a "complex" password. By increasing the size of the alphabet and the length of the password, we increase the work the bad guys must do to guess or crack the passwords. We've gotten in the habit of telling users that a "good" password consists of [lower case, upper case, digits, special characters] (choose 3). Unfortunately, if that is all the guidance we give, users, being human and by nature somewhat lazy, will apply those rules in the easiest way.
First capital last symbol = 1259 (0.28%)
First capital last number = 17467 (3.94%)
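These two "easiest way" patterns can be approximated with simple regular expressions. The patterns in this Python sketch are assumptions and may not match the exact definitions pipal uses; `lazy_pattern` is a hypothetical helper.

```python
import re

# Assumed patterns; pipal's own definitions may differ.
# Anything that is not an ASCII letter or digit counts as a "symbol" here.
FIRST_CAP_LAST_NUMBER = re.compile(r"[A-Z].*[0-9]")
FIRST_CAP_LAST_SYMBOL = re.compile(r"[A-Z].*[^A-Za-z0-9]")

def lazy_pattern(password):
    """Name the pattern the whole password matches, or None if neither applies."""
    if FIRST_CAP_LAST_NUMBER.fullmatch(password):
        return "first capital last number"
    if FIRST_CAP_LAST_SYMBOL.fullmatch(password):
        return "first capital last symbol"
    return None

for candidate in ["Password1", "Welcome!", "password1", "P4ssw0rd"]:
    print(candidate, "->", lazy_pattern(candidate))
```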
On the other hand, if we don't enforce at least that much, users won't bother.
Only lowercase alpha = 146516 (33.09%)
Only uppercase alpha = 1778 (0.4%)
Only alpha = 148294 (33.49%)
Only numeric = 26081 (5.89%)
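Note that the "Only alpha" figure above equals the sum of the lowercase-only and uppercase-only figures (146516 + 1778 = 148294). The Python sketch below classifies a password by character set; it adds a separate mixed-case bucket, which is an assumption of this example and not a category in the output above. `char_class` is a hypothetical helper.

```python
import re

# Checked in order; the first pattern that matches the whole password wins.
CLASSES = [
    ("only lowercase alpha", re.compile(r"[a-z]+")),
    ("only uppercase alpha", re.compile(r"[A-Z]+")),
    ("only mixed-case alpha", re.compile(r"[A-Za-z]+")),  # assumed extra bucket
    ("only numeric", re.compile(r"[0-9]+")),
]

def char_class(password):
    """Return the name of the first character class the password fits."""
    for name, pattern in CLASSES:
        if pattern.fullmatch(password):
            return name
    return "mixed character sets"

for candidate in ["password", "NINJA", "Ninja", "123456", "abc123"]:
    print(candidate, "->", char_class(candidate))
```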
I thought it was also interesting looking at the passwords that contained a year:
Years (Top 10)
2008 = 1145 (0.26%)
2009 = 1052 (0.24%)
2007 = 765 (0.17%)
2000 = 617 (0.14%)
2006 = 572 (0.13%)
2005 = 496 (0.11%)
2004 = 424 (0.1%)
1987 = 413 (0.09%)
2001 = 404 (0.09%)
2002 = 404 (0.09%)
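A rough way to count years is to look for any four-digit run starting with 19 or 20. The Python sketch below does that; the year range is an assumption (pipal may use a different one), `year_counts` is a hypothetical helper, and a plain substring match like this will also flag digit runs that merely look like years.

```python
import re
from collections import Counter

# Assumed range: any 19xx or 20xx substring.
YEAR = re.compile(r"(?:19|20)[0-9]{2}")

def year_counts(passwords):
    """Count passwords per year, counting each year at most once per password."""
    counts = Counter()
    for password in passwords:
        counts.update(set(YEAR.findall(password)))
    return counts

# Made-up sample, not taken from the leak.
sample = ["summer2008", "2008love", "born1987", "ninja", "x2009y2009"]
for year, n in year_counts(sample).most_common(10):
    print(year, "=", n)
```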
What is the significance of 1987 and why nothing more recent than 2009? When I analyzed some other passwords, I'd see either the current year, or the year the account was created, or the year the user was born. And finally, some statistics inspired by the Trustwave analysis:
Months (abbr.) = 10585 (2.39%)
Days of the week (abbr.) = 6769 (1.53%)
Containing any of the top 100 boys names of 2011 = 18504 (4.18%)
Containing any of the top 100 girls names of 2011 = 10899 (2.46%)
Containing any of the top 100 dog names of 2011 = 17941 (4.05%)
Containing any of the top 25 worst passwords of 2011 = 11124 (2.51%)
Containing any NFL team names = 1066 (0.24%)
Containing any NHL team names = 863 (0.19%)
Containing any MLB team names = 1285 (0.29%)
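Statistics of the "containing any of ..." kind come down to a case-insensitive substring check against a word list. A minimal Python sketch follows; `count_containing` is a hypothetical helper, and the word list and sample passwords are made up for illustration and are not the lists used above.

```python
def count_containing(passwords, words):
    """Count passwords that contain at least one of the words, ignoring case."""
    lowered = [w.lower() for w in words]
    return sum(1 for p in passwords if any(w in p.lower() for w in lowered))

# Made-up word list and sample, not taken from the leak.
team_words = ["cowboys", "yankees", "bruins"]
sample = ["dallascowboys1", "Yankees27", "ninja", "gobruins"]
hits = count_containing(sample, team_words)
print(f"Containing any team name = {hits} ({round(100.0 * hits / len(sample), 2)}%)")
```

Short words in such a list will match inside unrelated longer words, so the counts are an upper bound on deliberate use.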
I wish I had their list of curse words to test. :)
Conclusions?
So, what conclusions can we draw from all of this? Well, the obvious one is that without any direction, most users will not choose particularly strong passwords, and the bad guys know this. What constitutes a good password? What constitutes a good password policy? Personally, I think the longer, the better, and I actually recommend [lower case, upper case, digit, special character] (choose at least one of each). Hopefully none of these users were using the same password here as on their banking sites. What do you, our faithful readers, think?
---------------
Jim Clausing, GIAC GSE #26
jclausing --at-- isc [dot] sans (dot) edu
The opinions expressed here are strictly those of the author and do not represent those of SANS, the Internet Storm Center, the author's spouse, kids, or pets.
Comments
Anonymous
Jul 17th 2012
1 decade ago
As far as real users, 2 factor id is the easiest way to go that is easy to implement and relatively easy to use. Some kind of physical token, such as a usb stick, cd, or other memory device could easily hold a certificate issued by the service to id the user; it doesn't have to be a 1-time pad from you know who. ;-) ( But that works pretty well also! )
Moriah
Jul 17th 2012
1 decade ago
The problem and solutions for my world never came down to complexity but education and implementation. We, as IT professionals, are schooled (or we would like to think we all are) in these tenets. But how do you educate those individuals who are not? It is never as easy as it seems. But it can be...
I have used a two finger complex password solution for about 7 years now that can be taught easily but once you sell it to the masses, you essentially create the template for "others" to work diligently at breaking.
This is my example and since I know it will be eons until the masses even comprehend the significance or importance, I do not worry. This password I have used for many years. I no longer do so please feel free to use it against any algorithm to test.
Il0v#h#@th#r
I know exactly what it is but to some systems (MS) this password has become NOT complex enough. However I can do a two finger, two keyboard key (not counting shift), 16 character password that passes every time. Not only does it work but all I need to do to come up with another password when the time period expires is to move up the keyboard one letter. So here it is.
16 character complex, two finger salute to security
qq11QQ!!qq11QQ!! and to reset ww22WW@@ww22WW@@ and so on and so on.
Enjoy the unendingly complex world of "BUT I didn't know I was supposed to do that." IT Security.
Stryker
Jul 17th 2012
1 decade ago
dsh
Jul 17th 2012
1 decade ago
Thaiboxermike
Jul 17th 2012
1 decade ago
1) they are just too fallible and rely on people to remember them.
2) They are easily compromised.
3) once you reach a critical mass of them (I stand now at about 150 user account and password pairs) repeating them is very hard NOT to do.
The only solution I have found to date that is even worth a damn is the lastpass solution. You can use two factor authentication and extremely long and complex passwords, and the best part is that you have to remember 1 password and you're done. I do not work for them but have studied them in some detail and listened to the Steve Gibson podcast on them. If I have to use a password then I use lastpass to help me come up with a unique, non-repeating gibberish password that I do not have to remember.
Eric
Jul 17th 2012
1 decade ago
1. A guessable password that can be found before an account locks out.
2. A poor application that does not protect stored credentials.
#1 means you don't need a complex hard-to-remember password. You just need one that is not easily guessable in a few tries.
#2 means that no matter what your password looks like, it's completely irrelevant if the application does not protect it.
If the application lets people abuse "I forgot my user name" or "I forgot my password" functionality, it doesn't matter what you choose. If the application does not securely store the credentials, it doesn't matter what you choose. If the application does not include brute-force protection, it doesn't matter what you choose because eventually it will be figured out. If you get a keylogger on your computer or you get man-in-the-middled, it doesn't matter what you choose.
"Lousy" passwords are almost completely irrelevant when considered next to HOW passwords get lost: a bad application. The fact is that almost all of the "lousy" passwords provided the needed security until the application coughed them up.
JJ
Jul 17th 2012
1 decade ago
whitetaco
Jul 17th 2012
1 decade ago
whitetaco, certificates just introduce a whole new set of issues. Look at the issues with rogue CAs or breached CAs over the years, or the recent certificate revocation issues in browsers and with app-signing certs.
Jim
Jul 17th 2012
1 decade ago