Over the last few days, I've loaded more than 1 billion new records into Have I Been Pwned (HIBP). As I describe in that blog post, this data came from two very large "combo lists": email address and password pairs created by malicious parties to help them break into other accounts where those credentials have been reused. In all, I sent about 440k email notifications and saw hundreds of thousands of people come to HIBP and search for their data. From a personal security awareness perspective, loading the data has been enormously effective.
But there's a question I got over and over again via every conceivable channel:
How can I see the password on my record?
I want to talk generally about why HIBP doesn't make this data available and then I'll come back and talk specifically about last week's combo lists.
Nobody should ever be emailing passwords
Email is not considered a secure communications channel. You have no idea whether your email is encrypted as it travels between mail providers, nor is your mailbox a suitable place to store something as sensitive as a password. Last week's Google Docs phishing attack is a perfect example of that: one wrong click and you've inadvertently given an attacker access to your email, including any passwords stored within it.
Once a common practice, websites emailing you your password is now severely frowned upon. You'd often see this happen if you'd forgotten your password: you go to the "forgot password" page, plug in your email address and get it delivered to your inbox. In fact, this is such a bad practice that there's even a website dedicated to shaming sites that do it.
Now, in that blog post I wrote on the billion records, I did mention that I sought verification from a small number of HIBP subscribers and emailed them their password in the process. I only did so after requesting their permission, and only because it was enormously important to be confident in the accuracy of the data given the number of people impacted. I asked each person who received it to then change that password everywhere it had been used. It was an exceptional circumstance, but I want to call it out here: this is not a practice that should be happening en masse in any automated system.
Of course, rather than emailing passwords, I could store them on the site and have people retrieve them from there over a secure connection. But that raises another problem, and it relates to how the data would need to be stored.
Secure (enough) storage is impossible
The other option people propose is that I should make the password accessible on the site itself: that is, allow people to verify that they control the associated email address, then display the password on a secure page. To do this, I'd need to store the password in a fashion that is inconsistent with industry best practice. Let me explain:
Passwords should be stored as cryptographic hashes: not encrypted, but hashed. A hash is a one-way, deterministic algorithm. In other words, you can pass a piece of text into it and get a non-reversible output, and that output is always the same when you pass the same text into the same algorithm (that's the deterministic bit). When you register on a website that uses a hashing algorithm, the password you provide is hashed and the output of the algorithm is stored in the database. When you come back to log in, you provide your username and password. The password is then hashed with the same algorithm, and that output is compared to the one in the DB.
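The register-then-login flow described above can be sketched roughly as follows. This is a minimal illustration, not how HIBP or any particular site works. It makes several assumptions: PBKDF2-HMAC-SHA256 from Python's standard library stands in for "a strong algorithm", a random per-user salt is used, and an in-memory dict plays the database. The names `users`, `hash_password`, `register` and `login` are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "database": username -> (salt, stored_hash).
# A real site would persist this; a dict keeps the sketch self-contained.
users: dict[str, tuple[bytes, bytes]] = {}

ITERATIONS = 600_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str, salt: bytes) -> bytes:
    """Deterministic: the same password and salt always give the same output."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    # Only the salt and the hash are stored; the plain text password is discarded.
    users[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    # Hash the supplied password with the same salt and algorithm,
    # then compare the two outputs in constant time.
    return hmac.compare_digest(hash_password(password, salt), stored)
```

For example, after `register("alice", "correct horse battery staple")`, calling `login("alice", "correct horse battery staple")` returns `True` and `login("alice", "Passw0rd")` returns `False`. At no point can the stored value be turned back into the password.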
Let's say your password is "Passw0rd" (also, don't use that password!) and the hashing algorithm is MD5. The output of the algorithm is then d41e98d1eafa6d6011d3a70f1a5b92f0. You cannot reverse this process short of cracking the hash, which isn't particularly relevant to the question of whether I could viably store passwords in HIBP. What that means is that if I were to store passwords in HIBP properly, using a strong cryptographic hashing algorithm (not MD5!), I would not be able to reverse them and show people anything useful.
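To make the "deterministic but one-way" point concrete, here is a small sketch using Python's `hashlib`. MD5 appears only because the example above uses it; it is far too fast to be a suitable password hash. The helper name `md5_hex` is hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    # MD5 used purely for illustration; never use it to store passwords.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = md5_hex("Passw0rd")
second = md5_hex("Passw0rd")
changed = md5_hex("Passw0rd!")

# Deterministic: the same input always yields the same 32-character hex digest.
print(first == second, len(first))  # → True 32
# A one-character change gives an unrelated digest. Nothing in the output
# leads back to the input, so there is no "decrypt" step to display it later.
print(first, changed)
```

All a site keeps is that digest. The only way to get from it back to "Passw0rd" is to guess inputs and compare, which is cracking, not retrieval.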
So I'd have to encrypt them, and the problem with encryption is decryption. If HIBP itself got comprehensively pwned (and that is always a possibility) to the extent that the encryption key was also exposed, it's game over. Alternatively, if there's a flaw in the process that retrieves and displays the password such that it becomes visible to an unauthorised person, that's also a very serious issue.
The vast majority of my commercial efforts go into teaching others how to build secure systems. An entire module of the workshop I run around the world is devoted to talking about the secure storage of credentials and encouraging people not to do precisely what I'm being asked to do by those who want their password retrievable. You can see the problems here.
Most breaches already contain hashed passwords
If you look back through the largest verified breaches in HIBP, you see the following incidents and their respective password storage strategies:
- MySpace: 359M accounts with SHA1 hashes
- LinkedIn: 165M accounts with SHA1 hashes
- Adobe: 152M accounts with (badly) encrypted passwords
And as you go through the breaches, you see a raft of them using various hashing techniques. None of these provide anything I could usefully make available via HIBP. Unless you're someone who knows how to take a hash (and often a salt along with it) and compare it to your own plain text password for confirmation, the hash is useless. That leaves 99.x% of the people who use this site with something they can't use. And no, I wouldn't want to build a feature that asks them to enter their real password so it could be hashed and compared, nor send them off to some arbitrary website to do the same!
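For the small group who can do this comparison themselves, checking a leaked salted hash against your own password, locally, looks roughly like the sketch below. It assumes the breach stored a hex-encoded SHA-1 of the salt prepended to the password. Real breaches vary: some append the salt, and others use different encodings or algorithms entirely. The function `matches_leaked_hash` and the sample record are hypothetical.

```python
import hashlib
import hmac


def matches_leaked_hash(candidate: str, salt: str, leaked_hex: str) -> bool:
    # Assumed scheme: sha1(salt + password), hex-encoded. Other breaches may
    # use a different salt position, encoding or algorithm.
    computed = hashlib.sha1((salt + candidate).encode("utf-8")).hexdigest()
    # Normalise case, since dumps are not consistent about hex capitalisation.
    return hmac.compare_digest(computed, leaked_hex.lower())


# Hypothetical breach record, built locally so the example is self-contained.
salt = "x9Qa"
leaked = hashlib.sha1((salt + "Passw0rd").encode("utf-8")).hexdigest()

print(matches_leaked_hash("Passw0rd", salt, leaked))  # → True
print(matches_leaked_hash("passw0rd", salt, leaked))  # → False
```

Even this simple check requires knowing the exact scheme the breached site used, which is precisely the knowledge most people don't have.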
So what about hash cracking? Why don't I just crack as many as I can and make those searchable? From a purely technical perspective, that's always going to result in a sub-100% success rate, especially when stronger hashes are involved (CloudPets, Plex and Ashley Madison all used bcrypt). For the rest, I'd have to explain that I couldn't determine the plain text value, but that this doesn't mean someone else couldn't. Then there's the time and effort involved; this is a non-trivial exercise, especially with stronger algorithms and larger data sets.
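The "sub-100% success rate" point can be illustrated with a toy dictionary attack against unsalted SHA-1 hashes, the storage scheme listed above for MySpace and LinkedIn. The hashes and wordlist below are generated for the demo; `sha1_hex` and `dictionary_attack` are hypothetical helpers. Real cracking uses vastly larger wordlists and mutation rules, but the shape is the same: any password the guesses don't cover stays uncracked.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def dictionary_attack(hashes: list[str], wordlist: list[str]) -> dict[str, str]:
    """Return {hash: plaintext} for every hash that a wordlist guess reproduces."""
    # With no salt, one pass over the wordlist tests every target hash at once.
    lookup = {sha1_hex(word): word for word in wordlist}
    return {h: lookup[h] for h in hashes if h in lookup}


# Hypothetical leaked, unsalted SHA-1 hashes (generated here for the demo).
leaked = [sha1_hex(p) for p in ["123456", "Passw0rd", "q7$Lm2!vR9#tZ"]]
wordlist = ["password", "123456", "qwerty", "Passw0rd", "letmein"]

cracked = dictionary_attack(leaked, wordlist)
print(f"cracked {len(cracked)} of {len(leaked)}")  # → cracked 2 of 3
```

A slow, salted algorithm such as bcrypt defeats the single shared lookup table used here, because each guess has to be recomputed per hash. It also makes every individual guess far more expensive, which is why the success rate drops further as algorithms get stronger.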
And finally, I'll talk more about ethics in a moment but taking masses of credentials hacked out of another system and then attempting to brute force them so that I can put plain text versions of them up online doesn't sit well with me at all. The now defunct LeakedSource received a cease and desist order from LinkedIn last year for doing just that. Their ultimate demise was also likely hastened by holding data of this nature.
So in short, many of the passwords found in many of the breaches would be useless anyway, they'd create confusion, they'd put a lot of burden back on me, and it'd be especially legally grey to attempt to do anything useful with them. I appreciate that's not the case with the combo lists that prompted this post, but I wanted to ensure this barrier is still captured here.
No, it's not ok that these passwords are already out there
Over the years, many people have said "well, the data is public anyway by virtue of it having been breached, what's the problem if you now store the password in your system?" Here's the philosophical problem I have with that:
Someone, somewhere has screwed up to the extent that data got hacked and is now in the hands of people it was never intended for. No way, no how does this give me license to then treat that data with any less respect than if it had remained securely stored, and I reject outright any assertion to the contrary. That's a fundamental value I operate under, but there are other, more practical reasons as well.
For example, just because data has been breached doesn't make it readily accessible to all. Now this differs wildly case by case: the Ashley Madison hacker went to great lengths to torrent the data extensively and spread it as far as possible. The VTech hacker only gave it to one reporter who only gave it to me; nobody else was going to find that data floating around.
The bottom line is that no matter how irresponsible other people have been with the data, that doesn't give me license to take shortcuts on the privacy of those who own it.
Because it's important to say "I don't store passwords in HIBP"
In the past, I've had people approach me with all sorts of creative means by which I could store this data and make it available to people. But no matter how good a crypto solution I come up with, being able to hand-on-heart say "I don't store passwords in HIBP" is enormously important. Not "I store them but I've been really, really, really careful with them" because that always leaves an element of doubt in people's minds.
There's also the legally grey area of running this whole thing: this is billions of data records that malicious parties have illegally obtained by committing crimes that could put them in jail for considerable time. If you've been following what I've been doing with the project over the last few years, you'll have seen that I'm enormously cautious to operate the service as responsibly as I can whilst still making it readily accessible. Storing passwords as well is another step into the darker end of grey that puts me at further risk.
And that's something I'm also especially conscious of; this is just me - one guy - running the whole thing. I'm not a large entity with an army of lawyers that can fend off organisations or individuals that get pissed that their data has turned up on HIBP. I'm also obviously not attempting to hide my identity or operate under a veil of secrecy so that I can just disappear at the drop of a hat; I run this openly and transparently and I need to do everything I feasibly can to minimise the negative attention this service could attract.
I'm not your personal lookup service
And finally, for everyone who contacts me privately and says "but could you just look up my own password", please understand that you're one of many people who ask this. I try and reply to everyone who asks and politely refer them to my previous writing on the subject, but even then, all the time I spend replying to these requests is time I can't spend building out the service, adding more data, earning a living doing other things or spending time with my family.
For the last 3 and a half years that I've run HIBP, I've kept all the same features free and highly available as a community service. I want to keep it that way but I have to carefully manage my time in order to do that so in addition to all the reasons already stated above, no, I'm not your personal lookup service.
Regarding the combo lists
Someone left me a comment yesterday which said that by not sending people their passwords I was effectively saying:
"Hey, folks — just wanted to let y'all know that one of the 500 locks on one of your 500 doors is broken. Not gonna tell you which one though. Hope that helps!"
This is because, as it relates to the billion-plus records in those two combo lists, we simply don't know where the data came from. People appearing in them can't be sure which account actually got pwned, or where they should now change that password. But analogies with the real world are frequently gross misrepresentations, and this is a perfect example, so let me rephrase it appropriately:
This isn't asking about which of the person's 500 doors was left unlocked, rather it's asking me to put the actual keys for over a billion doors up into a publicly accessible location with nothing other than my own personal best efforts to keep them safe. And it's not one person reaching out asking me which door either, it's literally been hundreds over the last few days.
In the case of these combo lists, the guidance I gave in the opening paragraph of that blog post still holds true: go get yourself a password manager and create strong, unique passwords. If you had this already then the extent of your risk is very limited and likely constrained to old incidents (at least based on the feedback in the comments). If you didn't and this incident is the impetus that causes you to start practising secure password management then that's precisely the outcome I'd hoped for. And many people are doing just this:
I've been slack and reused passwords... just spent an hour fixing. Thanks @troyhunt / @haveibeenpwned for the kick in the ass I needed :-)
— Craig Edwards (@edwardaux) May 7, 2017
I totally understand the desire to know as much information as possible and I hope this blog post explains the challenges involved. I may never work out where much of this data came from, but the one thing I know for sure is that if it results in more behavioural changes like in the tweet above, then that's a very good result.