One of the most alarming trends I've seen in the world of data breaches since starting Have I Been Pwned (HIBP) back in 2013 is the rapid rise of credential stuffing attacks. Per the definition in that link, it simply means this:
Credential stuffing is the automated injection of breached username/password pairs in order to fraudulently gain access to user accounts.
This form of attack relies on a combination of people reusing the same password across services and then the services themselves allowing automated attacks like this to happen. The first part of that is a simple fix we all have control of as individuals but is extremely hard to address as service operators: people need to stop reusing passwords. Go and get a password manager (I use 1Password), generate random strings for passwords, job done. (Of course, use two-factor authentication everywhere you can too.)
The second part of the problem - services allowing this to happen - is much more nuanced because what we're saying here is "someone comes to your website with the correct username and password but they're not the legitimate owner of the credentials therefore you should keep them out". This is a hard problem and I'm enormously sympathetic to organisations on the receiving end of these highly automated attacks; there's a lot of support burden that falls back to them after someone has their account taken over due to an attack of this nature and they can be held liable for it. Earlier this year, the FTC in the USA brought a case against an organisation that was the target of a credential stuffing attack and they had this to say:
The FTC's message is loud and clear: If customer data was put at risk by credential stuffing, then being the innocent corporate victim is no defence to an enforcement case.
This is the primary reason I created the Pwned Passwords service last year so that website operators could block people from using passwords that have previously appeared in breaches. That service now receives over 9 million requests a day with many more querying the downloadable data set. It's a simple yet effective tool.
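As a rough illustration of how a website operator might block breached passwords at registration using the downloadable data set, here is a minimal sketch. It assumes the download is a text file of upper-case SHA-1 hashes, each followed by a colon and a count; the function names and the tiny in-memory sample are hypothetical and stand in for the real file.

```python
import hashlib
import io


def sha1_hex(password):
    """Return the upper-case SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_pwned_hashes(lines):
    """Build a set of SHA-1 digests from 'HASH:COUNT' lines (assumed format)."""
    hashes = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, _count = line.partition(":")
        hashes.add(digest.upper())
    return hashes


def is_pwned(password, pwned_hashes):
    """True if the password's SHA-1 digest appears in the loaded set."""
    return sha1_hex(password) in pwned_hashes


# Hypothetical stand-in for the real download: two entries with made-up counts.
sample_file = io.StringIO(
    f"{sha1_hex('password')}:3\n"
    f"{sha1_hex('qwerty')}:2\n"
)
pwned = load_pwned_hashes(sample_file)

print(is_pwned("password", pwned))                      # True
print(is_pwned("correct horse battery staple", pwned))  # False
```

The real data set holds hundreds of millions of hashes, so an in-memory set like this is only for illustration; a database index or a similar lookup structure would be a more realistic choice.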
But onto the topic at hand:
I've just loaded 111 million email addresses found in a credential stuffing list called "Pemiblanc" into HIBP.
I had multiple different supporters of HIBP direct me to this collection of data which resided on a web server in France and looked like this:
That site has now been taken down and the data is no longer accessible, but per the image above you can see the file dates place it around early April. The "USA" folder above contained a loosely organised set of files filled with email address and password pairs:
That one file alone had millions of records in it and due to the nature of password reuse, at least hundreds of thousands of those will unlock all sorts of other accounts belonging to the email addresses involved.
The data was predominantly located in the "USA" folder although it's difficult to know just how much of it actually belongs to American owners. The domains on the email addresses in the image above tell us nothing about where the owners are based; the reality is that this data will be from all over the world as it's likely cobbled together from multiple different data breaches. There were other folders in the data set, for example one named "test_split_40" with the first 40 rows containing email addresses all beginning with "bushsucks" followed by various things which, allegedly, Bush sucks.
There are other (much larger) credential stuffing lists already in HIBP, for example the Exploit.in and Antipublic lists I wrote about last year which contained more than a billion records between them. As such, I'm always cautious that I'm not just loading in the same data re-branded as something else. The Pemiblanc list contained 6.8 million email addresses that I've never seen in HIBP before. Of the ones that already were in the system, many were in those aforementioned lists from last year but a substantial number weren't; they were from other data breaches. There were also 50 million passwords that weren't already in the Pwned Passwords list which, given I had over 500 million in there already, is a substantial number (and yes, I do plan to release a V3 of this shortly including these new ones). So in short, there was sufficient new material in this list to justify loading the data.
Edit: I've just released V3 of Pwned Passwords and noted in there that the actual number of unique Pemiblanc passwords was 3.3M. The 50M number was calculated in error due to the presence of control characters (tabs and line returns) that appeared during the data import.
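To show how stray control characters can inflate a unique count like this, here is a small sketch. It assumes the tabs and line returns were left at the end of each value by the import, which the post does not specify; the sample values and the `normalise` helper are made up for illustration.

```python
def normalise(raw):
    """Strip trailing tab, carriage return and newline characters.

    Spaces are left alone because a password may legitimately contain them.
    """
    return raw.rstrip("\t\r\n")


# Hypothetical import output: the same two passwords with assorted trailing debris.
raw_passwords = ["hunter2", "hunter2\r", "hunter2\t", "letmein\n", "letmein"]

naive_unique = len(set(raw_passwords))
clean_unique = len({normalise(p) for p in raw_passwords})

print(naive_unique)  # 5
print(clean_unique)  # 2
```

Each variant with a trailing control character counts as a distinct string, so a naive comparison against an existing list would report it as new.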
This blog post will be referenced when I make the data live in a moment and inevitably the same 2 questions will come up from people who find themselves pwned:
Which site leaked my account information and what can I do about it?
The answer to the first question is simply "I have no idea". There's nothing in the data to indicate sources short of me trying to imply it from the email address or password and even then, the reality is that these lists are constructed from many different data breaches - there will be no single source. But I do have the answer to the second question:
Go and get a password manager and make all your passwords strong and unique.
The entire value proposition of credential stuffing lists goes away when people do this and the impact of a data breach is constrained to that single site rather than putting all your accounts at risk. I first wrote about password managers 7 years ago when I concluded that the only secure password is the one you can't remember and that advice is more important today than ever before.
Lists like this serve as a reminder of how our data is abused and why good password hygiene is so important. There are always a small number of people who are upset after a list such as this is loaded into HIBP because they don't have information about what the password is (I never store this against an account in HIBP) nor the site it originally came from. But for the vast majority of people, it has awareness value and, hopefully, it's the push they need to go and get that password manager. And just because I know people will ask, here are all the reasons I don't make passwords available via HIBP.
The entire 111 million records are now searchable in HIBP.
Edit (10 Jul): I'm working to fast-track V3 of Pwned Passwords which includes this data so that everyone has a way of checking their specific passwords against the service. You'll be able to check one-by-one using the existing web interface, in bulk if you want to script it against the API, from directly within 1Password 7 on the desktop against all stored passwords or via any other service integrating with the API. It will take a day or 2, but I'm on it.
Edit (13 Jul): All passwords from this incident are now searchable in Pwned Passwords. You can check them one by one on the website, script it out using the API or if you're a 1Password user, check them all in the Watchtower feature in V7 on the desktop.
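For anyone wanting to script it, the following is a minimal sketch of one way to query the Pwned Passwords range endpoint, which takes the first 5 characters of a password's SHA-1 hash and returns matching hash suffixes with counts. The function names, the User-Agent string and the injectable `fetch` parameter are assumptions made for this example, not part of the service.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def sha1_prefix_suffix(password):
    """Split the upper-case SHA-1 hex digest into a 5-char prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, body):
    """Find the suffix in a 'SUFFIX:COUNT' response body; return 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Fetch all hash suffixes for a prefix; only the prefix leaves the machine."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password, fetch=fetch_range):
    """Return how many times the password appears, per the range response."""
    prefix, suffix = sha1_prefix_suffix(password)
    return count_in_range(suffix, fetch(prefix))


if __name__ == "__main__":
    # Requires network access; loop over several passwords to check in bulk.
    for candidate in ["password", "correct horse battery staple"]:
        print(candidate, pwned_count(candidate))
```

Passing a different `fetch` function makes it possible to exercise the parsing logic without touching the network.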