Troy Hunt: Here’s how I’m going to handle the Ashley Madison data

This morning I was reading a piece on the Ashley Madison hack which helped cement a few things in my mind. The first thing is that if this data ends up being made public (and it’s still an “if”) then it will rapidly be shared far and wide. Of course this happens with many major data breaches, but the emergence already of domains like WasHeOnAshleyMadison.com signals a clear intent to make it easily accessible as well.

The second thing was the assumption that leaked data could be removed. Of course it can be in some jurisdictions, but this would be no more than sticking the proverbial finger in the dyke. If released with the intent of it being distributed en masse, it will be, and all the DMCA takedowns in the world won’t change that. Billion-dollar picture houses can’t solve this problem. Underground drug markets run successfully for years. Data will be shared, just ask Hacking Team (that was almost certainly a much larger breach in terms of data volume too).

But there’s no escaping the human impact of it. The discovery of one’s spouse in the data could have serious consequences. The stress inflicted on individuals that they may now be “found out” could be significant. Yes, there’s the whole ethical issue of Ashley Madison’s purpose and certainly there’s a lot of very unsympathetic commentary out there, but morals and views of relationship statuses are much more complex than simply concluding that everyone in the dump “deserved it”. What’s more, as fellow security guy Per Thorsheim points out, many of the accounts in the dump may not even be “real” so the exposure of this data may have ramifications for those who had absolutely nothing at all to do with the site.

The point is that this data going public has potential ramifications that are very different to, say, the Adobe or Forbes dumps. I’ve had quite a bit of time to consider how to handle this in Have I been pwned? (HIBP) and I’ve reached some conclusions I want to share in advance so everything is clear upfront.


People desperately want to know if they’re impacted

The traffic on HIBP has tripled since the Ashley Madison news broke. I’ve not loaded any AM data (the auto-import of fake pastes aside), yet over the last week and a bit three times as many people as usual have descended on the site. Part of that will be because the site was often referenced in the press, part of it will be people curious about their own non-AM accounts, but part of it will also be genuine AM customers wondering if they’ve been impacted.

This was much of the original MO of HIBP – make it easy for people to discover where they’ve been exposed. Nothing changes in that regard; the big difference this time is the discoverability by other people.

The data needs to be treated as “sensitive” and not available to the masses

I don’t believe it’s responsible to make all the AM accounts discoverable by anyone. Yes, they will be through various other routes anyway, but I’m not prepared for HIBP to be the avenue through which a wife discovers her husband is cheating or something even worse happens:

Please remember reporters: an account with Ashley Madison doesn't mean anyone had an affair, or ever would. Potential risk of of suicide.
— Per Thorsheim (@thorsheim) July 20, 2015

Consequently, one thing is clear:

Anonymous users will not be able to find Ashley Madison users in HIBP.

There are other ways of handling this so that those who need to know can find out. What they then do with the data is up to them, of course, but there won’t be a construct on HIBP where someone’s spouse or kids or co-workers can randomly pull records.

Using the notification system to solve the problem

The solution I’ve arrived at revolves around the current notification system. I want to ensure that the existing 130k subscribers get the notification that they would expect; if the data is leaked, HIBP will notify them via their verified email address which, of course, will be the one that was used to sign up to Ashley Madison. The neat thing about this model is that for those subscribers, they don’t need to be able to search online because they’ll be told via email anyway. Which leads me to the solution to this problem.

As of now, all new subscribers to the notification system will see a complete list of where their email address has been exposed after they verify it.

What this means is that the data doesn’t need to be shown publicly, it’s only made visible post-verification. The verification process involves clicking on a link with a unique token that is emailed to them. It looks just like this:

Screen shot of breach data after subscribing
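To make the verification step concrete, here is a minimal sketch of an emailed-token flow. It is not HIBP's actual code: the `VerificationStore` class, the in-memory dictionary, the single-use behaviour and the example URL are all assumptions made purely for illustration.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode


class VerificationStore:
    """Hypothetical in-memory store of pending email verifications."""

    def __init__(self) -> None:
        self._pending = {}  # maps token -> normalised email address

    def issue_token(self, email: str) -> str:
        # An unguessable, URL-safe random token tied to one address.
        token = secrets.token_urlsafe(32)
        self._pending[token] = email.strip().lower()
        return token

    def redeem(self, token: str) -> Optional[str]:
        # pop() makes the token single-use (an assumption of this sketch).
        # Returns the verified address, or None for an unknown/used token.
        return self._pending.pop(token, None)


def verification_link(base_url: str, token: str) -> str:
    # The link that would be emailed to the subscriber.
    return f"{base_url}?{urlencode({'token': token})}"
```

Only someone who can read the mailbox ever sees the token, so redeeming it is what proves ownership of the address. The breach list would be shown only after `redeem` returns an address.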

But of course it does still mean I need to hold the data and make it searchable; the difference now is that I need to classify it differently. This will all still work for domain searches too because there’s already a verification process in place. If you created emails @mydomain.com and you were able to verify that domain then you’ll get the AM alerts.

Introducing “sensitive” breaches

Due to the Ashley Madison event, I’ve introduced the concept of a “sensitive” breach, that is a breach that contains, well, sensitive data. Sensitive data will not be searchable by anonymous users on the public site, nor will there be any indication that a user has appeared in a sensitive breach because it would obviously imply AM, at least until there were multiple sensitive breaches in the system. Sensitive breaches will still be shown on the list of pwned sites and flagged accordingly.
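The rule described above could be sketched roughly as follows. The `Breach` type, the `is_sensitive` field name and the `visible_breaches` function are hypothetical names chosen for this example, not the service's real data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Breach:
    name: str
    is_sensitive: bool = False  # hypothetical flag name


def visible_breaches(breaches: Iterable[Breach], verified: bool) -> List[Breach]:
    """Return the breaches a caller may see for one email address.

    Verified owners see everything. Anonymous callers get the list with
    sensitive breaches silently dropped, so the response looks the same
    as if the address had never been in a sensitive breach.
    """
    if verified:
        return list(breaches)
    return [b for b in breaches if not b.is_sensitive]
```

For an address in both Adobe and Ashley Madison, an anonymous search would return only Adobe, while the verified owner would see both. Because the filtered list carries no "hidden results" marker, it does not leak that a sensitive match exists.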

Why this model works

I could have gone down the route of saying that I’ll only email any matches for an email address and never show anything on the public site whether they be sensitive or not. This is a usability nightmare though, not just because you don’t get immediate results but because you then need anti-automation as well to prevent spam. Plus it would break the public API that already has many, many consumers using it. It’s a better fit to keep the information easily accessible for the majority of breaches and keep it private for those rare cases such as AM.

This is a low-friction approach for both the users of the service and myself as the guy who has to build and support it. Implementing it this way meant nothing more than showing results when following the verification link in the subscription email and adding a flag to the breaches that keeps the sensitive ones out of the public eye.

For people genuinely worried about being in the Ashley Madison breach, there’s an easy solution: subscribe to the notification system. Yes, I’m aware that this advice is also a way of building the subscriber base but hopefully the rationale of this approach is now clear and it’s not just viewed as a grab at more subscribers. Besides, it’s free and you’ll only hear from the service when something you’re genuinely going to want to know about happens.

I don’t know if the Ashley Madison data will end up getting dumped or not. The original threat by Impact Team was pretty clear – shut down or they’ll dump the data – but I honestly have no idea if they’ll follow through with that threat or not. It might happen months from now as it did with Domino’s in France; they didn’t pay the ransom that was being demanded and six months later the data was dumped. This is why I’m writing this now and preparing HIBP accordingly because I want to be able to handle the data in a responsible fashion if it does hit. And hey, if it’s not AM then sooner or later it will be another site with data that needs to be handled more sensitively than usual, it’s an inevitability.

Please comment

Lastly, please do leave your comments, questions, suggestions and indeed criticisms below. This is a chance to shape the responsible handling of this data before it hits.

Updates

Let me share some more info in response to some of the comments here and via social channels:

Verifying all searches: I’m not planning on forcing verification for searches across all breaches and there are a number of reasons for this. One is that it adds a significant usability barrier for the reasons outlined under the “Why this model works” heading above (requires CAPTCHA, sending of emails, spam issues, etc). Another is that it breaks the API ecosystem; all those apps that help people assess their risk by consuming from the API die. Yet another is that in the vast majority of cases, this info is already easily discoverable via enumeration on the site (e.g. Adult Friend Finder will tell you if an email address exists on the site). The premise I maintain with this data is that for the non-sensitive breaches, this makes it no easier on the attackers (they’ll just pull the original public dump) but makes discoverability easier for those who genuinely want to assess their risk without unduly increasing it. Also keep in mind that the presence of an email address in a breach does not necessarily mean the owner of that address signed up to the site. This is Per’s point in the link I referenced in the post and it’s something I should probably make clearer in the search. tl;dr – the AM breach doesn’t change the original intention or design of the service for non-sensitive breaches.

The Adult Friend Finder Breach: A number of people have asked if I’ll now flag the AFF breach as “sensitive”. That horse has already bolted – the data has been there for months, the controversy has hit the headlines and died off, the incident now resides in the annals of data breach history. If it happened today then yes, I would flag it as sensitive using the model outlined in this post. Suspicious spouses have already done their searches by now and removing the data from public searches would have other adverse effects such as “breaking” the continuity of the API (an account could be found yesterday but is now gone today). Further to that and as I mention above, AFF will explicitly confirm whether an email address exists on their service or not via their password reset page anyway – suspicious spouses don’t even need HIBP!

The Adult Friend Finder Breach - updated: In light of the subsequent Ashley Madison breach being made public on August 19, the additional scrutiny on data of this nature and massive exposure that HIBP has received, I've elected to flag the AFF breach as "sensitive" which means it is no longer publicly searchable. AFF still has an enumeration risk and will still disclose to the public if an account exists on their site, but that information is no longer discoverable via HIBP.

Domain searches: Does it make sense to allow domain searches to return sensitive data? The thing about this is that there is already a verification process in place for domain searches. You have to demonstrate that you can control the domain or the site that it points to in order to do a search. If someone successfully proves that level of control then they almost certainly have full access to all emails on the domain anyway. For example, if someone can add TXT records or they’re listed as a contact on the domain then they effectively have control over anything@domain.com. A use case that’s been brought up a few times is corporate email addresses – should your company be able to see that you had an account on AM? If the org owns the domain then yes, I believe they should and that’s probably in their corporate policies already anyway. And again, if the org is able to demonstrate that they own the domain then they have access to individual accounts anyway, be that via the corporate Exchange implementation or backups or even physical access to employee machines. On the flip side, many people have personal domains they’ve subscribed to HIBP (e.g. @troyhunt.com) and they have an expectation of being notified if they appear in a breach. I appreciate it’s not a black and white scenario, but I feel comfortable with the requirements for domain level searches that include sensitive breaches.
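As a rough illustration of the TXT record check mentioned above, the sketch below takes the TXT records already fetched for a domain (by whatever DNS client is in use) and decides whether the expected token is present. The record prefix, function names and token format are invented for this example and are not taken from HIBP.

```python
import hmac
from typing import Iterable, List

# Hypothetical record format: "example-verification=<token>"
RECORD_PREFIX = "example-verification="


def domain_is_verified(txt_records: Iterable[str], expected_token: str) -> bool:
    """True if any TXT record carries the token issued to the requester."""
    expected = (RECORD_PREFIX + expected_token).encode()
    # compare_digest avoids leaking match position through timing.
    return any(
        hmac.compare_digest(record.strip().encode(), expected)
        for record in txt_records
    )


def addresses_on_domain(emails: Iterable[str], domain: str) -> List[str]:
    """Select the breached addresses that belong to the verified domain."""
    suffix = "@" + domain.lower()
    return [e for e in emails if e.lower().endswith(suffix)]
```

Only after `domain_is_verified` succeeds would a domain search return results, sensitive breaches included. The reasoning is the same as in the paragraph above: anyone who can publish DNS records for a domain can already control mail for it.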

Criteria for flagging a breach as “sensitive”: I'll have to make a judgement call on a case by case basis. There are multiple factors I see going into this: what is the potential impact on individuals of disclosure (AM could destroy families or lives), is the information discoverable already (the AFF example again), is the breach widely available already (the Hacking Team torrents were everywhere) and other aspects I’m sure I’ll have to consider when the time comes. I don’t have a concise answer for this and there will be subjectivity involved – not everyone will always agree – it’s just something I’ll have to play by ear and assess each case on its own merits.

Receiving notifications for closed email accounts: This has come up a bunch of times and I’ve actually received quite a number of personal emails about it. I don’t have a model for this right now and the difficulty is obviously that without access to an email account, you can’t verify it. What I’d encourage anyone interested in having a feature like this to do is capture your idea on the HIBP User Voice and do also please leave feedback on how you think this should work.

Q&A: Please note that following the public dumping of this data in August, I created a Q&A to answer many of the questions that are arising here. Please read that if you have any questions relating to how HIBP is handling the data.
