Over the last week, the Hacking Team story has absolutely exploded. It’s dominated the security news, featured heavily in tech publications and regularly appeared in the mainstream press. The 400GB of leaked data has been extensively torrented, mirrored and reproduced, then of course commented on at length in various articles and social media pieces. In terms of public breaches, this is as exposed as data gets.
Clearly, this incident is also highly controversial. Hacking Team has long been under suspicion for selling to dystopian nations suppressing human rights and this breach sheds a whole new light on that. But by the same token, they’ve also sold to governments using the software for legal intercepts in ways that most of us would deem quite reasonable; there’s a class of criminal we want off the streets who was being monitored via exploits which are now being patched. I wrote about some of these angles last week in my Security Sense column on Windows IT Pro and I don’t want to dwell on those points here. Let me instead focus on how I’m handling this with Have I been pwned? (HIBP).
This breach includes source code, websites, internal documents and a couple of hundred GB of emails in PST files. It’s a huge volume of data and as I mentioned earlier, it’s appearing in all sorts of places, even down to the source code of their exploits being loaded into GitHub. I needed to decide what was relevant for HIBP and how to consolidate the information in a way that provided benefit to those impacted without disclosing anything that goes against the ethos of the HIBP service. I’m also conscious that those who have been impacted need context; for an incident like this spread over such a huge trove of data, just saying “Hey, you were somewhere in 400GB of all types of data” doesn’t give them much useful info. For example, there are people in the data breach because they applied for a job with Hacking Team. What would their reaction be if they knew their email address was in there but not why it was in there? Would they remember that job application from a couple of years ago when they were probably firing off dozens of emails at a time? Or would they suddenly think that perhaps they’d appeared in an internal document talking about targets identified by an oppressive regime?
What I decided to do was load only the email addresses that appear in the PSTs. An address may be a sender or a recipient, or it may simply be mentioned in the body of a message or in an address book, but they’re all just from the PSTs. Of the 32k addresses in there, some are completely inconsequential: password reset links, support queues, spam etc. But the vast majority are of consequence, and the question of establishing context was solved once WikiLeaks published the PSTs. They’re all now searchable, which means that given a single email address that appears in HIBP against the Hacking Team breach, a WikiLeaks search can establish the context.
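To illustrate what “just the email addresses from the PSTs” could look like in practice, here is a minimal sketch. It is not the actual tooling used for HIBP. It assumes the PST contents have already been exported to plain records with hypothetical `sender`, `recipients` and `body` fields, and it uses a deliberately simplified pattern rather than a full RFC 5322 parser.

```python
import re

# Simplified pattern (an assumption for illustration): a pragmatic
# approximation of an email address, not a complete RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_addresses(messages):
    """Return the sorted, de-duplicated, lower-cased addresses found in any field.

    `messages` is an iterable of dicts with the hypothetical keys
    "sender", "recipients" and "body", each holding plain text.
    """
    found = set()
    for message in messages:
        for field in ("sender", "recipients", "body"):
            text = message.get(field, "")
            for match in EMAIL_RE.findall(text):
                # Lower-case so the same address in different casing counts once.
                found.add(match.lower())
    return sorted(found)


# Made-up sample record; the addresses use reserved example domains.
sample = [
    {
        "sender": "Alice <alice@example.com>",
        "recipients": "bob@example.org; ALICE@example.com",
        "body": "Please cc carol@example.net.",
    },
]

print(extract_addresses(sample))
```

An address counts once no matter whether it turned up as a sender, a recipient or a mention in a body, which matches the “they’re all just from the PSTs” approach described above. Filtering out the inconsequential addresses (password resets, support queues, spam) would be a separate step.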
This breach is a story that will continue to be analysed to death and will have far-reaching ramifications in all sorts of different ways. But from an HIBP perspective, it’s just another source of data with another set of impacted individuals who can now assess their risk from this breach in just the same way as they can with all the other breaches in the system. It’s unusual for me to write about data I load into HIBP in this way, but this is one that really needed some context.