After launching HIBP I got a lot of requests for an API, and I saw some great ideas come up about how it might be used for very constructive purposes. Truth be told, there was an API from day one insofar as this was precisely what the web UI was hitting every time you searched for an email address anyway; I just hadn’t published any docs on it or promoted its existence.
That said, I did give it a bit of tweaking to make it more “RESTful” (this, apparently, is what all APIs must be these days) and it works like this:
HTTP GET //haveibeenpwned.com/api/breachedaccount/{email}
You can hit it over HTTP or HTTPS if you’re so inclined, as I’ve now dropped a valid cert onto it. Make a request for an invalid email address and it’ll give you an HTTP 400; otherwise it will go away, think for about 4ms, then return a response that’s either not pwned (HTTP 404) or pwned (an array of pwned sites):
["Adobe","Gawker","Stratfor"]
There’s also CORS support so you can happily hit the API directly from within another web app on a different domain. It’s all documented on the HIBP site.
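To make those three outcomes concrete, here is a minimal Python sketch of how a client might call the endpoint and interpret the result. The helper names (build_url, interpret, breaches_for) are made up for illustration, and it assumes that a pwned response arrives as an HTTP 200 with a JSON array body and that the server accepts a percent-encoded address in the path; neither is spelled out above, so treat it as a starting point rather than a reference client.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint as described in the post; HTTPS is chosen here, HTTP also works.
API_ROOT = "https://haveibeenpwned.com/api/breachedaccount/"


def build_url(email):
    """Build the request URL (hypothetical helper, not part of the API)."""
    # Percent-encode anything unusual (such as "+") but leave "@" readable.
    return API_ROOT + urllib.parse.quote(email, safe="@")


def interpret(status, body):
    """Turn a status code and body into a list of breached site names."""
    if status == 200:
        # Assumption: a hit is returned as a JSON array of site names.
        return json.loads(body)
    if status == 404:
        # Not pwned: no breaches on record for this address.
        return []
    if status == 400:
        raise ValueError("the API rejected the email address as invalid")
    raise RuntimeError(f"unexpected HTTP status {status}")


def breaches_for(email):
    """Query the API over the network and return the list of breaches."""
    try:
        with urllib.request.urlopen(build_url(email)) as response:
            return interpret(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx responses, so 400 and 404 arrive here.
        return interpret(error.code, "")
```

Calling breaches_for with an address would return a list such as the one shown above, or an empty list when the address isn’t found. Splitting interpret out from the network call keeps the status handling easy to check without hitting the service.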
That is all.
There is no authentication.
There is no rate limiting.
There is no cost.
Those decisions may turn out to be insightful in that it means it’s exceptionally easy to use and doesn’t place any unnecessary barriers in front of people, or it may be naive and it’ll be abused no end in ways I haven’t even begun to consider. Or both. On the abuse side though, seriously, if you want a big pile of email addresses then go and download Adobe and the others, they’re dead easy to find and it’s a heap easier than enumerating through addresses one by one over HTTP in the hope of getting a hit.
I’ve made the API available because it was easy to do and I’ve made it freely available as it shouldn’t have any cost impact. The compute resources required are tiny and the egress data is measured in bytes – it’s a very efficient process even though it’s searching through 154M records.
Finally, on the structure of the API, I did toss up whether to implement it in what is theoretically the more RESTful approach you see above (the email address in the path implies a resource) as opposed to a more query-centric approach of passing a value such as email={email}. I asked the question on Twitter and saw vigorous debate arguing the merits of each approach. I’ve published the one described above, but it’s still accessible via query string as well (I haven’t changed the way the search feature on the website uses this). Do feel free to add your thoughts about this or other aspects in the comments below; I’m sure this is but the first phase of many enhancements to come.
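For anyone weighing up the two styles, the difference is easiest to see side by side. The Python sketch below builds both URL shapes; the function names are invented for illustration, and the exact query-string form (an email parameter on the same breachedaccount path) is an assumption based on the email={email} example rather than something documented here.

```python
from urllib.parse import quote, urlencode

BASE = "https://haveibeenpwned.com/api/breachedaccount"


def path_style(email):
    """Published form: the address is a resource in the path."""
    return f"{BASE}/{quote(email, safe='@')}"


def query_style(email):
    """Query-centric form. Assumption: the parameter is named "email"
    and hangs off the same path; the post does not give the full URL."""
    return f"{BASE}?{urlencode({'email': email})}"
```

Note that urlencode percent-encodes the “@” while the path version leaves it readable, which is one small practical difference between the two approaches.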
I’ll ask one favour from those of you who make good use of it – tell me about it. If you can share it publicly then leave a comment here; if you want to share it privately then send me an email. If you want to keep it to yourself, that’s also fine. I’d just like to know that the service has helped some people do something useful and that it’s being put to good use.
The API documentation is now online.
Enjoy!