What Really Caused Facebook’s 500M-User Data Leak?
Since Saturday, a massive trove of Facebook data has circulated publicly, splashing information from roughly 533 million Facebook users across the internet. The data includes things like profile names, Facebook ID numbers, email addresses, and phone numbers. It’s all the kind of information that may already have been leaked or scraped from some other source, but it’s yet another resource that links all that data together—and ties it to each victim—presenting tidy profiles to scammers, phishers, and spammers on a silver platter.
Facebook’s initial response was simply that the data was previously reported on in 2019 and that the company patched the underlying vulnerability in August of that year. Old news. But a closer look at where, exactly, this data comes from produces a much murkier picture. In fact, the data, which first appeared on the criminal dark web in 2019, came from a breach that Facebook did not disclose in any significant detail at the time and only fully acknowledged Tuesday evening in a blog post attributed to product management director Mike Clark.
One source of the confusion was that Facebook has had any number of breaches and exposures from which this data could have originated. Was it the 540 million records—including Facebook IDs, comments, likes, and reaction data—exposed by a third party and disclosed by the security firm UpGuard in April 2019? Or was it the 419 million Facebook user records, including hundreds of millions of phone numbers, names, and Facebook IDs, scraped from the social network by bad actors before a 2018 Facebook policy change, that were exposed publicly and reported by TechCrunch in September 2019? Did it have something to do with the Cambridge Analytica third-party data sharing scandal of 2018? Or was this somehow related to the massive 2018 Facebook data breach that compromised access tokens and virtually all personal data from about 30 million users?
In fact, the answer appears to be none of the above. As Facebook eventually explained in background comments to WIRED and in its Tuesday blog post, the recently public trove of 533 million records is an entirely different data set that attackers created by abusing a flaw in a Facebook address book contacts import feature. Facebook says it patched the vulnerability in August 2019, but it’s unclear how many times the bug was exploited before then. The information from more than 500 million Facebook users in more than 106 countries contains Facebook IDs, phone numbers, and other information about early Facebook users like Mark Zuckerberg and US secretary of Transportation Pete Buttigieg, as well as the European Union commissioner for data protection, Didier Reynders. Other victims include 61 people who list “Federal Trade Commission” and 651 people who list “Attorney General” in their details on Facebook.
You can check whether your phone number or email address was exposed in the leak by using the breach tracking site HaveIBeenPwned. For the service, founder Troy Hunt reconciled and ingested two different versions of the data set that have been floating around.
“When there’s a vacuum of information from the organization that’s implicated, everyone speculates, and there’s confusion,” Hunt says.
The closest Facebook came to acknowledging the source of this breach previously was a comment in a fall 2019 news article. That September, Forbes reported on a related vulnerability in Instagram’s mechanism to import contacts. The Instagram bug exposed users’ names, phone numbers, Instagram handles, and account ID numbers. At the time, Facebook told the researcher who disclosed the flaw that the Facebook security team was “already aware of the issue due to an internal finding.” A spokesperson told Forbes at the time, “We have changed the contact importer on Instagram to help prevent potential abuse. We are grateful to the researcher who raised this issue.” Forbes noted in the September 2019 story that there was no evidence the vulnerability had been exploited, but also no evidence that it had not been.