Introduction
In the wake of the news that Russia had meddled in the 2016 presidential election,
researcher Jonathan Albright decided to explore the reach of the Russian disinformation campaign on Facebook.
In early October, around the time Albright was conducting his research on Russian disinformation-campaign networks, Facebook claimed that politically divisive advertisements purchased by Russian operatives had reached ten million users in the months before and after the 2016 election.
But Facebook’s ten million figure accounted only for paid advertisements; it failed to include free accounts created by the Russians that influenced Facebook’s massive user base.
Albright theorized that the true reach of Russian meddling encompassed “all the activity of the Russian controlled accounts—each post, each ‘like,’ each comment”—in addition to the paid advertisements.
The data bore out his hypothesis: the actual extent of Russia’s influence may have been “well into the billions of ‘shares’ on Facebook.”
Following journalists’ reports about how many Americans were exposed to Russian propaganda, Facebook revised its estimate of the number of people affected to 150 million.
Albright’s groundbreaking research would not have been possible without the internet-research technique known as “scraping”—that is, the automated “retrieval of content posted on the World Wide Web through the use of a program other than a web browser or an application programming interface.”
Albright used the Facebook-owned analytics tool CrowdTangle to automatically download the five hundred most recent posts for each of the Russian campaign accounts and analyze their reach.
Albright’s exposé of the true extent of the Russian misinformation campaign on Facebook is consistent with the First Amendment values of democratic self-governance, democratic legitimation, truth, and autonomy.
For example, legal scholar Alexander Meiklejohn argued that self-government justifies free speech because speech and the resulting exchange of information produce informed voters.
Thus, asserting access to information that affects democracy and elections is “paramount to self-governance.”
Similarly, First Amendment scholar Seana Shiffrin posited an autonomy-based theory of free speech: Autonomous thinkers should have access to information in order to be able to think freely and reflect.
This need for access to information becomes only more pronounced in today’s ever-expanding digital society as big data, technology companies, and social media platforms are transforming public discourse and influencing democracy in ways that are often obscured from the public eye.
Albright’s research demonstrates a key tension between journalists and the technology companies they seek to investigate: While there is a First Amendment interest in scraping, these techniques can also subject researchers and journalists to legal liability. Following Albright’s research, Facebook fixed the glitch in the CrowdTangle tool that had allowed him to access the data for his research.
A spokesperson stated that the scraping mechanisms Albright used were “an unintended way to access information about deleted content.”
Although Facebook did not take legal action against Albright for his scraping activity, it potentially could have done so under the Computer Fraud and Abuse Act (CFAA).
The CFAA is a cybersecurity statute aimed at penalizing misuse of private information and damage to computers.
Many websites contain provisions in their terms of service that effectively prohibit web scraping,
and some courts have interpreted the CFAA to penalize such violations of a website’s terms of service.
The CFAA may be overbroad and overinclusive, but it does capture problematic uses of data. While researchers and journalists may harness data in ways that arguably promote accountability and democratic participation, others have harnessed data for purposes not affected with a First Amendment interest. For example, Facebook recently faced scrutiny over the data firm Cambridge Analytica’s harvesting of users’ personal information to target users with personalized political advertisements ahead of the 2016 presidential election.
The CFAA can therefore both serve as a recourse for improper uses of data, as with the Cambridge Analytica scandal, and present a substantial obstacle to researchers and journalists who, like Albright, seek to perform research that serves the public interest. In doing so, the statute crudely lumps together different forms of scraping that have different motivations and implications for social values.
This Note assesses First Amendment interests with regard to scraping and argues that there are competing First Amendment interests both in favor of and against scraping. While scraping that serves the public interest merits First Amendment protection, commercially oriented scraping can threaten First Amendment values of intellectual privacy and should not receive the same protections. Part I provides background on data-scraping techniques, taxonomizes the various applications of data scraping, and outlines how courts’ varying interpretations of the CFAA can impose liability for data scraping. Next, Part II argues that there are competing First Amendment and privacy interests in data-scraping activity as implicated by the CFAA. Finally, Part III proposes that because the CFAA does not appropriately balance these competing interests, a legislative solution is needed so that the law comports with the First Amendment interests in accountability and political self-governance, on the one hand, and with privacy, on the other.