Data scraping—the automated collection of data on the internet—is used in a variety of contexts. On the commercial side, scraping might be used as a means of competition—such as scraping by one company to retrieve information on prices for services provided by a competitor. On the noncommercial side, scraping could be used as a research tool—such as scraping by a news outlet to investigate Amazon’s pricing algorithm. Despite the varied applications of data scraping, courts’ varying interpretations of the Computer Fraud and Abuse Act (CFAA) can impose both civil and criminal liability for scraping. This Note argues that there are competing First Amendment interests both in favor of and against scraping, depending on the type of scraping conducted. Because the CFAA neither distinguishes between various types of scraping nor balances these competing interests, a legislative solution is needed so that the law comports with both First Amendment interests of accountability and political self-governance on one end, and privacy on the other end.

In the wake of the news that Russia had meddled in the 2016 presidential election, 1 See Office of the Dir. of Nat’l Intelligence, ICA 2017-01D, Assessing Russian Activities and Intentions in Recent US Elections, at ii (2017), documents/ICA_2017_01.pdf (“We assess Russian President Vladimir Putin ordered an influence campaign in 2016 aimed at the US presidential election.”). researcher Jonathan Albright decided to explore the reach of the Russian disinformation campaign on Facebook. 2 See Craig Timberg, Russian Propaganda May Have Been Shared Hundreds of Millions of Times, New Research Says, Wash. Post (Oct. 5, 2017), news/the-switch/wp/2017/10/05/russian-propaganda-may-have-been-shared-hundreds-of-millions-of-times-new-research-says/ (on file with the Columbia Law Review). In early October, around the time Albright was conducting his research on Russian disinformation-campaign networks, Facebook claimed that politically divisive advertisements purchased by Russian operatives had reached ten million users in the months before and after the 2016 election. 3 David Ingram, Facebook Says 10 Million U.S. Users Saw Russia-Linked Ads, Reuters (Oct. 2, 2017). But Facebook’s ten million figure accounted only for paid advertisements; it failed to include free accounts created by the Russians that influenced Facebook’s massive user base. 4 See Timberg, supra note 2. Albright theorized that the true reach of Russian meddling encompassed “all the activity of the Russian controlled accounts—each post, each ‘like,’ each comment”—in addition to the paid advertisements. 5 Id. His hypothesis proved true; the actual extent of Russia’s influence may have been “well into the billions of ‘shares’ on Facebook.” 6 Id.; see also Nicholas Confessore & Daisuke Wakabayashi, How Russia Harvested American Rage to Reshape U.S. Politics, N.Y. Times (Oct. 9, 2017), https://www.nytimes.com/2017/10/09/technology/russia-election-facebook-ads-rage.html (on file with the Columbia Law Review) (citing Albright’s research to describe how Russia’s social media campaign used both paid advertisements and regular, nonpaid posting features of Facebook to influence voters). Following journalists’ reports about how many Americans were exposed to Russian propaganda, Facebook revised its number to 150 million people affected. 7 Spencer Ackerman, Facebook Now Says Russian Disinfo Reached 150 Million Americans, Daily Beast (Nov. 11, 2017). Albright’s groundbreaking research would not have been possible without the internet-research technique known as “scraping”—that is, the automated “retrieval of content posted on the World Wide Web through the use of a program other than a web browser or an application programming interface.” 8 Andrew Sellars, Twenty Years of Web Scraping and the Computer Fraud and Abuse Act, 24 B.U. J. Sci. & Tech. L. 372, 373 (2018). An application programming interface is a set of “requirements that govern how one application can talk to another” and enable the movement of information from one program to another. Brian Proffitt, What APIs Are and Why They’re Important, ReadWrite (Sept. 19, 2013), api-defined/. Albright used the Facebook-owned analytics tool CrowdTangle to automatically download the five hundred most recent posts for each of the Russian campaign accounts and analyze their reach. 9 Timberg, supra note 2. For the results of Albright’s research conducted via scraping mechanisms, see Jonathan Albright (d1gi), Itemized Posts and Historical Engagement—6 Now-Closed FB Pages, Tableau Public, profile/d1gi#!/vizhome/FB4/TotalReachbyPage (on file with the Columbia Law Review) (last updated Oct. 5, 2017).

Albright’s exposé of the true extent of the Russian misinformation campaign on Facebook is consistent with the First Amendment values of democratic self-governance, democratic legitimation, truth, and autonomy. 10 See Jeremy K. Kessler & David E. Pozen, The Search for an Egalitarian First Amendment, 118 Colum. L. Rev. 1953, 1978–79 (2018) (“[J]udges and scholars have produced a vast body of writing that seeks to justify, critique, and shape First Amendment doctrine in light of foundational principles and aspirations—above all, the pursuit of truth, the promotion of individual autonomy, and the facilitation of democratic self-government.”). For example, legal scholar Alexander Meiklejohn argued that self-government justifies free speech because speech and the resulting exchange of information produce informed voters. 11 See Alexander Meiklejohn, Free Speech and Its Relation to Self-Government 3–8, 22–27 (1948); Alexander Meiklejohn, The First Amendment Is an Absolute, 1961 Sup. Ct. Rev. 245, 255–57, 263 [hereinafter Meiklejohn, First Amendment Absolute] (“‘[T]he people need free speech’ because they have decided, in adopting, maintaining and interpreting their Constitution, to govern themselves rather than to be governed by others.” (quoting Harry Kalven, Jr., Metaphysics of the Law of Obscenity, 1960 Sup. Ct. Rev. 1, 16)). Subsequent legal scholarship has drawn on Meiklejohn’s theory of information as a key component of a functioning democracy. See, e.g., Thomas L. Emerson, Legal Foundations of the Right to Know, 1976 Wash. U. L.Q. 1, 2 (“[T]he right to know . . . is a significant method for seeking the truth, or at least for seeking the better answer.”). Thus, asserting access to information that affects democracy and elections is “paramount to self-governance.” 12 D. Victoria Baranetsky, Data Journalism and the Law, Tow Ctr. for Dig. Journalism (Sept. 19, 2018) (describing the ways in which journalists’ access to critical data is being restricted in an environment oversaturated with information). Similarly, First Amendment scholar Seana Shiffrin posited an autonomy-based theory of free speech: Autonomous thinkers should have access to information in order to be able to think freely and reflect. 13 See Seana Valentine Shiffrin, A Thinker-Based Approach to Freedom of Speech, 27 Const. Comment. 283, 289–92 (2011). This need for access to information becomes only more pronounced in today’s ever-expanding digital society as big data, technology companies, and social media platforms are transforming public discourse and influencing democracy in ways that are often obscured from the public eye. 14 See, e.g., Bernard Marr, Big Data: Using SMART Big Data, Analytics and Metrics to Make Better Decisions and Improve Performance 43–44 (2015) (ebook) (“There is little doubt that [b]ig [d]ata is changing the world. It is already completely transforming the way we live, find love, cure cancer, conduct science, improve performance, run cities and countries and operate business.”); Alex Abdo, Facebook Is Shaping Public Discourse. We Need to Understand How, Guardian (Sept. 15, 2018), commentisfree/2018/sep/15/facebook-twitter-social-media-public-discourse [ B2JH-HYF3] (“Facebook’s alluring user interface obscures an array of ever-changing algorithms that determine which information you see and the order in which you see it. The algorithms are opaque—even to Facebook—because they rely on a form of computation called ‘machine learning’, in which the algorithms train themselves . . . .”).

Albright’s research demonstrates a key tension between journalists and the technology companies they seek to investigate: While there is a First Amendment interest in scraping, these techniques can also subject researchers and journalists to legal liability. Following Albright’s research, Facebook fixed the glitch in the CrowdTangle tool that had allowed him to access the data for his research. 15 Natasha Bertrand, Facebook Scrubbed Potentially Damning Russia Data Before Researchers Could Analyze It Further, Bus. Insider (Oct. 12, 2017), https://www. A spokesperson stated that the scraping mechanisms Albright used were “an unintended way to access information about deleted content.” 16 Id. (internal quotation marks omitted). Although Facebook did not take legal action against Albright for his scraping activity, it potentially could have under the Computer Fraud and Abuse Act (CFAA). 17 18 U.S.C. § 1030(a)(2)(C) (2012) (prohibiting access to “information from any protected computer”). The CFAA is a cybersecurity statute aimed at penalizing misuse of private information and damage to computers. 18 Congress’s main goal in enacting the CFAA as the first federal cybercrime law was to combat “so-called ‘hackers’ who have been able to access (trespass into) both private and public computer systems.” H.R. Rep. No. 98-894, at 10 (1984); see also Mark A. Lemley, Place and Cyberspace, 91 Calif. L. Rev. 521, 528 (2003) (noting that the CFAA “was designed to punish malicious hackers”). The CFAA was inspired by the 1983 film WarGames after President Ronald Reagan screened the film at Camp David and instructed his advisors to look into securing government computers from hacking. See Gabe Rottman, Knight Institute’s Facebook ‘Safe Harbor’ Proposal Showcases Need for Comprehensive CFAA Reform, Reporters Comm. for Freedom of the Press (Aug. 6, 2018), knight-institutes-facebook-safe-harbor-proposal-showcases-need-compr/ [ E83Y-B389] (“Taken by the film, . . . [President Reagan] interrupted a meeting with his joint chiefs to ask if the scenario was at all realistic. His advisors looked into it . . . and recommended immediate action . . . .”); see also H.R. Rep. No. 98-894, at 10 (noting that WarGames showed “a realistic representation of the automatic dialing and access capabilities of the personal computer” (quoting Counterfeit Access Device and Computer Fraud and Abuse Act: Hearing on H.R. 3181, H.R. 3570, and H.R. 5112 Before the Subcomm. on Crime of the H. Comm. on the Judiciary, 98th Cong. 185 (1984) (statement of Peter Waal, Vice President, Marketing, GTE-Telenet))). The law has since been expanded to apply to any computer connected to the internet. See Rottman, supra. Courts have interpreted the CFAA broadly to cover an array of activity beyond strictly hacking. See infra section I.B. Many websites contain provisions in their terms of service that effectively prohibit web scraping, 19 See, e.g., User Agreement, LinkedIn (last updated May 8, 2018) (“You agree that you will not . . . [d]evelop, support or use software, devices, scripts, robots, or any other means or processes . . . to scrape the Services or otherwise copy profiles and other data . . . .”). and some courts have interpreted the CFAA to penalize such violations of a website’s terms of service. 20 See infra section I.B (discussing the circuit split on “exceeds authorized access”).

The CFAA may be overbroad and overinclusive, but it does capture problematic uses of data; while researchers and journalists may be harnessing data in a way that arguably promotes accountability and democratic participation, others have harnessed data for different motivations not affected with a First Amendment interest. For example, Facebook recently faced scrutiny over the data firm Cambridge Analytica’s harvesting of users’ personal information to target users with personalized political advertisements ahead of the 2016 presidential election. 21 Carole Cadwalladr & Emma Graham-Harrison, Revealed: 50 Million Facebook Profiles Harvested for Cambridge Analytica in Major Data Breach, Guardian (Mar. 17, 2018) (describing how the data analytics firm linked to former Trump advisor Steve Bannon “compiled user data to target American voters”). Therefore, the CFAA can both serve as a recourse for improper uses of data—as with the Cambridge Analytica scandal—and present a substantial obstacle to researchers and journalists seeking to perform research that serves the public interest, as Albright was. In effect, the CFAA crudely lumps together different forms of scraping that have different motivations and implications for social values.

This Note assesses First Amendment interests with regard to scraping and argues that there are competing First Amendment interests both in favor of and against scraping. While scraping that serves the public interest merits First Amendment protection, commercially oriented scraping can threaten First Amendment values of intellectual privacy and should not receive the same protections. Part I provides background on data-scraping techniques, taxonomizes the various applications of data scraping, and outlines how courts’ varying interpretations of the CFAA can impose liability for data scraping. Next, Part II argues that there are competing First Amendment and privacy interests in data-scraping activity as implicated by the CFAA. Finally, Part III proposes that because the CFAA does not appropriately balance these competing interests, a legislative solution is needed so that the law comports with both First Amendment interests of accountability and political self-governance on one end, and privacy on the other end.