Introduction
Anna is sixteen weeks pregnant.
Anna uses Pineapple, a popular fertility and pregnancy tracking and social media app. In the app, Anna inputs information about her ovulation cycle, pregnancy symptoms, sleep patterns, eating habits, exercise, and moods. Anna also consumes Pineapple’s content on pregnancy and fetal development and engages with other Pineapple users in the app’s forum, swapping questions on pregnancy and preparing for a new baby.
Pineapple collects, aggregates, and synthesizes data that Anna and other users share. This data includes not only the information that Anna inputs in the app but also data about the content she consumes and her interactions with other users. Pineapple does not sell this data directly; in fact, its privacy policy explicitly states that it will never sell or license individual user data. But Pineapple does sell insights about its user base as a whole to clients like advertisers, employment agencies, and consumer credit agencies. This is how Pineapple makes its money; the app is free to Anna. Pineapple’s clients then combine the data they receive from Pineapple with data from other companies to build out a more complete picture of the behavior of pregnant people. This could include data on TV viewing patterns from video streaming platforms, movement and sleep patterns from wearable fitness devices, or online purchasing behaviors.
Becca has never used Pineapple. But she does watch streaming services, owns a wearable fitness device, and shops online. And Becca’s behavioral patterns on these platforms have shifted in similar ways to Anna’s and other Pineapple users’ behaviors. Pineapple’s clients can, therefore, infer that Becca is also likely pregnant and treat her accordingly.
Why do companies care about Anna’s and Becca’s pregnancy status? Because early pregnancy data is incredibly valuable. Pregnancy signals that a consumer is about to undergo a significant change in their daily habits and their buying activities; the birth of a child is a time when someone’s buying habits, brand loyalties, and daily routines are in flux. Getting to such consumers early is a valuable opportunity to shape their future purchasing behaviors. Diaper companies can advertise to Becca or Anna before competitors and get them locked into their brand. Grocery stores, online subscription services, car manufacturers, and others can also reach out, offering deals favorable to new parents that entice them to switch entrenched behaviors and brand loyalties. Aggregate pregnancy data also provides an opportunity to understand the nature of consumer change more generally—how and why do consumption patterns change? When are such changes most robust, and why? How can you predict (and modify) those behavioral changes?
Data about Anna’s and Becca’s pregnancies is what this Article calls social data.
Social data refers jointly to two interrelated types of data about people. The first is data that directly materializes and stores traces of human activity.
This includes, for example, information on Anna’s or Becca’s TV viewing patterns, ovulation, or movement. This type of social data is directly collected from data subjects, like the data Pineapple collects and uses about Anna. The second is data that is used to apprehend, infer, or predict human activity.
For example, Pineapple collects data about Anna (and other users) to aggregate and analyze for insights about pregnant people as a group, which it sells to third parties. Those third parties may use this data in turn to gain insight about, and drive decisions regarding, Becca. Thus, data about Anna and her pregnancy is also data about Becca’s pregnancy, even though it was not directly collected from Becca.
Indeed, data that can be used to infer or predict human activity need not be collected from people at all—for example, data about weather can be used to predict and infer commuting behavior. In contrast with the more commonplace term “personal data,” “social data” nicely expresses the view (and a central focus of this Article) that data is socially useful and economically valuable—not only for what it can tell the world about any one person, like Anna, but also, and especially, for what it can tell the world about people.
The value of Anna’s and Becca’s social data is what this Article refers to as prediction value.
Prediction value is a particular form of use value that lies in social data’s capacity to infer or predict things about people (in this case, pregnancy status) and to act on that knowledge. For example, a firm with access to Becca’s social data may send Becca a diaper coupon or free prenatal vitamins, a hospital where Anna will give birth may use such data to inform labor and delivery staffing plans, or an employer’s hiring algorithm may flag Becca as a potentially risky and expensive hire and exclude her from a pool of prospective employees. Social data stores the value of being able to apprehend behavior, to infer, predict, and direct the future actions of people (who are not always the data subject), and to develop informed strategies to obtain some objective. It provides the valuable capacity to exert some measure of insight into and control over future behavior.
The capacity of social data to store insight into human behavior, guide predictions about that behavior, and optimize strategies to guide and change human behavior is (much of) what drives companies to collect the data they collect and use the data in the way that they do.
Social data cultivation is key to the business strategies of some of the wealthiest and most powerful companies operating today. Companies face generalized market pressure to engage in the accumulation and cultivation of social data and its prediction value to stay competitive.
Indeed, the widespread practice of treating social data as a key input to production is part of what it means to refer to contemporary capitalism as an informational capitalism.
Recent technological transformations, like improved chip processing power, ubiquitous connected devices like smartphones, and improvements in machine-learning techniques, have all contributed to the feasibility and utility of entities cultivating, refining, and extracting social data value.
These technological changes have allowed entities to exploit for economic gain what has long been true: People are social beings, deeply knowable and materially influenced by relations to one another. Thus, the stakes of understanding social data’s particular form of value, and the social and economic effects that its widespread cultivation produces, have grown more salient.
A primary way the digital economy works is by using prediction value to increase monetary value: to grow profits by raising revenue and by lowering costs, or to grow market share (and, the thinking goes, future profits) by expanding customer bases and entering new markets.
As Part II will survey in greater detail, companies deploy a variety of strategies to transform prediction value into exchange value—the priced, monetary value of a good, service, or company, typically expressed as a “market price.” Exchange value, as a general theory and form of value, posits that the value of a thing is the value derived from its exchange, expressed via price, on a real or imagined market.
So, prediction value can be—and is—transformed into exchange value. But it doesn’t have to be. Prediction value is distinct from (and not always neatly transformed into) exchange value. Part I provides greater detail defending the descriptive and analytic virtues of cataloging this distinction.
Prediction value confers on its holder the power to apprehend, shape, and thus exert some measure of control over people’s behavior. In fact, the central preoccupation of privacy scholars and many other observers of the digital economy is this potential for the control power of social data, cultivated for its conversion into priced value, to be repurposed toward other (potentially disempowering) ends.
These purposes can coexist with strategies to grow exchange value, such as the use of prediction value in labor settings to reduce operating costs by eroding workplace protections, or lie outside the commercial realm entirely, such as immigration officials repurposing location data cultivated for commercial ends to detect and detain suspected undocumented immigrants.
Indeed, much of privacy law’s traditional concern regarding privately cultivated surveillance capacities is how such capacities fall into the hands of state actors and empower state action without sufficient scrutiny.
Of course, there is also some amount of speculative behavior around prediction value, as when entities, in order to secure valuations of high exchange value, overclaim or overpromise on the prediction value their products can deliver—a phenomenon the computer scientists Sayash Kapoor and Arvind Narayanan refer to as “AI snake oil.”
But this, too, highlights the importance of disentangling assessments of social data value from priced exchange value—to better identify when claims of social data value (and its potential to transform into priced exchange value) are overblown.
As Aaron Shapiro notes in his excellent work on gig platforms, when it comes to understanding the way platforms capitalize on prediction value by turning it into market valuation, there is a considerable “gap between what platforms do and what they say they do.”
Clarifying the two modes of value production (and how they relate to each other) can help regulators and other observers traverse this gap and evaluate when claims are plausible and when they are not.
This Article argues that understanding how social data value is cultivated and used is important for regulating the digital economy. Part I provides greater detail on the concepts of social data and prediction value and argues for the distinctive value proposition of cultivating, accumulating, and using social data. It also provides theoretical context to distinguish the concept of exchange value (priced monetary value) from the concept of value more broadly and from prediction value as a particular kind of use value.
Part II offers a taxonomy of the business models and practices developed around cultivating and using social data value. This taxonomy divides the ways in which companies leverage prediction value to produce wealth and power for themselves and their investors into three scripts. The first script is direct and immediate conversion of social data value into exchange value through means such as direct sale of data, or through the premiums charged for targeted, as opposed to untargeted, advertising. The second script is indirect and often delayed conversion of prediction value into exchange value through improving and developing new products and services, lowering costs, increasing and stabilizing revenue, and expanding into new business lines and industries. The third script is leveraging prediction value to accrue power. This script catalogs how social data value can be a source of economic and political power, and thus of value to companies in their longer-term aims to secure market power and favorable regulatory environments. After cataloging and describing these three scripts, the Article explores some specific business practices associated with following these scripts, each of which focuses on growth and expansion. These practices include offering free and low-cost services, creating ecosystems of products and services, and embarking on aggressive merger and acquisition strategies. The Article shows how these strategies differ from traditional ones in ways that carry both legal and normative significance.
In Part III, this Article explores how disambiguating prediction value and exchange value (conceptually and normatively) can illuminate why such a variety of existing legal regimes fail to properly manage the social and economic disruptions that have accompanied capitalism’s informational turn. In short, the same legal regimes that structure the transformation of prediction value into exchange value fail to grasp—in its entirety—the messy, imperfect, and socially disruptive process by which this transformation occurs. While the regulatory challenges of the digital economy increasingly place strain on various areas of law, most consider only small portions of this process and lack a systematic understanding of social data value production.
The Article identifies two contexts in which the legal regimes that structure this process index only part of its legally relevant features. The first context is legal regimes that have historically been tasked with governing value creation.
Such regimes are focused on evaluating and regulating companies’ claims of exchange value and thus only apprehend or index prediction value (indeed, they only consider such value normatively and legally relevant) at the point it is transformed into exchange value. As section III.B will show, this can miss many legally salient features of prediction value, such as how it is cultivated and the wider social effects that cultivation creates. This leaves such regimes poorly equipped to properly achieve their normative goals. The Article chronicles these struggles through the example of tax law.
The second context is legal regimes that have not historically understood themselves to be tasked with governing value creation but that are focused on the legal significance of informational power. Such regimes are attentive to the capacity of information about people to create power over them, but they regulate social data along a strict public–private divide. Through the example of privacy and data governance law, the Article shows the conceptual and programmatic challenges of this approach. Privacy and data governance law governs private data collection primarily via individual control and consent rights.
Privacy law traditionally apprehends or indexes social concerns regarding prediction value, and its capacity to coerce action and remake social relations, only if or when it falls into the hands of public actors. And while the near-exclusive focus on state surveillance in the field is shifting, both popular and doctrinal conceptions of socially coercive privacy harm remain primarily focused on public, rather than private, actors. This ignores many salient concerns regarding informational power that arise as social data is imbricated into the strategies of commercial actors and neglects the role of privacy and data governance law in facilitating this form of value creation. It also overlooks the potential social benefits of prediction value if cultivated in procedurally fair ways and put toward collectively determined ends.
This analysis has broad implications for other areas of the law. For example, other legal fields that, like tax law, have historically been tasked with governing value creation have legal frameworks developed around the concept of exchange value and are not achieving their normative goals when applied to prediction value. Antitrust and financial regulation are prominent examples here, as there is growing evidence to suggest these regimes are struggling to index the profit-seeking behavior of technology companies and thus to fulfill their regulatory briefs in the digital economy.
This understanding will also be invaluable to legal fields that, like privacy and data protection law, have not historically been seen as regimes governing value creation and that, as a result, have not developed a positive agenda for regulating prediction value. First Amendment law is a prominent example.
The idea that information like social data confers power, and is thus a source of value with significant ontological, political-economic, and legal implications, is not new.
Others, particularly political economists of communication and historians of science, have long identified and analyzed the role of informationalism in contemporary capitalist value formation as it emerged and took on growing importance.
Previous work has established the centrality of social data as a vital, even paradigmatic, factor of production under informational capitalism.
Others have identified the importance of behavioral monitoring and prediction to the governance capacities and challenges of the digital economy.
Legal scholars have also explored the legal facilitations and fallouts of the informational turn.
This Article builds on that earlier work with two goals in mind. First, the Article’s primary goal is to provide a granular and reasonably systematic accounting of the various ways data is used (or can be used) by platforms and other firms to produce value (and power). It takes up this goal in Part II. The Article’s second goal is to provide a theoretical account of social data as a value form whose cultivation is a primary aim of digital firms—indeed, it is part of what marks the digital economy as “digital.” This theoretical contribution, laid out in Part I, is in service of the primary goal: to illuminate the distinctive value proposition of data and to help explain the conceptual and normative significance of social data as a value form. Taken together, Parts I and II describe the current structure of social data production and suggest why legal scholars and regulators have had trouble grasping the implications and effects of data production under the particular conditions of the contemporary digital economy. Part III explores these legal implications directly. In addition to its two substantive goals, the Article makes a modest methodological contribution to how legal scholarship engages with law’s constitution and regulation of production. Its theoretical account supplies a way of analyzing and evaluating productive activity that does not, at the conceptual level, presuppose market ordering of that activity. In doing so, the Article provides a model for similar analysis, when appropriate, for other kinds of productive activity.