GENDER DATA IN THE AUTOMATED ADMINISTRATIVE STATE

In myriad areas of public life—from voting to professional licensure—the state collects, shares, and uses sex and gender data in complex algorithmic systems that mete out benefits, verify identity, and secure spaces. But in doing so, the state often erases transgender, nonbinary, and gender-nonconforming individuals, subjecting them to the harms of exclusion. These harms are not simply features of technology design, a story others have ably told. This erasure and discrimination are the products of law.

This Article demonstrates how the law, both on the books and on the ground, mandates, incentivizes, and fosters a particular kind of automated administrative state that binarizes gender data and harms gender-nonconforming individuals as a result. It traces the law’s critical role in creating pathways for binary gender data, from legal mandates to official forms, through their sharing via intergovernmental agreements, and finally to their use in automated systems procured by agencies and legitimized by procedural privacy law compliance. At each point, the law mandates and fosters automated governance that prioritizes efficiency rather than inclusivity, thereby erasing gender-diverse populations and causing dignitary, expressive, and practical harms.

In making this argument, the Article challenges the conventional account in the legal literature of automated governance as devoid of discretion, as reliant on technical expertise, and as the result of law stepping out of the way. It concludes with principles for reforming the state’s approach to sex and gender data from the ground up, focusing on privacy law principles of necessity, inclusivity, and antisubordination.

Introduction

Sasha Costanza-Chock triggered the alarm when they walked through the full-body scanner at the Detroit Metro Airport. 1 Sasha Costanza-Chock, Design Justice, A.I., and Escape From the Matrix of Domination, J. Design & Sci. ( July 16, 2018), https://doi.org/10.21428/96c8d426 [https://perma.cc/E2M3-WGW5] [hereinafter Costanza-Chock, Design Justice]; see also About, Sasha Costanza-Chock, Ph.D., https://www.schock.cc/?page_id=13 [https://perma.cc/​JEQ3-JELT] (last visited Aug. 21, 2023). They knew it would happen because it happens to transgender, nonbinary, and gender-nonconforming people all the time. 2 See, e.g., Deema B. Abini, Traveling Transgender: How Airport Screening Procedures Threaten the Right to Informational Privacy, 87 S. Cal. L. Rev. Postscript 120, 135 (2014); Paisley Currah & Tara Mulqueen, Securitizing Gender: Identity, Biometrics, and Transgender Bodies at the Airport, 78 Soc. Rsch. 557, 562–66 (2011); Dawn Ennis, Her Tweets Tell One Trans Woman’s TSA Horror Story, Advocate (Sept. 22, 2015), https://www.advocate.com/transgender/2015/9/22/one-trans-womans-tsa-horror-story [https://perma.cc/5FZS-6NKV]. For detailed definitions of “transgender,” “nonbinary,” “gender-nonconforming,” and related terms, please see Jessica A. Clarke, They, Them, and Theirs, 132 Harv. L. Rev. 894, 897–99 (2019); Glossary of Terms: LGBTQ, GLAAD, https://www.glaad.org/reference/terms [https://perma.cc/7BHP-6Y2T] (last visited Aug. 21, 2023). In brief, transgender individuals are those whose sense of self or expression of their gender differs from their assigned sex at birth. Nonbinary individuals are those whose identities cannot be restricted to just “male” or “female.” “Gender-nonconforming” is an umbrella term that can include nonbinary individuals, but it is used in this Article to refer to those who are genderqueer (those who challenge norms concerning sex, gender, and sexuality), genderfluid (those whose gender expressions or identities may change over time), or agender (those who do not adopt a traditional gender category and may describe their gender as the lack of one). The machine deemed Sasha “risky” because their body, datafied into machine-readable code, differed from the pictures of bodies that trained the machine’s algorithm. 3 Costanza-Chock, Design Justice, supra note 1. Their breasts were too pronounced relative to data associated with “male,” and their groin area deviated from data associated with “female.” 4 Id. Pulled out of the line for a physical body search, Sasha found themself in an awkward, humiliating, and potentially dangerous situation.
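
The screening logic at work in Sasha's account can be made concrete with a simple sketch: the scanner reduces the traveler's body to numeric features, compares them against a binary reference profile selected by the screening officer, and flags any region that deviates beyond a tolerance. The following Python sketch is purely illustrative; the feature names, profile values, and threshold are assumptions for exposition, not the actual millimeter-wave scanner software or its training data.

```python
# Purely illustrative: hypothetical body-region features, reference profiles,
# and threshold; not the actual airport body-scanner algorithm or its data.

# Reference profiles derived from binary-labeled training bodies (assumed values).
REFERENCE_PROFILES = {
    "male": {"chest": 0.2, "groin": 0.8},
    "female": {"chest": 0.7, "groin": 0.2},
}
DEVIATION_THRESHOLD = 0.3  # assumed per-region tolerance


def screen(body_features, operator_selected_sex):
    """Return the body regions that deviate too far from the selected binary profile."""
    profile = REFERENCE_PROFILES[operator_selected_sex]
    return [
        region
        for region, expected in profile.items()
        if abs(body_features.get(region, 0.0) - expected) > DEVIATION_THRESHOLD
    ]


# A body matching neither binary profile is flagged no matter which profile is chosen.
traveler = {"chest": 0.7, "groin": 0.8}
print(screen(traveler, "male"))    # ['chest']
print(screen(traveler, "female"))  # ['groin']
```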

Toby P., a transgender man living in Colorado, was singled out by a different kind of automated administrative technology. 5 Toby’s name has been changed to protect his anonymity as he and his lawyers determine how to proceed with a potential claim against the state. After Toby sustained a debilitating injury at work, his employer completed the required workers’ compensation First Report of Injury Form by checking the box next to “Female,” a designation that matched Toby’s assigned sex at birth and the information in his human resources file. 6 Telephone Interview with Toby P. (May 22, 2022) (notes on file with the Columbia Law Review); Colo. Dep’t of Lab., WC 1, Employer’s First Report of Injury (2006), https://codwc.app.box.com/v/wc1-first-report-injury (on file with the Columbia Law Review). The state’s automated fraud-detection system, which compares this claim form with information pooled from state databases, denied Toby’s claim. The “system,” Toby told me, “saw ‘female’ here and ‘male’ [everywhere else] . . . and figured something didn’t match.” 7 Telephone Interview with Toby P., supra note 6. Seven months, twenty-five phone calls, sixteen refiled forms, and two demand letters later, Toby is still hurt and still without the compensation to which he is entitled. He is “basically bankrupt.” 8 Id.
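
Toby's description suggests a blunt cross-database consistency check: the fraud-detection system joins the claim form to other state records about the same person and treats any disagreement in the sex field as a red flag. The sketch below is a hypothetical illustration of that logic, with invented field and agency names; it is not Colorado's actual workers' compensation system.

```python
# Purely illustrative: invented field names, agencies, and records;
# not Colorado's actual workers'-compensation fraud-detection system.

# Sex marker as entered on the employer's First Report of Injury form.
claim = {"claimant_id": "12345", "sex": "F"}

# Records about the same person pooled from other state databases (hypothetical).
pooled_state_records = {
    "12345": {
        "dmv": {"sex": "M"},
        "vital_records": {"sex": "M"},
        "medicaid": {"sex": "M"},
    },
}


def flag_for_review(claim_form, records_by_person):
    """Treat any disagreement between the claim form and a linked record as suspect.

    Because only binary "M"/"F" values are contemplated, a transgender claimant whose
    records were updated at different times will almost always trip this check.
    """
    linked = records_by_person.get(claim_form["claimant_id"], {})
    return any(rec["sex"] != claim_form["sex"] for rec in linked.values())


print(flag_for_review(claim, pooled_state_records))  # True -> claim denied or routed to review
```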

Sasha and Toby fell through the cracks of the automated administrative state. 9 This Article uses the phrase “automated decisionmaking system” or “algorithmic decisionmaking system” to refer to the overall process in which a computational mechanism uses data inputs to make probabilistic, predictive conclusions or implements policy by software. See Ryan Calo, Artificial Intelligence Policy: A Primer and Roadmap, 51 U.C. Davis L. Rev. 399, 404–05 (2017) (noting that there is no one “consensus definition of artificial intelligence” but clarifying ways of understanding what scholars and industry mean by AI). This simplification is intentional: The Article focuses on the law’s responsibility for trends in automation rather than the technical distinctions between different types of automated technologies. See AI Now Inst., Confronting Black Boxes: A Shadow Report of the New York City Automated Decision System Task Force 7 (Rashida Richardson ed., 2019), https://ainowinstitute.org/publication/confronting-black-boxes-a-shadow-report-of-the-new-york-city-automated [https://perma.cc/2K5X-GB3A] (defining algorithmic or automated decisionmaking systems as “data-driven technologies used to automate human-centered procedures, practices, or policies for the purpose of predicting, identifying, surveilling, detecting, and targeting individuals or communities”). As government agencies turn to algorithms and artificial intelligence (AI) to administer benefits programs, detect fraud, and secure spaces, transgender, nonbinary, and gender-nonconforming individuals are put in situations where they can’t win. They become “anomalies” or “deviants” in systems designed for efficiency. 10 See Toby Beauchamp, Going Stealth: Transgender Politics and U.S. Surveillance Practices 35–37 (2019); Sonia K. Katyal & Jessica Y. Jung, The Gender Panopticon: AI, Gender, and Design Justice, 68 UCLA L. Rev. 692, 710–11 (2021) (explaining that identity detection as a form of biometric surveillance treats some individuals as “anomalies” or outliers when they do not conform to gender binaries).

Technologies “have politics.” 11 Langdon Winner, Do Artifacts Have Politics?, Dædalus, Winter 1980, at 121, 121 (explaining that technology embodies forms of power and authority). Just like race and gender hierarchies can be embedded into technological systems, 12 There is a vast literature in this space. See, e.g., Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (2018) (explaining how digital decisions made through systemic algorithms reinforce oppressive social relationships); Sarah Myers West, Meredith Whittaker & Kate Crawford, Discriminating Systems: Gender, Race, and Power in AI 8–9 (2019), https://ainowinstitute.org/wp-content/uploads/2023/​04/discriminatingsystems.pdf [https://perma.cc/A4YD-UPPG] (outlining research findings that the AI sector has a lack of diversity among its professionals, which has led to discriminatory outcomes); Solon Barocas & Andrew D. Selbst, Big Data’s Disparate Impact, 104 Calif. L. Rev. 671, 674–77 (2016) [hereinafter Barocas & Selbst, Big Data’s Disparate Impact] (outlining various reports that have suggested “big data” has unintended discriminatory effects); Joy Buolamwini & Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, 81 Proc. Mach. Learning Rsch. 1, 10–11 (2018), https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf [https://perma.cc/Q5VD-EF9F] (detailing how machine-learning technology can produce disastrous results in high-stakes circumstances, specifically when used in criminal matters); Pauline T. Kim, Data-Driven Discrimination at Work, 58 Wm. & Mary L. Rev. 857, 874–90 (2017) (describing how “training data,” or data used to inform machines running algorithms, are often unknowingly infected with bias, creating discriminatory results that are especially harmful in the workplace). In a recent article, Professor Sonia Katyal and healthcare industry lawyer Jessica Jung focus almost entirely on the gender and racial biases of algorithmic technologies used by private, for-profit companies. Katyal & Jung, supra note 10. This Article adds to this literature with a different narrative, focusing on government uses of automated technology and the mostly underappreciated laws that are responsible for collecting and entrenching binary gender in government systems. in this case it is cisnormativity—the assumption that everyone’s gender identity and presentation accord with their assigned sex at birth—that is designed into the automated systems that singled out Sasha and Toby. The underlying data that train machines to recognize males and females, the algorithms that identify anomalies in a person’s body relative to that database, the forms inconsistently designed to collect sex and gender data in the first place, and the systems’ restriction to only male/female options all reflect assumptions of gender as binary. Anyone who deviates from a normative, binary body is “risky” and singled out, potentially exposing them to harm. Those gender-nonconforming individuals who are also religious minorities, immigrants, people of color, or people with disabilities, and people who hold more than one minoritized identity, are multiply burdened. 
13 See, e.g., Patricia Hill Collins, Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment 221–38 (1990) (describing how minoritized populations experience oppression and domination on multiple levels); Kimberlé Crenshaw, Mapping the Margins: Intersectionality, Identity Politics, and Violence Against Women of Color, 43 Stan. L. Rev. 1241, 1250–52 (1991) (outlining how all intersections of race and gender affect the social construct of identity).

But this Article is not simply about the biases replicated and entrenched by AI and algorithmic technologies, a story deftly told by others and summarized in Part I. Nor is it just about gender as a tool of classification, a story as old as the nation. 14 See Gérard Noiriel, The Identification of the Citizen: The Birth of Republican Civil Status in France, in Documenting Individual Identity 28, 30–42 (Jane Caplan & John Torpey eds., 2001). This is a story about law. Specifically, this Article argues that the law has mandated, influenced, and guided the state to automate in a way that binarizes gender data, thereby erasing and harming transgender, nonbinary, and gender-nonconforming individuals.

The law’s active role in the creation of this kind of automated state has been overlooked because the two dominant strands in legal scholarship on algorithmic technologies are focused elsewhere. One of those strands sees automation and its harms flourishing in a regulatory void. Scholarship in this vein rightly argues that automated systems used by private, for-profit technology companies cause harm because “the law has offered insufficient protection.” 15 See Katyal & Jung, supra note 10, at 704 (“[G]ender panopticism has been facilitated by absences within privacy law, in that the law has offered insufficient protection to gender self-determination and informational privacy.”); see also id. at 723, 760–61 (outlining forms of biometric surveillance technology that render nonbinary individuals outliers). Other scholars suggest that algorithmic technologies are built amidst “lawlessness,” or the lack of regulation. 16 Shoshana Zuboff, The Age of Surveillance Capitalism 127–28 (2019). But see, e.g., Julie Cohen, Between Truth and Power 3 (2019) [hereinafter Cohen, Between Truth and Power] (arguing that informational capitalism itself is a construct of opportunistic economic actors using law to control the means of informational production); Amy Kapczynski, The Law of Informational Capitalism, 129 Yale L.J. 1460, 1465 (2020) (reviewing both texts); see also Bridget Fahey, Data Federalism, 135 Harv. L. Rev. 1007, 1013–14, 1036–39 (2022) [hereinafter Fahey, Data Federalism] (highlighting the “absence” of “major federal legislation” as one reason for rampant, unregulated data sharing among state agencies but noting the role of interagency agreements and other more informal legal instruments).

A second important strand of law and technology scholarship focuses on how law can address automation’s harms. This research explores how the technologies work, where they go wrong, and how we might use law to regulate them, fix them, and restore the status quo ex ante by holding technologies and those that use them accountable for discrimination, bias, and harm. 17 E.g., Dillon Reisman, Jason Schultz, Kate Crawford & Meredith Whittaker, Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability (2018), https://openresearch.amsterdam/image/2018/6/12/aiareport2018.pdf [https: ​//perma.cc/Y3YY-BSTG]; Barocas & Selbst, Big Data’s Disparate Impact, supra note 12; Danielle Keats Citron & Frank Pasquale, The Scored Society: Due Process for Automated Predictions, 89 Wash. L. Rev. 1 (2014); Danielle Keats Citron, Technological Due Process, 85 Wash. U. L. Rev. 1249 (2008) [hereinafter Citron, Technological Due Process]; Ignacio N. Cofone, Algorithmic Discrimination Is an Information Problem, 70 Hastings L.J. 1389 (2019); Kate Crawford & Jason Schultz, Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms, 55 B.C. L. Rev. 93 (2014); A. Michael Froomkin, Ian Kerr & Joelle Pineau, When AIs Outperform Doctors: Confronting the Challenges of a Tort-Induced Over-Reliance on Machine Learning, 61 Ariz. L. Rev. 33 (2019); James Grimmelmann & Daniel Westreich, Incomprehensible Discrimination, 7 Calif. L. Rev. Online 164 (2017), https://lawcat.berkeley.edu/record/1128018/files/GrimmelmannWestreich.final_.pdf [https://perma.cc/7QMW-AEDQ]; Meg Leta Jones, The Right to a Human in the Loop: Political Constructions of Computer Automation and Personhood, 47 Soc. Stud. Sci. 216 (2017); Margot E. Kaminski, Binary Governance: Lessons From the GDPR’s Approach to Algorithmic Accountability, 92 S. Cal. L. Rev. 1529 (2019); Sonia K. Katyal, Private Accountability in the Age of Artificial Intelligence, 66 UCLA L. Rev. 54 (2019) [hereinafter Katyal, Private Accountability]; W. Nicholson Price II, Regulating Black-Box Medicine, 116 Mich. L. Rev. 421 (2017); Andrew D. Selbst & Solon Barocas, The Intuitive Appeal of Explainable Machines, 87 Fordham L. Rev. 1085 (2018); Alicia Solow-Niederman, Administering Artificial Intelligence, 93 S. Cal. L. Rev. 633 (2020). Few scholars have focused on how the law creates the automated administrative state, 18 But see Cohen, Between Truth and Power, supra note 16, at 48–74 (exploring the ways law, actively leveraged by interested economic actors, has created a “zone of legal privilege” around the activities of data-driven technologies); Alicia Solow-Niederman, YooJung Choi & Guy Van den Broeck, The Institutional Life of Algorithmic Risk Assessment, 34 Berkeley Tech. L.J. 705, 705–08 (2019) (arguing that risk assessment statutes create frameworks that constrain and empower policymakers and technical actors when it comes to the design and implementation of a particular instrument). and fewer still have focused on how the law constructs gender data in the automated state. 19 Of course, there has been scholarship on gender as a tool of administrative governance. See, e.g., Dean Spade, Normal Life: Administrative Violence, Critical Trans Politics, & The Limits of Law 73–93 (2015) [hereinafter Spade, Normal Life]. But this scholarship has not extended to consider the effects of algorithms and automation in the administrative state. 
This Article fills that gap: Sasha’s and Toby’s stories are actively and indelibly framed, constructed, and sustained by law every step of the way.

The process begins at the source, where statutes mandate the collection of sex and gender data. As Part II describes, the law of gender data collection relies on assumptions of static gender, taps into uninformed perceptions of the gender binary as “common sense,” and creates the conditions for civil servants to design forms with primarily binary gender questions. This creates binary gender data streams. Part III shows how interstate compacts and interagency contracts, all of which I collected from public records requests, require states to share datasets that include sex and gender. The law of gender data sharing looks outward and inward to privilege the gender binary: It has expressive effects that normalize the gender binary, conflationary effects that confuse the social aspects of gender with the biological aspects of sex, and interoperability effects that force the gender binary onto any agency that wants to realize the benefits of participating in shared data systems. Part IV demonstrates how automation mandates, agency policymaking by procurement, trade secrecy law, and privacy and data protection law actively encourage automation to improve efficiencies while preventing anyone from interrogating the underlying assumptions of the algorithms that use sex and gender data. This web of legal rules guides automation to exclude those outside the norm and erects barriers around automated tools that protect the gender binary from change. 20 See Cohen, Between Truth and Power, supra note 16, at 49 (referring to how the law creates “zone[s] of legal privilege” around information-driven business models). In other words, the law forces an oversimplified legibility on its subjects, leaving those most marginalized at risk. 21 For how governments force this legibility on their subjects, see generally James C. Scott, Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed (1998) [hereinafter Scott, Seeing Like a State] (“[T]he legibility of a society provides the capacity for large-scale social engineering, high-modernist ideology provides the desire, the authoritarian state provides the determination to act on that desire, and an incapacitated civil society provides the leveled social terrain on which to build.”).

This rich account of how law collects, shares, and uses sex and gender data in state-run automated systems offers several insights about automation and the automated state in general that challenge or add nuance to the conventional wisdom in the legal literature. Part V discusses four of those lessons.

The automated state is discretionary. 22 See infra section V.A. Scholars have argued that automation erodes traditional agency discretion, a pillar of the administrative state. 23 See, e.g., Ryan Calo & Danielle Keats Citron, The Automated Administrative State: A Crisis of Legitimacy, 70 Emory L.J. 797, 804 (2021). But this Article shows that civil servants have discretion to guide automation in ways that binarize gender data. The discretion may be buried, but its fingerprints are everywhere—in the design of data-collection forms, in the terms of data-sharing agreements, in the procurement of technologies, and in the design and completion of privacy impact assessments (PIAs). 24 Impact assessments in the law and technology space document development rationales for new technologies and are supposed to keep certain values like privacy and fairness front of mind for those developing and using the technologies. See Andrew D. Selbst, An Institutional View of Algorithmic Impact Assessments, 35 Harv. J.L. & Tech. 117, 122 (2021). But see Ari Ezra Waldman, Industry Unbound 132–33 (2021) [hereinafter Waldman, Industry Unbound] (describing how impact assessments can be reduced to mere checkbox compliance). Relatedly, the automated state is also driven by stereotypes. 25 See infra section V.B. Rather than merely shifting expertise from civil servants hired for their substantive knowledge to engineers with technological knowledge about how algorithms work, the automated state relies on both civil servants’ and engineers’ supposedly commonsense perceptions of sex and gender. 26 See infra section V.B. Because most people have traditionally presumed that sex and gender are the same and static, automated systems designed by engineers and used by the government reflect those stereotypes.

The automated state is also managerial. 27 See infra section V.C. Far from a product of the law stepping out of the way, the state’s use of algorithmic decisionmaking processes represents the synthesis of the logics (and pathologies) of data-driven governance, risk assessment, public–private partnerships, and procedural compliance, leveraging the power of law and the state to achieve efficiency goals. By orienting algorithmic tools toward the neoliberal goal of targeted governance through risk assessments that are supposed to cover most people most of the time, the law singles out those outside the norm for disproportionate harm. Finally, and again, relatedly, the automated state is structurally subordinating. 28 See infra section V.D. Law infuses the government’s data ecosystem with sex and gender information in a way that is both over- and underinclusive: It is overinclusive because it collects sex and gender data too often when not necessary; it is underinclusive because its reliance on the gender binary excludes transgender, nonbinary, and gender-nonconforming individuals from any of the benefits that could come from data’s capacity to create insight.

This kind of automated state harms gender-diverse populations. But the reification of the gender binary in the automated state is not a niche concern; it harms anyone constrained by strict gender expectations. 29 Feminist scholars have long argued that discrimination on the basis of gender nonconformity should be redressable. See, e.g., Mary Anne C. Case, Disaggregating Gender From Sex and Sexual Orientation: The Effeminate Man in the Law and Feminist Jurisprudence, 105 Yale L.J. 1, 2–4 (1995); Katherine M. Franke, The Central Mistake of Sex Discrimination Law: The Disaggregation of Sex From Gender, 144 U. Pa. L. Rev. 1, 3–5 (1995); Vicki Schultz, Reconceptualizing Sexual Harassment, 107 Yale L.J. 1683, 1774–88 (1998). Plus, those most dependent on government resources and thereby subject to the state’s informational demands will bear the greatest burdens of the state’s automated use of binary gender data streams. 30 Cf. Khiara M. Bridges, The Poverty of Privacy Rights 9 (2017) [hereinafter Bridges, Poverty] (“[P]oor mothers have traded [their privacy] for a welfare benefit.”). This poses a particular problem for members of the LGBTQ+ community, approximately one million of whom are on Medicaid. 31 See Kerith J. Conron & Shoshana Goldberg, Over Half a Million LGBT Adults Face Uncertainty About Health Insurance Coverage Due to HHS Guidance on Medicaid Requirements 1 (2018), https://williamsinstitute.law.ucla.edu/wp-content/uploads/LGBT-Medicaid-Coverage-US-Jan-2018.pdf [https://perma.cc/H7Q3-JS7X]. Nearly half of LGBT people of color live in low-income households. 32 Bianca D.M. Wilson, Lauren Bouton & Christy Mallory, Racial Differences Among LGBT Adults in the US 2 (2022), https://williamsinstitute.law.ucla.edu/wp-content/uploads/LGBT-Race-Comparison-Jan-2022.pdf [https://perma.cc/3RYL-4XK7]. Transgender people are nearly two and a half times more likely than non-transgender people to face food insecurity. 33 Kerith J. Conron & Kathryn K. O’Neill, Food Insufficiency Among Transgender Adults During the COVID-19 Pandemic 5 (2022), https://williamsinstitute.law.ucla.edu​/wp-content/uploads/Trans-Food-Insufficiency-Update-Apr-2022.pdf [https://perma.cc/​G5HE-RSYV]. LGBT people have higher rates of unemployment than the general population. 34 Richard J. Martino, Kristen D. Krause, Marybec Griffin, Caleb LoSchiavo, Camilla Comer-Carruthers & Perry N. Halkitis, Employment Loss as a Result of COVID-19: A Nationwide Survey at the Onset of COVID-19 in US LGBTQ+ Populations, 19 Sexuality Rsch. & Soc. Pol’y 1855, 1860 (2022).

For some scholars and advocates, the solution to these problems is for the state to stop collecting sex and gender data. 35 See, e.g., Lila Braunschweig, Abolishing Gender Registration: A Feminist Defence, 1 Int’l J. Gender Sexuality & L. 76, 86 (2020); Davina Cooper & Flora Renz, If the State Decertified Gender, What Might Happen to Its Meaning and Value?, 43 J.L. & Soc’y 483, 484 (2016); Ido Katri, Transitions in Sex Reclassification Law, 70 UCLA L. Rev. 636, 641 (2023); Anna James (AJ) Neuman Wipfler, Identity Crisis: The Limitations of Expanding Government Recognition of Gender Identity and the Possibility of Genderless Identity Documents, 39 Harv. J.L. & Gender 491, 543 (2016). But as various scholars have shown, legibility comes with benefits as well as risks. 36 See Clarke, supra note 2, at 990 (noting the contextual need for the state to recognize gender diversity in some circumstances); Dean Spade, Documenting Gender, 59 Hastings L.J. 731, 814–15 (2008) [hereinafter Spade, Documenting Gender] (suggesting that the state should continue to collect gender data in the public health context). In the context of racial data, see, e.g., Melissa Nobles, Shades of Citizenship: Race and the Census in Modern Politics, at xi (2000) (arguing that racial data and racial enumeration by censuses advance concepts of race); Clara E. Rodríguez, Changing Race: Latinos, the Census, and the History of Ethnicity in the United States, at xiii (2000) (discussing the need for governmental race data to address past discrimination as balanced against the effect race data have on reification and racial identity); Cassius Adair, Licensing Citizenship: Anti-Blackness, Identification Documents, and Transgender Studies, 71 Am. Q. 569, 570 (2019) (discussing race markers on identification documents in American history and the movement to abolish their use); Nancy Leong, Judicial Erasure of Mixed-Race Discrimination, 59 Am. U. L. Rev. 469, 491–92 (2010) (describing activism in support of adding a multiracial category to the census); Naomi Mezey, Erasure and Recognition: The Census, Race and the National Imagination, 97 Nw. U. L. Rev. 1701, 1713–22 (2003) (evaluating the paradoxical nature of racial classification in the census given the tension between the government’s power to recognize and its power to discipline); Nathaniel Persily, Color by Numbers: Race, Redistricting, and the 2000 Census, 85 Minn. L. Rev. 899, 903 (2001) (discussing the importance of census racial data accuracy for minority electoral representation); Naomi Zack, American Mixed Race: The U.S. 2000 Census and Related Issues, 17 Harv. BlackLetter L.J. 33, 35–37 (2001) (discussing the importance of the introduction of mixed-race identification in the 2000 Census but also identifying continuing problems with governmental classification). I don’t know whether there is a way to get it right, to find the “Goldilocks Zone” for gender, data, and power, especially given the state’s historic commitment to queer oppression and the historical aims of what James C. Scott might call top-down legibility. 37 See Scott, Seeing Like a State, supra note 21, at 65–73. On the state’s orientation toward queer oppression, see generally George Chauncey, Gay New York: Gender, Urban Culture, and the Making of the Gay Male World, 1890–1940 (1994); Jonathan Ned Katz, The Invention of Heterosexuality (2007). On legibility, see Scott, Seeing Like a State, supra note 21, at 65–73. But I would like to try. 
This Article offers a way to navigate the legibility dilemmas triggered by state gender data collection.

The Article’s lessons about the automated state—its persistent reliance on civil servant discretion, its use of stereotypes and perceptions of common sense, its orientation toward efficiency, and its subordinating capacities—suggest that scholars and advocates ignore the liminal space between the law on the books and the law on the ground to our peril. 38 This is known as “gap studies” in the sociolegal literature, and this Article is situated in that intellectual tradition. See Jon B. Gould & Scott Barclay, Mind the Gap: The Place of Gap Studies in Sociolegal Scholarship, 8 Ann. Rev. L. & Soc. Sci. 323, 324 (2012). For sure, we can pass new laws that guarantee an “X” gender marker option; we can also litigate in court when state gender designations discriminate against those outside the gender binary. But “new categories are not enough.” 39 Laurel Westbrook & Aliya Saperstein, New Categories Are Not Enough: Rethinking the Measurement of Sex and Gender in Social Surveys, 29 Gender & Soc’y 534, 535–36 (2015). Nor will a statute “deprogram” a gender binary so embedded in our culture and in the technologies of private and state surveillance. 40 See Rena Bivens, The Gender Binary Will Not Be Deprogrammed: Ten Years of Coding Gender on Facebook, 19 New Media & Soc’y 880, 895 (2017). To protect transgender, nonbinary, and gender-nonconforming individuals from automation-based harms on a more systematic level, we can also develop the state’s “gender competence.” 41 Kevin Guyan, Queer Data: Using Gender, Sex and Sexuality Data for Action 155 (2022). That is, in addition to changing the law on the books, scholars and advocates can also help change how civil servants understand gender data and its value, limits, and powers.

These are the goals of Part VI, which wrestles with the live and pressing questions of the proper role of the state: Should the state ever collect and use gender data? If not, why? If so, how can the state do so in a way that serves the interests of gender-diverse populations rather than its own disciplinary interests? Resolving these questions is beyond the scope of this Article, but in a world in which the state does collect and use gender data, its role should be particularly narrow. Part VI offers three principles, familiar to privacy scholars, for building a future in which government uses of gender data and algorithmic technology foster rather than erode antisubordination goals. A necessity principle urges the state to ask whether it actually needs sex or gender data to achieve its goals and, if it does, to determine which one it needs. An antisubordination principle would limit sex and gender data collection to only those uses that benefit and support greater inclusion of gender-diverse populations. And an inclusivity principle would ensure that once the state decides to collect sex or gender data for emancipatory ends, it does so sensitively and in a contextually inclusive way.

Luckily, privacy law principles of data minimization—that one should only collect as much personal data as is necessary to achieve a stated purpose—and antisubordination—that law should disrupt traditional hierarchies of power enjoyed by data collectors—are capable of doing just that. 42 Scott Skinner-Thompson, Privacy at the Margins 6 (2021) (noting that an antisubordination agenda requires consciousness of classifications and using them to “level up” those disadvantaged by traditional hierarchies of power); Spiros Simitis, Reviewing Privacy in an Information Society, 135 U. Pa. L. Rev. 707, 740 (1987) (“Personal information should only be processed for unequivocally specified purposes. Both government and private institutions should abstain from collecting and retrieving data merely for possible future uses for still unknown purposes.”). Part VI concludes with this Article’s ultimate recommendation: The law on the books and the law on the ground should take gender diversity into account. The state should be able to collect, share, and use sex and gender data only when necessary to support a gender-inclusive antisubordination agenda: to combat discrimination, to provide adequate healthcare, to guarantee benefits that have been traditionally denied, and to enable self-determination for gender-diverse populations.

To date, the law’s role in creating an automated state that binarizes gender data has been mostly hidden from view. It is a puzzle of statutes, rules, interstate compacts, intergovernmental cooperation, procurement, street-level bureaucracy, and managerial policymaking, all of which is summarized in Table 1. This Article pieces that puzzle together. It relies on a mix of primary source materials, including a computationally derived novel dataset of more than 12,000 government forms scraped from state agency websites, documents obtained through public record requests, and first-person interviews with lawyers and government officials.

Table 1. Law and the Binarization of Gender Data, Summary

Law of Data Collection (examples) 43 See infra Part II.

Statutes requiring sex/gender data collection (e.g., security, identity verification, distribution of benefits).

Information primarily gathered through forms created by street-level bureaucrats.

Data binarized by . . .

Mediation by the state, which creates the data.

Perceptions of “common sense” about sex/gender, which govern form design.

Path dependencies, which ensure that forms remain the same over time.

Assumption that gender is a static/secure identifier, which implies gender binary only.

Law of Data Sharing 44 See infra Part III.

Data sharing required to realize security and efficiency benefits.

Data sharing permitted at discretion of state agency leadership.

Interagency agreements.

Interstate compacts.

Data binarized by . . .

Normalization of the binary by dissemination.

Conflation of sex and gender.

Interoperability, which requires all data to look the same.

Law of Data Use 45 See infra Part IV.

Automation mandates.

Efficiency mandates.

Innovation, chief innovation offices.

Procurement.

Trade secrecy.

Privacy law compliance (privacy impact assessments).

Data binarized by . . .

Efficiency mandates, which mean binary design.

Managerialization via innovation offices, which ensures narrow cost–benefit analysis.

No interrogation of design via procurement process.

Symbolic compliance, which weaponizes PIAs to serve automation rather than privacy.