Introduction
The representation of women in law enforcement agencies nationwide has improved since Congress enacted the Civil Rights Act of 1964, in a few cases drastically, but the progress has been uneven and marred by episodes of intentional discrimination.
Changes in the way law enforcement organizations screen applicants for physical ability have been a significant part of the story of increased representation in law enforcement, but even newer physical-ability tests (PATs) have faced substantial scrutiny in the courts under Title VII’s disparate impact prohibition.
To avoid incurring disparate impact liability, some law enforcement agencies use gender-normed PATs.
Gender-normed tests use different cutoff scores for male and female applicants, such that men and women would be expected to pass at equal rates. In other words, a unitary standard would apply the same cutoff to all applicants—say, twenty push-ups for all—but a gender-normed standard would apply different raw cutoffs to men and women—say, fourteen push-ups for women and thirty for men.
Of the law enforcement agencies administered by the Department of Justice (DOJ), at least four use gender-normed physical-ability or strength cutoffs for employment.
At times, courts considering disparate impact claims have blessed gender-norming as a permissible way to retain physical selection devices while adhering to the mandates of antidiscrimination law.
An October 2016 report issued jointly by the DOJ and the Equal Employment Opportunity Commission (EEOC) cited the practice approvingly as a means for law enforcement agencies to mitigate the disparate impact of PATs.
In Bauer v. Lynch, the Fourth Circuit became the first court of appeals to directly consider the permissibility of gender-norming PATs under Title VII.
The court concluded that the gender-norming did not itself constitute a form of discrimination. But to reach that conclusion, it applied the so-called unequal-burdens test, a much-maligned doctrine that had been applied in only one area of Title VII jurisprudence: appearance and grooming standards.
The doctrine as previously understood is at best an uncomfortable fit with the facts of the Bauer case.
Further, Title VII’s plain text explicitly prohibits using different cutoff scores on the basis of race, color, religion, sex, or national origin, which seems at odds with the practice of gender-norming these tests.
The Bauer decision occasions a reexamination of this practice. Part I of this Note surveys the relevant legal backdrop, beginning with the Title VII disparate impact framework and challenges to PATs under that theory. Part II explores whether, as a descriptive matter, gender-normed PATs are permitted under Title VII. The Fourth Circuit’s Bauer opinion provides the jumping-off point for this discussion. This Part concludes that the Bauer court got it wrong and that Title VII does not permit gender-norming absent a valid business justification. Part III provides a normative defense of the doctrinal conclusion reached in Part II. It argues that courts should reject the Fourth Circuit’s reasoning and instead apply a traditional Title VII disparate treatment analysis to gender-normed PATs, requiring the norming, as a distinction based on sex, to be justified as a bona fide occupational qualification. Such an approach would, perhaps counterintuitively, better promote gender justice in the workplace.
I. The Disparate Impact of Physical-Ability Tests
Title VII proscribes two basic forms of discrimination: disparate treatment—decisions and policies that intentionally discriminate on the basis of a protected characteristic—and disparate impact—neutral policies that have discriminatory effects.
An employer seeking to use a PAT must ensure the test avoids both pitfalls or else provide a business justification. Of the two theories, disparate impact has proven the more troublesome hurdle for PATs. Section I.A therefore outlines Title VII’s disparate impact protections, and section I.B reviews how courts have applied this theory to PATs. In sum, Part I describes the legal framework that provides the impetus for employers to adopt the kinds of practices challenged in Parts II and III.
A. Disparate Impact Challenges: The Basic Framework
Almost all Title VII challenges to PATs have been disparate impact claims. Most of these cases don’t involve class-normed tests but rather unitary requirements—single cutoffs that apply, without adjustment, to all test takers.
The main concern of this Note is gender-normed testing that, by definition, should not have a disparate impact on women.
Nonetheless, understanding the relative success of disparate impact challenges to PATs is crucial to understanding both why employers adopt gender-normed tests in the first place and the alternatives available to them.
The Supreme Court first endorsed the availability of a disparate impact cause of action under Title VII in its landmark decision Griggs v. Duke Power Co.
The employment practice at issue in Griggs was a facially neutral education requirement that operated to exclude the vast majority of black workers from desirable promotions and roles.
The Court, acknowledging that the practice may not have been intentionally discriminatory, famously pronounced that “good intent or absence of discriminatory intent does not redeem employment procedures or testing mechanisms that operate as ‘built-in headwinds’ for minority groups and are unrelated to measuring job capability.”
The Court concluded: “Nothing in the Act precludes the use of testing or measuring procedures; obviously they are useful. What Congress has forbidden is giving these devices and mechanisms controlling force unless they are demonstrably a reasonable measure of job performance.”
Thus, facially neutral practices—even those not motivated by discriminatory intent or animus—may nonetheless contravene Title VII if, in effect, they work to arbitrarily exclude protected classes at disproportionate rates.
Congress subsequently codified the disparate impact framework in the Civil Rights Act of 1991.
To make out a prima facie case of disparate impact, a plaintiff must show an employer uses a “particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin.”
Often, disparate impact plaintiffs challenging selection devices make out a prima facie case by satisfying the four-fifths rule; that is, by showing with statistical evidence that the pass rate of the plaintiff’s protected class is less than four-fifths that of the favored class.
Upon such a showing, the burden of persuasion shifts to the employer to show “that the challenged practice is job related for the position in question and consistent with business necessity.”
If the employer does so, the plaintiff can prevail only by showing there exists an alternative practice “that has less disparate impact and serves the employer’s legitimate needs.”
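The four-fifths comparison is simple arithmetic: divide the protected class’s pass rate by the favored class’s pass rate and ask whether the result falls below 0.8. A minimal Python sketch follows. The function names and applicant counts are hypothetical, and the sketch captures only the arithmetic; the Uniform Guidelines treat a ratio below four-fifths as evidence of adverse impact, not as conclusive proof of it.

```python
def impact_ratio(protected_passed, protected_total, favored_passed, favored_total):
    """Protected class's pass rate divided by the favored class's pass rate."""
    protected_rate = protected_passed / protected_total
    favored_rate = favored_passed / favored_total
    return protected_rate / favored_rate


def four_fifths_flag(ratio):
    """True when the ratio falls below four-fifths (0.8)."""
    return ratio < 0.8


# Hypothetical applicant pool: 30 of 100 women and 60 of 100 men pass a unitary PAT.
ratio = impact_ratio(30, 100, 60, 100)
print(round(ratio, 2), four_fifths_flag(ratio))  # 0.5 True
```

With these invented figures the ratio is 0.5, well under the threshold, so a plaintiff could likely rely on them toward a prima facie case; the burden would then shift to the employer to show job relatedness and business necessity.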
The only Supreme Court case to analyze a sex discrimination claim under the disparate impact framework is Dothard v. Rawlinson.
In Dothard, the female plaintiff challenged the Alabama Board of Corrections’s height and weight requirements for correctional officer positions.
The combination of the height and weight requirements would have excluded 41.13% of the female population of the United States; it would have excluded less than 1% of the male population.
Based on the dramatic statistical difference, the Court found that the plaintiff had made out a prima facie case of disparate impact.
The defendants responded that height and weight requirements “have a relationship to strength, a sufficient but unspecified amount of which is essential to effective job performance as a correctional counselor.”
Therefore, the defendants argued, the requirements were sufficiently job related and consistent with business necessity to rebut the prima facie case.
The Court rejected that argument, noting that the defendants “produced no evidence correlating the height and weight requirements with the requisite amount of strength thought essential to good job performance.”
The Court also observed that “[i]f the job-related quality that the appellants identify is bona fide, their purpose could be achieved by adopting and validating a test for applicants that measures strength directly.”
This, the Court said, “would fully satisfy the standards of Title VII because it would be one that ‘measure[d] the person for the job and not the person in the abstract.’”
Thus, the Court’s determination appeared to turn on three deficiencies: first, a lack of evidence tying the selection device (height and weight) to the desired quality (some unspecified amount of strength); second, a lack of evidence showing that the unspecified amount of strength was job related with respect to the correctional officer position; and third, skepticism that the nexus between strength and the selection device was sufficiently close, as reflected in the Court’s suggestion that the defendants adopt a test that “measure[d] strength directly.” This final point could be read as doubt either that the height and weight requirements could possibly be validated as a legitimate way to measure strength or that, even if they were, a future plaintiff would nonetheless be able to show that there exist less discriminatory alternative practices that adequately measure strength.
B. Disparate Impact Challenges to Physical-Ability Tests
Litigants have had relative success in the lower courts challenging PATs under Title VII’s disparate impact prohibition.
Because statistical evidence supporting a prima facie case is usually forthcoming in these cases, the outcome often turns on the application of the job-relatedness–business-necessity defense.
Crucially, neither the Supreme Court nor Congress has clarified precisely what is required to show that a hiring device is “job related and consistent with business necessity,” and the lower courts employ a wide variety of formulations, sometimes depending not only on jurisdiction but also on the type of job in question.
Possible interpretations are bracketed at one end by a very deferential standard, sometimes described as a “manifest relationship,” requiring only that the employer could rationally conclude that the test effectively measured attributes that were important to job success.
Bracketing the other end of the spectrum is what has been called a “minimum qualifications” standard, which requires that the employment test represent an actual floor necessary for successful performance of the job in question.
Some courts have even suggested a bifurcated approach, in which positions that implicate “safety concerns” are scrutinized less closely than those that do not.
The EEOC’s Uniform Guidelines take a middle road, requiring that practices be “reasonable and consistent with normal expectations of acceptable proficiency.”
The upshot is that courts considering physical-ability requirements apply a panoply of standards to the typically dispositive prong of the disparate impact analysis.
Unlike the disparate impact cases, which challenge the effect of a requirement as discriminatory, cases in the Bauer mold challenge the practice of norming itself as a form of discrimination.
Yet the disparate impact cases form the doctrinal backdrop against which employers turn to gender-normed tests like that challenged in Bauer, because gender-norming a PAT provides employers a potential solution to the threat of Title VII disparate impact liability by equalizing pass rates. Some of the courts addressing disparate impact claims have affirmatively suggested gender-norming as a permissible alternative to unitary standards. For example, in Lanning v. Southeastern Pennsylvania Transportation Authority, the Third Circuit suggested in dicta that the Southeastern Pennsylvania Transportation Authority (SEPTA) “institute a non-discriminatory . . . [aerobic] test that would exclude 80% of men as well as 80% of women through separate aerobic capacity cutoffs for the different sexes.”
This, the court asserted, would “help SEPTA achieve its stated goal of increasing aerobic capacity without running afoul of Title VII.”
Indeed, a test that equalizes pass rates does not run afoul of Title VII’s disparate impact prohibition.
But whether it runs afoul of Title VII’s disparate treatment prohibition is another question entirely, to which Part II now turns.
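The Third Circuit’s suggestion in Lanning amounts to setting each sex’s raw cutoff at the point below which about 80% of that sex’s applicants fall, so that the same share of each group is excluded. The Python sketch below illustrates that mechanism under stated assumptions: the aerobic-capacity scores are invented, the function names are hypothetical, and the simple sorted-index percentile is one of several possible methods, not one attributed to SEPTA or the court.

```python
def sex_specific_cutoff(scores, exclude_fraction):
    """Return a raw cutoff that excludes about `exclude_fraction` of one group.

    Applicants pass when score >= cutoff. With tied scores at the cutoff,
    the actual pass rate can exceed the target.
    """
    ordered = sorted(scores)
    excluded = round(len(ordered) * exclude_fraction)  # how many fall below the cutoff
    excluded = min(excluded, len(ordered) - 1)  # keep the index inside the list
    return ordered[excluded]


def pass_rate(scores, cutoff):
    """Share of a group scoring at or above the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


# Invented aerobic-capacity scores for ten male and ten female applicants.
men_scores = [38, 40, 41, 42, 44, 45, 46, 48, 50, 52]
women_scores = [30, 32, 33, 34, 35, 36, 37, 39, 41, 43]

men_cutoff = sex_specific_cutoff(men_scores, 0.8)
women_cutoff = sex_specific_cutoff(women_scores, 0.8)
print(men_cutoff, women_cutoff)  # 50 41
print(pass_rate(men_scores, men_cutoff), pass_rate(women_scores, women_cutoff))  # 0.2 0.2

# For contrast, a single unitary cutoff of 45 applied to both groups.
print(pass_rate(men_scores, 45), pass_rate(women_scores, 45))  # 0.5 0.0
```

The separate cutoffs equalize pass rates at 20% for each group, while the single cutoff of 45 passes half the men and none of the women. That contrast is why norming removes the disparate impact problem, and also why it raises the disparate treatment question: the two groups face different raw cutoffs.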
II. Gender-Normed Physical-Ability Tests Under Title VII
Employers wishing to use PATs yet wary of potential disparate impact liability sometimes use gender-normed tests. Unlike unitary standards, which apply a single cutoff across the board, gender-normed tests set different raw cutoff scores for male and female applicants. They thereby circumvent the disparate impact problem described in Part I because male and female applicants pass at roughly equal rates. Most law enforcement organizations in the United States use PATs; one study estimated that of those, just under a third are gender-normed.
This Part assesses the legality of gender-norming under Title VII, considering the Fourth Circuit’s recent decision in Bauer v. Lynch, the first court of appeals case to address the issue. Section II.A lays out the factual background of Bauer, illustrating how and why the FBI adopted a gender-normed test. Section II.B provides a brief overview of three threads of Title VII disparate treatment doctrine considered by the Bauer court: the Supreme Court’s decision in City of Los Angeles Department of Water & Power v. Manhart, the unequal-burdens doctrine, and the statute’s prohibition on adjusting test scores. Section II.C then critiques the Bauer court’s reasoning and concludes that Title VII does not, in fact, permit gender-normed PATs absent a valid business justification, contrary to the Fourth Circuit’s conclusion. Finally, section II.D explains why the defense first articulated by the Supreme Court in Ricci v. DeStefano cannot excuse an employer’s use of a gender-normed test.
A. Bauer v. Lynch: Factual Background
Jay J. Bauer was a recent graduate from a Northwestern University master’s program when the United States was attacked on September 11, 2001.
Deeply moved by those events, he applied to join the FBI’s Special Agent program.
The Special Agent program is one of several possible career tracks within the FBI’s “Operations and Intelligence” branch; the others include “Intelligence Analysts,” “Surveillance,” “Forensic Accounting,” and “Foreign Languages.”
The diverse duties of a Special Agent can include “work[ing] on matters including terrorism, foreign counterintelligence, cyber-crime, organized crime, white collar crime, public corruption, civil rights violations, financial crime, bribery, bank robbery, extortion, kidnapping, air piracy, interstate criminal activity, fugitive and drug trafficking matters, and other violations of federal statutes.”
Admission to the Special Agent program also opens the door to an array of even more specialized professional opportunities. Special Agents may apply to join selective, elite “mission-centric” units, like the Hostage Rescue Team, SWAT, Special Agent Bomb Tech Program, and the Operational Medic Program.
The FBI rejected Bauer’s initial application in 2001, finding his prior work experience lacking.
Bauer returned to school, received a Ph.D., and began work in academia.
In 2008, he reapplied to the FBI; this time, the FBI expressed interest in Bauer’s candidacy, and he began the arduous applicant-screening process.
Bauer excelled during the screening process, which includes several written examinations and oral interviews designed to measure characteristics like communication skill, cognitive capacity, judgment, integrity, and problem-solving ability.
The last step in the application process is the Physical Fitness Test (PFT).
The PFT consists of four events: sit-ups, a 300-meter sprint, push-ups, and a 1.5-mile run.
Based on an internal study, the FBI gender-normed the minimum benchmarks for each of the events to “account for . . . innate physiological differences.”
For example, the PFT required male applicants to complete thirty push-ups but required female applicants to complete only fourteen push-ups.
None of these requirements emulated a particular task required of Special Agents.
Rather, the FBI believed that an applicant’s test results reflected his or her “physical fitness level” and that physical-fitness level was either generally necessary to job performance or, alternatively, a strong indicator that the applicant could complete Special Agent training without injury.
Bauer initially failed the push-up portion of the PFT but passed on his second attempt.
Accordingly, the FBI admitted him to its Special Agent training program at Quantico. Trainees at Quantico must pass each of the twenty-two-week program’s four components.
The components are “academics; firearms training; practical applications and skills; and defensive tactics and physical fitness.”
Trainees must then repass the same PFT administered at the screening stage.
Bauer excelled in training in every area except one: the push-up portion of the PFT.
His peers even selected him president of his class and spokesperson for graduation, yet he was simply unable to complete the thirty push-ups in his five attempts, despite having once passed the test at the screening stage.
On his final try, Bauer fell just one push-up short.
As a result, the FBI told Bauer he had three options: resign and leave open the possibility of future employment with the FBI, resign permanently, or be fired.
Bauer filed suit, alleging that the PFT violated Title VII.
B. The Disparate Treatment Challenge
Unlike the PAT challenges described in Part I, Bauer’s challenge did not attack the FBI’s test because it disproportionately impacted a protected class. Instead, Bauer argued the test contravened Title VII on two other bases: first, that gender-norming the PFT constituted impermissible disparate treatment on the basis of sex; and second, that gender-norming the PFT violated Title VII’s prohibition on the use of different cutoff scores.
1. The Manhart Simple Test. — The basic framework of a disparate treatment challenge under Title VII to a facially discriminatory practice is straightforward.
The statute states, in relevant part, that it is “an unlawful employment practice for an employer . . . to discriminate against any individual . . . because of such individual’s . . . sex.”
Thus, in a disparate treatment challenge, a plaintiff must first show that a decision or policy was made “because of” sex.
Once the plaintiff has done so, a defendant may still prevail if sex is “a bona fide occupational qualification reasonably necessary to the normal operation of that particular business or enterprise.”
Because facially discriminatory policies almost always of their own force suffice to show a decision “because of sex,” cases challenging facially discriminatory policies typically turn on the application of the bona fide occupational qualification (BFOQ) defense.
The Supreme Court’s decision in City of Los Angeles Department of Water & Power v. Manhart elaborating this general framework is especially relevant to Bauer. In Manhart, the Court considered whether the pension-contribution policy of the City of Los Angeles, which required higher contributions from women than from men, constituted disparate treatment.
The city reasoned that women as a class tend to live longer than their male counterparts, and thus, female retirees would on average earn more income from the pension fund.
The Court, accepting as fact that longevity is an empirically proven difference between the sexes, nonetheless rejected the city’s policy as impermissible sex discrimination under Title VII.
It said, in relevant part:
The statute’s focus on the individual is unambiguous. It precludes treatment of individuals as simply components of a racial, religious, sexual, or national class. If height is required for a job, a tall woman may not be refused employment merely because, on the average, women are too short. Even a true generalization about the class is an insufficient reason for disqualifying an individual to whom the generalization does not apply.
Thus, the Manhart Court appeared to definitively foreclose reliance on classwide generalizations—even those supported by reliable statistical evidence—as a legitimate basis upon which to distinguish between the sexes, unless that distinction could be justified as a BFOQ. In doing so, it announced a “simple test” for disparate treatment: “whether the evidence shows ‘treatment of a person in a manner which but for that person’s sex would be different.’”
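Manhart’s simple test is a legal standard, not an algorithm, but its counterfactual structure can be sketched: hold an applicant’s performance fixed, change only the applicant’s sex, and compare the outcomes. The Python sketch below applies that comparison to the thirty and fourteen push-up minimums of the FBI’s PFT described in section II.A. The function names are hypothetical, and the sketch deliberately ignores the BFOQ defense and every other part of the legal analysis.

```python
# Raw push-up minimums of the FBI's PFT as described in Bauer.
PUSH_UP_MINIMUM = {"male": 30, "female": 14}


def passes(push_ups, sex):
    """Whether a raw push-up count meets the minimum set for the given sex."""
    return push_ups >= PUSH_UP_MINIMUM[sex]


def outcome_depends_on_sex(push_ups):
    """Hold performance fixed, vary only sex, and compare outcomes.

    True means the same raw performance yields different results, i.e.,
    the applicant would be treated differently but for sex.
    """
    return passes(push_ups, "male") != passes(push_ups, "female")


# Bauer's final attempt fell one push-up short of thirty.
print(passes(29, "male"), passes(29, "female"), outcome_depends_on_sex(29))  # False True True
```

On these minimums, any applicant who completes between fourteen and twenty-nine push-ups would pass or fail depending solely on sex, which is the kind of difference in treatment the simple test asks about.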
2. An Exception to the Simple Test: The Unequal-Burdens Doctrine. — Generally, Manhart’s “simple test” applies in challenges to facially discriminatory policies, and cases turn on the application of the BFOQ defense.
One exception to this general principle is the “unequal-burdens doctrine.” Since at least the early 1970s, plaintiffs have sought to use Title VII to protect against sex-differentiated appearance and grooming standards.
And from the early years of these challenges until the Supreme Court’s Price Waterhouse v. Hopkins decision, lower courts responded almost entirely in one voice: Sex-differentiated appearance standards that apply “equal burdens” are not a form of sex discrimination under Title VII, and employers need not justify them as BFOQs.
Under this line of reasoning, sex-differentiated appearance standards impose permissible “equal” burdens if the standards are similarly costly, consistent with community norms, and not based on stereotypical notions of differences between the sexes.
The second and third of these requirements are in some tension. Thus, after the Supreme Court’s decision in Price Waterhouse, which recognized sex stereotyping as actionable under Title VII, some courts and commentators expressed doubt about the continued vitality of the unequal-burdens doctrine. In particular, commentators argued that Price Waterhouse’s prohibition on sex stereotyping could not tolerate sex-differentiated grooming standards because these standards typically prescribe conformity with socially constructed, sex-differentiated norms.
Some courts, including most prominently the Ninth Circuit in Jespersen v. Harrah’s Operating Co., held that Price Waterhouse did not affect the unequal-burdens line and that a plaintiff challenging a sex-differentiated grooming standard must show either an impermissible sex stereotype (as distinguished from, apparently, a permissible or de minimis sex stereotype) or unequal burdens in the form of different costs of compliance.
In both form and substance, the unequal-burdens doctrine is anomalous. In broad strokes, the structure of Title VII is that a plaintiff may raise the presumption of impermissibility either by showing a formal classification or intentional discrimination (disparate treatment) or by pointing to the discriminatory effects of an otherwise neutral policy (disparate impact).
The unequal-burdens doctrine “turn[s] Title VII on its head” by requiring a plaintiff to show not only formal or intentional discrimination but also discriminatory effects.
The Manhart Court repudiated this structural inversion, albeit in a different factual context, when it rejected the City of Los Angeles’s argument that its facially discriminatory policy had created no discriminatory effect.
Further, the doctrine is substantively incompatible with Title VII’s prohibition on sex stereotyping, as embodied in Price Waterhouse, because it perpetuates and fortifies sex stereotypes by allowing sex-differentiated appearance standards only when the distinctions reflect “generally accepted community standards of dress and appearance.”
Moreover, by measuring “burdens” only by monetary costs, courts applying the unequal-burdens doctrine fail to account for the other weighty interests at stake in grooming and appearance cases.
Lower courts justified this apparently atextual exception to Title VII’s prohibition on disparate treatment by interpreting “sex” as embracing only immutable characteristics.
This line of reasoning distinguished appearance and grooming standards, such as uniforms, makeup, and hairstyling, as mutable and therefore outside Title VII’s purview. While judges and commentators have disagreed about how persuasive this justification is, all courts had at least agreed in one respect: If the unequal-burdens doctrine applies at all, it applies only to appearance and grooming standards. That consensus held until the Fourth Circuit’s decision in Bauer.
3. Title VII’s Score-Norming Provision. — The final relevant strain of applicable law concerns a less frequently litigated provision of Title VII that prohibits employers from “adjust[ing] the scores of, us[ing] different cutoff scores for, or otherwise alter[ing] the results of, employment related tests on the basis of race, color, religion, sex, or national origin.”
While the legislators who enacted this provision were principally concerned with race-norming, the text proscribes gender-norming as well.
In fact, after the adoption of the Civil Rights Act of 1991, which added this provision to Title VII, the Cooper Institute, the preeminent developer of PATs in the United States, wrote to law enforcement agencies to state its understanding that the amendment proscribed gender-norming.
The Department of Justice then repudiated this interpretation.
Courts analyzing gender-normed PATs generally failed to explain at any length why the practice was doctrinally permissible under the post-1991 framework, other than to assert—in seeming conflict with the holding of Manhart—that distinctions on the basis of “undeniable” physical differences between the sexes were permitted under Title VII.
C. Assessing the Fourth Circuit’s Reasoning
The district court in Bauer denied the government’s motion for summary judgment and found that the different cutoff scores for men and women constituted disparate treatment.
It applied the Manhart “simple test”: “[D]iscrimination appears ‘where the evidence shows treatment of a person in a manner which but for that person’s sex would be different.’”
The Fourth Circuit then reversed the district court’s decision. It held that gender-norming the PFT did not constitute impermissible disparate treatment or violate Title VII’s prohibition on different cutoff scores and therefore did not need to be justified as a BFOQ, so long as the different raw scores represented the same gender-normed fitness level.
It then remanded the case to the district court to determine whether the test did in fact impose equal burdens on each class.
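The premise that thirty push-ups for men and fourteen for women represent “the same” fitness level can be pictured as norming each raw score against that sex’s own score distribution. The Python sketch below assumes, purely hypothetically, that each distribution is summarized by a mean and a standard deviation; the parameter values are invented and were chosen only so that a single standardized level reproduces the PFT’s raw minimums. Nothing in the opinion indicates that the FBI’s internal study used this method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreDistribution:
    """Hypothetical summary of one sex's raw push-up scores."""
    mean: float
    sd: float


def raw_cutoff_for_level(dist, z):
    """Raw score lying z standard deviations from this group's mean."""
    return dist.mean + z * dist.sd


def normed_level(dist, raw):
    """Standardized (z-score) position of a raw score within this group."""
    return (raw - dist.mean) / dist.sd


# Invented parameters, chosen only so that z = -0.5 reproduces the PFT minimums.
MALE = ScoreDistribution(mean=35.0, sd=10.0)
FEMALE = ScoreDistribution(mean=17.0, sd=6.0)

common_level = -0.5  # one "normed fitness level" applied to both sexes
print(raw_cutoff_for_level(MALE, common_level), raw_cutoff_for_level(FEMALE, common_level))  # 30.0 14.0

# The same raw performance sits at very different normed levels by sex.
print(normed_level(MALE, 29), normed_level(FEMALE, 29))  # -0.6 2.0
```

One normed level yields two different raw cutoffs, and the same raw performance (twenty-nine push-ups) maps to very different normed levels depending on sex. The sketch says nothing about whether any normed level is related to job performance.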
Noting that the appeal involved a “relatively novel issue,” the Fourth Circuit began by setting out what it saw as the “pertinent legal authorities.”
First, it described Manhart’s simple test, relied on by the district court.
Then, the court turned to what it described as an alternative to Manhart’s simple test: the “no greater burden” test, derived from the unequal-burdens line of grooming and appearance cases.
The court reasoned that the district court was wrong to apply Manhart’s simple test because “[m]en and women simply are not physiologically the same for the purposes of physical fitness programs.”
This conclusion, the court asserted, was bolstered by the Supreme Court’s dicta in United States v. Virginia (VMI), which recognized in the context of an equal protection challenge that the Virginia Military Institute might adjust fitness standards to accommodate newly admitted female cadets.
The Bauer court’s reliance on the unequal-burdens doctrine is remarkable. The historic limitation of the doctrine to grooming and appearance standards is essential to its “mutability” justification and has been strictly observed.
Thus, courts considering challenges to sex-differentiated weight requirements have at times expressed hesitation about applying the doctrine to a context that is arguably on the outermost limits of “mutability.”
And while courts have occasionally found differential weight requirements acceptable under an equal-burdens logic, they have done so when the requirement reflected an employer’s aesthetic preference, not when it was used as a proxy for strength or some other quality.
The Bauer court departed dramatically and unceremoniously from this history. The court applied the doctrine, in its own words, because of an innate—read: immutable—difference between the sexes.
But courts have traditionally justified the doctrine’s application to grooming and appearance standards precisely because these standards arguably don’t discriminate on the basis of immutable characteristics.
Unmoored from its traditional limitations and justifications, the doctrine is an atextual and unprincipled enigma. Even if one finds the mutability–immutability justification unsatisfactory, extending the unequal-burdens doctrine to new factual contexts only compounds the problem. It is also at odds with the oft-repeated principle that courts should narrowly interpret exceptions to Title VII liability.
One possible counterargument to this objection would reframe the unequal-burdens doctrine’s prior applications into two categories. The first category includes cases that hold that appearance or grooming standards for men and women may be substantively different so long as they impose analogous, in the sense of equally costly, burdens. For example, courts have ruled that employers may require female employees to wear makeup if there is a corresponding, but qualitatively different, requirement for male employees—for example, a requirement that they remain clean-shaven.
By comparison, a requirement that female employees wear a uniform with no corresponding requirement whatsoever for male employees would not be acceptable.
The second category of cases posits a somewhat different theory: Though the raw scores, cutoffs, or requirements imposed on men and women are quantitatively different, they impose equal, in the sense of qualitatively “the same,” requirements on men and women. Cases dealing with weight requirements—though in fact descended from the mutability justifications—could be reframed as fitting into this distinct category, along with Bauer. In this revisionist account, the first class (the substantively-different-but-equally-costly cases) relies on the mutability–immutability distinction, is most compromised by Price Waterhouse’s reasoning, and is most vulnerable to criticism on the normative grounds identified by commentators.
The second class (the formally-different-but-qualitatively-the-same cases, including Bauer) is justifiable on the distinct ground that the defendants are in substance imposing the same requirements on men and women. This understanding resolves the tension between the Bauer court’s stated justification (innate physiological differences) and the traditional justification for applying the unequal-burdens doctrine (mutable characteristics). It is also consistent with the Bauer court’s reliance on Gerdom v. Continental Airlines, Inc., a case challenging differential weight requirements.
But this understanding raises other questions. The cutoffs in Bauer imposed “the same” qualitative burden on male and female applicants only once normed to the applicants’ classes. Thus, the Bauer court’s reasoning relied on its determination that “real physiological differences” between the sexes prevent employers from measuring certain qualities (“fitness level” in Bauer) without reference to applicants’ sex.
However, Manhart seems to foreclose the position that a physiological-differences rationale can negate a prima facie case.
The Bauer court distinguished Manhart, or at least justified setting aside its simple test in favor of the unequal-burdens analysis, based on the observation that physiological differences exist between the sexes.
Yet, the key question in Manhart was whether a “physiological” difference between the sexes—longevity—could justify adjustments in conditions of employment.
The Manhart Court plainly rejected that argument, reasserting the focus on the individual envisioned by Title VII.
The Bauer court failed to explain why this pronouncement did not squarely address and dispose of the defendant’s theory on the import of statistical “physiological” discrepancies.
Perhaps one could reconceptualize the Manhart decision as rejecting the employer’s reliance on physiological differences because of the particular generalization and practice at issue. In other words, Manhart could be understood as saying that the nexus between the practice—in essence, making women go home with smaller paychecks at the end of the day—and the physiological difference—longevity—is too remote. But this reading seems flatly contrary to the language of the opinion. More fundamentally, distinguishing Bauer on these grounds only begs the question. The core issue in Bauer was whether an employer who uses a gender-normed PAT in hiring should be required to show some nexus between the test and the job at issue to justify the practice.
Under the Bauer framework, an employer using a normed test with no disparate impact would not have to justify the practice either as job-related and consistent with business necessity or as a BFOQ. Thus, the nexus between the physiological difference and the consequence could be as remote as in Manhart.
The Bauer court’s proposed framework is also susceptible to abuse. The court accepted as “undeniable” the proposition that there exists some abstract concept of “fitness” that (a) is related to trainees’ performance and (b) can be measured only on a normed basis.
But the court did not cite any evidence presented by either party on these points, and there is reason to doubt those assumptions. For example, one study of female and male army trainees indicates that fitness level is in fact an important indicator of injury risk, the harm the FBI sought to avoid.
Yet the study defined fitness level by raw, non-normed scores and found gender to be an insignificant predictor of injury risk when controlling for fitness level.
Thus, a woman with a given 1.5-mile run time was as likely to be injured as a man with the same run time, not with the same gender-normed performance.
It also indicated, consistent with several other studies, that although women entered the training program in significantly worse physical shape than men, female trainees made much bigger fitness gains during the training program.
This study compromises two of the Bauer court’s factual assumptions: that “fitness level,” as measured relative to one’s gender class, is a relevant indicator of injury risk, and that differences in fitness between male and female trainees are necessarily or entirely attributable to innate physiological differences. These may seem like small quibbles with the facts of the Bauer case, and surely a single study is not dispositive of the issue; yet, this finding illustrates how courts applying the unequal-burdens doctrine to “physiological differences” cases might rely on erroneous assumptions about which differences are “real” or relevant. An employer seeking to defend a practice under the Bauer framework need only cook up some abstract construct and assert that the construct can be measured only on a gender-normed basis to escape Title VII’s requirement of a business justification.
The Bauer court also argued that the same unequal-burdens reasoning that justified its finding that there was no disparate treatment also justified its finding that the policy did not contravene Title VII’s cutoff-score provision.
Yet the cutoff-score provision would be meaningless if it required only that individuals receive the same score relative to their respective classes. Indeed, Congress adopted the cutoff-score provision specifically because of its concern with norming.
Instead of addressing this question head-on, the court seemed to imply that its observation about “physiological differences” justified this conclusion as well.
All in all, the Bauer court’s reasoning fails to explain how Title VII permits gender-normed PATs without any valid business justification. The unequal-burdens test is an atextual, normatively problematic branch of Title VII doctrine, based on the shaky distinction between “mutable” and “immutable” characteristics. This already-tenuous justification cannot explain the extension of the doctrine to embrace a distinction based on “innate” physiological differences. Further, the court’s reliance on physiological differences seems directly contrary to the Supreme Court’s holding in Manhart. Finally, the facts of Bauer demonstrate how the doctrine is susceptible to abuse and how it encourages courts to rely on armchair empiricism about which physiological differences are real or relevant.
D. The Ricci Defense
As something of an aside, one final strain of Title VII doctrine deserves brief attention. The district court in Bauer suggested that the defense first articulated in Ricci v. DeStefano might alternatively provide an out for employers using gender-normed tests. Because the FBI had not raised a Ricci defense, the district court did not examine its applicability at length, and the Fourth Circuit did not consider it at all.
Yet it is doubtful that Ricci can or should apply here.
In Ricci, white firefighters challenged the decision of the city of New Haven, Connecticut, to throw out the results of two promotion exams after their administration.
The city decided to do so because the exams had had an unexpectedly extreme disparate impact on African American applicants.
The Court’s decision proceeded in several steps: First, it assumed without discussion that the decision to throw out the tests constituted disparate treatment.
Second, it held that such a decision could be justified only if there was a “strong basis in evidence” of disparate impact liability but for the decision.
Finally, it applied each of the disparate impact prongs (prima facie case, job relatedness–business necessity, and alternative, less discriminatory practice) to the hypothetical case-within-a-case to assess whether the defendant would have been liable had it not thrown out the exams.
Upon finding there was no strong basis in evidence for the potential disparate impact liability, the Court invalidated the decision to throw out the test results.
There are at least two reasons why Ricci should not or would not apply to a case like Bauer.
First, in Ricci, the original test had been administered and the results received; New Haven’s attempt to avoid disparate impact liability was based on these concrete results. The Court was therefore able to assess on the record before it whether the employer would have been liable but for its decision to scrap the test results. In Bauer, by contrast, it’s not clear what the relevant counterfactual for the case-within-a-case would be. Would a court simply assume that the employer would otherwise adopt a unitary standard as stringent as the higher of the two cutoffs? The lower of the two? Or would it look to the test the employer used prior to the gender-normed approach, no matter how far back in history it must go or how closely related that former test is to the challenged one? The point is that Ricci works only if there is a factual baseline on which to judge the hypothetical disparate impact liability.
Second, a more fundamental issue is whether Ricci can be stretched so far as to permit the adoption of a test that itself is a form of disparate treatment. In Ricci, the problematic practice was the (one-time) decision to throw out the original test after administration, not the new test adopted after that decision. But in Bauer and cases like it, the problematic practice is the test itself.
It would radically expand Ricci’s reach to apply it to excuse this kind of ongoing disparate treatment. Taken to these extremes, Ricci could be read to excuse any decision or policy adopted “because of” a protected characteristic, so long as there is some imaginable counterfactual that would give rise to disparate impact liability.
III. A Solution to the Gender-Normed Physical-Ability Test Problem: A Return to Title VII First Principles
The analysis in Part II suggests that, contrary to the Bauer court’s reasoning, Title VII does not permit the use of gender-normed PATs absent a valid business justification. This Part considers whether this is a desirable state of affairs. Section III.A argues that the Bauer court’s approach is harmful to women’s equality in the workplace and that requiring employers to put forth a business justification for the use of gender-normed PATs helps root out entrenched stereotypes about the primacy of masculinity in traditionally male job sectors. Section III.B then suggests that applying the traditional framework for disparate treatment claims—in other words, requiring a demonstration that the desired characteristic is a bona fide occupational qualification—best addresses this concern. Yet, the application of the disparate treatment framework to gender-norming must be paired with a demanding business-necessity standard in the corresponding disparate impact challenges to unitary PATs. A rule that requires a similarly rigid business justification under either a disparate impact theory or a disparate treatment theory would incentivize employers to tailor their physical hiring practices more closely to the actual demands of the job, thereby discouraging arbitrary practices that either promote harmful stereotypes or have discriminatory effects.
A. An Antisubordination Critique of the Bauer Approach
Scholars have framed debates over the proper approach to antidiscrimination law as a conflict between the anticlassification and antisubordination traditions.
The core claims of the anticlassification approach are that distinctions on the basis of a protected characteristic are virtually never permissible and that facially neutral practices are virtually always permissible, so long as they are not mere pretexts for invidious discrimination.
It is therefore sometimes described as the “colorblindness” principle when applied to race discrimination and is associated with the conservative wing of the Supreme Court.
In contrast, the antisubordination approach holds that antidiscrimination law should aim to combat the historic and systematic subordination of certain classes and that distinctions on the basis of race, sex, or other protected characteristics are not objectionable if they seek that end.
It is associated with the liberal wing of the Supreme Court.
Recently, Professor Reva Siegel has proposed replacing this traditional dyadic model with a triadic model, recognizing a third position: the antibalkanization principle.
Unlike the anticlassification perspective, this approach recognizes a moral and legal difference between those measures adopted to promote equality and those adopted to reinforce inequality; yet, unlike the antisubordination perspective, it gives credence to concerns about social cohesion and prefers state action that is neutrally structured to mitigate social provocation.
Professor Siegel divines this approach from the opinions of “swing” Justices, including Justices Powell, O’Connor, and Kennedy.
Title VII challenges advanced by favored-group members like Jay Bauer (a man bringing a sex discrimination claim) pose distinctive problems for antisubordination theories.
These so-called “reverse discrimination” suits—which make up a significant portion, if not the majority, of recent challenges to law enforcement hiring—seem to pit interests in substantive equality, the chief antisubordination concern, against formal equal treatment, the chief anticlassification concern. Indeed, the use of different cutoffs is flatly unacceptable from an anticlassification perspective. The legislators who championed Title VII’s cutoff-score provision spoke in essentially anticlassificationist terms, and the Bauer decision drew anticlassificationist criticism from right-leaning media sources because of the perceived hypocrisy of treating sex discrimination claims by men differently than those by women.
Thus, the Bauer dilemma might, at first blush, seem like a neat microcosm of the debate between the antisubordination and anticlassification approaches. The following analysis attempts to complicate that understanding.
In a simplistic sense, gender-norming can open doors for women. Compared against the baseline of a unitary standard employing the higher of the two possible cutoffs—for example, thirty push-ups in Bauer—gender-norming may permit more women to access certain employment opportunities. Nonetheless, gender-norming PATs can meaningfully, though indirectly, contribute to the subordination of women in the workplace, particularly in traditionally male-dominated occupations. The practice insulates from judicial scrutiny employment selection devices that privilege masculine physicality, even though those devices may have little to do with the job at issue. In the law enforcement context, these practices arbitrarily buttress the stereotype that women are innately less capable of successfully filling roles that implicate public safety.
This stereotype negatively and concretely impacts women’s application to, retention in, and promotion from these roles.
Under the Bauer framework, employers may use selection devices that have no relation whatsoever to the job at issue, so long as the employers have successfully normed the relevant cutoff scores. To illustrate, an employer seeking to fill a role that requires no physical strength at all—say, an accounting job—could require male and female applicants to pass a gender-normed push-up test without contravening Title VII. On the other hand, if an employer administered a unitary push-up test that had a statistically disparate impact, the employer almost certainly could not justify it as job related and consistent with business necessity.
And even if the employer could somehow overcome that barrier, the plaintiff could still prevail by showing the existence of a less discriminatory alternative.
The accountant hypothetical seems absurd because push-ups are so obviously unrelated to accounting, but this hypothetical, exaggerated as it may be, is not quite as dissimilar to law enforcement as it appears. In the words of Professor Mary Anne Case, “The job of police officer is one whose history of being gendered masculine is virtually unsurpassed.”
Yet the relationship of strength and speed to successful law enforcement is unclear at best, and the literature on the relation of physical selection devices to performance in law enforcement roles is mixed.
For many, the idea of policing conjures images of dramatic physical work, but the bulk of police work doesn’t involve intense physical tasks like foot pursuits—indeed, modern policing standards often recommend against them—and doesn’t (or perhaps shouldn’t) routinely involve violent confrontations.
What it does involve is crisis de-escalation; communication with victims, suspects, and communities; clerical work; and patrol.
Even when police officers are called on to address violent crime, strength and speed requirements are still suspect predictors of performance. Some advocates of a more gender-balanced police workforce emphasize that most violent crime to which officers respond is male-on-female domestic violence and that women as a class are better at responding effectively to these situations.
A related body of scholarship also suggests that, on average, female police officers are more adept at avoiding violent confrontations in the first instance.
These accounts admittedly speak in an essentialist register that should engender skepticism. But the claim here is not that women make better police officers or different “kinds” of police officers. Rather, the claim is a more modest one: that courts should not simply assume that masculine-coded traits like strength and speed are necessary to effective performance of a job—even one as seemingly familiar as law enforcement—without demanding some evidence that that is so.
Anecdotal evidence also illustrates this point. As discussed in section I.B, PATs frequently fail the job-relatedness–business-necessity test in disparate impact challenges, showing that it is hardly farfetched to believe that employers are using invalid physical selection devices.
Bauer too is illustrative. In Bauer, the FBI justified its PFT on two bases: First, it argued that the test was important to strong on-the-job performance.
The district court rejected this justification, noting that the absence of any physical test for incumbent Special Agents belied the argument that the PFT was necessary to success in that role.
In the alternative, the FBI argued the PFT was a screening device intended to filter out applicants prone to injury during the training program.
Evidence showing that the FBI initially developed the PFT in response to high injury rates partially corroborated this claim.
Yet, Bauer passed the pre-training PFT.
It was the post-training test that he was unable to pass.
In other words, he had already avoided training injury by the time he failed, rendering the FBI’s justification nonsensical.
PATs’ often suspect relation to job performance is especially significant because these tests frequently demand physical qualities like strength and speed that code as masculine, as opposed to physical qualities like flexibility and endurance that code as feminine or gender neutral.
This poses a significant impediment to employment equality.
First, it can deter female job seekers from applying to these jobs.
Second, the use of these tests emphasizes qualities that women are perceived to possess in lesser amounts than men, generating and perpetuating the view that women are inherently less qualified to serve in these roles.
In the law enforcement context, a significant body of literature shows that these stereotypes undermine policewomen’s credibility among peers and the communities they serve, as well as their professional advancement and work satisfaction.
When applied to gender-normed tests, this latter critique may partially sound in antibalkanization. The antibalkanization theory suggests that preferential treatment exacerbates interclass resentment and erodes social cohesion. This, in turn, undermines the advancement of protected classes, whose achievements are tainted with the odor of paternalistic preference.
Thus, on an antibalkanization account, gender-norming might stunt policewomen’s success by arousing resentment among peers and superiors. But the critique also has an important antisubordination angle. On this account, the trouble with gender-normed tests is that they shield employers’ potentially arbitrary use of hiring practices that reinforce harmful and unnecessary stereotypes about women’s ability to perform in the workplace. Further, gender-normed tests arguably amplify the problem by emphasizing in one breath the importance of masculinity to success while reminding in the next breath that women can’t possibly measure up.
With that said, one need not think that police work doesn’t require much strength or speed to believe the antisubordination critique of Bauer. The point is simply that the costs described above are justifiable only if the job at issue actually requires the physical qualities for which the employer tests. Put differently, one can, consistent with this view, believe (1) that employers should adopt only those physical selection devices that accurately predict job performance and (2) that gender-normed law enforcement PATs are likely to pass this test. The question is merely whether employers should have to show the relation at all.
Finally, to be clear, non-job-related hiring procedures that emphasize traditionally masculine or feminine qualities do not alone violate Title VII absent either impermissible disparate treatment or disparate impact. Title VII requires business justification not in the first instance but rather in response to a prima facie case of discrimination.
Thus, if gender-norming PATs is not itself a form of disparate treatment, the antisubordination critique would not of its own force transform those tests into a form of sex discrimination under Title VII. However, if gender-norming is a form of disparate treatment under current Title VII doctrine, as argued in Part II, the critique above suggests it should not be understood as a straightforward case of anticlassification or antibalkanization triumphing over antisubordination. Or, put another way, the critique suggests that judges inclined to read Title VII through an antisubordination lens need not strain present doctrine to accommodate the practice. While unitary hiring standards that impose a disparate impact on women perpetuate the gender hierarchy by exclusion, job-unrelated gender-normed PATs perpetuate the gender hierarchy by arbitrarily privileging masculinity while evading judicial review. Bauer may then be the rare “reverse discrimination” case in which anticlassification, antibalkanization, and antisubordination point to the same result.
B. Applying Title VII to Gender-Normed Physical-Ability Tests
In proposing a solution to the puzzle of gender-normed PATs, this section proceeds from the following premises: First, unitary fitness standards that have a disparate impact on a protected class are a form of discrimination, unless they are job related and consistent with business necessity.
Second, arbitrary hiring practices that reinforce stereotypes about women’s inadequacy impose a distinct harm, even if they do not have a disparate impact on protected classes.
Third, a Bauer-like approach to gender-norming insulates these practices from judicial review, thus creating a problem from an antisubordination perspective (in addition to the more obvious anticlassification and antibalkanization critiques).
And fourth, courts should adopt neither the unequal-burdens doctrine nor any other account based on “real physiological differences” to accommodate this practice.
Based on these premises, an ideal regime would permit employers to adopt PATs—normed or unitary—if, but only if, success on those tests were truly critical to the performance of the job at issue. Requiring employers to justify gender-normed and unitary PATs does not put the employer in an impossible damned-if-you-do, damned-if-you-don’t bind, since a third alternative—eliminating or decreasing the physical requirements—always remains open. Existing Title VII doctrine provides just such a solution. Specifically, physical hiring tests with an impermissible disparate impact should be assessed under a demanding business-necessity standard that would require a showing that the test reflects the actual requirements of the job. This is the approach already taken by some courts of appeals.
This strict standard for disparate impact challenges should be paired with the typical disparate treatment framework in normed cases; in other words, employers would need to show that their gender-normed PATs were BFOQs.
The application of the BFOQ defense to a gender-normed PAT presents two wrinkles. Traditionally, the question the BFOQ defense poses is: Is sex itself a bona fide occupational qualification “reasonably necessary to the normal operation of that particular business or enterprise[?]”
The question posed in the gender-norming context would be: Is the desired gender-normed quality a BFOQ? There is no theoretical barrier to applying the test in that way, but there arguably is a textual one; the statute, in describing the BFOQ defense, says it is not unlawful for an “employer to hire and employ employees . . . on the basis of his . . . sex . . . in those certain instances where . . . sex . . . is a bona fide occupational qualification . . . .” In contrast to other portions of the statute, it says nothing explicit to excuse an employer “classifying” employees on the basis of sex, nor does it say anything about employment actions other than hiring or refusing to hire. But it would be absurd to read this provision to excuse an employer’s firing on the basis of sex and an employment agency’s classifying on the basis of sex, but not an employer’s classifying on the basis of sex.
The trickier question is when, if ever, gender-normed physical fitness would be a BFOQ. The BFOQ defense is exceedingly strict by design.
A requirement must “relate to the ‘essence,’ or to the ‘central mission of the employer’s business’” to be a BFOQ.
For example, the safety of third parties who are neither customers nor essential to the business cannot support a BFOQ; similarly, neither the additional cost of employing one sex nor customer preference can support a BFOQ.
That’s not to say that a gender-normed PAT could never logically be a BFOQ, just that it’s difficult to imagine when it would be. But the stringency of the test does not undermine the reasoning supporting its application.
At most, the strict construction of the BFOQ defense in disparate treatment cases underscores the need for a demanding business-necessity standard in corresponding disparate impact cases. Requiring employers who gender-norm to justify the tests as BFOQs but then giving what would amount to a free pass to employers who use unitary standards would in fact exacerbate the problem outlined in section III.A. It would likely have the additional effect of decreasing access for women by allowing employers to set arbitrarily high, exclusionary unitary standards but prohibiting them from lowering those standards when applied to women. Thus, courts taking the approach advocated here and rejecting the Bauer court’s analysis of gender-normed tests must bring a similar skepticism to unitary standards and require an actual, demanding showing of business necessity and job relatedness. In sum, traditional Title VII disparate treatment doctrine provides the proper framework under which to analyze gender-norming, but only if courts require a business justification in corresponding disparate impact cases as well.
Conclusion
The Fourth Circuit’s decision in Bauer v. Lynch stretches the unequal-burdens doctrine beyond its principled limits. In doing so, it creates a significant and anomalous exception to Title VII and shields from judicial scrutiny arbitrary hiring practices that reinforce harmful stereotypes about women’s inadequacy in public safety roles. As this Note suggests, the better reading of Title VII’s text and Supreme Court jurisprudence is that gender-normed PATs are not permitted absent a BFOQ. This doctrinal conclusion is normatively defensible not only from an antibalkanization or anticlassification perspective but also from an antisubordination perspective, and courts considering gender-normed physical-ability tests in the future should reject the Bauer court’s approach. But this solution will serve its intended ends only if courts apply a stringent business-necessity standard in corresponding challenges to unitary physical hiring criteria, an approach that some but not all lower courts already take. By asking employers who insist on using discriminatory physical-ability tests in any form to present business justifications, courts can best promote Title VII’s antisubordination principle.