The Gold Double Standard
What's behind the scientific community's response to Bhattacharya's role in a conference on scientific integrity
The Golden Ratio
The decision of the National Academies of Sciences, Engineering, and Medicine to convene a recent Workshop on Enhancing Scientific Integrity: Progress and Opportunities in the Social and Behavioral Sciences drew heavy criticism from the scientific community. Everything about the event felt oddly misplaced, from the expertise of the speakers to the topics they emphasized.
One can’t help but wonder, for example, why the 18 speakers at a workshop on the social and behavioral sciences featured three physicians, a microbiologist, a molecular biologist, and a geneticist, or why so much of the discussion centered on biomedical research. That’s not to say that scientists must remain in disciplinary silos. But only a handful of the speakers actively conduct original research involving primary data collection in the social and behavioral sciences.
Still, that imbalance alone does not explain the reaction. The “vitriol” that prompted one of the organizers to post a plea for open minds on Bluesky was not about disciplinary representation. It was about three speakers in particular: John Ioannidis, Emily Oster, and, above all, Jay Bhattacharya.
All three played significant roles in shaping public discourse during the COVID-19 pandemic. Two were key authors of a widely publicized seroprevalence study in Santa Clara County that suggested the virus was far less lethal than initially believed. Jay Bhattacharya went on to coauthor the Great Barrington Declaration and now leads the NIH.
It is his presence as the featured speaker at a conference on scientific integrity that drew particular ire from the scientific community. The “ratio” for the post was almost 15:1.
To understand that reaction, it is worth revisiting that study—not in terms of whether it was “right” or “wrong,” but in terms of the standards Bhattacharya himself now invokes as the measure of scientific quality: so-called “Gold Standard Science.”
What Is “Gold Standard Science”?
According to the framework frequently promoted in policy and public discourse, science meets the “gold standard” when it is:
collaborative and interdisciplinary
skeptical of its findings and assumptions
free of conflicts of interest
transparent
structured for falsifiability
communicative of error and uncertainty
accepting of negative results
subject to unbiased peer review
reproducible
On its face, this is a reasonable list. These are principles most scientists would endorse. What it omits is equally important. It says nothing about how to conduct original research in a specific domain, where the details of design and data-generating processes determine validity.
That omission matters. It is reflected both in the framing of the workshop and in the study that elevated Bhattacharya’s public profile.
The Santa Clara Study
In April of 2020, Jay Bhattacharya and colleagues conducted one of the most consequential studies of the entire pandemic. Its key finding was an estimate of the rate at which people with COVID were dying, the infection fatality rate (IFR). That estimate of 0.17% fed the belief that COVID was “no worse than the flu.”
The course of the pandemic ultimately demonstrated that this estimate was substantially too low, likely by a factor of four to eight. The implications of that estimate extended well beyond the study itself, shaping how both policymakers and the public understood the risks of COVID.
To understand what went wrong, it is informative to hold the study to the “gold standard” its senior author now promotes.
A “Gold Standard” Assessment
Collaborative and Interdisciplinary
Interdisciplinarity is not inherently a marker of quality. It improves science only when the expertise brought together is directly relevant to the problem being solved.
With that in mind, consider the author list. It includes economists, a hedge fund manager, laboratory scientists without prior work in antibody testing or infectious disease, an infectious disease physician, an epidemiologist who specialized in meta-analysis, and a group of students.
This broad range of disciplines includes neither an infectious disease epidemiologist nor an expert in antibody testing, the two fields most relevant to the study. Notably, the Stanford-based investigators did consult experts in antibody testing who expressed concerns about the reliability of the assay and declined authorship. At the same time, there is no evidence of prior work in seroepidemiologic survey design among the authors.
So the study team is collaborative and interdisciplinary, but in a way that shows how poorly this criterion measures study quality rather than affirming the scientific integrity of the results.
Skeptical of Its Findings and Assumptions
The study’s most significant failure was not in laboratory methods but in epidemiology.
Obtaining a representative sample is difficult in any population study, but uniquely challenging in a seroprevalence study, where participation requires a blood draw. Individuals who believe they have been exposed are far more likely to participate, particularly in the context of limited testing and widespread concern about undiagnosed infection.
The WHO guide on seroprevalence studies explicitly cautions against using advertisements to recruit, “due to both the very high risk of introducing bias and the inability to assess the representativeness of the study population.” The authors nonetheless recruited participants through Facebook ads.
Their only concern with respect to the representativeness of the population related to ethnicity rather than the likelihood of undiagnosed COVID. The raw data yielded an estimated prevalence of 1.5%. The authors then applied demographic adjustments that increased that estimate to 2.8%, driven largely by underrepresentation of Hispanic participants. They did not adequately consider whether the observed differences reflected underlying prevalence or differences in who was willing and able to participate.
In a setting where participation is closely tied to prior illness or perceived exposure, such adjustments can amplify bias rather than correct it.
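A toy calculation makes the mechanism concrete. The sketch below uses two hypothetical groups, A and B, with invented numbers that are not taken from the study: group B is underrepresented in the sample, and the B members who do participate are assumed to be self-selected toward prior illness. Reweighting the sample to population shares then scales up the inflated rate rather than correcting it.

```python
# All numbers are invented for illustration; none come from the Santa Clara data.
population_share = {"A": 0.70, "B": 0.30}   # assumed true makeup of the community
sample_share = {"A": 0.90, "B": 0.10}       # group B is underrepresented in the sample
true_rate = {"A": 0.01, "B": 0.02}          # assumed actual prevalence in each group
sample_rate = {"A": 0.01, "B": 0.05}        # B volunteers skew toward the previously ill


def weighted_rate(rates: dict, shares: dict) -> float:
    """Average the group rates using the given group shares as weights."""
    return sum(rates[group] * shares[group] for group in shares)


truth = weighted_rate(true_rate, population_share)        # the quantity of interest
raw = weighted_rate(sample_rate, sample_share)            # unadjusted sample estimate
adjusted = weighted_rate(sample_rate, population_share)   # after demographic reweighting

print(f"true prevalence:   {truth:.2%}")
print(f"raw estimate:      {raw:.2%} (error {raw - truth:+.2%})")
print(f"adjusted estimate: {adjusted:.2%} (error {adjusted - truth:+.2%})")
```

Under these assumptions the adjusted estimate lands further from the truth than the raw one, because the weights enlarge a subgroup whose sample rate was already inflated by self-selection.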
Further concerns arise from recruitment practices. An email sent by Bhattacharya’s wife to a school listserv suggested that participation would provide “peace of mind” and indicate whether individuals were “immune.” The study does not address the potential impact of this recruitment channel.
A self-skeptical approach would have tested whether these factors influenced the results. That step is not evident in the analysis. The paper doesn’t even mention the email, much less try to assess its impact.
Conflicts of Interest
The study was partially funded by David Neeleman, founder of JetBlue, who had a clear financial interest in the outcome. In a piece written days before the study was released, titled “Stanford Professors’ Coronavirus Study Could Be Game Changer,” Neeleman himself described his friendship with the authors and stated that he saw them as the “solution” to the very real financial threat COVID posed for his airlines. In an email to Neeleman, hedge fund manager and coauthor Andrew Bogan states, “David, I think you should write Taia a note and tell her you’ll support her lab if she validates this kit.” Neeleman did subsequently reach out to Taia Wang, the Stanford microbiologist who refused to certify the kit. She declined his offer of support.
On top of that, there was a reputational conflict of interest: four of the authors had written a total of three opinion pieces, two in the Wall Street Journal and one in STAT, before beginning the research, asserting that the IFR was likely to be far lower than the WHO estimate.
In what appears to be a direct contradiction of Neeleman’s own words and the email record, the authors’ Conflict of Interest statement in the published paper explicitly states that they have none.
Transparency
The authors’ apparent lack of transparency about Neeleman’s role is at least as concerning as the funding itself. Their failure to even mention the email from Bhattacharya’s wife in the published paper also represents a glaring lack of transparency.
The authors are equally opaque with respect to elements of their methods, particularly subject selection, despite its centrality to the paper’s validity. Their only discussion of selection bias is buried deep in the supplementary data and smothered with a poorly delineated and minimally justified weighting scheme related to likely symptom prevalence. I have provided a detailed critique elsewhere, but there are three key weaknesses with their method.
1. They never provide information on the number of Facebook users who viewed the ad, stating only that 11,000 clicked on it in the first 24 hours. The typical click rate for those ads (3% in 2020) puts the number of viewers above 360,000, more than one hundred times the number that participated. This was a highly self-selected group.
2. The list of symptoms they asked about did not include muscle aches, fatigue, or headaches, three of the six most common COVID symptoms.
3. They did not ask about the severity of symptoms.
It is notable that they never considered asking subjects directly why they participated or if they thought they might have had COVID.
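The reach estimate in the first point is simple arithmetic, sketched below. The click count is the figure the authors reported; the 3% click rate is the typical 2020 rate cited above and is an assumption about this particular ad, not something the study measured. The function name is illustrative.

```python
CLICKS_FIRST_24H = 11_000    # reported by the authors
ASSUMED_CLICK_RATE = 0.03    # typical 2020 rate cited in the text; an assumption for this ad


def implied_viewers(clicks: int, click_rate: float) -> float:
    """Number of viewers needed to produce `clicks` at a given click-through rate."""
    if not 0 < click_rate <= 1:
        raise ValueError("click_rate must be in (0, 1]")
    return clicks / click_rate


viewers = implied_viewers(CLICKS_FIRST_24H, ASSUMED_CLICK_RATE)
print(f"Implied viewers at a {ASSUMED_CLICK_RATE:.0%} click rate: {viewers:,.0f}")

# The estimate is sensitive to the assumed rate, so it helps to look at a range.
for rate in (0.01, 0.03, 0.05):
    print(f"click rate {rate:.0%}: about {implied_viewers(CLICKS_FIRST_24H, rate):,.0f} viewers")
```

Because the true click rate is unknown, the range matters more than any single figure: across these assumed rates, the people who saw the ad still vastly outnumber those who clicked on it.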
Falsifiability and Communication of Error and Uncertainty
The study reports confidence intervals, but this is a minimal standard. More important is whether the authors explored how their conclusions would change under different assumptions.
That type of sensitivity analysis is not meaningfully developed, despite the dependence of the results on uncertain inputs.
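One form such an analysis could take is sketched below. It applies the standard Rogan-Gladen correction for imperfect test accuracy to the raw 1.5% positivity mentioned earlier and varies the assumed specificity of the antibody test. The sensitivity and specificity values are hypothetical placeholders chosen for illustration, not figures from the paper or the assay manufacturer, and this is not the authors' method.

```python
def rogan_gladen(raw_positive_rate: float, sensitivity: float, specificity: float) -> float:
    """Prevalence corrected for test error (Rogan-Gladen), clipped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("the test must perform better than chance")
    estimate = (raw_positive_rate + specificity - 1.0) / denominator
    return min(max(estimate, 0.0), 1.0)


RAW_POSITIVE_RATE = 0.015     # raw sample positivity cited in the text
ASSUMED_SENSITIVITY = 0.80    # hypothetical value, for illustration only

# Vary the assumed specificity and watch how the corrected prevalence moves.
for specificity in (0.999, 0.995, 0.990, 0.985):  # hypothetical values
    corrected = rogan_gladen(RAW_POSITIVE_RATE, ASSUMED_SENSITIVITY, specificity)
    print(f"specificity {specificity:.1%} -> corrected prevalence {corrected:.2%}")
```

When raw positivity is this low, small changes in assumed specificity dominate the result. At a specificity of 98.5%, a false-positive rate of 1.5% would by itself account for every raw positive, and the corrected prevalence falls to zero.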
Acceptance of Negative Results
The pandemic itself provided a test of the study’s conclusions.
If the infection fatality rate were 0.17%, the total COVID deaths in the United States would imply well over a billion infections once vaccine effects are taken into account, clearly an implausible number.
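The arithmetic behind this check is a single division: implied infections equal deaths divided by the IFR. The sketch below assumes round figures of 1.2 million cumulative US COVID deaths and a population of 330 million; both are approximations supplied for illustration, not numbers from the study. The `ifr_reduction` scenarios are hypothetical stand-ins for the effect of vaccination on the average IFR, not estimates.

```python
IFR_ESTIMATE = 0.0017                  # 0.17%, the study's estimate
ASSUMED_US_DEATHS = 1_200_000          # assumption: round figure for cumulative US COVID deaths
ASSUMED_US_POPULATION = 330_000_000    # assumption: round figure for the US population


def implied_infections(deaths: float, ifr: float, ifr_reduction: float = 0.0) -> float:
    """Infections needed to produce `deaths` at a given IFR.

    `ifr_reduction` is a hypothetical average fractional drop in the IFR
    (for example from vaccination) over the period in which the deaths occurred.
    """
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be in (0, 1]")
    if not 0 <= ifr_reduction < 1:
        raise ValueError("ifr_reduction must be in [0, 1)")
    return deaths / (ifr * (1.0 - ifr_reduction))


for reduction in (0.0, 0.25, 0.50):  # hypothetical scenarios, not estimates
    infections = implied_infections(ASSUMED_US_DEATHS, IFR_ESTIMATE, reduction)
    per_resident = infections / ASSUMED_US_POPULATION
    print(f"IFR reduction {reduction:.0%}: {infections / 1e6:,.0f} million infections "
          f"({per_resident:.1f} per resident)")
```

Even with no vaccine effect, the assumed death toll implies roughly 700 million infections, more than twice the assumed population. Any reduction in the average IFR pushes the implied count higher still.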

When I raised this discrepancy with Bhattacharya at a conference in 2024, he refused to meaningfully revise his conclusions.
Peer Review
The study was rapidly disseminated as a preprint and amplified through the media before undergoing full scrutiny. While preprints are an important tool, in this case the preprint functioned less as a step in scientific dialogue and more as a vehicle for immediate public influence. The authors heavily promoted the findings in the absence of peer review. When the paper was finally peer reviewed and published, it was in a journal where John Ioannidis, a co-author, served as an editor.
Reproducible
The “reproducibility crisis” occupied most of Bhattacharya’s discussion at the workshop described above. This arises from a legitimate concern that papers that appear to duplicate existing work are difficult to publish, and negative studies are often never submitted. Those are not difficult issues to address, particularly with the growth of preprints. But he takes it a step further, announcing that the NIH will explicitly solicit replication studies.
Was the Santa Clara study reproducible? First, the only type of study that can be purely replicated is a laboratory experiment. Observational studies must accommodate the challenges of a particular time and place and research problem and can never be truly replicated. Science that requires observational research advances by triangulation, not replication.
There was good reason to conduct multiple seroprevalence studies in different places and at different times. However, in light of the above, the lack of transparency would make this study difficult to duplicate, and the flaws in its design would make pure replication inappropriate and unwise.
Scientific Integrity
The problem is not that the Santa Clara study was wrong. Many studies were wrong in early 2020. The problem is that the structure of the study made error likely, the direction of that error predictable, and the response to those limitations insufficiently self-critical. More importantly, Bhattacharya has repeatedly positioned himself as an authority on “Gold Standard Science”, despite the fact that his own most influential study fails to meet many of those criteria.
Which brings us back to the response of the scientific community to the NASEM Workshop. Bhattacharya is the face of biomedical science for the Trump Administration, an administration that has slashed funding, politicized grant making, and repeatedly promoted flawed science, particularly with regard to vaccines. The asymmetrical application of the “Gold Standard” has become emblematic of its relationship with science.
Bhattacharya has repeatedly emphasized the importance of transparency, skepticism, and openness to scrutiny as defining features of “Gold Standard Science.” Yet his recent decision to block publication of a COVID vaccine study from the CDC’s Morbidity and Mortality Weekly Report raises serious questions about whether those standards are being applied consistently.
That inconsistency is particularly striking in the context of this workshop. Bhattacharya is not simply a participant, but a featured speaker at an event on scientific integrity, despite the fact that his most influential study fails to meet the very standards he now promotes.
Taken together, this is what concerns the scientific community. The issue is not disagreement. It is the perception that scientific standards are being invoked selectively and not applied to the work that most demands them.






Epidemiologist and liberal-leaning scientist here. The analysis you’ve provided reveals a profound bias you have against the authors. Ioannidis’ STAT piece and other articles at the time didn’t necessarily advocate for a low IFR but noted that we were making consequential - and potentially harmful - decisions in the face of uncertainty. In that more truthful portrayal of the authors’ work, their efforts to produce data with honest methods at a time when data was scarce suggest they were acting on their own insights, much like all scientists do when they pursue hypotheses in broader theories and paradigms in which they’ve played some role.
The more you write about this, the more I’m concerned that you and many scientists in the field are failing to learn the lesson of COVID-19. The lesson isn’t “GBD bad” but that our health scientific ecosystem is unhealthy due to the ways scientists deviate from good standards of science.
From the Proximal Origin paper claiming a lab origin was “implausible” while the authors privately believed it was “so friggin likely” (and ghostwritten by people with a true conflict of interest: the funders of virology research in Wuhan) to the Imperial College forecasts of early 2020 that created the scientific opposition of many of these scientists, myself, and others by misleading managers with models that truly lacked the consequential sensitivity analyses models typically require (and not the ad hoc sensitivity analyses that burdensome reviewers like yourself are imposing selectively on the SC serosurvey but not on the Manus hospital analysis, vaccine cost-benefit analyses, and more), it’s becoming clear that the rift from COVID will endure for the rest of our lives. As a young scientist, I look forward to arguing against every point you raised until the day I die, and thus the public will see warring scientists, unsure who to trust, because nobody with platforms seems willing to do the hard work of bridging divides.
Just make note that scientists and science writers like you make liberal epidemiologists like me more supportive of the MAHA movement’s efforts to reform academic science, and this article is an exemplary demonstration of why so many people - scientists, managers, & lay people - feel let down by elitist academic “expert” cabals during COVID who could only present their paradigm to the detriment of the managers and public that needed to know the full range of perspectives and uncertainty to make their own, informed decisions.