
OK, here’s a hierarchical Bayesian analysis for the Santa Clara study (and other prevalence studies in the presence of uncertainty in the specificity and sensitivity of the test)

After writing some Stan programs to analyze that Santa Clara coronavirus antibody study, I thought it could be useful to write up what we did more formally so that future researchers could use these methods more easily.

So Bob Carpenter and I wrote an article, Bayesian analysis of tests with unknown specificity and sensitivity:

When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modeling capturing variation in these parameters across experiments. Another concern is the people in the sample not being representative of the general population. Statistical adjustment cannot without strong assumptions correct for selection bias in an opt-in sample, but multilevel regression and poststratification can at least adjust for known differences between sample and population. We demonstrate these models with code in R and Stan and discuss their application to a controversial recent study of COVID-19 antibodies in a sample of people from the Stanford University area. Wide posterior intervals make it impossible to evaluate the quantitative claims of that study regarding the number of unreported infections. For future studies, the methods described here should facilitate more accurate estimates of disease prevalence from imperfect tests performed on nonrepresentative samples.

The article includes a full description of our models along with R and Stan code. (We access Stan using CmdStanR.) And it’s all on this pretty GitHub page that Bob set up!

The paper and code are subject to change. I don’t anticipate any major differences from the current version, but Bob is planning to clean up the code and add some graphs showing dependence of the inferences on the prior distributions for the hyperparameters. Then we’ll post it on arXiv or medRxiv or ResearchersOne or whatever.

Also, if we get raw data for any studies, we could do more analyses and add them to the paper. Really, though, the point is to have the method out there so that other people can use it, criticize it, and improve upon it.

Above I’ve quoted the abstract of our paper. Here’s how we end it:

Limitations of the statistical analysis

Epidemiology in general, and disease testing in particular, features latent parameters with high levels of uncertainty, difficulty in measurement, and uncertainty about the measurement process as well. This is the sort of setting where it makes sense to combine information from multiple studies, using Bayesian inference and hierarchical models, and where inferences can be sensitive to assumptions.

The biggest assumptions in this analysis are, first, that the historical specificity and sensitivity data are relevant to the current experiment; and, second, that the people in the study are a representative sample of the general population. We addressed the first concern with a hierarchical model of varying sensitivities and specificities, and we addressed the second concern with multilevel regression and poststratification on demographics and geography. But this modeling can take us only so far. If there is hope or concern that the current experiment has unusual measurement properties, or that the sample is unrepresentative in ways not accounted for in the regression, then more information or assumptions need to be included in the model, as in Campbell et al. (2020).
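
To make the poststratification step concrete, here is a minimal Python sketch. It assumes the regression has already produced a prevalence estimate for each demographic cell, and it only shows the final reweighting to the population. The cell names, estimates, and population counts are made up for illustration and are not taken from the paper.

```python
def poststratify(cell_estimates, cell_population):
    """Population-weighted average of per-cell prevalence estimates.

    cell_estimates:  dict mapping cell name -> estimated prevalence in that cell
    cell_population: dict mapping cell name -> number of people in that cell
    """
    if set(cell_estimates) != set(cell_population):
        raise ValueError("estimates and population counts must cover the same cells")
    total_population = sum(cell_population.values())
    weighted_sum = sum(
        cell_estimates[cell] * cell_population[cell] for cell in cell_estimates
    )
    return weighted_sum / total_population


# Hypothetical cells and numbers, for illustration only.
estimates = {"young": 0.02, "middle": 0.01, "old": 0.005}
population = {"young": 300_000, "middle": 500_000, "old": 200_000}

print(f"poststratified prevalence: {poststratify(estimates, population):.4f}")
```

The reweighting can only correct for variables that define the cells; selection on anything else (for example, people who suspect they were exposed being more likely to show up) is untouched.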

The other issue is that there are choices of models, and tuning parameters within each model. Sensitivity to the model is apparent in Bayesian inference, but it would arise with any other statistical method as well. For example, Bendavid et al. (2020a) used an (incorrectly applied) delta method to propagate uncertainty, but this is problematic when sample size is low and probabilities are near 0 or 1. Bendavid et al. (2020b) completely pooled their specificity and sensitivity experiments, which is equivalent to setting sigma_{gamma} and sigma_{delta} to zero. And their weighting adjustment has many arbitrary choices. We note these not to single out these particular authors but rather to emphasize that, at least for this problem, all statistical inferences involve user-defined settings.
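
To spell out what complete pooling means here, the following is a sketch of the kind of hierarchical model under discussion, writing γ for specificity, δ for sensitivity, π for prevalence, and j for an index over calibration studies. The exact indexing and parameterization are assumptions made for illustration; see the paper for the full specification.

```latex
\begin{align*}
y &\sim \mathrm{binomial}(n, p), \qquad p = \pi\,\delta + (1 - \pi)(1 - \gamma), \\
\mathrm{logit}(\gamma_j) &\sim \mathrm{normal}(\mu_\gamma, \sigma_\gamma), \\
\mathrm{logit}(\delta_j) &\sim \mathrm{normal}(\mu_\delta, \sigma_\delta).
\end{align*}
```

Setting \sigma_\gamma = \sigma_\delta = 0 forces every study to share a single specificity and a single sensitivity, which is what lumping all the calibration data together amounts to.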

For the models in the present article, the most important user choices are: (a) what data to include in the analysis, (b) prior distributions for the hyperparameters, and (c) the structure and interactions to include in the MRP model. For these reasons, it would be difficult to set up the model as a plug-and-play system where users can just enter their data, push a button, and get inferences. Some active participation in the modeling process is required, which makes sense given the sparseness of the data. When studying populations with higher prevalences and with data that are closer to random samples, more automatic approaches might be possible.

Santa Clara study

Section 3 shows our inferences given the summary data in Bendavid et al. (2020b). The inference depends strongly on the priors on the distributions of sensitivity and specificity, but that is unavoidable: the only way to avoid this influence of the prior would be to sweep it under the rug, for example by just assuming a zero variation in the test parameters.

What about the claims regarding the rate of coronavirus exposure and implications for the infection fatality rate? It’s hard to say from this one study: the numbers in the data are consistent with zero infection rate and a wide variation in specificity and sensitivity across tests, and the numbers are also consistent with the claims made in Bendavid et al. (2020a,b). That does not mean anyone thinks the true infection rate is zero. It just means that more data, assumptions, and subject-matter knowledge are required. That’s ok–people usually make lots of assumptions in this sort of laboratory assay. It’s common practice to use the manufacturer’s numbers on specificity, sensitivity, detection limit, and so forth, and not worry about that level of variation. It’s only when you are estimating a very low underlying rate that the statistical challenges become so severe.
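
A quick way to see why a low raw positive rate is so fragile: the Python sketch below inverts the identity p = π·sens + (1 − π)·(1 − spec) for the prevalence π, treating sensitivity and specificity as known. This is a plain point-estimate correction, not the Bayesian model from the paper, and the input numbers are made up for illustration.

```python
def implied_prevalence(raw_positive_rate, sensitivity, specificity):
    """Solve p = pi*sens + (1 - pi)*(1 - spec) for pi, clamped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("test must be better than chance: sens + spec > 1")
    # Subtract the expected false-positive rate, then rescale.
    estimate = (raw_positive_rate - (1.0 - specificity)) / denominator
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs: 1.5% of tests positive, sensitivity fixed at 80%.
raw_rate = 0.015
sensitivity = 0.80
for specificity in (0.995, 0.990, 0.985):
    pi_hat = implied_prevalence(raw_rate, sensitivity, specificity)
    print(f"specificity={specificity:.3f} -> implied prevalence={pi_hat:.4f}")
```

With these inputs, moving the specificity by one percentage point takes the implied prevalence from about 1.3% down to zero, which is the sense in which the same data can be consistent with both the published claims and a near-zero infection rate.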

One way to go beyond the ideas of this paper would be to include additional information on patients, for example from self-reported symptoms. Some such data are reported in Bendavid et al. (2020b), although not at the individual level. With individual-level symptom and test data, a model with multiple outcomes could yield substantial gains in efficiency compared to the existing analysis using only the positive/negative test result.

For now, we do not think the data support the claim that the number of infections in Santa Clara County was between 50 and 85 times the count of cases reported at the time, or the implied interval for the IFR of 0.12-0.2%. These numbers are consistent with the data, but the data are also consistent with a near-zero infection rate in the county. The data of Bendavid et al. (2020a,b) do not provide strong evidence about the number of people infected or the infection fatality ratio; the number of positive tests in the data is just too small, given uncertainty in the specificity of the test.

Going forward, the analyses in this article suggest that future studies should be conducted with full awareness of the challenges of measuring specificity and sensitivity, that relevant variables be collected on study participants to facilitate inference for the general population, and that (de-identified) data be made accessible to external researchers.

P.S. I’ve updated the article, fixing some typos and other things, and adding references and discussion based on comments we’ve received here and elsewhere.


  1. Steve says:

    I now think it is time to raise the issue that the study was apparently sponsored by David Neeleman, the JetBlue Airways founder, and that funding source was not disclosed. Neeleman has been a critic of the lockdown and stay at home orders as you might expect from his economic interests. There is nothing wrong with research being funded by the private sector or even by funding sources that have an agenda, but this is a prime example of why those sources need to be disclosed. If your data is consistent with zero infections and almost everyone is infected, we should know if the researchers might have a conscious or semiconscious agenda.

    • Joshua says:

      I really wish that people would keep concerns about the funding disclosure separate from concerns about the validity of the study findings.

      Yes, there are important issues with respect to conflict of interest. And they could impact on the validity of the findings. But whether or not there was an impact is not known, and it never will be. The authors should be held accountable for not disclosing conflict of interest – but it really should be, IMO, a separate matter.

      IMO, in the end mixing the two areas of concern together is likely to only divert attention away from the issues of scientific validity.

      • Steve says:

        I agree that the typical ad hominem attack (this was funded by industry so it must be biased) is improper, but I don’t see how the issues can be separated. Researchers have varying degrees of freedom. Their biases affect how they choose to design a study, analyze the data, etc. There is not some “objective” absolute.

        • Aren’t most of these COVID19 related studies backed by some interests? Of course, they are. Yet the other predictions, for example, were not criticized for their backing. There are double standards in that regard.

          In any event, I think, if the antibody test can be improved, it is one of the best measures of the impact of the COVID19 virus.

          • Steve says:

            I don’t think it’s a double standard. In the Stanford study, the source of the funding was not disclosed. If that is true for other studies, it is equally wrong. Like I said, “There is nothing wrong with research being funded by the private sector or even by funding sources that have an agenda, but this is a prime example of why those sources need to be disclosed.”

            Everyone is biased and those biases will impact the research results. There is no escaping that, but when there is big money with an agenda behind the study, we should be particularly skeptical. The FDA insists on getting raw data from pharma companies, does various reviews, and brings in outside experts exactly because we know that, left to their own devices, the financial interests would swamp the scientific integrity of pharma researchers. Again, I agree that “You’re biased” is not an argument. Everyone is biased. But “You’re not honest about your biases” is a good reason for a higher degree of skepticism.

      • John Williams says:

        According to Ioannidis (2005, Why most published research findings are false):

        “Corollary 5: The greater the financial and other interests and prejudices
        in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest
        are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. … ”

        Seems right to me.

  2. Julien Riou says:

    Great stuff!

    Two potential improvements:

    1) Some people proposed a cut-off free approach to estimate seroprevalence from imperfect tests, using continuous measurements instead of dichotomizing the results of the serology into positive or negative (

    2) The specificity and sensitivity of a serologic test are generally assessed using RT-PCR (ie detecting the presence of the virus itself instead of antibodies) as a reference. But RT-PCR is also imperfect! Among people with multiple RT-PCR tests, some tests will turn up negative. It would be great to include this second level and propagate all uncertainty.

  3. Joshua says:

    Andrew –

    > Isn’t it important to discuss other aspects of the uncertainty involved if you’re going to jump from infection rate in Santa Clara at that point in time to infection rate and infection fatality rate more broadly applied?

    For example, what about all the uncertainty in death counts? Shouldn’t there be some consideration of that when calculating a confidence interval?

    What about all the uncertainty with respect to the national representativeness of data for Santa Clara, which is clearly not representative on important variables for understanding the fatality of COVID-19 – such as income and race/ethnicity?

    How does one simply extrapolate uncertainties from the Santa Clara data to calculating uncertainties across a much more varied domain?

    • Andrew says:


      I agree. We tried to be precise about that: “Using an estimate of the number of coronavirus deaths in the county up to that time, they computed an implied infection fatality rate (IFR) of 0.12–0.2%.” Such a computation is only as good as the data and assumptions that go into it.

      • Joshua says:

        I think that Julien’s comment above is very much related. “Second level propagation of uncertainty” seems like it might be a good way to describe my criticism (not of your work, to be clear).

        I think it would be nice to see an analysis that would include a continuous measurement for the probabilities of representativeness along different metrics.

        I read through Ioannidis’ meta-analysis of the IFR. In it, he basically handwaves to the uncertainty or representativeness of the Santa Clara study. He mostly talks about why their seroprevalence would be unrepresentatively low (handwaving to less infectiousness among people who are of higher income)…but there’s no quantification.

        I happen to disagree with his conclusion about the direction of impact from the unrepresentativeness of the Santa Clara data. I think that their findings would likely be biased high, when considering infection rate alone (and not infection fatality rate which would likely be biased even higher, IMO).

        But the bigger problem is that what he provides is a vaguely supported binary description. He only gives us his conclusion for high vs. low. What there should be, IMO, is a continuous scale along any variety of important metrics where we could see what the impact of the uncertainty for each of those metrics would be on the conclusion.

        For example, I would like to see how their infection rate calculations would change based on the impact of income on infectiousness.

        As you wrote about the need for this kind of comprehensive data collection w/r/t clinical trials, I see the same kind of need here. Without that kind of data, I have no way of actually knowing whether or not Ioannidis is putting his thumb on the scale when he says that the Santa Clara data is biased low w/r/t extrapolating infection rates.

        • Andrew says:


          The summary numbers from that group for the Santa Clara study are very sensitive to the adjustment they do, which upweights the respondents from the parts of the county that are not near Stanford. So the most important selection bias will arise if people from those areas were more likely to come and get tested if they had reason to believe they’d been exposed to the virus.

    • My understanding is that similar studies were to be conducted in other parts of the US to ascertain profiles of COVID19ers. So maybe they can be designed to include more crucial data; including the individual symptom profiles.

      • Joshua says:

        Sameera –

        > My understanding is that similar studies were to be conducted in other parts of the US to ascertain profiles of COVID19ers.

        I watched a video of Bhattacharya talking about their study with MLB employees – conducted well after the Santa Clara study. In it, he handwaves to the economic profile (not the symptom profile, to be sure) of the participants (as a way to justify his statement that their finding of 0.7% seroprevalence is lower than the real seroprevalence in participants’ surrounding communities) – but gives no actual quantification.

        Yes, the results of these studies should be provided in the context of detailed data collected on the participants. Without those data, their reports are only suggestive, at best, and actually might just be very misleading.

    • confused says:

      >>which is clearly not representative on important variables for understanding the fatality of COVID-19 – such as income and race/ethnicity?

      I agree there are tons of issues with the Santa Clara study, but I am a bit uncertain about the emphasis on this specifically. Isn’t it more likely that these are basically proxies for occupational exposure, prevalence of some underlying conditions (e.g. association of obesity & other diet-related issues with poverty), and perhaps how quickly people get healthcare?

      I am more and more thinking that talking about one IFR is really misleading, however. If nursing homes or similar facilities in a county get infected, the IFR will be far, far higher than if a bunch of young people get infected.

      • confused says:

        On further thought, talking about an overall IFR can be somewhat useful for policy purposes – “if we did nothing and let this go to herd immunity, how many deaths would we expect in this state/nation” kind of thing.

        But the risk to any particular individual is likely to be very different.

        Back in late March there were some well-intentioned attempts to emphasize that this is a risk to everyone (because many younger people were not taking the risk seriously, e.g. Spring Break). But I think this kind of backfired, because it’s clear that the risk – while definitely not zero – is vastly less in younger people, more comparable to risks that people regularly accept without much concern, and because the huge disasters that the media predicted in Florida and Texas because of Spring Break didn’t occur. That made the “it’s just a flu / everyone panicked over nothing” argument sound more plausible.

        If the message had instead been “if you are, or live with, someone over 60…” that might have been more effective.

        • David Young says:

          I agree with this. The AIDS messaging that everyone is at risk I think turned off a lot of people because it was obviously designed to hide the obvious fact that some were 100’s of times more at risk than virtually all others.

        • Joshua says:

          confused –

          > That made the “it’s just a flu / everyone panicked over nothing” argument sound more plausible.

          I see this kind of argument made in the climate wars very frequently, and I get that there’s a certain common sense logic to it…along the lines of “If I see someone overselling something I get skeptical.”

          The only problem is, there’s very little evidence that that’s actually a process that explains much w/r/t climate change. If you look at who believes what about climate change, there is an overwhelming political/ideological signal. The driving force isn’t really the evidence or even how the evidence is provided to the public. It isn’t what climate scientists say. It is about identity, and politics mediates the relationship between identity and views on climate change.

          It’s the same thing with the “lockdown.” Views on government mandated social distancing are almost uniformly aligned with identity (and group), mediated by political orientation. Of course, there are exceptions, but they are rare.

          Many against the “lockdown” like to point to “alarmist” messaging as the reason for their being against “lockdowns.” Some, of course, also claim that they’re only protecting their “freedoms” against “tyrants.” But those same people often favor highly authoritarian government action when their identity aligns in that direction. For example, they will support Trump punishing states for passing legislation for mail in ballots. Probably the best example of this phenomenon can be seen in how Republicans went from strong support for an individual mandate to viewing an individual mandate as complete tyranny once a mandate threatened to be part of a successful Obama program. Of course, we could point to similar patterns among demz.

          The debate about government mandated social distancing is necessarily complicated. It involves a complex tradeoff between protecting against opposing kinds of risk. There is no right answer. Although people on both sides like to blame the other side for forcing their intolerance of the other side, it’s rarely actually true. It’s really about how we’re all fitting these kinds of inherently complex issues into a polarized frame to make ourselves feel better about our group by demonizing the other group.

          Think of “Keep the government’s hands off my Medicare” if you want another simple little example of what I’m talking about.

          • confused says:

            Sure, in our highly polarized times, a lot of people align very strongly along political/ideology lines.

            However, I’ve seen polls (e.g. on that show opinion on COVID social distancing measures/lockdowns is not as polarized (ie: political affiliation is not as predictive of someone’s opinion*) as one would expect. I don’t think climate change is entirely equivalent, since that’s been in the public eye for 15-20 years now, so it’s had time to “harden” and get incorporated into the very binary political discourse.

            There’s also a huge confounding factor here: because of urbanization/population density, nearly all the states that have been hit hard by COVID are Democrat-leaning. It’s pretty easy to question all the fear when no one you know has had COVID.

            There’s actually *less* polarization on this issue than I would have expected 2 1/2 months ago (for example, I kind of expected some of the governors of the very rural red states to laugh the issue off entirely and not do -anything-. But even South Dakota took significant measures – just not as much as other states.)

            *specifically, while nearly all the strong opposition to the lockdowns is among Republicans, a great many Republicans are more supportive

        • anonymous says:

          “To emphasize that this is a risk to everyone.”

          I’m sure there are plenty of instances where media coverage got this wrong. But I’m also sure tons of people weren’t saying lockdowns are required for everyone because everyone faces equal risk. Rather, they were saying lockdowns are required for everyone because everyone can spread the disease.

          Your focus on risk to individuals completely misses the most pernicious aspect of this disease: a high degree of infectiousness and asymptomatic spread. Deciding whether to impose lockdown is not solely a function of the IFR. The singular focus on the debate about IFR and individual risk is, in the worst cases of partisanship visible on Twitter and the like, an ideological ploy intended to distract from the full scope of the reasoning for lockdown. In the worst instances, it can be a way of saying: the lives of the elderly don’t matter enough when the individual risk to everyone else is so low.

          • confused says:

            I see what you’re saying, and that probably was the intent… but I still think it was poorly done.

            Individual-level risk does make a big difference, because not all activities/populations have the same age distributions. For example, colleges could rely on TAs (who are generally quite young) rather than (usually older) professors to teach lectures and do other in-person interactions… Combined with messaging about the risks of spreading it to your family over holidays, this might have been OK.

            I think we could have done *better* than we actually did in terms of protecting the elderly with *less* impact to the lives of the working-age and younger population.

            For example, I am pretty young for these purposes (under 35); my workplace skews pretty young, and the vast majority of us live either alone or with partners/spouses and possibly children, not with our parents/grandparents. Even if we’d all caught it, if we knew not to visit elderly relatives, the risk of transmission to high-risk people would not be high.

  4. jd says:

    Really cool!
    As you clearly point out in the article, it seems that the model is particularly influenced by the choice of the prior on the sigma for sensitivity (and specificity). So much so, due to lack of data in that area, that it seems almost like a tuning parameter by the user (correct?). While normal(0,1) prior seems excessively wide, how did you choose normal(0,0.2)? Using your example with the point estimate of 1.36, then with a normal(0,0.4) prior, there’s roughly 2/3 chance that sensitivity in a new experiment is 0.72-0.85. Couldn’t this also seem reasonable? I am curious to know how in the absence of data the user makes a good choice on this prior, especially when the model results depend so heavily on it? Maybe use sensitivity and specificity ranges from experiments of tests based on sars or flu? Or maybe run the model with a range of priors to display?

  5. Joseph Candelora says:

    I think a word or phrase in this sentence got dropped:
    “The usual intuition suggests that the conditional probability should be approximately 95%, but it is actually, as can be seen from a simple calculation of base rates, as suggested by Gigerenzer et al. (2007).”

    Something like “…, but it is actually [much lower],…”

  6. Sander Greenland says:

    The problem of tests of unknown sensitivity and specificity, including use of background information, has a health and medical statistics history going back to the 1950s. That literature was mainly frequentist with allied sensitivity analyses up to the turn of the century, when Bayesian solutions became prominent. Among the many, many publications that would apply to the present problem as a special case are
    2003) Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments (book) at
    (the 1-star review actually says the book is excellent).
    2006) Cole SR, Chu H, Greenland S. Multiple imputation for measurement error correction (with comment). International Journal of Epidemiology, 35, 1074-1082.
    2007) An elementary illustration of simulation analysis for estimating a proportion given misclassification is in Jurek AM, Maldonado GM, Greenland S, Church TR 2007. Uncertainty analysis: An example of its application to estimating a survey proportion. Journal of Epidemiology and Community Health, 61, 651-654.
    2007) Technical review of frequentist and allied sensitivity methods: Greenland S 2007, Maximum-likelihood and closed-form estimators of epidemiologic measures under misclassification. Journal of Statistical Planning and Inference, 138, 528-538.
    2008) Greenland and Lash, Ch. 19 p. 372-375 of Modern Epidemiology 3rd ed. 2008 illustrates simulation analysis with misclassification.
    2009) Some Bayesian approaches to the same problems using maximum-likelihood for missing data and simulation are illustrated in Greenland S 2009. Bayesian perspectives for epidemiologic research. III. Bias analysis via missing-data methods. International Journal of Epidemiology, 38, 1662-1673, doi: 10.1093/ije/dyp278 (very important corrigendum 2010 International Journal of Epidemiology, 39, 1116).

    There are many other articles and book chapters on Bayesian and frequentist methods that would apply here (the articles I’m on I can supply on request). I suppose the lesson is that, unless a method appears in popular distributed software, it is as if it doesn’t exist, and the problem keeps getting solved de novo until such software appears.

    • Andrew says:


      Yes regarding the software. That’s a big advantage of Stan. Anyone can run our model out of the box, play with the parameters, add new data, etc. When I saw that Santa Clara paper the very first time, it was clear to me that a Bayesian analysis would be the most direct way to account for uncertainty in the calibration data.

    • Funko says:

      So I guess the abovementioned cut-off free approach is “well”-known? Seems unlikely that no epidemiologist ever tried this before.

      (I’ll have a look at the referenced papers later. Thank you for providing them :))

    • pophealth says:

      Hi wanted to share the number of times each paper above was cited using Google Scholar. Not really making a point just thought it would be useful.

      1. Gustafson – 501
      2. Cole – 193
      3. Jurek – 15
      4. Greenland – 27
      5. Rothman – 21,975
      6. Greenland – 76

      • When I do a search for the word “Rothman” on this page, only your use of it comes up. Which article are you referring to there?

        • Likely this one as Rothman is the first author on the book – Greenland and Lash, Ch. 19 p. 372-375 of Modern Epidemiology 3rd ed. 2008 illustrates simulation analysis with misclassification.

          • pophealth says:

            Yep Sorry I was referring to the Modern Epidemiology by Rothman, Greenland and Lash (3rd edition). The citations count is for the entire book, so can’t really tell the story. I would highly recommend reading it if you are getting into EPI methods.

      • Sander Greenland says:

        While those numbers may not be 100% complete, for articles they do seem to reflect the relative obscurity/inaccessibility of the journals (epidemiology is a relatively small field, and its leading journals have a fraction of a percent the circulation of leading med journals; plus Jurek et al. was in a minor one). Regarding the books, Rothman et al. was a runaway hit compared to expectations for advanced textbooks in the field, copies sold well into tens of thousands – and I doubt most of those citations were to Ch. 19.

  7. Guido Biele says:

    Looks as if a shorter and modified version of the preprint is now published in JAMA:
    (At least the overlap in topic and authors suggests so)

  8. zbicyclist says:

    Not related to this thread’s study, but I thought this would be of interest to those trying to follow COVID-19 statistics.

    “… one of my friends sent me an underground article about Covid in TZ [Tanzania]. Evidently government workers are sneaking bodies out [i.e. out of hospitals] of people who die and burying them in the middle of the night. When I tried to research this in the regular way, one website pointed out that the numbers that they are reporting to WHO haven’t changed any in 3 weeks…….”

    This is from a good friend who’s a physician who’s spent her career serving mostly in third world assignments, including several months in Tanzania within the last 5 years.

    Say what you want about the Swedish public health approach to this pandemic, but I have more faith in the numbers coming out of there than I do in the numbers coming out of most places, including the U.S.

  9. Dave #2 says:

    It’s also important to point out that only a little over half of the positive results they found were corroborated using an ELISA assay (the most accurate and trusted) by one of the two Stanford pathologists who were asked to test the validity of the Premier Biotech test kits. Both of them also refused authorship/affiliation with the project and constantly argued with Bendavid et al. that the tests were too inaccurate.

  10. Harlan says:

    We just posted a paper related to this and how to adjust for non-representative testing:
    “Bayesian adjustment for preferential testing in estimating the COVID-19 infection fatality rate: Theory and methods”

    Any feedback most appreciated!

    • Joshua says:

      Harlan –

      > We just posted a paper related to this and how to adjust for non-representative testing:

      Yay! Music to my virtual ears.

      It’s nice to see you talk about the uncertainties related to reporting of death: why wouldn’t you try to incorporate that into an analysis of uncertainties?

      Likewise, what about factors such as SES and race/ethnicity (with associated factors such as access to healthcare, comorbidities, etc.)? Ever since I saw the Santa Clara study I’ve been gobsmacked that they would try to extrapolate from infection rates in Santa Clara to national level *fatality rates*, not only jumping across categories but, on top of that, extrapolating from samples that aren’t representative on variables that are so clearly predictive of health outcomes.

      What did you mean by “ecological bias”? Are you referring to the aspects I’m questioning?

      • confused says:

        I don’t think it’s really a jump across categories.

        If you know the number of deaths in Santa Clara County, then a seroprevalence can give you total number of infections, and therefore a fatality rate.

        The problem is representativeness/bias instead, I think.
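
        To make the arithmetic concrete, here is a rough sketch in Python. The death count, population, and seroprevalence values are made-up placeholders, not figures from the study, and `implied_ifr` is just a name chosen for illustration.

```python
def implied_ifr(deaths, population, seroprevalence):
    """Infection fatality rate implied by a death count and a seroprevalence estimate."""
    infections = seroprevalence * population  # estimated total number of infections
    return deaths / infections

# Hypothetical inputs, for illustration only (not the study's data).
deaths = 100
population = 2_000_000

ifr_at_3_percent = implied_ifr(deaths, population, 0.03)
ifr_at_1_5_percent = implied_ifr(deaths, population, 0.015)

print(f"IFR if 3.0% were infected: {ifr_at_3_percent:.3%}")
print(f"IFR if 1.5% were infected: {ifr_at_1_5_percent:.3%}")
```

        Halving the seroprevalence doubles the implied fatality rate, which is why bias in the prevalence estimate carries straight through to the IFR.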

        • Joshua says:

          confused –

          I call it a jump across categories because the uncertainties are different in the two areas, respectively. You can’t just go from one to the other, imo. For example, SES would have a different effect on infection rate than it would on fatality rate.

          So if you calculate the effect of SES on infections in Santa Clara (which they didn’t even bother to do), that doesn’t tell you anything about how SES moderates the relationship between infections in Santa Clara and fatality nationally.

          The problem is that they act like there’s no category jump. They act as if it’s not problematic to go from infection rate in Santa Clara to fatality nationally – as if it’s just a simple extrapolation based on the size of the numbers involved.

          So discount my syntax due to my lack of expertise. Maybe category jump isn’t the correct term. But my point is I don’t see how they justify a simple transition from the one to the other. It seems facile to me.

          • confused says:

            Ah, ok, I see what you mean. That you can’t assume fatality rate would be the same elsewhere even if the infection rate is the same.

            I agree with that.

            But I still think the larger question is selection bias and false positives, because if the IFR is really 0.12% or even 0.2% *in Santa Clara County*, that still makes a huge difference to the overall picture, because Santa Clara County probably isn’t *that* much of an outlier.

            I am kind of skeptical of SES, in and of itself, having that much of an effect – what’s the mechanism?

            I’m not denying a strong *correlation* with SES – but I’d think it’s more of a proxy for actually explanatory variables. And I don’t know that we’d expect Santa Clara County to be much better off than a lot of the Western US, except that we have a serology study showing a really low number.

            If 0.2% is really right for Santa Clara County, it must be because very few elderly people got infected, or something, because otherwise one would expect it to be significantly lower than *that* in, say, Utah (much younger population).

  11. Roy says:

    Only partially related, don’t know if you have seen this discussion on the value of “k” (the degree of patchiness) for COVID-19:

    Claim there (not meant as an evaluation either way) is that it is relatively low – that is, incidence is very patchy, suggesting a few people have infected a lot of people and most people very few (this will make Taleb very happy as he has been writing a lot that epidemics are fat-tailed and therefore the average values of R for example aren’t very informative).
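
    A small sketch of what a low k implies, assuming the usual convention in which the number of secondary cases per infected person is negative binomial with mean R and dispersion k. The values R = 2.5 and k = 0.1 or 10 are hypothetical, picked only to show the contrast, and `prob_zero_secondary` is an illustrative name.

```python
import math

def prob_zero_secondary(r, k):
    """P(no secondary infections) under a negative binomial with mean r and dispersion k."""
    return (1 + r / k) ** (-k)

R = 2.5  # hypothetical mean number of secondary cases

p_zero_patchy = prob_zero_secondary(R, 0.1)   # low k: highly overdispersed
p_zero_even = prob_zero_secondary(R, 10.0)    # high k: close to Poisson
p_zero_poisson = math.exp(-R)                 # limiting case as k grows

print(f"k = 0.1: {p_zero_patchy:.1%} of cases infect nobody")
print(f"k = 10:  {p_zero_even:.1%} of cases infect nobody")
print(f"Poisson: {p_zero_poisson:.1%} of cases infect nobody")
```

    With the same mean, a lower k puts far more people at zero onward infections, so the average has to be made up by a small number of people who infect many.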

  12. Anon says:

    The main authors of the seroprevalence study gave a talk last week on their work (and had a Q&A session where they responded to some questions/comments). I think many of the people in this thread might find it an interesting watch.

    The following is a link to the recording:

  13. A Country Farmer says:

    > With individual-level symptom and test data

    Did you ask for this data from them?

    • Andrew says:


      I did ask them and they said they do not have permission to share the data from their study. But that’s ok. But they or anyone else are free to adapt our code and fit whatever models they want.

  14. Joshua says:

    Great video about shaming and covid-19 seroprevalence sampling.

    In particular, really interesting point at about 45 minutes in as to whether the Santa Clara authors could even legitimately provide confidence intervals for convenience sampling!

    Also, at about 28 minutes, great example of the problem with convenience sampling.

    Also, good overview of the issue with base rate problems with the testing methodology (starts at around 38 minutes, killer chart at around 39 minutes).

    • Joshua says:

      Wow. At about 57:30 in, they discuss the implications of non-participation for the Santa Clara study’s use of Facebook ads (7% participation) for recruitment. Bottom line: considering non-participation, the seroprevalence range goes from 0.1% to 93.1% if you don’t assume consistency between participants and non-participants.
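
      For anyone wondering where a range like that comes from, here is a sketch of the worst-case bounds calculation in Python. The 7% participation rate is from the talk; the 1.5% raw positive rate among participants is my assumption, chosen because it reproduces the quoted range, and `worst_case_bounds` is an illustrative name.

```python
def worst_case_bounds(observed_prevalence, participation_rate):
    """Bounds on population prevalence with no assumptions about non-participants."""
    # Share of the whole population known to be positive (positive participants).
    known_positive = observed_prevalence * participation_rate
    lower = known_positive                              # every non-participant negative
    upper = known_positive + (1 - participation_rate)   # every non-participant positive
    return lower, upper

# 7% participation is from the talk; 1.5% raw positive rate is an assumption.
lower, upper = worst_case_bounds(observed_prevalence=0.015, participation_rate=0.07)
print(f"{lower:.1%} to {upper:.1%}")
```

      The interval is so wide because 93% of the target population contributes no information unless you assume something about how non-participants resemble participants.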

    • Andrew says:


      You can legitimately provide confidence intervals for convenience sampling. Yes, these intervals rely on assumptions, but that’s the case with all real-world surveys. Those political polls we see in the newspapers . . . they typically come from surveys with 7% response rates.

      • Joshua says:

        Andrew –

        They stress methodology to improve sampling to help address the non-participation issue – such as stratified sampling.

        I dunno. Watching the experts on that video didn’t lessen my prior that preprints of studies based on Facebook recruitment should not be used to launch a massive publicity effort to influence public health policy during a pandemic.

        • Andrew says:


          That study had lots of problems, for sure, and I don’t think the authors knew what they were doing with the statistics. I would not use that to draw general conclusions regarding what can be learned from surveys with 7% participation.

      • Joshua says:

        Andrew –

        > Those political polls we see in the newspapers . . . they typically come from surveys with 7% response rates.

        Am I wrong to assume that pollsters have often done some research into the effect of the non-participation (i.e., the representativeness of the participants)?

        In the Santa Clara study they had nothing along those lines. But that didn’t stop them from offering seat of the pants speculation (that not surprisingly lines up with their stated priors).

  15. Joshua says:

    Lol. Shaming = sampling.

    Although shaming is actually appropriate given the sampling

  16. Roy says:

    Okay, this was posted on Twitter:

    A lot of government reports from European countries on seroprevalence to SARS-CoV-2 this week and they all show the same – it’s low.

    Spain ~5%
    Italy ~5%
    Sweden ~5%
    Denmark ~1%
    Norway < 1%

    If you scroll down the links to the sources are given. I am not in a position to evaluate these data, but if true they are pretty consistent. Even if we just accept the high end, and say 5%, this does not necessarily mean the same is true in the US, as the US clearly has done much worse in dealing with the virus than almost any other country.

    On a side note, has anyone looked at the studies that tried to analyze how many deaths there would have been if shutdowns had started one or two weeks earlier? If they are even anywhere in the ball park of truth, it is pretty damning.

  17. W Bowman Cutter says:

    I’d like to try a hierarchical Bayes model using Stan on some real estate data but I don’t have experience with either. Is there a good paper that goes through all the details (preferably social science) that would give me a model to copy and learn from?

  18. Doug Johnson says:

    For folks who would like to use some of these methods in their work but don’t have the patience for the nitty gritty of Stan, I did a very similar analysis with implementations in both direct Stan code and brms a couple of weeks ago. For most people, adapting the brms code will likely take much less time. I have included links to the git repo and a blog post summarizing the analysis below.

  19. Dan says:

    Quoting from your recent preprint,

    > The estimates from Bendavid et al. (2020a) were controversial, and it turned out that they did not correctly account for uncertainty in the specificity (true positive rate) of the test.

    Per Wikipedia, the true positive rate is actually the sensitivity, and specificity is the true negative rate. This is consistent with all other sources I’ve ever read. It may be worth revising before sending to a preprint or a journal or elsewhere.
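
    To pin down the two definitions, here is a minimal sketch in Python. The counts are invented round numbers for illustration, not data from any validation study.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of truly positive samples that test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negative rate: share of truly negative samples that test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation counts, for illustration only.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=990, false_pos=10)

print(f"sensitivity (true positive rate): {sens:.2f}")
print(f"specificity (true negative rate): {spec:.2f}")
print(f"false positive rate (1 - specificity): {1 - spec:.2f}")
```

    When prevalence is low, the false positive rate (one minus specificity) is what drives most of the uncertainty, so the label matters here.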

  20. Josh says:

    The text for figure 1 has a typo. It says “(b) histogram of posterior simulations of γ”, but plot (b) is a histogram of π. The main text confirms that (b) should be a histogram of π, so the error is limited to the figure text.

  21. Koen says:

    A related question, if you would have sampled multiple individuals from the same household in a COVID survey, you can of course randomly sample one individual per household and perform MRP as usual.

    But is it also valid to run a multi-level model with household_id and create a fake household_id (level not in the sample) in the poststratification table? If so, would one fake household_id be sufficient, or do you need to explode the number of cells in the population by creating a separate cell for each household?
