Routine hospital-based SARS-CoV-2 testing outperforms state-based data in predicting clinical burden.

Len Covello, Yajuan Si, Siquan Wang, and I write:

Throughout the COVID-19 pandemic, government policy and healthcare implementation responses have been guided by reported positivity rates and counts of positive cases in the community. The selection bias of these data calls into question their validity as measures of the actual viral incidence in the community and as predictors of clinical burden. In the absence of any successful public or academic campaign for comprehensive or random testing, we have developed a proxy method for synthetic random sampling, based on viral RNA testing of patients who present for elective procedures within a hospital system. We present here an approach under multilevel regression and poststratification (MRP) to collecting and analyzing data on viral exposure among patients in a hospital system and performing publicly available statistical adjustment to estimate true viral incidence and trends in the community. We apply our MRP method to track viral behavior in a mixed urban-suburban-rural setting in Indiana. This method can be easily implemented in a wide variety of hospital settings. Finally, we provide evidence that this model predicts the clinical burden of SARS-CoV-2 earlier and more accurately than currently accepted metrics.

This is a really cool project. Len Covello is a friend from junior high and high school who I hadn’t seen for decades, Yajuan is at the University of Michigan and collaborates with me on various survey research projects, and Siquan is a graduate student in biostatistics at Columbia. Len is a doctor in Indiana, and he contacted me because his hospital was performing coronavirus tests on all their incoming patients. He did some internet research and had the thought that they could use multilevel regression and poststratification (MRP) to adjust the sample to be representative of (a) the population of people who go to that hospital, and (b) the general population of the intake area. Both (a) and (b) are of interest, and they both involve smoothing over time. It should be possible to do better than the raw data by performing this adjustment, as it should fix at least some of the problems of the patient mix varying over time. We did the analysis in Stan, adapting the model from my recent paper with Bob Carpenter.
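To make the poststratification step concrete, here is a minimal sketch in Python. It is not our Stan model: it assumes the multilevel regression has already produced a positivity estimate for each demographic cell, and it only shows how the same cell estimates get reweighted to two different target populations, (a) and (b). The function name, the three cells, and all the numbers are hypothetical, made up for illustration.

```python
def poststratify(cell_rates, cell_counts):
    """Weighted average of per-cell estimates, weighted by cell population."""
    if len(cell_rates) != len(cell_counts):
        raise ValueError("need one population count per cell")
    total = sum(cell_counts)
    return sum(r * n for r, n in zip(cell_rates, cell_counts)) / total

# Hypothetical modeled positivity for three demographic cells
# (in practice these would come from the fitted multilevel model).
model_rates = [0.02, 0.05, 0.08]

# Hypothetical cell sizes in each target population.
hospital_counts = [100, 300, 600]    # (a) people who go to the hospital
community_counts = [500, 300, 200]   # (b) general population of the intake area

hospital_estimate = poststratify(model_rates, hospital_counts)
community_estimate = poststratify(model_rates, community_counts)
print(hospital_estimate, community_estimate)
```

The two estimates differ only because the two populations have a different mix of cells, which is the point: if the patient mix drifts from week to week, the raw positivity rate drifts with it, while the poststratified estimate is pinned to a fixed target population.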

This work is potentially important not just because of whatever we found in this particular study but because it could be done at any hospital or hospital system that’s doing these tests. Instead of just tracking raw positivity rates, you can track adjusted rates. Not perfect, but a step forward, I think.


  1. jim says:

    Racial categories: shouldn’t latinos/Hispanic be shown separately? It seems to be the most heavily impacted group.

  2. sLAN says:

    ” calls into question their validity as measures of the actual viral incidence in the community ”

    a bit late to be concerned about the ‘actual’ viral incidence of COVID-19, after turning society upside down based upon the official but now apparently ‘questionable’ data

    • Andrew says:


      Covid is not over, and there will be future epidemics as well. It’s “a bit late” to change past policies but it’s not too late to consider how to use data to inform current and future policies.

    • It’s not “questionable” as a measure of the number of people impacted or the severity of impact, just as a measure of the number of people who are viral positive.

      The data on hospitalization and deaths is solid enough (it’s actually an underestimate of harm caused by the virus as people die at home, or suffer at home without hospitalization), it’s a major **underestimate** of how bad the viral spread is. If anything we should have turned society even farther upside down, to drive those deaths downwards. Thanks Sturgis Motorcycle Rally :-(

      • Len Covello says:

        “Questionable” refers to the utility of using positive test rates to predict future hospitalization and death, and in our area, as the paper says, numbers of positive tests were a decent predictor of those outcomes. Positivity, on the other hand, was pretty crappy. Neither was a good leading indicator of impact. So actually, they are “questionable” as metrics of the severity to come. None of us would question the general notion that high positive test numbers or positivity rates have, in the event, been associated with huge health impact. It was precisely to predict how bad things might get next week or next month that drove us to find a reliable metric or to at least convince ourselves that current metrics were helpful. As it happened, we were better able to anticipate surges; unfortunately, not to prevent much of their impact.

    • Len Covello says:

      Indeed, various agenda driven actors have riffed on whether test numbers drive case numbers or what metric levels should be actionable to limit social and economic commerce. They wouldn’t be statistics if people did not try to twist them to fit preconceived notions. So, it seems to me, ask the question. Are positive case numbers or positivity good metrics to predict clinical burden?

      I like our metric. It seems to simulate randomness and follows clinical impact (ER visits/hospitalizations) incredibly closely, and probably better than the public data. But positive case numbers reflected impact too, albeit somewhat delayed. Positivity was a bit off, though. So I’d say after questioning validity of case numbers, we find them to be pretty good indicators of when to “turn society upside down.” I think you are over reading our criticism. I think we should have pushed harder for random proxy measures early on, but did ok with our biased state data.

  3. Robert Kubinec says:

    This approach is intriguing to me as I’ve been working on a COVID tracking model that can make use of MRP-adjusted data. See here for a reference:

  4. Dale Lehman says:

    As vaccinations are spreading (poor choice of words, perhaps), your assumption of a fixed ratio of asymptomatic/symptomatic people is likely to change. How easily can you adapt your model to account for this? I suppose it is easy enough for a hospital to record who has and has not been vaccinated when they are admitted, but getting data like that from the area population seems more troublesome to me.

    • Martha (Smith) says:

      Dale said,
      “As vaccinations are spreading (poor choice of words, perhaps)”

      This brings up another problem related to COVID vaccinations: The availability of vaccines. I don’t know if other places are as messed up as Texas in getting vaccines into people’s arms, but here’s what my experience in the past few days is:

      I looked up a local grocery chain that has a pharmacy and has given vaccinations in the past. Their website said that they would be giving the vaccine, which was now authorized for vaccinating people in priority category 1b (people over 65; I am over 75, so qualify). But the website also said they didn’t have the vaccine yet, check back later.

      So I looked up local pharmacies. One said on their website that they were giving the vaccine. They had an online sign-up form, which I filled out and submitted. I got a reply by email saying that I was on their list, but they didn’t have the vaccine yet, and didn’t know when they would get it.

      So I looked up more on the web and found that the state is handling the allocation of vaccines, and they (the state) would be giving more information Monday.

      I checked back Monday — the news said that the state would only be sending vaccine to organizations that had the resources to give 100.000 doses or more.

      So I looked up the grocery chain again. They were also making a list of people to vaccinate, but still had no vaccine yet.

      Then I got an email from my university’s Retired Faculty and Staff organization, saying that they had persuaded the School of Nursing to extend their vaccination program for Student, Faculty, and Staff to retired faculty and staff. So I went to their link and filled out their information requested. I got a reply that I had been put on their waiting list, but they didn’t have any vaccine yet.

      Are other states in similar situations, or is Texas behind the rest?

      • Dale Lehman says:

        With all of the controversies over one dose or two, it sounds like you (eventually) will get 4 doses. What are you going to do when they all suddenly tell you they have the vaccine, come in and get it? Can you transfer your place to someone else? Or sell it (after all, I am an economist)?

      • Joshua says:

        Similarly chaotic situation here so far from what I’ve seen in the Hudson Valley.

        As near as I can tell it’s catch as catch can. Been trying to sign up for myself and my 91 year-old mother-in-law. No luck thus far. What a country!

        • Martha (Smith) says:

          I’ve had some communication from relatives in other states. A cousin in South Carolina seemed to have no difficulty getting vaccinated. Another cousin in Michigan said she is on several waiting lists. Another in Arizona says things are moving very slowly there. Still another in Michigan says the providers say, “Don’t call us; we’ll call you.” And one in California says only health care workers are getting the vaccine, and they are “lined up by the hundreds”.

          • Joshua says:

            A client who’s young and a professor at a healthcare providing university in Oregon, and working remotely, was told to get the vaccine or it would potentially be thrown out – when she suggested that it should be given instead to someone at higher risk.

            I have a condition that potentially puts me at a high level of risk. I asked a healthcare provider when I’d be able to get a vaccine. She said to her knowledge they will be distributing the vaccine by age irrespective of medical history – which means I’m not eligible yet as I’m not 65.

            I’m willing to grant that this is an enormously huge logistical task, of basically an unprecedented nature, and I don’t want to be an old man yelling at clouds. I expect FUBAR and I’m sure people are doing the best they can.

            On the other hand, we’ve been starving our public health infrastructure of resources for decades. That was not inevitable or merely a product of immutable forces.

    • Len Covello says:

      Outstanding question. Hadn’t thought about it explicitly.

      We have been testing for IgG in parallel and will, in the next few weeks, be able to track natural vs vaccine acquired immunity. We will be able to analyze some of the questions surrounding the likelihood that the vaccines will prevent symptoms without preventing asymptomatic infection.

      My gut sense is that the metric will probably hold up reasonably for recent trends but may not compare easily over the life of the epidemic. I can think of some ways to explore the effect of vaccine aggrandizement of asymptomatic infection but am not sure that ratio shift would be open to normalization without some good old fashioned conjecture.

  5. MJM-WA says:

    One of the topics that has garnered attention from data analysts looking through the US C19 data has been the subject of whether hospital patients have been admitted for treatment of CLI versus admittance for unrelated reasons and then testing C19 positive. We also have known for a long time that hospitals are notorious for nosocomial infections. Given the possibility of such infections —and what I would presume is repeated testing of patients while in the hospital— doesn’t this create an issue for such an analysis??

    • Len Covello says:

      Hi, thx for the question.
      These are all patients getting elective procedures—including CT’s, stress tests and the like—and are strongly skewed to routine outpatient procedures done in a surgery center environment. The hospital system actually tracks data on nosocomial infections and the numbers are negligibly low for this population. And of course, implicit in the above, these are not people being repetitively sampled as they are strictly outpatient. So THIS population is clean.
      If the question is regarding the ratio of these asymptomatic positives to symptomatic positives and that the latter group is tough to nail down with the inpatient/repetitive testing/nosocomial skew problem, I would say, yes, exactly! Tough to trust those numbers. Our point would be that we are anticipating that the TRUE incidence of asymptomatic/symptomatic is fixed per demographic, never mind the MEASURED symptomatic rate, documented in the literature, subject to precisely the skew you identify.

  6. Anoneuoid says:

    > SARS-CoV-2 has clearly shown the ability to spread throughout the population via both asymptomatic and symptomatic infection

    Isn’t this a pretty big claim to have no references?

    > Specificity is near 100%, with false positives likely generated only by crosscontamination or switched samples.

    Depends if you want to define a case of covid as the presence of an RNA fragment vs presence of a virus that can cause illness or be transmitted to others. The latter may merit some kind of intervention while the former does not.

    > It can be observed that at Ct = 25, up to 70% of patients remain positive in culture and that at Ct = 30 this value drops to 20%. At Ct = 35, the value we used to report a positive result for PCR, <3% of cultures are positive.

    Another thing to keep in mind is the threshold for a positive test is *supposed* to change based on the presumed level of circulating virus (become more sensitive and less specific when high levels are thought to be circulating):

    > Users of RT-PCR reagents should read the IFU carefully to determine if manual adjustment of the PCR positivity threshold is necessary to account for any background noise which may lead to a specimen with a high cycle threshold (Ct) value result being interpreted as a positive result… In some cases, the IFU will state that the cut-off should be manually adjusted to ensure that specimens with high Ct values are not incorrectly assigned SARS-CoV-2 detected due to background noise.

    Presence of the fragments could still be used as a proxy for how much virus has recently been in a community but I’d think testing wastewater would be an orders of magnitude cheaper and safer way to assess that.

    Also, I’ve looked (not read) through the paper twice and I am still not sure what gold standard this model was compared to in order to determine the performance mentioned in the title.

    • rm bloom says:

      Asymptomatics: Navy Ship Roosevelt Study. Marine barracks study. At least these two to begin.

    • Len Covello says:

      Some quality points.
      I’m not too worried about the assertion of spread via asymptomatic carriers not being referenced because I don’t think there’s anyone in epidemiology world that thinks infections can’t be spread asymptomatically. It has become dogma. But for completeness, I guess.
      RNA fragments etc, absolutely. But the test is still picking someone up who recently had the infection, even if not symptomatic then. We’re not actually worried about how contagious a person might be at the time of the test; more to identify them as an asymptomatic infection. I believe the ratio argument (allowing the metric to be a random proxy) holds for viral incidence trends if we assume that for a fixed demographic, the late shedding nonviable RNA ratio to live virus is likely fixed as well. I think that probable, but certainly can’t know it.

      • Anoneuoid says:

        > I’m not too worried about the assertion of spread via asymptomatic carriers not being referenced because I don’t think there’s anyone in epidemiology world that thinks infections can’t be spread asymptomatically. It has become dogma. But for completeness, I guess.

        Last I looked into this a few months ago the asymptomatic transmission was based on around a dozen questionable contact-traced cases then a bunch of models assuming it is happening.

        • Joshua says:

          My sense is that it’s considered axiomatic mostly from the simple observation that there’s a significant presymptomatic period, combined with lots o’ people getting infected where they can’t find obvious exposure to people who were symptomatic, combined with evidence such as this;

          And this:

          • Joshua says:

            … will make more sense when the parent comment gets through moderation.

            Of course, there is plenty of information available on the Interwebs that it’s certain that there’s no asymptomatic transmission. From the same folks who are pushing the “casedemic” and “it’s all about creating panic from false positives” nonsense.

          • Anoneuoid says:

            Neither of those papers cite any evidence of asymptomatic transmission. The first one even says:

            > It is important to note that detection of viral RNA does not equate infectious virus being present and transmissible.


            The second says ct values above 30 “should not impact public health decisions”:

            > We could observe that subcultures, especially the first one, allow an increasing percentage of viral isolation in samples with Ct values, confirming that these high Ct values are mostly correlated with low viral loads. From our cohort, we now need to try to understand and define the duration and frequency of live virus shedding in patients on a case-by-case basis in the rare cases when the PCR is positive beyond 10 days, often at a Ct >30. In any cases, these rare cases should not impact public health decisions.


            That is about 90% of the positive tests.

            > Of course, there is plenty of information available on the Interwebs that it’s certain that there’s no asymptomatic transmission. From the same folks who are pushing the “casedemic” and “it’s all about creating panic from false positives” nonsense.

            I see, it was a low effort post meant to push some political agenda…

            • Joshua says:

              What’s the “political agenda” behind pointing out that there are papers supporting the conclusion that asymptomatic spread occurs?

              • Anoneuoid says:

                > pointing out that there are papers supporting the conclusion that asymptomatic spread occurs?

                You did not share such a paper. In fact the paper you shared explicitly said not to draw the conclusion you drew from it. Now even after that direct quote was pointed out you continue to assert the same wrong conclusion. It is just misleading or, at best, wasting the time of others when you do this.

              • Joshua says:

                The papers I linked support the conclusion that asymptomatic spread occurs.

                That you want to hang your hat on a misleading and semantic hair-splitting argument – that they don’t prove asymptomatic transmission takes place – is entirely your right. But that doesn’t make it any less of a misleading and semantic hair-splitting argument.

                > In fact the paper you shared explicitly said not to draw the conclusion you drew from it.

                No. It states with appropriate caveats not to draw an over-certain conclusion that I didn’t draw from it, but that you want to foist onto me because you fancy yourself some kind of mind-reader, I guess. This isn’t the first time that you’ve done this. But enough. No reason to clutter Andrew’s blog further with this. You’re perfectly entitled to have your view, even if it is laughably wrong.

              • Anoneuoid says:

                The paper presents no evidence for asymptomatic transmission and does not claim to. In fact it explicitly tells the reader not to conclude what you did.

                I read words, not minds. It is not an exclusive skill that requires genius or mind reading.

                But yes, please stop cluttering the blog.

              • Joshua says:

                Just in case someone reading who doesn’t bother to click through is tempted to think that what you’re saying is true:

                > Our findings suggest that people with asymptomatic COVID-19 are infectious but might be less infectious than symptomatic cases. We also identified that the proportion of close contacts who became infected did not depend on the serology status of the index case. One reason for this observation could be that close contacts tend to live or work with the index case and are exposed because of their regular contact with a person who was infectious before turning seropositive.

                >> Conclusions and Relevance In this cohort study of symptomatic and asymptomatic patients with SARS-CoV-2 infection who were isolated in a community treatment center in Cheonan, Republic of Korea, the Ct values in asymptomatic patients were similar to those in symptomatic patients. Isolation of asymptomatic patients may be necessary to control the spread of SARS-CoV-2.

              • Anoneuoid says:

                > Just in case someone reading who doesn’t bother to click through is tempted to think that what you’re saying is true

                What I am saying is true. Neither study shows evidence of asymptomatic transmission. They just say it could be the case but their data cannot really say either way. I agree with them completely.

                How that becomes axiomatic in some people’s minds is a mystery to me. You need to *read* what they say.

              • rm bloom says:

                This is a competent collection of relevant material which I think is helpful:

            • rm bloom says:

              Here is a study, which is really a collection of studies, eliciting the AIP in a wide variety of circumstances, the results — of varying quality — of which are graphed in a useful vertical stack plot.


        • rm bloom says:

          Here is a study, which is really a collection of a large number of studies, each eliciting the AIP in a wide variety of circumstances, the results — of varying quality — of which are graphed in a useful vertical stack plot.

          • Anoneuoid says:

            Could be a good source for them to use. How exactly was it determined asymptomatic transmission occurred vs people being exposed to the same source though?

            • rm bloom says:

              Each one of those sub-studies in the meta-study has to be evaluated separately for its merits or faults. But the authors have tried to bring the differential study-quality out along with the results. It’s a lot of reading. However the other link I posted on the Roosevelt study seems to have been a concentrated longitudinal track kept on 1271 sailors over the course of about 60 days. The difficulty surrounding the whole issue is that “asymptomatic” and “mildly symptomatic” perhaps have a region of overlap that is impossible to disentangle especially if the data are drawn from self-reports. All the same, the existence of such a cohort seems to be reported by careful work repeatedly. The question of whether this cohort is a significant driver of community transmission is also easily conflated with the question of whether “pre-symptomatics” are significant drivers. Indeed it is supposed that they are. That is what makes it so vexatious: transmission is driven to a large extent by persons who do not yet feel that they are ill. Most of them do become ill. Perhaps a small, but not insignificant subset of this cohort never become ill at all. Is the period of transmissibility for the never-symptomatic carriers comparable to the former cohort, of soon-to-be symptomatic? All questions very difficult to answer. In a year there should be better answers; but that is scarce reassurance for the worries.

            • rm bloom says:

              Longitudinal studies in closed environments (e.g. the Roosevelt ship study, the marine barracks study) have managed to indicate the likely chain of transmission as the outbreak progresses through the population. E.g. two groups of marine recruits only mixed at point A and were otherwise separate; case incidence began in group 1 at a time when group 2 was PCR negative; then by mixing positive test results are incident in group 2 at some delay, though at the earliest incidence in group 2, positive cases in group 1 were still asymptomatic. Such chains are elicited in many studies — including those aggregated in the link I attached below. The question of whether the “never-symptomatic” sub-group is a significant source of transmission seems originally — and quite naturally — to have been subsumed by the prior question (and with good reason): whether persons not-yet-symptomatic are a significant source of transmission — and the answer to this question seems to be “Yes.”

    • Len Covello says:

      To continue, the power of a positive test (rather than the analytic sensitivity and specificity) changes markedly in low incidence situations as we experienced in the summer in our area and during which we were unable to rely on trending. Too much noise. We were in fact unable to assess the validity of our method for precisely this reason until our metro area was “lucky” enough to experience a fall surge. Only then did our false positive rate become negligible and allow for real analysis. And it turned out our metric predicted clinical impact well in a leading way.

      Wastewater analysis is a cool method and is obviously cheap. I think our metric would be useful to compare with wastewater data to test the power of each. Or wastewater analysis could be compared on a date matched basis with ER metrics or whatever. I’m not aware of that work but would be great to see and absolutely could be superior/cheaper etc. But our metric has been fun and has been inexpensive, convenient, and has worked well for us. So yay that, anyway.
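The point above about the power of a positive test in low-incidence periods can be worked through with Bayes' rule. The sketch below is only an illustration: the function name is made up, and the sensitivity, specificity, and prevalence values are hypothetical placeholders, not numbers from the paper or from any particular assay.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(truly infected | test positive), by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics (assumptions, not measured values).
SENSITIVITY = 0.80
SPECIFICITY = 0.995

# Hypothetical prevalence among screened outpatients in a quiet period vs a surge.
ppv_low_incidence = positive_predictive_value(0.002, SENSITIVITY, SPECIFICITY)
ppv_surge = positive_predictive_value(0.05, SENSITIVITY, SPECIFICITY)
print(round(ppv_low_incidence, 3), round(ppv_surge, 3))
```

With these made-up inputs, most positives in the quiet period are false positives, while in the surge most are real. The test itself has not changed, only the prevalence, which is why a trend built on positives is noisy when incidence is low.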

  7. Yes, and hopefully they will pool efforts over hospitals such as running the same analysis scripts then contrasting the results and thoughtfully reaching a more informed assessment – like this group is already doing –

    Or see

  8. rm bloom says:

    Not exactly directly on topic, but surely of pressing concern: in responding to one of the Anons. in regard to question of asymptomatic infection prevalence (AIP) I had posted a link to an earlier study on the Navy Ship Roosevelt that followed cases for about 60 days and counted 572/1271 as positive but remaining asymptomatic throughout the study period. In broadest terms, the degree to which asymptomatic infections are acting as drivers is really the elephant in the room. Here is a study, which is really a collection of studies, eliciting the AIP in a wide variety of circumstances, the results — of varying quality — of which are graphed in a useful vertical stack plot.

  9. Harlan says:

    logit(π ) = β + βmale + α1age + α2race + α3county + α4time + α5age∗male,

    “time[i] indices the time in weeks when the test result is observed for individual i”

    Quick question- is time a categorical variable here?
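For readers wondering what the two options in this question would look like, here is a small sketch of the "time as an index" reading, which is what the quoted "time[i] indices" wording suggests but which the quoted text alone does not confirm. Under that reading each week gets its own effect, looked up by index, rather than a single slope multiplied by the week number. All names and values below are hypothetical, and the other terms of the model are lumped into one number.

```python
import math

def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def positivity_probability(intercept, week_effects, week_index, other_terms=0.0):
    """Probability of a positive test when time enters as a per-week effect.

    week_effects[week_index] plays the role of the time term in the quoted
    formula; other_terms stands in for the sex, age, race, and county terms.
    """
    return inv_logit(intercept + other_terms + week_effects[week_index])

# Hypothetical values: three weeks, each with its own effect on the logit scale.
intercept = -3.0
week_effects = [0.0, 0.5, -0.2]

probs = [positivity_probability(intercept, week_effects, w) for w in range(3)]
print(probs)
```

A continuous-time version would instead compute something like `intercept + slope * week`, forcing a monotone trend on the logit scale; the indexed version lets positivity rise and fall from week to week.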

  10. Elin says:

    This really tracks similar to how HIV understanding changed once they started doing routine screenings in hospitals (though in that case it was hugely controversial).

    For two days my zip code had the highest rolling average positivity rate in NYC. I figure that what happened is that some family all tested positive. Then the city council member showed up and had two days of free testing in the park. Now we are back to where we usually have been.

    Meanwhile, while looking at this I noticed that Co-op City was amazingly low on positivity and I had all kinds of interesting ideas about why. But then I looked at how many people were tested and realized that it was huge! That area is a NORC (naturally occurring retirement community) and then the whole place is run by one management company, so it makes sense that they would have a lot of testing. But looking at cases and deaths they were worse off. (And their testing rate has come back down)
