ProPublica Surgeon Scorecard Update

Adan Becerra writes:

In light of your previous discussions on the ProPublica surgeon scorecard, I was hoping to hear your thoughts about this article recently published in Annals of Surgery titled, “Evaluation of the ProPublica Surgeon Scorecard ‘Adjusted Complication Rate’ Measure Specifications.”

The article is by K. Ban, M. Cohen, C. Ko, M. Friedberg, J. Stulberg, L. Zhou, B. Hall, D. Hoyt, and K. Bilimoria and begins:

The ProPublica Surgeon Scorecard is the first nationwide, multispecialty public reporting of individual surgeon outcomes. However, ProPublica’s use of a previously undescribed outcome measure (composite of in-hospital mortality or 30-day related readmission) and inclusion of only inpatients have been questioned. Our objectives were to (1) determine the proportion of cases excluded by ProPublica’s specifications, (2) assess the proportion of inpatient complications excluded from ProPublica’s measure, and (3) examine the validity of ProPublica’s outcome measure by comparing performance on the measure to well-established postoperative outcome measures.

They find:

ProPublica’s inclusion criteria resulted in elimination of 82% of all operations from assessment (range: 42% for total knee arthroplasty to 96% for laparoscopic cholecystectomy). For all ProPublica operations combined, 84% of complications occur during inpatient hospitalization (range: 61% for TURP to 88% for total hip arthroplasty), and are thus missed by the ProPublica measure. Hospital-level performance on the ProPublica measure correlated weakly with established complication measures, but correlated strongly with readmission.

And they conclude:

ProPublica’s outcome measure specifications exclude 82% of cases, miss 84% of postoperative complications, and correlate poorly with well-established postoperative outcomes. Thus, the validity of the ProPublica Surgeon Scorecard is questionable.

When this came up before, I wrote, “The more important criticisms involved data quality, and that’s something I can’t really comment on, at least without reading the report in more detail.”

And that’s still the case. I still haven’t put in any effort to follow this story. So I’ll repeat what I wrote before:

You fit a model, do the best you can, be open about your methods, then invite criticism. You can then take account of the criticisms, include more information, and do better.

So go for it, ProPublica. Don’t stop now! Consider your published estimates as a first step in a process of continual quality improvement.

At this point, I’d like ProPublica not to try to refute these published data-quality criticisms (unless they truly are off the mark) but rather to thank the critics and take this as an opportunity to do better. Let this be the next step in an ongoing process.


  1. Rahul says:

    Isn’t “elimination of 82% of all operations from assessment” by itself a pretty damning criticism of any metric that purports to be generally useful?

    • Z says:

      not if the remaining 18% are representative

      • Rahul says:

        Not sure I understand. Representative of what?

        Why would potential patients facing operations like the excluded 82% not consult ProPublica? And if they did, isn’t the thrown-away 82% exactly what they ought to be basing their decisions on?

        • Alex says:

          Think of random sampling. If they randomly sampled 18% of operations, and those operations are representative of the other 82%, wouldn’t that be ok? If the 18% aren’t representative, then of course there could be an issue depending on how they account for that.

          • Rahul says:

            Sure, that would be ok.

            But here they have an *exclusion* criterion that eliminates 80% of the data, right? I’m struggling to understand how or why a surgeon scorecard can drop 80% of the surgeries performed from a surgeon’s evaluation. Isn’t that throwing away 80% of the information?

            Maybe I’m not seeing the obvious.

            • Alex says:

              Sure, could definitely be a problem. But like Andrew I haven’t looked into this at all, so I can’t really say. Maybe their exclusion criteria were just to get them a reasonably-sized data set? Or a clean one? It’s on ProPublica to justify it, but it could also be fine.
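    [The random-sampling point in the thread above can be illustrated with a small simulation. All numbers here are hypothetical, made up purely for illustration and not taken from the ProPublica data: a random 18% subsample recovers the overall complication rate, while an exclusion criterion that drops an entire class of operations does not.]

    ```python
    import random

    random.seed(0)

    # Hypothetical population: two surgery types with different
    # complication rates (invented numbers, for illustration only).
    ops = [("knee", 0.02)] * 2000 + [("chole", 0.10)] * 8000
    outcomes = [(kind, random.random() < p) for kind, p in ops]

    overall = sum(c for _, c in outcomes) / len(outcomes)

    # A random 18% subsample: representative, so roughly unbiased.
    sub = random.sample(outcomes, int(0.18 * len(outcomes)))
    random_est = sum(c for _, c in sub) / len(sub)

    # An exclusion criterion that keeps only one surgery type:
    # systematically biased toward that type's rate.
    kept = [(k, c) for k, c in outcomes if k == "knee"]
    excluded_est = sum(c for _, c in kept) / len(kept)

    print(overall, random_est, excluded_est)
    ```

    The random subsample lands near the true overall rate; the rule-based exclusion lands near the rate of the one retained surgery type. Whether ProPublica’s actual exclusions behave like the first case or the second is exactly the question the commenters are debating.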

  2. Mark Friedberg says:

    Possibly of interest: our critique of the methods, with recommendations for improvement

    There was a bit of back-and-forth afterwards…
