Open data and quality: two orthogonal factors of a study

It’s good for a study to have open data, and it’s good for the study to be high quality.

If for simplicity we dichotomize these variables, we can find lots of examples in all four quadrants:

– Unavailable data, low quality: The notorious ESP paper from 2011 and tons of papers published during that era in Psychological Science.

– Open data, low quality: Junk science based on public data, for example the beauty-and-sex-ratio paper that used data from the Adolescent Health survey.

– Unavailable data, high quality: It happens. For reasons of confidentiality or trade secrets, raw data can’t be shared. An example from our own work was our study of the NYPD’s stop and frisk policy.

– Open data, high quality: We see this sometimes! Ideally, open data provide an incentive for a study to be higher quality and can also enable high-quality analysis by outsiders.

I was thinking about this after reading this blog comment:

There is plenty to criticize about that study, but at least they put their analytic results in a table to make it easy on the reader.

Open data and better communication are a good thing. But honesty and transparency are not enough. Now I'm thinking the best way to conceptualize this is to consider openness and the quality of a study as two orthogonal factors.

This implies two things:

1. Just because a study is open, it doesn’t mean that it’s of high quality. A study can be open and still be crap.

2. Open data and good communication are a plus, no matter what. Open data make a good study better, and open data make a bad study potentially salvageable, or at least can make it clearer that the bad study is bad. And that's good.