Thanks, I was confused and thought that “the real treatment here” meant the treatment that should be considered in the discussion of the Miller et al. analysis.

In that case the treatment is at the PhD program level. It has the values “include GREs in the selection process” and “do not look at GREs in the selection process”. The time of the treatment would be when that choice was made at each department. Many things would be post-treatment, and some of them could have a meaningful impact on the outcomes, like students’ choices of which programs to apply to. Outcomes for one “subject” (department) would also depend on the “treatments” received by the other “subjects”. Changes in admissions policies elsewhere would change the composition of the pool of applicants that would enter the program if accepted. Many complex dynamics could appear. For example, assuming GREs are informative, maybe second-tier programs would see the “information content” of GREs increase when the programs upstream make a less efficient use of that information.

In any case, that would be a very different “treatment” and doesn’t correspond to the “effect” estimated in that paper. It doesn’t seem relevant for the “mistake of adjusting for post-treatment variables” question.

To be clear, I don’t say it’s not a mistake in that case. I say it ain’t necessarily so.

For example, if we wanted to determine if GRE predicts doctorate completion and we had data about two programs with 40 students each

A: 50/50 mix of scores 2 and 3, 50% completion rate for the former, 75% for the latter

B: 50/50 mix of scores 3 and 4, 25% completion rate for the former, 50% for the latter

and the regressions looked like this: https://imgz.org/i8vFtUPX/

conditioning on school seems more appropriate than not doing it.

Of course many assumptions about what we are seeing and what we want to know are required. That’s the point.
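A minimal pure-Python check of the toy example above (the program labels, group sizes, and completion rates are exactly the ones given; nothing else is assumed):

```python
# Toy data from the comment: two programs of 40 students each.
# Program A: 20 students with score 2 (50% complete), 20 with score 3 (75%).
# Program B: 20 students with score 3 (25% complete), 20 with score 4 (50%).
cells = [
    ("A", 2, 20, 0.50),
    ("A", 3, 20, 0.75),
    ("B", 3, 20, 0.25),
    ("B", 4, 20, 0.50),
]
students = [(prog, score, i < round(n * rate))
            for prog, score, n, rate in cells for i in range(n)]

def completion_rate(rows):
    return sum(1 for _, _, done in rows if done) / len(rows)

# Pooled: every score level has exactly a 50% completion rate, so the
# score looks useless when the two programs are mixed together.
for s in (2, 3, 4):
    print("pooled, score", s, completion_rate([r for r in students if r[1] == s]))

# Within each program, higher scores do better: the predictive signal
# only shows up once we condition on program.
for prog in ("A", "B"):
    for s in sorted({r[1] for r in students if r[0] == prog}):
        rows = [r for r in students if r[0] == prog and r[1] == s]
        print(prog, "score", s, completion_rate(rows))
```

Pooled, the completion rate is 50% at every score; within each program the rate rises with the score, which is the Simpson-style reversal being described.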

]]>@ Carlos The treatment choices are “look at GREs” or “don’t look at GREs”. Both of these choices are now actually in use at different departments. Also some intermediates, like “look at GREs if submitted but don’t require them”. I believe some departments have a flat policy that no one is allowed to submit GRE scores.

So in principle one could compare otherwise very similar departments to see which policies are getting better results for the department, not for individuals. I’ve proposed an entirely feasible RCT: departments that can’t decide would form a pooled group and accept random assignments of GRE policy, which would save many argumentative department meetings. Nobody is interested.

Yes, the effect of these admissions policy choices on graduation rate will depend on the program. I discuss that above under “interaction effects”. You could get that the treatment effect depends on things like program rank, theory/experiment balance, etc. Miller et al. used that as a retrospective excuse to include rank in their regressions, but since they didn’t include interactions with rank they introduced collider bias while getting exactly zero sensitivity to those differences in treatment effect between programs.

]]>@ curious

Once more into the breach.

From arXiv:

“… so the equal-weighted sum is close to GRE-P+1.5*GRE-Q, i.e. 1.5*GRE-Q has about the same range as GRE-P. Using data from Table 2 of (1) its coefficient (GRE-P coefficient + (1/1.5)*GRE-Q coefficient) is virtually identical in the entire sample (“All Students”) and the three subgroups described”

From Sci. Adv.:

“Adding the Q coefficient to 1.5 times the GRE-P coefficient [from Table 2 of (1)], we find that the predictive coefficient of the equal-weight sum is the same to within a 1% range in the “All Students” total sample and in each of the three subsamples described:”

In one case I multiplied the P coefficient by 1.5 before adding, in the other case divided the Q coefficient by 1.5 before adding. Since the question was whether the result was nearly invariant under choice of subsample, the constant scale factor of 1.5 between these two choices is irrelevant.

Maybe there’s a reason that GRE-Q, based on 9th grade math, is a good predictor.

]]>> The treatment is for the committee to look at the real scores.

What does “treatment” mean? Is there a way, even hypothetical, that a different “treatment” had been applied? I don’t see how, if they look at the real pre-existing scores. Or maybe the alternative “treatment” would be not to look at the scores? If “treatment” is just an empty label that we can attach wherever, what’s the relevance for the analysis?

If the question you care about is “How can an admissions committee pick a set of students who will have a high graduation rate?” then you have as many questions as admissions committees. Each admissions committee wants to pick a set of students who will have a high graduation rate in their program. Wouldn’t then the relevant correlation be conditional on the program? If in every program better scores mean better outcomes, then it seems that the mistake could be to look at pooled data and conclude the opposite (Simpson’s “paradox”).

]]>@curious I don’t think that would be needed if there were a genuine randomized input, such as randomly faked GRE scores. For the sort of pseudo-random effects I used to make an informal argument about this, since nothing else was available, yes, you certainly need to think about such possible confounders.

]]>Michael:

To do this properly one must also be able to measure and adjust for the possibility of disparate treatment at the individual & group levels post acceptance.

]]>That’s not quite it. This is exactly the somewhat tricky point that Jamie forced me to think more clearly about back in early 2019. (Seems like another universe.)

The initial question is not “What do you change to increase the likelihood that a given student would graduate?”

It’s “How can an admissions committee pick a set of students who will have a high graduation rate? Does including GREs help pick?” The treatment is for the committee to look at the real scores. Because there were basically no places that didn’t include GREs, they couldn’t even use sophisticated techniques to simulate an RCT on that question.

So what they did in effect was to look at a different question: “What measurable individual traits predict graduation? Do GREs help predict?” This is close to the causal question of “What traits cause a student to graduate?”, which encouraged me to use some causal language (“collider bias”) to distinguish ways in which their analysis systematically biased their estimates from ways in which their analysis systematically lost signal-to-noise.

What does the second question, which they address though incompetently, have to do with the first one, the real policy question? The implicit idea is that if admissions committees choose the individual students with the strongest markers of traits leading to graduation, they will end up with a cohort with a higher graduation rate. With various caveats, that part is useful.

I think you are suggesting another experiment: substitute randomized scores for the real ones, and see how those scores affect graduation rates, mediated by their effect on which program the student goes to. That would actually be a good way (though unethical) to measure the direct effect of program rank on graduation rate. Then in the “what makes a student likely to graduate?” question, one could properly adjust for that direct effect of rank. The sign of that direct effect is unknown.

The adjustment used for program rank instead ended up adjusting for a classic collider, giving systematic negative bias. Not because colliders always have to give negative bias, but because we’re pretty sure that the things collided with in admissions here (prior research work, reputation of undergrad program, letters, …) are all counted with a positive effect sign by the admissions committees.

Sounds like we agree, although I haven’t looked over the book and can’t comment on it.

]]>Michael:

I did not make a mistake, the error in description was yours.

I simply calculated exactly as described in each paper, which said to multiply the logit for GRE-P in the sciencemag version and GRE-Q in the arxiv version. If the calculation should be conducted differently, it is your responsibility as the author to describe precisely how it is done so that it can be understood by your readers. Better yet, you could have included the actual equation, as you did in the arxiv version, and replicated Miller et al. Table 2 with the new effects based on your calculations.

That is the entire point of your article according to the arxiv version.

]]>@curious It feels stupid to still be answering this but what we have here is a simple change of variables in a linear equation. What you multiply the old variable by to get the new one is the inverse of what you multiply its coefficient by. What appeared in print was exactly right. I’d like to be snide about it but actually the first time I looked at it I made the same mistake as you.
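A quick sketch of the change-of-variables point, with made-up numbers (the data and the `slope` helper are illustrative, not from either paper): rescaling a predictor by a factor k rescales its fitted coefficient by 1/k.

```python
# If you rescale a predictor x -> k*x, its fitted slope scales by 1/k.
# So a score defined as P + 1.5*Q gets the coefficient c_P + (1/1.5)*c_Q,
# NOT c_P + 1.5*c_Q: the coefficient moves inversely to the variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x, arbitrary demo numbers

def slope(x, y):
    """Least-squares slope on centered data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

k = 1.5
b = slope(xs, ys)
b_scaled = slope([k * a for a in xs], ys)
print(b, b_scaled)  # b_scaled == b / k (up to float rounding)
```

Since an overall rescaling hits every subgroup’s coefficient by the same constant factor, it cannot affect whether the subgroup values agree with each other.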

]]>If the real treatment is admissions committees looking at GREs[1], the real effect we want to estimate is the difference between the case where they look at some scores and the counterfactual where they look at different scores (but the student is the same)?

Say I change the treatment assignment and increase the scores for one student in the application documents that one admission committee will receive. What would be the effect? If the student was admitted (and enrolled [2]) either way, would the probability of completing the degree be different?

One could imagine a mechanism, depending on the “scope” of the treatment. Maybe it gives access to more money, or a better advisor, increasing chances. (or maybe higher expectations lead to disappointment, conflict and quitting). But if the “treatment” ends there, why should the probability of completing the degree change?

In the case where the “treatment” (which is a manipulation of the scores the admission committees consider, not a change in the student’s ability) results in admission to a different program, it’s more plausible that the probability of completing the degree changes. Many things will change as a result. Maybe a better program is harder, increasing the probability of dropping out. Maybe having better resources, or more motivation because the post-degree outlook is brighter, increases the probability of doctorate completion. But it would also make sense in that case to say that the “real treatment” was admission to that program and not the score.

[1] I guess you have then many treatments, one per admissions committee (let’s assume for simplicity that all the members look at them at the same time and the “treatment” happens when they meet to look at the scores). For each student, including some that won’t be accepted anywhere, there will be one or more treatments and the outcome (finishing the degree or not) will be measured for at most one of the treatments (conditional on the program accepting the student and the student “accepting” the program).

[2] The “change of treatment” doesn’t affect the other admission processes.

]]>I had only skimmed your paper, but I think you discuss the issue in more detail than saying “you just don’t”. My comment was about Andrew’s remark. He points to chapter 19.6, “Do not adjust for post-treatment variables”. To be fair, the message is weakened already in the first paragraph, where it says “it is generally not a good idea”. I think the next chapter, 19.7 “Intermediate outcomes and causal paths”, is also relevant. Even if the bottom line is still “if you do that bad things can happen to you”, it makes clear that the issue is complex.

(By the way, I tried a regression using the data in Figure 19.10 and I got a coefficient of -1.62, not -1.5. It’s possible that I did it wrong, though.)

]]>But to clarify further: The real treatment here is admissions committees looking at GREs.

But there was little direct way to look at that, since at the time the look-at-GREs treatment was universal. So (although their description was very muddled) what Miller et al. did was to construct a model of what causes graduation, and try to infer what the effect of the admissions looking at GREs treatment would have been from the coefficients of that other causal model. Whatever influences GRE scores is one of the causal factors in this other model.

That really wasn’t a crazy thing to do, except that they screwed up everything about how they did it.

]]>Yeah, living by rules can be a problem. A somewhat better rule would be more like “Don’t adjust for downstream variables in the causal diagram”. Time order is just a way to guarantee that something isn’t causally downstream. In this case, GRE probably has little effect on GPA. But GPA and GRE both have strong causal effects on which grad school you get into.

]]>Michael:

I pointed out an error in your description. You ignored it and pretended it is the responsibility of the reader to understand what you meant. When you make your paper about transparency and competency, it is your responsibility to acknowledge any lack of clarity brought to your attention.

]]>typo above GRE-Q female high end 70/30 not 70/20.

]]>I think I remember an Irish student many years ago describing having gone through a system very much like that. If I understand this Wikipedia passage, it seems that it’s still in use.

“Ireland

In Ireland, students in their final year of secondary education apply to the Central Applications Office, listing several courses at any of the third-level institutions in order of preference. Students then receive points based on their Leaving Certificate, and places on courses are offered to those who applied who received the highest points.”

@curious. I suggest you go home and think about that.

]]>@curious. OK, I got off the phone and found the blown-up xeroxes of Fig 2.

For GRE-p I got that the female odds went up from 62/38 to 70/30, for a logit of 0.358. For males 70/30 to 77/23, logit 0.361. Weighted average 0.36.

For GRE-Q females 59/41 to 70/30, logit 0.483. Males 68/32 to 77/23, logit 0.454, weighted average 0.46.
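If it helps, the logit arithmetic can be reproduced directly. A sketch in pure Python, using the corrected 70/30 high end for GRE-Q females, and assuming the “weighted average” uses the US female/male Ns (402 and 1913) that appear elsewhere in the thread:

```python
from math import log

def logit_change(p0, p1):
    """Change in log-odds going from completion probability p0 to p1."""
    return log(p1 / (1 - p1)) - log(p0 / (1 - p0))

# GRE-P, 10th -> 90th percentile odds read off the blown-up figure
f_p = logit_change(0.62, 0.70)   # females: 62/38 -> 70/30, ~0.358
m_p = logit_change(0.70, 0.77)   # males:   70/30 -> 77/23, ~0.361

# GRE-Q
f_q = logit_change(0.59, 0.70)   # females: 59/41 -> 70/30, ~0.483
m_q = logit_change(0.68, 0.77)   # males:   68/32 -> 77/23, ~0.454

# N-weighted averages over the US subsamples (402 female, 1913 male)
n_f, n_m = 402, 1913
avg_p = (n_f * f_p + n_m * m_p) / (n_f + n_m)   # ~0.36
avg_q = (n_f * f_q + n_m * m_q) / (n_f + n_m)   # ~0.46
print(round(avg_p, 2), round(avg_q, 2))
```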

That said, what the hell is going on with you?

]]>Michael:

I will say it’s now exceeded my interest, given that the person who claimed their paper was about transparency and research competence is pretending they did not make an error in describing their methods.

]]>@Curious. You seem to no longer believe “This study is not worth spending the time on”.

On “which equation”: If you multiply the GRE-Q score by 1.5 before adding it to GRE-P to make equal-range contributions, then you have to divide the GRE-Q slope by 1.5 due to the change of units. For the purposes of checking whether the net slopes on the subgroups are the same, that’s equivalent to multiplying the GRE-P slope by 1.5, since the overall scale is irrelevant.

I’m very late for another obligation now but later will scrounge the figures to get the percentile ranges. You can easily do it yourself to check, just using these percentile ranges and the published slopes (together with the correlation coefficient) to get the effect size. Reading the y-axes to get logits is a consistency check.

I should note that in their lengthy response, which took months and disputed more or less everything else, Miller et al. did not dispute that my read of their numbers was correct.

I don’t understand the computer output you included.

]]>Michael:

Another couple questions:

1. Which formula should be used?

https://advances.sciencemag.org/content/6/23/eaax3787/tab-pdf

“From Figure 2 of (1) , we see that the GRE-P range in the U.S. is about 1.5 times as large as the GRE-Q range. Adding the Q coefficient to 1.5 times the GRE-P coefficient [from Table 2 of (1)]…”

equation from sciencemag: GRE_Q + 1.5*GRE_P

> gre_qp_sciencemag
# A tibble: 4 x 5
  group                  GRE_P   GRE_Q  GRE_V      GRE_QP
1 All_N_3962_Logit       0.003   0.013  -0.001     0.0175
2 US_female_N_402_Logit  0.0002  0.017  -0.001     0.0173
3 US_male_N_1913_Logit   0.005   0.01   -0.000005  0.0175
4 US_N_2315_Logit        0.005   0.01   -0.0001    0.0175

https://arxiv.org/ftp/arxiv/papers/1902/1902.09442.pdf

“The 10th to 90th percentile ranges for the U.S. group can be seen in Fig. 2 of (1), with GRE-P having ~1.5 times as large a range as GRE-Q in this cohort, so the equal-weighted sum is close to GRE-P+1.5*GRE-Q, i.e. 1.5*GRE-Q has about the same range as GRE-P.”

equation from arxiv: 1.5*GRE_Q + GRE_P

> gre_qp_arxiv
# A tibble: 4 x 5
  group                  GRE_P   GRE_Q  GRE_V      GRE_QP
1 All_N_3962_Logit       0.003   0.013  -0.001     0.0225
2 US_female_N_402_Logit  0.0002  0.017  -0.001     0.0257
3 US_male_N_1913_Logit   0.005   0.01   -0.000005  0.02
4 US_N_2315_Logit        0.005   0.01   -0.0001    0.02

2. In an effort to “maintain minimal standards of competence and transparency” will you please share the values you used when you “… measured the points with a ruler and calculated the corresponding logit” from Miller et al. Figure 2? If they are in the arxiv version, please point me to the page, because I’ve missed them.

]]>@curious Good questions.

1. The UGPA- GRE correlation was not initially shared with me. Finally under pressure it appeared in their arXiv follow-up. I forget what the exact value was but it wasn’t huge. You can look it up.

2. I took Fig. 2 and blew it up to full size with a copying machine. Then I measured the points with a ruler and calculated the corresponding logits. You may notice that the numbers changed very slightly between my arXiv versions because I tried to be extra careful when I realized that an official publication would result.

If these two predictors were fully collinear, then the range of their sum would be the sum of their ranges. They aren’t quite, so the range is reduced by the factor given in the paper. You take that range and multiply by the slope of the sum to get the effect size for the sum.
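A sketch of that arithmetic, using only the numbers quoted in this thread (separate 10th-to-90th-percentile logit effects of ~0.46 and ~0.36, and a Q-P correlation of 0.55):

```python
from math import sqrt

# For two standardized predictors with correlation r, the sd of their sum
# is sqrt(2 + 2r) rather than the 2 you would get if they moved in
# lockstep, so the range of the sum is reduced by a factor sqrt((1+r)/2).
r = 0.55
reduction = sqrt((1 + r) / 2)    # sqrt(1.55/2) ~ 0.88

q_effect, p_effect = 0.46, 0.36  # separate logit effects for Q and P
net_logit = (q_effect + p_effect) * reduction
print(round(net_logit, 2))       # ~0.72, matching the figure quoted here
```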

BTW, I want to brag about something. In the first arXiv versions, when the authors were still hiding all the correlation coefficients, I had to guess the GRE-P-GRE-Q correlation. Guided by other test correlations, I guessed 0.707, out of convenience. I wrote ETS, who said they didn’t have it but later came back with the actual value: 0.70 for both Spearman and Pearson. They asked how I knew. The value of 0.55 in the data of the paper is reduced by range-restriction.

]]>Michael:

Now that I’ve taken your sage advice and read your paper in greater detail, I see how the 1.5*GRE_P + GRE_Q combination brings the subgroup logit effects to ~equal values. However, I am wondering:

1. Was the correlation between UGPA ~ GRE subtest scores shared with you?

2. In the second paragraph on page 2 of your paper you state:

“The net logit change between the 10th and 90th percentiles on that combined score would be reduced from the sum of the separate effects of the two scores [~0.46 and ~0.36 in the United States for Q and P, respectively, estimated from Figure 2 of (1)] by a factor (1.55/2)^1/2 since their correlation is 0.55, giving a net logit effect of ~0.72.”

It’s not clear to me where the effects ~0.46 and ~0.36 are being pulled from Figure 2 in the Miller et al paper and how you are calculating the combined effect of ~ 0.72.

]]>https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines :-)

]]>https://www.chronicle.com/article/can-algorithms-save-college-admissions

]]>If GREs are the treatment, someone who lives by the rule of “never adjust for post-treatment variables” would also want to leave out GPAs. The GRE is often taken before graduation, sometimes even before finishing the first year of college.

]]>Curious:

Indeed, this relates to an interesting point, which is that open data is a plus, regardless of the quality of a study. A high-quality study is even better if the data are available, and a low-quality study can be valuable despite its flaws if its data can be accessed.

]]>Michael:

There is plenty to criticize about that study, but at least they put their analytic results in a table to make it easy on the reader.

]]>@curious I agree that effect size is more important than statistical significance. That’s why I made a point of giving it in the paper, along with the calculation method. I have no access to any data other than what Miller et al. published. In fact, our interaction started when I asked for one correlation coefficient (GRE-Q and GRE-P) and the lead author wrote that he couldn’t give it because of “human subjects” issues. It took months and some pressure from others to get that coefficient.

As I wrote in the papers, the effect size isn’t huge (around a factor of 2 odds ratio for GREs toward the top of the enrolled range compared to toward the bottom, holding GPA constant). The GPA effect was smaller, holding GRE constant. None of my published estimates make any correction for range restriction, but informally I’d estimate that would raise that odds ratio to ~3 for the current enrollees. Following the policy recommendation of Miller et al. to drop the exams entirely would mean extending admissions to people who are below the bottom of that current range, and thus would give a bigger odds ratio.
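For what it’s worth, this seems to be how the numbers connect (a sketch; tying the ~0.72 net logit quoted elsewhere in the thread to the factor of 2 here is my reading, not a claim from the papers):

```python
from math import exp, log

# A logit (log-odds) difference L corresponds to an odds ratio exp(L).
net_logit = 0.72            # combined-score 10th -> 90th percentile estimate
odds_ratio = exp(net_logit)
print(round(odds_ratio, 2))   # ~2.05: roughly the "factor of 2" odds ratio

# Going the other way, a factor-of-3 odds ratio (the informal
# range-restriction-corrected guess) is a logit change of ~1.1:
print(round(log(3), 2))
```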

These mediocre predictors are all that people have to work with, which is why they try to combine several of them to get better guesses.

]]>I could not care less about the statistical significance of your estimates, what I care about is the size of the estimates.

]]>Share the data and formulas you used Michael, because what I’m looking at are substantively tiny effects (though statistically significant – made larger by correcting for range restriction) produced by large sample sizes that are effectively meaningless given the crudeness of the measures.

]]>The journal (Physical Review Physics Education Research) has declined to publish my comment but has suggested that a survey of messed-up causal inference in their published papers would be welcome.

“…I’d like to encourage Dr. Weissman to comb through PRPER to find some other examples of authors making the same unwarranted leap in their “implications” sections, and write a Short Paper for PRPER explaining this critique and arguing that it applies fairly widely–that it’s a mistake authors commonly make.”

So please, if anybody knows of any papers that should be covered, let me know. If anybody is really enthusiastic about this project maybe we could collaborate. It’s quite unusual for a journal to request critical comments on their own papers.

]]>@Curious. No. Read the damn papers. The original authors did something crazy to get groups of ~23 to artificially inflate confidence intervals to hide an effect. That particular data subset actually had N= ~2300, and the overall set had N=~4000. It turns out that was enough to see trends at better than 4 sigma level despite some collinearity with other predictors and despite range restriction plus artificially enhanced collider stratification bias.

]]>Michael:

Are you really making a strong inference based on statistical adjustment for data with cells as small as 23 using measures we know to be crude in their ability to differentiate at small increments?

This study is not worth spending the time on, but the inference you are making is patently absurd.

]]>You’d think so a priori, but it turns out the data say otherwise for completion rates. You’d have to ask somebody with actual knowledge of the field to see if anything is known about broader outcome measures.

]]>Exactly. Jamie Robins helped me clarify that after my first response was sort of scrambled.

]]>Yes to all that. But for me the big issue is not admissions policy. It’s whether in our pursuit of various goals (virtue, status, grant money,…) we completely shit on basic scientific methods. If we do, what was it that we were supposed to be offering the world? Why would anybody listen to us about a pandemic or the climate?

That is one sort of effect that could be present: that it’s harder to graduate from the top programs. There are other effects (e.g. differential funding) that can make it easier to graduate from the top programs. So the net effect there is of uncertain sign.

The bias introduced in the S.A. paper and the response was of a different type. Their model included GRE & GPA plus a few less interesting predictors. It had to leave out ones with less standard scales, like undergrad research experience, letters of recommendation, quality of essays,… When you ask “what rank of program did a student end up in?” that outcome variable has causal antecedents both inside the model (GRE, GPA) and outside the model (see above). It’s called a collider between those causes. So when you stratify on it high inside-model predictors become systematically negatively correlated with outside-model predictors. That collider stratification gives a systematic negative bias to all the predictive coefficients for variables inside the model. It can be a very large effect, even flipping signs of coefficients. My paper includes some estimates for simple analytic distributions.
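The sign of that bias shows up in a toy simulation (pure Python; the coefficients and the “skill” variable standing in for the out-of-model predictors like research experience and letters are invented for illustration, and rank is made an exact function of gre + skill for clarity):

```python
import random

random.seed(0)
N = 50_000

# Toy model: graduation outcome y depends positively on BOTH an in-model
# predictor (gre) and an out-of-model one (skill).  Program rank is a
# collider: it is determined by the same two variables.
b_gre, b_skill = 0.3, 0.5
gre   = [random.gauss(0, 1) for _ in range(N)]
skill = [random.gauss(0, 1) for _ in range(N)]
rank  = [g + s for g, s in zip(gre, skill)]            # the collider
y     = [b_gre * g + b_skill * s + random.gauss(0, 0.5)
         for g, s in zip(gre, skill)]

def slope1(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope2(x1, x2, y):
    """Slope on x1 from a two-predictor least-squares fit (with intercept)."""
    m1, m2, my = sum(x1) / len(x1), sum(x2) / len(x2), sum(y) / len(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

naive    = slope1(gre, y)         # ~b_gre: unbiased, since gre and skill
                                  # are independent here
adjusted = slope2(gre, rank, y)   # ~b_gre - b_skill: stratifying on the
                                  # collider flips the sign
print(round(naive, 2), round(adjusted, 2))
```

With these invented coefficients, the unconditional slope recovers roughly the true positive effect, while including rank as a covariate drives the gre coefficient negative, the sign flip described above.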

Although many of the problems in this paper were uniquely comical, collider stratification is much more general. Almost every program will tell you that they don’t see much dependence of success on X, where X is an admissions criterion. That’s not just because of small-N stats. Each X collides with many other causes in determining which program a student ends up enrolling in. If a noticeable dependence of success on X remained among the enrollees, that would say the program should add or subtract weight on X in admissions evaluations, depending on the sign of the dependence. Under normal conditions, very little dependence should remain.

So most of the single-program anecdotes just say that things are working about the way they are intended to work.

Yes, good points. Except that the test scores do correlate significantly positively with graduation, even when controlling for undergrad GPA and despite negative compensatory-effect bias from other predictors. Only strenuous butchering of statistical methods allowed the authors to hide that correlation. Maybe when they set out to collect the data they didn’t expect a correlation.

]]>Am I correct in this interpretation:

Not including individual- or tier-level interactions is a mistake because GRE scores, etc., are used for admissions, and then, conditional on the scores that get a student admitted, other tier- or program-specific variables will determine completion?

For example, when looking at SAT scores and college completion I take a sample that includes students from Cal Tech and Cal State Long Beach. Assume those schools have similar completion rates. If I ran my regression on the whole sample, even with random effects for programs, I would still not find a significant and positive relationship between SAT and college completion, even though we would expect students who score in the 90th percentile on the SAT to have higher completion rates than those who score in the 10th percentile if all those students went to the same program.

]]>“Meanwhile, the question remains of what use should be made now of the actual predictive power of the GREs. That involves non-technical considerations rather than p-values. The issue of how our profession should choose its new members faces a variety of not always parallel social goals and is fraught with uncertainties. Despite these difficulties, finding the best selection method is trivial in one limiting case. If we do not try to maintain minimal standards of competence and transparency or even basic logic in our treatment of data, then the optimum group of students whom we should be educating is the empty set.”

(bolds left out)

Ouch! Tell us how you really feel, why dontcha!

]]>‘The conclusion seems to be that GRE scores shouldn’t be used because they select against underrepresented groups and they don’t work anyway: you can predict doctoral completion just as well with undergraduate grades. But no reason is given to think that admissions processes based on GPA only, ignoring GRE scores completely, would be less selective against underrepresented groups. The distribution of GPA by race and gender is not discussed, apart from a mention to underrepresented minorities going to public universities where grades are lower than in private universities, so “applying UGPA thresholds would indirectly favor White students, posing a risk to broadening participation aims.”’

I think this is all part of the larger debate over whether GRE scores of various sorts should be used for grad school admissions, and specifically whether they cause equity problems. AFAICT progressive opinion has dramatically shifted on this in recent years. Current opinion is that the tests are unfair to underrepresented groups because (basically) wealth buys better access to preparation (whether general schooling or test prep courses). What I don’t get is how reducing admissions criteria to basically just GPA, undergrad school reputation, and letters of reference helps — all of these seem CLEARLY to be likely to be much MORE subject to favour-the-well-connected-and-rich bias than the GRE test. I can’t fathom why this isn’t the primary issue being raised in the GRE/admissions discussion. Maybe it’s just easier to delete the GRE and thus claim you did something about equity in admissions.

I also wonder why the key statistical goal has become “criteria that predict PhD completion.” Presumably, if 2 students had identical physics ability, the rich one would have a better chance on average of finishing a PhD than the poor one, for all kinds of obvious reasons (rich students have less need to work other jobs, can take more risks since they have a backup source of support, they may be less motivated to leave for industry, they have more academics in their family and thus less “are you finished with school yet” pressure, etc. etc.).

But I assume we all agree that it would be a bad idea for admissions committees to rank wealthier students higher! (although arguably the reputation of the undergraduate school is a close correlate anyway)

]]>Only following this loosely, haven’t read everything, but I think…

(a) the “treatment” is the GRE score(s) (left in or out as one of the predictors of completion)

(b) the response is completion/non-completion, and

(c) the rank of the Physics program the student was in, is being treated as another predictor — when in fact it’s another response (because it may itself depend on GRE scores etc.)

Adjusting for (c) may therefore weaken or reverse any signal for (a) predicting (b). (? – best guess)

]]>I’m not sure I get it. What’s the treatment here? The GRE score?

]]>1) many profs are desperate for students and will take any crappy student and push them through so they can have a department funded slave;

2) many students get sick of the academic bullshit and just get a job and leave;

3) even in companies, just not pissing anyone off is a more effective way to rise in the company than performing well – which often means pissing off entrenched people and blocking the path to advancement.

Claiming the **testing** is an “unfair admissions policy” is comical. No doubt a claim made by someone who didn’t do well on the test.

The problem isn’t the admissions policy. It’s the graduation and success criterion.

]]>“many people finish a Ph.D. who shouldn’t, and are granted degrees essentially for sticking around, to the frustration of hard-working and capable students.”

Baddabing. This is the case with ***ALL*** degrees, and often the case in companies too, so the idea that test scores will predict completion rates for grad students or success in the “after life” is a crock of shit to begin with.

]]>Mj said,

“The use of “and” here pretty clearly indicates a joint hypothesis”

It could also indicate sloppy thinking or sloppy writing.

A study in 2016 showed that post-treatment conditioning is present in nearly half of the published papers in American Political Science Review, the American Journal of Political Science, and Journal of Politics. (source: https://www.dartmouth.edu/~nyhan/post-treatment-bias.pdf)

The use of “and” here pretty clearly indicates a joint hypothesis, yet nowhere is this hypothesis tested.

]]>