Rafa Irizarry writes:

What do we call it when someone thinks cor(Y,X) = 0 because lim h -> 0 cor( X, Y | X \in (x-h, x+h) ) = 0

Example:

Steph, Kobe, and Jordan are average (or below average) height in the NBA so height does not predict being good at basketball.

GRE math scores don’t predict success in a Math PhD program, so you don’t need to know GRE-level math to enter a Math PhD program: https://www.bmj.com/content/342/bmj.d556

I can’t find a name for it.

My reply: I don’t know if there’s a name for it. It’s indeed a well known point—I guess that Gauss, Laplace, Galton, etc., knew about it. We make the point in the attached figure from my two books with Jennifer Hill. Here it is in Regression and Other Stories:

I’ll blog and see if anyone out there knows the name of the fallacy.

Collider bias?

Yep. See also: https://catalogofbias.org/biases/collider-bias/

Lucy, Ryan:

I followed the link, and I think that what they are calling collider bias is something different. They write, “When an exposure and an outcome independently cause a third variable, that variable is termed a ‘collider’.” But there is no third variable here. There’s only x and y (in this case, height and ability).

Rafa is talking about a very specific issue, which is that when you restrict the range of x, the correlation between x and y decreases, and no third variable is involved.
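A minimal simulation sketch of that point, using only the Python standard library. The population here is made up for illustration (x standard normal, y = 0.6x + noise, and the band 0 < x < 1 are all assumptions, not anything from Rafa's email or the book figure). It compares the correlation and the regression slope in the full data and in the restricted band.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: x ~ N(0, 1), y = 0.6 * x + noise (true correlation 0.6).
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + rng.gauss(0, 0.8) for xi in x]

# Restrict the range of x to the band (0, 1); no third variable is used.
band = [(xi, yi) for xi, yi in zip(x, y) if 0.0 < xi < 1.0]
xb = [p[0] for p in band]
yb = [p[1] for p in band]

r_full, r_band = pearson(x, y), pearson(xb, yb)
b_full, b_band = ols_slope(x, y), ols_slope(xb, yb)

print(f"full range: r = {r_full:.2f}, slope = {b_full:.2f}")
print(f"0 < x < 1:  r = {r_band:.2f}, slope = {b_band:.2f}")
```

In a run like this the slope should stay roughly the same in the band while the correlation drops sharply, which is the restricted-range pattern rather than a change in the underlying relationship.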

We could call it “selection bias,” because it is, but I think that’s too general a term, as it doesn’t really point to the particular issue here.

I believe the third variable here is whatever you’re conditioning on that restricts your x. Usually this is something explicit rather than an arbitrary restriction on the range of x. Conditioning on being in a PhD programme means the GRE scores will only be at the very top. Conditioning on being an NBA player means you’ll only get tallest players, etc.

I think Lucy and Ryan are right. In the NBA example, the third variable is “selection into the NBA”, which is predicted by both ability and height.

I think the NBA example could be an example of collider bias. In fact, in the NBA there may be a negative relationship between height and skill because below average height players probably have to compensate by being above average in skill. This negative correlation occurs because of incorrectly conditioning on selection into the NBA. Height and skill may actually be independent or positively correlated in the general population.
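Here is a rough sketch of that collider story in plain Python. Everything in it is assumed for illustration: height and skill are independent standard normals in the population, and the selection rule "height + skill > 2" is a hypothetical stand-in for getting into the NBA.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed population: height and skill are independent (both standardized).
n = 20000
height = [rng.gauss(0, 1) for _ in range(n)]
skill = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical selection rule: a player is "drafted" when height + skill > 2.
# Selection depends on both variables, so it acts as a collider.
drafted = [(h, s) for h, s in zip(height, skill) if h + s > 2.0]

r_population = pearson(height, skill)
r_drafted = pearson([h for h, _ in drafted], [s for _, s in drafted])

print(f"population: r = {r_population:.2f} (n = {n})")
print(f"drafted:    r = {r_drafted:.2f} (n = {len(drafted)})")
```

Among the selected players the correlation should come out clearly negative even though it is about zero in the population: a short player only clears the threshold by being unusually skilled.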

However, in the two plots the relationship between height and log weight is consistent across the entire range yet r squared is less in the subset. Because the subset is more or less randomly selected, I don’t think this would be an example of collider bias as in the NBA example. So collider bias wouldn’t cover the more general concept that Andy is referring to.

Seems to be classic conditioning on a collider, which I learned from The 100% CI: http://www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/

I actually have a little more to say about this as this exact question comes up every time someone mentions SAT/GRE scores not correlating with undergrad/grad success.

If someone scores 750 on the math section of the GRE versus a perfect 800 that doesn’t really tell me anything (this is partly due to the test being too easy, for reference my score report showed 770 only being at the 87th percentile: https://twitter.com/OmnesResNetwork/status/1105975049763328005). However, if someone scores 400 on the GRE I can be pretty confident they shouldn’t go to grad school (at least in the sciences). So I agree that these tests can’t tell you much about who will do well in grad school, but they can tell you who shouldn’t go to grad school.

Similarly, maybe I can’t tell you how well a 6 foot NBA player will do, but I can tell you a 5 foot player shouldn’t be in the NBA.

Okay, so I read the paper you linked to, and that doesn’t seem to be conditioning on a collider, but the NBA and GRE examples are.

Two points:

1) The score ranges have changed from what you remember: The General Test scores now range from 130 – 180 rather than up to 800.

2) I think your points are good ones, but only considering the GRE “general” exams. The subject area exam scores are another matter. The subject area scores still seem to range from 200 to 990. Many years ago, I served for several years on the committee that reviewed and ranked applicants for math NSF fellowships. The advanced math subject area exam was useful to us in the following situation: A letter of recommendation from a small college saying “This student is the best math student I have ever had” was not very meaningful — but if the applicant also had a GRE score over 900 on the advanced math subject test, that put them in the running for a fellowship.

Explaining collider biases, Richard McElreath uses the NBA example (see 36:36 https://www.youtube.com/watch?v=l_7yIUqWBmE)

I suppose you’re kidding about not knowing a name? The problem is that there are too many names.

When you’re just losing R^2 but keeping the correct slope, as in the illustration, simple “restricted range”.

When you’re trying to estimate a causal effect coefficient (like height in NBA) “compensatory effects” or “collider stratification bias”. Probably there’s more names, but those come to mind first.

Am I missing the joke?

Oh, I should have clicked on the link first. “Restricted range” is in the title.

No idea what the fallacy is supposed to be here, but I think that the idea that there is one partly relies on an imprecision in the use of the word “predict”.

To take the first example:

Steph, Kobe, and Jordan are average (or below average) height in the NBA so height does not predict being good at basketball.

seems to me an entirely reasonable form of argument, if I take “predict” here to be a synonym for “guarantee”. Were I to take “does not predict” to mean “has no correlation with” I might come to a different conclusion.

With regard to the GRE example, the form of argument is:

If A then sometimes not B

Therefore if C not necessarily D

I agree that the argument is formally invalid. What it has to do with the first example, or the alleged fallacy, I cannot conjecture.

To me it seems like the (equally evil) twin of the “Survivor Bias”

Yes! I definitely think this is a form of survivorship bias.

This kinda sorta sounds like apex fallacy.

https://rationalwiki.org/wiki/Apex_fallacy

“An apex fallacy (also semantic apex fallacy) occurs when someone evaluates a group based on the performance of best group members, not a representative sample of the group members (e.g., evaluating how well women are doing by looking only at national leaders).”

I wish there were a better term though, preferably one that isn’t so closely associated with the manosphere.

Eric:

Interesting point. I agree that the analysis based on Steph, Kobe, and Jordan is an example of the apex fallacy, which in statistics jargon is conditioning on a subset of y. But I think that what Rafa was really asking about is the fallacy of taking the correlation to be a generalizable parameter: in statistics jargon, Rafa’s interested in the simpler fallacy of conditioning on a subset of x. I think the Steph/Kobe/Jordan thing is not the best example of the error that Rafa is exploring here.

P.S. Thanks for all the comments so far! I was gonna say that I’m glad that I’m not the only one who has nothing better to do on a Saturday night, but then I realized that with all this coronavirus, nobody has anything better to do on a Saturday night!

I don’t know the name, either. But I wildly guess that Galton didn’t treat this fallacy (because Galton didn’t define Pearson’s correlation coefficient strictly). Alexander et al. (1984) pointed out that this kind of problem was treated in Pearson (1903).

Pearson (1903) examined the effect of “natural selection” on the magnitudes of some population constants (such as means, standard deviations, and correlation coefficients). He assumed normal distributions both for X/Y and for natural selection. Under this normality assumption, he proved that the correlation coefficient becomes smaller when natural selection narrows the distribution of X. Please see Pearson (1903, pp. 22-23; especially formula (lii.)).

[Reference]

Alexander, R. A., Carson, K. P., Allier, G. M. and Barrett, G. (1984). Correction of Restriction of Range When Both X and Y Are Truncated. Applied Psychological Measurement, 8(2), 231-241.

Pearson, K. (1903). Mathematical Contributions to the Theory of Evolution. XI. On the Influence of Natural Selection on the Variability and Correlation of Organs. Philosophical Transactions of the Royal Society A, 321, 1-66.

Maybe it’s related to a “black swan”? Superficially it seems similar in that one tries to extract information from the world with a subset of data.

Simpson’s paradox?

Nope!

Maybe not Simpson’s but reminds me of Lord’s paradox, with inches replacing genders, which is sort of like Simpson’s.

The reversal described here is one of the many varieties of “paradoxes of aggregation”.

Simpson’s paradox is really part of the larger class of “paradoxes of aggregation”, where trends or associations estimated on the basis of an aggregate change direction or are otherwise inconsistent with the estimates based upon some stratum selected from the aggregate. The original Simpson’s paradox describes this effect when the measure of association is recorded as a ratio. But the same phenomenon can occur in regression. The regression line can go one way through one aggregation of the strata, and another way through a different aggregation. The thing is especially devious when one does not know a priori what the relevant strata might be!

“Steph, Kobe, and Jordan are average (or below average) height in the NBA so height does not predict being good at basketball.“

Totally irrelevant. The correct average for guards is 6’3”!!!

“Point guards are generally the shortest players, and that has been the case since the 1950s. The average height of NBA point guards in recent years is 6’3″. The average NBA height for shooting guards is 6’5″, small forwards at 6’7″, power forwards at 6’9″, and centers at 6’11”.“

https://www.fantasybasketball101.com/nba-average-height/

Survivorship bias

Could it simply be heteroskedasticity? The explained variance (R^2) decreases when you condition on a specific range of X “simply” because the variance of Y changes when you condition on X, and it seems to me that this is the definition of heteroskedasticity.

have a look at http://www.statsmakemecry.com/smmctheblog/confusing-stats-terms-explained-heteroscedasticity-heteroske.html#:~:text=Heteroscedasticity%20is%20a%20hard%20word,second%20variable%20that%20predicts%20it. : “heteroscedasticity (also spelled heteroskedasticity) refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it.”

This comes up in education a lot, especially something similar to the second example, i.e., “Teachers’ educational attainment does not predict student success on standardized tests; therefore, it is relatively unimportant for a teacher to have advanced-level subject matter knowledge.”

There are at least two fallacies at work here. The first is what you describe, some kind of survivorship bias. The second is something along the lines of equivocation. If “educational attainment” simply means the level of degree earned, then it may well be that it does not predict students’ standardized test results. But (a) standardized test results are not the only important outcome and (b) educational attainment consists of much more than a degree (or two, or three). It would be folly to suggest, as some do, citing research, that a teacher does not need extensive subject matter knowledge to help the students progress. Subject matter knowledge is not in itself sufficient, but you can’t do away with it. There’s a big difference between elementary and high school in this regard–but even elementary school teachers need to know their subject well, or they will lack the flexibility to recognize different approaches to problems.

Am I insane for thinking this is just a fallacy of composition?

I’ll add two more contenders: “ascertainment bias” and “Berkson’s fallacy”. It seems to me that here many terms are applicable, from the most general “collider bias” (selection into the group being the collider between abilities and outcomes, it is not necessary to have actual “third variable” for a legit collider) to a bit more restrictive “selection bias” or “survivor bias” and even more specifically Berkson’s, which is specifically about the negative correlation or independence introduced by selection alone.

I have also seen this type of problem referred to as Berkson’s paradox. It is exactly what this is.

I’m not sure the collider answer should be dismissed. Wouldn’t it always be possible to describe a lurking variable that restricts the range? Sometimes that lurking variable is somewhat easy to define (though not necessarily measure), like ability. Other times you would define it precisely as the variable which restricts the range. I may even be able to find a set of latent factors that almost always have this property.

This reminds me of another case, where the correlation REVERSES after range restriction. In 1979 I did a study of graduate admission in psychology at Penn. My report is here:

https://www.sas.upenn.edu/~baron/gradrep79.pdf

(I do not recommend this. It is a scan of the only remaining paper copy.)

I found a negative correlation between GRE scores and “success” in our graduate program, for those students admitted. (I should have made an effort to find what happened to the rejected ones.) Because many of our students were in clinical psychology and not even looking for academic jobs, I defined “success” as finishing the PhD and (presumably) using the knowledge acquired in one’s work. Being president of Estonia for two terms (like Toomas Ilves – our most famous drop-out, but later) thus counted as failure.

I concluded that the problem was that we were using what amounted to two dimensions: being smart, and having an identity (in Erikson’s sense) in psychology. These were largely uncorrelated in the applicant pool. We were over-weighing the first and thus under-weighing the second – cutting off a corner of a bi-variate distribution, with the wrong slope.

My punishment was being appointed as grad chair. I tried hard to admit students with low GREs but strong identities. But in the long run it didn’t work. The grad dean said “We have to compare departments when we allocate money. And the only equivalent measure we have for all departments is GRE scores.” This is some other sort of fallacy, but I won’t try to give it a name.

It already has a name! Berkson’s paradox. http://corysimon.github.io/articles/berksons-paradox-are-handsome-men-really-jerks/

“I concluded that the problem was that we were using what amounted to two dimensions: being smart, and having an identity (in Erikson’s sense) in psychology.”

Actually what you’re doing is reversing the roles of the variables.

Comparing basketball success to white collar professional success, height in basketball is analogous to test scores in professional success. You need to be smart to succeed professionally, so test scores matter. But just like height doesn’t “predict” shooting skill, test scores don’t “predict” one’s ability to get jobs or increase sales or lead a team.

I would just call it a form of R^2 abuse. I find the metric is frequently manipulated or misinterpreted. Aggregating data before doing a regression enhances the R^2. Also, the same time series model can generate a low or high R^2 depending on whether one writes it as a change in the dependent variable or as the variable itself.
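To make the aggregation point concrete, here is a small sketch in plain Python. The data-generating numbers and the choice to aggregate by averaging consecutive blocks of 100 observations sorted by x are assumptions for illustration; other ways of aggregating would behave differently in detail.

```python
import random

def r_squared(xs, ys):
    """R^2 of a simple linear regression of ys on xs (squared correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(2)  # fixed seed so the run is repeatable

# Assumed individual-level data: a modest slope with a lot of noise.
n, group_size = 5000, 100
x = sorted(rng.uniform(0, 10) for _ in range(n))
y = [0.5 * xi + rng.gauss(0, 2) for xi in x]

# Aggregate: average x and y within consecutive blocks of 100 observations.
x_bar = [sum(x[i:i + group_size]) / group_size for i in range(0, n, group_size)]
y_bar = [sum(y[i:i + group_size]) / group_size for i in range(0, n, group_size)]

r2_individual = r_squared(x, y)
r2_grouped = r_squared(x_bar, y_bar)

print(f"individual-level R^2: {r2_individual:.2f}")
print(f"grouped-means R^2:    {r2_grouped:.2f}")
```

Averaging within groups removes most of the individual noise while leaving the spread of x across groups intact, so the grouped R^2 should be far higher even though the underlying relationship is the same.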

I have seen the reverse of this example: someone picks a period where a variable varies the most and excludes other periods so they can impress people by saying their model has a high R^2.

In psychometrics, this is often called “restriction of range”. (Of course that’s just a variant of “restricted range”, which Michael Weissman mentioned above, but the exact phrase “restriction of range” is useful to search on.)

https://dictionary.apa.org/restriction-of-range

http://methods.sagepub.com/Reference/encyc-of-research-design/n388.xml

https://doi.org/10.1111/j.2044-8317.1986.tb00849.x

Sometimes called selecting on the dependent variable, or sample selection bias.

Similar to correlation between SAT scores and college grades *within* an institution being low, or even sometimes negative, right? The school selected partially on SAT scores – so students in the pool with lower SAT scores were selected because of other strong criteria. Information in SAT scores is “used up” in selection process, and may appear flat or even negative subsequent to the selection process. Lots of schools doing studies of their own students miss this fundamental point.

People are definitely calling this sort of thing collider bias on Twitter all the time. I think survivorship bias is also OK, but collider bias, as currently invoked, also subsumes survivorship bias. Basically anytime the observed correlation between X and Y is driven by a sampling procedure which is conditional on some function of X and Y in each sample.

> someone thinks cor(Y,X) = 0 because lim h -> 0 cor( X, Y | X \in (x-h, x+h) ) = 0

Wouldn’t that be correct if X is continuous and that correlation is zero everywhere?

Otherwise it’s like someone who thinks that a function is constant because its derivative is zero somewhere (or even because it is zero everywhere it is defined, when the function is not continuous).

My vote is for over-extrapolation or unwarranted generalization or something like that.

> We make the point in the attached figure

Wouldn’t the correlation be the same in both cases?

Is this Berkson’s Paradox?: https://en.wikipedia.org/wiki/Berkson%27s_paradox

I agree with Jon – "just call it a form of R^2 abuse." When I see it discussed, it isn’t labeled as a fallacy, just a case of overreliance on a measure outside its range of usefulness. E.g., Goldberger, A Course in Econometrics, p. 177: “In fact the most important thing about R^2 is that it is not important in the classical regression model. The classical regression model is concerned with parameters in a population, not with goodness of fit in the sample. … we did introduce the coefficient of determination … as a measure of strength of a relation in the population. But that measure will not be invariant when we sample selectively, as in the CR model, because it depends on the marginal distribution of the explanatory variables….”

I think it’s the fallacy of composition, if you extend the fallacy to talk about properties of subsets of a group and not just individuals of the group.

https://en.wikipedia.org/wiki/Fallacy_of_composition

This doesn’t answer the question, but the phenomenon is a nice application of https://en.wikipedia.org/wiki/Law_of_total_variance
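A short worked version of that application, as a sketch under stated assumptions rather than anyone's official derivation. Assume the conditional mean is linear, E[Y | X] = a + bX, the residual variance Var(Y | X) = \sigma_e^2 is constant, and selection depends only on X, so both still hold in the subset S. Write k for the ratio of the standard deviation of X within S to its standard deviation in the full population (the symbols a, b, \sigma_e, S, and k are introduced here for illustration).

```latex
\begin{align*}
\operatorname{Var}(Y)
  &= \mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr]
   + \operatorname{Var}\bigl(\mathbb{E}[Y \mid X]\bigr)
   = \sigma_e^2 + b^2 \sigma_X^2, \\
\rho^2
  &= \frac{b^2 \sigma_X^2}{b^2 \sigma_X^2 + \sigma_e^2}, \\
\rho_S^2
  &= \frac{b^2 k^2 \sigma_X^2}{b^2 k^2 \sigma_X^2 + \sigma_e^2}
   = \frac{k^2 \rho^2}{1 - \rho^2 + k^2 \rho^2}.
\end{align*}
```

The slope b is untouched by the restriction; only the "explained" term b^2 \sigma_X^2 shrinks. As k goes to 0 (a vanishingly narrow window of x), \rho_S goes to 0 for any \rho with |\rho| < 1, which is exactly the limit in Rafa's question.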

Another range-restricted sample problem I heard about but remember only vaguely (if anyone knows the source, please speak up): The study tested baseball players on some measure of viscual quickness – reading numbers flashed for shorter and shorter durations, or something like that. Result: no correlation between performance on the visual test and performance at the plate.

But before concluding that quickness of sight made no difference to the ability to hit a baseball, the researchers gave the same test to a non-MLB sample, who, it turned out, were out of their league. All their scores were far below those of the ballplayers.

Don’t throw Ted Williams out with the bathwater.

Interesting result.

I wonder if the test was the wrong test for baseball players. The most important aspect of visual quickness in baseball is detecting the rotation on a pitch. It’s not as complex as reading a number. It would be more analogous to, say, noticing where there is a bump on the edge of a circle – left or right?

I doubt very much that baseball players can detect the rotation itself. I think what they detect is the variation in trajectory caused by the forces caused by the rotation.

Hmmm…I can’t say for sure but my understanding is that players can see rotation. Movement is too late for them to detect it before they swing, which is why they whiff on nasty curves, sinkers and sliders. I’m sure guys like Altuve and Trout spend hours studying video. They know what to look for. Maybe it’s not even the rotation per se; it’s just a difference in the way the color of the ball looks because of the way the strings rotate through the front of the ball.

“viscual”: Nice example of a typo that ought to be a word: Somewhere between viscous and visual.

Or ‘visceral.’

To me it seems like a variation on the ecological fallacy.

Precisely that. It goes by various other names, such as “paradoxes of aggregation”: reversals of some measure of association when restricted to various strata of a larger group. Imagine an (x,y) data cloud with an upward trend overall as one looks from left to right in x. It’ll get a positive regression slope. Now imagine one realizes that the x’s really ought to be divided into a handful of distinct categories; and now imagine that when one looks at each of the corresponding sub-clouds separately, they consist of data clouds with an overall downward trend (as one looks from left to right). Each stratum has a negative regression slope. These examples can be constructed ad libitum and they are easier to draw than to describe in text. They are not at all merely exotic counter-examples. They lurk just beneath the surface in every investigation which seeks to determine the direction and strength of association between entities where one may be in a state of relative ignorance with respect to which “strata” are to be considered relevant.
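Since these are easier to construct than to describe, here is one made-up construction in plain Python. The five strata, their centers, the within-stratum slope of -0.8, and the noise levels are all hypothetical choices for illustration.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical strata: stratum g is centered at (2g, 2g), so the centers
# trend upward, but within each stratum y falls as x rises (slope -0.8).
strata = {}
for g in range(5):
    center = 2.0 * g
    xs, ys = [], []
    for _ in range(400):
        u = rng.gauss(0, 0.5)  # deviation of x from the stratum center
        xs.append(center + u)
        ys.append(center - 0.8 * u + rng.gauss(0, 0.3))
    strata[g] = (xs, ys)

all_x = [v for xs, _ in strata.values() for v in xs]
all_y = [v for _, ys in strata.values() for v in ys]

pooled_slope = ols_slope(all_x, all_y)
within_slopes = {g: ols_slope(xs, ys) for g, (xs, ys) in strata.items()}

print(f"pooled slope: {pooled_slope:.2f}")
for g, b in within_slopes.items():
    print(f"stratum {g} slope: {b:.2f}")
```

The pooled regression should show a positive slope while every stratum on its own shows a negative one. Note that this is a reversal across strata, which is a different thing from the restricted-range case where the slope stays put and only the correlation shrinks.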

Seems to me that this is, or is closely related to, ‘bias amplification’; see Models 9-10 of Cinelli et al. (https://ftp.cs.ucla.edu/pub/stat_ser/r493.pdf) and references therein. Conditioning a potential cause on non-confounding covariates reduces precision and enhances the effects of any unobserved actual confounders. afaik it’s an effect normally discussed in the context of persuading people not to put instruments into their propensity score models, but it seems more general than that.