Thanks, @Dan, for sharing the article and the thoughtful write-up!

@Erikson, it explicitly does not work for Gaussian latent variables. There is a fancy 1860 theorem that shows something even stronger: the Gaussian is the *only* latent distribution for which the rotation does not work. As for oblique rotations, I don’t think they do what we think they do. I suspect it sometimes looks like we want an oblique rotation because we didn’t do the centering properly (sometimes you should center, sometimes you shouldn’t, and sometimes both choices are wrong!). This is not clearly discussed in the paper, but it is something I am currently pondering. Remark 3.2 in the paper is related.

If you, or anyone you know, has a simulation (not a data analysis) that demonstrates the need for oblique rotations, I would be so so interested. Everything that I can imagine a simulation for… I can also prove that oblique rotations don’t help.
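For anyone who wants to poke at this, here is a minimal sketch of the kind of simulation in question — plain NumPy and the standard Kaiser/SVD varimax iteration, not the paper’s own implementation: independent leptokurtic (Laplace) latents are mixed by a random rotation, and varimax is applied to the singular vectors.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-8):
    """Kaiser's varimax rotation via the standard SVD iteration:
    find an orthogonal R maximizing, summed over columns, the
    variance of the squared entries of A @ R."""
    p, k = A.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(
            A.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return A @ R, R

# Toy simulation: independent heavy-tailed (Laplace) latents mixed by
# a random rotation; varimax on the left singular vectors should
# realign the scores with the latents, up to sign and permutation.
# Swap rng.laplace for rng.standard_normal and the recovery falls
# apart, because the Gaussian is rotationally invariant.
rng = np.random.default_rng(0)
n, k = 5000, 3
Z = rng.laplace(size=(n, k))                      # leptokurtic latents
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))  # random rotation
X = Z @ Q
U, _, _ = np.linalg.svd(X, full_matrices=False)
scores, R = varimax(np.sqrt(n) * U)               # rotated factor scores
```

Comparing `np.corrcoef(scores.T, Z.T)` between the Laplace and Gaussian versions makes the contrast visible; the rotation `R` stays orthogonal either way.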

Same thought occurred to me. However, note that MacKay was also dismissive of principal component analysis, which is why there is no chapter on it in his book.

I wonder if this paper would have any significant impact in the area Thurstone worked on, i.e., the development of psychometric scales. Although varimax is still popular, it has been losing ground to various oblique transformations of the factor loadings, e.g., Oblimin (the `psych` package for R has applied the Oblimin transformation in EFA by default for a long time now). My math isn’t enough to work out whether the theorems hold for oblique transformations, but the paper states that the data should come from independent latent variables, so I believe they don’t.

I also wonder what this would mean for normal-theory factor analysis, certainly the most widely used set of assumptions for estimating and evaluating model fit in exploratory and confirmatory factor analysis and Structural Equation Modeling in psychology. The leptokurtic condition for varimax to work seems to preclude any multivariate normality assumptions about the distribution of observed and latent variables, if we take the model seriously.

Anyway, the nagging problem of deciding the number of latent factors remains. The author uses the classic scree plot to decide the number of factors to retain. If we are using FA solely for the purpose of dimensionality reduction, well, this method is as good as any, but if we want to build scientific theory on top of those unbelievable models, it’s hard to trust methods that create new entities solely due to sampling variability.
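On the number-of-factors point: one common alternative to eyeballing a scree plot is Horn’s parallel analysis, which retains only those factors whose eigenvalues beat the eigenvalues of comparable random data. A minimal sketch in plain NumPy — the function name and defaults here are mine, not from any particular package:

```python
import numpy as np

def parallel_analysis(X, n_sims=100, quantile=0.95, seed=0):
    """Horn's parallel analysis: keep factors whose observed
    correlation-matrix eigenvalues exceed the chosen quantile of
    eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sims = np.empty((n_sims, d))
    for i in range(n_sims):
        R = rng.standard_normal((n, d))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    thresholds = np.quantile(sims, quantile, axis=0)
    return int((obs > thresholds).sum()), obs, thresholds

# Two clean factors, ten indicators: parallel analysis should retain 2.
rng = np.random.default_rng(1)
Z = rng.standard_normal((1000, 2))
load = np.zeros((10, 2))
load[:5, 0], load[5:, 1] = 0.9, 0.9
X = Z @ load.T + 0.5 * rng.standard_normal((1000, 10))
n_factors, _, _ = parallel_analysis(X)
```

It is still a sampling-variability heuristic, of course — it just makes the “is this eigenvalue bigger than noise?” question explicit instead of visual.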

lol – not all psychology for himmicanes either!

but here we are

“Not all psychology!”

Growing up as a border brat, I encountered “zee”/ “zed”, “aluminum”/”aluminium”, “company”/”limited”, different pronunciations of “o”, different pronunciations of “ou”, which hand you hold your fork in, etc., from the get-go — and often just used whatever custom the person I was with used.

Great except for footnote 1. I could be wrong but I’m sort of picking up a subtext that maybe Dan somehow doesn’t think that P-values are useful for inference. Possibly. Just my impression.

Thanks for clarifying, and of course for the great post.

Ah! Sorry OliP – I just realized what you meant. Apparently I just restarted the sentence immediately after. Eeeek

Ricardo Lemos has done a lot of work related to my points above, unfortunately much of it tied up in NDAs. He did present some of it at the Bayesian meeting in Sardinia a few years ago.

I think OliP is referring to an odd bit of formatting, where there’s a line break after a comma but the next thing is a capital letter, so it looks like the end of a sentence is missing.

At least, I got tripped up there for a bit, though it still makes sense. Three cheers for PCA!

We borrowed the “s” from “maths” and lent it to “sport”. I’d say we were barbarianz, but we can’t even be consistent on when to replace an ess with a zed.

Also, +1 for MacKay’s Oxford comma. I was ruined by grad school in Edinburgh, though my undoing was the brilliant copy editor of my first book, which was with Cambridge University Press. I’ve been at odds with my American colleagues over punctuation and slightly confused on spelling words like “colour” ever since.

Plus, hurray for Dan being back!

From the paper:

“To enable the regime d = k, the results for ICA typically presume that A = ZM is observed with little or no noise. In contrast, Theorem 4.1 covers situations where (i) d grows at the same rate as n, (ii) there is an abundance of noise in A, and (iii) A is mostly zeros (i.e., sparse). This allows the theorem to cover the contemporary factor models in Section 5.”

No. That’s what the paper shows. The scaling and centring is a classic (and key) part of any PCA so I didn’t feel the burning urge to type it up. Also it’s in the paper in lurid detail.

in the late great David MacKay’s “Information Theory, Inference, and Learning Algorithms” where

we can recover latent variables if they are not distributed as a Gaussian but have heavier tails.

He has maths also! (Note the ‘s’ barbarians! ;).
