Joshua Vogelstein writes:

Since you’ve posted much on various independence test papers (e.g., Reshef et al., and then Simon & Tibshirani criticism, and then their back and forth), I thought perhaps you’d post this one as well.

Since Simon and Tibshirani recommended distance correlation (Dcorr), we proved that our oracle multiscale generalized correlation (MGC, pronounced “Magic”) statistically dominates Dcorr, and we empirically demonstrated that sample MGC nearly dominates as well.

The new paper, by Cencheng Shen, Carey Priebe, Mauro Maggioni, Qing Wang, and Joshua Vogelstein, is called “Discovering Relationships and their Structures Across Disparate Data Modalities.” I don’t have the energy to read this right now but I thought it might interest some of you. I’m glad that people continue to do research on these methods as they would seem to have many areas of application.

My quick comment to Vogelstein on this paper was to suggest changing “whether” to “how” in the first sentence of the abstract.

It is multivariate rather than merely bivariate.

Interesting – basically a search for the right neighborhoods that support common dependencies, using a common measure of nearness in x and y (the nearness along the x and y axes may be different).
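That neighborhood-search reading can be made concrete with a rough numpy sketch (my own illustration, not the authors’ implementation): plain distance correlation, plus a crude “local” variant that restricts the sum to each point’s k nearest x-neighbors and l nearest y-neighbors – one candidate scale out of the roughly n^2 that MGC searches over. The published MGC statistic centers and smooths differently, so treat this only as a sketch of the idea.

```python
import numpy as np

def dcorr(x, y):
    # Sample distance correlation (Szekely-style) for 1-D samples.
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                # pairwise |xi - xj|
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()  # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max((A * B).mean(), 0.0) / denom) if denom > 0 else 0.0

def local_dcorr(x, y, k, l):
    # One "scale": keep only pairs (i, j) where j is among the k nearest
    # x-neighbors and l nearest y-neighbors of i.  NOT the published MGC
    # statistic (which centers and thresholds differently) -- just the idea.
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(1)[:, None]                         # row-center
    B = b - b.mean(1)[:, None]
    rx = a.argsort(1).argsort(1)                       # neighbor ranks in x
    ry = b.argsort(1).argsort(1)
    mask = (rx < k) & (ry < l)                         # "near" pairs only
    var = np.sqrt((A * A * mask).sum() * (B * B * mask).sum())
    return (A * B * mask).sum() / var if var > 0 else 0.0
```

Sweeping `local_dcorr` over all (k, l) pairs and taking the largest (suitably smoothed) value is, loosely, the multiscale search described above.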

And as Ben pointed out, for y, strictly speaking read x.2, x.3, …, x.n.

Having discovered some highly nonlinear dependence structures, from a purely applied and practical POV, how might that be incorporated within the typically employed modeling framework of, e.g., a GLM? Or would it require stepping outside the bounds of these models? If so, how?

thanks for posting andrew! indeed, MGC supports multivariate, or data in metric spaces, and the theory holds under relatively weak conditions on those spaces.

@thomas b: i’m not sure i quite understand the question, but if you provide a concrete example, perhaps i can provide a more useful response.

@joshuavogelstein: apologies for asking such a vaguely worded question. Before making it more explicit, let me note that your MGC paper is excellent in many ways, not least of which is its scope and rigorous detail. I particularly liked the many visualizations of possible patterns of dependence.

After rereading your paper, my original query has evolved into several issues: the first has to do with the distinction between description and prediction (my original question), the second with masked or moderator relationships (e.g., 3-way and higher interactions), the third with how MGC handles extreme-valued or heavy-tailed data, and, finally, a concern about Euclidean distance functions.

1) The descriptive power of MGC seems clear. My original vaguely worded question was an effort at determining how one would use these descriptive insights to make predictions. Assuming MGC identifies them, what predictive modeling method or framework would be able to leverage highly nonlinear dependence structures?

2) If moderator relationships exist in the data, would MGC identify them?

3) I may have missed it, but of the many patterns represented in the paper, none appear to hold for heavy-tailed or extreme-valued data. This is highly relevant, as Eklund et al. note in their paper “Cluster Failure” that fMRI assumptions of Gaussian RFTs are flawed since “the empirical spatial autocorrelation functions are clearly far from a squared exponential, having heavier tails.”

4) Related to 3) is that Euclidean distance functions are symmetric L2 norms, which provide limited information with respect to sparse data. Hurley and Rickard, in “Comparing Measures of Sparsity,” conclude that only one metric, the Gini index, satisfies all of the propositions about sparsity stipulated to be important. What about leveraging Mahalanobis or cosine distance functions in place of Euclidean distance?

Thank you in advance for any comments or responses you may wish to share.

@Thomas_B:

1) though we didn’t realize it in the paper, i now view MGC as a variant of multiple kernel learning. while the objective function is a surrogate to what one typically wants, one can think of each of the n^2 possible scales as different “kernels”. so, given a particular scale, one can then use any number of kernel machines for predictions. of note, i have no idea how well this idea would work, but i’m interested in trying it. so, perhaps we could try something together?
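To make the “pick a scale, then hand it to a kernel machine” idea concrete, here is a minimal numpy sketch (my own construction, not anything from the paper): kernel ridge regression with a Gaussian kernel, where the bandwidth plays the role of the chosen scale – small bandwidths emphasize local structure, large ones global structure.

```python
import numpy as np

def kernel_ridge(x, y, bandwidth, ridge=1e-3):
    # Gaussian-kernel ridge regression; `bandwidth` is the analogue of a
    # chosen scale (small = local structure, large = global).
    x, y = np.asarray(x, float), np.asarray(y, float)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
    alpha = np.linalg.solve(K + ridge * np.eye(len(x)), y)  # dual weights
    def predict(xnew):
        xnew = np.asarray(xnew, float)
        Kx = np.exp(-(xnew[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
        return Kx @ alpha
    return predict
```

In the multiple-kernel-learning reading above, one would fit such a machine per candidate scale and weight or select among them; this sketch fixes one scale by hand.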

2) in terms of 3-way interactions, there are a number of papers recently on “mutual independence”, meaning, are X & Y & Z all independent of one another. similarly, there are a few papers on conditional independence. these papers either develop these ideas using Dcorr or HSIC (a kernel generalization of Dcorr). i presume that each of them could be “MGC”ed, but we haven’t tried it yet. we are, again, excited to try, and would love to collaborate with anybody interested.

3) we did not explicitly try any heavy-tailed experiments, but we certainly can! if you propose a setting, we can run it real quick and post the solution here. or, of course, since it is all open source and in both R and MATLAB, anybody else could too :)

4) yup, Lyons published a paper in 2013 providing details about what properties the metric must satisfy in order for the theory to hold. from the kernel perspective, it has to be positive definite. so, there are certainly many possible options, and i don’t think anybody has any results on how to choose a kernel/metric. on the other hand, because we try n^2 scales, it is *like* trying n^2 different metrics, each with different “bandwidths”. we are starting to explore a few new learned metrics (such as Mahalanobis). again, excited to collaborate on this too!
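As an illustration of swapping in a different metric (again my own sketch, not the authors’ code; per Lyons 2013, Dcorr’s guarantees require a metric of strong negative type, hence the clamp below): distance correlation computed from precomputed pairwise-distance matrices, with Mahalanobis distances plugged in for Euclidean.

```python
import numpy as np

def dcorr_from_dists(a, b):
    # Distance correlation from precomputed pairwise-distance matrices.
    # Nonnegativity of the numerator is only guaranteed for metrics of
    # strong negative type (Lyons 2013), hence the clamp at zero.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max((A * B).mean(), 0.0) / denom) if denom > 0 else 0.0

def mahalanobis_dists(X):
    # Pairwise Mahalanobis distances for the rows of an (n, p) array.
    Sinv = np.linalg.inv(np.cov(X.T))
    d = X[:, None, :] - X[None, :, :]
    q = np.einsum('ijk,kl,ijl->ij', d, Sinv, d)  # quadratic forms
    return np.sqrt(np.maximum(q, 0.0))           # guard tiny negatives
```

Any other distance matrix (cosine, a learned metric, etc.) can be dropped into `dcorr_from_dists` the same way, with the caveat about negative type above.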