
The importance of descriptive social science and its relation to causal inference and substantive theories

Here’s the abstract to a recent paper, Escaping Malthus: Economic Growth and Fertility Change in the Developing World, by Shoumitro Chatterjee and Tom Vogl:

Following mid-twentieth century predictions of Malthusian catastrophe, fertility in the developing world more than halved, while living standards more than doubled. We analyze how fertility change related to economic growth during this episode, using data on 2.3 million women from 255 household surveys. We find different responses to fluctuations and long-run growth, both heterogeneous over the life cycle. Fertility was procyclical but declined and delayed with long-run growth; fluctuations late (but not early) in the reproductive period affected lifetime fertility. The results are consistent with models of the escape from the Malthusian trap, extended with a life cycle and liquidity constraints.

Without commenting on the full article, I just wanted to comment that the above represents an important form of social science research: It’s a descriptive study that has implications for causal inference and substantive theories.

It’s my impression that quantitative social science is generally taught with separation between measurement, descriptive analysis, causal inference, and theory building. Not complete separation—measurement is motivated by theory, etc.—but I feel there’s not a full appreciation, on the conceptual level, of how we learn from careful descriptive work.

Ummm, I’m not saying this quite right. At the individual level, descriptive work is influential and it’s celebrated. Lots of the debates in macroeconomics—Reinhart and Rogoff, Piketty and Saez, Phillips curve, Chetty, etc. etc.—center on descriptive work and descriptive claims, and it’s clear that these are relevant, if sometimes only indirectly, to policy. But in general terms, it seems to me that social scientists get so worked up about causal identification.

Don’t get me wrong—I agree with Rubin/Pearl/Hill/etc. that we as humans do think causally, and we as statisticians and social scientists should think about causation, and that in particular we should think about causality when collecting and analyzing data. It’s just that it can make sense to learn some facts on the ground as part of the larger goal of understanding the world.

I also sometimes speak of division of labor: it’s good that there are researchers like Chatterjee and Vogl (and me) who study what is happening in the world, so that researchers like Angrist and Pischke can make causal inferences to better understand why.

P.S. Alex writes of the above picture, “In reality, Henry is helping me do my tax return, but we can pretend he’s working on hierarchical modeling… or something. I dunno. I’m not a statistician.”


  1. Matt Skaggs says:

    Commenters have argued here that it is always more important how your model performs with synthetic data than with real world data. For those of us who don’t quite get how that principle can be universal, this is a perfect opportunity to explain why. How would the synthetic data be created and why would the model performance on the synthetic data be more important than on the real data?

    • gec says:

      I think some of the talk about “fake data simulation” etc. on this blog has been a bit loose, so I can understand your confusion. In particular, I don’t think people are saying that it is “more important” to deal with synthetic than observed data, just that it is in some sense a pre-requisite. Well, maybe some people are saying that, but I wouldn’t agree with that, so I’ll just say what I find valuable about “synthetic” data.

      A model is supposed to be an approximation to structure in the real world, whether that structure is causal or correlational. Either way, our model encodes assumptions about how that latent structure manifests in the variables we can observe.

      By picking what we think are plausible latent structures and using the model to simulate observables, we can see how our model’s assumptions lead to different data patterns. The more those data patterns look like ones we actually see, the better we can feel about the assumptions we’ve encoded in the model, since anything that is plausible in the model yields plausible data.

      We can also explore the relationships between specific assumptions (e.g., changing the values of specific parameters) and observables to get a sense of how we might be able to use the model to do (causal?) identification.

      Finally, fitting our model to data simulated from the model gives us a sense of whether our model is identifiable at all and/or what kinds of data we would need to do so.

      In other words, exploring simulated data tells you about how your model relates to the real world, what the model “means”. In that sense, it is an important prerequisite for fitting the model to real-world data, since just going and fitting won’t necessarily tell you what the resulting fit implies about the real world. And notice that all the points I made above are still about how models (could) relate to real-world data, so the simulation is not “elevated” above real data; synthetic data are part of the toolkit toward understanding real-world data.
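A minimal sketch of this recovery check in Python (the linear model, parameter values, and sample size are all hypothetical choices for illustration, not anything from the discussion): simulate data from an assumed latent structure, then fit the same model back and see whether the parameters that generated the data are recoverable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent structure: a simple linear model with assumed
# "true" parameter values (purely illustrative choices).
true_alpha, true_beta, true_sigma = 1.0, 2.5, 0.5
x = rng.uniform(0, 10, size=200)
y_sim = true_alpha + true_beta * x + rng.normal(0, true_sigma, size=200)

# Fit the same model to the data it generated. If the model is
# identifiable, the fit should recover the simulated parameters.
X = np.column_stack([np.ones_like(x), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y_sim, rcond=None)[0]
print(alpha_hat, beta_hat)  # should land near the simulated (1.0, 2.5)
```

The same loop, run across a range of assumed parameter values, is what tells you which parameters (if any) the data cannot pin down.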

      • Matt Skaggs says:

        “The more those data patterns look like ones we actually see, the better we can feel about the assumptions we’ve encoded in the model”

        Thanks gec! That makes sense even with a descriptive analysis.

      • Agree, but would also say exploring simulated data tells you about how your modeling relates to the fake world, what the modeling “means” in that fake world, in terms of repeated performance given that you made the truth in that fake world and so it is known. That learning is then transported to the real world in terms of conceptually repeated performance, where you do not (and can’t ever) know the truth.

        The simulation is not “elevated” above real data; synthetic data are part of the toolkit toward understanding _what to make_ of real-world data.

        Also agree that there is a lot of confusion about this generally and I have been giving webinars to better discern how to address the confusion for the past few years.

        • gec says:

          That’s a great point, that exploring the model in the context of knowing the ground truth (because it is a simulated world) is really important. I guess in my own work, I tend to think of that as part of “parameter recovery”, but as you say, it is really much bigger than that. It is about knowing what you can—and can’t!—say about the world based on the model fits.

          And to Matt’s point, I think that good modeling is especially important in descriptive work, in that a good model provides a concise description of the data. And I think of simulation as critical to good modeling.

          And my thanks to Keith for spreading the gospel!

        • Christian Hennig says:

          This discussion makes me wonder whether you think that we can do well also without (probability) modelling, using exploratory/descriptive data analysis, visualisation and the like. For observed data there are often strong issues with whatever model anyone can come up with. Of course we can learn from using models and criticising them (often the best use of a model is to find out in which way it is wrong), but ultimately… if we are pretty sure that every model we can come up with is misleading in one way or another – wouldn’t we want to try something that is lighter in assumptions?

          I’m well aware that visualisation, descriptive and exploratory statistics cannot really be called “assumption free” – but should that be an excuse to not at least try hard to say something that doesn’t need to be mediated by a probability model?

          • gec says:

            Speaking for myself, I think the important thing is not that a model be a probability model, but that it be a *generative* model. Many of the models I work with have no closed-form solutions for likelihoods anyway, meaning that simulation is really the only way to fully understand them (e.g., many models in cognitive science generate behavior, but don’t assign likelihoods; same goes for many models in astronomy and materials science). But the important property for a model to have is that you can set up assumptions and see what the consequences are.

            I think of exploratory analysis and model development in terms of Emily Dickinson, that they are the two ends of the candle that you have to keep burning. Exploratory analyses indicate where the “hinges” in the data are, where you might need to have an assumption that explains some facet of the data. Modeling (with simulation!) lets you explore candidates for that assumption to see which are plausible. Indeed, good modeling can lead to situations in which what are apparently multiple “hinges” can actually be explained by a single model mechanism.

            • Christian Hennig says:

              If you speak of “generative models”, do you mean models that don’t have any probability ingredient? Wouldn’t you need one to generate data? I’m not sure whether I use the term “probability model” more generally than you do, in which case it may include what you call “generative model” (on Wikipedia generative models are probability models in my sense).

              • gec says:

                I use the term “probability model” to refer to any model that assigns probabilities to observed quantities.

                The way I use the term, whether a model is “generative” is orthogonal to whether it has any elements that are probabilistic. A decision tree is a generative model, in that it can produce new observables given assumptions. But it is not probabilistic: conditional on assumptions, what it generates is deterministic.

                The material on wikipedia relates to a specific debate between “predictive” and “generative” models. In the context of that article, they use the term to differentiate between regression-type models that only model the distribution of outcomes conditional on predictors versus generative models that model the full joint distribution of predictors and outcomes. This is an interesting distinction, but not the one I was trying to make and it’s unfortunate that these terms are overloaded.
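A toy illustration of this distinction (the tree structure, variable names, and thresholds here are invented for the example): the model generates an observable from inputs, but conditional on those inputs the output never varies, with no probabilities anywhere.

```python
def generate(x):
    """Toy decision tree: a generative but deterministic model.
    Given assumed inputs, it produces an observable (illustrative only)."""
    if x["age"] < 30:
        return "low" if x["income"] < 50 else "mid"
    return "mid" if x["income"] < 80 else "high"

# Conditional on the same inputs, the generated observable is identical
# every time: generative, but not probabilistic.
assert generate({"age": 25, "income": 40}) == generate({"age": 25, "income": 40})
```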

              • An example of a non-probabilistic generative model: An ODE for a nonlinear oscillator. Given the initial conditions, and the description of the force vs displacement curve, the ODE generates an entire “expected” dataset. It takes some additional steps to bolt on probability for the discrepancy between the expected result and the measured result (which in Bayes can be either modeling error, or measurement error, or both). It then takes additional steps to bolt on a description of the “probable values” of the initial condition and the force-displacement curve (a prior).

                But the ODE itself is capable of generating a dataset once you provide values for the initial conditions and the curve.
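A minimal sketch of that kind of deterministic generator (the Duffing-style cubic restoring force, the parameter values, and the step size are all illustrative assumptions): the ODE alone produces an entire “expected” trajectory, and a probabilistic discrepancy model can be bolted on afterwards.

```python
import numpy as np

def simulate_oscillator(x0, v0, k, beta, dt=0.01, steps=1000):
    """Deterministic generative model: unit-mass nonlinear oscillator with
    restoring force F(x) = -k*x - beta*x**3 (an illustrative choice).
    Given initial conditions, it generates a full 'expected' trajectory."""
    xs = np.empty(steps)
    x, v = x0, v0
    for i in range(steps):
        a = -k * x - beta * x**3   # acceleration from the force law
        v += a * dt                # simple semi-implicit Euler step
        x += v * dt
        xs[i] = x
    return xs

# The ODE generates a dataset with no probability anywhere...
expected = simulate_oscillator(x0=1.0, v0=0.0, k=1.0, beta=0.2)

# ...and a measurement-error model can be bolted on separately:
rng = np.random.default_rng(1)
observed = expected + rng.normal(0, 0.05, size=expected.size)
```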

          • The variation/noise in observations somehow has to be represented and I am unsure how this could be done without a probability model of some sort.

            Now one can grasp, with simulation, what would repeatedly happen given any probability model, using a fake-world metaphor for the assumptions to make the probability model’s role clearer. That role being to discern what repeatedly happens when you have set, and so know, the truth.

            Now in the simple bootstrap, the fake world is automatically (and thoughtlessly?) taken as the sample in hand, and in cross-validation the various hold-out samples. The only difference being that you think you have assumed less and have less control.

            So how would one say something that has noise/variability doesn’t need to be mediated by a probability model?

            • Christian Hennig says:

              Well, as long as no probability model is convincing, one could say that noise and variability in the given situation are not quantifiable. Which communicates even more uncertainty (and maybe appropriately so) than saying, there’s noise and variability and the variance of this-or-that parameter estimator is so-and-so.
              Descriptive and exploratory statements would normally refer to the analysed dataset only, not to any idealised underlying population. But if that’s the best we can do…

              • > Descriptive and exploratory statements would normally refer to the analysed dataset only, not to any idealised underlying population. But if that’s the best we can do…

                Agree, sometimes one should not try to learn beyond the actual data in hand but simply note that its variability/uncertainty cannot be well characterized but is simply what it is.

            • Christian Hennig says:

              Obviously the validity of bootstrap and cross-validation (in their most simple forms) requires an i.i.d. assumption, hence nonparametric probability modelling, if you wish. Plain indexes and graphs assume even less (except those that are based on probability models of course). At least cross-validation can be well motivated by saying, “what would have happened had we applied prediction method X only having observed a part of the data?” – at least as long as you can make the case that it would have been realistic to observe any part of the data involved in CV. The case for bootstrap is not quite as easy to make.
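A small sketch of both devices on a hypothetical i.i.d. sample (the data, resample count, and split are invented for illustration): the simple bootstrap treats the sample in hand as the fake world and resamples it with replacement, while the hold-out version asks what would have happened had only part of the data been observed.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(10, 2, size=100)   # hypothetical i.i.d. sample

# Simple bootstrap: the "fake world" is the observed sample itself,
# resampled with replacement (this is where the i.i.d. assumption enters).
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # interval for the mean

# Hold-out flavor of the same question: "what would have happened had we
# applied method X having observed only part of the data?"
train, test = data[:80], data[80:]
pred = train.mean()                  # trivially, predict the training mean
holdout_error = np.mean((test - pred) ** 2)
```

Both computations rest on the exchangeability of the observations; with time series or clustered data, neither resampling scheme would be realistic as written.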

            • Christian Hennig says:

              By the way, I don’t say that modelling assumptions should be avoided as far as possible. I’m just questioning the “dogma” that inference based on probability modelling is always helpful. In statistical advisory I have come across many cases where in my view making non model-based statements about the data in hand was most convincing. People often do model-based inference because they think they have to, and when it is done with little understanding and insight it leads to the well-known issues; I think it is sometimes helpful to say “you can stick to the data at hand with your statements and avoid generalisation”. I have met researchers who were really happy that I helped them to say things that they felt they could understand and were properly convinced of, rather than generalisation based on a model they don’t believe. I have of course also met enough who wanted a p-value in order to have something to publish – and probably something more sophisticated such as a posterior probability from a model they still don’t believe would have served them as well in this respect…

      • Martha (Smith) says:

        Good explanation, gec. Thanks

    • I tried to write up our best practices in succinct form with computational examples in a new part of the Stan User’s Guide, Part 3. Posterior Inference & Model Checking

      The basic outline is that we use

      1. prior predictive checks to test whether the prior assumptions are within the realm of reason,

      2. simulation-based calibration to test whether our modeling and computational set up can fit simulated data,

      3. posterior predictive checks to test how well the model fits one or more data sets, and

      4. held-out evaluation or cross-validation to test whether the model is overfitting.

      In particular, simulation-based calibration checks that our posterior intervals have the right coverage (i.e., they are calibrated). They do that by simulating parameters from the prior, then simulating data sets from the parameters, then using the sampling software to sample from the posterior given the simulated data. The posterior intervals should have appropriate coverage for the simulated parameters.
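A toy version of that check (using a conjugate normal model whose exact posterior stands in for the sampler; the prior, data size, and draw counts are arbitrary choices for illustration): simulate parameters from the prior, simulate data from each parameter, and verify that the rank of each true parameter among its posterior draws is uniform, which is what gives posterior intervals their nominal coverage.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_obs, n_draws = 500, 10, 99

ranks = []
for _ in range(n_sims):
    theta = rng.normal(0, 1)                 # 1. parameter from the prior N(0, 1)
    y = rng.normal(theta, 1, size=n_obs)     # 2. data given the parameter
    # 3. exact conjugate posterior, standing in for the MCMC sampler:
    mu_post = y.sum() / (n_obs + 1)
    sd_post = np.sqrt(1 / (n_obs + 1))
    draws = rng.normal(mu_post, sd_post, size=n_draws)
    ranks.append((draws < theta).sum())      # 4. rank of truth among draws

# If the posterior is calibrated, ranks are uniform on {0, ..., n_draws},
# so a central 90% interval covers the truth about 90% of the time.
ranks = np.array(ranks)
coverage_90 = np.mean((ranks >= 5) & (ranks <= 94))
```

With a real sampler in step 3, any bug in the model code or the computation shows up as non-uniform ranks (for example, a U-shape when the posterior is too narrow).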

      I didn’t go into exploratory data analysis or the Bayesian equivalent of power calculations (for example, how certain do you expect to be about the posterior based on the data collection size and strategy given assumptions about effects).

      This new part of the User’s Guide also talks about (multilevel regression and) post-stratification, decision analysis, and even the bootstrap for frequentist confidence calculations (Stan can be used as a general-purpose maximum likelihood estimator).

      The bigger point is that the Bayesian modeling paradigm we recommend is iterative and exploratory and more holistic than just concentrating on one aspect of the problem. This is outlined in brief in the first paragraph of the first chapter of Bayesian Data Analysis.

      A large gang of us are writing our recommended Bayesian workflow up in book form (here’s the GitHub repo; we really need a precompiled version of this for people). The new chapters of the User’s Guide are aimed at showing people how to code all of this in Stan. So is the book, but the book also includes a lot of R and worked examples.

  2. DMac says:

    Completely agreed about the importance and value of descriptive work. For me, one underappreciated virtue of descriptive research is the way that good descriptive work ends up providing a lot of the empirical ‘scaffolding’ for causal inference in apparently unrelated domains. Any work that aims at causal inference ultimately ends up resting on a variety of auxiliary assumptions.

    I’ve often found it surprisingly hard to find good empirical measurements to assess the validity of assumptions of this kind, and I think high quality work taking on the challenges of descriptively measuring trends and associations should be more highly rated than it often seems to be.

  3. Peter Dorman says:

    All too often in this world the things we want to know aren’t measured directly, so we’re stuck with proxy measurements. For me, a lot of the value in descriptive work is identifying useful proxies and getting a sense of what their accuracy and bias might be. (I’m a social scientist; is this the same in the natural sciences?)

    • Martha (Smith) says:

      Good point, and good question. I think the answer to the question is (or should be) “yes”, but I’m at a loss at the moment to give good examples. I hope people from a variety of the natural sciences speak up to give examples from their fields.

    • jim says:

      Proxies are in wide use in the natural sciences, as always with varying degrees of success. Like most methods, they require assumptions that may be more or less valid in individual cases or in general, and, like everything, the further you get from a direct measurement the more tenuous the connection. They are also regularly pruned from usage, and commonly have their usage circumscribed by subsequent discovery of problematic factors.

      Stable isotope ratios are used widely in geology to infer all sorts of things from climatic conditions in which a marine microplankton lived to the evolution of magma bodies. Many fossil characteristics are in wide use, such as leaf stomata openings for CO2, tree rings for a variety of climate conditions, and plant and animal assemblages for general environmental conditions. Trace elements are also used to infer various aspects of paleoenvironments.

  4. Curious says:

    Science is about understanding the causal underpinnings of reality. Correlation can be a good first step, but it should never be the final step. Pretense that we can simply make some stronger assumptions to get causation rather than do the hard work of actually understanding the mechanistic causal processes and encoding those into models is where much current predictive analytics falls off the rails.

    • Curious says:

      Descriptive work is important to begin to understand a phenomenon. When done well and critically challenged it gives us information to think about and generate next step questions toward that ultimate goal of causal understanding.

      • Martha (Smith) says:

        Good point. Can you illustrate with some examples?

        • Phil says:

          I can think of a bunch of examples, but as a canonical one how about the Theory of Evolution? Observations by Cuvier, and Darwin himself, and presumably many others, gave Darwin the material he needed to begin to understand what was going on. If he hadn’t looked at hundreds of finches in detail, he wouldn’t have gotten anywhere.

          • Or Rutherford/Geiger/Marsden gold foil experiments. Or decades of paleontology, like Mary Anning and the ichthyosaurs etc. Or people collecting pond water and putting it under their new fangled “microscopes” and discovering tiny “animals” etc etc

        • Anoneuoid says:

    How many cells there are in different tissues as an organism grows, and their rate of division, is very important to cancer, but there’s still no good data on that.

          That type of data is what we need to constrain mechanistic models.

          • I can’t tell you how many times I’ve asked my wife “well, how does X work in wild-type mice?” and she’s basically said that no-one knows how that works, and there’s no money available to just do “descriptive” work of how things work so it’s unlikely that anyone is going to figure it out any time soon.

            • Anoneuoid says:

              All experiments in psychology are not of this type, however. For example, there have been many experiments running rats through all kinds of mazes, and so on—with little clear result. But in 1937 a man named Young did a very interesting one. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before.

              The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and, still the rats could tell.

              He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell.

              Now, from a scientific standpoint, that is an A‑Number‑1 experiment. That is the experiment that makes rat‑running experiments sensible, because it uncovers the clues that the rat is really using—not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat‑running.

              I looked into the subsequent history of this research. The subsequent experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic of Cargo Cult Science.


          • Martha (Smith) says:

            “That type of data is what we need to constrain mechanistic models.”

            Makes good sense to me. You can’t say something (e.g., dangerous rate of growth of cells) is abnormal if you don’t know what normal is!

  5. jcs says:

    Any suggestions as to what younger scientists can do when trying to push the importance of descriptive work? Anecdotally, I often feel pressured into increasingly complex methods over good, straightforward descriptive work. My supervisor quite literally interprets ‘descriptive’ as a type of academic pejorative.

    • Martha (Smith) says:

      Aargh! I hope people respond to my requests to give specific examples of how descriptive work is important — and that you can use at least some of them to help persuade your supervisor of the importance of descriptive work.

    • jim says:

      To be perfectly honest, while descriptions are valuable, purely descriptive work isn’t valuable because no one has any idea if the descriptions will matter for anything ever.

      Suppose you want to describe the shape, color and skyline of various mountains. What discoveries would it lead to? Almost certainly none, because “mountain” is a group of things that reflect a wide variety of processes.

      But if you know something about what the mountains are made of and you’re describing the shape and color of mountains with certain characteristics, you might do some science. You might know for example that some mountains have dark colored lavas and some have light colored lavas. You exclude mountains with no lava and focus on these two types. You might then notice from your comparative descriptions that the mountains with predominantly dark colored lava tend to be broad with gently rounded summits and gentle slopes, while those with light colored lava tend to be steep with pointy or sharply rounded summits. You’ve noticed the difference between shield and stratovolcanoes.

      Had you left your study wide open, you might be studying mountains that result from so many different processes that it would be impossible for you to find anything useful. By constraining your study you’re able to identify features that are relevant to understanding how (some) mountains form. Your work is descriptive, but it’s also comparative and addresses some question.

      Darwin’s work wasn’t descriptive either. He did a lot of describing, but he didn’t just describe some random group of animals. He picked things to describe where the description served a specific purpose in a research study. He described the finches because there was a surprising range of forms in a small area. He described them *in order to perform a comparative anatomy study*. The description provides the data. The research is the analysis of the data.

      • Curious says:

        I suppose your comment is an example of the kind of descriptive work not to do, but not a strong argument against well done descriptive work as a rule. Understanding the baseline population distributions of some set of measures can certainly be useful to others doing work in the same area. I agree that the ultimate goal should not be description, but it certainly can be the ultimate goal of a given study that will in turn be used to develop understanding of the phenomenon under study as you aptly describe in your third paragraph.

        If you have an idea you want to develop and the relevant descriptive work has not yet been done by someone else, then that’s where it will likely begin. You seem to be saying in one breath, “descriptive work isn’t useful” and in the next saying, “descriptive was essential to these areas of study.”

        • jim says:

          ‘You seem to be saying in one breath, “descriptive work isn’t useful” and in the next saying, “descriptive was essential to these areas of study.”’

          Exactly! :)

          It should be done in the context of the study it’s used for. Most data that’s gathered without a specific purpose is never used and if it is used it’s not the right data for the job.

      • Anoneuoid says:

        You could notice mountains tend to be north-south, similar to why continents taper to the south.

  6. jonathan says:

    If I read you correctly, I have a similar issue with causality: people in any variety of settings spend a great deal of effort finding causal links which they then impose forward, generally without recognition that other causal links, even similar ones, would point in different directions, including a blunt sign change. Better to understand the state of things without becoming reliant on specific causal chains. In law, this becomes mortmain, the dead hand, which imposes a limit, like the rule against perpetuities, intended to limit the reach of the dead hand. The argument over ‘gerrymandering’ is that: to what extent should the winning of the legislature carry through time in the form of the mortmain of redistricting which is intended to preserve or enhance the conditions under which that legislature was elected. There’s no right or wrong there, just arguments about who should win.

    One of my problems with causality is that the causes you identify as leading up to this point may not extend with great predictive power into the future, and this becomes more true in specific cases a) where you are arguing for causation instead of demonstrating its power so the extent can be evaluated, and b) where the elements are arranged as though they were first order or individual level when they are more complicated. In both cases, the relationships in complexity generate not only the identified chain but others, including contrary ones, which means the framing drives prediction, which automatically generates gaps as long as that part of life isn’t a closed system, which it can’t be.

    My impression is some social science people don’t understand group operations. They impose symmetries that don’t exist in real world examples, because each step in any direction leads out of that neighborhood, because any step is part of other neighborhoods. There are many ways to say that, but I like the neighbors idea because it says you play with Sue and Sue is your friend but Sue is part of a family which is not yours, and so on.

    And I tend to believe the statement that we need to identify cause is misleading; we need to identify what to do next, not why we need to do it. If we’re in a battle, we need to survive the battle. Then we can sit down and figure out how to fight better next time. That’s a specific use of cause which largely separates from what you need to do next. What I see with social science research is a lot of generalization from small findings. These yes-maybe-if claims generate a lot of social prescriptions that don’t work because they are mathematically inappropriate levels of generalizations that should be noted with a series of ifs.

    I’m sure this is weird but I think of social science findings as containing a lot of if-true statements which I treat as countable. So each if could be a 4 or 23 or some other number and each occurrence in the long chain of statements is describable in some order, counted in primes to generate uniqueness, and raised to the if number. Long chains of if-true, with parentheses that have their own number. These hypotheses have a bunch of orderings so they have a bunch of possible unique numbers. The more absolute a statement, the fewer the alternative numberings until they become mechanically differentiable as the meanings drop and just the statements remain.

    • Martha (Smith) says:

      Lots of good points — but I lost you on the “I’m sure this is weird” part.

    • I’ve never seen anything in stats or computer science that even remotely addresses the philosophical complications of trying to delineate events in context and assign causes to them. (As an aside, I used to teach philosophy of language when I was a professor and half of my dissertation was on the semantics of events [specifically, intensional adverbs like “repeatedly” or “accidentally”].)

      On the flip side, I do appreciate the attempts to adjust for non-representative data and believe many of the discussions in the causal realm are relevant for modeling.

  7. Martha (Smith) says:

    Andrew wrote,
    “P.S. Alex writes of the above picture, “In reality, Henry is helping me do my tax return, but we can pretend he’s working on hierarchical modeling… or something. I dunno. I’m not a statistician.””

    I say baloney — Henry just found a nice spot (that green whatever) to sit and groom himself (taking advantage of the nice lighting from the lamps).

  8. John Williams says:

    It wasn’t social science, but On the Origin of Species is descriptive, and is pretty good science.

  9. Serge Lang, in his Challenges, wrote that fiction writers undertake better narration than many experts. I have to pull out the exact quote. I lean to that view as well. Maybe technical jargon overtakes descriptions.

    Yeah, Sander Greenland would appreciate John Williams’s comment.
