A time series plot of a distribution is pretty much a trace plot from MCMC. Although I wouldn’t put it past people to start analysing time series plots like those tea-leaf readers in stocks doing “technical analysis.”

I have a large number of observations with distributions that change with age. I’ve plotted time series of quantiles, which give some sense of the change in distributions, but plotting the distributions for different age ranges seems to make changes in the skewness much clearer than looking at the quantiles. This pattern, for some variables, suggests a maturational process. Any suggestions for more informative visualizations?

Thanks, it’s useful to hear your line of thinking!

I agree that sometimes densities are not what we need. In this case the probability of different arrival times is what the bus riders wanted, so we tried to show them that.

The time series with bar charts example reminds me of this Barbara Tversky paper on spontaneous tendencies among students to read bar charts as discrete data and line charts as continuous:

https://cpb-us-w2.wpmucdn.com/sites.wustl.edu/dist/e/952/files/2017/09/zacksmemcog99-12d5ktx.pdf

Maybe the results would have been different with your students.

Good post. Thanks

Interesting question. There were a few reasons that motivated the quantile dotplot… we were thinking about bus riders who want to see some representation of the uncertainty in the point estimates of bus arrival times they would normally get, so they can decide things like when to leave for the bus stop. But they need it in a format they can more or less glance at to make decisions quickly. The thought was that since it’s often one-sided intervals they care about, with a small number of dots, as long as you know the total number, it’s easy to quickly see your probability of catching or missing the bus.
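The counting-dots idea can be sketched in a few lines. This is a minimal illustration, not the paper’s actual method: the lognormal arrival model, the choice of 20 dots, and all the names here are assumptions made up for the example.

```python
# Sketch of the quantile-dotplot reading described above: summarize a
# predictive distribution of bus arrival times by a small fixed number of
# quantiles ("dots"), so counting dots below a time gives a one-sided
# probability. The lognormal model and n_dots = 20 are illustrative only.
import math
import random
import statistics

random.seed(0)

# Toy predictive draws of arrival time (minutes from now).
draws = [random.lognormvariate(math.log(10), 0.3) for _ in range(5000)]

# 20 equally spaced quantiles summarize the distribution as countable dots.
n_dots = 20
dots = statistics.quantiles(draws, n=n_dots + 1)  # n_dots cut points

def prob_bus_already_gone(arrive_at_stop, dots):
    """Fraction of dots earlier than when you reach the stop: each dot
    carries 1/len(dots) probability, so counting dots gives the answer."""
    return sum(d < arrive_at_stop for d in dots) / len(dots)

print(prob_bus_already_gone(8.0, dots))  # chance you've missed the bus
```

The point of the fixed dot count is exactly what the comment says: a rider who knows there are 20 dots can read “3 dots before now” as roughly a 15% chance the bus has already gone, with no axis reading required.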

Also, I guess we were assuming the average app designer would have an easier time implementing something static. But I’m a big fan of animation, so rapid cycling through draws seems intuitive to me. I like how the weather app DarkSky visualizes precipitation predictions, which is something like this.

This reminds me of: https://arxiv.org/pdf/1908.06716.pdf

Good points.

I’m looking at data on electricity outages and the fires they cause. I have some potentially useful explanatory variables, such as estimates of fuel moisture content for various sizes of fuel (grass, small sticks, large sticks), the current temperature, and the mean temperature over the past 24 hours and the past week, that sort of thing. I expect a relationship between ignition probability and fuel dryness, for a given cause of electricity outage, but I don’t expect that relationship to be linear (in either untransformed or logit space). It’s a large dataset — there are hundreds of thousands of outages per year, and a few hundred ignition events — and a lot of the methods I usually use for exploratory data analysis were not providing me with much or any insight.

One thing I found very useful was to divide many of the variables into quintiles and calculate ignition probability per quintile: what’s the probability that an outage causes a fire when it is extremely dry compared to when it isn’t? What about when it is extremely hot compared to when it isn’t? And so on. And of course I could combine these: what if it’s very hot and very dry; very hot and somewhat dry; and so on. Histograms and tabulation gave me a lot of understanding.
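The quintile tabulation described above is easy to sketch. This is a toy illustration of the binning logic under made-up data; the variable names, the dryness model, and the ignition rates are all assumptions for the example, not the commenter’s actual dataset.

```python
# Sketch: bin one explanatory variable (here, a "dryness" score per outage)
# into quintiles and tabulate the empirical ignition probability per bin.
# All data here is synthetic; drier outages are made to ignite more often.
import random
import statistics

random.seed(1)

# Toy data: 10,000 outages, each with a dryness score and an ignition flag.
dryness = [random.random() for _ in range(10_000)]
ignited = [random.random() < 0.002 + 0.02 * (1 - d) for d in dryness]

# Quintile cut points: 4 interior boundaries defining 5 roughly equal bins.
cuts = statistics.quantiles(dryness, n=5)

def bin_index(x, cuts):
    """Return which quintile bin (0-4) the value x falls in."""
    for i, c in enumerate(cuts):
        if x < c:
            return i
    return len(cuts)

counts = [0] * 5
fires = [0] * 5
for d, f in zip(dryness, ignited):
    b = bin_index(d, cuts)
    counts[b] += 1
    fires[b] += f

for b in range(5):
    print(f"quintile {b + 1}: P(ignition | outage) = {fires[b] / counts[b]:.4f}")
```

Crossing two binned variables (hot-and-dry vs. hot-and-damp, etc.) is the same tabulation with a pair of bin indices as the key.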

Now that I have a good feel for what explanatory variables are most important and which ones seem to have interactions too big to ignore, I’m switching to continuous models, since I don’t want artificial boundaries between one dryness category and another, but for exploratory analysis the discretization really helped.

If the bins were actually supposed to have been unequal, then I probably got it wrong anyway.

The whole process was distracting and I don’t think it was very intuitive, if you get right down to it.

The only intuitive thing for me was that the pdf was pretty much a typical single-peaked pdf, not Gaussian perhaps, but maybe much like a radial Gaussian. As I think back on it, though, it must have been idealized rather than real data, since there wasn’t the real-life variation you’d see in a histogram.

I hope these reactions are helpful for Jessica’s research.

One reason I ask this question is because I remember, years ago, teaching intro statistics and having a homework problem near the beginning of the semester asking students to gather some data and plot a histogram. About half the students plotted time series as bar charts. At first I was annoyed, but then I started to think, Why am I so sure that students should be learning about histograms anyway, given how non-intuitive they can be?

So, I’m not saying never to show a histogram or display a distribution, but now I’m happier with a time series plot. In Jessica’s example above, we could plot bus waiting time vs. date.

But these days Monte Carlo is so fast, and the draws do replace a continuous distribution with a discrete one. Also, say, 19 draws could be in grey and 1 in black, possibly cycled through.

So the question is, why are particular discretizations being used?
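The “19 grey, 1 black, cycled through” idea above amounts to an animation over draws. A minimal sketch of the frame logic, leaving out the actual rendering (the function name and frame structure are assumptions for illustration):

```python
# Sketch of cycling through Monte Carlo draws: each animation frame
# highlights one draw in black and shows the remaining 19 in grey.
# Only the frame bookkeeping is shown; drawing is left to a plot library.
import random

random.seed(0)
draws = [random.gauss(0, 1) for _ in range(20)]

def frames(draws):
    """Yield one frame per draw: (highlighted draw, list of grey draws)."""
    for i, d in enumerate(draws):
        grey = draws[:i] + draws[i + 1:]
        yield d, grey

# A renderer would loop over frames(draws), plotting grey dots plus
# one black dot per frame, then repeat from the start.
n_frames = sum(1 for _ in frames(draws))
```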

There are compromises, of course. The green or red shading can vary in intensity. Better yet, there can be an up or down arrow, varying in length proportional to the size of the change. Even subconsciously, these provide visual cues that are more accurate than the discrete version. So the tradeoff between simplicity and accuracy may not be that worrisome in some cases.
