The project took other turns and I never really had a chance to work on this idea. I would be interested if someone has a reference on an idea like the one I described above.

]]>If you limit yourself to sampling the endpoints, wouldn’t it be harder to discover any confounding factors you might have missed? The data you’re getting can’t deviate from your model; there is no way to detect a misfit.

For example, use a differently (or wrongly) calibrated instrument ten years later, and you may see a change where there is none.

]]>Generally, in economics, if you make functional form assumptions, you need to be prepared to have good arguments to justify them; you’ll certainly be asked in seminars. This holds for theorists as much as for empiricists (though Nobel prize winners like Heckman get away with more…). At least that’s been the case since I started (around 2012). Things might have been quite different before the “credibility revolution” and the empirical turn in economics.

]]>Depending on the magnitude of the autocorrelations, this consideration can be controlling.

]]>Cat:

See our book, Regression and Other Stories, which has four chapters on causal inference and other discussion of causal interest throughout.

]]>D-optimal designs are used in experiments conducted to estimate effects in a model. Their main application is in designs with an experimental goal of identifying the active factors.

A design is A-optimal if it minimizes the sum of the variances of the regression coefficients.

I-optimal designs minimize the average variance of prediction over the design space. If the primary experimental goal is to predict a response or determine regions in the design space in which the response falls within an acceptable range, the I-optimality criterion is more appropriate than the D-optimality criterion.

A related approach is G-optimal designs, which minimize the maximum prediction variance over the design region.
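To make the criteria above concrete, here is a minimal sketch comparing a pure-endpoint design with an equally spaced design under a simple linear model f(x) = (1, x) on [-1, 1]. The function name, the designs, and the grid size are illustrative assumptions, not from the quoted source.

```python
# Hypothetical sketch: D-, A-, and I-criteria for a two-parameter linear
# model f(x) = (1, x). D-optimal maximizes det(M); A-optimal minimizes
# trace(M^-1); I-optimal minimizes the average prediction variance.

def criteria(xs, grid_n=201):
    # Information matrix M = sum_i f(x_i) f(x_i)^T
    m11 = len(xs)
    m12 = sum(xs)
    m22 = sum(x * x for x in xs)
    det = m11 * m22 - m12 * m12              # D-criterion (maximize)
    # Inverse of the 2x2 information matrix
    inv11, inv12, inv22 = m22 / det, -m12 / det, m11 / det
    a_crit = inv11 + inv22                   # A-criterion (minimize)
    # I-criterion: average of f(x)^T M^-1 f(x) over a grid on [-1, 1]
    grid = [-1 + 2 * k / (grid_n - 1) for k in range(grid_n)]
    i_crit = sum(inv11 + 2 * x * inv12 + x * x * inv22 for x in grid) / grid_n
    return det, a_crit, i_crit

endpoints = [-1.0, -1.0, 1.0, 1.0]           # all mass on the endpoints
spread = [-1.0, -1 / 3, 1 / 3, 1.0]          # equally spaced
print("endpoints:", criteria(endpoints))
print("spread:   ", criteria(spread))
```

For this linear model the endpoint design wins on all three criteria, which is exactly the situation the thread is arguing about; with a richer model (say, a quadratic term) the interior points start to matter.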

quoted from https://www.researchgate.net/publication/320464986_Experimental_learning

]]>Let’s use the context of the question you were asked about: the probability of an event over time. Your demonstration of the superiority of collecting end point data over intermediate data contains in part “What prior information is available on δ and T? We first consider δ. If the treatment effect is monotone, then δ must be …” I can think of problems where monotonicity is a reasonable prior: e.g., the proportion of eCommerce sales over time. Then we might wish to estimate the rate at which this is growing. More interesting would be whether the rate itself is growing over time, or not, and data on the endpoints may not be sufficient for this. I realize we might assume a quadratic relationship, and that might get us somewhere. However, I think the more interesting question would be whether the rate at which the eCommerce proportion of retail sales grows is itself monotonic or not. Similarly, most of the interesting time series questions I can easily think of involve questions about whether or not the trend is monotonic.

I realize that your paper is talking about treatment effects and the original question is about time trends. But I think these contexts are different in terms of what a reasonable prior might be. Treatment effects would normally either be monotonic or quadratic (there are exceptions, but I think this covers the majority of cases). Time trends do not seem so readily characterized by me, at least the interesting ones I can think of. Maybe my imagination is too limited.

]]>It’s nonlinear, but not necessarily over the time period in question. Like if the probability starts at 0.02 and increases by 0.003 per year for 20 years, up to 0.08, that could definitely be linear-ish over that time frame.

]]>This is what I came to the comment section to write, and you did it better than I was going to.

]]>Art:

See P.P.S. above.

]]>Sandro, Dale:

See P.P.S. above.

]]>P_t = P_1999 + (t-1999)A + E_t

where E_t is a year-specific random variation term. In that kind of construction, just piling on measurements at the end-points will run into serious pseudo-replication issues, so you would indeed be better off spreading out your measurements over time. This is independent of any effects of nonlinearity….

]]>This sounds like something you can check with fake-data simulation (a proposal I can only make because of reading this blog).

Really, set up a generative model that includes various effects: a (quasi-)linear trend, year-wise variation, and some other fun things (maybe confounders with other variables). I could imagine a game-theory-style argument for throwing more weight on the outer years, but not limiting the sampling to the extremes only.
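A minimal fake-data sketch along these lines, using the P_t = P_1999 + (t-1999)A + E_t model from the comment above: when each year carries its own random shock E_t, piling replicates onto the two endpoint years cannot average those shocks away, while spreading the same budget over years can. All parameter values (slope, noise scales, measurement budget) are made up for illustration.

```python
# Fake-data simulation: compare the sampling variability of the OLS slope
# under an endpoints-only design versus a spread-out design, when the data
# include a year-specific random effect (the pseudo-replication issue).
import random

def slope_ols(ts, ys):
    tbar = sum(ts) / len(ts)
    ybar = sum(ys) / len(ys)
    num = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    den = sum((t - tbar) ** 2 for t in ts)
    return num / den

def simulate(design, reps=2000, slope=0.1, sd_year=0.5, sd_obs=0.3, seed=1):
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # One shared shock per year (t = 0 is 1999, t = 20 is 2019)
        year_effect = {t: rng.gauss(0, sd_year) for t in range(21)}
        ts, ys = [], []
        for t in design:
            ts.append(t)
            ys.append(slope * t + year_effect[t] + rng.gauss(0, sd_obs))
        estimates.append(slope_ols(ts, ys))
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / reps
    return var ** 0.5  # sd of the slope estimate across simulations

endpoints = [0] * 21 + [20] * 21       # 21 measurements at each end
spread = [t for t in range(21)] * 2    # 2 measurements per year
print("sd of slope, endpoints:", simulate(endpoints))
print("sd of slope, spread:   ", simulate(spread))
```

With the year-effect variance turned on, the spread design gives a noticeably smaller slope sd despite the identical budget; set sd_year to 0 and the ranking flips back in favor of the endpoints, which is the nonlinearity-free case discussed upthread.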

For logistic regression the optimal design actually depends on the true unknown parameter values. The optimum is to put half the x’s where P(Y=1|x) is something like 15% and half where it is something like 85%. Of course we don’t know those points. This leads to sequential Bayesian methods. Chaloner and Verdinelli I think.

It is usual for the optimum to involve the same number of distinct x’s as there are parameters in the model, making it hard to test the model.
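A sketch of the "locally optimal" point above: for the two-parameter logistic model, the D-optimal design puts half the observations at each of two x's where P(Y=1|x) is roughly 17.6% and 82.4% (logit ≈ ±1.543), close to the 15%/85% quoted from memory. The catch is exactly as stated: those x's depend on the true (β0, β1). The parameter values below are assumptions for illustration.

```python
# Locally D-optimal two-point design for a logistic regression
# P(Y=1|x) = invlogit(beta0 + beta1 * x): solve beta0 + beta1 * x = ±logit(p*)
# with p* ≈ 0.824 (the classical two-parameter result).
import math

def logit(p):
    return math.log(p / (1 - p))

def d_optimal_points(beta0, beta1, p_star=0.824):
    return tuple(sorted((s * logit(p_star) - beta0) / beta1 for s in (-1, 1)))

# If the true model were P(Y=1|x) = invlogit(-2 + 0.5 x):
lo, hi = d_optimal_points(beta0=-2.0, beta1=0.5)
print(lo, hi)  # the two x's where we'd place half the sample each
```

Note that this gives exactly two distinct x's for two parameters, illustrating the second paragraph: the optimal design leaves no degrees of freedom for checking the logistic form itself.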

]]>What I find interesting in the paper you cite are statements such as the following: “For example, if the analyst is interested in estimating the effect of treatment and has strong priors that the treatment has a linear effect, then the sample should be equally divided on the endpoints of the feasible treatment range, with no intermediate points sampled.” I think the emphasis should be on the “priors.” Too often, the emphasis is on the mathematically correct conclusion and the importance of these priors is not appreciated. It may be that experimental economists are better practitioners of this than other fields, but I think economists are often guilty of glossing over such critical assumptions. Again, I can think of very few practical applications where it is safe to assume a linear time trend. Indeed, I think the question of whether (and what type of) a time trend exists is more interesting than estimating its average slope over the entire time period.

]]>I share the same thoughts, Dale, and at the risk of getting nitpicky over semantics I’d say comparing 100 measurements in 1999 to 100 measurements in 2019 answers a slightly different question than “growing or shrinking over time”. Comparing just the endpoints answers the question “Does P_t differ at 2019 versus 1999?”. Imagine a scatter plot with, say, 10 measurements per year and you superimpose a fitting trend and uncertainty bounds. Now imagine a scatter plot with 100 measurements each at 1999 and 2019 with superimposed indications of central tendency and uncertainty at each of those two times. I personally find the first scatter plot more informative in a situation where the true behavior of P_t is poorly understood. Who knows what might happen year to year and for all we know (as the problem is stated) there may be things that make certain years unusual.

]]>So with two points it’s the endpoints. With three you include the midpoint. With four you take the endpoints and the cos(pi/3), cos(2pi/3) points…

This is for the interval -1,1 so you shift and scale that interval…

Why Chebyshev points? For interpolation, because they avoid the Runge phenomenon of wild oscillations. Basically the construction clusters points near the edges, which tames the oscillations that equally spaced interpolation points produce near the ends of the interval.
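One standard version of this construction, sketched below: the Chebyshev extreme points x_k = cos(k·π/(n−1)) on [−1, 1], then shifted and scaled to a target interval [a, b]. For n = 3 this gives the endpoints plus the midpoint, as in the comment; the function name and the 1999–2019 interval are illustrative.

```python
# Chebyshev (extreme) points on [-1, 1], mapped to an arbitrary interval
# [a, b]. These are the extrema of the Chebyshev polynomial T_{n-1} and
# cluster toward the ends of the interval as n grows.
import math

def chebyshev_points(n, a=-1.0, b=1.0):
    raw = [math.cos(k * math.pi / (n - 1)) for k in range(n)]
    # Affine map from [-1, 1] to [a, b], sorted ascending
    return sorted((a + b) / 2 + (b - a) / 2 * x for x in raw)

print(chebyshev_points(3))              # ≈ [-1, 0, 1]: endpoints + midpoint
print(chebyshev_points(4, 1999, 2019))  # edge gaps are half the middle gap
```

For n = 4 on 1999–2019 this yields roughly 1999, 2004, 2014, 2019: still anchored at the endpoints, but with interior years included, which is the compromise between endpoint-only and spread-out sampling discussed above.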
