Henle, in chapter 3, defines a hyperreal number system and then says you can skip to chapter 4. After that, the rest of chapter 3 goes into some stuff about “quasi-big” sets; skip it the first time through.

Literally just read the first page or two of chapter 3 and move on; come back after you understand how to use infinitesimals, from chapters 4-6 or so. It’s like the learning-to-drive-a-car analogy: learn about wishbone suspension only when you know enough to care about high-performance automobiles. The quasi-big stuff is interesting but not necessary for doing rigorous, high-quality mathematical modeling; it’s more like understanding how a compiler translates a high-level language into CPU instructions.

So, my newly revised Henle reading suggestion is:

1) Read chapter 1; it’s quick, nontechnical, and motivating.

2) Skim chapter 2 very lightly; extract the idea that mathematics is actually about formal rules for manipulating strings of symbols: a language. This fact will be used to make correspondences between proofs using real numbers and proofs using hyperreals.

3) In chapter 3, read the first two pages, then skip ahead when he says to skip to chapter 4. Seriously, come back later.

4) Read chapter 4 and try to understand it deeply. The basic idea is that hyperreals can describe “orders of magnitude” (a/b is infinite, or a/b is infinitesimal, or a^2/b is infinitesimal, etc.); this gives you the big picture of hyperreals, and it’s where the bulk of the first week will be spent. Try it an hour each day.
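To make the orders-of-magnitude idea concrete, here is a tiny worked example in my own notation (not necessarily Henle’s): let ε be a positive infinitesimal and take a = ε, b = ε².

```latex
% \varepsilon > 0 infinitesimal; a = \varepsilon, \; b = \varepsilon^2
\frac{a}{b} = \frac{\varepsilon}{\varepsilon^2} = \frac{1}{\varepsilon}
  \quad \text{is infinite,}
\qquad
\frac{b}{a} = \varepsilon
  \quad \text{is infinitesimal,}
\qquad
\frac{2\varepsilon + \varepsilon^2}{\varepsilon} = 2 + \varepsilon \approx 2
  \quad \text{(finite, with standard part } 2\text{).}
```

Asking whether a ratio like a/b is infinitesimal, finite, or infinite is exactly the “orders of magnitude” bookkeeping the chapter develops.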

Now you’ve arrived at your happy place, where you have an idea that it could be useful to have new numbers and how to use them. Then go on to read chapter 5, see how he uses these numbers to describe properties of functions related to continuity, and continue on through the actual calculus.

The one-week thing is a red herring; even if it takes a year, it doesn’t matter. The point is, you can go one of two different paths.

The path chosen around 1900 or so was epsilon-delta, and it is all about whether you can make things arbitrarily close to something by looking at a sequence and then going out sufficiently far in the sequence…

The original path, used by the people who invented calculus for use in physics (Newton, Leibniz, etc.), was to work with special numbers. The problem was that no one had created a consistent set of rules to define those numbers. Well, that happened in the 1960s and again in the 1970s, and now you can do calculus by working with an algebra of special numbers…

If you can get to the idea of why you’d want special numbers for calculus and the basics of how they work, whether it’s in a week or a year, it’s a useful tool to have for mathematical modeling.

If you have Spivak’s Calculus as background, I’d recommend both Henle and Alain Robert’s book. There are actually two main “methods” of deriving the hyperreals: the Henle book goes along a simplified route of the one created by Abraham Robinson, while the Alain Robert book goes along the Internal Set Theory (IST) route created by Edward Nelson at Princeton. I think it’s useful to see both, and I personally use IST when needed because I find it intuitive.

As for the “one week” estimate, it may be more a difference between what I mean by “a useful place” and what you think I mean by “a useful place”.

I don’t think you can start doing serious problems in calculus in a week, but I do think you can start to get a feeling for what an infinitesimal is and why you might care to have a number system that includes them.

On the other hand, you have to remember that I did a BS in Mathematics with a minor in Computer Science and an informal minor in Philosophy, and THEN I went into Engineering… so I may have a skewed view of what to expect.

Thanks Daniel. Do you think the Henle book, as opposed to the Robert book, is still the right thing given I have Spivak’s Calculus book (at least a lot of it) under my belt? I’m interested to see what the nonstandard analysis view of things brings to the table.

In general, I see what you mean about nonstandard analysis providing a different sort of on-ramp to the necessary math. But I still think you are probably overestimating how quickly a rank math newbie could get up to speed… it certainly took me much longer than a week! Good math is presented in a certain style. Like any style, it takes a long time to mold one’s thoughts into that style and start extracting useful stuff.

Having thumbed through Henle’s first few chapters again, here are some ideas and assumptions:

1) Read all of chapter 1; it’s an introduction to the history and purpose of calculus, infinitesimals, etc. It doesn’t have lots of technical content, and it’s short.

2) Skim chapter 2 very lightly. I assume you know some logic and set theory of the kind you’d have learned in, say, an algebra class in high school. The point of chapter 2 is that there is a language to the logic of math: you can write down a series of symbols that means “for all real numbers x there exists a number y such that y = sqrt(abs(x))”. This fact about a language will be used later, but the details of the symbols are not important, and different texts use different symbols anyway.

3) Read chapters 3 and 4 in some detail, but skip proving anything yourself; the goal is to understand the basic picture of how to think about the hyperreals.

4) Read chapters 5, 6, and 7, and try to understand the concepts about functions and the arguments given. When he argues using the technique of “transfer” (that is, that every first-order logical sentence about the reals is also true of the hyperreals and vice versa), go back to chapters 2 and 3 and look there for understanding of the basic technique. This is why I say the whole thing appeals to “algebraists”. If you like programming, you’ll like nonstandard analysis.
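Transfer in miniature, using the example sentence from step 2 (the symbols vary by text, and strictly speaking the function symbols get replaced by their hyperreal extensions): the same first-order sentence is read over the reals and over the hyperreals, and it is true in one exactly when it is true in the other.

```latex
\forall x \in \mathbb{R} \;\, \exists y \in \mathbb{R} \;\,
  \bigl( y = \sqrt{\lvert x \rvert} \bigr)
\quad \Longleftrightarrow \quad
\forall x \in {}^{*}\mathbb{R} \;\, \exists y \in {}^{*}\mathbb{R} \;\,
  \bigl( y = \sqrt{\lvert x \rvert} \bigr)
```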

At this point, you’ve arrived at your useful place. It should take less than a week, if you can devote an hour a day and you have at least 3 years of college prep high-school math.

Next you’ll be able to go through the remaining chapters with a new kind of logical structure in your head, learning facts about doing integrals or dealing with infinite series, etc. That will take longer. After you’ve done some of this stuff, go back to chapters 2 and 3 and read them in depth; you’ll see why they discuss abstract ideas in logic, and you’ll be able to do some of the proofs more easily, having figured out from the later chapters what to learn and why to learn it.

Good luck!

Also, one week is to arrive at a useful place, not to understand all of calculus.

The key to the one-week approach is to iteratively skim. You need to understand the bigger reasons for the proofs, and that means powering through the details and coming back several times. Read the details in detail only after you decide that you understand why you’d want to know the fact. That might be after you get to an example problem where you don’t know what to do.

Thanks for the book links. I’m interested to take a look at this non-standard analysis thing.

The week startup time sounds pretty ambitious, though, after glancing at the Henle book. What sort of mythical animal is totally comfortable with definition-theorem-proof style, and so can easily digest that book, but doesn’t know any calculus?

+1 on everything upthread (Michael, Natasha, and Bob)

My experience with math classes is that they hit the worst possible balance between pure math and applications. I learned to like math only later on my own by doing one or the other. Digging into pure math for its own sake, learning how to feel comfortable doing proofs, epsilon-delta limits, etc, was its own sort of intellectual motivation. Then, on the flip side, just as motivating was doing on-the-job learning about differential equations because I had a system I cared about solving/simulating, and probability densities I cared about integrating, and matrix calculations that made my simulations easier to do.

Contrast that with the math classes (that I took) that gave thin justification, if any, for, say, integration by parts, and then threw some tricky-looking arbitrary function next to an integral sign as a puzzle.

Of course, it’s easy to point out the flaw and harder to think up how to do it well. Dan Meyer seems to be on the right track: https://vimeo.com/163821742 (tl;dr: make the math work itself more interesting by “developing the question”; see 18:20 for an interesting example). In the world of applications, I like the style of the first part of deeplearningbook.org. Very readable primers on the linear algebra, probability, and numerical methods they use in the rest of the book. Not in an appendix, but at the front.

Understanding Abraham Robinson’s construction requires some deep logic, but just using the techniques correctly doesn’t. As you say, applied people routinely argue correctly with infinitesimals.

The book I linked by Henle is simple enough to teach you derivative and integral calculus with hyperreals, such that the average student could do calculus at least as reliably as if they’d learned the standard method.

After you skim chapters 1-4 of that book for background, you get into actually using the numbers for calculus stuff like calculating integrals and whatnot, and it’s relatively straightforward.

The first 4 chapters are not terribly long or terribly terse, so I think someone who wants to learn calculus can basically buy a $5 kindle book and arrive at a useful place in about a week of reading and thinking. Having some particular motivating questions in Probability/Stats can be a big boost for this person.

Remember, the goal is to make someone like an undergrad econ major able to understand something like how to do seasonal adjustments using continuous functions instead of weekly indicators, or how to set up a model for a distribution of a quantity that is only observed when it exceeds some threshold, or stuff like that.

Now that statistics has crossed with computer science to become machine learning, you can’t avoid studying machine learning.

Bayesian statistics was hot; they even used it to search for missing airplanes.

Today it’s deep learning; even “machine learning” is out.

About the math you mentioned: sometimes you can’t explain the math. For instance, nobody knows the mathematical proof behind convolutional neural networks. Of course, I get your point about the math foundation and I agree with it.

It is possible to make it rigorous axiomatically, for example based on the Alternative Set Theory (cf. Vopenka), although I am not aware of a text in English that does that.

The proofs in standard calculus texts do indeed depend on what you can think of as a limiting process rather than the existence of infinitesimals. But folks who use calculus (physicists, mathematicians… statisticians?) routinely argue correctly with infinitesimals. Making that approach rigorous requires a substantial excursion into mathematical logic. I think that’s a burden for most people.

There’s lots of discussion on this at math.stackexchange:

https://math.stackexchange.com/questions/51453/is-non-standard-analysis-worth-learning

There are several postings on Terry Tao’s blog; search for “terry tao nonstandard analysis”.

I do agree with you that teaching the basics like lm and glm before going into deep learning and random forests would be a lot wiser, because too many people now fit complicated models without really understanding them, and this is a major problem.

On the other hand though, trees are way simpler than linear models. They are never mentioned in statistics courses because of historical reasons (they come mostly from CS literature) but they are such a neat and simple structure and then random forest (or BART) is just a natural extension of simple trees. I really feel like we should teach stats students trees before they learn regression.
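As a sketch of how little machinery a tree needs (my own toy code, not from any course or package), here is a depth-1 regression tree, a “stump”, fit by exhaustive search over split points. A deeper tree just applies the same split search recursively to each side, and a random forest (or BART) averages many such trees:

```python
def fit_stump(x, y):
    """Fit a depth-1 regression tree by brute-force search over split points."""
    best = None
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        if not left or not right:
            continue  # a valid split needs points on both sides
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # Score the split by total squared error around each side's mean.
        sse = (sum((yi - mean_l) ** 2 for yi in left)
               + sum((yi - mean_r) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    # The entire fitted "model" is one if/else.
    return lambda x_new: mean_l if x_new <= threshold else mean_r
```

For example, `fit_stump([1, 2, 3, 4], [0, 0, 10, 10])` learns to split at 2 and predicts 0 on the left and 10 on the right; there is no linear algebra anywhere, just comparisons and means.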

That’s the two-stage simulation, ABC, McElreath’s demos, etc., which I always thought of as just stage 1 of learning, where simulations take you towards continuity and, with the addition of importance sampling (re-weighting the prior by the likelihood to get the posterior), offer an opportunity to distinguish density and distribution functions and to see, for instance, how the density can be a good approximation to the distribution function for observations.

The idea is not to avoid calculus forever, but rather introduce it at the right stage with right motivations for doing statistics.

I had only intro calculus for social science when I went into statistics (fortunately we did do proofs), and a lot of effort was spent learning math over many years. I was able to grasp a lot about statistics while this was going on because I learned simulation really early on. Perhaps that is what kept me going. But the math folks likely need, if taken _up front_, will take most people about two years of full-time study, which is not feasible for most (it wasn’t for me). Also, most of the math won’t end up being helpful, though which parts will be is very uncertain.

Perhaps more importantly, it’s not calculus that is needed but ways to represent and work with continuity in high dimensions, given current computational resources, to address variability and uncertainty less wrongly (I think that’s Daniel’s point).

But overall, most undergraduates have some vague sense of regression; starting with that, getting them to do things with it, and then bringing in simulation-based views of what’s going on in regression is likely the way to go.

Sigh, that is about me too. It just happened that I worked on different problems and didn’t focus enough on coding and math. That was a strategic mistake, and I am still catching up, same as Natasha.

I don’t know if general calculus teaching can get any better, but I had many science classes where profs basically said, “well, this would all make more sense with calculus, but I’m not going to teach that.” We could at least try teaching calculus in science classes (other than physics; they seem to do OK).

This is one of the things I find really charming about my PhD field: sooner or later ecologists will take up pretty much any method as long as it’s useful. It would be nice for people to make the road less bumpy and just teach the stuff up front.

We’ve been told that Stan would be challenging for people in ecology, but we’re actually seeing a lot of uptake. Sure, you have to marginalize out the latent states in your movement HMM, but when you do, it mixes much, much better, so you can actually get a result. You’ll find that the math you need to marginalize is in the 1980s literature, back when they were doing optimization.

I think it’s helpful to think of the models in terms of the discrete parameterization. I always write the discrete parameterization down, then do the marginalization. Given that the marginalization is over discrete parameters, it never involves calculus—just some algebra to keep everything computationally stable on the log scale. The algebra can get hairy if you generalize to something like HMMs—then you need the forward algorithm, a kind of dynamic programming algorithm, to compute the log density.
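As a sketch of what that marginalization looks like (a generic textbook forward algorithm written by me, not Stan code; all names are mine), here is the log-scale recursion for an HMM likelihood, with log-sum-exp doing the stability work:

```python
import math

def log_sum_exp(xs):
    """Stable log(sum(exp(x) for x in xs)): factor out the max first."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def hmm_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm: sum out the discrete hidden state sequence.

    log_init[k]     : log prob of starting in state k
    log_trans[j][k] : log prob of moving from state j to state k
    log_emit[t][k]  : log prob of observation t given state k
    """
    K = len(log_init)
    # alpha[k] = log prob of the observations so far AND being in state k now.
    alpha = [log_init[k] + log_emit[0][k] for k in range(K)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(K)])
            + log_emit[t][k]
            for k in range(K)
        ]
    return log_sum_exp(alpha)  # marginal log likelihood of the observations
```

No calculus anywhere, as noted: the sum over exponentially many state paths collapses into this O(T·K²) pile of additions.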

We want people to think about Bayesian models generatively—generate the parameters from the priors, generate the data from the parameters. (Acyclic) directed graphical modeling, as in BUGS, usually forces you to do that (unless you start using the zeros trick in BUGS/JAGS). I’d find it even easier in BUGS if you declared the types. I like that we force people to declare data vs. parameters, which is implicit in BUGS and only determined at runtime. It does make some missing data problems harder and also makes it impossible to reuse the same program for different inferences over the same joint probability model.

Neither could I! I think the conceptual part’s important though. I’m talking about understanding the difference between probability mass and probability density and why the former is an integral over the latter. And about understanding how expectations are weighted averages over densities. And about how (Markov chain) Monte Carlo methods can solve integrals.
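A minimal illustration of that last point (plain Monte Carlo rather than MCMC, and my own toy example): the expectation E[X²] = ∫ x² p(x) dx under a standard normal is an integral, but it turns into a simple average over draws:

```python
import random

random.seed(1)

# E[f(X)] = ∫ f(x) p(x) dx is approximated by averaging f over draws from p:
# the integral literally becomes a mean.
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]
second_moment = sum(x * x for x in draws) / len(draws)
# second_moment is close to the true value, 1.0 (the variance of N(0, 1)).
```

MCMC works the same way at the end: the draws just come from a Markov chain instead of an independent sampler, and every posterior summary is such an average.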

I agree that it really comes down to getting better at teaching at least the basics of calculus and linear algebra. I found that doing stats was immensely helpful in that it provided some concrete motivation for learning calc and linear algebra. I was a pure math major as an undergrad who mainly concentrated on logic and set theory. I did Lebesgue integration and topology, so I was all set for measure theory, but I couldn’t remember the chain rule (building an autodiff system really helps drive home differential calc!). Similarly I did abstract algebra and Galois theory, but never learned about determinants (Ben laughed at the first code I wrote for Stan for multivariate densities as it just literally followed the textbook—I had no idea that you shouldn’t apply inverses in numerical linear algebra). I found thinking about covariance matrices really drove home rotations and scalings; then Jacobians for changes of variables really helped understand the role of determinants. So learning stats along with calc can help with understanding both.

Hah, yeah. I finished undergrad in 2011, at which point (at least in my experience), everyone suddenly woke out of their post-crisis stupor and said “oh shit, yeah go learn how to code and do math.” So since then I’ve spent a substantive portion of my ‘free time’ trying to self-teach these foundations. I haven’t done that badly I guess, but it often feels like a never ending grind. “Just one more chapter of this probability theory textbook and I’ll finally get it!” I try not to compare myself too much to people who did take those courses in their undergrad. There will always be someone who started a year earlier than me, or is smarter than I am etc.

JAGS is still very, very useful for the majority of folks in my field. The reason is that it is so much simpler to visualize and think through these problems as one conditional step after another. Having to work with the marginal in Stan is certainly more challenging. I’m mathematically inclined, but I still feel like I’m doing the doggie paddle in an ocean when it comes to integration for some of these models. I think you have to start with the conditional model in teaching and learning though, mostly because it more closely matches intuition. From there, Stan doesn’t free you from calculus. If anything, JAGS lulls you into thinking you can live without it, until you encounter a problem that breaks JAGS and you find yourself having to break out old coffee-stained theory notebooks to try to remember how to do a convolution and solve for the determinant of the Jacobian.

One downside to all this calculus: it gets hard to describe what I do for a living to the lay people I love. The old Dunkin’ Donuts commercial from the ’80s is a helpful metaphor: “Time to make the n-dimensional tori!”

https://www.youtube.com/watch?v=petqFm94osQ

Kaiser,

I don’t see why there shouldn’t be a division of labor in situations with a lot of unwieldy data that requires a lot of preparation.

Though I suppose the danger is that, over time, such a hard-skills data specialist may not possess the proper insight to adequately determine what data should and should not be included, what data should and should not be reduced or summarized, how it should best be recoded, etc. People who typically analyze high-level data often do not understand the value of including data at as low a level as possible for modeling and insight.

“Can we stop sucking at teaching calculus?” pretty much sums up the question, I think.

My own approach, when I started to really need high-level analysis ideas (such as to understand Lagrangian mechanics or whatever), was to throw out everything I learned in calculus classes and use my math-major knowledge to read up on nonstandard analysis and adopt that. It worked for me, but we’re talking about people who haven’t even taken Calc 1, so what worked for me is irrelevant.

The question is: would scientific consumers of statistics be better off if they took a semester of calculus based on nonstandard analysis, such as the approaches in Keisler or Henle as linked in my above comment: http://gelmanstatdev.wpengine.com/2017/04/24/stan-without-frontiers-bayes-without-tears/#comment-471847

It’s an empirical question and requires a big grant and a randomized controlled trial. ;-)

NatashaRostova, I’m in social science, too. If I could do undergrad over again, I’d definitely go through Cal II or III. But then again, I’d also take some computer science courses. I can only imagine how much more efficient my research would have been over the past few years had I had the benefit of those foundations.

Hmmm alright. Hopefully, I learn a lot of the concepts and can teach myself RStan or something along the way. Thanks!

Thank you for all of the recommendations—I really appreciate it!

Having come from only a social science training background, I’ve taught myself all the math I know. What I have noticed though, which I think is true, is that understanding the logic and algorithmic operations of integration is sufficient to use applied methods.

That is to say, I probably couldn’t solve any sort of complicated integration question if you posed it to me now and gave me a pen and paper. But I’d understand exactly what the question is asking and what the answer is doing.

So far, in my auto-didactic quest to learn Bayesian methods, this seems to be sufficient (as with everything quantitative though, being marginally better at math would only make life easier).

There was no collusion on our posts, despite the fact that we sat back to back and wrote them!

Every continuous statistical calculation is, by definition, an integral. Probability densities, for example, are not well-defined outside of an integral sign. In particular, _there are no Gaussians without calculus_!
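To make that concrete: the Gaussian is defined through an integral, because its normalizing constant is one, and every probability it assigns is another.

```latex
\int_{-\infty}^{\infty} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) dx
  = \sigma\sqrt{2\pi}
\quad\Longrightarrow\quad
p(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\qquad
\Pr(a \le X \le b) = \int_a^b p(x)\, dx .
```

Without the integral on the left there is no normalizing constant, and without the normalizing constant the curve isn’t a probability density at all.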

First and foremost, this means that if you don’t know calculus then you shouldn’t be developing or implementing your own statistical algorithms. Without knowing calculus you don’t have any idea which operations are well-posed and which are ill-posed, and hence will inevitably end up building something fragile and prone to error. If pushed, I’d even go so far as to say that you’d need basic measure theory, in particular its behavior in high-dimensional spaces, but let’s just stick to calculus for now.

Even assuming software like Stan handles all of the computations, you still have to have an idea of what you can calculate and what you can’t. In other words, what questions can you even ask in statistics? Then there’s the problem of understanding distributions, which serve as the atoms of generative modeling. Some of this can be pattern-matched by example or learned from proof-by-authority, but is that really understanding?

The only way to build up a solid foundation is to try to convey the conceptual basics: densities as objects to be integrated, and statistical queries as expectations over prior and posterior distributions or likelihoods. It’s not easy to identify the right decomposition of the concepts from the technicalities, of course, but there has been some progress towards this end. Ultimately, however, this is just a really sneaky way of _teaching calculus_!

So the question here shouldn’t be “can we teach statistics without calculus?”; the question should be “can we stop sucking at teaching calculus?”

In fact, it’s calculus all the way down. In Bayesian stats, everything’s a posterior expectation. Parameter estimate? Expectation of a parameter. Event probability? Expectation of an indicator function. Prediction for an unobserved quantity? Expectation of a posterior predictive quantity. What’s an expectation? An integral over a density.

More fundamentally, everything in continuous stats is an integral or derivative. A probability density function is just the derivative of a continuous cumulative distribution function.

You don’t need to learn to solve these integrals analytically—Stan will handle them numerically for you. But I think it helps to understand, at least at a Calculus I and Calculus II level, what it is you’re computing. And if you want to do MCMC, then probably Calc III, so that you know about sequences and series, because MCMC reduces calculating integrals to series.

P.S. I loved Gelman and Hill. Probably wouldn’t have ever wrapped my head around stats without the combination of BUGS and Gelman and Hill. I’m not saying you can’t start pre-calc, the same way people often start econ or physics pre-calc. But if you actually want to understand what you’re doing, calculus is going to rear its beautiful head.

I’d say it’s useful to learn the concept of a graphical model and its connection to generative modeling. Not so useful to learn BUGS itself. There are still some problems where it’s easier to use than Stan, like many missing data problems, and problems where you just can’t use Stan, like literally modeling missing count data. But often these programs where it’s easier to write the model in BUGS than in Stan won’t fit in BUGS.

If you must use something, at least move to JAGS, which is much more robust than WinBUGS and also portable to platforms other than Windows.

+1

I have been pretty sold on the philosophical advantages of Bayesian methods over classical methods for a while (thanks in large part to this blog), but my math background is not great. Statistical Rethinking was exactly the book I needed, and the YouTube lectures are a perfect complement to the text.

It’s even worse than what Andrew described. First, modeling has been reduced to a programming exercise. Second, people believe they have “all the data.” Third, “big data” makes it OK to ignore biases and data that are missing not at random.

At a recent talk, I showed a DATA -> OUTPUT -> ACTION framework, and observed that hard skills are more important in the first arrow and soft skills in the second (although both are needed throughout). An audience member asked: if he’s only interested in “data science,” can he focus on the first arrow and only hard skills?

Curious:

I understand your concerns. But, for better or worse, things are going in the opposite direction, with the increasing popularity of nonparametric methods, typically called machine learning. These methods are essentially impossible for most students to understand—they’re much more complicated than linear regressions or generalized linear models.

I will accept your point, to a point. What I observe in reality is a bit more complicated:

1. People with a moderate understanding of basic statistical modeling and some understanding of complex econometric functions are given automated ensemble methods with a bunch of defaults.

2. The problem I see with this is that there is very little ability to understand the outcomes of this process, and literally no understanding of how to determine whether it is a sensible model or where to make changes if it is not. It is assumed sensible by the mere fact that it produced parameter estimates.

3. I am not saying this has to be the case, but I find it to be the case far more often than the person who is highly skilled at a wide number of modeling methods, including distributional functions and data simulation, but who simply does not like to code (though clearly those types of people do exist).

The challenge is that you will likely spend time and effort learning calculus (and analysis and linear algebra and complex analysis) to get the xx% that you actually need.

This is probably not there yet (https://github.com/betanalpha/stan_intro/blob/master/stan_intro.pdf), but what was interesting was that it seemed to focus on just what you need for model-based Bayesian statistics.

Another potentially helpful idea:

In calculus, the two main concepts are derivative and integral.

In the “standard” approach, each one is a notation for a *process* of getting closer and closer to some quantity.

In the “nonstandard” approach, each one *is* a particular *object*, namely the ratio of two differences or a particular sum of numbers.

Nonstandard approaches invent new infinitesimally small numbers that let you take a “plug into a formula” approach, where standard calculus uses standard numbers but has to take a “there exists a sequence of things that gets ever closer to…” approach.

If you like to “plug in” a value to a formula, you are an algebraist ;-). If you like to think about how you could get ever closer to something by repeating something over and over again, you are an analyst.
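Here is the derivative done in the “plug in” style, as a one-line worked example of my own (st(·) denotes the standard part, the nearest real number): take f(x) = x² and an infinitesimal ε.

```latex
\frac{f(x+\varepsilon) - f(x)}{\varepsilon}
  = \frac{x^2 + 2x\varepsilon + \varepsilon^2 - x^2}{\varepsilon}
  = 2x + \varepsilon,
\qquad
f'(x) = \operatorname{st}(2x + \varepsilon) = 2x .
```

Pure algebra plus one final “round to the nearest real number” step; no limit process appears anywhere.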

There’s also this: https://www.amazon.com/Nonstandard-Analysis-Dover-Books-Mathematics/dp/0486432793/ref=sr_1_2?ie=UTF8&qid=1493057593&sr=8-2&keywords=nonstandard+analysis

But it is really more of an advanced thing. I love it personally, but I wouldn’t recommend it as a first calculus book.

Pepe:

That’s right, no need to learn Bugs.

A part of me really wants to recommend calculus based on nonstandard analysis. Downsides are that it’s not the usual way things are taught, upsides are that I think it really matches better with the kind of reasoning that is important for applied people.

There are two books I can think of and I haven’t used them to teach, but they’re out there and they’re free or cheap, and you could look through them and see if any of it helps:

Keisler’s text is available online:

https://www.math.wisc.edu/~keisler/calc.html

And this book is very cheap and comes in a Kindle version:

https://www.amazon.com/Infinitesimal-Calculus-Dover-Books-Mathematics/dp/0486428869

I’ve read the second one and I found it pretty reasonable as a way to introduce calculus ideas.

Note that these will appeal more to you if you are more of an “algebraist” than an “analyst” but since you don’t know any calculus you’ll probably need some explanation of those ideas.

from this: https://www.quora.com/Why-do-so-many-algebraists-hate-analysis

“There’s a crispness to the algebraic side of things that I miss when I venture into the analytic realms. In algebra, a property either holds, or it doesn’t; whereas in analysis, the ballgame is frequently about getting ε-close. And speaking only for myself, the bits and pieces of analysis which I’ve found most fun have this crisp algebraic character to them”

The nonstandard analysis approach essentially invents a whole bunch of numbers that lets someone who likes equals signs deal with the idea of “closeness”.

Now, Bayesian stats is a lot like that… dealing with a model in which you’d like for something to be equal, but you know that the best you can do is an approximate closeness that can’t be reduced to zero.

In that sense, I think nonstandard approaches to calculus kind of match what you need for Bayesian stats.

In the end, though, what you need is very particular to *your* needs. So check it out, but YMMV.

In terms of math, I don’t think the mere presence of equations is bad. What is harmful is when equations and mathematical logic substitute for understanding and intuition. I love applied statistics because you can’t succeed by pulling out a formula sheet and pressing the button. The math gives you a nominal standard, and it is the aberration from the norm that generates the insights.

clarification: “with the most probable outcome having **f(x) = 1**”

Specifically, to really GET the Bayesian modeling viewpoint, change your view of probability from “randomness” to “plausibility”.

In a regression problem y = a + b*x + error, if someone told you the “right” values for a and b, how plausible is it that y – a – b*x = error would be near zero? How about near 1? Near 300? Near -3?

Draw a curve that describes the relative plausibility of different outcomes, with the most probable outcome at 1 and anything less plausible below 1. Call this curve f(x).

Now, instead of determining the scale of f(x) by the fact that the maximum value = 1, re-normalize the whole thing so that the total plausibility = 1. Do this by dividing the whole curve by Z = integrate(f(x), x, -inf, inf). Call this curve f(x)/Z = p(x), the probability density.
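A numerical sketch of that recipe (my own toy code; a trapezoid sum on a wide grid stands in for integrate(f(x), x, -inf, inf)):

```python
import math

# Step 1: a relative-plausibility curve with maximum value 1.
def f(x):
    return math.exp(-0.5 * x * x)  # peaks at f(0) = 1, as in the comment

# Step 2: Z = integral of f over the real line, approximated by the
# trapezoid rule on a grid wide enough that the tails are negligible.
a, b, n = -10.0, 10.0, 20_000
h = (b - a) / n
grid = [a + i * h for i in range(n + 1)]
Z = h * (sum(f(x) for x in grid) - 0.5 * (f(a) + f(b)))

# Step 3: dividing by Z turns relative plausibility into a density.
def p(x):
    return f(x) / Z  # integrates to 1: a proper probability density
```

With this particular f, Z comes out near sqrt(2*pi) ≈ 2.5066, so p(x) is exactly the standard normal density; any other plausibility curve gets normalized the same way.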

There’s nothing in here about “how often” anything.

+1
