Just be glad for IEEE 754.

1) The SAS folks had to work very hard to get the exact same results on IBM mainframes, DEC VAX machines, and IEEE 754 microprocessors.

2) When we were doing SPEC benchmarks ~1989, we had to use fuzzy comparisons & algorithms whose iteration counts didn’t vary drastically or depend on the least significant bit. One benchmark dropped out because it gave noticeably different results for VAX, Motorola 68K machines and RISC micros (754).
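As a rough illustration of what a fuzzy comparison looks like (a minimal Python sketch, not the SPEC harness; the tolerance of 1e-9 is an assumption chosen for the example):

```python
import math

# Two quantities that are equal on paper but differ in the last bits of a float64.
a = 0.1 + 0.2
b = 0.3

# Exact comparison depends on the least significant bits.
exact_equal = (a == b)

# Tolerance-based comparison; rel_tol=1e-9 is an assumed, illustrative tolerance.
fuzzy_equal = math.isclose(a, b, rel_tol=1e-9)

print(exact_equal, fuzzy_equal)
```

The exact test fails while the tolerance-based one passes, which is why results that must match across different floating-point hardware are usually compared this way.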

You may recall regarding PL/I, “The product exhibits some lovely features but, unfortunately, the expression

25 + 1/3

yields 5.33333333333333”

https://web.archive.org/web/20200201104326/https://plg.uwaterloo.ca/~holt/papers/fatal_disease.html

And the long form of this quote: “There are only two hard things in computer science: cache invalidation and naming things and off-by-one errors.”

If someone is programming, they have to learn at least a little bit about these things, that’s just the way it is. I used to prefer interpreted languages, but I despise R so much I have come around to preferring strongly typed languages.

Type casting can have catastrophic consequences
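One way a cast can silently change a value, sketched in Python under the assumption that floats are IEEE 754 float64 (true for CPython on common platforms); the variable names are made up for the example:

```python
# 2**53 + 1 is the first positive integer that float64 cannot represent exactly.
big = 2**53 + 1

# Casting to float rounds to the nearest representable float64.
as_float = float(big)

# Casting back does not recover the original integer.
round_trip = int(as_float)
lost = big - round_trip

print(big, round_trip, lost)
```

No error or warning is raised; the value is simply different after the round trip.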

> integers aren’t closed under addition, making the expression “1 / 2” a thorny issue

pretty sure you meant aren’t closed under *division*, but yes.

I guess what I meant was that a typical “modeler” is thinking about real numbers, which the integers are a subset of. So if you already admit the entirety of the real line into your concept of “number”, then there’s no distinction about “the integers are special”. But the CPU makes this distinction. The distinction is artificial as far as a person working with real numbers is concerned. It doesn’t “help” them in any way mathematically. Computationally, on the other hand, it’s an entirely different set of instructions with different speed.

Obmention of Leopold Kronecker’s quote (or at least attributed quote): “God made the integers; all else is the work of man.”

The natural numbers (0, 1, …) and integers (…, -1, 0, 1, …) are very natural mathematically. So much so that there’s a commonly used mathematical notation (N and Z). Their algebra’s different from that of real numbers. For example, integers aren’t closed under addition, making the expression “1 / 2” a thorny issue. In C++, you get integer division, which truncates toward zero, and the result is 0; in R, if you evaluate “1 / 2” you get 0.5.
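To make the two behaviors of “1 / 2” concrete, here is a small Python sketch. It is an analogue rather than C++ or R code: Python’s `/` is real division (like R’s), its `//` rounds toward negative infinity, and `math.trunc` is used here to mimic C++’s rounding toward zero.

```python
import math

# Real (floating-point) division, as R gives for 1 / 2.
real_quotient = 1 / 2

# Python's integer division rounds toward negative infinity.
floor_quotient = 1 // 2

# Truncation toward zero, which is what C++ integer division does.
trunc_quotient = math.trunc(1 / 2)

# The two integer conventions disagree once the operands have mixed signs.
neg_floor = -1 // 2
neg_trunc = math.trunc(-1 / 2)

print(real_quotient, floor_quotient, trunc_quotient, neg_floor, neg_trunc)
```

For positive operands both integer conventions give 0, but for -1 and 2 flooring gives -1 while truncation gives 0, so even “integer division” is not a single well-defined operation across languages.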

But the real problem here isn’t what the ideal mathematician would do, but rather what a programmer in 2020 has to do on their computer to get code to work efficiently and transparently (and by transparently, I mean according to the IEEE 754 spec, which is where the float64 behavior is defined).

Note that this isn’t the R language or even the standard library, but rather the tidyverse. Base R does most of that via indexing for values and names() for names.

It’s baked into the CPU, but it’s mathematically completely artificial. Real numbers contain the integers as a subset. People want real numbers; the closest they get is float64.

Floating point absolute value for those who aren’t C++ or Stan coders. In retrospect, I see I made a mistake in just following C++ function naming conventions in Stan. I should’ve just gone with:

real abs(real)

int abs(int)

I still plan to deprecate `fabs` and go back to that. Part of the reason we didn’t do that was the C99/C++03 conflicts that have been better sorted in C++11 and our getting better at traits metaprogramming.

Instead, what we have in Stan now is

real fabs(real)

real fabs(int)

int abs(int)

But the middle one’s not necessary anywhere.

I hadn’t quite anticipated how hard users trained in R would find the distinction between integers and floating-point values, which is fundamentally baked into our CPUs.
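Python happens to have the same split, which makes for a quick illustration of the difference between a type-preserving `abs` and an always-real `fabs`. This is a Python analogue only, not Stan code, and the variable names are invented for the example:

```python
import math

# The built-in abs preserves the argument's type: int in, int out.
int_abs = abs(-3)

# ... and float in, float out.
real_abs = abs(-3.5)

# math.fabs always returns a float, even for an integer argument.
fabs_of_int = math.fabs(-3)

print(type(int_abs).__name__, type(real_abs).__name__, type(fabs_of_int).__name__)
```

A single overloaded `abs` covers both the integer and real cases, while `fabs` quietly promotes integers to floating point.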

D’oh. I think I was fooled by the tone, or at least my recollection of it, in which he stumbles down the stairs in shame. The show really nails that furtive, outsider feeling.

@mpledger: That’s Hungarian notation for R.

@jim: I should have clarified that I meant for writing programs, not interactive scripting in a REPL environment like R’s or Python’s.

Research code done for exploration is a bit different from that done for a submitted paper, but neither demands the kind of naming effort required for a code base with a couple dozen developers.

For reference, it might be easy to type this with autocomplete:

very_long_variable_name_one[my_first_long_index, my_second_long_index] = very_long_variable_name_one[my_first_long_index, my_middle_long_index] * variable_long_variable_two_name[my_middle_long_index, my_second_long_index];

but it’s hard to see the basic structure compared to:

a[i, j] = a[i, k] * b[k, j];

We have one survey dataset in SAS with somewhere upwards of 1,700 variables (it’s a merge of multiple surveys, each repeated at several points in time in a “wide” format). If we still had the 8-character variable-name restriction in SAS, I’m not sure what the heck we’d be using for names.

In R, I’ve tended to extend the convention that dataframes are called something.df. So a gam model is something.gam and an nlme model is something.nlme. Since R is all about objects, it’s nice to have the type of object in the name.

In surveys/questionnaires/medical charts, I was taught the convention of putting the question number at the start of the name. So, question 4 in section B, which asked about diabetes, would become B4_diab. That was back when SAS only allowed 8 letters in a variable name, but I think it is still useful even now that variable names can be longer.

If you use autocomplete and name “backwards” (specific to general):

subsubgroupname_subgroupname_groupname_supergroupname

autocomplete is your friend and length doesn’t matter that much.

multiply_lower_tri_self_transpose

covV

pd

typ

dcv

iir

rwQuant

fsQuant

Qual

covS

InclV

amb

Xamb

x

y

InclH

PRI

InclS

VeriCov

vp

expr

lastrow

res1V

res1H

res1

res2V

res2H

res2

resH

resV

stats

veriplot

whiteout

Although in fairness, the two most egregious ones (x and y) were chosen because they are the de facto standard variables in the literature for the calculations I was implementing.
