Should there be "active" whitespace?

spike · January 29, 2022, 3:10pm

Should whitespace, outside of a string-like thing, change the meaning of a program?

Here’s some code:

let f = x => x + 3;
let g = u => 2*u
let i = {
  let h = g
  (4 + f(2))
}

And here’s the corresponding JavaScript:

function f(x) {
  return x + 3 | 0;
}

function g(u) {
  return (u << 1);
}

var i = 9;

exports.f = f;
exports.g = g;
exports.i = i;

Now let’s edit that code to remove a little whitespace:

let f = x => x + 3;
let g = u => 2*u
let i = {
  let h = g  (4 + f(2))
}

The corresponding JS looks like this:

function f(x) {
  return x + 3 | 0;
}

function g(u) {
  return (u << 1);
}

var i;

exports.f = f;
exports.g = g;
exports.i = i;

Notice the difference in i: it is 9 in the first version and undefined in the second.

I find this peculiar and slightly annoying. It’s not just any whitespace that causes trouble – it’s a newline that does it. The extra blanks between “g” and its argument seem to have no effect. It’s part of why I loved the semicolon in ReasonML.

Is this really a deliberate design choice?

–John

johnj · January 29, 2022, 3:57pm

You can use the semicolon in ReScript. This will compile to what you want:

let f = x => x + 3
let g = u => 2 * u
let i = {
  let h = g; 4 + f(2)
}

Without a newline (or a semicolon) between g and (, the compiler sees it as a function application: g(4 + f(2)). The spaces between the g and ( are ignored.

AFAIK, it is a deliberate design choice to make ReScript sensitive to newline characters. It’s the tradeoff for making semicolons optional, since newlines separate expressions.
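A rough way to picture the rule described above is a splitter that cuts the body of a block into separate expressions. The sketch below is a toy model written in JavaScript, not the actual ReScript parser, and the function name splitTopLevel is made up for illustration. It assumes that a newline or semicolon outside parentheses ends an expression, that spaces never do, and that newlines inside parentheses are ignored. It deliberately skips strings, comments, and lines that end in an operator or "=", which the real parser handles differently.

```javascript
// Toy model, NOT the real ReScript parser: split the body of a { ... } block
// into top-level expressions using the rules described in this thread.
//   - a newline or ";" at parenthesis depth 0 ends the current expression;
//   - spaces never end an expression, so "g  (x)" stays one application;
//   - newlines inside parentheses are ignored, so "g(<newline>4)" stays one call.
// Simplifications (assumptions): strings, comments, and lines ending in an
// operator or "=" are not handled; the real parser treats those differently.
function splitTopLevel(body) {
  const parts = [];
  let current = "";
  let depth = 0; // how many "(" are currently open

  // Push the expression collected so far (if any) and start a new one.
  const flush = () => {
    if (current.trim() !== "") parts.push(current.trim());
    current = "";
  };

  for (const ch of body) {
    if (ch === "(") depth += 1;
    if (ch === ")") depth -= 1;

    if (depth === 0 && (ch === "\n" || ch === ";")) {
      flush(); // separator at top level: close the current expression
    } else {
      current += ch; // spaces and nested newlines are kept as-is
    }
  }
  flush();
  return parts;
}

// The student's block with the newline: returns ["let h = g", "(4 + f(2))"].
console.log(splitTopLevel("let h = g\n(4 + f(2))"));
// The same block with only spaces: returns ["let h = g  (4 + f(2))"].
console.log(splitTopLevel("let h = g  (4 + f(2))"));
```

In this model the semicolon version, let h = g; 4 + f(2), also yields two expressions, while let i = g( followed by a newline and 4) stays a single call, because that newline sits inside the parentheses.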

spike · January 29, 2022, 4:38pm

Thanks. I get most of what you said.

But “newlines separate expressions” seems a little ambiguous to me. In the example above, where g is an int => int function, is g(4) an expression? Is g an expression? Why is it OK to write

let i = g(
4)

for instance? Surely g( is not an expression, so “newlines separate expressions” can’t be strictly true.

Is there a BNF for ReScript somewhere?

BTW, while playing around with this question, I came to the horrible realization that let n = 3 + + + 4 is perfectly OK in ReScript, which seems to mean that there are both unary and binary versions of +, and of minus too, and even of -., so that you can write let f = 3. -. - -. 4. With enough work, you could probably write Morse code!
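The 3 + + + 4 behavior falls out naturally from a grammar in which an operator in prefix position (where an operand is expected) is read as unary, and any other operator is binary. Here is a toy recursive-descent evaluator in JavaScript that follows that assumption for integer + and - only. It is not ReScript's grammar, the name evaluate is made up for illustration, and the float operators such as -. are left out entirely.

```javascript
// Toy model, NOT ReScript's grammar: integer "+" and "-" only.
// Assumption: an operator in prefix position (where an operand is expected)
// is unary; an operator after a complete operand is binary.
function evaluate(src) {
  const tokens = src.match(/\d+|[+-]/g) ?? [];
  let pos = 0;

  // operand := ("+" | "-") operand | number
  function operand() {
    const tok = tokens[pos++];
    if (tok === "+") return +operand(); // unary plus
    if (tok === "-") return -operand(); // unary minus
    if (tok === undefined || !/^\d+$/.test(tok)) {
      throw new SyntaxError(`expected a number, got ${tok}`);
    }
    return Number(tok);
  }

  // expr := operand (("+" | "-") operand)*   (binary, left-associative)
  let value = operand();
  while (pos < tokens.length) {
    const op = tokens[pos++];
    if (op !== "+" && op !== "-") throw new SyntaxError(`unexpected ${op}`);
    const rhs = operand();
    value = op === "+" ? value + rhs : value - rhs;
  }
  return value;
}

console.log(evaluate("3 + + + 4")); // 3 + (+(+4)) = 7
console.log(evaluate("3 - - - 4")); // 3 - (-(-4)) = -1
```

Only the first operator after a complete operand is binary; every operator after that is still waiting for an operand, so it is parsed as a prefix (unary) operator.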

joakin · January 29, 2022, 7:42pm

Yes, g(4) is a function call expression.

Yes, g is an identifier expression.

When the parser sees an expression followed by optional spaces and (, it starts parsing a function call, and inside that call newlines are not significant.

For example, if you add a newline between the identifier and the parens:

let rec test2 = (n) => {
  test2
  (n)
}

the body of the function is two expressions: first the identifier expression test2, then the parenthesized expression (n). That is why the compiler complains about test2 not having type unit: an expression used as a statement must return unit.

If we only have spaces, then it parses as a single function call expression:

let rec test3 = (n) => {
  test3 (n)
}

https://rescript-lang.org/try?code=DYUwLgBATiDGFhAZ0gXggCgHYEoKoD4IBvAKAgWTA3Il1IF9TTRIZ5EUAmfTXfImQqcwXWthyNmraHEooAzLwkCStEUomMgA

It seems like a very deliberate design choice, in line with what many languages that don’t use semicolons as line terminators do.

tom-sherman · January 29, 2022, 9:46pm

Formatting this code makes it very clear what’s going on, which kind of makes this discussion a non-issue for me:

let f = x => x + 3
let g = u => 2 * u
let i = {
  let h = g(4 + f(2))
}

These kinds of syntax edge cases are why I would advocate for ReScript refusing to compile without first running the formatter, similar to Go’s linter, IIRC.

yawaramin · January 29, 2022, 11:05pm

Just FYI, both versions of your code snippet result in the following warning:

[W] Line 4, column 6:

unused variable h.

I strongly recommend setting your build to treat warnings as errors. It makes it very unlikely for these kinds of silly mistakes to slip through. You would have very quickly caught the fact that that code wasn’t doing what it was supposed to.

spike · January 30, 2022, 11:06am

I completely agree, yawaramin. But my students do make mistakes, and this is actually a condensed-to-minimal-form case of something that happened to one of them. Also, everyone seems to think that what’s wanted is the value of g(4 + f(2)); in the student example, what was wanted was to define h and then evaluate a second expression, which in this case I’ve written as (4 + f(2)). “But why would anyone write those parens???”, you might ask. Well, sometimes operator precedence is confusing, and students do things like this to make certain that things are computed in the order they intended. Anyone who’s programmed in OCaml, where every function has a single argument but currying happens all over the place, has surely suffered from trying to remember when and where parens are needed. This is the same phenomenon, just happening to a beginner in a language that seems “natural” to you because you’re used to a lot of others.

tom-sherman · January 30, 2022, 3:52pm

My point was more about formatting removing ambiguity and prompting an opportunity for learning. “Oh, my code has changed to a function application after I’ve run the formatter - something is wrong here as that wasn’t my intention”.

I generally disagree with the idea that significant whitespace is difficult for beginners, the chief counterexample being Python, of course: a language where whitespace is very significant, and one regarded by many as easy for beginners to learn because of its omission of syntax like semicolons. In any case, I think that teaching “spaces and newlines have different meanings” isn’t a massive leap.

Going back to your original post, I’m pretty sure it’s very deliberate. One of ReScript’s goals is to be easy for JS devs to learn, and this kind of syntax quirk would be expected by developers from that background.
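For comparison, here is a small sketch in plain JavaScript (not ReScript; the function names are made up for illustration) of how JavaScript’s automatic semicolon insertion treats the same shape of code. JavaScript is newline-sensitive too, but not always in the same direction: a newline before ( does not end a statement in JS, so the analogue of the student’s block is parsed as a call, while a newline right after return does end the statement.

```javascript
// Plain JavaScript for comparison (NOT ReScript). The missing semicolons are
// deliberate: this shows where JavaScript's automatic semicolon insertion
// (ASI) does and does not treat a newline as the end of a statement.
const f = (x) => x + 3;
const g = (u) => 2 * u;

function studentBlockInJs() {
  // No semicolon is inserted after "g", because a line starting with "("
  // can continue the previous expression. So this is h = g(4 + f(2)).
  const h = g
  (4 + f(2))
  return h;
}

function returnOnItsOwnLine() {
  // Here ASI does insert a semicolon right after "return",
  // so the function returns undefined and the 42 is never reached.
  return
  42
}

console.log(studentBlockInJs()); // g(4 + 5) = 18
console.log(returnOnItsOwnLine()); // undefined
```

So in JavaScript the student’s block would silently become a function application, whereas ReScript splits it at the newline; either way, the newline is doing real work.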

spike · January 30, 2022, 5:08pm

I got your point. Of course, if you want to prove anything about your language, it’s nice not to also have to do proofs about the possible outputs of an auto-formatter that may have many different parameters/options. Maybe that’s the kind of thing only crazy mathematicians worry about.

“Spaces and newlines are different” isn’t a massive leap. Indeed, it’s so small, and so rarely matters, that a student is likely to never grasp it … until it bites them in the butt. Python’s insistence on space-use makes it a constant chivvy, and you pick it up very fast.

Of course, if your experience teaching students these languages differs from mine, so be it.

kevanstannard · January 31, 2022, 2:16am

Hi @spike

This doesn’t address your concern about the fundamental language syntax - it’s a good discussion, but is it an option for your students to use the formatter as @tom-sherman mentioned? E.g. VSCode with Format on Save enabled?

It’s an essential tool for ReScript development, and I suspect all of the difficulties you’ve encountered are either automatically fixed, or highlighted as problems when formatting is applied.

spike · January 31, 2022, 6:06pm

Sure; they can and (mostly) do use the formatter. That doesn’t prevent them getting into trouble when something screwy happens and Visual Studio Code decides that the formatter can’t run any more, etc.

I woke up today wondering about the term “formatter”, since when it’s applied to my example program, it actually changes the semantics. That’s a richer (?) definition of “formatting” than I’m used to.

tsnobip · January 31, 2022, 6:35pm

Well, the opposite: it doesn’t change the semantics, it shows them more clearly!

spike · January 31, 2022, 7:07pm

Formatting my first example in VSCode led to this, i.e., extra spaces around the asterisk in the definition of g:

let f = x => x + 3
let g = u => 2 * u
let i = {
  let h = g
  4 + f(2)
}

I had mistakenly believed, from Tom Sherman’s comment, that it brought the 4 + f(2) up to the previous line, which changes the resulting JS code, hence my comment about semantics. I don’t really see how this alteration of let g makes it clearer, but one person’s clarity is another’s brick wall.

tom-sherman · January 31, 2022, 7:44pm

My formatted version was of the second example, where you removed the newline. The formatter will never change the meaning of your code.

Formatters are built specifically so we don’t have to have this debate.