[Beginner-Help] Typing & Confusing Errors

bhoot · November 22, 2022, 4:16pm

Is there any other ways around it in regards to the external to get past that?

Basically, what you discovered (typing return value as unit) and what @fham suggested (multiple bindings).

Though I personally would stick to the correct return type in the binding, and ignore the returned value if I am not using it. So, if cd does return string, then

// binding
external cd: string => string = "cd"

// usage
let _ = cd(dir)

DanDan · November 22, 2022, 4:37pm

Most languages I use (ruby, js, scala, erlang/elixir, lisp), all tend to return the last expression, but don’t expect that side effects consume the values of the intermediate expressions. even in ocaml, you’re looking at in to chain the local scope bindings here… It’s just that it’s a silly trip-up here that I wasn’t expecting because I didn’t see any notation in the documentation, specifically because I was out here rewriting typescript into rescript.

It’s a nice guarantee that we’re not using the values, but also it’s ugly as hell imo .

fham · November 22, 2022, 5:07pm

It is exactly the same as in OCaml.

If you literally translate your example, it would look like this:

let cloneInTemp (temp : string) =
  let _ = cd temp in
  exec ("git clone git@github.com:myorg/myrepo.git")

OCaml playground link

EDIT: Sorry, the OCaml playground link does not work for some reason, it leads to the vanilla playground

bhoot · November 22, 2022, 5:08pm

OCaml would force you to do the same thing. So, in

(* OCaml code *)
let cloneInTemp temp = 
  let x = cd temp in
  exec "git clone git@github.com:myorg/myrepo.git"

you are still binding the return value. The alternatives

(* OCaml code *)
let cloneInTemp temp = 
  let _ = cd temp in
  exec "git clone git@github.com:myorg/myrepo.git"

(* OCaml code *)
let cloneInTemp temp = 
  (* cd temp; *) (* The commented expression is incorrect: https://forum.rescript-lang.org/t/beginner-help-typing-confusing-errors/3962/10 *)
  cd temp |> ignore;
  exec "git clone git@github.com:myorg/myrepo.git"

still make you choose to ignore the return value.

It’s just that it’s a silly trip-up here that I wasn’t expecting because I didn’t see any notation in the documentation, specifically because I was out here rewriting typescript into rescript.

Agreed. The error looks cryptic without the pre-requisite knowledge.

It’s a nice guarantee that we’re not using the values, but also it’s ugly as hell imo

While ugly is subjective, all I can say is that you will get used to it. If you see it as a con, then it will look like a small trade-off against all the advantages of ReScript.

Ryan · November 22, 2022, 5:23pm

Just to note, cd temp; (with the semicolon) ~~won’t work~~ gives a warning in OCaml. ; is an expression separator, but your cd is returning string not unit. So it is trying to put a non-unit type expression on the left-hand side of a sequence, which won’t work.

You would still need to ignore it if you want to use the semicolon: cd temp |> ignore; or ignore @@ cd temp; (or like you do with let _ = blah in ...).

Edit: see below, but it gives a compiler warning, which can be treating as an error in certain situations.

bhoot · November 22, 2022, 5:26pm

True. My bad. I forgot the semantics.

Ryan · November 22, 2022, 5:44pm

Actually, you were correct, you can use the semicolon with a non-unit returning function like you show. It isn’t an error, rather a warning (warning 10), but dune, the build system, promotes warnings to errors in certain profiles.

Ryan · November 22, 2022, 7:03pm

Here’s where you are going wrong I think. cd(temp) and exec(...) are expressions, not statements.

Let me just write it like this so it’s clear:

let someValue = {
  expression1
  expression2
  expression3
}

In that let-binding, there is a scope delimited with {}. In that scope there is a sequence of three expressions, expression1, expression2, and expression3. And in a sequence of expressions, they are evaluated in order and the value of the final expression is returned (docs, in OCaml, but there if you need it). Note that the expression sequencing is implicit, meaning that you don’t use an expression separator to separate them as in OCaml, so you need the brackets to define the scope of the expression as a whole.

In ReScript there are certain rules you have to follow when sequencing multiple expressions like that. Any expression on the left hand side of a sequence must evaluate to unit. That means in the above example, expression1, and expression2 must have a value of unit, whereas expression3 can be any value. Note that the value of the expression sequence is the last expression in the sequence (thus that mention in the docs about the value of the last expression is implicitly returned in a block/scope).

I actually don’t think it’s a silly trip up on your part…unless you’re coming from OCaml, I don’t think it’s obvious those rules about sequencing. And unless you dig through ReScript compiler github issues, even if you know how expression sequencing works in OCaml, you may not realize that ReScript is more strict with what it allows.

I also couldn’t find this in the docs anywhere, but here are some quotes/links from ReScript compiler github issues:

In general the typing rules are more strict expr1; expr2, the expr1 is expected to have type unit

(link)

in recent releases we already do a strict check over sequences of expressions: e0 ; e1, it will assume e0 to have type unit

(link)

In OCaml, if the value of an expression on the left-hand side of a sequence is not unit, it will be a warning (warning 10), and as far as I can tell, this was the case in older versions of Bucklescript too (see here). However, now in ReScript, it is always an error (link).

So back to your example.

You have a let binding for a functional value:

In that let-binding there is a sequence of two expressions, cd(temp) and exec(...). We now from the ReScript rules shown above that left-hand side expressions in a sequence must return unit. However, the cd function returns string. Thus the error: Somewhere wanted: unit.

glennsl · November 22, 2022, 7:49pm

Just to add that the rationale for this is that if the return value is not unit you’re throwing away a value. Throwing away the return value of pure (side-effect-free) functions is always an error, because why would you evaluate a pure function if you’re not going to use its result? And almost all of your code should be pure, because throwing side-effects around left and right is bad juju. It makes your code hard to reason about, and therefore prone to bugs.

Also for that reason, functions that (primarily) perform side-effects should return unit so that you can easily distinguish them from pure functions, and get the benefit of this warning if you do make this kind of mistake, which we all do now and then.

This doesn’t make it any easier to use JavsScript functions that have been created in a very different environment, but it does make perfect sense internally in the ReScript environment.

DZakh · November 22, 2022, 8:19pm

I agree, that the feature is nice, but would be nice to have a more guiding error message.

mouton · November 23, 2022, 5:38am

glenn how do you mean effectful functions should return unit… How about network requests, async, etc?

glennsl · November 23, 2022, 9:50am

Sorry, I could have phrased that better. What I mean is just that functions whose main output is a side-effect should avoid also returning something that can usually be ignored, for convenience typically. That way, throwing away the output isn’t just a chore, but something meaningful that should be communicated in the code. For example, if it’s a function that can fail, and returns a result to indicate that, explicitly throwing away the result communicates that this call is thought to never fail.

For functions whose main purpose is to retrieve some value through a side-effect, like network requests and other async functions, they should typically return a promise to indicate that. But there are exceptions to this too of course, like functions that generate random numbers that are returned immediately. Either way, in the context of this discussion these are all functions whose return values should not just be thrown away.

cristianoc · November 23, 2022, 1:06pm

There’s a natural insertion point for such a custom error message.
A “statement” is an expression either on the lhs of a sequence, or the body of a for/while loop.
The insertion point is where this is unified with unit.

What’s less clear, is how give an error message phrased is a way that is helpful to someone new to the language. What phrasing would have helped?

DZakh · November 23, 2022, 1:12pm

I suck at making error messages myself . But I remember one tweet. I think the recommendations are good and applicable to our case as well.

I think we should describe what caused the error and also provide some code examples of how to make the code work.

hoichi · November 23, 2022, 3:45pm

How verbose do we want to go? Off the top of my head:

A value of type string is being ignored. If you mean it, please use cd(temp)->ignore to make it explicit

(I’ guessing here ignore is more idiomatic than let _).

Not sure if this is going to be enough for all the newcomers though. So maybe ideally, we need actual error codes, like TypeScript, or short links like those included in (some of?) React error messages. But that’s beyond the scope of this thread, of course.

EDIT: Grammar

DZakh · November 23, 2022, 5:50pm

What do you think?

 9 │ 
10 │ let cloneInTemp = (temp: string): string => { 
11 │   **cd(temp)**
12 │   exec("git clone git@github.com:myorg/myrepo.git")
13 │ }

The return value of the highlighted expression is not used.
Assign the value to a variable, or pass it to the `ignore` function to silence the error.
Example:
let variable = expression
expression->ignore

woeps · November 23, 2022, 10:19pm

Since this thread already started a wider discussion, I wanted to state:

I always prefer to use the pattern let _nameOfUnusedValue = <domeSomethingWhichValueWillBeIgnored> over using ignore.
On several occasions, we needed to debug for some time just to find out, an ignore statement (often used by more inexperienced colleagues) “swallowed” an unintentionally partially applied function. Either due to an incorrect usage of the function in the first place or missing adaptions after refactoring.
Having the expression written without ignore at least simplifies the verification to just hover over it in the editor.

Worth noting: It’s always good to think about why you would intenionally ignore a value and what this should mean to your application logic. E.g. are you missing some error handling? Are you having the right assumptions?
In many cases this error just points to the fact that your logic is incomplete.

In this threads example cd eventually could fail. To make the code more resiliant, the returned string should be evaluated if it represents success/failure and e.g. either throw an exception or return unit.

To summarize: I think this is an important error to report, although I’m not quite sure on how to pack all the possible implications into an easy to understand error message for everybody.

glennsl · November 23, 2022, 10:47pm

I don’t know if ignore is idiomatic, but I certainly don’t think it should be. At the very least I would prefer the suggestion of let _ = ... over ignore, since that can more easily be annotated with a type to prevent accidental partial application.

But mostly I just think bad APIs ought to be fixed (looking at you Promise )

DanDan · November 29, 2022, 3:00pm

I actually think that’d be helpful here. The issue for me, specifically was “somewhere wanted unit”, but the somewhere wasn’t defined, which makes me want to debug in other areas.

I think you’re correct. The actual code is wrapped in a try block to catch the JS errors that may pop from using shelljs (the library containing cd). It is incomplete logic! The thing with js interop for a beginner here is the js method cd returns a cwd, the method will return a string and not a null nor undefined value, but we’re saying, in this case, the method is expected to return unit, but when in other contexts, the same method’s external signature may be string depending on the needs of the file, which I think is a bit of trouble to navigate when you’re interacting with libraries. Either you put a much larger interface over the js interop, or you have wonky bindings.

One other thing that I might be interested in (at least to the above), isn’t this how the semicolon would work in ocaml? I mean, practically it just indicates to discard the left value as a side effect (which, in essence, is how the expressions work in my example). Now, that’d be confusing as hell for a js dev to sometimes use semicolons, but just another way here if I remember correctly.

woeps · November 30, 2022, 10:48pm

You’re right. There are roughly 4 categories of bindings / libs:

“zero-cost” bindings (no wrapper, just externals), which try to stick as close to the actual js type representation as possible. (like cd returning a string) Users of such libs/bindings need to do their homework on how to safely interact with such
“zero-cost” bindings (no wrapper, just externals), but with fancy type trickery to improve type-safety
“thin wrapping” libs, which try to stick close to the js variant, but introduce some (minimal) additional code, to make interactions type-safe, where necessary (often still providing access to just the bindings (externals)
complete abstractions of underlying libs, which might introduce their own api surface

Since performance and bundle-size seem a common concern in the rescript community many (if not most?) libs try to take the first or second approach.

I find myself using approach 3 most of the time. - I don’t care (and don’t need to) for a minimal perf impact, if I can get every day life improvements out of it.

If you’d try to achieve your task in pure js, you still would need to verify the result of cd, if you care about correctness and defensive programming. The only difference in rescript is, that it becomes more obvious if you don’t.
Therefore, I would include the validation of the result in the binding-lib, if I’d have to write them myself. Probably providing an interface like let cd: string -> Belt.Result.t(unit, string) or similar.

I guess you already knew, but just wrapping the whole block with a try statement won’t help in certain cases: e.g. cd fails because dir doesn’t exist - due to no validation, program continues - next operation gets executed (maybe some file operations? e.g. create some files?) - now some invariants of your program are violated, but you wouldn’t get an exception

Regarding the ocaml syntax. I’m absolutely not well versed at ocaml and therefore not comfortable answering your question. But I believe, semicolons are not ment to just disregard (ignore) any expression.