Automatic module inference when piping typed data?

Since the language manual says:

Pipes are used to emulate object-oriented programming. For example, myStudent.getName in other languages like Java would be myStudent->getName in ReScript (equivalent to getName(myStudent)). This allows us to have the readability of OOP without the downside of dragging in a huge class system just to call a function on a piece of data.

Small example taken from here:

module Articles = {
  let root: (~offset: int=?, unit) => string = (~offset=0, ()) => {
    let offset = offset->Belt.Int.toString
    offset
  }
}

It would beautify a lot of code if we were able to remove inline module references like Belt.Int. and have this instead:

module Articles = {
  let root: (~offset: int=?, unit) => string = (~offset=0, ()) => {
    let offset = offset->toString
    offset
  }
}

Could the compiler not figure out that, since offset is declared as an int, calling ->toString on it should naturally use the Belt.Int.toString function?

In line with the quoted spirit of the language manual.

You can do open Belt.Int above offset->toString to clean up the actual code.

The compiler needs to be pointed in the right direction, though: things need to be in scope for it to infer properly. And that’s what open does: it brings a module into scope.
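For instance, applying a local open to the earlier example (a minimal sketch; the surrounding function is adapted from the snippet above):

```rescript
module Articles = {
  let root = (~offset=0, ()) => {
    // Local open: Belt.Int's contents are in scope only inside this block,
    // so toString resolves without a module prefix
    open Belt.Int
    offset->toString
  }
}
```

A file-level `open Belt.Int` would work too, but the local open keeps the wider scope uncluttered.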


Yeah, I know about open. My question is more about whether it is possible to automatically bring modules into scope, either based on the type of the variable, as mentioned, or based on a naming convention:

roomStore->RoomStore.create(room)
// could turn into:
roomStore->create(room)
// since the variable is named the same as the module

My 2c: I do agree with you that code can end up looking quite cluttered, and that it can be annoying to have to open multiple things. However, I think the trade-off is good in this case. I personally think it’d be quite confusing to automatically bring things into scope, and that the explicitness is preferable here compared to the alternatives.

My layman’s opinion is also that it’d be really hard to build something robust for that, beyond the most trivial cases. Inferring module names from variable names is also not desirable imo; it’d tempt developers into naming variables according to what type they are, rather than what function they have in the code. And it’d only work for the first binding/a single binding.

Worth noting also is that I’m guessing that one of the reasons the compiler can stay fast is that it doesn’t do potentially expensive lookup on modules not explicitly brought into scope. Imagine with a solution of “look for a module with a function called X that operates on the type Y” - the compiler would need to do that lookup for any function call that isn’t explicitly annotated with a module, and it’d potentially need to walk through all defined top level modules in the project. That most likely won’t scale that well.

I think the current autocomplete behavior for pipes is along the lines of what’s desirable - a heuristic for the typename t itself.


This kind of inference is called “modular implicits” and it’s currently being researched for the OCaml compiler. My understanding is that it’s a lot more complicated than it looks, though (for example, multiple modules could have the same types and functions), and it’s several years away from being added to the language. It’s an interesting area of research, but I wouldn’t hold my breath on ReScript adding it any time soon.

I personally don’t think explicitness is a bad thing; it probably makes the code more readable to see exactly what module is being used. If you’re worried about verbosity, I usually alias modules with shorter names:

module I = Belt.Int

let offset = offset->I.toString

Inference doesn’t work that way. Check Why doesn't the compiler infer the types for Array.map? - #6 by yawaramin

Based on that, the annotation doesn’t actually say ‘this is declared as an int’, it says ‘infer the type yourself and if the inferred type doesn’t equal int, then throw a type error’.

The compiler is actually doing very little work, as zth pointed out. Once it has all the information it needs, e.g. which modules the names are coming from, it just runs the type inference algorithm and figures everything out.
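A tiny illustration of that point (the bindings here are made up for the example):

```rescript
let offset = 5 // inferred as int, with no annotation at all
let s = Belt.Int.toString(offset) // ok: the module path says exactly which toString this is

// An annotation is a constraint to check the inference against, not a declaration:
// let bad: string = offset // type error: the inferred type int does not match string
```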


the annotation doesn’t actually say ‘this is declared as an int’, it says ‘infer the type yourself and if the inferred type doesn’t equal int, then throw a type error’.

Yes, but the compiler still knows that it was declared as an int, because it independently inferred it, checked, and didn’t throw a type error. So at the time offset is later referenced, the compiler would have this knowledge. Right?

Could it do all that? Perhaps. But it would likely take massive changes to the internal design of the compiler. I don’t really see a big appetite for that 🙂 If you look at the history of the ReScript compiler, it has been a series of small incremental changes to get better and better JavaScript output, while sticking to the philosophy that explicit is better than implicit, and fast compilation is a core requirement. Can a lot of things be done to ReScript to make it more like TypeScript? Sure, and people ask about that from time to time. But ReScript has its own design philosophy, so I would try to refocus on building and shipping cool stuff with it, rather than posing a bunch of hypotheticals 🙂

I’m not posing a hypothetical. I am trying to understand how the type inference actually works…

The explanation you gave didn’t make sense to me, because the result should logically be the same whether annotations inform the compiler or are simply a point of reference against which the compiler verifies its own inference.

That’s a hypothetical.

As mentioned before, the compiler doesn’t track which functions are available in all modules that fit the expected input and output types during type inference. That would introduce a massive amount of implicit search. And a lot of ambiguity if multiple modules had functions of the given types available. Which one should the compiler pick? It avoids all that complexity and just asks the developer to explicitly pick the exact function. This also makes the code a lot more readable because you always know where a function is coming from.
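To make the ambiguity concrete, here are two hypothetical modules (not from any real library) that both expose a toString over the same type:

```rescript
module Meters = {
  let toString = m => Belt.Float.toString(m) ++ "m"
}
module Seconds = {
  let toString = s => Belt.Float.toString(s) ++ "s"
}

let duration = 1.5
// If the compiler searched every module for a float => string function
// named toString, both candidates below would match, and there would be
// no principled way to pick one. Explicit paths remove the ambiguity:
let a = duration->Meters.toString // "1.5m"
let b = duration->Seconds.toString // "1.5s"
```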

You can check how slow the Scala compiler is. Part of that is for exactly this reason: it has to search for implicit conversions in the environment.

That’s a hypothetical.

Well, sure. Though I don’t know what the actual capabilities or limitations of the compiler are (as you may). What I meant by saying I’m not posing a hypothetical is that I’m not asking these questions just for the sake of “posing a bunch of hypotheticals”, as I got the impression you implied. I am asking because I am evaluating the capabilities of ReScript, to see if I want to use it for “building and shipping cool stuff with it”, which in my case means a big multi-year project that is intended to go into production. (I am curious about these things because I happen to care a great deal about the readability of code, especially as it scales to hundreds of lines.)

the compiler doesn’t track which functions are available in all modules that fit the expected input and output types during type inference. That would introduce a massive amount of implicit search. And a lot of ambiguity if multiple modules had functions of the given types available. Which one should the compiler pick?

I imagine the compiler could keep an index and do a lookup in O(1) time, so it wouldn’t involve a massive search during parsing… On name conflicts, it could give a compile-time error, so the programmer could include just enough code to disambiguate?

But keeping and updating an index is not exactly free, right? Have you ever over-indexed a database table and slowed down a query? Same problem.

It also massively complicates type inference and checking when you can’t just assume the final type of an expression is exactly the same as its apparent type when inference begins, and instead you have to constantly do bookkeeping about what the appropriate type might be depending on what functions are being called on it.

I understand you’re trying to evaluate ReScript for a real-world project but I honestly think talking about large internal changes to the compiler is not a useful way to do it. My advice would be to try out an actual, low-risk project with it and then do an analysis of how it went. Plenty of people have done it that way, and I think it’s been a useful method.
