[Proposal] Untagged Variants for ReScript

I really like this, it definitely solves a pain point for bindings.

One note: using
Symbol.toStringTag may end up being easier to extend than instanceof and/or typeof, (for example DOM prototype elements implement this according to MDN) in addition to being a more “modern” API.

Any idea about performance implications? This could turn out useful for half complex values, but perhaps not for the current values, and complex ones. E.g. it can’t be used to determine the shape of objects.

I didn’t before you asked but it looks like a simple case is quite a bit slower (~96% on my iPad) unfortunately

Perhaps not all that surprising if v8 actually has to perform the function call as it can be overridden in user code.

1 Like

Not frequently, but it happened once to me when I’m trying to port my MessagePack encoder into ReScript.

I expect the same thing to happen when I do something similar like CBOR or PackStream.

Or it might be useful when using class-based polymorphism, such as classifying the Error instance.

My first impression is that this seems like an overly complex feature with too many edge cases that doesn’t align very well with rescript’s existing type system. And it might be a good idea to first introduce it in a more limited form, with simpler and clearer boundaries. Such as just covering a limited set of primitives at first. It looks it could be quite useful for interop though, especially by making nullable types more intuitive, and generalizing that concept forms a natural path of progression for learning.

Without considering technical feasibility too much, I think my ideal design would be something along the lines of this:

// structural type definition
type t<'a> = [
  // keywords
  | null
  | undefined
  | unknown

  // literals
  | 42
  | "foo"

  // primitive types
  | #:string
  | #:number
  | #:bool

  // regular tagged union constructors
  | #ok('a) // compiles to { kind: "ok", ... }

  // possible extensions
  | #:instance<Error: Error.t> // will match `instanceof Error` and refine as `Error.t`
]
// pattern matching
switch x {
  // keywords
  | null => ...
  | undefined => ...
  | unknown => ... // refines `x` to built-in `unknown` type

  // literals
  | 42 => ... // this would match exact literal constructors, as well as `#:number`
  | "foo" => ... // this would match exact literal constructors, as well as `#:string`

  // primitive types
  | #:string => ... // `x` here might be refined to be of type string to be used directly
  | #:number as n => ... // or a new binding `n` of type `float` could be introduced using `as`
  | #:bool => ...

  // regular tagged union constructors
  | #ok("foo") => ... // nested pattern matching on contained values

  // possible extensions
  | #:instance<Error> as error => ... // `error` refined to `Error.t`
}

One of the design goals has been to not introduce new syntactic constructs.
The proposal used to add a single character to the language. Now it adds zero.

1 Like

I already miss the feature. It will make working with polymorphic values more convenient.

Right now I need to do this thing:

Which is very inconvenient to do pattern matching on…

4 Likes

Now for an advanced example of what one can do with this:

Unmarshalling Binary Data and Computing the Sum of an OCaml List

In this post, we’ll explore how to read binary marshalled data produced by an OCaml compiler and perform high-level list operations on it. Here’s a step-by-step explanation of the code:

First, we have a native OCaml program that creates a list of integers, marshals it to a string, and saves that string into aaa.marshal:

let foo v =
  let s = Marshal.to_string v [Compat_32] in
  let ch = open_out_bin "aaa.marshal" in
  let () = output_string ch s in
  close_out ch

foo [1;2;3;4;5]

Next, we read the marshalled file, unmarshal it (using jsoo), and pass it to the sum function. The sum function operates at a high level on lists, but its definition uses the runtime representation that corresponds to OCaml’s runtime representation for lists:

let s = caml_read_file_content("./aaa.marshal")

@unboxed
type rec myList<'a> = | @as(0) Empty | Cons((unknown, int, myList<'a>))

let v: myList<int> = unmarshal(s)

let rec sum = l =>
  switch l {
  | Empty => 0
  | Cons((_, i, l)) => i + sum(l)
  }

Js.log2("v", v)
Js.log2("sum:", sum(v))

To see what the runtime representation looks like, the first log gives:

v [ 0, 1, [ 0, 2, [ 0, 3, [Array] ] ] ]

v is a nested array where the first element is always 0, the second element is the integer, and the third element is the next item in the list.

The sum function walks the runtime representation directly and gives the correct sum 15.

This approach demonstrates a neat technique for working with OCaml’s runtime representation to manipulate lists while maintaining high-level abstractions.

2 Likes

I’m not familiar with the syntax near Cons. Specifically what can go in the unboxed choices? That looks like a very elegant solution to your specific problem.

I experimented with solutions for this feature and tried to figure out how much could be accomplished without changes to the compiler. See this code for unions. Maybe this approach will be useful to someone right now or can help generate ideas when there is time to consider syntax additions. Here’s what I learned:

I had success defining each variant option as a module type, like this. I wasn’t sure what to call this because it really is like any type with a couple useful additions. (1) a guard function, and (2) custom equality.

module type Pattern = {
  type t
  let isTypeOf: unknown => bool
  let equals: (t, t) => bool
}

Define the union like this, where each case is a module name that implements Pattern.

type response =
| NegativeOne
| Null
| Success
| Failure
| Date
| False

Pattern matching would look like usual but the module name is used in place of the constructor.

switch s {
    | Date(d) => ...
    | Success(s) => ...
    | Failure(f) => ...
    ...
}

Using this kind of definition the module is like a type-class instance and it becomes possible to:

  • Discriminate on any programmable criteria like typeof, instance of, type tags for unions defined differently than ReScript, or shape detection, JSON parsing, etc. All is implemented through the isTypeOf function. Validation could be included like ensuring that a Date is not just instanceof Date but is also !isNaN. This could take some of the discrimination logic out of the compiler and into user-land.
  • Safely assign values to the union. In the example above, any Success.t, Failure.t, False.t, etc. can be safely assigned.
  • When you have an unknown value from an external API, you can still safely try to assign it to the union. Each isTypeOf guard is attempted and the result is Some(response) if something works or None otherwise. The make function in my code does exactly that.
  • Equality can be optimized to first ensure the types match - meaning isTypeOf succeeds for both and then use the provided equals function.

Literals, like True or null or "yes" or -1 can be built manually and in a completely type-safe way - see functors for literal types - but compiler support would definitely make it less cumbersome.

My code for this all works, but because there is no compiler support there are a few disadvantages:

  • Pattern matching relies on a match function with cases for A, B, and C rather than the friendlier syntax. You could improve this by wrapping or extending my code.
  • Literals and functors are a bit cumbersome. The syntax @glennsl showed for keywords and literals looks really nice.
  • No genType support
  • Assignment to the union requires calling a fromA or fromB or fromC or Obj.magic rather than the = operator.
1 Like

The specification of the restrictions is in the descriptions of the PR:

This proposal worked out quite well. The next big release of ReScript is going to be great!

7 Likes