[Proposal] Untagged Variants for ReScript

Example:

Here’s part of the implementation of Js.Json in js_json.ml:

(** Efficient JSON encoding using JavaScript API *)

type t

type _ kind =
  | String : Js_string.t kind
  | Number : float kind
  | Object : t Js_dict.t kind
  | Array : t array kind
  | Boolean : bool kind
  | Null : Js_types.null_val kind


type tagged_t =
  | JSONFalse
  | JSONTrue
  | JSONNull
  | JSONString of string
  | JSONNumber of float
  | JSONObject of t Js_dict.t
  | JSONArray of t array

let classify  (x : t) : tagged_t =
  let ty = Js.typeof x in
  if ty = "string" then
    JSONString (Obj.magic x)
  else if ty = "number" then
    JSONNumber (Obj.magic x )
  else if ty = "boolean" then
    if (Obj.magic x) = true then JSONTrue
    else JSONFalse
  else if (Obj.magic x) == Js.null then
    JSONNull
  else if Js_array2.isArray x  then
    JSONArray (Obj.magic x)
  else
    JSONObject (Obj.magic x)

and here is how it looks instead:

  @unboxed
  type rec t =
    | @as(false) False
    | @as(true) True
    | @as(null) Null
    | String(string)
    | Number(float)
    | Object(Js.Dict.t<t>)
    | Array(array<t>)

  type tagged_t =
    | JSONFalse
    | JSONTrue
    | JSONNull
    | JSONString(string)
    | JSONNumber(float)
    | JSONObject(Js.Dict.t<t>)
    | JSONArray(array<t>)

  let classify = (x: t) =>
    switch x {
    | False => JSONFalse
    | True => JSONTrue
    | Null => JSONNull
    | String(s) => JSONString(s)
    | Number(n) => JSONNumber(n)
    | Object(o) => JSONObject(o)
    | Array(a) => JSONArray(a)
    }
4 Likes

No aux functions, no explicit typeof, no Obj.magic.

One thing this enables that I really don’t think should be overlooked is seamless handling of null/undefined for external values. That’s quite clunky today. We have @return(nullable), but that works in limited scenarios. We have Js.Nullable, but that requires explicit conversion, which kills the pattern matching flow.

With untagged variants, leveraging pattern matching for values that can be T | null or T | null | undefined can be a first class language feature:

@unboxed type nullable<'a> = | @as(null) Null | Present('a)

switch someNullable {
  | Present({age: Null}) => Console.log("Ooops, no age!")
  | _ => ()
}

// Or including the undefined case.
@unboxed type nullish<'a> = | @as(null) Null | @as(undefined) Undefined | Present('a)

This will drastically simplify many interop scenarios. And we can extend the helpers in Core to make using nullable and nullish (names not decided of course…) a developer experience close to what Option gives today.

5 Likes

Does the discrimination happen by tagged_t? Imagine an untagged union of Date.t | { year: int, month: int}. Or an authentication function returns Error.t | { userId: string, emailValidated: bool, email: string }. Or a 2-element vs. 3-element array. How do these get pattern matched? If the discriminator function were user-definable, it could perform shape discrimination using any variety of techniques and lightweight validation.

A bunch of JavaScript Array functions return an index of -1 for invalid and >=0 otherwise. And so we have flavors of these functions like findIndex and findIndexOption. I wonder if untagged variants could clean this up. Under the covers they could be zero-cost and return just an int but they could be wrapped with something like this below. It probably doesn’t quite work but I thought I’d throw it out there since it seems similar to the nullable example @zth mentioned. We could have an abstract arrayIndex type that compiles to an int, but we won’t be able to use the friendly ReScript pattern matching syntax. F# Active Patterns are a good solution for read-only situations.

@unboxed
type ArrayIndex =
| Invalid(@as(-1) int, i => i < 0)
| Valid(int, i => i >= 0)

@jmagaram in the prototype implementation of untagged variants I’ve tried to stay minimalistic. Just what’s described in this proposal (but made precise) and what’s required to support the examples discussed.
Even adding that brings a certain level of complexity. First at high level, but more importantly in the implementation.

The implementation of pattern matching is not the cleanest code. I should say “codes” as it happens in 2 distinct places in the compiler.
I had to work around some nasty bugs just to introduce what’s described here.

If one wants to expand the proposal further, some cleanup of the pattern matching implementation is in order. And we don’t really have resources for that. Plenty of other things going on.

For posterity:

Screenshot 2023-04-03 at 06.17.41


1 Like

I’ve been waiting for an unboxed multiple variant constructor for a long time, and this would be a great feature. It seems like an easy, type-safe, and nicely interoperable syntax.

4 Likes

It will be nicer if it could be extended to more specific instanceof tests as well as typeof tests for a few limited primitive types.

@unboxed
type rec t =
   | @as(false) False
   | @as(true) True
   | @as(null) null
   | String (string)
   | Number(float)
   | Object(Js.Dict.t<t>)
   | Array(array<t>)
   // extended types
   | @as(Date) Date
   | @as(Uint8Array) Binary
1 Like

This could possibly be arranged

// extended types
   | Date(Date.t)
   | Binary(Uint8Array.t)

Does this come up frequently in bindings requiring just one possible case of many being a date/binary?
Curious to see a link to an example out there.

I really like this, it definitely solves a pain point for bindings.

One note: using
Symbol.toStringTag may end up being easier to extend than instanceof and/or typeof, (for example DOM prototype elements implement this according to MDN) in addition to being a more “modern” API.

Any idea about performance implications? This could turn out useful for half complex values, but perhaps not for the current values, and complex ones. E.g. it can’t be used to determine the shape of objects.

I didn’t before you asked but it looks like a simple case is quite a bit slower (~96% on my iPad) unfortunately

Perhaps not all that surprising if v8 actually has to perform the function call as it can be overridden in user code.

1 Like

Not frequently, but it happened once to me when I’m trying to port my MessagePack encoder into ReScript.

I expect the same thing to happen when I do something similar like CBOR or PackStream.

Or it might be useful when using class-based polymorphism, such as classifying the Error instance.

My first impression is that this seems like an overly complex feature with too many edge cases that doesn’t align very well with rescript’s existing type system. And it might be a good idea to first introduce it in a more limited form, with simpler and clearer boundaries. Such as just covering a limited set of primitives at first. It looks it could be quite useful for interop though, especially by making nullable types more intuitive, and generalizing that concept forms a natural path of progression for learning.

Without considering technical feasibility too much, I think my ideal design would be something along the lines of this:

// structural type definition
type t<'a> = [
  // keywords
  | null
  | undefined
  | unknown

  // literals
  | 42
  | "foo"

  // primitive types
  | #:string
  | #:number
  | #:bool

  // regular tagged union constructors
  | #ok('a) // compiles to { kind: "ok", ... }

  // possible extensions
  | #:instance<Error: Error.t> // will match `instanceof Error` and refine as `Error.t`
]
// pattern matching
switch x {
  // keywords
  | null => ...
  | undefined => ...
  | unknown => ... // refines `x` to built-in `unknown` type

  // literals
  | 42 => ... // this would match exact literal constructors, as well as `#:number`
  | "foo" => ... // this would match exact literal constructors, as well as `#:string`

  // primitive types
  | #:string => ... // `x` here might be refined to be of type string to be used directly
  | #:number as n => ... // or a new binding `n` of type `float` could be introduced using `as`
  | #:bool => ...

  // regular tagged union constructors
  | #ok("foo") => ... // nested pattern matching on contained values

  // possible extensions
  | #:instance<Error> as error => ... // `error` refined to `Error.t`
}

One of the design goals has been to not introduce new syntactic constructs.
The proposal used to add a single character to the language. Now it adds zero.

1 Like

I already miss the feature. It will make working with polymorphic values more convenient.

Right now I need to do this thing:

Which is very inconvenient to do pattern matching on…

4 Likes

Now for an advanced example of what one can do with this:

Unmarshalling Binary Data and Computing the Sum of an OCaml List

In this post, we’ll explore how to read binary marshalled data produced by an OCaml compiler and perform high-level list operations on it. Here’s a step-by-step explanation of the code:

First, we have a native OCaml program that creates a list of integers, marshals it to a string, and saves that string into aaa.marshal:

let foo v =
  let s = Marshal.to_string v [Compat_32] in
  let ch = open_out_bin "aaa.marshal" in
  let () = output_string ch s in
  close_out ch

foo [1;2;3;4;5]

Next, we read the marshalled file, unmarshal it (using jsoo), and pass it to the sum function. The sum function operates at a high level on lists, but its definition uses the runtime representation that corresponds to OCaml’s runtime representation for lists:

let s = caml_read_file_content("./aaa.marshal")

@unboxed
type rec myList<'a> = | @as(0) Empty | Cons((unknown, int, myList<'a>))

let v: myList<int> = unmarshal(s)

let rec sum = l =>
  switch l {
  | Empty => 0
  | Cons((_, i, l)) => i + sum(l)
  }

Js.log2("v", v)
Js.log2("sum:", sum(v))

To see what the runtime representation looks like, the first log gives:

v [ 0, 1, [ 0, 2, [ 0, 3, [Array] ] ] ]

v is a nested array where the first element is always 0, the second element is the integer, and the third element is the next item in the list.

The sum function walks the runtime representation directly and gives the correct sum 15.

This approach demonstrates a neat technique for working with OCaml’s runtime representation to manipulate lists while maintaining high-level abstractions.

2 Likes

I’m not familiar with the syntax near Cons. Specifically what can go in the unboxed choices? That looks like a very elegant solution to your specific problem.

I experimented with solutions for this feature and tried to figure out how much could be accomplished without changes to the compiler. See this code for unions. Maybe this approach will be useful to someone right now or can help generate ideas when there is time to consider syntax additions. Here’s what I learned:

I had success defining each variant option as a module type, like this. I wasn’t sure what to call this because it really is like any type with a couple useful additions. (1) a guard function, and (2) custom equality.

module type Pattern = {
  type t
  let isTypeOf: unknown => bool
  let equals: (t, t) => bool
}

Define the union like this, where each case is a module name that implements Pattern.

type response =
| NegativeOne
| Null
| Success
| Failure
| Date
| False

Pattern matching would look like usual but the module name is used in place of the constructor.

switch s {
    | Date(d) => ...
    | Success(s) => ...
    | Failure(f) => ...
    ...
}

Using this kind of definition the module is like a type-class instance and it becomes possible to:

  • Discriminate on any programmable criteria like typeof, instance of, type tags for unions defined differently than ReScript, or shape detection, JSON parsing, etc. All is implemented through the isTypeOf function. Validation could be included like ensuring that a Date is not just instanceof Date but is also !isNaN. This could take some of the discrimination logic out of the compiler and into user-land.
  • Safely assign values to the union. In the example above, any Success.t, Failure.t, False.t, etc. can be safely assigned.
  • When you have an unknown value from an external API, you can still safely try to assign it to the union. Each isTypeOf guard is attempted and the result is Some(response) if something works or None otherwise. The make function in my code does exactly that.
  • Equality can be optimized to first ensure the types match - meaning isTypeOf succeeds for both and then use the provided equals function.

Literals, like True or null or "yes" or -1 can be built manually and in a completely type-safe way - see functors for literal types - but compiler support would definitely make it less cumbersome.

My code for this all works, but because there is no compiler support there are a few disadvantages:

  • Pattern matching relies on a match function with cases for A, B, and C rather than the friendlier syntax. You could improve this by wrapping or extending my code.
  • Literals and functors are a bit cumbersome. The syntax @glennsl showed for keywords and literals looks really nice.
  • No genType support
  • Assignment to the union requires calling a fromA or fromB or fromC or Obj.magic rather than the = operator.
1 Like

The specification of the restrictions is in the descriptions of the PR:

This proposal worked out quite well. The next big release of ReScript is going to be great!

7 Likes