[Proposal] Untagged Variants for ReScript

We don’t.
This is one of the first cases we considered. The idea for type variables is to accept at most one case with a type variable in the definition. The alternatives would be far too complicated and inefficient to consider.

What’s interesting is that even with this restriction, untagged variants can encode quite a lot. Here are some reflections that were not included in the proposal above to avoid confusing people:

Appendix B: Encoding Tagged Variants Using Untagged Ones

In this conversation, we discussed the possibility of representing tagged variants using untagged ones. While it is possible to do so, it introduces verbosity and requires the user to manage the tags manually. Users need to create and maintain the object types for each case and ensure the tag property is set correctly.

In contrast, native tagged variants are more concise and less error-prone, as the language handles tagging internally. Additionally, pattern matching syntax is more ergonomic when working with tagged variants, since it is designed explicitly for that purpose.

Although it is technically possible to represent tagged variants using untagged ones, it may not be the most convenient or efficient way to work with them in practice. The benefits of using native tagged variants in terms of conciseness, ease of use, and pattern matching support typically outweigh the advantages of using untagged variants for this purpose.

Appendix C: Encoding Polymorphic Variants Using Untagged Variants

In this conversation, we explored the possibility of encoding polymorphic variants using untagged variants, focusing on their extensibility and expressiveness. Untagged variants differ from polymorphic variants in that they rely on runtime type information and the structure of the values, instead of using tags to distinguish cases.

Untagged variants can express extensibility in a tag-less manner by including a case with a type variable as payload. This allows the type to be extended with new cases without modifying the original type definition. Here’s an example:

type untaggedFruit<'a> =
  |: Apple({ color: string })
  |: Banana({ length: int })
  |: Extra('a)

In this example, the Extra case with the type variable 'a enables the extensibility of the untaggedFruit type. By providing an appropriate payload type, we can extend the type with new cases, for instance with an Orange case.
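The Orange extension mentioned here never appears in the excerpt. As a rough illustration (not part of the proposal), the same idea can be sketched with TypeScript's structural unions, where the extension slot is a type parameter; UntaggedFruit, Orange, and describe are illustration names only:

```typescript
// Hypothetical sketch of an extensible untagged type in TypeScript
// terms, mirroring untaggedFruit<'a> above. Not part of the proposal.
type UntaggedFruit<A> = { color: string } | { length: number } | A;

// Extend with an Orange case without modifying the base definition.
type Orange = { diameter: number };
type ExtendedFruit = UntaggedFruit<Orange>;

// Discrimination relies on the structure of the value, not a tag.
function describe(f: ExtendedFruit): string {
  if ("color" in f) return "apple";
  if ("length" in f) return "banana";
  return "orange";
}
```

The cost, as the thread notes, is that discrimination now depends on value structure rather than a tag the compiler controls.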

Untagged variants are strictly more expressive than polymorphic variants, as they can model tag-less, extensible types without relying on tags. However, this approach comes with trade-offs, such as potential performance implications and less concise pattern-matching syntax compared to polymorphic variants.


Just a couple more questions…

Performance

Performance is mentioned a few times (it even has a section at the end)…has any benchmarking been done comparing current variants to the proposed untagged variety? Even normal variants will use typeof t === "number" for some cases (where there is no payload)…so curious about any actual performance differences specifically due to typeof vs mostly comparing the TAG field of the object.

Also, the proposal says: “the JavaScript code maintains readability and efficiency by leveraging the native typeof operator for type checks.” and that “the performance difference between tagged and untagged variants is likely to be minimal in most cases”. Just a guess, but wouldn’t the main efficiency gains come from not having to represent variants as objects in JS, but rather the ability to store them as basic values?

Eg, I could imagine a scenario in which a function in a hot loop creating a lot of normal tagged variants (which are represented by objects at runtime) is slower than the same function that only needs to produce untagged values (ie represented as ints/strings directly, without needing to create all the extra objects to store them in).

Here is one tiny benchmark I did quickly to explore that idea.

The ReScript looks like this:

  type t = String(string) | Int(int)

  let f = t =>
    switch t {
    | String(v) => String(v ++ " yo!")
    | Int(v) => Int(v + 1234)
    }

Here is the tagged version that the ReScript compiler spits out:

function tagged(t) {
  if (t.TAG === /* String */0) {
    return {
            TAG: /* String */0,
            _0: t._0 + " yo!"
          };
  } else {
    return {
            TAG: /* Int */1,
            _0: t._0 + 1234 | 0
          };
  }
}

and a hypothetical untagged version that may be emitted:

function untagged(t) {
  if (typeof t === "string") {
    return t + " yo!";
  } else {
    return t + 1234 | 0
  }
}  

Then four benches:

tagged({TAG: 0, _0: "hi"})
untagged("hi")
tagged({TAG: /* Int */1, _0: 1});
untagged(1)

In both the string and int cases, the untagged function got about 10% more ops/s than the tagged version (check the link above to see the numbers).

Edit: that 10% number was on an oldish laptop running new chrome. I ran it on my phone (newish phone running opera) and the untagged version was ~2x as many ops/s.
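For readers who want to reproduce this, here is a self-contained version of the two functions (in TypeScript; correctness only, since actual timings depend on the engine and harness):

```typescript
// Tagged representation: every value is boxed in an object with a
// numeric TAG, as in the compiler output shown above.
type Tagged = { TAG: 0; _0: string } | { TAG: 1; _0: number };

function tagged(t: Tagged): Tagged {
  return t.TAG === 0
    ? { TAG: 0, _0: t._0 + " yo!" }
    : { TAG: 1, _0: (t._0 + 1234) | 0 };
}

// Hypothetical untagged representation: the values are stored as
// plain string | number, so no wrapper object is allocated.
function untagged(t: string | number): string | number {
  return typeof t === "string" ? t + " yo!" : (t + 1234) | 0;
}
```

A timing loop around each call (e.g. a few million iterations bracketed by performance.now()) reproduces the rough comparison, though, as noted later in the thread, micro-benchmarks like this may not transfer to real apps.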


Unknown type

But the input to the process function as written is always a value of maybeString, never a string or any other type, right? Okay, that’s being pedantic, I get what it is getting at, eg when the StringValue("yo") constructor is used, you get the "yo" payload as a string, etc. But “when the input is any other type” is still wrong, as UnknownValue still has a value of type unknown as the payload, not any type (eg UnknownValue(Unknown(123)), not UnknownValue(123) or UnknownValue(3.14)).

So would this proposal also change the way the unknown type currently works in ReScript, or was that a mistake?

More about unknown…“To handle unknown values safely, we can provide utility functions that perform type-safe operations on the unknown values.” <= What would working with unknowns be like in practice?

Currently, you can’t really do much with them right? (Whenever you try to return the payload of Unknown(whatever) you get errors like The type constructor $Unknown would escape its scope… I’m assuming it is a GADT then, something like type t = unknown = Unknown : _ -> t maybe?) Would you be changing its internal representation to an untagged variant? Would this way of working with unknown be restricted to when it is used inside an untagged variant, or will unknown be able to be used outside?


Array syntax

I’m assuming that this

let apiData = StringValue("Apple") |: NumberValue(42) |: StringValue("Banana") |: NumberValue(3)

should be written as

let apiData = [StringValue("Apple"), NumberValue(42), StringValue("Banana"), NumberValue(3)]

Is that correct?


Compile-time vs run-time errors

Others mentioned it, but these should ideally be compile-time errors rather than something that only shows up at runtime. Will that be addressed?


Meta question…I see you listed “Intelligentia A.” as an author…was the proposal itself written by the AI chat or just consulted? No judgement either way, just curious. (Maybe it is connected to Interested in AI/language models and want to explore useful things for ReScript? Look here!)


This question answers most of your other questions too.
The AI was consulted from the beginning to explore, come up with examples, and act as a sounding board. It’s the AI that came up with the syntax “|:”, which led to realising that the scope can be wider than we had originally envisioned. For this reason “|:” is still in the document today, to acknowledge an interesting turning point. The final syntax will most likely be just @unboxed on the type def, as people have suggested.

The little errors you have spotted are exactly what they are: little errors where you can figure out the intended meaning anyway.

The document has been written entirely by the AI, with just a final pass to correct the tendency to change notation in mid flight.
The AI also answered all the questions on this forum up until Bob’s, which was a bit too difficult even though it had been discussed before in the chat session.

I’ll check what other questions are still unanswered by this.


These are great experiments. More investigations would be great too.
There are aspects such as JIT de-optimization when it does not see a uniform object structure that come to mind too (would not happen with tags).
It seems difficult to come up with intuitions ahead of time without running more tests.
Also, little benchmarks might or might not transfer to real apps. But, giving some rough guidelines and things to look out for would be valuable for sure.

As for unknown, no change is required.
The comment was about the fact that even with type unknown, it is possible to extract some generic information. One is to convert it to a string; another is to ask for its type.
One can even imagine a module Unknown with a few of these functions if people end up using them often in their projects.


Here is a simpler and more complete version of what I was thinking. I don’t know what GADTs are. To me, unknown is just an abstract type in the Types/Unknown module. We could use it as the return value on Object.getProperty. You could use unknown when receiving data from an external function, or when dealing with an array of heterogeneous items. You can’t do anything with it until you test it and convert it to something else.

The Types/Unknown module has typeof and classify functions. There are convenience functions like toString that run anything, not just unknown, through typeof and return Some(string) or None. There could be unsafe variants like toStringUnsafe. Maybe specific modules like Date could have a toDate that tests anything using instanceof Date and returns an optional date. For anything more complicated, use JSON parsers to make sense of unknown data.
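None of this module exists yet; here is a minimal sketch of what it could look like, with the names (typeOf, toString, toFloat, toDate) taken from the suggestions above and the option encoded as a plain tagged object for illustration:

```typescript
// Hypothetical Unknown helpers as described above; every name here
// is a suggestion from the post, not a real ReScript API.
type Option<T> = { tag: "Some"; value: T } | { tag: "None" };
const Some = <T>(value: T): Option<T> => ({ tag: "Some", value });
const None: Option<never> = { tag: "None" };

const Unknown = {
  // Raw typeof of anything.
  typeOf(u: unknown): string {
    return typeof u;
  },
  // Convenience converters: return Some(payload) only when the
  // runtime check succeeds, None otherwise.
  toString(u: unknown): Option<string> {
    return typeof u === "string" ? Some(u) : None;
  },
  toFloat(u: unknown): Option<number> {
    return typeof u === "number" ? Some(u) : None;
  },
  toDate(u: unknown): Option<Date> {
    return u instanceof Date ? Some(u) : None;
  },
};
```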

Each part of an untagged variant has a constructor name, a type, and, unlike other variants, a run-time guard. Guards are predicates of the form unknown => bool. Guards could be optional or not allowed if the type is one that can be handled directly and more reliably by the compiler through typeof or other mechanisms like instanceof Date. Maybe string literals like [#yes|#no] could be automatic. Maybe option<...> could be handled automatically if the inner type can be handled automatically. Here is an example…

let stringGuard = u => Unknown.typeof(u) == #string

type nonNegative = NonNegative(float)

let nonNegativeGuard = u => 
  u
  ->Unknown.toFloat
  ->Option.isSomeAnd(i => i >= 0.0)

let sizeNameGuard = u =>
  u
  ->Unknown.toString
  ->Option.isSomeAnd(s => ["small", "medium", "large"]->Array.includes(s))

type size =
  |: Named([#small | #medium | #large], sizeNameGuard)
  |: Pixels(float)
  |: Percentage(nonNegative, nonNegativeGuard)

You can make any variant just like you normally would with the constructor like Percentage(NonNegative(25.0)) or Named(#small). Pattern matching would work like regular pattern matching except the guards are used rather than tags until a match is found. No exhaustiveness checks so a final | _ as unk is required unless there is a variant with a single unknown payload like Other(unknown) which should match everything.
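To make the matching semantics concrete, here is a sketch (all names hypothetical) of what "guards are used rather than tags until a match is found" amounts to at runtime:

```typescript
// Hypothetical runtime for guard-based matching: each case carries a
// predicate on unknown; cases are tried in declaration order, and the
// first guard that accepts the value wins.
type Case<T> = {
  guard: (u: unknown) => boolean;
  handle: (u: unknown) => T;
};

function matchWithGuards<T>(
  u: unknown,
  cases: Case<T>[],
  fallback: (u: unknown) => T, // plays the role of `| _ as unk`
): T {
  for (const c of cases) {
    if (c.guard(u)) return c.handle(u);
  }
  return fallback(u);
}
```

The fallback parameter mirrors the required final `| _ as unk` branch, since no exhaustiveness check is possible.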

Notes

My first post suggested that each part of an untagged variant has a guard of the form unknown => option<type>, which implies the type, rather than a type and a separate predicate. This introduces the possibility that the guard could generate a new object from the original unknown object. Maybe this is useful but could lead to lots of new objects each time you pattern match. The boolean predicate version might be simpler because it is just asking whether the unknown thing can be treated as if it is the type in question.

Thinking about general usefulness, when sending data out I’m not sure what is wrong with stripping the tags then with that unwrap or unbox mentioned earlier. And when pulling data in it isn’t so bad to parse it into a new record using a parsing library. I think using genType might be a problem because I don’t think it can do the unwrapping and puts tags in when TypeScript just expects a number | string.

Scala 3 actually has a Rust-like enum syntax for variants, you don’t need to manually define sealed trait hierarchies any more. In fact, Scala 3 has actual union types as well, e.g. def f(x: Boolean): Int | String = if x then 1 else "". So the AI is a bit out of date :wink:

In general, I’m not seeing the utility of this proposal over the Js.Types or Js.Json modules.

Consider a scenario where you are building a web application that fetches data from a third-party API.

In this scenario I assume we are fetching JSON, in which case we’d use Js.Json.parseExn or equivalent to read it into a JSON structure from a string. From there we have a variety of options, from the low-level Js.Json.classify up to PPXs and codegen tools that decode Js.Json.t into custom types.

Whereas with this proposal we have only typeof at our disposal. And typeof can’t even tell us when something is an array or null, it just says it’s an object.
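That gap is easy to demonstrate; distinguishing arrays and null requires checks beyond typeof:

```typescript
// typeof alone reports "object" for plain objects, arrays, and null;
// dedicated checks are needed to tell them apart.
function classifyByTypeof(x: unknown): string {
  if (x === null) return "null"; // must come first: typeof null === "object"
  if (Array.isArray(x)) return "array";
  return typeof x; // still just "object" for plain objects
}
```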

There may be other scenarios where pervasive use of typeof is appropriate, but given the existing powerful runtime typechecking tools in ReScript i.e. variants and polymorphic variants, I’m hard-pressed to think of them.

P.S. personally I feel that it should be stated up front that this proposal is from an LLM, at the very least it avoids the confusion that it’s an original work by zth :slight_smile:


I’m pasting below some discussion about how a Json library implemented with untagged variants would look like.
(I am formulating the questions and the AI answers, but in practice we only care about the result)


In the spirit of untagged variants, I would try to avoid having to, for example, turn an object into a list. That’s an expensive extra allocation.
Isn’t there a way to change the type JArray so the deserialise function can return directly the result of parseJson?

Yes, you can indeed change the JArray and JObject types to avoid the extra allocation and more closely match the output of the parseJson function. One possible approach is to update the JsonValue type definition to use untagged variants for arrays and objects directly:

type JsonValue =
  |: JNull
  |: JBool(bool)
  |: JNumber(float)
  |: JString(string)
  |: JArray(array<JsonValue>)
  |: JObject(dict<string, JsonValue>)

In this definition, the JArray variant directly holds an array of JsonValue instances, and the JObject variant holds a dictionary with string keys and JsonValue values.

Now, the deserialize function can return the result of parseJson directly when handling arrays and objects, without the need for extra allocations:

So json looks clean and that’s nice.
But really, the interest is in interfacing with APIs for TS. And TS uses untagged unions in functions we need to bind to.
This aspect seems pretty straightforward: today we simply don’t have a nice way to bind to TS.
Well, it’s JS, but by expressing things at a high level this way using untagged variants, we will get the corresponding TS types for free via genType.


I see this proposal as something useful for interop scenarios, or, more generally, for scenarios where you can expect your types to be predictable. So, for instance, if we define “third-party APIs” as some frontend lib written in TypeScript, provided that lib and your ReScript code won’t get out of sync (depending on your deploy/loading scheme), you don’t have to parse its output. So here untagged variants could be quite enough and therefore better (the simplest/least powerful tool for the job).

Now, when you actually fetch data, you probably should parse, because even your own client and backend can easily get out of sync. In that case, using Js.Json or rescript-struct is fully justified.


Could you explain the real, specific problem you’re trying to solve? Is typeof part of the solution? Because this still seems like a limited way of categorizing things. I’m getting confused about what is real here and what is an AI joke.

I can imagine a scenario where we receive a TypeScript type we trust, perhaps already validated with zod, that is a tagged discriminated union. But it is tagged differently than how ReScript does it. Actually every discriminated union we receive will be tagged differently than how ReScript does it.

type Result =
  | { ok: true; data: string }
  | { ok: false; error: Error };

We could interop with this - parsing data we receive or constructing data to send out - if we had new attributes like @discriminator("ok") on the type itself and something like @discriminator(true) on each variant.

The approach I described earlier is more flexible and could be used for both tagged unions and untagged unions, where typeof or instanceof or powerful parsing options are used to discriminate. This could be made safer with better support for literals so you couldn’t construct a Success without true as the ok value.

let hasProperty = (prop, value, u: unknown) =>
  u
  ->Unknown.get(prop)
  ->Option.isSomeAnd(v => v === Unknown.toUnknown(value))

type result =
  |: Success({ok: bool, data: string}, hasProperty("ok", true))
  |: Failure({ok: bool, error: Error.t}, hasProperty("ok", false))

If we receive a proxy and convert it to our kind of tagged discriminated union, the proxy won’t work any more.


Such a mechanism exists in master

Example:

Here’s part of the implementation of Js.Json in js_json.ml:

(** Efficient JSON encoding using JavaScript API *)

type t

type _ kind =
  | String : Js_string.t kind
  | Number : float kind
  | Object : t Js_dict.t kind
  | Array : t array kind
  | Boolean : bool kind
  | Null : Js_types.null_val kind


type tagged_t =
  | JSONFalse
  | JSONTrue
  | JSONNull
  | JSONString of string
  | JSONNumber of float
  | JSONObject of t Js_dict.t
  | JSONArray of t array

let classify  (x : t) : tagged_t =
  let ty = Js.typeof x in
  if ty = "string" then
    JSONString (Obj.magic x)
  else if ty = "number" then
    JSONNumber (Obj.magic x )
  else if ty = "boolean" then
    if (Obj.magic x) = true then JSONTrue
    else JSONFalse
  else if (Obj.magic x) == Js.null then
    JSONNull
  else if Js_array2.isArray x  then
    JSONArray (Obj.magic x)
  else
    JSONObject (Obj.magic x)

and here is how it looks instead:

  @unboxed
  type rec t =
    | @as(false) False
    | @as(true) True
    | @as(null) Null
    | String(string)
    | Number(float)
    | Object(Js.Dict.t<t>)
    | Array(array<t>)

  type tagged_t =
    | JSONFalse
    | JSONTrue
    | JSONNull
    | JSONString(string)
    | JSONNumber(float)
    | JSONObject(Js.Dict.t<t>)
    | JSONArray(array<t>)

  let classify = (x: t) =>
    switch x {
    | False => JSONFalse
    | True => JSONTrue
    | Null => JSONNull
    | String(s) => JSONString(s)
    | Number(n) => JSONNumber(n)
    | Object(o) => JSONObject(o)
    | Array(a) => JSONArray(a)
    }

No aux functions, no explicit typeof, no Obj.magic.
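For intuition, here is a sketch of the runtime discrimination the @unboxed version implies; this is an illustration, not actual compiler output:

```typescript
// Sketch of the runtime checks behind the @unboxed JSON type above:
// literal equality for False/True/Null, typeof for strings and
// numbers, Array.isArray for arrays; everything else is an object.
function classifyJson(x: unknown): string {
  if (x === false) return "JSONFalse";
  if (x === true) return "JSONTrue";
  if (x === null) return "JSONNull";
  if (typeof x === "string") return "JSONString";
  if (typeof x === "number") return "JSONNumber";
  if (Array.isArray(x)) return "JSONArray";
  return "JSONObject";
}
```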

One thing this enables that I really don’t think should be overlooked is seamless handling of null/undefined for external values. That’s quite clunky today. We have @return(nullable), but that works in limited scenarios. We have Js.Nullable, but that requires explicit conversion, which kills the pattern matching flow.

With untagged variants, leveraging pattern matching for values that can be T | null or T | null | undefined can be a first class language feature:

@unboxed type nullable<'a> = | @as(null) Null | Present('a)

switch someNullable {
  | Present({age: Null}) => Console.log("Ooops, no age!")
  | _ => ()
}

// Or including the undefined case.
@unboxed type nullish<'a> = | @as(null) Null | @as(undefined) Undefined | Present('a)

This will drastically simplify many interop scenarios. And we can extend the helpers in Core to make using nullable and nullish (names not decided of course…) a developer experience close to what Option gives today.
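At runtime, matching on the sketched nullable/nullish types reduces to identity checks against null and undefined, with the Present payload stored unwrapped; roughly:

```typescript
// What pattern matching on the sketched nullish<'a> amounts to at
// runtime: plain identity checks, no wrapper object for Present.
function describeNullish<T>(x: T | null | undefined): string {
  if (x === null) return "Null";
  if (x === undefined) return "Undefined";
  return "Present";
}
```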


Does the discrimination happen by tagged_t? Imagine an untagged union of Date.t | { year: int, month: int}. Or an authentication function returns Error.t | { userId: string, emailValidated: bool, email: string }. Or a 2-element vs. 3-element array. How do these get pattern matched? If the discriminator function were user-definable, it could perform shape discrimination using any variety of techniques and lightweight validation.

A bunch of JavaScript Array functions return an index of -1 for invalid and >=0 otherwise. And so we have flavors of these functions like findIndex and findIndexOption. I wonder if untagged variants could clean this up. Under the covers they could be zero-cost and return just an int but they could be wrapped with something like this below. It probably doesn’t quite work but I thought I’d throw it out there since it seems similar to the nullable example @zth mentioned. We could have an abstract arrayIndex type that compiles to an int, but we won’t be able to use the friendly ReScript pattern matching syntax. F# Active Patterns are a good solution for read-only situations.

@unboxed
type arrayIndex =
  | Invalid(@as(-1) int, i => i < 0)
  | Valid(int, i => i >= 0)
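As a point of comparison, the conventional fix today wraps the -1 sentinel into an explicit optional by hand; the hypothetical untagged variant above aims to give this for free:

```typescript
// Hand-written wrapper turning findIndex's -1 sentinel into an
// explicit optional; findIndexOpt is an illustration name.
function findIndexOpt<T>(
  xs: T[],
  pred: (x: T) => boolean,
): number | undefined {
  const i = xs.findIndex(pred);
  return i < 0 ? undefined : i;
}
```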

@jmagaram in the prototype implementation of untagged variants I’ve tried to stay minimalistic. Just what’s described in this proposal (but made precise) and what’s required to support the examples discussed.
Even adding that brings a certain level of complexity. First at high level, but more importantly in the implementation.

The implementation of pattern matching is not the cleanest code. I should say “codes” as it happens in 2 distinct places in the compiler.
I had to work around some nasty bugs just to introduce what’s described here.

If one wants to expand the proposal further, some cleanup of the pattern matching implementation is in order. And we don’t really have resources for that. Plenty of other things going on.

For posterity:

[Screenshot of the chat session: 2023-04-03 at 06.17.41]



I’ve been waiting for an unboxed multiple variant constructor for a long time, and this would be a great feature. It seems like an easy, type-safe, and nicely interoperable syntax.


It would be nice if this could be extended to more specific instanceof tests, as well as the typeof tests for a few primitive types.

@unboxed
type rec t =
   | @as(false) False
   | @as(true) True
   | @as(null) Null
   | String(string)
   | Number(float)
   | Object(Js.Dict.t<t>)
   | Array(array<t>)
   // extended types
   | @as(Date) Date
   | @as(Uint8Array) Binary

This could possibly be arranged

// extended types
   | Date(Date.t)
   | Binary(Uint8Array.t)

Does this come up frequently in bindings, with just one possible case of many being a date/binary?
Curious to see a link to an example out there.