[Proposal] Untagged Variants for ReScript

zth · March 29, 2023, 12:47pm

Introduction

Brief introduction to ReScript
Motivation for untagged variants
High-level overview of the proposal

Detailed Design

New Type Constructors

Explanation of the |: operator for untagged variants
Example of an untagged union type definition

Type Inference and Pattern Matching

Pattern matching syntax for untagged variants
Compilation to JavaScript and typeof checks
Type inference and type safety guarantees

Handling Unknown Values

Introducing the unknown type
Safely working with unknown values (e.g., for logging)

Example Use Case

A complete example demonstrating how to use untagged variants in ReScript
Handling different cases, such as strings and unknown values

Limitations and Considerations

Situations where untagged variants may not be the best choice
Performance implications, if any

Conclusion

Recap of the untagged union proposal
Potential benefits for the ReScript community

Introduction

ReScript is a statically-typed programming language that compiles to highly readable and efficient JavaScript. One of its core goals is to provide seamless interoperation with existing JavaScript code and TypeScript type definitions. However, ReScript’s current union type implementation relies on tagged variants, which may not align with the way some JavaScript libraries and TypeScript definitions handle variants.

This document presents a proposal for introducing untagged variants to ReScript, enabling developers to work more closely with JavaScript conventions and TypeScript type definitions. Untagged variants allow ReScript to represent a union of different types without the need for a tag or a constructor to differentiate the types at runtime. This feature will simplify the handling of union types in ReScript, improving both ergonomics and code generation.

The proposal includes a detailed design of the new type constructors, type inference, pattern matching, and handling of unknown values. We will also provide a comprehensive example that demonstrates the use of untagged variants and discuss some of the limitations and considerations associated with the feature.

By extending ReScript with untagged union support, we aim to enhance the language’s compatibility with JavaScript and TypeScript ecosystems while maintaining its core principles of type safety and performance.

New Type Constructors

The proposed design introduces a new syntax for defining untagged variants using the |: operator. This operator allows the declaration of a union type without requiring a tag or a constructor to differentiate the types at runtime. Here’s how the syntax works:

type untaggedUnion = TypeA |: TypeB

In this example, untaggedUnion represents an untagged union of TypeA and TypeB. Unlike tagged variants, there’s no need for a constructor to differentiate between the types; instead, ReScript will rely on JavaScript’s built-in typeof operator during pattern matching to distinguish between the different types within the union.

The |: operator can be used to define untagged variants with more than two types as well:

type anotherUntaggedUnion = TypeA |: TypeB |: TypeC

This new syntax provides a straightforward and concise way to define untagged variants in ReScript, enabling developers to work more closely with JavaScript conventions and TypeScript type definitions. It also ensures that the generated JavaScript code remains efficient and readable.

In the next sections, we’ll discuss how this new syntax interacts with ReScript’s type inference and pattern matching features to provide a seamless and type-safe experience when working with untagged variants.

Type Inference and Pattern Matching

One of the strengths of ReScript is its powerful type inference system, which allows the language to deduce the types of expressions without explicit type annotations. With the introduction of untagged variants, the type inference system must be adapted to handle these new types effectively.

When working with untagged variants, ReScript’s pattern matching syntax remains mostly unchanged. However, the compilation process will now generate JavaScript code that uses the typeof operator to perform type checks, ensuring that the correct case is executed based on the input’s runtime type.

Here’s an example of pattern matching with an untagged union:

type maybeString = StringValue(string) |: UnknownValue(unknown)

let process = (input: maybeString) => {
  switch input {
  | StringValue(str) => Js.log2("String:", str)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}

In this example, the process function takes an input of type maybeString, which is an untagged union of string and unknown. The switch expression uses pattern matching to handle both cases:

When the input is a string, the StringValue(str) case is executed.
When the input is of any other type, the UnknownValue(value) case is executed.

The type inference system will ensure that the correct type is associated with the bound variable (e.g., str in the StringValue case) within each branch of the pattern matching expression.

When the ReScript code is compiled to JavaScript, the generated code will use the typeof operator to perform the necessary type checks:

function process(input) {
  if (typeof input === "string") {
    console.log("String:", input);
  } else {
    console.log("Unknown value:", unknownToString(input));
  }
}

As you can see, the JavaScript code maintains readability and efficiency by leveraging the native typeof operator for type checks.

The type inference system in ReScript ensures that the untagged union types are propagated correctly through the program. This guarantees that the pattern matching expressions will provide type safety while working with untagged variants.

In the next section, we will discuss how to handle unknown values within untagged variants and how to safely work with them, for example, in logging scenarios.

Handling Unknown Values

When working with untagged variants, it’s possible that a value might not match any of the expected types. In such cases, it’s important to provide a safe and convenient way to handle these unknown values.

In ReScript, the unknown type is used to represent values of an indeterminate type. To handle unknown values safely, we can provide utility functions that perform type-safe operations on the unknown values. One common use case is converting an unknown value to a string representation for logging purposes.

Here’s an example of a utility function that safely converts an unknown value to a string:

let unknownToString = (value: unknown) => {
  switch value {
  | StringValue(str) => str
  | NumberValue(num) => Int.toString(num)
  | BoolValue(bool) => string_of_bool(bool)
  | UnknownValue(_) => "<unknown>"
  }
}

In this example, unknownToString takes an unknown value as input and uses pattern matching to determine its type. For each known type, the function returns the appropriate string representation. If the value does not match any of the known types, it returns the generic string “<unknown>”.

This utility function allows you to work with unknown values safely, ensuring that only well-defined operations are performed on the input. You can use this function, for example, when logging the value of an unknown type:

let process = (input: maybeString) => {
  switch input {
  | StringValue(str) => Js.log2("String:", str)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}

In the process function, if the input is of an unknown type, the UnknownValue(value) case is executed. The unknownToString function is called with the input value to obtain a string representation, which is then logged to the console.

Using utility functions like unknownToString provides a safe and flexible way to handle unknown values within untagged variants. By following this pattern, you can create similar utility functions for other generic operations that need to be performed on unknown values, ensuring type safety and proper handling of various cases.
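As a rough illustration of what such a helper could look like at runtime, the JavaScript sketch below mirrors the four branches of unknownToString with typeof checks. This is an assumption about a possible output shape, not what the compiler emits, and the name unknownToStringSketch is hypothetical.

```javascript
// Hypothetical sketch: one possible JavaScript shape for unknownToString.
// Each branch mirrors a case of the ReScript switch shown earlier.
function unknownToStringSketch(value) {
  switch (typeof value) {
    case "string":
      return value; // StringValue(str) => str
    case "number":
      return String(value); // NumberValue(num)
    case "boolean":
      return value ? "true" : "false"; // BoolValue(bool)
    default:
      return "<unknown>"; // UnknownValue(_)
  }
}

console.log(unknownToStringSketch("hi")); // hi
console.log(unknownToStringSketch(null)); // <unknown>
```

Anything that is not a string, number, or boolean (including null, arrays, and objects) falls into the default branch here.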

In summary, the proposed design for untagged variants in ReScript enables a seamless integration with JavaScript’s dynamic type system while preserving the type safety and pattern matching capabilities that ReScript developers appreciate. This approach simplifies working with TypeScript type definitions and enhances the interoperability between ReScript and JavaScript codebases.

Example Use Case

In this section, we will explore an example use case that demonstrates the benefits of using untagged variants in ReScript.

Consider a scenario where you are building a web application that fetches data from a third-party API. The API returns a heterogeneous list of items, where each item can be either a string or a number. The goal is to process this list and perform different actions based on the item’s type.

First, let’s define an untagged union type to represent the items in the list:

type listItem = StringValue(string) |: NumberValue(number)

Now, we will define a function to process a single item:

let processItem = (item: listItem) => {
  switch item {
  | StringValue(str) => Js.log2("String:", str)
  | NumberValue(num) => Js.log2("Number:", num)
  }
}

The processItem function takes a listItem as input and uses pattern matching to handle the different cases. When the input is a string, it logs the string value. When the input is a number, it logs the number value.

Next, we will define a function to process the entire array of items:

let processArray = (items: array<listItem>) => {
  items->Array.forEach(processItem)
}

The processArray function takes an array of listItem values and iterates through the array, calling the processItem function for each item. The Array.forEach function is a built-in ReScript function that takes an array and a function as its arguments and applies the function to each element in the array.

Now, let’s simulate fetching the data from the API and processing the list:

let apiData = [StringValue("Apple"), NumberValue(42), StringValue("Banana"), NumberValue(3)]

processArray(apiData)

The apiData array contains a mix of string and number values. We pass this array to the processArray function, which in turn calls the processItem function for each item in the array.

When compiled to JavaScript, the generated code uses the native typeof operator to perform type checks:

function processItem(item) {
  if (typeof item === "string") {
    console.log("String:", item);
  } else {
    console.log("Number:", item);
  }
}

function processArray(items) {
  items.forEach(processItem);
}

const apiData = ["Apple", 42, "Banana", 3];

processArray(apiData);

As you can see, the compiled JavaScript code is clean and efficient, relying on the typeof operator to differentiate between string and number values.

This example demonstrates how untagged variants in ReScript can simplify working with heterogeneous data structures and improve the interoperability between ReScript and JavaScript. By using untagged variants, developers can leverage ReScript’s type safety and pattern matching capabilities while benefiting from JavaScript’s dynamic type system.

Continuing the example use case, let’s consider a situation where the API might also return unknown values, and we want to handle them gracefully. We can update our listItem type definition to include an unknown type:

type listItem = StringValue(string) |: NumberValue(number) |: UnknownValue(unknown)

We will now update the processItem function to handle the case where the item is of an unknown type:

let processItem = (item: listItem) => {
  switch item {
  | StringValue(str) => Js.log2("String:", str)
  | NumberValue(num) => Js.log2("Number:", num)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}

In this updated version of the processItem function, we added a new case for UnknownValue(value). When an item is of an unknown type, we call the unknownToString function to obtain a string representation of the value and log it to the console.

Let’s simulate fetching the data from the API again, this time with an unknown value included:

let apiData = [StringValue("Apple"), NumberValue(42), StringValue("Banana"), NumberValue(3), UnknownValue(Js.Nullable.null)]

processArray(apiData)

The apiData array now contains a mix of string, number, and unknown values. We pass this array to the processArray function, which in turn calls the processItem function for each item in the array.

When compiled to JavaScript, the generated code uses the typeof operator to perform type checks for strings and numbers, and additional checks for unknown values:

function unknownToString(value) {
  return String(value);
}

function processItem(item) {
  if (typeof item === "string") {
    console.log("String:", item);
  } else if (typeof item === "number") {
    console.log("Number:", item);
  } else {
    console.log("Unknown value:", unknownToString(item));
  }
}

function processArray(items) {
  items.forEach(processItem);
}

const apiData = ["Apple", 42, "Banana", 3, null];

processArray(apiData);

As you can see, the compiled JavaScript code handles unknown values by calling the unknownToString function, which converts the unknown value to a string representation. This approach ensures that the application can gracefully handle unexpected data while still benefiting from the safety and expressiveness of ReScript’s type system and pattern matching capabilities.

In summary, this extended example demonstrates how untagged variants in ReScript can be used to work with heterogeneous data structures, including cases where some values might be unknown. By using untagged variants, developers can write clean, efficient, and safe code that leverages the strengths of both ReScript and JavaScript.

Limitations and Considerations

While untagged variants provide a more convenient way to work with heterogeneous data structures in ReScript, there are some limitations and considerations that developers should be aware of:

Overlapping types
When working with untagged variants, special care must be taken if the union contains overlapping types. For instance, if the union contains both string and number, the generated JavaScript code will use the typeof operator to distinguish between the two types. However, if the union contains types that cannot be easily distinguished using JavaScript’s typeof operator, it may lead to unexpected behavior or runtime errors.

For example, if the union contains both string and Js.Nullable.t<string>, the generated JavaScript code cannot distinguish between the two for non-null values, as the typeof operator returns “string” in both cases.

Limited to JavaScript’s runtime type information
Since untagged variants rely on JavaScript’s runtime type information, they are limited by the types that can be reliably distinguished at runtime. For example, distinguishing between custom types or complex data structures might not be possible using untagged variants.

Type safety trade-offs
Using untagged variants involves some trade-offs in terms of type safety. While pattern matching ensures that all cases are handled, the absence of tags in the runtime representation might lead to subtle bugs if the types within the union are not properly distinguished.
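The limitations above can be seen directly in JavaScript. The short sketch below shows values that would belong to different members of a union yet report the same typeof result; the helper name sameTypeofTag and the sample objects are made up for illustration.

```javascript
// typeof gives the same answer for values that would belong to different
// members of a union, so it cannot separate them on its own.
function sameTypeofTag(a, b) {
  return typeof a === typeof b;
}

const plainString = "hello"; // a string
const nullableHoldingValue = "hello"; // a nullable string that holds a value
const pointLike = { x: 1, y: 2 }; // one record shape
const userLike = { name: "Ada" }; // a different record shape

console.log(sameTypeofTag(plainString, nullableHoldingValue)); // true
console.log(sameTypeofTag(pointLike, userLike)); // true
console.log(typeof null); // object
```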

Performance implications

The performance of untagged variants depends on the generated JavaScript code and the JavaScript engine’s ability to optimize the code. In some cases, using untagged variants might lead to slightly faster execution times, as the JavaScript engine can directly use the typeof operator or other built-in checks without the need for additional tag comparisons.

However, the performance difference between tagged and untagged variants is likely to be minimal in most cases. Modern JavaScript engines are highly optimized and can often handle tagged variants efficiently. Additionally, ReScript’s compiler is designed to produce efficient JavaScript code, so the performance impact of using tagged variants might be negligible.

It is important to note that the performance characteristics of untagged variants may vary depending on the specific use case and the types involved in the union. When considering untagged variants for performance reasons, it is recommended to benchmark and compare the performance of both tagged and untagged variants in the context of the specific application.
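A benchmark along these lines might start from something like the sketch below. It compares a tag comparison against a typeof check over the same data. The tagged shape { TAG, _0 } and all function names are assumptions made for this example, and a single timed run like this is only a rough signal.

```javascript
// Hypothetical micro-benchmark: tagged objects vs. raw untagged values.
// The tagged shape { TAG, _0 } is an assumption made for illustration.
function sumTagged(items) {
  let total = 0;
  for (const item of items) {
    total += item.TAG === "StringValue" ? item._0.length : item._0;
  }
  return total;
}

function sumUntagged(items) {
  let total = 0;
  for (const item of items) {
    total += typeof item === "string" ? item.length : item;
  }
  return total;
}

// Build the same logical data in both representations.
function makeData(n) {
  const tagged = [];
  const untagged = [];
  for (let i = 0; i < n; i++) {
    const value = i % 2 === 0 ? "item" + i : i;
    untagged.push(value);
    tagged.push({
      TAG: typeof value === "string" ? "StringValue" : "NumberValue",
      _0: value,
    });
  }
  return { tagged, untagged };
}

// Time one call and report the elapsed milliseconds.
function timeIt(label, fn, arg) {
  const start = Date.now();
  const result = fn(arg);
  console.log(label, Date.now() - start, "ms", "result:", result);
  return result;
}

const benchData = makeData(100000);
timeIt("tagged", sumTagged, benchData.tagged);
timeIt("untagged", sumUntagged, benchData.untagged);
```

Both functions should return the same total, which is a useful sanity check before comparing timings.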

In summary, while untagged variants might offer some performance benefits in certain situations, the primary motivation for using them is to improve the ergonomics of working with heterogeneous data structures in ReScript. Developers should carefully consider the implications of using untagged variants in performance-critical scenarios.

Related Work

The concept of untagged variants with high-level pattern matching is not entirely novel. The idea of using untagged variants and pattern matching has been explored in several programming languages and libraries, with different degrees of support for type-safety and expressiveness. However, the specific combination of untagged variants, type variables, and high-level pattern matching presented in this discussion seems to be less common.

In this section, we discuss several programming languages and libraries that have explored the concept of untagged variants and pattern matching.

Rust: Rust’s enum is a tagged union that provides a way to define custom data types with multiple variants. Rust also supports pattern matching with the match keyword, allowing for expressive and type-safe destructuring of these custom data types. However, Rust’s enums are tagged, which means the variants are explicitly labeled.
Haskell: Haskell’s algebraic data types (ADTs) are a powerful way to define custom data types with multiple constructors, and Haskell’s pattern matching syntax is highly expressive. Although Haskell’s ADTs are not untagged variants, they demonstrate the power of combining custom data types with pattern matching.
TypeScript: TypeScript, as a superset of JavaScript, supports untagged variants through its union types. However, TypeScript’s support for pattern matching is limited to runtime type checks using the typeof and instanceof operators or user-defined type guards. This approach is less expressive and type-safe compared to high-level pattern matching, and it does not provide the same level of support for type variables in untagged variants.
Polymorphic Variants (OCaml): OCaml supports polymorphic variants, which are more flexible than traditional algebraic data types. They allow for extensible and more precise type information, but they are boxed when carrying a payload, which can introduce some runtime overhead.
Scala: Scala’s case classes and sealed traits provide a way to define custom data types with multiple cases, and Scala’s pattern matching using the match keyword is highly expressive. While not untagged variants, this combination demonstrates a powerful way to work with custom data types and pattern matching.
ATS Programming Language: ATS is a statically-typed programming language that unifies implementation with formal specification. It has a feature called “views,” which allows for more fine-grained control over memory layout and representation, similar to the idea of untagged variants. Views provide a way to optimize runtime representation, but they do not directly address pattern matching.
CDuce: CDuce is a functional programming language designed for XML processing that features a type system based on regular expression types. CDuce supports untagged union types, with pattern matching and more expressive type information. While CDuce’s focus is on XML processing, its treatment of untagged variants and pattern matching is somewhat similar to the untagged variants proposal discussed here.
Publication: “Pattern Matching with First-Class Polymorphism”: This paper by Garrigue and Rémy (2013) proposes a generalization of polymorphic variants in OCaml that allows for first-class polymorphism in pattern matching. The proposed system enables more expressive pattern matching and a more efficient runtime representation. While not directly the same as the untagged variants proposal, the ideas in this paper offer a related approach to enhancing pattern matching and optimizing runtime representation.

In summary, the concept of untagged variants combined with high-level pattern matching and type variables is not entirely novel. Several programming languages and libraries have explored similar ideas with different trade-offs in terms of expressiveness, type safety, and support for type variables. However, the specific combination of features discussed in this conversation appears to be less common, which may provide a unique perspective and opportunities for further exploration in programming language design.

Conclusion

Untagged variants provide a valuable addition to the ReScript language, allowing developers to work with heterogeneous data structures more conveniently and efficiently. By leveraging JavaScript’s runtime type information, untagged variants enable cleaner, safer, and potentially faster code generation without sacrificing the benefits of ReScript’s type system and pattern matching capabilities.

In this document, we have outlined the design of untagged variants in ReScript, including new type constructors, type inference and pattern matching, handling unknown values, and example use cases. We have also discussed the limitations, considerations, and performance implications of using untagged variants.

While untagged variants do have some trade-offs in terms of type safety and overlapping types, their benefits in terms of ergonomics and compatibility with JavaScript and TypeScript make them a valuable feature for many developers. By carefully considering the specific requirements of their applications, developers can determine whether untagged variants are an appropriate solution for their needs.

Appendix: Investigation of Untagged Variants with Type Variables

In this appendix, we summarize our investigation of untagged variants with type variables. We explored how to extend the untagged union proposal to handle type variables and what limitations arise when dealing with them.

1. Pattern matching with type variables

When pattern matching with untagged variants that have type variables, the type inference mechanism will ensure that the correct type is inferred for each case. However, the compilation of pattern matching expressions must take into account the presence of type variables. This is because type variables can represent different types at runtime, and the generated JavaScript code needs to correctly handle these cases.

2. Limitations and handling multiple cases

In our investigation, we found that untagged variants can handle multiple cases (3 or 4 cases, for example), but at most one of them should use a type variable. This limitation stems from the fact that more than one case with type variables would require additional runtime information to disambiguate between cases, which goes against the principle of untagged variants.
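One way to read this restriction: a case that carries a type variable can hold any runtime value, so it has no check of its own and can only be the fallthrough branch. The sketch below assumes a hypothetical union with a string case, a number case, and one generic case; the case names are invented.

```javascript
// Hypothetical union: Text(string) | Count(number) | Other('a)
// The generic case has no runtime check of its own, so it has to be the
// final fallthrough. A second generic case would have no branch left.
function classifyGeneric(value) {
  if (typeof value === "string") return "Text";
  if (typeof value === "number") return "Count";
  return "Other";
}

console.log(classifyGeneric("a")); // Text
console.log(classifyGeneric(true)); // Other
```

Note that in this sketch a string stored in the generic case would be reported as Text, which is the kind of ambiguity the restrictions are meant to rule out.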

3. Practical examples

We provided examples of untagged variants with type variables being useful in practice:

A Result type with Ok and Error cases, where the Ok case has a type variable.
An event handling system with ClickEvent, KeyEvent, and CustomEvent cases, where the CustomEvent case has a type variable.

These examples demonstrated that the expressivity of untagged variants with type variables can be beneficial in practical scenarios, offering flexibility when dealing with diverse data types and structures while maintaining simplicity and type safety.

In summary, pattern matching with untagged variants that include type variables introduces complexity to the compilation process. The compiler must generate JavaScript code that takes into account the variations in types represented by the type variables, ensuring correct type inference, type checking, and type coercion during runtime. By carefully addressing these challenges, it’s possible to create a robust and efficient implementation of pattern matching for untagged variants with type variables.

Authors: Cristiano C., Gabriel N., Intelligentia A.

hoichi · March 29, 2023, 2:10pm

Off the top of my head, the proposal makes a lot of sense. Some nitpicks:

1. I find the first two “nullary” examples slightly confusing: is TypeA |: TypeB even possible? If there’s no tags and no payload, what is left?
2. Have you considered extending the behaviour of normal variants with @unboxed? Off the top of my head, it’s not much of a stretch, semantically, and I’m not sure if it’s going to break any existing code. The reason I ask is that I don’t find |: particularly pretty, but more importantly, I think the less of different syntax there is, the better. And AFAIU, untagged variants have some limitations WRT what can be realistically made unboxable, and some runtime/interop implications, but semantically, they’re just normal variants, so it makes sense to treat them as such. Also, since the proposal affects interop (and in fact, is meant for it), decorators seem like a good layer for this.
3. I’d probably err on the conservative side when it comes to type safety. IOW, if the compiler is unsure that typeof will work properly for all the values, it shouldn’t allow that type definition. But I guess that’s your plan anyway?

cristianoc · March 29, 2023, 2:18pm

1 it is possible but makes no difference w.r.t. tagged variants

2 If you’re talking about the syntax, adding an annotation on the type definitions. Yes sure, does not really matter at this stage. It’s just visually easier to use |: in the doc as it stands out.

3 There will be restrictions, as indeed there are several tricky cases. The nastiest is just a type variable as first case.

It’s roughly this:

<UntaggedUnion> ::= "UntaggedUnion<" <TypeList> ">"

<TypeList> ::= <Type> | <Type> "," <TypeList>

<Type> ::= <PrimitiveType>
        |  <LiteralType>
        |  <ObjectType>
        |  <ArrayType>

<PrimitiveType> ::= "String" | "Number" | "Boolean" | "Null" | "Undefined"

<LiteralType> ::= "#" <string-literal> | "#" <number-literal> | "#" <boolean-literal>

<ObjectType> ::= "{" <FieldList> "}"

<ArrayType> ::= "array<" <Type> ">"

<FieldList> ::= <Field> | <Field> "," <FieldList>

<Field> ::= <FieldName> ":" <Type>

<FieldName> ::= <identifier>

Note that the BNF doesn’t enforce the restriction of only one object and one array type; you’ll need to implement that constraint in your type system and provide appropriate error messages when necessary.
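To make that extra constraint concrete, here is a small sketch of a check that could run over the members of a union after parsing. The descriptor format ({ kind: "object" } and so on) and the function name are invented for this example and are not part of the proposal.

```javascript
// Hypothetical check for the "at most one object type and one array type"
// rule. Each member is described by a plain record such as { kind: "object" };
// this descriptor format is invented for the example.
function checkUntaggedMembers(members) {
  const errors = [];
  const count = (kind) => members.filter((m) => m.kind === kind).length;
  if (count("object") > 1) errors.push("at most one object type is allowed");
  if (count("array") > 1) errors.push("at most one array type is allowed");
  return errors;
}

console.log(checkUntaggedMembers([{ kind: "string" }, { kind: "object" }]).length); // 0
console.log(checkUntaggedMembers([{ kind: "object" }, { kind: "object" }])); // one error
```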

cristianoc · March 29, 2023, 2:19pm

There are a few mistakes of notation in that bnf, but it gives a rough idea.

cristianoc · March 29, 2023, 3:28pm

A better answer coming directly from the mouth of the lead author:

jmagaram · March 29, 2023, 3:44pm

typeof is a crude way to distinguish types. null becomes object. It doesn’t work with many things you’d want to put in a type so I’m not sure how useful this is. F# Active Patterns might be relevant. I think I understand the goal - classifying things without the noise of a tag in the JavaScript. Would something like this be more flexible…

type thing<'a> =
  |: Point(unknown => option<point>)
  |: PositiveInt(int => option<int>)
  |: NonEmptyArray(array<'a> => option<array<'a>>)

Each part of the untagged union must be defined by a guard. If I try to create a Point from an unknown - maybe through some JSON parsing - the compiler will run it through the guard and produce Some if it matches and None otherwise. A NonEmptyArray is created not from unknown but from an array of some kind, so it is more restrictive. Pattern matches in a switch would run through them in order and we must have a fall-through.
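In JavaScript terms, the guard idea might run roughly like the sketch below: guards are tried in order and the first one that accepts the value wins, with a fall-through at the end. The guard bodies, the use of undefined to stand in for None, and the name classifyWithGuards are all assumptions for illustration.

```javascript
// Hypothetical guards: each returns the refined value, or undefined when
// the input does not belong to that case (standing in for option's None).
const guards = [
  [
    "Point",
    (v) =>
      typeof v === "object" &&
      v !== null &&
      typeof v.x === "number" &&
      typeof v.y === "number"
        ? v
        : undefined,
  ],
  ["PositiveInt", (v) => (Number.isInteger(v) && v > 0 ? v : undefined)],
  ["NonEmptyArray", (v) => (Array.isArray(v) && v.length > 0 ? v : undefined)],
];

// Try each guard in order; the first match decides the case.
function classifyWithGuards(value) {
  for (const [name, guard] of guards) {
    const refined = guard(value);
    if (refined !== undefined) return { name, value: refined };
  }
  return { name: "Fallthrough", value };
}

console.log(classifyWithGuards({ x: 1, y: 2 }).name); // Point
console.log(classifyWithGuards(-5).name); // Fallthrough
```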

mouton · March 29, 2023, 4:12pm

@zth your write up (nice!) suggests using typeof to differentiate but to be clear it’s not only typeof checks, is that right? Can you add some description of how record types are handled?

Thanks
Alex

hoichi · March 29, 2023, 4:42pm

Type guards are interesting, but they’re more verbose, and I do remember having problems wrapping my head around active patterns.

typeof is a crude way to distinguish types. null becomes object

I’m not sure that the proposal necessarily limits the runtime to using typeof. There could be other runtime checks, as long as it doesn’t lead to bloat.

Alternatively, ReScript could forbid nulls (or objects) in untagged variants.

cristianoc · March 29, 2023, 5:22pm

The community member brings up some valid concerns regarding the limitations of using typeof for type distinction, and the suggestion of using guards in the definition of untagged union types is interesting. The proposed approach with guards provides a more flexible way to handle type matching and allows users to define custom conditions for each case in the untagged union.

However, there are some points to consider with this approach:

Using guards may introduce additional complexity to the language, and for users who are less familiar with the concept, it could be harder to grasp compared to the more straightforward typeof checks.

Implementing guards may require changes to the language and compiler to support the new syntax and semantics, which could have broader implications.

The performance characteristics of using guards may be different from those of using typeof checks. Depending on the implementation, guards could potentially be slower or lead to code bloat.

Another point worth noting is that the current design of untagged union types excludes functions as payloads. This is due to the complexity and potential issues that could arise when trying to identify and match functions using runtime checks like typeof. This exclusion should be taken into account when evaluating the proposed design and considering alternative approaches, such as the one using guards.

cristianoc · March 29, 2023, 5:27pm

Here’s the updated clarification to add to the document:

In the proposed design, we rely primarily on typeof checks for differentiating between types in untagged unions. With the restriction of having at most 1 record and 1 array in a union, typeof checks are sufficient for most cases. However, there is an exception when it comes to distinguishing between null and an object, as typeof null returns "object". To handle this case, we include an additional check for null values in our runtime checks.

By adding this clarification, the community will have a better understanding of how the proposed design handles various types and the role of runtime checks in maintaining type safety and accurate pattern matching.
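A minimal sketch of the runtime checks described in that clarification, assuming a union with at most one record and one array member. The explicit null test comes from the text above; using Array.isArray to tell the array from the record is my assumption, and classifyRuntime is a hypothetical name.

```javascript
// Hypothetical runtime classifier for an untagged union with at most one
// record and one array member. Array.isArray is an assumption here; the
// post only mentions typeof plus an explicit null check.
function classifyRuntime(value) {
  if (value === null) return "null"; // typeof null is "object", so test first
  if (Array.isArray(value)) return "array"; // arrays also report "object"
  const tag = typeof value;
  return tag === "object" ? "record" : tag;
}

console.log(classifyRuntime(null)); // null
console.log(classifyRuntime([1, 2])); // array
console.log(classifyRuntime({ x: 1 })); // record
console.log(classifyRuntime("a")); // string
```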

jmagaram · March 29, 2023, 5:37pm

Here is an example that uses both typeof and more flexible ways to classify. I can’t speak about additional complexity in the language or compiler. But @cristianoc that’s what you’re getting paid the big bucks for! I didn’t know about typeof until a few days ago. When I saw that null goes to object I laughed, like when I learned that JavaScript had BOTH null and undefined. So if I saw such a new language feature in ReScript and realized it was limited to classifying things using typeof I would find that confusing.

open ReScriptStruct
type point = {x: int, y: int}
let pointStruct = S.object(o => {
  x: o->S.field("x", S.int()),
  y: o->S.field("y", S.int()),
})
let toPoint = p => p->S.parseWith(pointStruct)->Result.okToOption

external toFloat: 'a => float = "%identity"
let toNumber = n => Type.typeof(n) == #number ? Some(toFloat(n)) : None

type thing =
  |: Float(toNumber) // My own guard using `typeof`
  |: Bool(Type.toBool) // Built-in type guard in `Type` module that uses `typeof`
  |: Point(toPoint) // Uses JSON parsing library

Ryan · March 29, 2023, 5:44pm

Looks interesting. So the main use case seems to be binding/FFI for heterogeneous return values, without having to manually write tedious and error prone runtime type-checking code right? If so, and if the rescript type checker will reject any untagged variants that can’t successfully be checked at runtime, then this feature sounds pretty nice.

Edit: e.g., this proposal would be a lot more convenient than manually writing this type of code over and over: Union types in BuckleScript | ReScript Blog

Ryan · March 29, 2023, 5:46pm

No idea if this proposal went anywhere, but there’s an interesting similar discussion regarding untagged unions for PureScript as well: Untagged union types - Change Proposals - PureScript Language Forum. Looks like it is mainly for ease of FFI as well.

sprkv5 · March 29, 2023, 6:07pm

The first worry that comes to my mind is the syntax. In my mind, it should not feel “out of place” compared to JavaScript (and family) or ReScript syntax. The |: operator, in my opinion, is a departure from both JavaScript and ReScript syntaxes.

In the JavaScript family we have the Flow, TypeScript, Google Closure, and JSDoc ways of specifying a union type.

// Flow & TypeScript
boolean | string

// Google closure-compiler
(boolean | string)

// JSDoc
{(number|boolean)}
 
// ReScript syntax today
type variant =  Boolean(bool) | String(string)
type polymorphicVariant = [ #Boolean(bool) | #String(string) ]

Proposed alternative syntax like a closed polymorphic variant:

// Option 1: parenthesized like Google closure-compiler, i.e. a closed polymorphic variant with parens instead of square brackets
type untaggedUnion = (bool | string)
// pattern matching example using the parens syntax proposal
let value: untaggedUnion = false
let isBool = switch value {
| (b: bool) => true
| (s: string) => false
}

// Option 2: like JSDoc but removing the extra parens
type untaggedUnion = {bool | string}
// pattern match for optional record fields uses {} so it is a conflict

sprkv5 · March 29, 2023, 6:48pm

I think the lead author might have a good point.

The other day I was trying to handle union types in a binding to a TypeScript function, and the first thing I reached for (without looking at the docs) was the @unboxed decorator. I tried to write code that looked like this:

// this code does not compile :)
@unboxed
type context = Boolean(bool) | String(string)

type options = {
  charset: string,
  language: string,
  context: context
}

external makeT: options => t = "someFunction"

But I was disappointed to discover in the docs that @unboxed is applicable only to variants with a single constructor that has a single payload. Also, @unwrap can only be used with polymorphic variants.

My point is, when I was thinking of untagged unions, (aka unions from TypeScript), I thought of using the @unboxed decorator with a Variant.

Since the motivation is to work closely with JavaScript and TypeScript, I think we should explore whether we can simplify @unboxed and @unwrap for the end user. For me that would mean a single decorator that does what both of these do today, with its responsibility extended so that untagged variants are simply unboxed variants.

Since the development team did a really good job with the optional record fields (making it an option internally; it’s intuitive and I picked it up naturally), I believe we can take a look at simplifying the decorators.

DZakh · March 29, 2023, 8:24pm

null being treated as an object is not the only odd thing in JavaScript. For example, typeof NaN is "number", which is rarely what we want.
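To make the NaN point concrete, here is a small TypeScript sketch. The helper name `isUsableNumber` is hypothetical and not part of the proposal; it only shows that a typeof check alone accepts NaN and that an extra `Number.isNaN` test is one way to exclude it.

```typescript
// Hypothetical guard; the name is illustrative only.
function isUsableNumber(value: unknown): value is number {
  // typeof NaN is "number", so typeof alone does not rule NaN out.
  return typeof value === "number" && !Number.isNaN(value);
}

// A typeof-only guard, for comparison: it accepts NaN.
function isNumberByTypeof(value: unknown): value is number {
  return typeof value === "number";
}
```

Whether NaN should be rejected is a design choice: a guard that excludes it no longer matches the full float type on the ReScript side.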

DZakh · March 29, 2023, 8:32pm

jmagaram:

open ReScriptStruct
type point = {x: int, y: int}
let pointStruct = S.object(o => {
  x: o->S.field("x", S.int()),
  y: o->S.field("y", S.int()),
})
let toPoint = p => p->S.parseWith(pointStruct)->Result.okToOption

external toFloat: 'a => float = "%identity"
let toNumber = n => Type.typeof(n) == #number ? Some(toFloat(n)) : None

type thing =
  |: Float(toNumber) // My own guard using `typeof`
  |: Bool(Type.toBool) // Built-in type guard in `Type` module that uses `typeof`
  |: Point(toPoint) // Uses JSON parsing library

In the code you’ve shown it’s actually better to do everything with rescript-struct instead of combining the data modeling (that should be the domain of our application) together with mapping (that should be outside of the domain of our application).

// Thing.res
type point = {x: int, y: int}
type t = Float(float) | Bool(bool) | Point(point)

// Api.res
let thingStruct: S.t<Thing.t> = S.union([
  S.float()->S.transform(f => Float(f)),
  S.bool()->S.transform(b => Bool(b)),
  S.object(o => Point({
    x: o->S.field("x", S.int()),
    y: o->S.field("y", S.int()),
  }))
])

let loadThing = GetRequest.make(~url="localhost:3000", ~bodyStruct=S.unit(), ~dataStruct=thingStruct)

loadThing->Promise.thenResolve(thing => {
  Js.log(thing)
})->ignore

That’s more flexible and correct from the application design perspective.

DZakh · March 29, 2023, 8:35pm

What I mean is that mappers shouldn’t belong to the domain code, where the type is defined.

cristianoc · March 29, 2023, 10:23pm

I have pasted the readme of rescript-struct in the lead author’s chat input and asked for a comparison. Based exclusively on that information, this is how the lead author commented:

In the Related Work section, ReScript Struct could be introduced as follows:

ReScript Struct: A powerful and flexible library for the ReScript language that provides a safe and convenient way to parse and serialize data with transformation into ReScript data structures. The library offers a range of core API functions for parsing and serializing, as well as factory functions for common data types such as strings, booleans, and integers, along with refinements and utility methods for specific types. ReScript Struct ensures type safety and allows developers to define custom constraints and transformations for their data structures, making it a relevant and valuable tool for working with complex data types and pattern matching. Although ReScript Struct does not focus specifically on untagged variants, it showcases the potential of using structured data representations to facilitate type-safe and efficient data manipulation in a functional programming language.

When comparing ReScript Struct to untagged variants, one noticeable difference lies in the runtime checks performed. ReScript Struct carries out runtime checks and validations to ensure type safety and adherence to constraints defined by the developer. This ensures that the parsed and serialized data is valid according to the specified rules, and any non-compliant data is reported as an error. This approach allows ReScript Struct to maintain type safety and provide a convenient way to work with structured data in a functional programming context.

On the other hand, untagged variants with high-level pattern matching rely on type analysis and optimized pattern matching to achieve safety and efficiency. This approach minimizes the need for runtime checks, as the type information is used at compile-time to generate efficient code for pattern matching. This results in potentially faster execution and reduced overhead compared to a solution that depends on runtime type checks and validations.

In summary, ReScript Struct and untagged variants differ in their approaches to ensuring type safety and efficient data manipulation. ReScript Struct relies on runtime checks and validations, while untagged variants leverage compile-time type analysis and optimized pattern matching. Both methods offer valuable solutions for working with complex data types in functional programming languages but cater to different requirements and performance trade-offs.

Hongbo · March 30, 2023, 3:09am

How do you deal with generics like this?

type result<'a, 'b> = Ok('a) |: Error('b)