Introduction
- Brief introduction to ReScript
- Motivation for untagged variants
- High-level overview of the proposal
Detailed Design
New Type Constructors
- Explanation of the |: operator for untagged variants
- Example of an untagged union type definition
Type Inference and Pattern Matching
- Pattern matching syntax for untagged variants
- Compilation to JavaScript and typeof checks
- Type inference and type safety guarantees
Handling Unknown Values
- Introducing the unknown type
- Safely working with unknown values (e.g., for logging)
Example Use Case
- A complete example demonstrating how to use untagged variants in ReScript
- Handling different cases, such as strings and unknown values
Limitations and Considerations
- Situations where untagged variants may not be the best choice
- Performance implications, if any
Conclusion
- Recap of the untagged union proposal
- Potential benefits for the ReScript community
Introduction
ReScript is a statically typed programming language that compiles to highly readable and efficient JavaScript. One of its core goals is to provide seamless interoperation with existing JavaScript code and TypeScript type definitions. However, ReScript’s current union type implementation relies on tagged variants, which do not always align with the way JavaScript libraries and TypeScript definitions represent unions.
This document presents a proposal for introducing untagged variants to ReScript, enabling developers to work more closely with JavaScript conventions and TypeScript type definitions. Untagged variants allow ReScript to represent a union of different types without the need for a tag or a constructor to differentiate the types at runtime. This feature will simplify the handling of union types in ReScript, improving both ergonomics and code generation.
The proposal includes a detailed design of the new type constructors, type inference, pattern matching, and handling of unknown values. We will also provide a comprehensive example that demonstrates the use of untagged variants and discuss some of the limitations and considerations associated with the feature.
By extending ReScript with untagged union support, we aim to enhance the language’s compatibility with JavaScript and TypeScript ecosystems while maintaining its core principles of type safety and performance.
New Type Constructors
The proposed design introduces a new syntax for defining untagged variants using the |: operator. This operator allows the declaration of a union type without requiring a tag or a constructor to differentiate the types at runtime. Here’s how the syntax works:
type untaggedUnion = TypeA |: TypeB
In this example, untaggedUnion represents an untagged union of TypeA and TypeB. Unlike tagged variants, there’s no need for a constructor to differentiate between the types; instead, ReScript will rely on JavaScript’s built-in typeof operator during pattern matching to distinguish between the different types within the union.
The |: operator can be used to define untagged variants with more than two types as well:
type anotherUntaggedUnion = TypeA |: TypeB |: TypeC
This new syntax provides a straightforward and concise way to define untagged variants in ReScript, enabling developers to work more closely with JavaScript conventions and TypeScript type definitions. It also ensures that the generated JavaScript code remains efficient and readable.
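To make the contrast with tagged variants concrete, the following JavaScript sketch places the two runtime shapes side by side, with string and number standing in for TypeA and TypeB. The { TAG, _0 } object layout is only an illustrative stand-in for a tagged encoding, and the names describeTagged and describeUntagged are hypothetical; the code the compiler actually emits may differ.

```javascript
// Tagged encoding (illustrative layout): each value is wrapped in an
// object whose TAG field records which constructor built it.
const taggedString = { TAG: "StringValue", _0: "hello" };
const taggedNumber = { TAG: "NumberValue", _0: 42 };

function describeTagged(value) {
  // Dispatch on the explicit tag, then unwrap the payload.
  switch (value.TAG) {
    case "StringValue":
      return "string: " + value._0;
    case "NumberValue":
      return "number: " + value._0;
    default:
      throw new Error("unexpected tag");
  }
}

// Untagged encoding: the payload is the value itself, so a value that
// comes from plain JavaScript needs no wrapping.
function describeUntagged(value) {
  // Dispatch on the runtime type reported by typeof.
  if (typeof value === "string") {
    return "string: " + value;
  }
  return "number: " + value;
}
```

In the untagged form, a string received from a JavaScript library can be passed to describeUntagged directly, whereas the tagged form requires the caller to build the wrapper object first.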
In the next sections, we’ll discuss how this new syntax interacts with ReScript’s type inference and pattern matching features to provide a seamless and type-safe experience when working with untagged variants.
Type Inference and Pattern Matching
One of the strengths of ReScript is its powerful type inference system, which allows the language to deduce the types of expressions without explicit type annotations. With the introduction of untagged variants, the type inference system must be adapted to handle these new types effectively.
When working with untagged variants, ReScript’s pattern matching syntax remains mostly unchanged. However, the compilation process will now generate JavaScript code that uses the typeof operator to perform type checks, ensuring that the correct case is executed based on the input’s runtime type.
Here’s an example of pattern matching with an untagged union:
type maybeString = StringValue(string) |: UnknownValue(unknown)
let process = (input: maybeString) => {
  switch input {
  | StringValue(str) => Js.log2("String:", str)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}
In this example, the process function takes an input of type maybeString, which is an untagged union of string and unknown. The switch expression uses pattern matching to handle both cases:
- When the input is a string, the StringValue(str) case is executed.
- When the input is of any other type, the UnknownValue(value) case is executed.
The type inference system will ensure that the correct type is associated with the bound variable (e.g., str in the StringValue case) within each branch of the pattern matching expression.
When the ReScript code is compiled to JavaScript, the generated code will use the typeof operator to perform the necessary type checks:
function process(input) {
  if (typeof input === "string") {
    console.log("String:", input);
  } else {
    console.log("Unknown value:", unknownToString(input));
  }
}
As you can see, the JavaScript code maintains readability and efficiency by leveraging the native typeof operator for type checks.
The type inference system in ReScript ensures that the untagged union types are propagated correctly through the program. This guarantees that the pattern matching expressions will provide type safety while working with untagged variants.
In the next section, we will discuss how to handle unknown values within untagged variants and how to safely work with them, for example, in logging scenarios.
Handling Unknown Values
When working with untagged variants, it’s possible that a value might not match any of the expected types. In such cases, it’s important to provide a safe and convenient way to handle these unknown values.
In ReScript, the unknown type is used to represent values of an indeterminate type. To handle unknown values safely, we can provide utility functions that perform type-safe operations on the unknown values. One common use case is converting an unknown value to a string representation for logging purposes.
Here’s an example of a utility function that safely converts an unknown value to a string:
let unknownToString = (value: unknown) => {
  switch value {
  | StringValue(str) => str
  | NumberValue(num) => Int.toString(num)
  | BoolValue(bool) => string_of_bool(bool)
  | UnknownValue(_) => "<unknown>"
  }
}
In this example, unknownToString takes an unknown value as input and uses pattern matching to determine its type. For each known type, the function returns the appropriate string representation. If the value does not match any of the known types, it returns the generic string “<unknown>”.
This utility function allows you to work with unknown values safely, ensuring that only well-defined operations are performed on the input. You can use this function, for example, when logging the value of an unknown type:
let process = (input: maybeString) => {
  switch input {
  | StringValue(str) => Js.log2("String:", str)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}
In the process function, if the input is of an unknown type, the UnknownValue(value) case is executed. The unknownToString function is called with the bound value to obtain a string representation, which is then logged to the console.
Using utility functions like unknownToString provides a safe and flexible way to handle unknown values within untagged variants. By following this pattern, you can create similar utility functions for other generic operations that need to be performed on unknown values, ensuring type safety and proper handling of various cases.
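As a rough illustration of how the four cases of unknownToString could map onto runtime checks, here is a hand-written JavaScript sketch. It is an assumption about the shape of the output, not the compiler's actual code: each known case becomes a typeof check and the catch-all case becomes the final fallback.

```javascript
// Hypothetical hand-written counterpart of unknownToString.
function unknownToString(value) {
  if (typeof value === "string") {
    // StringValue case: the string is returned unchanged.
    return value;
  }
  if (typeof value === "number") {
    // NumberValue case: convert the number to its decimal text.
    return String(value);
  }
  if (typeof value === "boolean") {
    // BoolValue case.
    return value ? "true" : "false";
  }
  // UnknownValue case: everything that matched none of the checks above.
  return "<unknown>";
}

console.log(unknownToString("Apple")); // prints "Apple"
console.log(unknownToString(null));    // prints "<unknown>"
```

Values such as null, undefined, arrays, and objects all reach the fallback, which is what makes the function safe to call on arbitrary input.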
In summary, the proposed design for untagged variants in ReScript enables a seamless integration with JavaScript’s dynamic type system while preserving the type safety and pattern matching capabilities that ReScript developers appreciate. This approach simplifies working with TypeScript type definitions and enhances the interoperability between ReScript and JavaScript codebases.
Example Use Case
In this section, we will explore an example use case that demonstrates the benefits of using untagged variants in ReScript.
Consider a scenario where you are building a web application that fetches data from a third-party API. The API returns a heterogeneous list of items, where each item can be either a string or a number. The goal is to process this list and perform different actions based on the item’s type.
First, let’s define an untagged union type to represent the items in the list:
type listItem = StringValue(string) |: NumberValue(number)
Now, we will define a function to process a single item:
let processItem = (item: listItem) => {
  switch item {
  | StringValue(str) => Js.log2("String:", str)
  | NumberValue(num) => Js.log2("Number:", num)
  }
}
The processItem function takes a listItem as input and uses pattern matching to handle the different cases. When the input is a string, it logs the string value. When the input is a number, it logs the number value.
Next, we will define a function to process the entire array of items:
let processArray = (items: array<listItem>) => {
  items->Array.forEach(processItem)
}
The processArray function takes an array of listItem values and iterates through the array, calling the processItem function for each item. The Array.forEach function is a built-in ReScript function that takes an array and a function as its arguments and applies the function to each element in the array.
Now, let’s simulate fetching the data from the API and processing the list:
let apiData = [StringValue("Apple"), NumberValue(42), StringValue("Banana"), NumberValue(3)]
processArray(apiData)
The apiData array contains a mix of string and number values. We pass this array to the processArray function, which in turn calls the processItem function for each item in the array.
When compiled to JavaScript, the generated code uses the native typeof operator to perform type checks:
function processItem(item) {
  if (typeof item === "string") {
    console.log("String:", item);
  } else {
    console.log("Number:", item);
  }
}
function processArray(items) {
  items.forEach(processItem);
}
const apiData = ["Apple", 42, "Banana", 3];
processArray(apiData);
As you can see, the compiled JavaScript code is clean and efficient, relying on the typeof operator to differentiate between string and number values.
This example demonstrates how untagged variants in ReScript can simplify working with heterogeneous data structures and improve the interoperability between ReScript and JavaScript. By using untagged variants, developers can leverage ReScript’s type safety and pattern matching capabilities while benefiting from JavaScript’s dynamic type system.
Continuing the example use case, let’s consider a situation where the API might also return unknown values, and we want to handle them gracefully. We can update our listItem type definition to include an unknown type:
type listItem = StringValue(string) |: NumberValue(number) |: UnknownValue(unknown)
We will now update the processItem function to handle the case where the item is of an unknown type:
let processItem = (item: listItem) => {
  switch item {
  | StringValue(str) => Js.log2("String:", str)
  | NumberValue(num) => Js.log2("Number:", num)
  | UnknownValue(value) => Js.log2("Unknown value:", unknownToString(value))
  }
}
In this updated version of the processItem function, we added a new case for UnknownValue(value). When an item is of an unknown type, we call the unknownToString function to obtain a string representation of the value and log it to the console.
Let’s simulate fetching the data from the API again, this time with an unknown value included:
let apiData = [StringValue("Apple"), NumberValue(42), StringValue("Banana"), NumberValue(3), UnknownValue(Js.Nullable.null)]
processArray(apiData)
The apiData array now contains a mix of string, number, and unknown values. We pass this array to the processArray function, which in turn calls the processItem function for each item in the array.
When compiled to JavaScript, the generated code uses the typeof operator to perform type checks for strings and numbers, and a final fallback branch for unknown values:
function unknownToString(value) {
  return String(value);
}
function processItem(item) {
  if (typeof item === "string") {
    console.log("String:", item);
  } else if (typeof item === "number") {
    console.log("Number:", item);
  } else {
    console.log("Unknown value:", unknownToString(item));
  }
}
function processArray(items) {
  items.forEach(processItem);
}
const apiData = ["Apple", 42, "Banana", 3, null];
processArray(apiData);
As you can see, the compiled JavaScript code handles unknown values by calling the unknownToString function, which converts the unknown value to a string representation. This approach ensures that the application can gracefully handle unexpected data while still benefiting from the safety and expressiveness of ReScript’s type system and pattern matching capabilities.
In summary, this extended example demonstrates how untagged variants in ReScript can be used to work with heterogeneous data structures, including cases where some values might be unknown. By using untagged variants, developers can write clean, efficient, and safe code that leverages the strengths of both ReScript and JavaScript.
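Since the scenario starts from data fetched from an API, it may help to see the same branching applied to a parsed JSON payload. The sketch below is illustrative only: responseBody is a made-up payload, and labelItem is a hypothetical variant of processItem that returns a label instead of logging, so that the result can be inspected.

```javascript
// Hypothetical response body; a real API call would supply this string.
const responseBody = '["Apple", 42, "Banana", 3, null]';

// Same branching as processItem, but returning a label instead of logging.
function labelItem(item) {
  if (typeof item === "string") {
    return "String: " + item;
  }
  if (typeof item === "number") {
    return "Number: " + item;
  }
  // Fallback for values that are neither strings nor numbers.
  return "Unknown value: " + String(item);
}

// JSON.parse yields plain JavaScript values, which is exactly the
// representation the untagged cases use, so no conversion step is needed.
const labels = JSON.parse(responseBody).map(labelItem);
console.log(labels);
```

Because the parsed values already have the untagged runtime representation, the array can be handed to the processing code without first wrapping each element in a constructor.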
Limitations and Considerations
While untagged variants provide a more convenient way to work with heterogeneous data structures in ReScript, there are some limitations and considerations that developers should be aware of:
- Overlapping types
When working with untagged variants, special care must be taken if the union contains overlapping types. A union of string and number is unproblematic, because the generated JavaScript code can tell the two apart with the typeof operator. However, if the union contains types that typeof cannot distinguish, the wrong branch may be selected, leading to unexpected behavior or runtime errors.
For example, if the union contains both string and Js.Nullable.t<string>, the generated JavaScript code cannot distinguish the two accurately, because typeof returns “string” both for a plain string and for a non-null nullable string.
- Limited to JavaScript’s runtime type information
Since untagged variants rely on JavaScript’s runtime type information, they are limited by the types that can be reliably distinguished at runtime. For example, distinguishing between custom types or complex data structures might not be possible using untagged variants.
- Type safety trade-offs
Using untagged variants involves some trade-offs in terms of type safety. While pattern matching ensures that all cases are handled, the absence of tags in the runtime representation might lead to subtle bugs if the types within the union are not properly distinguished.
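The limits on overlapping types and runtime type information described above can be observed directly in JavaScript. The following sketch only records what typeof reports for a handful of values; the variable names are illustrative.

```javascript
const plainString = "abc";
const nullableString = "abc"; // a non-null value of a nullable string type
const missing = null;         // the null value of a nullable string type

const kinds = {
  plainString: typeof plainString,       // "string"
  nullableString: typeof nullableString, // "string", same as a plain string
  missing: typeof missing,               // "object"
  array: typeof [1, 2],                  // "object"
  record: typeof { a: 1 },               // "object"
};

console.log(kinds);
```

The first two entries show the overlap problem: nothing at runtime separates a plain string from a non-null nullable string. The last three show that null, arrays, and objects all report “object”, so typeof alone cannot tell such cases apart.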
Performance implications
The performance of untagged variants depends on the generated JavaScript code and the JavaScript engine’s ability to optimize the code. In some cases, using untagged variants might lead to slightly faster execution times, as the JavaScript engine can directly use the typeof operator or other built-in checks without the need for additional tag comparisons.
However, the performance difference between tagged and untagged variants is likely to be minimal in most cases. Modern JavaScript engines are highly optimized and can often handle tagged variants efficiently. Additionally, ReScript’s compiler is designed to produce efficient JavaScript code, so the performance impact of using tagged variants might be negligible.
It is important to note that the performance characteristics of untagged variants may vary depending on the specific use case and the types involved in the union. When considering untagged variants for performance reasons, it is recommended to benchmark and compare the performance of both tagged and untagged variants in the context of the specific application.
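A starting point for such a benchmark might look like the sketch below. It is a deliberately simple micro-benchmark under stated assumptions: the { TAG, _0 } layout is an illustrative tagged encoding rather than the compiler's confirmed output, the helper names are hypothetical, and a single timed pass like this is noisy, so the numbers should be treated as a rough signal at best.

```javascript
// Build the same data in both encodings.
const size = 100000;
const untagged = Array.from({ length: size }, (_, i) =>
  i % 2 === 0 ? "s" + i : i
);
const tagged = untagged.map((v) =>
  typeof v === "string"
    ? { TAG: "StringValue", _0: v }
    : { TAG: "NumberValue", _0: v }
);

// Both functions add string lengths and numbers, so the work is comparable.
function sumUntagged(items) {
  let total = 0;
  for (const item of items) {
    total += typeof item === "string" ? item.length : item;
  }
  return total;
}

function sumTagged(items) {
  let total = 0;
  for (const item of items) {
    total += item.TAG === "StringValue" ? item._0.length : item._0;
  }
  return total;
}

// Time one pass of fn over input and return its result.
function timeIt(label, fn, input) {
  const start = performance.now();
  const result = fn(input);
  const elapsed = performance.now() - start;
  console.log(label, elapsed.toFixed(2) + " ms");
  return result;
}

const untaggedTotal = timeIt("untagged", sumUntagged, untagged);
const taggedTotal = timeIt("tagged", sumTagged, tagged);
```

For a meaningful comparison, the measurement should be repeated many times after a warm-up phase and run on data that resembles the real application, since JIT compilation and garbage collection can dominate a single pass.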
In summary, while untagged variants might offer some performance benefits in certain situations, the primary motivation for using them is to improve the ergonomics of working with heterogeneous data structures in ReScript. Developers should carefully consider the implications of using untagged variants in performance-critical scenarios.
Related Work
The concept of untagged variants with high-level pattern matching is not entirely novel. Several programming languages and libraries have explored untagged variants and pattern matching, with different degrees of support for type safety and expressiveness, and this section discusses them. However, the specific combination of untagged variants, type variables, and high-level pattern matching presented in this proposal seems to be less common.
- Rust: Rust’s enum is a tagged union that provides a way to define custom data types with multiple variants. Rust also supports pattern matching with the match keyword, allowing for expressive and type-safe destructuring of these custom data types. However, Rust’s enums are tagged, which means the variants are explicitly labeled.
- Haskell: Haskell’s algebraic data types (ADTs) are a powerful way to define custom data types with multiple constructors, and Haskell’s pattern matching syntax is highly expressive. Although Haskell’s ADTs are not untagged variants, they demonstrate the power of combining custom data types with pattern matching.
- TypeScript: TypeScript, as a superset of JavaScript, supports untagged variants through its union types. However, TypeScript’s support for pattern matching is limited to runtime type checks using the typeof and instanceof operators or user-defined type guards. This approach is less expressive and type-safe compared to high-level pattern matching, and it does not provide the same level of support for type variables in untagged variants.
- Polymorphic Variants (OCaml): OCaml supports polymorphic variants, which are more flexible than traditional algebraic data types. They allow for extensible and more precise type information, but they are boxed when carrying a payload, which can introduce some runtime overhead.
- Scala: Scala’s case classes and sealed traits provide a way to define custom data types with multiple cases, and Scala’s pattern matching using the match keyword is highly expressive. While not untagged variants, this combination demonstrates a powerful way to work with custom data types and pattern matching.
- ATS Programming Language: ATS is a statically-typed programming language that unifies implementation with formal specification. It has a feature called “views,” which allows for more fine-grained control over memory layout and representation, similar to the idea of untagged variants. Views provide a way to optimize runtime representation, but they do not directly address pattern matching.
- CDuce: CDuce is a functional programming language designed for XML processing that features a type system based on regular expression types. CDuce supports untagged union types, with pattern matching and more expressive type information. While CDuce’s focus is on XML processing, its treatment of untagged variants and pattern matching is somewhat similar to the untagged variants proposal discussed here.
- Publication: “Pattern Matching with First-Class Polymorphism”: This paper by Garrigue and Rémy (2013) proposes a generalization of polymorphic variants in OCaml that allows for first-class polymorphism in pattern matching. The proposed system enables more expressive pattern matching and a more efficient runtime representation. While not directly the same as the untagged variants proposal, the ideas in this paper offer a related approach to enhancing pattern matching and optimizing runtime representation.
In summary, several programming languages and libraries have explored ideas similar to untagged variants with high-level pattern matching, each with different trade-offs in expressiveness, type safety, and support for type variables. The specific combination of features in this proposal appears to be less common, which may offer a unique perspective and opportunities for further exploration in programming language design.
Conclusion
Untagged variants provide a valuable addition to the ReScript language, allowing developers to work with heterogeneous data structures more conveniently and efficiently. By leveraging JavaScript’s runtime type information, untagged variants enable cleaner, safer, and potentially faster code generation without sacrificing the benefits of ReScript’s type system and pattern matching capabilities.
In this document, we have outlined the design of untagged variants in ReScript, including new type constructors, type inference and pattern matching, handling unknown values, and example use cases. We have also discussed the limitations, considerations, and performance implications of using untagged variants.
While untagged variants do have some trade-offs in terms of type safety and overlapping types, their benefits in terms of ergonomics and compatibility with JavaScript and TypeScript make them a valuable feature for many developers. By carefully considering the specific requirements of their applications, developers can determine whether untagged variants are an appropriate solution for their needs.
Appendix: Investigation of Untagged Variants with Type Variables
In this appendix, we summarize our investigation of untagged variants with type variables. We explored how to extend the untagged union proposal to handle type variables and what limitations arise when dealing with them.
1. Pattern matching with type variables
When pattern matching with untagged variants that have type variables, the type inference mechanism will ensure that the correct type is inferred for each case. However, the compilation of pattern matching expressions must take into account the presence of type variables. This is because type variables can represent different types at runtime, and the generated JavaScript code needs to correctly handle these cases.
2. Limitations and handling multiple cases
In our investigation, we found that untagged variants can handle multiple cases (3 or 4 cases, for example), but at most one of them should use a type variable. This limitation stems from the fact that more than one case with type variables would require additional runtime information to disambiguate between cases, which goes against the principle of untagged variants.
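The reason for the one-type-variable limit can be sketched in JavaScript. The union below, StringValue(string) |: NumberValue(number) |: Other('a), and the function name matchOneVariable are hypothetical, chosen only to illustrate the point. The sketch also assumes the type variable is never instantiated to string or number, since such a value would be claimed by one of the earlier branches.

```javascript
// Hypothetical union: StringValue(string) |: NumberValue(number) |: Other('a)
// The two concrete cases get typeof checks. The type-variable case is
// whatever remains, so it needs no check of its own.
function matchOneVariable(value) {
  if (typeof value === "string") {
    return "StringValue";
  }
  if (typeof value === "number") {
    return "NumberValue";
  }
  return "Other";
}

console.log(matchOneVariable("x"));       // prints "StringValue"
console.log(matchOneVariable(7));         // prints "NumberValue"
console.log(matchOneVariable({ id: 1 })); // prints "Other"

// With two type-variable cases, for example Left('a) |: Right('b), no such
// function can be written: any runtime value could belong to either case,
// so there is nothing left for a check to test.
```

A single type-variable case works because it can act as the fallback branch, while a second one would need a tag to be told apart from the first.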
3. Practical examples
Two examples show untagged variants with type variables being useful in practice:
- A Result type with Ok and Error cases, where the Ok case has a type variable.
- An event handling system with ClickEvent, KeyEvent, and CustomEvent cases, where the CustomEvent case has a type variable.
These examples demonstrate that the expressivity of untagged variants with type variables can be beneficial in practical scenarios, offering flexibility when dealing with diverse data types and structures while maintaining simplicity and type safety.
In summary, pattern matching with untagged variants that include type variables introduces complexity to the compilation process. The compiler must generate JavaScript code that takes into account the variations in types represented by the type variables, ensuring correct type inference, type checking, and type coercion during runtime. By carefully addressing these challenges, it’s possible to create a robust and efficient implementation of pattern matching for untagged variants with type variables.
Authors: Cristiano C., Gabriel N., Intelligentia A.