Here’s my stab at a GADT tutorial, borrowing some examples from the github issue I hope this can be a starting point for some official documentation of GADTs. As I mention in the tutorial, GADTs are usually not necessary, but when they are necessary, they really help, so it would be great to get this finally documented.
I would greatly appreciate any feedback!
GADTs
Generalized Algebraic Data Types (GADTs) are an advanced feature of the type system. “Generalized” can be somewhat of a misnomer – what they actually allow you to do is add some extra type-specificity into your variants. Using a GADT, you can give the individual constructors of your variant different types.
For a quick overview of the use cases, reach for GADTs when:
- You need to distinguish between different members of a variant at the type-level
- You want to “hide” type information in a type-safe way, without resorting to casts.
- You need a function to return a different type depending on its input.
GADTs usually are overkill, but when you need them, you need them! Understanding them from first principles is difficult, so it is best to explain through some motivating examples.
Distinguishing Constructors (Subtyping)
Suppose a simple variant type that represents the current timezone of a date value. This handles both daylight savings and standard time:
type timezone =
| EST // standard time
| EDT // daylight time
| CST // standard time
| CDT // daylight time
// etc...
Using this variant type, we will end up having functions like this:
let convert_to_daylight = tz => {
switch tz {
| EST => EDT
| CST => CDT
| EDT | CDT /* or, _ */ => failwith("Invalid timezone provided!")
}
}
This function is only valid for a subset of our variant type’s constructors but we can’t handle this in a type-safe way using regular variants. We have to enforce that at runtime – and moreover the compiler can’t help us ensure we are failing only in the invalid cases. We are back to dynamically checking validity like we would in a language without static typing. If you work with a large variant type long enough, you will frequently find yourself writing repetitive catchall switch
statements like the above, and for little actual benefit. The compiler should be able to help us here.
Lets see if we can find a way for the compiler to help us out with normal variants. We could define another variant type to distinguish the two kinds of timezone.
type daylight_or_standard =
| Daylight(timezone)
| Standard(timezone)
This has a lot of problems. For one, it’s cumbersome and redundant. We would now have to pattern-match twice whenever we deal with a timezone that’s wrapped up here. The compiler will force us to check whether we are dealing with daylight or standard time, but notice that there’s nothing stopping us from providing invalid timezones to these constructors:
let invalid_tz1 = Daylight(EST)
let invalid_tz2 = Standard(EDT)
Consequently, we still have to write our redundant catchall cases. We could define daylight savings time and standard time as two separate types, and unify those in our daylight_or_standard
variant. That could be a passable solution, but that makes a distinction really would like to do is implement some kind of subtyping relationship. We have two kinds of timezone. This is where GADTs are handy:
type standard
type daylight
type rec timezone<_> =
| EST: timezone<standard>
| EDT: timezone<daylight>
| CST: timezone<standard>
| CDT: timezone<daylight>
We define our type with a type parameter. We manually annotate each constructor, providing it with the correct type parameter indicating whether it is standard or daylight. Each constructor is a timezone
,
but we’ve added another level of specificity using a type parameter. Constructors are now understood to be standard
or daylight
at the type level. Now we can fix our function like this:
let convert_to_daylight = tz => {
switch tz {
| EST => EDT
| CST => CDT
}
}
The compiler can infer correctly that this function should only take timezone<standard>
and only output
timezone<daylight>
. We don’t need to add any redundant catchall cases and the compiler will even error if
we try to return a standard timezone from this function. Actually, this seems like it could be a problem,
we still want to be able to match on all cases of the variant sometimes, and a naive attempt at this will not pass the type checker. A naive example will fail:
let convert_to_daylight = tz => {
switch tz {
| EST => EDT
| CST => CDT
| CDT => CDT
| EDT => EDT
}
}
This will complain that daylight
and standard
are incompatible. To fix this, we need to explicitly annotate to tell the compiler to accept both:
let convert_to_daylight : type a. timezone<a> => timezone<daylight> = // ...
type a.
here defines a locally abstract type which basically tells the compiler that the type parameter a is some specific type, but we don’t care what it is. The cost of the extra specificity and safety that GADTs give us is that the compiler is not able to help us with type inference as much.
This is just one of the use cases we listed above, and it’s a somewhat contrived example – you probably don’t want to roll your own timezone library. However, similar situations arise in other domains. Here are just a few concrete examples where this usage of GADTs is helpful:
- Encoding what kinds of state transitions are valid in a UI (or any state machine)
- Writing parsers (this is the archetypal use case for GADTs)
- Bindings to JS libraries that use inheritance/subtyping.
Hiding Type Information
Sometimes, we want to hide or abstract over some type information. This is especially the case when you have complex types with a lot of parameters that you don’t care about.
Suppose you need to pass an external library a function that returns “any react component.” A naive implementation will fail as different react components in Rescript will have different types unless their props are all the same. We don’t always want to care about this.
We could define an abstract type like this and cast to it to deal with this:
type anyComponent
external castToAny: React.componentLike<'a, 'b> => anyComponent = "%identity"
You could just make do with a typecast, and in simple cases that can be fine. But what if we could do more without bypassing the type system?
// the @unboxed here ensures that the runtime representation of an anyComponent is a React.componentLike
@unboxed type rec anyComponent = AnyComponent(React.componentLike<'a,b'>) : anyComponent```
This is a GADT with one constructor. Providing AnyComponent with any react component will hide the type parameters 'a
and 'b
from us and let us treat them as if they are the same. Of course, since we lost the type information that specifies how to call these components, we can no longer call them from Rescript. It is possible to recover this type information, but we will return to that in a later section.
Varying return type
Sometimes, a function should have a different return type based on what you give it, and GADTs are how we can do this in a type-safe way. We can implement a generic add
function that works on both int
or float
:
type rec number<_> = Int(int): number<int> | Float(float): number<float>
let add:
type a. (number<a>, number<a>) => a =
(a, b) =>
switch (a, b) {
| (Int(a), Int(b)) => a + b
| (Float(a), Float(b)) => a +. b
}
let foo = add(Int(1), Int(2))
let bar = add(Int(1), Float(2.0)) // the compiler will complain here
How does this work? The key thing is the function signature for add. The number GADT is acting as a type witness
. We have told the compiler that the type parameter for number
will be the same as the type we return – both are set to a
. So if we provide a number<int>
, a
equals int
, and the function will therefore return an int
.
We can also use this to avoid returning option
unnecessarily. This example is adapted from Real World Ocaml, chapter 9. We create an array searching function can be configured to either raise an exception, return an option
, or provide a default
value depending on the behavior we want.
module IfNotFound = {
type rec t<_, _> =
| Raise: t<'a, 'a>
| ReturnNone: t<'a, option<'a>>
| DefaultTo('a): t<'a, 'a>
}
let flexible_find:
type a b. (~f: a => bool, array<a>, IfNotFound.t<a, b>) => b =
(~f, arr, ifNotFound) => {
open IfNotFound
switch Array.find(arr, f) {
| None =>
switch ifNotFound {
| Raise => failwith("No matching item found")
| ReturnNone => None
| DefaultTo(x) => x
}
| Some(x) =>
switch ifNotFound {
| ReturnNone => Some(x)
| Raise => x
| DefaultTo(_) => x
}
}
}
Recover Type information Dynamically
A very advanced technique with GADTs is to combine GADTs as type witness with the above “type hiding” behavior to make it possible to recover type information that we previously hid. We do this by embedding a type-witness GADT inside of our type-hiding GADT. We can use this to write a
generic sum function over arrays of numbers:
type rec num_ty<'a> =
| Int: num_ty<int>
| Float: num_ty<float>
and num = Num(num_ty<'a>, 'a): num
and num_array = Narray(num_ty<'a>, array<'a>): num_array
let add_int = (x, y) => x + y
let add_float = (x, y) => x +. y
let sum = (Narray(witness, array)) => {
switch witness {
| Int => Num(Int, array->Array.reduce(0, add_int))
| Float => Num(Float, array->Array.reduce(0., add_float))
}
}