Unicode support in pattern matching

Unicode strings with backticks don’t seem to work as expected in pattern matching.

This code evaluates to false:

let test = switch `ü` {
  | `ü` => true
  | _ => false
}

Compiled JS output:

var test = "ü" === "\xc3\xbc" ? true : false;
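Judging from that output, the pattern literal seems to have been emitted as the two UTF-8 bytes of ü (0xC3 0xBC), each written as a separate Latin-1 escape, while the scrutinee stays a single character. A small plain-JavaScript sketch (only the two literals come from the output above; the variable names are illustrative) shows why the comparison can never succeed:

```javascript
// Scrutinee: the single precomposed character U+00FC ("ü").
const fromScrutinee = "\u00fc";
// Pattern as emitted by the compiler: two code units, U+00C3 ("Ã") and U+00BC ("¼").
const fromPattern = "\xc3\xbc";

console.log(fromScrutinee === fromPattern); // false
console.log(fromScrutinee.length, fromPattern.length); // 1 2

// The UTF-8 encoding of "ü" is exactly the byte sequence that ended up in the pattern.
const utf8Bytes = Array.from(new TextEncoder().encode(fromScrutinee));
const patternUnits = Array.from(fromPattern, (ch) => ch.charCodeAt(0));
console.log(utf8Bytes.join(",") === patternUnits.join(",")); // true
```

In other words, the pattern string looks like UTF-8 bytes reinterpreted as characters, rather than the character itself.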

In the release post for ReScript 9.1 it looks like this will be supported in a future release, so I suppose it is on the roadmap for v10.

I think this may work fine as a workaround until we have proper Unicode support:

let test = switch `ü` {
  | a if a == `ü` => true
  | _ => false
}

Is there a better approach or any problems with this one to be aware of?

[EDIT] As a side note: since this behaviour is (at least to me) unexpected and the compiler gives no warning, I wonder whether it should at least be mentioned in the docs, or (even better) whether backticks should be disallowed in patterns until Unicode is supported there too. Any opinions?

If you’re going to do switch something { | a if a == ... => ... } then you might as well just do if something == ... { ... }. In other words, pattern matching is overkill in this case.

Yes, this is a very simplified case, but my aim was to provide an example of a general rule for these kinds of situations. Another example would be pattern matching on lists of strings, where simple if statements would not work as well.

This is a bug and should be fixed in the next release. Can you file an issue on GitHub so we can keep track of it? Thanks.

Good to know it’ll be fixed, here is the issue: https://github.com/rescript-lang/rescript-compiler/issues/5290 (I hope it’s the correct repo)

Note that pattern matching on Unicode strings can be problematic because the same text can be encoded as different sequences of code points. One would need to normalize both the input and the string pattern (for example to NFC) before comparing them.
For example, "\u{0073}\u{0307}\u{0323}", "\u{0073}\u{0323}\u{0307}" and "\u{1e69}" all represent the same character, ṩ, but are encoded differently and would not be equal under the usual == operator.
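To illustrate the normalization point, here is a plain-JavaScript sketch using the standard String.prototype.normalize on the three encodings listed above. The helper isDottedS is a hypothetical name for illustration only; it shows the idea of normalizing both the input and the pattern before matching, not how the ReScript compiler handles (or will handle) patterns.

```javascript
// Three encodings of "ṩ" (s with dot below and dot above), taken from the post above.
const variants = [
  "\u{0073}\u{0307}\u{0323}", // s + combining dot above + combining dot below
  "\u{0073}\u{0323}\u{0307}", // s + combining dot below + combining dot above
  "\u{1e69}",                 // precomposed U+1E69
];

// Plain === compares code units, so the raw strings differ.
console.log(variants[0] === variants[1]); // false
console.log(variants[0] === variants[2]); // false

// NFC reorders the combining marks canonically and recomposes them,
// so all three collapse to the precomposed U+1E69.
const normalized = variants.map((s) => s.normalize("NFC"));
console.log(normalized.every((s) => s === "\u{1e69}")); // true

// Hypothetical helper: normalize both the input and the pattern before switching.
function isDottedS(input) {
  switch (input.normalize("NFC")) {
    case "\u{1e69}".normalize("NFC"):
      return true;
    default:
      return false;
  }
}

console.log(variants.every(isDottedS)); // true
```

Normalizing the pattern side as well (rather than hard-coding the precomposed form) keeps the comparison correct even if the pattern literal itself was typed in a decomposed form.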