Unicode support in pattern matching

Unicode strings with backticks don’t seem to work as expected in pattern matching.

This code will return false:

let test = switch `ü` {
  | `ü` => true
  | _ => false
}

Compiled JS output:

var test = "ü" === "\xc3\xbc" ? true : false;
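
The comparison above is false because the two literals are different strings: "\xc3\xbc" looks like the UTF-8 bytes of ü written out as two separate characters. A minimal JavaScript sketch to check that reading, assuming a runtime with the global TextEncoder (Node or a browser); the variable names are only illustrative:

```javascript
// "ü" written as an escape so the check does not depend on file encoding.
const source = "\u00fc";

// UTF-8 bytes of "ü": expected to be 0xC3 0xBC.
const utf8Bytes = Array.from(new TextEncoder().encode(source));

// The pattern literal from the compiled output: one code unit per byte.
const emittedPattern = "\xc3\xbc";
const patternCodes = Array.from(emittedPattern, (ch) => ch.charCodeAt(0));

// Same numbers, but the pattern holds them as two characters, not one.
const sameBytes = utf8Bytes.join(",") === patternCodes.join(",");
const matches = source === emittedPattern;

console.log(source.length, emittedPattern.length, sameBytes, matches);
```

So the pattern seems to have been emitted byte by byte rather than as the single character U+00FC, which would explain why the first branch never matches.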

In the release post for ReScript 9.1 it looks like this will be supported in a future release, so I suppose it is on the roadmap for v10.

I think this may work fine as a workaround until we have proper unicode support:

let test = switch `ü` {
  | a if a == `ü` => true
  | _ => false
}

Is there a better approach or any problems with this one to be aware of?

[EDIT] As a side note: this behaviour is (at least to me) unexpected and the compiler gives no warning, so I wonder whether it should at least be mentioned in the docs or, even better, whether backticks should be disallowed in patterns until unicode is supported there too. Any opinions?

If you’re going to do switch something { | a if a == ... => ... } then you might as well just do if something == ... { ... }. In other words, pattern matching is overkill in this case.

Yes, this is a very simplified case, but my aim was to provide an example for a general rule for these kinds of situations. Another example would be pattern-matching in lists of strings, where simple if-statements would not work as well.

This is a bug and should be fixed in the next release. Can you file an issue on GitHub to keep track of it? Thanks.

Good to know it’ll be fixed, here is the issue: https://github.com/rescript-lang/rescript-compiler/issues/5290 (I hope it’s the correct repo)

Note that pattern matching on unicode can be problematic because the same character can be encoded in different ways. One would need to normalize both the input and the string pattern.
For example, "\u{0073}\u{0307}\u{0323}", "\u{0073}\u{0323}\u{0307}" and "\u{1e69}" all render as ṩ, but they are encoded differently and would not be equal with the usual operator ==.
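
To make the normalization point concrete, here is a small JavaScript sketch using the standard String.prototype.normalize. The helper name sameText is hypothetical, and NFC is just one reasonable choice of normalization form:

```javascript
// Three encodings of the same visible character.
const forms = [
  "\u{0073}\u{0307}\u{0323}", // s + combining dot above + combining dot below
  "\u{0073}\u{0323}\u{0307}", // s + combining dot below + combining dot above
  "\u{1e69}",                 // single precomposed code point
];

// Hypothetical helper: compare two strings after normalizing both to NFC.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Plain equality compares code units, so the three forms differ.
const rawEqual = forms.every((f) => f === forms[0]);

// After normalization all three compare equal.
const normalizedEqual = forms.every((f) => sameText(f, forms[0]));

console.log(rawEqual, normalizedEqual);
```

In a pattern match this would mean normalizing the input before the switch and writing the patterns in the same normalized form.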

4 Likes

I’m confused by the behavior of the compiler. If there is an escape sequence in a pattern-matching case, the compiled code has an extra \ (backslash).

ReScript code:

let s = `{let {[x 10]}
           {+ x 1}}`

let c = Js.String2.charAt(s, 1)
let r = switch c {
 | "\t"
 | "\r"
 | "\n" => ""
 | _ => c
}

Compiled JS code:

var s = "{let {[x 10]}\n           {+ x 1}}";

var c = s.charAt(1);

var r;

switch (c) {
  case "\\n" :
  case "\\r" :
  case "\\t" :
      r = "";
      break;
  default:
    r = c;
}

exports.s = s;
exports.c = c;
exports.r = r;
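
To show what the extra backslash changes, here is a short JavaScript sketch comparing the emitted cases with the ones the ReScript source presumably intended. The function names are hypothetical and only serve the comparison:

```javascript
// Cases as emitted by the compiler: "\\n" is a backslash followed by "n".
function classifyEmitted(c) {
  switch (c) {
    case "\\n":
    case "\\r":
    case "\\t":
      return "";
    default:
      return c;
  }
}

// Cases as written in the ReScript source: real control characters.
function classifyIntended(c) {
  switch (c) {
    case "\n":
    case "\r":
    case "\t":
      return "";
    default:
      return c;
  }
}

const emittedLength = "\\n".length;  // two characters: "\" and "n"
const intendedLength = "\n".length;  // one character: line feed

console.log(emittedLength, intendedLength);
console.log(JSON.stringify(classifyEmitted("\n")), JSON.stringify(classifyIntended("\n")));
```

With the emitted cases, a real newline, carriage return, or tab falls through to the default branch. In the example above c is "l" (index 1 of s), so r comes out the same either way; the difference only shows once c is an actual whitespace character.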

What version of ReScript do you have?

I use the latest version, 10.0.1, and I get the same result in the ReScript Playground.

Install 10.1.0. It’s an escaping issue in 10.0.x.

I installed 10.1.0, but it didn’t work for me. I then installed 9.1.0, which solved the problem.