In the benchmark, libraries need to decode a JSON object into a data record. It has a nested record and some primitive fields; the more times per second the object is decoded, the better.
type nested = {foo: string, num: float, bool: bool}

type data = {
  number: float,
  negNumber: float,
  maxNumber: float,
  string: string,
  longString: string,
  boolean: bool,
  deeplyNested: nested,
}
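Concretely, the score is roughly "how many times per second can a decode function run over one fixed input". A minimal sketch of such a loop (simplified; decode and the input values here are stand-ins, not the actual benchmark code):

// Minimal sketch of an ops/s benchmark loop (not the actual harness).
// `decode` is a stand-in for a library's decode function.
const decode = json => json;

const input = {
  number: 1,
  negNumber: -1,
  maxNumber: Number.MAX_VALUE,
  string: "short",
  longString: "Lorem ipsum ".repeat(100),
  boolean: true,
  deeplyNested: {foo: "bar", num: 1, bool: false},
};

const start = performance.now();
let runs = 0;
while (performance.now() - start < 1000) {
  decode(input);
  runs++;
}
console.log(runs, "ops/s");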
From my observations, a single object allocation results in a ~250_000 ops/s decrease in this kind of benchmark, while an additional if statement or function call during field decoding costs between 50_000 and 100_000 ops/s.
In this case decco is faster because it generates a decode function at compile time, with the logic for applying field decoders inlined. But the real question is why decco is only a few percent faster than other libraries, which combine decoders at runtime without the ability to inline everything.
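To show what "inlined" means here, a rough sketch of a fully generated decoder (heavily simplified, not decco's actual output): one flat function that reads every field directly instead of looping over a list of decoders.

// Rough sketch of an inlined decoder (simplified; not decco's real output).
function data_decode(json) {
  if (typeof json !== "object" || json === null) throw new Error("expected an object");
  if (typeof json.number !== "number") throw new Error("number: expected a float");
  if (typeof json.boolean !== "boolean") throw new Error("boolean: expected a bool");
  // ...same direct checks for the remaining fields and for deeplyNested...
  return {
    number: json.number,
    boolean: json.boolean,
    // ...remaining fields and the decoded nested record...
  };
}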
For instance, if rescript-json-combinators didn't allocate a record of type fieldDecoders = {optional, required} every time it decodes an object, it could be even faster than the current decco. In this benchmark that's two object allocations per run (one for the data record and one for the nested one), so about a 500_000 ops/s improvement.
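The fix I mean would look something like this (a hypothetical sketch with made-up names, not the library's actual internals):

// Hypothetical sketch: hoisting a per-call record allocation out of the hot path.
const requiredField = (dict, name) => dict[name];
const optionalField = (dict, name) => dict[name] ?? undefined;

// Before: a fresh fieldDecoders record is allocated for every decoded object.
function decodeObjectBefore(json, build) {
  return build({optional: optionalField, required: requiredField}, json);
}

// After: the record is allocated once and shared by all calls.
const fieldDecoders = {optional: optionalField, required: requiredField};
function decodeObjectAfter(json, build) {
  return build(fieldDecoders, json);
}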
Some libraries like rescript-jzon use the Js.Json API under the hood, which creates additional overhead; otherwise, rescript-jzon should theoretically be as fast as rescript-struct.
My rescript-struct, meanwhile, has a declarative API compared to rescript-json-combinators, which makes it almost impossible to improve performance even a little more in the current design: there are some unavoidable function calls and iterators.
So if you need to be really fast, you need to generate a decoding function beforehand with everything inlined. That's possible to do either at compile time, like decco does, or at runtime with eval/new Function, which nobody does yet, but I'm thinking about trying it.
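A minimal sketch of the runtime variant, assuming a tiny made-up schema format (field name to typeof tag); this is the idea, not an existing library's API:

// Hypothetical sketch of runtime code generation with new Function.
function compileDecoder(schema) {
  const checks = Object.entries(schema)
    .map(([field, type]) =>
      `if (typeof json[${JSON.stringify(field)}] !== ${JSON.stringify(type)}) ` +
      `throw new Error(${JSON.stringify(field + ": expected " + type)});`
    )
    .join("\n");
  // One flat function with every field check inlined, as if handwritten.
  return new Function("json", checks + "\nreturn json;");
}

const decodeNested = compileDecoder({foo: "string", num: "number", bool: "boolean"});
decodeNested({foo: "bar", num: 1, bool: true}); // ok
decodeNested({foo: "bar"}); // throws "num: expected number"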
And the problem with decco is that it uses the Js.Json API and doesn't inline a lot of code that could be inlined. For example, changing every Belt_Option.getWithDefault(Js_dict.get(dict$1, "fieldName"), null) in decco's generated decoder to dict$1["fieldName"] ?? null increased performance from 3_200_122 ops/s to 19_841_462 ops/s on my local machine. You can take a look at the spectypes benchmark; decco might be as fast.
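Side by side, the change inside the generated decoder is just this (the surrounding generated code is omitted, and value is a placeholder name):

// Before: two function calls per field read (decco's current output).
var value = Belt_Option.getWithDefault(Js_dict.get(dict$1, "fieldName"), null);

// After: a direct property access with a nullish fallback.
var value = dict$1["fieldName"] ?? null;

The second version drops two function calls per field read, which is exactly the kind of cost mentioned above.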
But all of these are micro-optimizations. 1_000_000 ops/s is already very fast, and sometimes it might be better to invest time in a more convenient API than in increasing benchmark numbers.