[ANN] rescript-struct@5.1 - Make JS/TS a full-fledged target

DZakh · October 12, 2023, 11:35am

Starting from this version, JS/TS API users are as important as ReScript ones. rescript-struct has a unique design combining good DX, a small JS footprint, and insane performance. I see how it can benefit not only the ReScript ecosystem but also become a good fit for performance-critical JS/TS projects.

To support the vision, I’ve split the documentation into 4 parts:

Now, each target audience will be able to find as detailed answers as possible without looking at the source code. Also, I’ve added the Table of contents section to improve your reading experience.

Comparison section

When talking about statistics, I prefer numbers over gut feeling. So, I’ve prepared a comparison section where I compare rescript-struct with the currently hyped Zod and Valibot libraries. I’ve tried to highlight the strong sides of each library and keep the comparison as honest as possible. Here’s the gist of it:

	rescript-struct@5.1.0	Zod@3.22.2	Valibot@0.18.0
Total size (minified + gzipped)	9.67 kB	13.4 kB	6.73 kB
JS/TS API example size (minified + gzipped)	5.53 kB	12.8 kB	965 B
Nested object parsing	153,787 ops/ms	1,177 ops/ms	3,562 ops/ms
Compile + Nested object parsing	54 ops/ms	110 ops/ms	1,937 ops/ms
Eval-free
Codegen-free (Doesn’t need compiler)
Ecosystem

Improved tree-shaking

When Valibot came out, it became a role model in terms of the JS footprint for parsing libraries. rescript-struct already has a similar modular design with small independent functions, but I’ve decided to put some effort into reducing its bundle size as much as possible. For the JS/TS API example, here are the size improvements in this release:

	V5.0.1	V5.1.0	Diff
JS/TS API example size	17.3 kB	15.3 kB	-2 kB
JS/TS API example size (minified + gzipped)	6.08 kB	5.53 kB	-0.55 kB

JS/TS API enrichment

`S.merge` (Not available for ReScript users)

You can add additional fields to an object schema with the merge function.

const baseTeacherStruct = S.object({ students: S.array(S.string) });
const hasIDStruct = S.object({ id: S.string });

const teacherStruct = S.merge(baseTeacherStruct, hasIDStruct);
type Teacher = S.Output<typeof teacherStruct>; // => { students: string[], id: string }

The function will throw if the structs share keys. The returned schema also inherits the “unknownKeys” policy (strip/strict) of the second struct passed to S.merge (hasIDStruct in the example above).
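
To make the shared-key rule concrete, here is a minimal sketch of the merge semantics in plain TypeScript. It is not how rescript-struct implements S.merge: `FieldMap` and `mergeFields` are hypothetical names, and field schemas are reduced to type-name strings.

```typescript
// Hypothetical stand-in for an object struct: field name -> type name.
type FieldMap = Record<string, string>;

// Combine two field maps and refuse to merge when they share a key,
// which mirrors the documented behaviour of S.merge.
function mergeFields(a: FieldMap, b: FieldMap): FieldMap {
  for (const key of Object.keys(b)) {
    if (Object.prototype.hasOwnProperty.call(a, key)) {
      throw new Error(`Cannot merge: both structs define the "${key}" field`);
    }
  }
  return { ...a, ...b };
}

const baseTeacherFields: FieldMap = { students: "string[]" };
const hasIDFields: FieldMap = { id: "string" };

// Contains both the "students" and the "id" field.
const teacherFields = mergeFields(baseTeacherFields, hasIDFields);
```

Calling `mergeFields({ id: "string" }, { id: "number" })` throws, because both sides define `id`.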

Added missing primitive `S.undefined`

// empty type
S.undefined;

Alias for S.unit in ReScript API.

Advanced object struct

Sometimes you want to transform the data coming to your system. You can easily do it by passing a function to the S.object struct.

const userStruct = S.object((s) => ({
  id: s.field("USER_ID", S.number),
  name: s.field("USER_NAME", S.string),
}));

S.parseOrThrow(userStruct, {
  USER_ID: 1,
  USER_NAME: "John",
});
// => returns { id: 1, name: "John" }

// Infer output TypeScript type of the userStruct
type User = S.Output<typeof userStruct>; // { id: number; name: string }

Compared to using S.transform, the approach has 0 performance overhead. Also, you can use the same struct to transform the parsed data back to the initial format:

S.serializeOrThrow(userStruct, {
  id: 1,
  name: "John",
});
// => returns { USER_ID: 1, USER_NAME: "John" }
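
The idea of one definition driving both directions can be sketched without the library. The snippet below is only an illustration under my own assumptions: `FieldDef`, `parseWith` and `serializeWith` are hypothetical names, validation is left out, and the loop-based approach says nothing about how rescript-struct achieves its performance.

```typescript
// Hypothetical field list: [output key, input key]. Not the library's internals.
type FieldDef = readonly [string, string];

const userFields: readonly FieldDef[] = [
  ["id", "USER_ID"],
  ["name", "USER_NAME"],
];

// Parsing direction: read each input key and store it under the output key.
function parseWith(
  fields: readonly FieldDef[],
  input: Record<string, unknown>
): Record<string, unknown> {
  const output: Record<string, unknown> = {};
  for (const [outputKey, inputKey] of fields) {
    output[outputKey] = input[inputKey];
  }
  return output;
}

// Serializing direction: the same list, read the other way round.
function serializeWith(
  fields: readonly FieldDef[],
  value: Record<string, unknown>
): Record<string, unknown> {
  const input: Record<string, unknown> = {};
  for (const [outputKey, inputKey] of fields) {
    input[inputKey] = value[outputKey];
  }
  return input;
}

const parsedUser = parseWith(userFields, { USER_ID: 1, USER_NAME: "John" });
const serializedUser = serializeWith(userFields, parsedUser);
```

Here `parsedUser` has the renamed keys and `serializedUser` has the original ones again, because both functions are derived from the same `userFields` list.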

Advanced tuple struct

Sometimes you want to transform incoming tuples to a more convenient data-structure. To do this you can pass a function to the S.tuple struct.

const athleteStruct = S.tuple((s) => ({
  name: s.item(0, S.string),
  jerseyNumber: s.item(1, S.number),
  statistics: s.item(
    2,
    S.object({
      pointsScored: S.number,
    })
  ),
}));

type Athlete = S.Output<typeof athleteStruct>;
// type Athlete = {
//   name: string;
//   jerseyNumber: number;
//   statistics: {
//     pointsScored: number;
//   };
// }

The same as for advanced objects, you can use the same struct for transforming the parsed data back to the initial format. Also, it has 0 performance overhead and is as fast as parsing tuples without the transformation.

`name`

S.name(S.literal({ abc: 123 }));
// `Literal({"abc": 123})`

Used internally for readable error messages.

Subject to change

`setName`

const struct = S.setName(S.literal({ abc: 123 }), "Abc");

S.name(struct);
// `Abc`

You can customise a struct name using S.setName.

Other changes

Fixed TS type for S.literal to support any JS value
Added S.Error.reason helper to get an error reason without location
Documented S.classify/S.name/S.setName for ReScript users

cometkim · October 18, 2023, 4:18pm

I would like to add some feedback

eval would not work in some edge environments such as Cloudflare Workers. Some environments don’t want to pay the cost of verifying that it is safe to use eval / new Function() / dynamic import(), so they prohibit them. If you want the library to be usable in a wider range of environments, it will need to provide a compatibility mode.
The test results for Zod are a little different from my experience (this post). I think it should be evaluated differently on a case-by-case basis. For example, the main costs come from compilation and validation, and the better implementation may vary depending on whether the use case is a long-running process or edge/serverless.
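
For illustration, any compatibility mode would first have to detect whether the host allows creating functions from strings. A minimal sketch of such a check, assuming the host throws when code generation is blocked (`canCompileAtRuntime` is a hypothetical name, not part of rescript-struct):

```typescript
// Returns true when the host allows creating functions from strings.
// Hosts that forbid it (for example under a strict Content Security Policy
// or in some edge runtimes) throw when `new Function` is called.
function canCompileAtRuntime(): boolean {
  try {
    const probe = new Function("return 1 + 1");
    return probe() === 2;
  } catch {
    return false;
  }
}

const supported: boolean = canCompileAtRuntime();
console.log(
  supported
    ? "runtime code generation is available"
    : "runtime code generation is blocked"
);
```

The check only tells you which path to take. The fallback path itself (parsing without generated code) is the part that would need to be written.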

DZakh · October 18, 2023, 7:12pm

Thanks, I didn’t know. I’ll add it to the warning in the docs.

Unfortunately, it’s not possible. Well, it kind of is, but it would require literally writing a brand-new library. Also, if we want to keep the same API, it’ll be insanely slow, to the point where the library isn’t worth using.

Wow, I’ve actually messed up the benchmark. I’ll fix it asap.

DZakh · October 18, 2023, 7:43pm

The correct benchmark values:

Create a new schema on every run and parse
Zod x 348,526 ops/sec ±0.65% (90 runs sampled)
Valibot x 5,499,474 ops/sec ±0.36% (99 runs sampled)
rescript-struct x 82,005 ops/sec ±0.76% (93 runs sampled)

Reuse an existing schema and parse
Zod x 3,776,711 ops/sec ±0.41% (89 runs sampled)
Valibot x 8,580,592 ops/sec ±0.26% (100 runs sampled)
rescript-struct x 9,603,890 ops/sec ±0.24% (95 runs sampled)

As you said, rescript-struct has a compilation phase, which happens the first time an operation runs on a struct. I’ve tried to optimize it as much as possible, but it still takes quite some time, and if you only need to run an operation a few times, it’ll most likely be slower than other libs.
And my thoughts about this: it’s still very fast + operation runs a few times + the lib doesn’t work in workers anyways = the performance isn’t affected by compilation time.

Also, benchmarks are tricky, and results vary a lot depending on the case. Quite funny: the benchmark I used is one of the worst for rescript-struct, since most of the time is taken by email validation, which prevents the other super-optimized parts from shining. Maybe it’s not very honest, but I think I’ll replace the benchmark in the table with the one used in moltar/typescript-runtime-type-benchmarks. It parses a simple nested object without any additional validation refinements, which don’t really show the performance of the library itself.

To sum up, there are different cases where benchmarks show different results:

1. The schema/struct is used only once - rescript-struct will be slower than most alternatives. I’ve decided that it’s not a performance-critical case, so we can sacrifice it for more high-load cases.
2. You use some advanced custom validations - they are optimised in rescript-struct, but if the validation takes longer than the rest of the parsing, it won’t show amazing results compared to other libs.
3. Parsing JSON-compatible data multiple times - in this case rescript-struct will be slow on the first run, and almost instant on the others.
4. Parsing JSON-compatible data multiple times (+ field renaming) - this is the ultimate case, where rescript-struct will outperform all existing libraries. For rescript-struct it will be as fast as the 3rd case, while for other libs there will be an overhead of a function call + object allocation.
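
The last two cases can be sketched roughly like this. It is a toy model under my own assumptions, not the code rescript-struct generates: `Schema`, `compile` and `parse` are hypothetical, and only typeof checks are supported. The point is that the compile cost is paid once per schema, and that the field renaming is baked into the generated code.

```typescript
// Hypothetical schema shape: output key -> [input key, expected typeof].
type Schema = Record<string, readonly [string, "string" | "number" | "boolean"]>;
type Parser = (input: Record<string, unknown>) => Record<string, unknown>;

// Compiled parsers are cached per schema object, so compilation happens once.
const cache = new WeakMap<Schema, Parser>();
let compileCount = 0;

function compile(schema: Schema): Parser {
  compileCount += 1;
  const checks: string[] = [];
  const fields: string[] = [];
  for (const [outputKey, [inputKey, type]] of Object.entries(schema)) {
    const access = `input[${JSON.stringify(inputKey)}]`;
    const message = JSON.stringify(`Invalid field ${inputKey}`);
    checks.push(
      `if (typeof ${access} !== ${JSON.stringify(type)}) throw new Error(${message});`
    );
    // The rename is part of the generated object literal, so parsing needs
    // no extra function call or intermediate object.
    fields.push(`${JSON.stringify(outputKey)}: ${access}`);
  }
  const body = `${checks.join("\n")}\nreturn { ${fields.join(", ")} };`;
  // This is the step that fails in hosts which prohibit eval / new Function.
  return new Function("input", body) as Parser;
}

function parse(schema: Schema, input: Record<string, unknown>): Record<string, unknown> {
  let parser = cache.get(schema);
  if (parser === undefined) {
    parser = compile(schema); // slow path: first run only
    cache.set(schema, parser);
  }
  return parser(input); // fast path: plain generated code
}

const userSchema: Schema = {
  id: ["USER_ID", "number"],
  name: ["USER_NAME", "string"],
};

const firstUser = parse(userSchema, { USER_ID: 1, USER_NAME: "John" });
const secondUser = parse(userSchema, { USER_ID: 2, USER_NAME: "Jane" });
```

After both calls `compileCount` is still 1. If the schema were recreated on every run, the cache would never hit, and every call would pay for `compile` again, which is the first case in the list.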

I’ve been working on rescript-struct for almost 2 years now, and I’ve realised one thing. It’s impossible to create a perfect library; every solution will have some pros and cons. And when we choose a library for our project, we need to decide which tool suits us best.

In these terms rescript-struct is quite controversial: it has a noticeable con in its reliance on eval, but that same con lets its pros show in a way that wouldn’t be possible otherwise.

DZakh · October 18, 2023, 7:50pm

By the way, here are the benchmarks I’m going to use in the docs now:

// It'll test the parsing speed of the following data
const data = {
  number: 1,
  negNumber: -1,
  maxNumber: Number.MAX_VALUE,
  string: "string",
  longString:
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Vivendum intellegat et qui, ei denique consequuntur vix. Semper aeterno percipit ut his, sea ex utinam referrentur repudiandae. No epicuri hendrerit consetetur sit, sit dicta adipiscing ex, in facete detracto deterruisset duo. Quot populo ad qui. Sit fugit nostrum et. Ad per diam dicant interesset, lorem iusto sensibus ut sed. No dicam aperiam vis. Pri posse graeco definitiones cu, id eam populo quaestio adipiscing, usu quod malorum te. Ex nam agam veri, dicunt efficiantur ad qui, ad legere adversarium sit. Commune platonem mel id, brute adipiscing duo an. Vivendum intellegat et qui, ei denique consequuntur vix. Offendit eleifend moderatius ex vix, quem odio mazim et qui, purto expetendis cotidieque quo cu, veri persius vituperata ei nec. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.",
  boolean: true,
  deeplyNested: {
    foo: "bar",
    num: 1,
    bool: false,
  },
};

Results
Zod x 1,177,451 ops/sec ±0.58% (93 runs sampled)
Valibot x 3,562,755 ops/sec ±0.37% (100 runs sampled)
rescript-struct x 153,787,453 ops/sec ±0.18% (98 runs sampled)

Also, if you create a new schema on every run, the results will be the following:
Zod x 110,832 ops/sec ±0.54% (93 runs sampled)
Valibot x 1,937,449 ops/sec ±0.61% (96 runs sampled)
rescript-struct x 54,359 ops/sec ±0.47% (97 runs sampled)

I’m actually proud to be only two times slower than Zod. It took me a lot of work, and I consider it a very good result.
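
For reference, the relative speeds implied by the numbers above can be computed directly. This is plain arithmetic on the quoted results; the helper names are mine.

```typescript
// Benchmark results quoted above, in operations per second.
const reuseSchema = { zod: 1177451, valibot: 3562755, rescriptStruct: 153787453 };
const newSchemaEveryRun = { zod: 110832, valibot: 1937449, rescriptStruct: 54359 };

// How many times faster `a` is than `b`, rounded to one decimal place.
const timesFaster = (a: number, b: number): number =>
  Math.round((a / b) * 10) / 10;

// The comparison table reports ops/ms: ops/sec divided by 1000, truncated.
const toOpsPerMs = (opsPerSec: number): number => Math.floor(opsPerSec / 1000);

// Reusing a schema: rescript-struct vs Zod and vs Valibot.
console.log(timesFaster(reuseSchema.rescriptStruct, reuseSchema.zod)); // 130.6
console.log(timesFaster(reuseSchema.rescriptStruct, reuseSchema.valibot)); // 43.2

// New schema on every run: Zod vs rescript-struct.
console.log(timesFaster(newSchemaEveryRun.zod, newSchemaEveryRun.rescriptStruct)); // 2

// Matches the "153,787 ops/ms" cell in the comparison table.
console.log(toOpsPerMs(reuseSchema.rescriptStruct)); // 153787
```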

DZakh · October 18, 2023, 7:51pm

Thank you one more time for noticing my mistake

kk3 · January 16, 2024, 5:57pm

@DZakh Regarding eval not working or not being safe in some environments, wouldn’t something like safe-eval or another re-implementation do the trick as a fallback?

DZakh · January 17, 2024, 8:17am

Thank you for letting me know. I’ll consider it when somebody comes complaining about eval not working for them

cometkim · February 1, 2024, 7:13pm

I reported it because of a portability issue, not because of security. The problem is that the implementation relies on an API surface that is often prohibited.

Prohibiting eval is a host’s decision and cannot be bypassed by a user script. Where eval is prohibited, new Function is prohibited as well. eval itself is not the actual problem.

safe-eval is just a wrapper that creates an isolated context with the Node.js vm API, so it is not an alternative. If isolation could be a solution, there are standard approaches such as ShadowRealm, rather than a Node-only solution.