Serde.ml – should I port this to ReScript?

ostera · September 1, 2023, 7:10pm

Hi folks!

A bit of an odd announcement since this is a framework I wrote for OCaml, but I’m wondering if it’d be useful in ReScript, and if you folks have some specific use cases for it. Since it’s mostly pure OCaml, it should be very straightforward to port.

Serde is a serialization framework (much like the one in Rust) that uses your types to derive very efficient serializers/deserializers, with a pluggable data format.

I think on the web we mostly work with JSON, so that’d be the primary use-case for it, but I wonder if any of you are using any other formats or dealing with such big payloads that a compact representation would be useful for your apps?

Here’s how serde.ml works today (I’ll use ReScript syntax, but bear in mind that it doesn’t work for ReScript yet):

@deriving(serializer, deserializer)
type myUser = {
  name: string,
  email: Email.t,
  role: Role.t
}

Once you annotate your type, you get 2 functions for free: serializeMyUser and deserializeMyUser. You can of course choose to derive just one of the two if you want.

Then you have some input, say some JSON, and you can try to parse your user like this:

switch Serde.Json.parse(deserializeMyUser, "some json here") {
| Ok(user) => Js.log(user) // here's a valid user
| Error(reason) => Js.log(reason) // here are the parsing errors
}

Some of the advantages it has:

it parses incrementally, so you only really read the JSON you care about
it’ll support many type-level/field-level annotations to configure it (like defaults, renaming fields, etc)
can be used to derive ser/de from formats like FormData, QueryParams, db representation, logging, etc
formats can be swapped, and you can even read from one format and write to another format

That’s it. If this isn’t a thing you need, that’s fine, I just thought I’d ask before porting it.

Have a good weekend

hellos3b · September 1, 2023, 8:16pm

I always love the idea of auto-creating decoders from types, though I find IntelliSense in VS Code pretty lacking for PPXs, so I prefer writing them by hand.

I didn’t take a deep dive into serde’s features, but I’d like to mention there are two rescript libraries that implement the same idea:

GitHub - green-labs/ppx_spice: ReScript PPX which generates the JSON (de)serializers
GitHub - reasonml-labs/decco: Bucklescript PPX which generates JSON (de)serializers for user-defined types

ostera · September 1, 2023, 9:04pm

The core differences are that spice and decco generate explicit JSON serializer/deserializer functions that operate over a Js.Json.t value (IIRC), whereas serde uses an intermediate representation that can drive a format serializer/deserializer, and operates over a lexer of your choosing.

In other words, you can do @deriving(serializer, deserializer) and then use Serde_json, Serde_xml, Serde_sexpr, or even write your own format and plug it in, without having to modify your original code.
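
A rough sketch of that idea, under stated assumptions: the `value` type, `serializeUser`, `toJson` and `toSexp` below are hypothetical names made up for illustration, not serde.ml’s actual data model or API. The type is described once in terms of an intermediate representation, and each format is a separate function over that representation. A real implementation would likely stream rather than build a whole tree first.

```rescript
// Hypothetical intermediate representation (an assumption, not serde.ml's).
type rec value =
  | Str(string)
  | Num(float)
  | Bool(bool)
  | Record(array<(string, value)>)

type user = {name: string, age: float, admin: bool}

// Roughly what a derived serializer could produce: it only knows `value`,
// nothing about JSON or any other concrete format.
let serializeUser = (u: user): value =>
  Record([("name", Str(u.name)), ("age", Num(u.age)), ("admin", Bool(u.admin))])

// Quote and escape a string using the platform's JSON string rules.
let quote = (s: string): string => Js.Json.stringify(Js.Json.string(s))

// Format backend 1: JSON text.
let rec toJson = (v: value): string =>
  switch v {
  | Str(s) => quote(s)
  | Num(n) => Belt.Float.toString(n)
  | Bool(b) => b ? "true" : "false"
  | Record(fields) => {
      let body =
        fields
        ->Js.Array2.map(((key, field)) => quote(key) ++ ":" ++ toJson(field))
        ->Js.Array2.joinWith(",")
      "{" ++ body ++ "}"
    }
  }

// Format backend 2: s-expressions, written without touching `user`.
let rec toSexp = (v: value): string =>
  switch v {
  | Str(s) => quote(s)
  | Num(n) => Belt.Float.toString(n)
  | Bool(b) => b ? "true" : "false"
  | Record(fields) => {
      let body =
        fields
        ->Js.Array2.map(((key, field)) => "(" ++ key ++ " " ++ toSexp(field) ++ ")")
        ->Js.Array2.joinWith(" ")
      "(" ++ body ++ ")"
    }
  }

let ada = serializeUser({name: "Ada", age: 36.0, admin: true})
Js.log(toJson(ada)) // {"name":"Ada","age":36,"admin":true}
Js.log(toSexp(ada)) // ((name "Ada") (age 36) (admin true))
```

Adding a third format here means writing one more function over `value`; the `user` type and `serializeUser` stay as they are.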

And it doesn’t parse the entire input into a Js.Json.t before trying to validate its shape, but rather parses only exactly what you want it to, and ignores the rest.

For payloads coming from a trusted API, a format could be written to use type-versioning and allow the decoding to be zero-cost too, without losing safety.

yawaramin · September 2, 2023, 4:13am

For payloads coming from a trusted API, there would be no need to spend cycles decoding the JSON at all; it could just be cast into the expected shape and used directly.
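
A minimal sketch of that kind of cast, assuming a record made only of plain fields (ReScript records compile to plain JS objects, so the shapes line up). The `user` type and `unsafeUserFromJson` are hypothetical names for illustration. Nothing is checked at runtime, so a payload with a different shape would only fail later, wherever the bad field is used.

```rescript
type user = {name: string, email: string}

// Identity cast: generates no runtime code and performs no validation.
external unsafeUserFromJson: Js.Json.t => user = "%identity"

let payload = `{"name": "Ada", "email": "ada@example.com"}`

// JSON.parse still runs; only the shape check is skipped.
let user = payload->Js.Json.parseExn->unsafeUserFromJson

Js.log(user.name) // Ada
```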

tsnobip · September 2, 2023, 7:44am

I think all those points make it quite an appealing solution for ReScript, and it’d definitely be useful for the community. Go for it!

glennsl · September 2, 2023, 1:51pm

A notable downside to be aware of with format-agnostic serialization frameworks is that they can’t embed “raw” values of the intended target format. If you have a data structure that you want to embed an arbitrary JSON value in, for example, there is no format-agnostic way of doing that.

Obviously, you could say, a format-agnostic framework is not the right tool for that job. But I’d argue that it very rarely is. I think it’s usually chosen to have the flexibility and freedom to change format if the need arises, but that comes at the cost of less flexibility and freedom in what you can represent. And I think what you need to represent is usually much more likely to change than the format.

Further, if, as in Rust, the entire serialization ecosystem is based on a format-agnostic framework, then if this need does arise you’re pretty screwed. I’ve been bitten by this in Rust before, and the only option I had then was to write my own serializer and deserializer from scratch.

Not trying to knock serde.ml or anything, it does what it does and luckily there are many more options with rescript. But beware of that trade-off when considering whether it’s the right tool to use.
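
To make the trade-off above concrete, here is a small sketch. The `value` type is a hypothetical format-agnostic data model invented for illustration (it is not serde.ml’s or Rust serde’s), and `event`, `encodeEvent` and `agnosticWorkaround` are likewise made-up names. The format-specific encoder can pass a `Js.Json.t` through untouched, while the agnostic model has no case for it and has to fall back to something like a string.

```rescript
// Hypothetical format-agnostic data model: these are the only cases.
type rec value =
  | Str(string)
  | Num(float)
  | Record(array<(string, value)>)

type event = {
  kind: string,
  payload: Js.Json.t, // arbitrary JSON the application does not interpret
}

// Format-specific encoder: the raw JSON value is embedded as-is.
let encodeEvent = (e: event): Js.Json.t =>
  Js.Json.object_(
    Js.Dict.fromArray([("kind", Js.Json.string(e.kind)), ("payload", e.payload)]),
  )

// Agnostic workaround: `value` has no "raw JSON" case, so the payload is
// flattened to a string, which the receiving end must know to parse again.
let agnosticWorkaround = (e: event): value =>
  Record([("kind", Str(e.kind)), ("payload", Str(Js.Json.stringify(e.payload)))])

let raw = Js.Json.parseExn(`{"nested": [1, 2, 3]}`)
let encoded = Js.Json.stringify(encodeEvent({kind: "click", payload: raw}))
Js.log(encoded) // {"kind":"click","payload":{"nested":[1,2,3]}}
```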

ostera · September 2, 2023, 2:53pm

Could you give me an example?

Rust’s serde lets you use custom (de)serializers per type (or even per variant or field), which means you can for example: URL encode/decode strings, turn binary data into base64 and back, compact arrays, etc.
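
The per-field idea can be sketched in ReScript roughly like this. Everything here (`codec`, `urlEncoded`, `link`, `encodeLink`) is a hypothetical shape for illustration and is neither Rust serde’s nor serde.ml’s API; only `Js.Global.encodeURIComponent` and `Js.Global.decodeURIComponent` are real bindings.

```rescript
// Hypothetical codec shape: how one field is written and read back.
type codec<'a> = {
  encode: 'a => string,
  decode: string => option<'a>,
}

// URL-encoding codec for a single string field.
let urlEncoded: codec<string> = {
  encode: s => Js.Global.encodeURIComponent(s),
  // decodeURIComponent throws on malformed input, so map that to None.
  decode: s =>
    try Some(Js.Global.decodeURIComponent(s)) catch {
    | _ => None
    },
}

type link = {label: string, target: string}

// Only `target` opts in to the custom codec; `label` is written as-is.
let encodeLink = (l: link): array<(string, string)> => [
  ("label", l.label),
  ("target", urlEncoded.encode(l.target)),
]

Js.log(encodeLink({label: "Docs", target: "a b&c"}))
```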

And so far I’ve used serde with s-expressions, TOML, JSON, and even compact binary serialization formats that I didn’t own (e.g. Erlang’s VM bytecode).

So it always seemed pretty flexible to me, but I understand it’s not a silver bullet. I’d love to see your use case/example to understand the shortcomings.

Also, just to clarify, serde.ml can’t do all these things yet, but in principle it can be extended to do them.

glennsl · September 2, 2023, 3:40pm

I think the specific problem we had was with (de)serializing to/from wasm_bindgen::JsValue and embedding a JsValue such as a Blob. Binary data can of course be serialized as a base64-encoded string, but only if you also control the receiving end. Which we didn’t. Arbitrary native values or opaque values that carry some internal state such as ObjectUrl would be even harder to handle. Whereas in a format-specific (de)serializer these would all be trivial no-ops.