I need a parser for my ReScript library to be able to obtain tree-like data structures from string input with parentheses and I am thinking of using parser combinators like in Haskell. Is this a reasonable approach or would the compiled JS-output be too inefficient and drag the performance down compared to conventional parsing methods?
I haven’t found any parser combinator libraries for ReScript yet, but there is a small one for ReasonML (https://github.com/aitoroses/bs-parse) that I have managed to translate to .res files without any problems (I use the Syntax settings in https://rescript-lang.org/try for automatic conversion). Obviously the infix operators are a bit cumbersome to work with since ReScript doesn’t have this feature yet.
Otherwise it works fine and does everything I need, but I am still a beginner in this area so I don’t really know if it is up to current standards (it hasn’t been maintained for 2 years), the code looks pretty solid to me at least. I am learning this stuff as I go and have never worked with parser combinators in a project, but I really like this approach so I want to give it a go if it is reasonable for JS output.
Since I don’t know much of OCaml, I have not touched any of the popular parser combinator libraries there, but I suspect that it would be much more difficult to make them work for ReScript.
Any concrete examples? ReScript itself is pretty fun to write parsers in, so if the grammar is not too complex, maybe not worth pulling in the extra code of a parser combinator library?
Parser combinators are not known for their efficiency / performance (at least when compiled to JS). Highly depends on how you use them, and how many hot paths they hit. They are pretty allocation heavy and pull in a lot of runtime code.
Yeah, it’s also not really recommended to use |> either. With -> alone it’s pretty hard to represent combinator patterns.
Maybe worth having a look at TypeScript / JS based combinator libraries instead? Probably more aligned with the ReScript conventions.
I essentially want to parse a paranthesis notation for a formal logical system which is similar to the Laws of Form (formulas looking like “((a)b)(c)”), so it is not too complex, but has some extensions on top of the base syntax.
Some years ago I started this project in vanilla JS where I wrote (or rather hacked together) a parser when I didn’t even know it was a parser to begin with. It still works surprisingly well, but since now I am rewriting this whole thing in ReScript, I was looking for a more solid solution that is extendable, since I want to be able to add to the syntax in the future.
Maybe I should learn more about how to actually write a solid parser with a more functional approach myself before committing to an external library.
I thought so, which is why I asked. Thanks for clarifying!
Haskell has this nice “do”-sugar for chaining monads, which made these things look really simple, but the syntax gets very ugly and cumbersome when translated to ReScript.
Might take a look at them if I hit a roadblock with my own attempt. Thanks for the suggestions!
Nice! Yeah, with a proper lexer and parser you will also be able to provide meaningful error messages for your language.
In case it helps, a while ago I wrote an ingredients parser for a project of mine. Probably not the cleanest implementation, but maybe it helps getting the ball rolling: Ingredient Parser example · GitHub
I guess the most “functional” part for me here was the usage of variants, pattern matching and recursive functions… nowadays I tend to reach for isolated imperative code for less allocation & more performance, like arrays, loops etc.
If you are interested in a really solid parser implementation, you could also just check out the ReScript syntax source code. The cool thing here is that it’s written in a very straight-forward style that’s easy to grasp for Non-PL-researchers.