Immutable array implementation?

mweichert · December 7, 2020, 3:52pm

Hi there,

Is there something that offers the performance of an array but the immutable guarantee of a list?

Thanks,
Mike

spyder · December 7, 2020, 8:14pm

Here is a simple implementation using opaque types. There are more methods that could be added but this covers the basics.

The interface enforces that methods returning data use the opaque type, while internally methods that operate on potentially shared data copy it first.

yawaramin · December 8, 2020, 5:32am

That’s tricky because how should that data structure behave? E.g. if you want performance ‘similar’ to array O(1) for getting and setting a value at an index, and you also want this to be an immutable data structure, then you can’t just e.g. copy the whole array whenever you want to set an element at an index. That’s where more sophisticated data structures come in.

The closest we have to that in Belt, is probably Belt.Map.Int. You can think of a map from ints to values as conceptually like an array. And since Belt.Map data structures are immutable and persistent, they give you the immutable guarantee as well. E.g.

module Arr = Belt.Map.Int

let () = {
  let arr = Arr.fromArray([(0, "a"), (1, "b"), (2, "c")])
  let arr2 = Arr.set(arr, 1, "b2")
  
  Arr.forEach(arr2, (idx, v) =>
    Printf.printf("idx=%d, v=%s", idx, v))
}

The only catch is, as you can see, when constructing the map you have to manage the indices manually. But it should be easy to write a wrapper function for that.

austindd · December 8, 2020, 6:49am

I’ve been working on porting Elm’s immutable Array data structure to ReScript. It’s based on the “relaxed radix balanced tree” (RRB tree, also called an “immutable vector”). Most of the constant-time operations of a mutable array can be done in “effectively constant time” with RRB trees. Plus it has fast append and iteration operations.

Here is a paper on RRB trees: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.592.5377
Here is the Elm implementation: https://github.com/elm/core/blob/1.0.5/src/Array.elm

My code is a bit rough, but so far I have ported operations like get, set, push. Most of the internal tree operations are implemented, so the rest of it is mostly just the exposed API. When it gets closer to being done, I will publish it on NPM, but it needs a lot more testing for me to feel confident using it for anything serious. If it interests anyone here, I can push it to GitHub for people to view.

austindd · December 8, 2020, 5:25pm

For what it’s worth, Elm’s immutable arrays are probably better-suited to large collections. It’s fine for small collections too, but it has a relatively high constant factor for each operation (multiple object allocations & function calls). So, aside from painless immutability, the performance doesn’t really pay off until you have a larger number of elements.

For small collections, I would recommend something like @spyder’s solution above. It will likely be more efficient.

One thing I might add to @spyder’s solution is a batchUpdate function that passes a mutable copy of the immutable array into a user-defined callback, along with a commit function that casts the return value to the immutable version. The callback is then allowed to mutate the array and “commit” the changes. This would allow us to perform a series of mutations in a local context, without violating immutability anywhere else.

E.g.

      external castToImmutable: array<'a> => t<'a> = "%identity"
      let commitChanges: (. array<'a>) => t<'a> = (. mutArr) =>
        castToImmutable(mutArr)

      let batchUpdate = (immArr: t<'a>, callback) =>
        callback(Js.Array2.sliceFrom(immArr, 0), commitChanges)

The castToImmutable and commitChanges functions would be hidden by the module signature. The batchUpdate function would then be used like this:

let intCompare = (a, b) =>
  if a < b {
    -1
  } else if a > b {
    1
  } else {
    0
  }

let newArray = batchUpdate(immutable, (mutArr, commit) => {
  open Belt
  let acc = ref(0)
  for i in 0 to Array.length(mutArr) - 1 {
    let elt = mutArr->Array.getUnsafe(i)
    acc := acc.contents + elt - mod(elt, 10)
    mutArr->Array.setUnsafe(i, acc.contents)
  }
  SortArray.stableSortInPlaceBy(mutArr, intCompare)
  commit(. mutArr)
})

This allows the user to perform mutations on the copied array for performance reasons. It does involve 2 extra function calls, but it will likely be more efficient than a long chain of array copies for each update.

Here is an updated version of @spyder’s implementation which includes this function.

yawaramin · December 8, 2020, 7:24pm

That sounds like https://github.com/immerjs/immer , I’m not sure if someone did it already but a binding to that would be useful.

tom-sherman · December 9, 2020, 9:17am

You could look at adding bindings to the Record/Tuple proposal polyfill: https://github.com/bloomberg/record-tuple-polyfill

ryyppy · December 9, 2020, 11:17am

getting bad vibes from a time were we bought into immutable-js … really scary stuff

austindd · December 9, 2020, 3:54pm

Immutable.js had some issues of its own, but I think a lot of what made it bad for many users was just the tendency to try to use those immutable data structures without thinking about time/space complexity (a problem we still have with list sometimes), along with the lack of static typing to help make sense of it.

But an immutable array is a pretty nice thing to have. You get more peace of mind about its state, and some operations (like concatenation) can even be faster than with the mutable version.

That said, I still think most people don’t really need to use persistent immutable arrays, not least because they don’t interop with JS libraries that use standard arrays. Although, the port I’m working on has clean conversion functions to & from the array and list types in Reason.

spyder · December 10, 2020, 10:23am

It doesn’t take much, fundamentally it’s a single binding

[@module "immer"] external produce: ('a, 'a => 'a) => 'a = "produce";

Everything beyond that is nice to have optimisations that I haven’t found a use for yet.