[TIL] Size of blob of data in Array vs List

Today I learned:

In my web app (Next.js) I have a file `ShippingPricing.res` which contains a list of cities with the prices to ship a package in 5 different sizes.

example:

let destinations = [
  // ... more destinations
  {
    city: `CDMX`,
    cityId: "0000000766",
    pricing: {
      p500: list{1000., 2000., 3000., 4000., 5000.},
      p1000: list{1500., 2500., 3500., 4500., 5500.},
    },
  },
]

I used to have destinations be a list instead, until I realized the size of the generated .bs.js output:

  • 289 KB using list{}
  • 53 KB using array

In local dev it doesn’t matter, but it is a big payload to send to the client.

So I conclude that I should use arrays for hardcoding data instead of lists.

6 Likes

Exactly, arrays are heavily optimised by JS engines and are processed very efficiently by modern CPUs. A singly-linked list is not “bad”; it has its use cases. Just don’t use it when you need to store bulk data: all the enemies in your game, a list of all cities, all the animations in your app, etc. Linked lists are very expensive (memory-wise) for that use case.

6 Likes

Lists are more for intermediate computations where you don’t know ahead of time how big they’ll be - not for storing data. You should definitely use arrays here!

You may want to dig into things a bit more. I tried those literals here: /try demo with 22 items in the list

And put the code through terser separating the two variables into different files:

  • Top level list: 4257 bytes
  • Top level array: 4080 bytes

Lists compile to literal objects with hd and tl fields, which is only a few bytes more than an array literal. There is no reason for the size difference to be so huge in this isolated example, so something else may be going on.

There are also valid reasons for using lists: they are immutable, for one, which is a great guarantee if your list of destinations is constant and you don’t want any code modifying it at runtime.
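For reference, here is a hand-written sketch of the two shapes in the compiled JS (simplified; the actual .bs.js output varies by compiler version, and the `0` standing in for the empty list is an assumption based on older compiler output):

```javascript
// list{1., 2., 3.} compiles to nested objects with hd/tl fields;
// here 0 stands in for the empty list, as in older compiler output:
var asList = { hd: 1, tl: { hd: 2, tl: { hd: 3, tl: 0 } } };

// [1., 2., 3.] stays a plain JS array literal:
var asArray = [1, 2, 3];
```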

3 Likes

@joakin I just copy-pasted the outputs of the examples in your demo into separate .js files and inspected them with the macOS Finder.

  • array: 14 KB
  • list: 33 KB

how are you measuring the bytes?

That is expected: whitespace takes bytes, and the list version indents the code more, so it has more whitespace in the JS output. That is why I “put the code through terser” in my comparison.

Sorry if it wasn’t clear, terser is a JS minifier (successor to uglify) which you should always use for your production output combined with compression like gzip. See https://try.terser.org/
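As a toy illustration that whitespace alone changes the byte count (this uses JSON.stringify purely as a stand-in for pretty-printed vs. minified output; it is not how terser works internally):

```javascript
// The same hd/tl-shaped data, pretty-printed vs. compact.
const data = { hd: 1, tl: { hd: 2, tl: { hd: 3, tl: 0 } } };

const pretty = JSON.stringify(data, null, 2); // indented, like raw compiler output
const compact = JSON.stringify(data);         // no whitespace, like minified output

console.log(pretty.length, compact.length); // the indented version is strictly larger
```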

2 Likes

ahh, I totally forgot to check the output after minification, good point!

One advantage of using an array instead of a list that hasn’t been mentioned yet: for cross-wire communication, deserialisation is much faster, i.e. JSON.parse handles arrays much faster than nested objects.

There is a related JSON.parse bug in V8 which is still not fixed: https://bugs.chromium.org/p/v8/issues/detail?id=10595&q=&can=4
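A rough micro-comparison you could run in Node to see this yourself (a hypothetical harness, not from the original post): parse the same 1000 numbers once as a flat array and once as a nested hd/tl chain.

```javascript
// Build the same 1000 numbers as a flat array and as a nested {hd, tl} chain.
const N = 1000;
const arrayJson = JSON.stringify(Array.from({ length: N }, (_, i) => i));

let list = 0; // 0 standing in for the empty list
for (let i = N - 1; i >= 0; i--) list = { hd: i, tl: list };
const listJson = JSON.stringify(list);

// Time 1000 parses of each shape.
console.time("JSON.parse array");
for (let i = 0; i < 1000; i++) JSON.parse(arrayJson);
console.timeEnd("JSON.parse array");

console.time("JSON.parse nested objects");
for (let i = 0; i < 1000; i++) JSON.parse(listJson);
console.timeEnd("JSON.parse nested objects");
```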

5 Likes

do you know which is faster?
For searching through ~1000 items I wonder which would be more performant

Searching arrays is generally faster, because the data is laid out contiguously in memory (and V8 is optimised for that). Lists are objects holding pointers to other objects.

The speed advantage of reading arrays far outweighs the cost of creating them (which happens a lot when you treat them as immutable).

I still use list, but only when my recursive algorithm simply can’t be expressed with an array (which isn’t often anymore).
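To make the comparison concrete, here is a sketch of the two linear searches in plain JS (hypothetical code, with the list in its compiled hd/tl shape):

```javascript
// Linear search over an array vs. over a {hd, tl} linked list.
function findInArray(arr, x) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === x) return i;
  }
  return -1;
}

function findInList(list, x) {
  let i = 0;
  for (let cell = list; cell !== 0; cell = cell.tl) {
    if (cell.hd === x) return i;
    i++;
  }
  return -1;
}

// ~1000 items in both shapes (0 standing in for the empty list).
const items = Array.from({ length: 1000 }, (_, i) => i);
let linked = 0;
for (let i = items.length - 1; i >= 0; i--) linked = { hd: items[i], tl: linked };

console.log(findInArray(items, 999), findInList(linked, 999)); // 999 999
```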

3 Likes

Generally speaking, an array of 1000 items is tiny for modern computers, and I don’t think it will be a problem with either version.

Performance is always dependent on how often the code is called and how much work there is, and making blind assumptions is usually cargo culting and a waste of time. Always performance test your code with real workloads under real conditions.

For example, here is looking up 1000 ids in arrays and lists, 1000 times each, on my computer running node.js:
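(The original snippet didn’t survive in this copy of the thread; a hypothetical reconstruction of that kind of harness, with the repetition count reduced so it finishes quickly, might look like:)

```javascript
// Hypothetical reconstruction: look up ids in an array of records and in a
// {hd, tl} list of the same records. The post used 1000 repetitions; TIMES
// is reduced here so the sketch runs quickly.
const TIMES = 50;
const ids = Array.from({ length: 1000 }, (_, i) => String(i));
const arr = ids.map(cityId => ({ cityId }));

let list = 0; // 0 standing in for the empty list
for (let i = arr.length - 1; i >= 0; i--) list = { hd: arr[i], tl: list };

console.time("array lookups");
for (let t = 0; t < TIMES; t++) {
  for (const id of ids) arr.find(d => d.cityId === id);
}
console.timeEnd("array lookups");

console.time("list lookups");
for (let t = 0; t < TIMES; t++) {
  for (const id of ids) {
    let c = list;
    while (c !== 0 && c.hd.cityId !== id) c = c.tl;
  }
}
console.timeEnd("list lookups");
```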

Pretty fast, no? I stand by what I said, “Always performance test your code with real workloads under real conditions”, with an extra of “JITs are amazing and very hard to understand”.

3 Likes

Just FYI for those who want to run benchmark tests within Node.js…

I recently released a relatively complete bindings library for Benchmark.js. It’s called rescript-benchmarkjs, and you can find it here:

It does add a few helper functions around parts of the Benchmark.js API, mostly for type safety and ergonomics, but I was very careful to make sure that nothing interferes with the code under test.

I encourage people to use it for empirical performance tests like this. And obviously if you uncover any issues with the library, just let me know.

3 Likes

Ah yes. The common UI scenario where I am manipulating a 1000-item list in my React app. :joy:

1 Like

Those numbers are super weird.

If you just walk the array linearly with this code:
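(The snippet itself didn’t survive in this copy of the thread; a hypothetical reconstruction of such a linear walk, summing 1000 items 1000 times with `console.time`:)

```javascript
// Walk 1000 items linearly, 1000 times, in an array and in a {hd, tl} list.
const n = 1000;
const arr = Array.from({ length: n }, (_, i) => i);

let list = 0; // 0 standing in for the empty list
for (let i = n - 1; i >= 0; i--) list = { hd: i, tl: list };

let arraySum = 0;
console.time("Array access");
for (let t = 0; t < 1000; t++) {
  arraySum = 0;
  for (let i = 0; i < arr.length; i++) arraySum += arr[i];
}
console.timeEnd("Array access");

let listSum = 0;
console.time("List access");
for (let t = 0; t < 1000; t++) {
  listSum = 0;
  for (let c = list; c !== 0; c = c.tl) listSum += c.hd;
}
console.timeEnd("List access");
```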

  • Array access: 5.987ms
  • List access: 11.15ms

1 Like

To be honest micro-benchmarks like this on a JIT don’t always make sense.

For example, I tried to use Array.getByU, thinking that uncurrying the lambda would maybe make it faster, and here are the results I got on my machine with node v14.15.4:

  • Array access: 14.571ms
  • Array access uncurried: 15.069ms
  • List access: 11.186ms

:man_shrugging: