[TIL] Size of blob of data in Array vs List

Today I learned:

In my web app (Next.js) I have a file `ShippingPricing.res` which contains a list of cities with the prices to ship a package in 5 different sizes.

example:

let destinations = [
  // ... more destinations
  {
    city: `CDMX`,
    cityId: "0000000766",
    pricing: {
      p500: list{1000., 2000., 3000., 4000., 5000.},
      p1000: list{1500., 2500., 3500., 4500., 5500.},
    },
  },
]

I used to have destinations be a list instead, until I realized the size of the generated .bs.js output:

  • 289 KB using list{}
  • 53 KB using array

In local dev it doesn’t matter, but it is a big payload to send to the client.

So I conclude that I should use arrays for hardcoding data instead of lists.

6 Likes

Exactly, arrays are heavily optimised by JS engines and are processed very efficiently by modern CPUs. A singly-linked list is not “bad”; it has its use cases. Just don’t use it when you need to store bulk data: all the enemies in your game, a list of all cities, all the animations in your app, etc. Linked lists are very expensive (memory-wise) for that use case.

6 Likes

Lists are more for intermediate computations where you don’t know ahead of time how big they’ll be - not for storing data. You should definitely use arrays here!

You may want to dig into things a bit more. I tried those literals here: /try demo with 22 items in the list

And put the code through terser separating the two variables into different files:

  • Top level list: 4257 bytes
  • Top level array: 4080 bytes

Lists compile to literal objects with hd and tl fields, which is only a few bytes more than an array literal. There is no reason for the size difference to be so huge in this isolated example, so something else may be going on.

There are also valid reasons for using lists: they are immutable, for one, which is a great guarantee if your list of destinations is constant and you don’t want any code modifying it at runtime.
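For reference, here is a hand-written sketch of the two shapes in the compiled JS (simplified; the actual .bs.js output varies by compiler version, and the `0` standing in for the empty list is an assumption based on older compiler output):

```javascript
// list{1., 2., 3.} compiles to nested objects with hd/tl fields;
// here 0 stands in for the empty list, as in older compiler output:
var asList = { hd: 1, tl: { hd: 2, tl: { hd: 3, tl: 0 } } };

// [1., 2., 3.] stays a plain JS array literal:
var asArray = [1, 2, 3];
```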

3 Likes

@joakin I just copy-pasted the outputs of the examples in your demo into separate .js files and inspected them with the macOS Finder.

  • array: 14 KB
  • list: 33 KB

how are you measuring the bytes?

That is expected: whitespace takes bytes, and the list version indents the code more, so it has more whitespace in the JS output. That is why I “put the code through terser” in my comparison.

Sorry if it wasn’t clear, terser is a JS minifier (successor to uglify) which you should always use for your production output combined with compression like gzip. See https://try.terser.org/
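As a toy illustration that whitespace alone changes the byte count (this uses JSON.stringify purely as a stand-in for pretty-printed vs. minified output; it is not how terser works internally):

```javascript
// The same hd/tl-shaped data, pretty-printed vs. compact.
const data = { hd: 1, tl: { hd: 2, tl: { hd: 3, tl: 0 } } };

const pretty = JSON.stringify(data, null, 2); // indented, like raw compiler output
const compact = JSON.stringify(data);         // no whitespace, like minified output

console.log(pretty.length, compact.length); // the indented version is strictly larger
```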

2 Likes

ahh, I totally forgot to check the output after minification, good point!

One advantage of using an array instead of a list that hasn’t been mentioned yet: for cross-wire communication, deserialisation is much faster, i.e. JSON.parse handles arrays much faster than nested objects.

There is a related JSON.parse bug in V8 which is still not fixed: https://bugs.chromium.org/p/v8/issues/detail?id=10595&q=&can=4
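A rough micro-comparison you could run in Node to see this yourself (a hypothetical harness, not from the original post): parse the same 1000 numbers once as a flat array and once as a nested hd/tl chain.

```javascript
// Build the same 1000 numbers as a flat array and as a nested {hd, tl} chain.
const N = 1000;
const arrayJson = JSON.stringify(Array.from({ length: N }, (_, i) => i));

let list = 0; // 0 standing in for the empty list
for (let i = N - 1; i >= 0; i--) list = { hd: i, tl: list };
const listJson = JSON.stringify(list);

// Time 1000 parses of each shape.
console.time("JSON.parse array");
for (let i = 0; i < 1000; i++) JSON.parse(arrayJson);
console.timeEnd("JSON.parse array");

console.time("JSON.parse nested objects");
for (let i = 0; i < 1000; i++) JSON.parse(listJson);
console.timeEnd("JSON.parse nested objects");
```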

5 Likes

do you know which is faster?
For searching through ~1000 items I wonder which would be more performant

Searching arrays is generally faster, because the data is laid out contiguously in memory (and V8 is optimised for that). Lists are objects holding pointers to other objects.

The speed advantage of reading arrays far outweighs the cost of creating them (which happens a lot when you treat them as immutable).

I still use list, but only when my recursive algorithm simply can’t be expressed with an array (which isn’t often anymore).
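To make the comparison concrete, here is a sketch of the two linear searches in plain JS (hypothetical code, with the list in its compiled hd/tl shape):

```javascript
// Linear search over an array vs. over a {hd, tl} linked list.
function findInArray(arr, x) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === x) return i;
  }
  return -1;
}

function findInList(list, x) {
  let i = 0;
  for (let cell = list; cell !== 0; cell = cell.tl) {
    if (cell.hd === x) return i;
    i++;
  }
  return -1;
}

// ~1000 items in both shapes (0 standing in for the empty list).
const items = Array.from({ length: 1000 }, (_, i) => i);
let linked = 0;
for (let i = items.length - 1; i >= 0; i--) linked = { hd: items[i], tl: linked };

console.log(findInArray(items, 999), findInList(linked, 999)); // 999 999
```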

3 Likes

Generally speaking, an array of 1000 items is tiny for modern computers, and I don’t think it will be a problem with either version.

Performance is always dependent on how often the code is called and how much work there is, and making blind assumptions is usually cargo culting and a waste of time. Always performance test your code with real workloads under real conditions.

For example, here is looking up 1000 ids in arrays and lists, 1000 times each, on my computer running node.js:
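(The original snippet didn’t survive in this copy of the thread; a hypothetical reconstruction of that kind of harness, with the repetition count reduced so it finishes quickly, might look like:)

```javascript
// Hypothetical reconstruction: look up ids in an array of records and in a
// {hd, tl} list of the same records. The post used 1000 repetitions; TIMES
// is reduced here so the sketch runs quickly.
const TIMES = 50;
const ids = Array.from({ length: 1000 }, (_, i) => String(i));
const arr = ids.map(cityId => ({ cityId }));

let list = 0; // 0 standing in for the empty list
for (let i = arr.length - 1; i >= 0; i--) list = { hd: arr[i], tl: list };

console.time("array lookups");
for (let t = 0; t < TIMES; t++) {
  for (const id of ids) arr.find(d => d.cityId === id);
}
console.timeEnd("array lookups");

console.time("list lookups");
for (let t = 0; t < TIMES; t++) {
  for (const id of ids) {
    let c = list;
    while (c !== 0 && c.hd.cityId !== id) c = c.tl;
  }
}
console.timeEnd("list lookups");
```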

Pretty fast, no? I stand by what I said, “Always performance test your code with real workloads under real conditions”, with an extra of “JITs are amazing and very hard to understand”.

3 Likes

Just FYI for those who want to run benchmark tests within Node.js…

I recently released a relatively complete bindings library for Benchmark.js. It’s called rescript-benchmarkjs, and you can find it here:

It does add a few helper functions around parts of the Benchmark.js API, mostly for type safety and ergonomics, but I was very careful to make sure that nothing interferes with the code under test.

I encourage people to use it for empirical performance tests like this. And obviously if you uncover any issues with the library, just let me know.

3 Likes

Ah yes. The common UI scenario where I am manipulating a 1000-item list in my React app. :joy:

1 Like

Those numbers are super weird.

If you just walk the array linearly with this code:
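(The snippet itself didn’t survive in this copy of the thread; a hypothetical reconstruction of such a linear walk, summing 1000 items 1000 times with `console.time`:)

```javascript
// Walk 1000 items linearly, 1000 times, in an array and in a {hd, tl} list.
const n = 1000;
const arr = Array.from({ length: n }, (_, i) => i);

let list = 0; // 0 standing in for the empty list
for (let i = n - 1; i >= 0; i--) list = { hd: i, tl: list };

let arraySum = 0;
console.time("Array access");
for (let t = 0; t < 1000; t++) {
  arraySum = 0;
  for (let i = 0; i < arr.length; i++) arraySum += arr[i];
}
console.timeEnd("Array access");

let listSum = 0;
console.time("List access");
for (let t = 0; t < 1000; t++) {
  listSum = 0;
  for (let c = list; c !== 0; c = c.tl) listSum += c.hd;
}
console.timeEnd("List access");
```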

  • Array access: 5.987ms
  • List access: 11.15ms

1 Like

To be honest micro-benchmarks like this on a JIT don’t always make sense.

For example, I tried to use Array.getByU, thinking that uncurrying the lambda would maybe make it faster, and here are the results I got on my machine with node v14.15.4:

  • Array access: 14.571ms
  • Array access uncurried: 15.069ms
  • List access: 11.186ms

:man_shrugging: