Is this an efficient / good implementation of group-by functionality?

abhishes · March 6, 2023, 7:37am

I have a data structure for employees like this

type employee = {
  id: int, 
  fullName: string, 
  designation: string,
  gender: gender,
  teamName: string
}

Given an array of employees, I want to group them by team into an array of (team name, employees) pairs. So [e1, e2, e3, e4] becomes [("teamA", [e1, e2]), ("teamB", [e3, e4])]

Towards this I wrote the code below, which works, but I feel it's very clunky and inefficient.

let groupedEmployeeList =
  employeeList
  ->Belt.Array.reduce(Belt.Map.String.empty, (ge, e) => {
    ge->Belt.Map.String.update(e.teamName, elOption => {
      switch elOption {
      | Some(el) => Some(el->Belt.List.add(e))
      | None => Some(list{e})
      }
    })
  })
  ->Belt.Map.String.map(el => el->Belt.List.toArray)
  ->Belt.Map.String.toArray
Js.Console.log(groupedEmployeeList)

How can I write this more succinctly and efficiently?

sprkv5 · March 6, 2023, 9:31am

You might want to take a look at this thread. I had asked a similar question, but it might not be succinct.

yangdanny97 · March 6, 2023, 11:31am

I think the two main improvements are to use a JS object to store the mapping, and to use arrays as the values instead of lists.

Since you're not updating or reusing the mapping outside of this assignment, the immutable map carries a performance penalty with no practical benefit. And since you want arrays in the end, using lists just means you have to convert them.

Something like this would be more efficient and less verbose than what you have above.

let groupedEmployeeList = {
  let mapping = Js.Dict.empty()
  Js.Array2.forEach(employeeList, e => {
    switch mapping->Js.Dict.get(e.teamName) {
    | Some(el) => Js.Array2.push(el, e)->ignore
    | None => mapping->Js.Dict.set(e.teamName, [e])
    }
  })
  mapping->Js.Dict.entries
}
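
If you need this in more than one place, the same idea can be pulled out into a small helper. This is only a sketch: groupBy and getKey are names I made up for illustration, not part of Js or Belt, and it assumes the key is a string so it can be stored in a Js.Dict.

```rescript
// Hypothetical helper (not a library function): groups `items` by the
// string key that `getKey` returns. Items keep their input order within
// each group because they are pushed in iteration order.
let groupBy = (items: array<'a>, getKey: 'a => string): array<(string, array<'a>)> => {
  let mapping = Js.Dict.empty()
  items->Js.Array2.forEach(item => {
    let key = getKey(item)
    switch mapping->Js.Dict.get(key) {
    | Some(group) => group->Js.Array2.push(item)->ignore
    | None => mapping->Js.Dict.set(key, [item])
    }
  })
  mapping->Js.Dict.entries
}

// Usage, with (name, team) tuples standing in for employee records.
let grouped =
  [("ann", "teamA"), ("bob", "teamB"), ("cy", "teamA")]->groupBy(((_, team)) => team)
```

For the employee case that would be employeeList->groupBy(e => e.teamName). Note that the order of the groups is whatever order the underlying JS object reports its keys in, so don't rely on it if team names can look like integers.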

cometkim · March 7, 2023, 7:40pm

A bit off-topic, but another alternative is to model a data structure that maintains the index as part of its own operations, instead of using a raw array/dict. It can be a bit verbose in the beginning, but it gets better with time.
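
As a rough sketch of what that could look like (TeamIndex and all of its functions are hypothetical names, and this version favours simplicity over speed by copying arrays on every insert):

```rescript
// Hypothetical module: stores every item together with a by-team index
// that is updated on each insert, so grouping never has to be recomputed.
module TeamIndex = {
  type t<'a> = {
    all: array<'a>,
    byTeam: Belt.Map.String.t<array<'a>>,
  }

  let make = () => {all: [], byTeam: Belt.Map.String.empty}

  // Returns a new index; `teamName` is the grouping key for `item`.
  let add = (t, teamName, item) => {
    all: Belt.Array.concat(t.all, [item]),
    byTeam: t.byTeam->Belt.Map.String.update(teamName, group =>
      switch group {
      | Some(items) => Some(Belt.Array.concat(items, [item]))
      | None => Some([item])
      }
    ),
  }

  // Members of one team, or an empty array for an unknown team.
  let getTeam = (t, teamName) =>
    t.byTeam->Belt.Map.String.get(teamName)->Belt.Option.getWithDefault([])

  // All groups as (teamName, members) pairs, sorted by team name.
  let toGroups = t => t.byTeam->Belt.Map.String.toArray
}

// Usage, with strings standing in for employee records.
let index =
  TeamIndex.make()
  ->TeamIndex.add("teamA", "e1")
  ->TeamIndex.add("teamB", "e3")
  ->TeamIndex.add("teamA", "e2")
```

Callers then go through TeamIndex.getTeam or TeamIndex.toGroups and never touch the raw map, which leaves room to change the representation later.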