
Get all non-unique values (i.e.: duplicate/more than one occurrence) in an array

by Anton Shumikhin · Oct 5, 2024
TLDR

Identifying duplicate values in an array is a common task. You can achieve it concisely with filter coupled with indexOf, which locates the first occurrence of each value. Here's the basic one-liner:

const duplicates = [1, 2, 2, 3, 1].filter((e, i, a) => a.indexOf(e) !== i); // duplicates => [2, 1]

Here, filter walks the array (a), passing each element (e) and its index (i) to the callback. If indexOf(e) (the index of the element's first appearance) differs from i, the element is a repeat occurrence and gets kept.

Detailed analysis and optimal solutions

The quick answer above works perfectly well for relatively small arrays. For larger arrays, though, it can get expensive: every indexOf call rescans the array from the start, giving the whole pass O(N^2) complexity. For bigger data sets, more performance-minded solutions are worth reaching for.

Optimization through sorting

Sorting the array places duplicate values side by side, so detecting them reduces to comparing each element with its neighbor. Here's how you can achieve this:

function findDuplicates(data) {
  // Clone before sorting to keep the original data intact (I prefer replicas 😄)
  let sorted = data.slice().sort((a, b) => a - b);
  let results = [];
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i - 1] === sorted[i]) {
      results.push(sorted[i]);
    }
  }
  // Presto! No more repeated duplicates, thanks to our friend Set 🎉
  return [...new Set(results)];
}
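
For instance, running it on an arbitrary numeric sample:

findDuplicates([5, 1, 2, 2, 3, 1, 5, 5]); // => [1, 2, 5], each duplicate once, in sorted order

Note that the comparator (a, b) => a - b assumes numeric elements; see "Handle all data types" below for other cases.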

Utilizing built-in features: filter and lastIndexOf

We can lean on filter() together with lastIndexOf() when you prefer a one-liner; no Set is needed here, because keeping only the last occurrence already deduplicates:

const findUniqueDuplicates = arr => arr.filter((e, i, a) => a.indexOf(e) !== i && a.lastIndexOf(e) === i);

This neatly filters out unique duplicates, values that occur more than once in the input but are listed only once in the output: the predicate passes only at an element's last occurrence, and only when that occurrence isn't also its first.
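
Applied to the TLDR's sample array:

findUniqueDuplicates([1, 2, 2, 3, 1]); // => [2, 1]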

Dealing with large-scale sets

When handling bulky data sets, it's crucial to weigh both the time and the space complexity of your solution. A few strategies help:

Choice of algorithm

Pick an algorithm that minimizes redundant scans and unnecessary operations. Keep in mind, simple isn't always optimal: quadratic complexity (O(N^2)) becomes painful on larger arrays. When performance is critical, prefer the sort-based and Set-based solutions discussed above.
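
As a sketch of a linear-time alternative (the function name here is illustrative, and it assumes a modern runtime with Map), one counting pass replaces all the repeated scans:

function findDuplicatesLinear(arr) {
  const counts = new Map();
  for (const value of arr) {
    counts.set(value, (counts.get(value) || 0) + 1); // one tallying pass
  }
  // Keep only the values seen more than once
  return [...counts.keys()].filter(value => counts.get(value) > 1);
}

findDuplicatesLinear([1, 2, 2, 3, 1]); // => [1, 2]

The Map costs O(N) extra memory, but each element is touched only a constant number of times.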

Handle all data types

JavaScript arrays are type-agnostic, meaning they can store elements of diverse types. Be aware that the default sort compares elements as strings, so numbers need a numeric comparator (as used above), and objects or nested arrays call for a custom comparison function of their own.
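
For instance, a hypothetical array of objects could be sorted on an assumed id property so the neighbor-comparison trick still applies:

const items = [{ id: 3 }, { id: 1 }, { id: 3 }];
// Sort on whichever field you treat as each object's identity
const byId = items.slice().sort((a, b) => a.id - b.id);
// byId => [{ id: 1 }, { id: 3 }, { id: 3 }]; duplicates now sit side by side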

Browser compatibility

Consider the browsers your code is expected to run on. Array.from() is unavailable in older browsers such as IE 11, and IE 11's Set support is only partial (its constructor ignores an iterable argument). If legacy support is a concern, add a polyfill or fall back to more broadly compatible methods.
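
As one sketch of such a fallback, here's an ES5-only variant that leans on nothing newer than indexOf (available since IE 9):

// ES5-compatible: no Set, no Array.from, no arrow functions
function findDuplicatesLegacy(arr) {
  var dups = [];
  for (var i = 0; i < arr.length; i++) {
    // A repeat occurrence we haven't collected yet
    if (arr.indexOf(arr[i]) !== i && dups.indexOf(arr[i]) === -1) {
      dups.push(arr[i]);
    }
  }
  return dups;
}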

Unique once, duplicate many: Dealing with unique duplicates

Often, we want each duplicate to appear only once in the returned array. There are a couple of clean ways to get there:

Exploiting Set for unique duplicates

The Set object only stores unique values. After locating duplicates, we can apply a Set to ensure each one is listed once:

const uniqueDuplicates = Array.from(new Set(duplicates));

Each duplicate value makes a solo appearance, leaving a clean, simplified list of non-uniques.

The dynamic duo: reduce and indexOf

Coupling the reduce method with indexOf serves as another way to accumulate duplicates without repetition:

const uniqDupsReducer = (acc, e, i, array) => {
  // Did we time-travel? Nope! Just checking if our value existed 'previously' 😉
  if (array.indexOf(e) !== i && acc.indexOf(e) === -1) acc.push(e);
  return acc;
};

const uniqueDups = [1, 2, 2, 3, 1].reduce(uniqDupsReducer, []);
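
For the sample input, uniqueDups evaluates to [2, 1], the same result the filter-based one-liner produced.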