Problem

Design a data structure that efficiently finds the majority element of a given subarray.

The majority element of a subarray is an element that occurs threshold times or more in the subarray.

Implementing the MajorityChecker class:

  • MajorityChecker(int[] arr) Initializes the instance of the class with the given array arr.
  • int query(int left, int right, int threshold) returns the element in the subarray arr[left...right] that occurs at least threshold times, or -1 if no such element exists.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
**Input**
["MajorityChecker", "query", "query", "query"]
[[[1, 1, 2, 2, 1, 1]], [0, 5, 4], [0, 3, 3], [2, 3, 2]]
**Output**
[null, 1, -1, 2]

**Explanation**
MajorityChecker majorityChecker = new MajorityChecker([1, 1, 2, 2, 1, 1]);
majorityChecker.query(0, 5, 4); // return 1
majorityChecker.query(0, 3, 3); // return -1
majorityChecker.query(2, 3, 2); // return 2

Solution

To solve the problem of finding the majority element efficiently within a given subarray, we need a combination of:

  1. Pre-processing: Store information about the frequency of elements in the array.
  2. Querying: Use the pre-processed data to efficiently fetch the result for a subarray.

Approach

  1. Pre-process the array:

    • Use a map or dictionary to store the indices of each distinct element. For example, given the array [1, 1, 2, 2, 1, 1], store the positions of 1 as [0, 1, 4, 5], and 2 as [2, 3].
    • This allows efficient querying of the positions where a specific element appears.
  2. Process the query:

    • For a given subarray range (leftright), check the frequencies of the potential majority elements.
    • Use binary search (via bisect) to count how many times the element appears within the range.
    • If its frequency is greater than or equal to threshold, return the element; otherwise, return -1.

The final structure allows for efficient querying by leveraging the pre-processed frequency data.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
class MajorityChecker {
    private Map<Integer, List<Integer>> posMap;

    public MajorityChecker(int[] arr) {
        posMap = new HashMap<>();
        for (int i = 0; i < arr.length; i++) {
            posMap.computeIfAbsent(arr[i], k -> new ArrayList<>()).add(i);
        }
    }

    public int query(int left, int right, int threshold) {
        int ans = -1;
        for (Map.Entry<Integer, List<Integer>> entry : posMap.entrySet()) {
            int key = entry.getKey();
            List<Integer> posList = entry.getValue();
            // Use binary search to find the count of key in range [left, right]
            int lIdx = binarySearch(posList, left, true);
            int rIdx = binarySearch(posList, right, false);
            int count = rIdx - lIdx + 1;
            if (count >= threshold) {
                ans = key;
                break;
            }
        }
        return ans;
    }

    private int binarySearch(List<Integer> lst, int target, boolean findFirst) {
        int lo = 0, hi = lst.size() - 1, idx = -1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (lst.get(mid) < target) {
                lo = mid + 1;
            } else if (lst.get(mid) > target) {
                hi = mid - 1;
            } else {
                idx = mid;
                if (findFirst) {
                    hi = mid - 1;
                } else {
                    lo = mid + 1;
                }
            }
        }
        return findFirst ? lo : hi;
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class MajorityChecker:
    def __init__(self, arr: List[int]):
        self.pos_map = defaultdict(list)
        for i, val in enumerate(arr):
            self.pos_map[val].append(i)

    def query(self, left: int, right: int, threshold: int) -> int:
        ans = -1
        for k, pos_list in self.pos_map.items():
            # Use binary search to find the count of k in range [left, right]
            l_idx = self._binary_search(pos_list, left, True)
            r_idx = self._binary_search(pos_list, right, False)
            count = r_idx - l_idx + 1
            if count >= threshold:
                ans = k
                break
        return ans
    
    def _binary_search(self, lst: List[int], target: int, find_first: bool) -> int:
        lo, hi = 0, len(lst) - 1
        idx = -1
        while lo <= hi:
            mid = (lo + hi) // 2
            if lst[mid] < target:
                lo = mid + 1
            elif lst[mid] > target:
                hi = mid - 1
            else:
                idx = mid
                if find_first:
                    hi = mid - 1
                else:
                    lo = mid + 1
        return lo if find_first else hi

Complexity

  • ⏰ Time complexity: O(n), where n` is the length of the array, to build the frequency map.
    • Query:
      • For each query, checking the frequency using binary search takes O(log(m)), where m is the size of the positions list for an element. This process might repeat for multiple potential majority elements.
      • The worst case involves checking all unique elements, leading to O(u * log(m)), where u is the number of unique elements.
    • Overall, querying is more efficient than a brute force approach.
  • 🧺 Space complexity: O(n) for storing element positions in the dictionary.