The basic solution is similar to merging sorted arrays, though discarding the elements that aren’t present in both arrays:
def basic_intersect(a, b): m = len(a) n = len(b) i = j = 0 ret =  while i < m and j < n: if a[i] == b[j]: ret.append(a[i]) i += 1 j += 1 elif a[i] < b[j]: i += 1 elif a[i] > b[j]: j += 1 return ret
As we walk over all the elements in both arrays, the running time is obviously O(m+n), where m and n are the lengths of the two arrays. It’s clear that in certain cases we cannot do better, for example if the two arrays are “almost equal” we need O(m+n) time just to write the output. But can we do better in other conditions?
This solution consists of running through all the elements of the smaller array, doing a binary search for each element in the bigger array:
def bin_search_intersect(a, b): def bin_search(e, a): # adapted from Python docs from bisect import bisect_left i = bisect_left(a, e) if i != len(a) and a[i] == e: return i return None if len(a) > len(b): a, b = b, a return [e for e in a if bin_search(e, b) is not None]
It’s not better than the basic solution in all cases, but its running time is O(m log n), asymptotically better than O(m+n) when n >> m.
A better solution?
Demian proposed to switch methods based on the relative size of the arrays, but doing this can be quite tricky as we don’t know the hidden constants in the running times of the algorithms. My idea to solve this problem (that turned out not to be an original one 😀 ) was to avoid doing a binary search in the full range of the bigger array by searching for an upper bound with exponentially growing steps:
def look_fwd_intersect(a, b): from bisect import bisect_left def ubound_search(a, m, i, e): k = 1 while i < m and a[i] <= e: i += k k *= 2 if i >= m: return m if a[m-1] >= e else None else: return i if len(a) > len(b): a, b = b, a m = len(a) n = len(b) i = j1 = 0 ret =  while i < m and j1 < n: j2 = ubound_search(b, n, j1, a[i]) if j2 is None: break k = bisect_left(b, a[i], j1, j2) if k < n and b[k] == a[i]: ret.append(a[i]) j1 = k + 1 else: j1 = k i += 1 return ret
The running time of this solution is more difficult to calculate than those of the previous two. Let’s start by calling the distance we “jump” in the second array for each element of the first array di. Then we have
where m is the length of the first array and n the length of the second one.
The exponential upper bound search and the binary search have both an O(log di) cost, so the total cost will be given by
To get a tight bound for this sum, we can use the following inequality (using the AM–GM inequality in the second step):
Applying this inequality, but taking into account that there is a fixed cost for each element of the first array, we get
giving us a final running time of .
When m = O(n) , we get the same result as the basic search:
And when we have the other limit, m = O(1), we get: