This post finishes explaining the solutions that were given in the comments of the “two coding problems” post. As the first problem was already discussed, this post will only be about the second one.

##### Basic solution

The basic solution is similar to merging sorted arrays, though discarding the elements that aren’t present in both arrays:

def basic_intersect(a, b):
m = len(a)
n = len(b)
i = j = 0
ret = []
while i < m and j < n:
if a[i] == b[j]:
ret.append(a[i])
i += 1
j += 1
elif a[i] < b[j]:
i += 1
elif a[i] > b[j]:
j += 1
return ret

As we walk over all the elements in both arrays, the running time is obviously O(*m*+*n*), where *m* and *n* are the lengths of the two arrays. It’s clear that in certain cases we cannot do better, for example if the two arrays are “almost equal” we need O(*m*+*n*) time just to write the output. But can we do better in other conditions?

##### Guille’s solution

This solution consists of running through all the elements of the smaller array, doing a binary search for each element in the bigger array:

def bin_search_intersect(a, b):
def bin_search(e, a):
# adapted from Python docs
from bisect import bisect_left
i = bisect_left(a, e)
if i != len(a) and a[i] == e:
return i
return None
if len(a) > len(b):
a, b = b, a
return [e for e in a if bin_search(e, b) is not None]

It’s not better than the basic solution in all cases, but its running time is O(*m* log *n*), asymptotically better than O(*m*+*n*) when *n* >> *m*.

##### A better solution?

Demian proposed to switch methods based on the relative size of the arrays, but doing this can be quite tricky as we don’t know the hidden constants in the running times of the algorithms. My idea to solve this problem (that turned out not to be an original one 😀 ) was to avoid doing a binary search in the full range of the bigger array by searching for an upper bound with exponentially growing steps:

def look_fwd_intersect(a, b):
from bisect import bisect_left
def ubound_search(a, m, i, e):
k = 1
while i < m and a[i] <= e:
i += k
k *= 2
if i >= m:
return m if a[m-1] >= e else None
else:
return i
if len(a) > len(b):
a, b = b, a
m = len(a)
n = len(b)
i = j1 = 0
ret = []
while i < m and j1 < n:
j2 = ubound_search(b, n, j1, a[i])
if j2 is None:
break
k = bisect_left(b, a[i], j1, j2)
if k < n and b[k] == a[i]:
ret.append(a[i])
j1 = k + 1
else:
j1 = k
i += 1
return ret

The running time of this solution is more difficult to calculate than those of the previous two. Let’s start by calling the distance we “jump” in the second array for each element of the first array *d*_{i}. Then we have

,

where *m* is the length of the first array and *n* the length of the second one.

The exponential upper bound search and the binary search have both an O(log *d*_{i}) cost, so the total cost will be given by

.

To get a tight bound for this sum, we can use the following inequality (using the AM–GM inequality in the second step):

.

Applying this inequality, but taking into account that there is a fixed cost for each element of the first array, we get

,

giving us a final running time of .

When *m* = O(*n*) , we get the same result as the basic search:

And when we have the other limit, *m* = O(1), we get:

.