List Intersection Algorithm Implementation Only Using Python Lists (not Sets)

January 04, 2024 Post a Comment

I've been trying to write down a list intersection algorithm in python that takes care of repetitions. I'm a newbie to python and programming so forgive me if this sounds inefficie

Solution 1:

Here is a faster solution:

defintersect_sorted(a1, a2):
  """Yields the intersection of sorted lists a1 and a2, without deduplication.

  Execution time is O(min(lo + hi, lo * log(hi))), where lo == min(len(a1),
  len(a2)) and hi == max(len(a1), len(a2)). It can be faster depending on
  the data.
  """import bisect, math
  s1, s2 = len(a1), len(a2)
  i1 = i2 = 0if s1 and s1 + s2 > min(s1, s2) * math.log(max(s1, s2)) * 1.4426950408889634:
    bi = bisect.bisect_left
    while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1elif v1 < v2:
        i1 = bi(a1, v2, i1)
      else:
        i2 = bi(a2, v1, i2)
  else:  # The linear solution is faster.while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1elif v1 < v2:
        i1 += 1else:
        i2 += 1

It runs in O(min(n + m, n * log(m))) time where n is the minimum of the lengths and m is the maximum. It iterates over both lists at the same time, skipping as many elements in the beginning as possible.

An analysis is available here: http://ptspts.blogspot.ch/2015/11/how-to-compute-intersection-of-two.html

Solution 2:

Anything that iterates through L1, iterating through all of L2 each time, will take quadratic time. The only way to improve that is to avoid iterating through all of L2. (There's a similar issue removing duplicates from L at the end.)

If you use a set for L2 (and for L), of course each in L2 step is constant time, so the overall algorithm is linear. And you can always build your own hash table implementation instead of using set. But that's a lot of work.

With a binary search tree, or even just a sorted list and a binary_find function, you can do it in O(N log N). And that binary_find is much easier to write yourself. So:

S2 = sorted(L2)
L = [element for element in L1 if binary_find(element, S2)]
S = remove_adjacent(sorted(L))

Or, even more simply, sort L1 too, and then you don't need remove_adjacent:

S1, S2 = sorted(L1), sorted(L2)
L = []
for element in S1:
    if binary_find(element, S2) and (not L or L[-1] != element):
        L.append(element)

Either way, this is O(N log N), where N is the length of the longer list. By comparison, the original is O(N^2), and the other answers are O(N^3). Of course it's a bit more complicated, but it's still pretty easy to understand.

You need to write the binary_find (and, if applicable, remove_adjacent), because I assume you don't want to use stuff out of the stdlib if you don't even want to use extra builtins. But that's really easy. For example:

Baca Juga

def binary_find(element, seq):
    low, high = 0, len(seq), 
    while low != high:
        mid = (low + high) // 2if seq[mid] == element:
            return True
        elif seq[mid] < element:
            low = mid+1else:
            high = mid
    return False

def remove_adjacent(seq):
    ret = []
    last = object()
    for element in seq:
        if element != last:
            ret.append(element)
        last = element
    return ret

If you don't even want to use sorted or list.sort, you can write your own sort pretty easily too.

Solution 3:

How about:

Iterate though L1
Iterate though L2
If (in L1 and L2) and not in L -> add to L

Not particularly efficient, but in code it would look something like this (with repetitions to make the point):

>>>L1 = [1,2,3,3,4]>>>L2 = [2,3,4,4,5]>>>L = list()>>>for v1 in L1:
        for v2 in L2:
            if v1 == v2 and v1 not in L:
                L.append(v1)
>>>L
[2,3,4]

You avoid deleting from L1 and L2 simply by checking if the element is already in L and adding to L if it is not. Then it doesn't matter if there are repetitions in L1 and L2.

Solution 4:

EDIT: I read the title wrong, and skimmed over the builtins part. I'm gonna leave it here anyway, might help someone else.

You can acheive this using the set type.

>>>a = [1,2,3,4]>>>b = [3,4,5,6]>>>c = list(set(a) & set(b))>>>c
[3, 4]

Solution 5:

Make a temporary list.
Iterate through one of the two lists. It doesn't matter which one.
For every element, check to see if that element exists in the other list (if element in list2) and isn't already in your temporary list (same syntax)
If both conditions are true, append it to your temporary list.

I feel bad for posting the solution, but it's honestly more readable than my text:

def intersection(l1, l2):
    temp = []

    for item in l1:
        if item in l2 and item not in temp:
            temp.append(item)

    return temp

Python Developer