List Intersection Algorithm Implementation Only Using Python Lists (not Sets)
Solution 1:
Here is a faster solution:
defintersect_sorted(a1, a2):
"""Yields the intersection of sorted lists a1 and a2, without deduplication.
Execution time is O(min(lo + hi, lo * log(hi))), where lo == min(len(a1),
len(a2)) and hi == max(len(a1), len(a2)). It can be faster depending on
the data.
"""import bisect, math
s1, s2 = len(a1), len(a2)
i1 = i2 = 0if s1 and s1 + s2 > min(s1, s2) * math.log(max(s1, s2)) * 1.4426950408889634:
bi = bisect.bisect_left
while i1 < s1 and i2 < s2:
v1, v2 = a1[i1], a2[i2]
if v1 == v2:
yield v1
i1 += 1
i2 += 1elif v1 < v2:
i1 = bi(a1, v2, i1)
else:
i2 = bi(a2, v1, i2)
else: # The linear solution is faster.while i1 < s1 and i2 < s2:
v1, v2 = a1[i1], a2[i2]
if v1 == v2:
yield v1
i1 += 1
i2 += 1elif v1 < v2:
i1 += 1else:
i2 += 1It runs in O(min(n + m, n * log(m))) time where n is the minimum of the lengths and m is the maximum. It iterates over both lists at the same time, skipping as many elements in the beginning as possible.
An analysis is available here: http://ptspts.blogspot.ch/2015/11/how-to-compute-intersection-of-two.html
Solution 2:
Anything that iterates through L1, iterating through all of L2 each time, will take quadratic time. The only way to improve that is to avoid iterating through all of L2. (There's a similar issue removing duplicates from L at the end.)
If you use a set for L2 (and for L), of course each in L2 step is constant time, so the overall algorithm is linear. And you can always build your own hash table implementation instead of using set. But that's a lot of work.
With a binary search tree, or even just a sorted list and a binary_find function, you can do it in O(N log N). And that binary_find is much easier to write yourself. So:
S2 = sorted(L2)
L = [element for element in L1 if binary_find(element, S2)]
S = remove_adjacent(sorted(L))
Or, even more simply, sort L1 too, and then you don't need remove_adjacent:
S1, S2 = sorted(L1), sorted(L2)
L = []
for element in S1:
if binary_find(element, S2) and (not L or L[-1] != element):
L.append(element)
Either way, this is O(N log N), where N is the length of the longer list. By comparison, the original is O(N^2), and the other answers are O(N^3). Of course it's a bit more complicated, but it's still pretty easy to understand.
You need to write the binary_find (and, if applicable, remove_adjacent), because I assume you don't want to use stuff out of the stdlib if you don't even want to use extra builtins. But that's really easy. For example:
def binary_find(element, seq):
low, high = 0, len(seq),
while low != high:
mid = (low + high) // 2if seq[mid] == element:
return True
elif seq[mid] < element:
low = mid+1else:
high = mid
return False
def remove_adjacent(seq):
ret = []
last = object()
for element in seq:
if element != last:
ret.append(element)
last = element
return ret
If you don't even want to use sorted or list.sort, you can write your own sort pretty easily too.
Solution 3:
How about:
- Iterate though L1
- Iterate though L2
- If (in L1 and L2) and not in L -> add to L
Not particularly efficient, but in code it would look something like this (with repetitions to make the point):
>>>L1 = [1,2,3,3,4]>>>L2 = [2,3,4,4,5]>>>L = list()>>>for v1 in L1:
for v2 in L2:
if v1 == v2 and v1 not in L:
L.append(v1)
>>>L
[2,3,4]
You avoid deleting from L1 and L2 simply by checking if the element is already in L and adding to L if it is not. Then it doesn't matter if there are repetitions in L1 and L2.
Solution 4:
EDIT: I read the title wrong, and skimmed over the builtins part. I'm gonna leave it here anyway, might help someone else.
You can acheive this using the set type.
>>>a = [1,2,3,4]>>>b = [3,4,5,6]>>>c = list(set(a) & set(b))>>>c
[3, 4]
Solution 5:
- Make a temporary list.
- Iterate through one of the two lists. It doesn't matter which one.
- For every element, check to see if that element exists in the other list (
if element in list2) and isn't already in your temporary list (same syntax) - If both conditions are true, append it to your temporary list.
I feel bad for posting the solution, but it's honestly more readable than my text:
def intersection(l1, l2):
temp = []
for item in l1:
if item in l2 and item not in temp:
temp.append(item)
return temp
Post a Comment for "List Intersection Algorithm Implementation Only Using Python Lists (not Sets)"