List Intersection Algorithm Implementation Only Using Python Lists (not Sets)
Solution 1:
Here is a faster solution:
defintersect_sorted(a1, a2):
"""Yields the intersection of sorted lists a1 and a2, without deduplication.
Execution time is O(min(lo + hi, lo * log(hi))), where lo == min(len(a1),
len(a2)) and hi == max(len(a1), len(a2)). It can be faster depending on
the data.
"""import bisect, math
s1, s2 = len(a1), len(a2)
i1 = i2 = 0if s1 and s1 + s2 > min(s1, s2) * math.log(max(s1, s2)) * 1.4426950408889634:
bi = bisect.bisect_left
while i1 < s1 and i2 < s2:
v1, v2 = a1[i1], a2[i2]
if v1 == v2:
yield v1
i1 += 1
i2 += 1elif v1 < v2:
i1 = bi(a1, v2, i1)
else:
i2 = bi(a2, v1, i2)
else: # The linear solution is faster.while i1 < s1 and i2 < s2:
v1, v2 = a1[i1], a2[i2]
if v1 == v2:
yield v1
i1 += 1
i2 += 1elif v1 < v2:
i1 += 1else:
i2 += 1
It runs in O(min(n + m, n * log(m)))
time where n is the minimum of the lengths and m is the maximum. It iterates over both lists at the same time, skipping as many elements in the beginning as possible.
An analysis is available here: http://ptspts.blogspot.ch/2015/11/how-to-compute-intersection-of-two.html
Solution 2:
Anything that iterates through L1
, iterating through all of L2
each time, will take quadratic time. The only way to improve that is to avoid iterating through all of L2
. (There's a similar issue removing duplicates from L
at the end.)
If you use a set
for L2
(and for L
), of course each in L2
step is constant time, so the overall algorithm is linear. And you can always build your own hash table implementation instead of using set
. But that's a lot of work.
With a binary search tree, or even just a sorted list and a binary_find
function, you can do it in O(N log N). And that binary_find
is much easier to write yourself. So:
S2 = sorted(L2)
L = [element for element in L1 if binary_find(element, S2)]
S = remove_adjacent(sorted(L))
Or, even more simply, sort L1 too, and then you don't need remove_adjacent
:
S1, S2 = sorted(L1), sorted(L2)
L = []
for element in S1:
if binary_find(element, S2) and (not L or L[-1] != element):
L.append(element)
Either way, this is O(N log N), where N is the length of the longer list. By comparison, the original is O(N^2), and the other answers are O(N^3). Of course it's a bit more complicated, but it's still pretty easy to understand.
You need to write the binary_find
(and, if applicable, remove_adjacent
), because I assume you don't want to use stuff out of the stdlib if you don't even want to use extra builtins. But that's really easy. For example:
def binary_find(element, seq):
low, high = 0, len(seq),
while low != high:
mid = (low + high) // 2if seq[mid] == element:
return True
elif seq[mid] < element:
low = mid+1else:
high = mid
return False
def remove_adjacent(seq):
ret = []
last = object()
for element in seq:
if element != last:
ret.append(element)
last = element
return ret
If you don't even want to use sorted
or list.sort
, you can write your own sort pretty easily too.
Solution 3:
How about:
- Iterate though L1
- Iterate though L2
- If (in L1 and L2) and not in L -> add to L
Not particularly efficient, but in code it would look something like this (with repetitions to make the point):
>>>L1 = [1,2,3,3,4]>>>L2 = [2,3,4,4,5]>>>L = list()>>>for v1 in L1:
for v2 in L2:
if v1 == v2 and v1 not in L:
L.append(v1)
>>>L
[2,3,4]
You avoid deleting from L1 and L2 simply by checking if the element is already in L and adding to L if it is not. Then it doesn't matter if there are repetitions in L1 and L2.
Solution 4:
EDIT: I read the title wrong, and skimmed over the builtins part. I'm gonna leave it here anyway, might help someone else.
You can acheive this using the set
type.
>>>a = [1,2,3,4]>>>b = [3,4,5,6]>>>c = list(set(a) & set(b))>>>c
[3, 4]
Solution 5:
- Make a temporary list.
- Iterate through one of the two lists. It doesn't matter which one.
- For every element, check to see if that element exists in the other list (
if element in list2
) and isn't already in your temporary list (same syntax) - If both conditions are true, append it to your temporary list.
I feel bad for posting the solution, but it's honestly more readable than my text:
def intersection(l1, l2):
temp = []
for item in l1:
if item in l2 and item not in temp:
temp.append(item)
return temp
Post a Comment for "List Intersection Algorithm Implementation Only Using Python Lists (not Sets)"