How Do I Remove Almost-duplicate Integers From a List?
I'm parsing some PDFs in Python. These PDFs are visually organized into rows and columns. The pdftohtml script converts these PDFs to an XML format, full of loose tags
Solution 1:
- Sort the list to put the close values next to one another
- Use reduce to filter each value depending on the previous value
Code:
>>> from functools import reduce  # reduce is no longer a builtin on Python 3
>>> tops = [925, 946, 966, 995, 996, 1015, 1035]
>>> threshold = 2
>>> reduce(lambda x, y: x + [y] if len(x) == 0 or y > x[-1] + threshold else x, sorted(tops), [])
[925, 946, 966, 995, 1015, 1035]
With several contiguous values:
>>> tops = range(10)
>>> reduce(lambda x, y: x + [y] if len(x) == 0 or y > x[-1] + threshold else x, sorted(tops), [])
[0, 3, 6, 9]
Edit
Reduce can be a little cumbersome to read, so here is a more straightforward approach:
res = []
for item in sorted(tops):
    if len(res) == 0 or item > res[-1] + threshold:
        res.append(item)
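For reuse, the loop above can be wrapped in a function. This is a minimal sketch: the name `drop_near_duplicates` and the default `threshold=2` are assumptions chosen for illustration, not part of the original answer.

```python
def drop_near_duplicates(values, threshold=2):
    """Return the sorted values, skipping any value within `threshold` of the last kept one."""
    res = []
    for item in sorted(values):
        # Keep the item only if it is far enough above the last kept value.
        if not res or item > res[-1] + threshold:
            res.append(item)
    return res


tops = [925, 946, 966, 995, 996, 1015, 1035]
print(drop_near_duplicates(tops))
```

Because each item is compared with the last kept value rather than with the previous input value, a long run of close values is thinned out instead of collapsing to a single item, which is why range(10) gives [0, 3, 6, 9].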
Solution 2:
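@njzk2's answer works too, but this function actually shows what is going on and is easier to understand. Note that it sorts the list in place and only removes a value when the next value is exactly 1 larger; it does not take a threshold: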
>>> def sort(values):  # parameter renamed from `list` to avoid shadowing the builtin
...     values.sort()  # sorts in ascending order (in place)
...     # walk the indices from the end so deletions don't shift unvisited items
...     for k in range(len(values) - 1, 0, -1):
...         if values[k] - 1 == values[k - 1]:  # if the previous value is exactly one less,
...             del values[k - 1]  # remove it
...     return values
...
>>> tops = [925, 946, 966, 995, 996, 1015, 1035]
>>> sort(tops)
[925, 946, 966, 996, 1015, 1035]
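The function above keeps the larger value of a close pair (996 rather than 995) and only handles a difference of exactly 1. If a configurable threshold is wanted with the same "keep the larger value" behaviour, one possible sketch is to scan from largest to smallest. This is a different technique from the answer's function, and the name `keep_larger_of_close_values` and its `threshold` parameter are hypothetical.

```python
def keep_larger_of_close_values(values, threshold=1):
    """Scan from largest to smallest, dropping values within `threshold` of the last kept one."""
    kept = []
    for item in sorted(values, reverse=True):
        # Keep the item only if it is far enough below the last kept value.
        if not kept or item < kept[-1] - threshold:
            kept.append(item)
    kept.reverse()  # restore ascending order
    return kept


tops = [925, 946, 966, 995, 996, 1015, 1035]
print(keep_larger_of_close_values(tops))
```

Unlike the in-place version, this returns a new list and leaves the input untouched. On runs of consecutive values it behaves differently from the function above, since it compares against the last kept value rather than the immediate neighbour.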