How Do I Remove Almost-duplicate Integers From List?

July 26, 2023

I'm parsing some PDFs in Python. These PDFs are visually organized into rows and columns. The pdftohtml script converts these PDFs to an XML format, full of loose tags

Solution 1:

Sort the list so that close values end up next to one another.
Use reduce to keep a value only if it is more than the threshold above the previously kept value.

Code:

>>> from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
>>> tops = [925, 946, 966, 995, 996, 1015, 1035]
>>> threshold = 2
>>> reduce(lambda x, y: x + [y] if len(x) == 0 or y > x[-1] + threshold else x, sorted(tops), [])
[925, 946, 966, 995, 1015, 1035]

With several contiguous values:

>>> tops = range(10)
>>> reduce(lambda x, y: x + [y] if len(x) == 0 or y > x[-1] + threshold else x, sorted(tops), [])
[0, 3, 6, 9]

Edit

Reduce can be a little cumbersome to read, so here is a more straightforward approach:

res = []
for item in sorted(tops):
    if len(res) == 0 or item > res[-1] + threshold:
        res.append(item)
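
If you need this in more than one place, the loop can be wrapped in a small function. This is only a sketch: the name drop_near_duplicates and the default threshold of 2 are my own choices for illustration, not something from the question.

```python
def drop_near_duplicates(values, threshold=2):
    """Return the sorted values, skipping any within `threshold` of the last kept one."""
    kept = []
    for item in sorted(values):
        # Always keep the first value; after that, keep a value only if it is
        # more than `threshold` above the most recently kept value.
        if not kept or item > kept[-1] + threshold:
            kept.append(item)
    return kept


tops = [925, 946, 966, 995, 996, 1015, 1035]
print(drop_near_duplicates(tops))       # [925, 946, 966, 995, 1015, 1035]
print(drop_near_duplicates(range(10)))  # [0, 3, 6, 9]
```

Because it builds a new list, the input is left untouched, which is usually what you want when the original order of tops still matters elsewhere.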

Solution 2:

@njzk2's answer works too, but this function spells out each step and may be easier to follow. Note that it only treats values that differ by exactly 1 as duplicates, and it keeps the larger of the two:

>>> def dedupe(values):
...     values.sort()  # sort in ascending order (in place)
...     # walk the indices from the end so deletions don't shift unvisited items
...     for k in reversed(range(1, len(values))):
...         if values[k] - 1 == values[k - 1]:  # previous value is exactly one less
...             del values[k - 1]  # remove the smaller of the pair
...     return values
...
>>> tops = [925, 946, 966, 995, 996, 1015, 1035]
>>> dedupe(tops)
[925, 946, 966, 996, 1015, 1035]
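
If you like that this answer keeps the larger value of a close pair but also want a configurable threshold, one possible variation is to walk the sorted values from the largest down. This is a sketch under my own assumptions: the function name and the threshold parameter are hypothetical, and I have only checked that it agrees with the outputs shown on this page, not that it matches the function above on every input.

```python
def drop_near_duplicates_keep_last(values, threshold=1):
    """Return sorted values, keeping the largest of each run of close values."""
    kept = []
    for item in sorted(values, reverse=True):
        # Walking downward: keep a value only if it is more than
        # `threshold` below the most recently kept (larger) value.
        if not kept or item < kept[-1] - threshold:
            kept.append(item)
    kept.reverse()  # restore ascending order
    return kept


tops = [925, 946, 966, 995, 996, 1015, 1035]
print(drop_near_duplicates_keep_last(tops))  # [925, 946, 966, 996, 1015, 1035]
```

Unlike the in-place version, this returns a new list and leaves tops unchanged.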
