
Quicker Way To Perform Fuzzy String Match In Pandas

Is there any way to speed up fuzzy string matching with fuzzywuzzy in pandas? I have a dataframe, extra_names, containing names that I want to fuzzy-match against another

Solution 1:

Let's try difflib from the standard library instead:

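import difflib
from functools import partial

# Bind the candidate names once; n=1 keeps only the single best match.
# get_close_matches also applies its default cutoff of 0.6.
f = partial(
    difflib.get_close_matches, possibilities=names_df['names'].tolist(), n=1)

# Each call returns a list with at most one name: .str[0] takes it (NaN if
# nothing cleared the cutoff) and fillna('') turns those misses into ''.
matches = extra_names['not_matching'].map(f).str[0].fillna('')
# Score each (best match, original name) pair with the same ratio() measure.
scores = [
    difflib.SequenceMatcher(None, x, y).ratio()
    for x, y in zip(matches, extra_names['not_matching'])
]

extra_names.assign(best=matches, score=scores)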

       not_matching               best     score
0         Vij Sales        Vijay Sales  0.900000
1  Crom Electronics  Croma Electronics  0.969697
2       REL Digital   Reliance Digital  0.666667
3        Bajaj Elec  Bajaj Electronics  0.740741
4     Reliance Digi   Reliance Digital  0.896552
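get_close_matches already computes a SequenceMatcher ratio for every candidate but returns only the names, which is why the code above scores each match a second time. Below is a minimal sketch of a hypothetical helper, best_match (not part of difflib), that follows the same filtering steps for the n=1 case and returns the score alongside the name. It saves only one ratio() call per row; most of the cost is still comparing each name against every candidate. The candidate list is assumed for illustration and is taken from the output table above; the real names_df presumably holds more names.

```python
import difflib


def best_match(word, possibilities, cutoff=0.6):
    """Return (best_candidate, score) for word, or ('', 0.0) if no candidate
    reaches cutoff. Hypothetical helper for illustration, not a difflib API."""
    s = difflib.SequenceMatcher()
    # SequenceMatcher caches information about seq2, so fix the query word
    # there and vary seq1 over the candidates, as get_close_matches does.
    s.set_seq2(word)
    best, best_score = None, -1.0
    for candidate in possibilities:
        s.set_seq1(candidate)
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on
        # ratio(); skip the full comparison when either is below cutoff.
        if s.real_quick_ratio() < cutoff or s.quick_ratio() < cutoff:
            continue
        score = s.ratio()
        # Strict '>' keeps the first candidate on ties; get_close_matches
        # may break ties differently.
        if score >= cutoff and score > best_score:
            best, best_score = candidate, score
    if best is None:
        return '', 0.0
    return best, best_score


# Sample data taken from the output table (assumed, not the full names_df).
names = ['Vijay Sales', 'Croma Electronics', 'Reliance Digital',
         'Bajaj Electronics']
for query in ['Vij Sales', 'Crom Electronics', 'REL Digital', 'Bajaj Elec',
              'Reliance Digi']:
    print(query, best_match(query, names))
```

Each call returns a `(best, score)` tuple, so mapping it over `extra_names['not_matching']` yields both the match and its score from a single pass over the candidates.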
