Quicker Way To Perform Fuzzy String Match In Pandas
Is there any way to speed up the fuzzy string match using fuzzywuzzy in pandas. I have a dataframe as extra_names which has names that I want to run fuzzy matches for with another
Solution 1:
Let's try difflib:
import difflib
from functools import partial
f = partial(
    difflib.get_close_matches, possibilities=names_df['names'].tolist(), n=1)
matches = extra_names['not_matching'].map(f).str[0].fillna('')
scores = [
    difflib.SequenceMatcher(None, x, y).ratio() 
    for x, y in zip(matches, extra_names['not_matching'])
]
extra_names.assign(best=matches, score=scores)
       not_matching               best     score
0         Vij Sales        Vijay Sales  0.900000
1  Crom Electronics  Croma Electronics  0.969697
2       REL Digital   Reliance Digital  0.666667
3        Bajaj Elec  Bajaj Electronics  0.740741
4     Reliance Digi   Reliance Digital  0.896552
Post a Comment for "Quicker Way To Perform Fuzzy String Match In Pandas"