Quicker Way To Perform Fuzzy String Match In Pandas
Is there any way to speed up fuzzy string matching with fuzzywuzzy in pandas? I have a dataframe, extra_names, containing names that I want to fuzzy match against another
Solution 1:
Let's try difflib:
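import difflib
from functools import partial

# Return at most one close match (n=1) from the reference names
f = partial(
    difflib.get_close_matches, possibilities=names_df['names'].tolist(), n=1)

# get_close_matches returns a list; take its first element, or '' if it is empty
matches = extra_names['not_matching'].map(f).str[0].fillna('')

# Similarity ratio between each best match and the original name
scores = [
    difflib.SequenceMatcher(None, x, y).ratio()
    for x, y in zip(matches, extra_names['not_matching'])
]

extra_names.assign(best=matches, score=scores)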
Output:

       not_matching               best     score
0         Vij Sales        Vijay Sales  0.900000
1  Crom Electronics  Croma Electronics  0.969697
2       REL Digital   Reliance Digital  0.666667
3        Bajaj Elec  Bajaj Electronics  0.740741
4     Reliance Digi   Reliance Digital  0.896552
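If the column to be matched contains many repeated names, one possible refinement is to cache the lookup so each distinct name is only compared once. The following is a minimal standard-library sketch, not part of the original answer: the helper name best_match and the CHOICES tuple are hypothetical, with the reference names reconstructed from the "best" column in the output, and cutoff=0.6 is simply the default used by difflib.get_close_matches.

```python
import difflib
from functools import lru_cache

# Hypothetical reference names, reconstructed from the "best" column of the output.
# A tuple is used because lru_cache requires hashable arguments.
CHOICES = (
    "Vijay Sales",
    "Croma Electronics",
    "Reliance Digital",
    "Bajaj Electronics",
)


@lru_cache(maxsize=None)
def best_match(name, choices=CHOICES, cutoff=0.6):
    """Return (closest choice, similarity ratio), or ('', 0.0) if nothing clears cutoff."""
    found = difflib.get_close_matches(name, choices, n=1, cutoff=cutoff)
    if not found:
        return "", 0.0
    best = found[0]
    # Same argument order as the answer: best match first, original name second
    return best, difflib.SequenceMatcher(None, best, name).ratio()


print(best_match("Vij Sales"))  # → ('Vijay Sales', 0.9)
```

With pandas, this could be applied as extra_names['not_matching'].map(best_match), which yields a Series of (best, score) tuples. The cache only helps when names repeat; for a column of all-distinct names it should behave much like the code in the answer.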