How To Remove Strings Present In A List From A Column In Pandas
Solution 1:
I think you need str.replace if you also want to remove substrings:
df['name'] = df['name'].str.replace('|'.join(To_remove_lst), '', regex=True)  # regex=True is required on pandas >= 2.0
If the list may contain regex special characters, escape them first:
import re
df['name'] = df['name'].str.replace('|'.join(map(re.escape, To_remove_lst)), '', regex=True)
print(df)
ID name
0 1 Kitty
1 2 Puppy
2 3 is example
3 4 stackoverflow
4 5 World
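To see why the escaping step matters, here is a small sketch on hypothetical data (the question's real DataFrame and list are not reproduced here). Without re.escape, a "." in a list entry acts as a wildcard and removes text it was never meant to touch:

```python
import re

import pandas as pd

# Hypothetical sample data, not the question's DataFrame.
df = pd.DataFrame({'ID': [1, 2], 'name': ['a.b World', 'axb World']})
to_remove_lst = ['a.b ']

raw_pattern = '|'.join(to_remove_lst)                   # '.' acts as a wildcard
safe_pattern = '|'.join(map(re.escape, to_remove_lst))  # '.' only matches a literal dot

# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default.
unescaped = df['name'].str.replace(raw_pattern, '', regex=True).tolist()
escaped = df['name'].str.replace(safe_pattern, '', regex=True).tolist()

print(unescaped)  # ['World', 'World']
print(escaped)    # ['World', 'axb World']
```

The unescaped pattern also strips "axb " from the second row; the escaped one only removes the literal "a.b ".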
But if you want to remove only whole words, use a nested list comprehension:
df['name'] = [' '.join([y for y in x.split() if y not in To_remove_lst]) for x in df['name']]
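The difference between substring removal and word removal shows up as soon as a list entry also occurs inside a longer word. A minimal sketch on hypothetical data; converting the list to a set is my own addition for faster membership tests:

```python
import pandas as pd

# Hypothetical sample data chosen to show the difference between the two approaches.
df = pd.DataFrame({'ID': [1, 2], 'name': ['this is example', 'Hello Kitty']})
to_remove_lst = ['is', 'Hello']
to_remove_set = set(to_remove_lst)  # set membership checks are O(1) on average

# Substring removal: 'is' is also cut out of 'this', leaving a double space.
substrings = df['name'].str.replace('|'.join(to_remove_lst), '', regex=True).tolist()

# Word removal: split on whitespace and keep only tokens that are not in the set.
words = [' '.join(y for y in x.split() if y not in to_remove_set) for x in df['name']]

print(substrings)  # ['th  example', ' Kitty']
print(words)       # ['this example', 'Kitty']
```

Note that the word-level version only matches single whitespace-separated tokens exactly, so multi-word entries (such as "I am") or words with attached punctuation (such as "Hello,") are not removed.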
Solution 2:
I'd recommend re.sub in a list comprehension for speed.
import re
p = re.compile('|'.join(map(re.escape, To_remove_lst)))
df['name'] = [p.sub('', text) for text in df['name']]
print(df)
ID name
0 1 Kitty
1 2 Puppy
2 3 is example
3 4 stackoverflow
4 5 World
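List comprehensions are not actually compiled to C, but they run as a tight interpreter loop. For object-dtype columns, the pandas str methods also call a Python function on each element and add overhead for missing-value handling, so for string and regex data I highly recommend list comprehensions over the pandas str functions for the time being, because that API is a bit slow.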
Using map(re.escape, To_remove_lst) escapes any regex metacharacters in the list entries, so they are matched literally during replacement.
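The pattern is compiled once with re.compile and its sub method (p.sub) is called inside the loop, so the pattern is not re-resolved on every iteration (re.sub would have to look it up in the re module's internal cache, or compile it, each time).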
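One caveat with the list comprehension: unlike Series.str.replace, Pattern.sub raises a TypeError on missing values (None or NaN). A minimal sketch of a compile-once helper that passes non-strings through unchanged; make_remover is a hypothetical name and the data is made up:

```python
import re

import pandas as pd


def make_remover(to_remove_lst):
    """Compile the escaped alternation once and return a function that applies it."""
    pattern = re.compile('|'.join(map(re.escape, to_remove_lst)))

    def remove(column):
        # Pattern.sub only accepts strings, so missing values are passed through as-is.
        return [pattern.sub('', text) if isinstance(text, str) else text
                for text in column]

    return remove


# Hypothetical sample data, including a missing value.
df = pd.DataFrame({'ID': [1, 2, 3], 'name': ['Hello Kitty', None, 'Hi World']})
remove = make_remover(['Hello ', 'Hi '])
df['name'] = remove(df['name'])
print(df)
```

The compiled pattern lives in the closure, so it is built once no matter how many columns the returned function is applied to.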
I've let it slide here, but please use PEP 8-compliant variable names such as to_remove_lst (lowercase with underscores).
Timings
df = pd.concat([df] * 10000)
%timeit df['name'].str.replace('|'.join(To_remove_lst), '', regex=True)
%timeit [p.sub('', text) for text in df['name']]
100 ms ± 5.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
60 ms ± 3.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
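The %timeit lines above need IPython. A plain-Python sketch of the same comparison with the standard timeit module, on hypothetical data, is below. It first checks that both approaches give the same result; the timings it prints will depend on the machine, the pandas version and the column dtype (for example, a pyarrow-backed string dtype can change how fast the str methods are), so no figures are claimed here:

```python
import re
import timeit

import pandas as pd

# Hypothetical data: the question's DataFrame is not reproduced here.
to_remove_lst = ['Hello ', 'Hi ']
base = pd.DataFrame({'ID': [1, 2, 3], 'name': ['Hello Kitty', 'Hi Puppy', 'World']})
df = pd.concat([base] * 10000, ignore_index=True)

pattern_str = '|'.join(map(re.escape, to_remove_lst))
p = re.compile(pattern_str)


def with_str_replace():
    return df['name'].str.replace(pattern_str, '', regex=True).tolist()


def with_list_comprehension():
    return [p.sub('', text) for text in df['name']]


# Both approaches must agree before their speed is worth comparing.
assert with_str_replace() == with_list_comprehension()

for func in (with_str_replace, with_list_comprehension):
    seconds = timeit.timeit(func, number=10) / 10
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per loop')
```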
Solution 3:
You can loop over the list and call str.replace once for each entry:
for word in To_remove_lst:
    df['name'] = df['name'].str.replace(word, '', regex=False)
Output:
ID name
0 1 Kitty
1 2 Puppy
2 3 is example
3 4 stackoverflow
4 5 World
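The loop is not always equivalent to the single-pass alternation used in Solutions 1 and 2: each replacement runs on the output of the previous one, so removing one entry can expose a new match for a later entry. A minimal sketch with made-up data:

```python
import re

import pandas as pd

# Hypothetical data chosen to show where the two approaches can differ.
df = pd.DataFrame({'name': ['abc']})
to_remove_lst = ['b', 'ac']

# Loop style: one literal replacement per list entry, applied in order.
looped = df['name'].copy()
for word in to_remove_lst:
    looped = looped.str.replace(word, '', regex=False)

# Single pass with one escaped alternation.
single_pass = df['name'].str.replace('|'.join(map(re.escape, to_remove_lst)), '', regex=True)

print(looped.tolist())       # [''] because removing 'b' first exposes 'ac'
print(single_pass.tolist())  # ['ac'] because the string is scanned only once
```

The loop also makes one full pass over the column per list entry, so it tends to get slower as the list grows.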