
Pandas - Conditional Drop Duplicates

I have a Pandas 0.19.2 DataFrame on Python 3.6x, shown below. I want to use drop_duplicates() on rows that share the same Id, based on conditional logic. import pandas as pd import numpy as np np.rand

Solution 1:

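Use GroupBy.transform to compute the aggregated value per Id as a Series with the same length as the original DataFrame, then use sort_values and drop_duplicates to remove the duplicates, keeping the row with the highest Age for each Id: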

df['Size'] = df.groupby('Id')['Size'].transform('sum')
df = df.sort_values('Age').drop_duplicates('Id', keep='last').sort_index()
print(df)
   Id Name      Size  Age
1   2    B  0.812663   25
3   4    D  0.302333   31
4   3    E  0.146870   43
6   6    G  0.186260   44
7   7    H  0.345561   20
8   1    I  0.813789   51
9   8    K  0.538817   31
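
Because the question's original DataFrame is cut off above, the approach can be tried on a small stand-in. The sketch below is only an illustration: the column values are made up, and the rule it applies (sum Size per Id, keep the row with the greatest Age) is inferred from the answer's code rather than from the full question.

```python
import pandas as pd

# Hypothetical data: the DataFrame from the question is not shown in full,
# so these values are invented purely for illustration.
df = pd.DataFrame({
    "Id": [1, 2, 1, 3, 2],
    "Name": ["A", "B", "C", "D", "E"],
    "Size": [0.5, 0.25, 0.25, 1.0, 0.5],
    "Age": [30, 25, 51, 43, 20],
})

# Replace each row's Size with the total Size of its Id group.
# transform returns a Series aligned with the original rows.
df["Size"] = df.groupby("Id")["Size"].transform("sum")

# Sort by Age so the oldest row of each Id comes last, keep that row,
# then restore the original row order.
result = df.sort_values("Age").drop_duplicates("Id", keep="last").sort_index()
print(result)
```

Here rows 1, 2 and 3 survive (names B, C and D), each carrying the summed Size of its Id group. If two rows in a group could share the same maximum Age, passing kind='mergesort' to sort_values makes the tie-break stable.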
