
Mapping Pandas Dataframe Column To A Dictionary

I have a case of a dataframe containing a categorical variable of high cardinality (many unique values). I would like to re-code that variable to a set of values (the top most freq

Solution 1:

There are at least a couple of methods you can use:

where + Boolean indexing

df['fruits'] = df['fruits'].where(df['fruits'].isin(top_values), 'other')

loc + Boolean indexing

df.loc[~df['fruits'].isin(top_values), 'fruits'] = 'other'

After this process, you will probably want to turn your series into a categorical:

df['fruits'] = df['fruits'].astype('category')

Converting to categorical before the replacement probably won't help, because at that point the input series still has high cardinality.
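The steps in this solution can be put together as a minimal sketch. The sample data and the way top_values is computed (the two most frequent labels, via value_counts) are assumptions for illustration, since the original question is cut off before it says how the top values are chosen.

```python
import pandas as pd

# Hypothetical sample data; the column name 'fruits' follows the answer.
df = pd.DataFrame({
    'fruits': ['apple', 'apple', 'apple', 'banana', 'banana',
               'cherry', 'kiwi', 'fig']
})

# Assumption: top_values holds the n most frequent labels (n = 2 here).
top_values = df['fruits'].value_counts().nlargest(2).index

# Keep labels found in top_values and replace everything else with 'other'.
df['fruits'] = df['fruits'].where(df['fruits'].isin(top_values), 'other')

# Convert to categorical only after the cardinality has been reduced.
df['fruits'] = df['fruits'].astype('category')

print(df['fruits'].value_counts())
```

Assigning the result back to the column, rather than calling where with inplace=True on df['fruits'], avoids relying on an in-place change to an intermediate selection.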

Solution 2:

df['newCol'] = df['fruits'].apply(lambda x: x if x in top_values else 'others')
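Since the title asks about mapping a column to a dictionary, here is one possible dictionary-based variant of the same idea. It is only a sketch: the sample data, the contents of top_values, and the name mapping are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data and top values, assumed to be known beforehand.
df = pd.DataFrame({'fruits': ['apple', 'banana', 'apple', 'kiwi', 'fig']})
top_values = ['apple', 'banana']

# Identity mapping for the labels to keep; labels missing from the
# dictionary come back as NaN from Series.map.
mapping = {value: value for value in top_values}

# Fill the unmapped (NaN) entries with the catch-all label.
df['newCol'] = df['fruits'].map(mapping).fillna('others')

print(df)
```

This leaves the original column untouched and avoids calling a Python lambda on every element.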
