Skip to content Skip to sidebar Skip to footer

Include Missing Group Keys As NaN In Pandas GroupBy Output

I have a dataframe in pandas. test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'], 'transact

Solution 1:

This is easy if you convert "transaction" to a categorical column before grouping,

df.transaction = pd.Categorical(df.transaction)
df.groupby(['date', 'transaction', 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.404488
           bb           0.459295       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.439354
           bb                NaN       NaN
           cc           0.429269       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  1.542451
           cc                NaN       NaN

Missing categories in the output are represented by NaNs. This is usually possible when performing numeric aggregation.


If you don't want to modify df, this will do:

u = pd.Series(pd.Categorical(df.transaction), name='transaction')
df.groupby(['date', u, 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.429134
           bb           0.852355       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.541576
           bb                NaN       NaN
           cc           0.994095       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.744587
           cc                NaN       NaN

Post a Comment for "Include Missing Group Keys As NaN In Pandas GroupBy Output"