Include Missing Group Keys As NaN In Pandas GroupBy Output
I have a dataframe in pandas. test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'], 'transact
Solution 1:
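This is easy if you convert "transaction" to a categorical column before grouping:
df['transaction'] = pd.Categorical(df['transaction'])
# observed=False keeps categories that have no rows; pandas 2.1+ warns if it is left implicit
df.groupby(['date', 'transaction', 'ccy'], observed=False).sum().unstack(2)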
                            amt
ccy                         EUR       USD
date       transaction
2018-12-28 aa               NaN  0.404488
           bb          0.459295       NaN
           cc               NaN       NaN
2018-12-29 aa               NaN  0.439354
           bb               NaN       NaN
           cc          0.429269       NaN
2018-12-30 aa               NaN       NaN
           bb               NaN  1.542451
           cc               NaN       NaN
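Category combinations that never occur in the data still appear in the output, filled with NaN. This generally works when the aggregation is numeric. Depending on your pandas version, sum() may report these empty groups as 0 instead of NaN.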
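A minimal, self-contained sketch of the steps above, using made-up rows because the question's DataFrame is cut off. The `.where(g.count() > 0)` step is an addition that is not part of the original answer: it forces groups with no rows to NaN whether the installed pandas version's `sum()` returns 0 or NaN for them.

```python
import pandas as pd

# Hypothetical data: the question's DataFrame is truncated, so these rows are
# invented to reproduce the same kind of gaps (no 'cc' on 2018-12-28, etc.).
df = pd.DataFrame({
    'date': ['2018-12-28', '2018-12-28', '2018-12-29',
             '2018-12-29', '2018-12-30', '2018-12-30'],
    'transaction': ['aa', 'bb', 'aa', 'cc', 'bb', 'bb'],
    'ccy': ['USD', 'EUR', 'USD', 'EUR', 'USD', 'USD'],
    'amt': [0.40, 0.46, 0.44, 0.43, 0.77, 0.77],
})

# Make 'transaction' categorical so pandas knows every possible key,
# including keys that are missing for a given date.
df['transaction'] = pd.Categorical(df['transaction'])

# observed=False makes the result cover the full date x transaction x ccy product.
g = df.groupby(['date', 'transaction', 'ccy'], observed=False)['amt']

# Addition (not in the answer): blank out groups that contain no rows,
# so they read NaN regardless of what this pandas version's sum() puts there.
totals = g.sum().where(g.count() > 0)

out = totals.unstack('ccy')
print(out)
```

Masking on the group count rather than on the value itself means a genuine total of 0 is kept as 0; only groups with no rows become NaN.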
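If you don't want to modify df, this will do:
# astype() returns a new categorical Series that keeps df's index and the name 'transaction'
u = df['transaction'].astype('category')
# [['amt']] keeps the unmodified 'transaction' column out of the sum
df.groupby(['date', u, 'ccy'], observed=False)[['amt']].sum().unstack(2)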
                            amt
ccy                         EUR       USD
date       transaction
2018-12-28 aa               NaN  0.429134
           bb          0.852355       NaN
           cc               NaN       NaN
2018-12-29 aa               NaN  0.541576
           bb               NaN       NaN
           cc          0.994095       NaN
2018-12-30 aa               NaN       NaN
           bb               NaN  0.744587
           cc               NaN       NaN
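A sketch of the non-mutating variant on made-up rows (the question's data is truncated), which also checks that df keeps its original dtypes. It builds the categorical key with `df['transaction'].astype('category')`, which keeps df's index so the key lines up with the rows, and it selects `[['amt']]` explicitly: because the key is not one of df's own columns, pandas does not exclude the original 'transaction' column from the aggregation on its own.

```python
import pandas as pd

# Hypothetical data standing in for the question's truncated DataFrame.
df = pd.DataFrame({
    'date': ['2018-12-28', '2018-12-28', '2018-12-29',
             '2018-12-29', '2018-12-30', '2018-12-30'],
    'transaction': ['aa', 'bb', 'aa', 'cc', 'bb', 'bb'],
    'ccy': ['USD', 'EUR', 'USD', 'EUR', 'USD', 'USD'],
    'amt': [0.40, 0.46, 0.44, 0.43, 0.77, 0.77],
})
dtypes_before = df.dtypes.copy()

# A categorical copy of the column; df itself is not touched.
u = df['transaction'].astype('category')

# Group by two column names plus the external categorical Series.
out = (df.groupby(['date', u, 'ccy'], observed=False)[['amt']]
         .sum()
         .unstack('ccy'))
print(out)

# df still has the dtypes it started with.
print(df.dtypes.equals(dtypes_before))
```

The result has one row per date and transaction category (3 x 3 = 9 rows here), even for pairs that never occur in the data.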