How To Group-by Twice, Preserve Original Columns, And Plot
Solution 1:
The problem of losing the values of 'exercise' when grouping by the maximum of 'area' can be solved by keeping the MultiIndex (i.e. not using reset_index
) and using .transform
to create a boolean mask to select the appropriate full rows of mean_il_CA
that contain the maximum 'impact_level' values per 'area'. This solution is based on the code provided in this answer by unutbu. The full labels for the bar chart can be created by concatenating the labels of 'area' and 'exercise'.
Here is an example using the titanic dataset from the seaborn package. The variables 'class', 'embark_town', and 'fare' are used in place of 'area', 'exercise', and 'impact_level'. The categorical variables both contain three unique values: 'First', 'Second', 'Third', and 'Cherbourg', 'Queenstown', 'Southampton'.
import pandas as pd # v 1.2.5import seaborn as sns # v 0.11.1
df = sns.load_dataset('titanic')
data = df[['class', 'embark_town', 'fare']]
data.head()
data_mean = data.groupby(['class', 'embark_town'])['fare'].mean()
data_mean
# Select max values in each class and create concatenated labels
mask_max = data_mean.groupby(level=0).transform(lambda x: x == x.max())
data_mean_max = data_mean[mask_max].reset_index()
data_mean_max['class, embark_town'] = data_mean_max['class'].astype(str) + ', ' \
+ data_mean_max['embark_town']
data_mean_max
# Draw seaborn bar chart
sns.barplot(data=data_mean_max,
x=data_mean_max['fare'],
y=data_mean_max['class, embark_town'])
Post a Comment for "How To Group-by Twice, Preserve Original Columns, And Plot"