How To Group-by Twice, Preserve Original Columns, And Plot

I have the following data sets (only sample is shown): I want to find the most impactful exercise per area and then plot it via Seaborn barplot. I use the following code to do s

Solution 1:

The 'exercise' values are lost when a second groupby takes the maximum 'impact_level' per 'area'. To keep them, keep the MultiIndex produced by the first groupby (i.e. do not call reset_index) and use .transform to build a boolean mask that selects the full rows of mean_il_CA holding the maximum 'impact_level' for each 'area'. This solution is based on the code provided in this answer by unutbu. The full labels for the bar chart can then be created by concatenating the 'area' and 'exercise' labels.
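To make the idea concrete before the full example, here is a minimal sketch on a tiny made-up table. The column names follow the question, but the values and the variable names (`mean_il`, `mask`, `best`) are invented for illustration. It compares against `.transform('max')` rather than using a lambda, which should give the same mask.

```python
import pandas as pd

# Made-up sample data: column names follow the question, the values are invented.
df = pd.DataFrame({
    'area':         ['legs', 'legs', 'legs', 'arms', 'arms', 'arms'],
    'exercise':     ['squat', 'squat', 'lunge', 'curl', 'dip', 'dip'],
    'impact_level': [8.0, 6.0, 5.0, 4.0, 9.0, 7.0],
})

# First groupby: mean impact per (area, exercise). The result keeps a MultiIndex.
mean_il = df.groupby(['area', 'exercise'])['impact_level'].mean()

# Second groupby on the 'area' index level: transform('max') broadcasts each
# area's maximum back to every row, so the comparison yields a boolean mask.
mask = mean_il == mean_il.groupby(level='area').transform('max')

# Boolean indexing keeps the full (area, exercise) index, so 'exercise' survives.
best = mean_il[mask].reset_index()
print(best)
```

Because the mask is aligned with the original MultiIndex, ties within an area would keep every tied row.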

Here is an example using the titanic dataset from the seaborn package. The variables 'class', 'embark_town', and 'fare' are used in place of 'area', 'exercise', and 'impact_level'. The two categorical variables each contain three unique values: 'First', 'Second', 'Third' for 'class', and 'Cherbourg', 'Queenstown', 'Southampton' for 'embark_town'.

import pandas as pd    # v 1.2.5
import seaborn as sns  # v 0.11.1

df = sns.load_dataset('titanic')
data = df[['class', 'embark_town', 'fare']]
data.head()

data_mean = data.groupby(['class', 'embark_town'])['fare'].mean()
data_mean

# Select max values in each class and create concatenated labels
mask_max = data_mean.groupby(level=0).transform(lambda x: x == x.max())
data_mean_max = data_mean[mask_max].reset_index()
data_mean_max['class, embark_town'] = data_mean_max['class'].astype(str) + ', ' \
                                      + data_mean_max['embark_town']
data_mean_max
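If exactly one row per group is wanted even when two values tie for the maximum, a possible variant is to look up the winning index labels with `idxmax` instead of building a mask. The sketch below uses a small stand-in Series with the same index layout as `data_mean`; the fare values are invented placeholders, not the real titanic means, and `best_labels` is a hypothetical name.

```python
import pandas as pd

# Stand-in for the (class, embark_town) mean-fare Series; the values are invented.
idx = pd.MultiIndex.from_tuples(
    [('First', 'Cherbourg'), ('First', 'Southampton'),
     ('Second', 'Cherbourg'), ('Second', 'Southampton')],
    names=['class', 'embark_town'])
data_mean = pd.Series([100.0, 70.0, 25.0, 20.0], index=idx, name='fare')

# For each class, idxmax returns the full index label (a tuple) of the largest value.
best_labels = data_mean.groupby(level='class').idxmax()

# Selecting by those labels keeps exactly one row per class.
data_mean_max = data_mean.loc[best_labels.tolist()].reset_index()
print(data_mean_max)
```

The mask approach keeps all tied rows, while this variant keeps only the first maximum per group, so the choice depends on how ties should appear in the chart.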

# Draw seaborn bar chart (horizontal bars: fare on x, combined label on y)
sns.barplot(data=data_mean_max,
            x='fare',
            y='class, embark_town')
