How To Group-by Twice, Preserve Original Columns, And Plot

November 30, 2023

I have the following data sets (only a sample is shown). I want to find the most impactful exercise per area and then plot it with a Seaborn barplot. I use the following code to do s

Solution 1:

The problem of losing the values of 'exercise' when grouping by the maximum of 'area' can be solved by keeping the MultiIndex (i.e. not using reset_index) and using .transform to create a boolean mask to select the appropriate full rows of mean_il_CA that contain the maximum 'impact_level' values per 'area'. This solution is based on the code provided in this answer by unutbu. The full labels for the bar chart can be created by concatenating the labels of 'area' and 'exercise'.
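As a minimal sketch of the masking idea with the column names from the question: the sample values below are made up for illustration, and the names mean_il, mask, and top are hypothetical. The mask here is built by comparing each mean with its per-area maximum from .transform('max'), a close variant of comparing inside a lambda.

```python
import pandas as pd

# Made-up sample data; column names follow the question, values are illustrative only
df = pd.DataFrame({
    'area': ['legs', 'legs', 'legs', 'arms', 'arms', 'arms'],
    'exercise': ['squat', 'squat', 'lunge', 'curl', 'dip', 'dip'],
    'impact_level': [3.0, 5.0, 2.0, 1.0, 4.0, 2.0],
})

# Mean impact per (area, exercise); the result is a Series with a two-level MultiIndex
mean_il = df.groupby(['area', 'exercise'])['impact_level'].mean()

# Boolean mask aligned with mean_il: True where the value equals the maximum of its area
mask = mean_il == mean_il.groupby(level='area').transform('max')

# Keep only the rows holding the per-area maximum, then flatten the index
# so that 'area' and 'exercise' are ordinary columns again
top = mean_il[mask].reset_index()

# Concatenated label for the bar chart axis
top['label'] = top['area'] + ', ' + top['exercise']
print(top)
```

Because the MultiIndex is kept until after the mask is applied, the 'exercise' level is still attached to every selected row.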

Here is an example using the titanic dataset from the seaborn package. The variables 'class', 'embark_town', and 'fare' are used in place of 'area', 'exercise', and 'impact_level'. The categorical variables both contain three unique values: 'First', 'Second', 'Third', and 'Cherbourg', 'Queenstown', 'Southampton'.

import pandas as pd    # v 1.2.5
import seaborn as sns  # v 0.11.1

df = sns.load_dataset('titanic')
data = df[['class', 'embark_town', 'fare']]
data.head()

data_mean = data.groupby(['class', 'embark_town'])['fare'].mean()
data_mean

# Select max values in each class and create concatenated labels
mask_max = data_mean.groupby(level=0).transform(lambda x: x == x.max())
data_mean_max = data_mean[mask_max].reset_index()
data_mean_max['class, embark_town'] = data_mean_max['class'].astype(str) + ', ' \
                                      + data_mean_max['embark_town']
data_mean_max

# Draw seaborn bar chart (columns are referenced by name via the data argument)
sns.barplot(data=data_mean_max,
            x='fare',
            y='class, embark_town')
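
One property of the mask approach worth keeping in mind is that it keeps every row that ties for the maximum, so a group can contribute more than one bar. The sketch below contrasts this with selecting by idxmax, which is a different technique from the one used above and returns a single row per group. The data and the names flat, with_ties, and one_per_area are hypothetical.

```python
import pandas as pd

# Hypothetical per-group means with a tie inside 'legs'
flat = pd.DataFrame({
    'area': ['arms', 'arms', 'legs', 'legs'],
    'exercise': ['curl', 'dip', 'lunge', 'squat'],
    'impact_level': [1.0, 3.0, 4.0, 4.0],
})

# Mask approach: every row equal to its area's maximum is kept, so both tied rows survive
mask = flat['impact_level'] == flat.groupby('area')['impact_level'].transform('max')
with_ties = flat[mask]

# idxmax approach: one row label per area (the first occurrence of the maximum)
one_per_area = flat.loc[flat.groupby('area')['impact_level'].idxmax()]

print(with_ties)
print(one_per_area)
```

Which behaviour is preferable depends on whether tied categories should all appear in the plot.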
