Pivot Table Aggregation By Index
I have two dataframes containing information about student grades and test scores. The first looks like this: ID Test_Score Class1 Class2 Class3 0 001 85
Solution 1:
Your problem is that you need to keep the index values of the second dataframe when you call pivot_table; see this answer for background. So if you do:
# I used my own idea of highest function
print(df2.reset_index().pivot_table(index='index', values=[1], columns=[0],
                                    aggfunc=lambda x: sorted(x)[0]))

               1
0        Algebra Calculus_1 Calculus_2 Trig
index
0              A          B         C-  NaN
1             C+         C-        NaN    C
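Note that sorted(x)[0] picks the lexicographically smallest string, which is not always the best grade (for example, 'C' sorts before 'C+'). A minimal sketch of a rank-aware alternative follows; the GRADE_ORDER list and the best_grade name are assumptions for illustration, not part of the original answer:

```python
# Hypothetical ranking, best first; adjust to the grading scale actually in use.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-",
               "C+", "C", "C-", "D+", "D", "D-", "F"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}


def best_grade(grades):
    """Return the highest-ranked grade among the given values."""
    # A grade missing from GRADE_ORDER raises KeyError instead of being guessed.
    return min(grades, key=RANK.__getitem__)


print(sorted(["C", "C+"])[0])    # lexicographic pick: C
print(best_grade(["C", "C+"]))   # rank-aware pick: C+
```

Under these assumptions, best_grade could be passed as aggfunc in place of the lambda.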
Then you can join the pivoted result onto df1, such as:
df_p = df2.reset_index().pivot_table(index='index', values=[1], columns=[0],
                                     aggfunc=lambda x: sorted(x)[0])
# values=[1] produces two-level columns; keep only the course names
df_p.columns = [col[1] for col in df_p.columns]
new_df = df1.join(df_p)
print(new_df)
    ID  Test_Score Class1 Class2 Class3 Algebra Calculus_1 Calculus_2 Trig
0  001          85     B-      A     C+       A          B         C-  NaN
1  002          78      B    NaN     B+      C+         C-        NaN    C
2  003          93      A      B    NaN     NaN        NaN        NaN  NaN
Solution 2:
A plain pivot followed by a join is what you want. However, pivot does not allow the same index/column pair to appear more than once, so having Trig twice for the same student raises a ValueError.
After renaming the duplicate Trig entry to Trig2, the join/pivot works well:
import pandas as pd

df = pd.DataFrame({'ID':['001','002','003'],'Test_Score':[85,78,93],'Class1':['B-','B','A'],'Class2':['A','','B'],'Class3':['C+','B+','']})
# the second Trig for student 1 has been renamed to Trig2 so that pivot accepts it
df2 = pd.DataFrame({0:['Algebra','Calculus_1','Calculus_2','Algebra','Trig','Trig2','Calculus_1'],1:['A','B','C-','C+','F','C','C-']}, index=[0,0,0,1,1,1,1])
print(df.join(df2.pivot(columns=0, values=1)))
    ID  Test_Score Class1 Class2 Class3 Algebra Calculus_1 Calculus_2 Trig Trig2
0  001          85     B-      A     C+       A          B         C-  NaN   NaN
1  002          78      B            B+      C+         C-        NaN    F     C
2  003          93      A      B            NaN        NaN        NaN  NaN   NaN
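To illustrate the duplicate restriction, the sketch below first shows pivot failing on a repeated course and then numbers the repeats automatically instead of renaming them by hand. The three-row frame and the suffixing step are assumptions added for illustration; the original answer renamed the value manually:

```python
import pandas as pd

# Assumed minimal data: student 1 took Trig twice.
df2 = pd.DataFrame(
    {0: ["Algebra", "Trig", "Trig"], 1: ["C+", "F", "C"]},
    index=[1, 1, 1],
)

try:
    df2.pivot(columns=0, values=1)
    raised = False
except ValueError as exc:
    # pandas refuses to reshape when an index/column pair repeats
    raised = True
    print("pivot failed:", exc)

# Number the repeats per student and course: 0 for the first, 1 for the second...
dup_no = df2.reset_index().groupby(["index", 0]).cumcount().to_numpy()
suffix = ["" if n == 0 else str(n + 1) for n in dup_no]

renamed = df2.copy()
renamed[0] = [course + s for course, s in zip(df2[0], suffix)]  # Trig, Trig2

wide = renamed.pivot(columns=0, values=1)
print(wide)
```

This keeps every attempt as its own column, whereas the pivot_table approach in Solution 1 collapses the attempts into a single value.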