Combine Pivoted And Aggregated Column In PySpark Dataframe
My question is related to this one. I have a PySpark DataFrame, named df, as shown below.

date | recipe | percent | volume
----------------------------------------
2019-01-0
Solution 1:
According to Spark's source code, pivoting with a single aggregation is handled by a special branch:
val singleAgg = aggregates.size == 1

def outputName(value: Expression, aggregate: Expression): String = {
  val stringValue = value.name
  if (singleAgg) {
    stringValue // <--- here: the pivot value alone becomes the column name
  } else {
    val suffix = {...}
    stringValue + "_" + suffix
  }
}
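The effect of that branch can be sketched in plain Python. This is only an illustration of the naming rule, not Spark's actual implementation; the function name and suffix format are assumptions:

```python
def output_name(value, aggregate, single_agg):
    # Mirrors Spark's branch: with a single aggregation the pivot value
    # alone becomes the column name; with several, a suffix describing
    # the aggregation is appended.
    if single_agg:
        return value                   # e.g. just "A"
    return f"{value}_{aggregate}"      # e.g. "A_avg(percent)"

print(output_name("A", "avg(percent)", True))   # → A
print(output_name("A", "avg(percent)", False))  # → A_avg(percent)
```

This is why a single-aggregation pivot gives columns like A and B with no trace of which metric they hold.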
I don't know the reason for this design, but the only remaining option is to rename the columns afterwards.
Here is a simplified version for renaming:
def rename(identity: Set[String], suffix: String)(df: DataFrame): DataFrame = {
  val fieldNames = df.schema.fields.map(field => field.name)
  val renamed = fieldNames.map { fieldName =>
    if (identity.contains(fieldName)) {
      fieldName
    } else {
      fieldName + suffix
    }
  }
  df.toDF(renamed: _*)
}
Usage:
rename(Set("date"), "_percent")(pivoted).show()
+----------+---------+---------+
| date|A_percent|B_percent|
+----------+---------+---------+
|2019-01-01| 0.025| 0.05|
|2019-01-02| 0.11| 0.06|
+----------+---------+---------+
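Since the question is about PySpark, the same renaming can be done in Python. The helper below works on a plain list of column names, so the only Spark-specific part is the final toDF call; pivoted and the example column names are assumptions matching the output above:

```python
def rename_columns(columns, identity, suffix):
    # Keep columns in `identity` untouched; append `suffix` to all others.
    return [c if c in identity else c + suffix for c in columns]

# With a pivoted PySpark DataFrame you would apply it as:
#   pivoted.toDF(*rename_columns(pivoted.columns, {"date"}, "_percent"))
print(rename_columns(["date", "A", "B"], {"date"}, "_percent"))
# → ['date', 'A_percent', 'B_percent']
```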