PySpark - Create New Column From Operations Of DataFrame Columns Gives Error "Column Is Not Iterable"

I have a PySpark DataFrame and I have tried many examples showing how to create a new column based on operations with existing columns, but none of them seem to work. So I have t...

Solution 1:

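from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

a = sqlContext.createDataFrame([(5, 5, 3)], ['A', 'B', 'C'])
# Wrap Python's built-in sum in a UDF and apply it to every column of the row
a = a.withColumn('my_sum', F.udf(lambda *args: sum(args), IntegerType())(*a.columns))
a.show()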

+---+---+---+------+
|  A|  B|  C|my_sum|
+---+---+---+------+
|  5|  5|  3|    13|
+---+---+---+------+

Solution 2:

Your problem is in the "for col in a.columns" part: you cannot iterate over the result there, so add the columns explicitly instead:

a = a.withColumn('my_sum', a.A + a.B + a.C)
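
If the column list is long or not known in advance, the same explicit addition can be built programmatically. The following is only a minimal sketch, not something from the answers above: the helper names add_all and with_row_sum are hypothetical, and it assumes every column is numeric. It folds the column expressions together with +, so no UDF is involved.

```python
from functools import reduce
import operator


def add_all(terms):
    """Fold a non-empty sequence with +.

    Works for plain numbers and for Spark Column objects alike,
    since both support the + operator.
    """
    return reduce(operator.add, terms)


def with_row_sum(df, name="my_sum"):
    """Return df with a new column holding the row-wise sum of all its columns.

    Assumption: every column in df is numeric.
    """
    # Imported lazily so that add_all can be used without PySpark installed.
    from pyspark.sql import functions as F

    return df.withColumn(name, add_all([F.col(c) for c in df.columns]))
```

With the DataFrame a created in Solution 1 (before my_sum is added), with_row_sum(a).show() should give a my_sum value of 13 for the sample row, the same as writing a.A + a.B + a.C by hand.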
