
Apply Expanding Function On Dataframe

I have a function that I wish to apply to subsets of a pandas DataFrame, so that the function is calculated on all rows (up to the current row) from the same group - i.e. using a gro

Solution 1:

A possible solution is to move the expanding logic inside the function and use GroupBy.apply:

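def foo1(_df):
    # Running maximum of x1 times the most recent x2 within each expanding window.
    return _df['x1'].expanding().max() * _df['x2'].expanding().apply(lambda x: x[-1], raw=True)

# Apply per group, then drop the group level so the result aligns with df's index.
df['foo_result'] = df.groupby('group').apply(foo1).reset_index(level=0, drop=True)
print(df)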
  group  time   x1  x2  foo_result
0     A     1   10   1        10.0
3     B     1  100   2       200.0
1     A     2   40   2        80.0
4     B     2  200   0         0.0
2     A     3   30   1        40.0
5     B     3  300   3       900.0

This is not a direct solution to the problem of applying a dataframe function to an expanding dataframe, but it achieves the same functionality.
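For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is reconstructed from the printed output above (the original construction is not shown in this answer, so treat it as an assumption), and the per-group results are combined with an explicit loop and pd.concat instead of GroupBy.apply, since the index that GroupBy.apply returns can differ between pandas versions.

```python
import pandas as pd

# Assumed input, rebuilt from the printed result above.
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': [1, 2, 3, 1, 2, 3],
    'x1': [10, 40, 30, 100, 200, 300],
    'x2': [1, 2, 1, 2, 0, 3],
})

def foo1(_df):
    # Running maximum of x1 times the last x2 value in each expanding window.
    return _df['x1'].expanding().max() * _df['x2'].expanding().apply(lambda x: x[-1], raw=True)

# Evaluate foo1 on each group separately; every piece keeps its original index,
# so the concatenated Series aligns with df on assignment.
parts = [foo1(group_df) for _, group_df in df.groupby('group')]
df['foo_result'] = pd.concat(parts)

print(df.sort_values('time'))
```

Since the last value of an expanding window is just the current row, the same numbers should also come out of df.groupby('group')['x1'].cummax() * df['x2'], which avoids the Python-level lambda.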


Solution 2:

Applying a dataframe function on an expanding window is apparently not possible (at least not in pandas version 0.23.0), as one can see by inserting a print statement into the function.

Running df.groupby('group').expanding().apply(lambda x: bool(print(x)) , raw=False) on the given DataFrame (where the bool around the print is just to get a valid return value) prints:

0    1.0
dtype: float64
0    1.0
1    2.0
dtype: float64
0    1.0
1    2.0
2    3.0
dtype: float64
0    10.0
dtype: float64
0    10.0
1    40.0
dtype: float64
0    10.0
1    40.0
2    30.0
dtype: float64

(and so on - and also returns a dataframe with '0.0' in each cell, of course).

This shows that the expanding window works on a column-by-column basis (we see that first the expanding time series is printed, then x1, and so on), and does not really work on a dataframe - so a dataframe function can't be applied to it.
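The same check can be done without reading printed output by recording what the function receives. This is only a sketch: the small DataFrame and the probe helper are made up for illustration, and the behaviour described is what raw=False is documented to do (pass each window as a Series).

```python
import pandas as pd

# Small illustrative frame with two numeric columns (values are assumptions).
df = pd.DataFrame({
    'x1': [10, 40, 30],
    'x2': [1, 2, 1],
})

seen = []  # (type name, window length) for every call made by pandas

def probe(window):
    # Record what kind of object the expanding window hands to the function.
    seen.append((type(window).__name__, len(window)))
    return 0.0  # expanding().apply expects a numeric return value

out = df.expanding().apply(probe, raw=False)

print(seen)
```

Every recorded object is a Series for a single column and never a DataFrame, so the function has no access to the other columns of the window.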

So, to get the desired functionality, one would have to put the expanding logic inside the dataframe function, as in the accepted answer.
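If the function really has to see a whole sub-DataFrame, one workaround is to build the expanding frames by hand with iloc. The sketch below is not a pandas feature: expanding_frame_apply and foo are hypothetical names, foo is just an example function chosen to mirror the x1/x2 calculation discussed here, and the input data is assumed.

```python
import pandas as pd

def expanding_frame_apply(frame, func):
    # Call func on frame.iloc[:1], frame.iloc[:2], ... and collect the scalar results.
    values = [func(frame.iloc[:i + 1]) for i in range(len(frame))]
    return pd.Series(values, index=frame.index)

def foo(sub_df):
    # Hypothetical DataFrame function: max of x1 so far times the latest x2.
    return sub_df['x1'].max() * sub_df['x2'].iloc[-1]

# Assumed input data for illustration.
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': [1, 2, 3, 1, 2, 3],
    'x1': [10, 40, 30, 100, 200, 300],
    'x2': [1, 2, 1, 2, 0, 3],
})

# Run the expanding evaluation inside each group and align on the original index.
parts = [expanding_frame_apply(group_df, foo) for _, group_df in df.groupby('group')]
df['foo_result'] = pd.concat(parts)

print(df)
```

This calls the function once per row on an ever larger slice, so the cost grows roughly quadratically with group size. For large groups, rewriting the function in terms of column-wise expanding operations is likely the better choice.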

