Skip to content Skip to sidebar Skip to footer

Conditional Running Count In Pandas For All Previous Rows Only

Suppose I have the following DataFrame: df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'], 'Date': ['2019-01-01', '2019-02-01', '2

Solution 1:

groupby + rank

Dates can be treated as numeric. Use'min' to get your counting logic.

df['PEC'] = (df.groupby('Event').Date.rank(method='min')-1).astype(int)

  Event       Date  PEC
0     A 2019-01-01    0
1     B 2019-02-01    0
2     A 2019-03-01    1
3     A 2019-03-01    1
4     B 2019-02-15    1
5     C 2019-03-15    0
6     B 2019-04-05    2
7     B 2019-04-05    2
8     A 2019-04-15    3
9     C 2019-06-10    1

Solution 2:

First get counts by GroupBy.size per both columns, then aggregate by first level with shift and cumulative sum and last join to original:

s = (df.groupby(['Event', 'Date'])
       .size()
       .groupby(level=0)
       .apply(lambda x: x.shift(1).cumsum())
       .fillna(0)
       .astype(int))

df = df.join(s.rename('Previous_Event_Count'), on=['Event','Date'])
print (df)
  Event       Date  Previous_Event_Count
0     A 2019-01-01                     0
1     B 2019-02-01                     0
2     A 2019-03-01                     1
3     A 2019-03-01                     1
4     B 2019-02-15                     1
5     C 2019-03-15                     0
6     B 2019-04-05                     2
7     B 2019-04-05                     2
8     A 2019-04-15                     3
9     C 2019-06-10                     1

Solution 3:

Finally, I can find a better and faster way to get the desired result. It turns out that it is very easy. One can try:

df['Total_Previous_Sale'] = df.groupby('Event').cumcount() \
                          - df.groupby(['Event', 'Date']).cumcount()

Post a Comment for "Conditional Running Count In Pandas For All Previous Rows Only"