
Efficient Method To Count Consecutive Positive Values In Pandas Dataframe

I am trying to count the number of consecutive positive events for each column in a pandas DataFrame. The starting point is the solution provided by DSM in "Counting consecutive positive value in Python".

Solution 1:

Instead of applying consecutiveCount to each column separately, apply it just once to the unstacked series, then unstack the result back into a data frame.

Using DSM's consecutiveCount, which I named c here for simplicity:

>>> c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
>>> c(df.unstack()).unstack().T

    a   b
0   0   0
1   1   0
2   0   0
3   1   0
4   2   1
5   0   2
6   0   0
7   0   1
8   1   2
9   2   3
10  0   0
11  1   0
12  0   0
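The frame df itself is not shown above. As a minimal sketch, the frame below is reverse-engineered from the printed output, on the assumption that df holds 0/1 indicators (1 where the original value was positive). Its values are therefore hypothetical, but they reproduce the table:

```python
import pandas as pd

# Hypothetical input: 0/1 indicators chosen to match the output shown above.
df = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "b": [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
})

# DSM's consecutiveCount, abbreviated to c as in the answer.
# (y != y.shift()).cumsum() gives every run of equal values its own id,
# cumcount() + 1 numbers the positions inside each run, and multiplying
# by y zeroes out the runs of 0.
c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)

stacked = df.unstack()               # one long Series indexed by (column, row)
result = c(stacked).unstack().T      # back to rows x columns
print(result)
```

Because c multiplies by y, it relies on the input being 0/1. If the frame held raw numbers instead, something like (df > 0).astype(int) would be needed first.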

Timings

# df2 is (65, 40)
df2 = pd.concat([pd.concat([df]*20, axis=1)]*5).T.reset_index(drop=True).T.reset_index(drop=True)

%timeit c(df2.unstack()).unstack().T
5.54 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df2.apply(c)
82.5 ms ± 2.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
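One caveat with the unstacked version: the columns are laid end to end in a single series, so a run of 1s at the bottom of one column carries over into the top of the next. That does not happen in the example above because column a ends in 0. A possible guard, sketched here with a hypothetical function name and made-up data, is to compare each value against the previous value within its own column:

```python
import pandas as pd

def consecutive_counts_by_column(df):
    """Run-length counts of 1s per column, computed in one pass.

    Hypothetical helper; assumes df holds 0/1 indicators.
    """
    y = df.unstack()                        # Series indexed by (column, row)
    # Shifting within each column leaves NaN at the top of every column,
    # so a new run always starts there.
    new_run = y != y.groupby(level=0).shift()
    run_id = new_run.cumsum()
    counts = y * (y.groupby(run_id).cumcount() + 1)
    return counts.unstack().T

# Made-up frame where column a ends with 1 and column b starts with 1.
edge = pd.DataFrame({"a": [0, 1, 1], "b": [1, 1, 0]})

c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
naive = c(edge.unstack()).unstack().T       # run leaks from a into b
safe = consecutive_counts_by_column(edge)   # run restarts at the top of b
print(naive)
print(safe)
```

The extra grouped shift adds some cost, so the timings above would not carry over unchanged.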

Solution 2:

Solution 3:

Adapted from @cs95's answer:

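import numpy as np
import pandas as pd

a = pd.Series([-1, 2, 15, 3, 45, 5, 23, 0, 6, -4, -8, -5, 3,
               -9, -7, -36, -71, -2, 25, 47, -8])

def pos_neg_count(a):
    # Label each run of same-sign values (0 counts as positive here)
    v = a.ge(0).ne(a.ge(0).shift()).cumsum()
    # Length of each run, in order of appearance
    vals = v.groupby(v).count().values
    # The sign of the first value decides which column comes first
    cols = ['pos', 'neg'] if a.iloc[0] >= 0 else ['neg', 'pos']
    try:
        result = pd.DataFrame(vals.reshape(-1, 2), columns=cols)
    except ValueError:
        # Odd number of runs: pad with a 0 so the lengths pair up
        vals = np.insert(vals, len(vals), 0)
        result = pd.DataFrame(vals.reshape(-1, 2), columns=cols)
    return result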

pos_neg_count(a)
#    neg  pos
# 0    1    8
# 1    3    1
# 2    5    2
# 3    1    0
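To see what the first two lines of pos_neg_count do, the intermediate values can be printed for a shorter series. This is only an illustrative sketch; the series s and the variable names are made up for the example:

```python
import pandas as pd

s = pd.Series([-1, 2, 15, -4, -8, 3])   # made-up data

is_nonneg = s.ge(0)                      # True for values >= 0, so 0 counts as positive
# A new run starts wherever the flag differs from the previous row;
# the first row compares against NaN and therefore always starts a run.
run_id = is_nonneg.ne(is_nonneg.shift()).cumsum()
run_lengths = run_id.groupby(run_id).count()

print(run_id.tolist())        # run label for each element
print(run_lengths.tolist())   # length of each run, in order
```

Here the runs alternate neg, pos, neg, pos, which is why pos_neg_count can reshape the lengths into two columns once it knows the sign of the first element.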
