Efficient Method To Count Consecutive Positive Values In Pandas Dataframe
I trying to count the number of consecutive positive events for each column in a pandas dataframe. The solution provided by DSM here- Counting consecutive positive value in Python
Solution 1:
Use consecutiveCounts
just once in an unstacked series. Then, stack back to data frame.
Using DSM's consecutiveCount
, which I named c
here for simplicity:
>>>c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)>>>c(df.unstack()).unstack().T
a b
0 0 0
1 1 0
2 0 0
3 1 0
4 2 1
5 0 2
6 0 0
7 0 1
8 1 2
9 2 3
10 0 0
11 1 0
12 0 0
Timings
# df2 is (65, 40)
df2 = pd.concat([pd.concat([df]*20, axis=1)]*5).T.reset_index(drop=True).T.reset_index(drop=True)
%timeit c(df2.unstack()).unstack().T
5.54 ms ± 296 µs per loop (mean ± std. dev. of7 runs, 100 loops each)
%timeit df2.apply(c)
82.5 ms ± 2.19 ms per loop (mean ± std. dev. of7 runs, 10 loops each)
Solution 2:
You can try the apply
method. That might give you better results:
df.apply(consecutiveCount)
Solution 3:
Adapted from @cs95's answer:
a = pd.Series([-1, 2, 15, 3, 45, 5, 23, 0, 6, -4, -8, -5, 3,
-9, -7, -36, -71, -2, 25, 47, -8])
defpos_neg_count(a):
v = a.ge(0).ne(a.ge(0).shift()).cumsum()
vals = v.groupby(v).count().values
cols = ['pos', 'neg'] if a[0] >= 0else ['neg', 'pos']
try:
result = pd.DataFrame(vals.reshape(-1, 2), columns=cols)
except ValueError:
vals = np.insert(vals, len(vals), 0)
result = pd.DataFrame(vals.reshape(-1, 2), columns=cols)
return result
pos_neg_count(a)
# neg pos# 0 1 8# 1 3 1# 2 5 2# 3 1 0
Post a Comment for "Efficient Method To Count Consecutive Positive Values In Pandas Dataframe"