Split A DataFrame By Rows Containing Zero In Python Pandas
Solution 1:
Here is a not-so-elegant solution that should nevertheless get you to the groupby you need :)
df2 = df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
df2[df2['B'].notnull()].groupby('group')
If you inspect df2 (I'm creating a new DataFrame in case you want to keep the original around, but you can chain the operations if need be), it now looks like this:
              A    B  group
2020-01-01  0.0  3.0      1
2020-01-02  1.0  2.0      1
2020-01-03  NaN  NaN      1
2020-01-04  NaN  NaN      1
2020-01-05  4.0  1.0      2
2020-01-06  5.0  2.0      2
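To reproduce the table above end to end, here is a minimal sketch. The input frame is an assumption: the original df is not shown, so the values below (including the two masked rows, guessed as A = 2, 3 and B = 0, 0) are reconstructed from the output.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above.
df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4, 5], "B": [3, 2, 0, 0, 1, 2]},
    index=pd.date_range("2020-01-01", periods=6),
)

# Replace rows with NaN where B is 0 and a neighbouring B is also 0.
df2 = df.mask((df["B"] == 0) & ((df["B"].shift(1) == 0) | (df["B"].shift(-1) == 0)))

# A new group starts where the previous B is null and the current B is not.
df2["group"] = (df2["B"].shift(1).isnull() & df2["B"].notnull()).cumsum()

# Drop the masked rows and group what is left.
groups = df2[df2["B"].notnull()].groupby("group")
print(df2)
print(groups.size())
```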
So now you can filter out the rows where df2['B'] is null (essentially the rows where consecutive 0s appeared), and then groupby the new column group.
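If what you actually want is a list of separate DataFrames rather than a groupby object, iterating over the groups is one way to get there. This is only a sketch: the sample data is assumed (reconstructed from the output above), and `pieces` is a name made up for illustration.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above.
df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4, 5], "B": [3, 2, 0, 0, 1, 2]},
    index=pd.date_range("2020-01-01", periods=6),
)
df2 = df.mask((df["B"] == 0) & ((df["B"].shift(1) == 0) | (df["B"].shift(-1) == 0)))
df2["group"] = (df2["B"].shift(1).isnull() & df2["B"].notnull()).cumsum()

# Keep only the unmasked rows, then collect one DataFrame per group.
kept = df2[df2["B"].notnull()]
pieces = [part.drop(columns="group") for _, part in kept.groupby("group")]

for part in pieces:
    print(part, end="\n\n")
```

Note that the values in each piece are floats, because masking introduced NaN into the columns before the rows were dropped.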
What happens here is:
df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
If the B value is equal to 0 and either the previous or the next one is also equal to 0, hide the row (replace its values with NaN via df.mask()).
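One consequence of that condition is that a single, isolated 0 is left alone; only runs of two or more zeros get hidden. A small sketch with made-up data (the column values here are purely illustrative):

```python
import pandas as pd

# Hypothetical data: one isolated zero (position 1) and one run of two zeros.
df = pd.DataFrame({"B": [3, 0, 2, 0, 0, 1]})

# True where B is 0 and at least one direct neighbour is also 0.
hide = (df["B"] == 0) & ((df["B"].shift(1) == 0) | (df["B"].shift(-1) == 0))

masked = df.mask(hide)
print(masked)
```

The comparisons against the shifted column are False at the edges, since the shifted-in NaN is never equal to 0.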
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
Create an indicator column group, just to let Pandas know what to groupby (you can also group directly by that whole expression; I just want to make the step clear). A new group starts wherever the previous value of B is null and the current value is not null. Taking the cumulative sum of that flag gives you this fabricated "id" to groupby.
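To see how the "id" comes about, it can help to look at the flag and its cumulative sum separately. A rough sketch on a standalone Series (the values mimic the masked B column; the names `starts`, `group` and `totals` are just for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for the masked B column.
b = pd.Series([3.0, 2.0, np.nan, np.nan, 1.0, 2.0])

# True on the first non-null value after a null (or at the very start).
starts = b.shift(1).isnull() & b.notnull()

# Each True bumps the counter, so every block gets its own id.
group = starts.cumsum()

# Grouping directly by the expression, without storing it as a column.
totals = b.dropna().groupby(group).sum()
print(pd.DataFrame({"B": b, "starts": starts, "group": group}))
print(totals)
```

The null rows inherit the id of the block before them, which is why they have to be filtered out before (or while) grouping.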