
Split A DataFrame By Rows Containing Zero In Python Pandas

Apologies if what I'm asking is very basic and has been answered elsewhere (couldn't find it, but it could be I'm just using the wrong terminology). I would like to be able to spli

Solution 1:

Here is a not-so-elegant solution, which would however get you to the groupby you need :)

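# Replace rows that belong to a run of consecutive zeros in B with NaN
df2 = df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
# Start a new group id at each non-null row that follows a null row
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
# Drop the masked rows, then group by the id
df2[df2['B'].notnull()].groupby('group')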

If you inspect df2 (I'm creating a new DataFrame in case you want to keep the original, but you could chain the operations if need be), it now looks like this:

            A     B     group
2020-01-01  0.0   3.0   1
2020-01-02  1.0   2.0   1
2020-01-03  NaN   NaN   1
2020-01-04  NaN   NaN   1
2020-01-05  4.0   1.0   2
2020-01-06  5.0   2.0   2

So now you can filter out the rows where df2['B'] is null (essentially the rows that belonged to a run of two or more consecutive 0s) and then group by the new column group.
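To try the whole thing end to end, here is a minimal sketch. The input frame is an assumption on my part: I reconstructed it so that it reproduces the table above (A counting from 0 to 5, B with two consecutive zeros in the middle); your real data will differ. The name pieces is just an illustrative choice.

```python
import pandas as pd

# Hypothetical input, reconstructed to match the table shown above.
df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4, 5], "B": [3, 2, 0, 0, 1, 2]},
    index=pd.date_range("2020-01-01", periods=6),
)

# Hide rows that are part of a run of consecutive zeros in B.
df2 = df.mask((df["B"] == 0) & ((df["B"].shift(1) == 0) | (df["B"].shift(-1) == 0)))

# Number the blocks of surviving rows.
df2["group"] = (df2["B"].shift(1).isnull() & df2["B"].notnull()).cumsum()

# Drop the masked rows and collect one sub-frame per group.
kept = df2[df2["B"].notnull()]
pieces = [part.drop(columns="group") for _, part in kept.groupby("group")]

for part in pieces:
    print(part, end="\n\n")
```

With this input, pieces should hold two frames: the rows before the zeros and the rows after them. Note that the masked columns come back as floats, because NaN forces the conversion.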

What happens here is:

df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))

If the B value equals 0 and either the previous or the next B value also equals 0, hide the row (df.mask() replaces every value in it with NaN). A single 0 with non-zero neighbours is left untouched.
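If it helps to see the condition on its own, here is a small sketch on a made-up column (the values are hypothetical, chosen so there is one isolated zero and one run of two zeros):

```python
import pandas as pd

# Hypothetical column: an isolated zero at position 1, a run of zeros at 3-4.
b = pd.Series([3, 0, 2, 0, 0, 1], name="B")

is_zero = b == 0
# A zero counts as part of a run if the neighbour on either side is also zero.
in_run = is_zero & ((b.shift(1) == 0) | (b.shift(-1) == 0))

print(in_run.tolist())
# [False, False, False, True, True, False]
```

Only the two adjacent zeros are flagged; the isolated zero stays in the data and remains part of its group.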

df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()

Create an indicator column, group, just to tell pandas what to group by (you could also group by that whole expression directly; I only want to make the step clear). A new group starts wherever the previous value of B is null and the current value is not. Taking the cumulative sum of that flag gives a fabricated "id" to group by.
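Here is a rough sketch of that step in isolation, including the "group by the expression directly" variant. The column below is a hypothetical already-masked B (NaN where df.mask() hid a row), and starts, group, valid and sizes are names I made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical masked column: NaN marks the rows hidden by df.mask().
b = pd.Series([3.0, 2.0, np.nan, np.nan, 1.0, 2.0], name="B")

# True where a non-null value follows a null (or sits at the very start).
starts = b.shift(1).isnull() & b.notnull()

# The running total of those flags is the group id.
group = starts.cumsum()

print(starts.tolist())  # [True, False, False, False, True, False]
print(group.tolist())   # [1, 1, 1, 1, 2, 2]

# Group by the computed key directly, without storing it as a column.
valid = b.notnull()
sizes = b[valid].groupby(group[valid]).size()
print(sizes.tolist())   # [2, 2]
```

The masked rows inherit the id of the block before them, which is why they need to be filtered out before grouping.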

