Split A DataFrame By Rows Containing Zero In Python Pandas

November 28, 2022

Apologies if what I'm asking is very basic and has been answered elsewhere (couldn't find it, but it could be I'm just using the wrong terminology). I would like to be able to spli

Solution 1:

Here is a not-so-elegant solution, which would however get you to the groupby you need :)

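# Hide (set to NaN) every row where B is 0 and a neighbouring B is also 0
df2 = df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
# Start a new group id wherever B is present but the previous B is missing
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
# Drop the hidden rows and group what is left
df2[df2['B'].notnull()].groupby('group')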

If you inspect df2 (I'm creating a new frame so that the original df stays untouched, but you can chain the operations instead if need be), it looks like this:

            A     B     group
2020-01-01  0.0   3.0   1
2020-01-02  1.0   2.0   1
2020-01-03  NaN   NaN   1
2020-01-04  NaN   NaN   1
2020-01-05  4.0   1.0   2
2020-01-06  5.0   2.0   2

So now you can filter out the rows where df2['B'] is null (these are the rows that were part of a run of two or more consecutive 0s), and then group by the new column group.
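As a minimal end-to-end sketch of the steps above: the sample frame below is an assumption (the original question's data is not shown in full), chosen so that B contains one run of two zeros. The names in_zero_run and pieces are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame({"A": [0, 1, 2, 3, 4, 5], "B": [3, 2, 0, 0, 1, 2]}, index=idx)

# True for rows where B is 0 and the previous or next B is also 0.
in_zero_run = (df["B"] == 0) & ((df["B"].shift(1) == 0) | (df["B"].shift(-1) == 0))

# Replace those rows with NaN; df itself is left unchanged.
df2 = df.mask(in_zero_run)

# A new group starts where B is present but the previous B is missing.
df2["group"] = (df2["B"].shift(1).isnull() & df2["B"].notnull()).cumsum()

# Drop the hidden rows and collect each remaining block as its own frame.
kept = df2[df2["B"].notnull()]
pieces = {key: part.drop(columns="group") for key, part in kept.groupby("group")}

for key, part in pieces.items():
    print(f"group {key}")
    print(part)
```

Because df.mask() inserts NaN, the integer columns come back as floats in each piece; cast them back with astype if that matters for your data.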

What happens here is:

df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))

If the B value is equal to 0 and either the previous or the next B value is also equal to 0, hide that row (df.mask() replaces its values with NaN).
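To see what this condition selects, here is a small sketch on a made-up Series (the values are an assumption for illustration). Note that a single isolated 0 is not hidden, only zeros that have another zero directly before or after them.

```python
import pandas as pd

# Hypothetical values: one isolated zero (position 1) and one run of two zeros.
b = pd.Series([3, 0, 2, 0, 0, 1])

# Same condition as in the answer, written against a plain Series.
in_zero_run = (b == 0) & ((b.shift(1) == 0) | (b.shift(-1) == 0))

print(in_zero_run.tolist())
# [False, False, False, True, True, False]
```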

df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()

Create an indicator column group, just to let Pandas know what to group by (you can also group directly by that whole expression; I just want to make the step clear). A new group starts wherever the previous value of B is null and the current value is not null. Taking the cumulative sum of these start flags gives you a fabricated "id" to group by.
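The start-flag and cumulative-sum step can be checked in isolation. This sketch assumes a B column that has already been masked (the values mirror the df2 table shown earlier) and also shows grouping by the expression directly, without a helper column.

```python
import numpy as np
import pandas as pd

# B after masking, as in the df2 table above.
b = pd.Series([3.0, 2.0, np.nan, np.nan, 1.0, 2.0])

# True exactly where a block begins: current value present, previous one missing.
starts = b.shift(1).isnull() & b.notnull()
print(starts.tolist())
# [True, False, False, False, True, False]

# Running count of block starts gives each row its group id.
group = starts.cumsum()
print(group.tolist())
# [1, 1, 1, 1, 2, 2]

# Group directly by the computed ids, restricted to the rows that were kept.
present = b.notnull()
sizes = b[present].groupby(group[present]).size()
print(sizes.tolist())
# [2, 2]
```

The masked rows inherit the id of the preceding block, which is harmless because they are filtered out before grouping.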

