
Deleting Half Of Dataframe Rows Which Meet Condition

I'm looking to extract a subset of a DataFrame based on a condition. Let's say df = pd.DataFrame({'Col1': [values1], 'Col2': [values2], 'Col3': [values3]}). I'd like to sort by

Solution 1:

Data input

import pandas as pd

values1 = [-5, 10, 13, -3, -1, -2]
values2 = [-5, 10, 13, -3, -1, -2]
values3 = [-5, 10, 13, -3, -1, -2]
df = pd.DataFrame({'Col1': values1, 'Col2': values2, 'Col3': values3})

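Use sample together with concat: keep every row where Col2 is positive, then append a random sample of the rows where Col2 is negative. You can calculate the n for sample(n) from the number of negative rows; I simply use 2 here. Because sample draws at random, which negative rows survive varies from run to run.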

pd.concat([df[df.Col2>0],df[df.Col2<0].sample(2)])
Out[224]: 
   Col1  Col2  Col3
1    10    10    10
2    13    13    13
5    -2    -2    -2
4    -1    -1    -1
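The n passed to sample does not have to be hard-coded. The following is a minimal sketch, assuming "half" means half of the negative rows rounded down. The names mask, negative and n_keep are made up for this example, and random_state is added only so the draw is repeatable. It also keeps all non-negative rows via ~mask instead of Col2 > 0, which is a choice of this sketch so that zero-valued rows would not be lost.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [-5, 10, 13, -3, -1, -2],
    "Col2": [-5, 10, 13, -3, -1, -2],
    "Col3": [-5, 10, 13, -3, -1, -2],
})

# Rows that meet the condition (negative Col2).
mask = df.Col2 < 0
negative = df[mask]

# Keep half of the matching rows, rounding down (assumption of this sketch).
n_keep = len(negative) // 2

# Non-matching rows are kept in full; matching rows are sampled at random.
# random_state is fixed here only to make the example repeatable.
result = pd.concat([df[~mask], negative.sample(n=n_keep, random_state=0)])
print(result)
```

Which negative rows end up in result depends on the random draw; only the counts are fixed.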

Solution 2:

A straightforward approach. First, sort the DataFrame:

In [16]:  df = pd.DataFrame({'Col1': values1, 'Col2':values2, 'Col3': values3})
In [17]: df
Out[17]:
   Col1  Col2 Col3
0     1    -5    a
1     2    10    b
2     3    13    c
3     4    -3    d
4     5    -1    e
5     6    -2    f

In [18]: df.sort_values('Col2', inplace=True)

In [19]: df
Out[19]:
   Col1  Col2 Col3
0     1    -5    a
3     4    -3    d
5     6    -2    f
4     5    -1    e
1     2    10    b
2     3    13    c

Then create a boolean mask for the negative values, use np.where to get their positions, take the first half of those positions, and drop the corresponding index labels:

In [20]: mask = (df.Col2 < 0)

In [21]: idx, = np.where(mask)

In [22]: df.drop(df.index[idx[:len(idx)//2]])
Out[22]:
   Col1  Col2 Col3
5     6    -2    f
4     5    -1    e
1     2    10    b
2     3    13    c
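The steps above can be collected into one self-contained script. This is only a sketch: the helper name drop_half_matching is hypothetical, the data is copied from the output shown in Out[17], and "half" is rounded down as in the len(idx)//2 expression.

```python
import numpy as np
import pandas as pd

# Data as displayed in Out[17] above.
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": [-5, 10, 13, -3, -1, -2],
    "Col3": list("abcdef"),
})


def drop_half_matching(frame, mask):
    """Hypothetical helper: drop the first half (rounded down) of the rows
    where mask is True, in the frame's current row order."""
    # Integer positions of the matching rows.
    positions, = np.where(mask.to_numpy())
    # Translate the first half of those positions into index labels.
    labels = frame.index[positions[: len(positions) // 2]]
    return frame.drop(labels)


# Sort first, so the "first half" is the most negative half.
sorted_df = df.sort_values("Col2")
result = drop_half_matching(sorted_df, sorted_df.Col2 < 0)
print(result)
```

Unlike the sample-based approach, this one is deterministic: after sorting, the rows removed are always the smallest negative values.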
