Deleting Half Of Dataframe Rows Which Meet Condition
I'm looking to extract a subset of a dataframe based on a condition. Let's say df = pd.DataFrame({'Col1': [values1], 'Col2': [values2], 'Col3': [values3]}). I'd like to sort by
Solution 1:
Data input:
import pandas as pd

values1 = [-5, 10, 13, -3, -1, -2]
values2 = [-5, 10, 13, -3, -1, -2]
values3 = [-5, 10, 13, -3, -1, -2]
df = pd.DataFrame({'Col1': values1, 'Col2': values2, 'Col3': values3})
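Use sample and concat: keep every row where Col2 is positive, and append a random sample of the rows where Col2 is negative. You can calculate n for sample(n) from the data; I simply use 2 here, which is half of the four negative rows. Because sample picks rows at random, which negative rows survive will vary between runs.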
pd.concat([df[df.Col2>0],df[df.Col2<0].sample(2)])
Out[224]:
Col1 Col2 Col3
1 10 10 10
2 13 13 13
5 -2 -2 -2
4 -1 -1 -1
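Rather than hardcoding 2, n can be derived from the number of rows that meet the condition. The following is a minimal sketch of that idea, not part of the original answer: the names mask, matching and n_keep are made up for illustration, random_state is set only to make the run repeatable, and rounding so that an odd count keeps the extra row is an assumption about what "half" should mean.

```python
import pandas as pd

values = [-5, 10, 13, -3, -1, -2]
df = pd.DataFrame({'Col1': values, 'Col2': values, 'Col3': values})

# Rows that meet the condition (here: negative Col2).
mask = df.Col2 < 0
matching = df[mask]

# Drop half of the matching rows (rounded down), i.e. keep the rest.
# Assumption: with an odd count, the extra row is kept.
n_keep = len(matching) - len(matching) // 2

# Keep all non-matching rows plus a random sample of the matching ones.
# Using ~mask (instead of Col2 > 0) also keeps rows where Col2 == 0.
result = pd.concat([df[~mask], matching.sample(n=n_keep, random_state=0)])
print(result)
```

With the data above there are four negative rows, so n_keep is 2 and the result has four rows, matching the shape of the output shown earlier.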
Solution 2:
A straightforward approach: first, since you wanted your DataFrame sorted, sort it by Col2:
In [16]: df = pd.DataFrame({'Col1': values1, 'Col2':values2, 'Col3': values3})
In [17]: df
Out[17]:
Col1 Col2 Col3
0 1 -5 a
1 2 10 b
2 3 13 c
3 4 -3 d
4 5 -1 e
5 6 -2 f
In [18]: df.sort_values('Col2', inplace=True)
In [19]: df
Out[19]:
Col1 Col2 Col3
0 1 -5 a
3 4 -3 d
5 6 -2 f
4 5 -1 e
1 2 10 b
2 3 13 c
Then create a boolean mask for the negative values, use np.where (with NumPy imported as np) to get the positions of those rows, take the first half of those positions, and drop the corresponding rows:
In [20]: mask = (df.Col2 < 0)
In [21]: idx, = np.where(mask)
In [22]: df.drop(df.index[idx[:len(idx)//2]])
Out[22]:
Col1 Col2 Col3
5 6 -2 f
4 5 -1 e
1 2 10 b
2 3 13 c
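For reuse, the mask, np.where and drop steps can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name drop_first_half_matching is hypothetical, the sample data is reconstructed from the output shown in this answer, and the index labels are assumed to be unique (drop removes every row that shares a label).

```python
import numpy as np
import pandas as pd

def drop_first_half_matching(df, mask):
    """Drop the first half (rounded down) of the rows where mask is True."""
    # Positions (not labels) of the rows that meet the condition.
    positions = np.flatnonzero(mask.to_numpy())
    # Translate the first half of those positions into index labels.
    labels_to_drop = df.index[positions[:len(positions) // 2]]
    # Assumption: index labels are unique, so only these rows are dropped.
    return df.drop(labels_to_drop)

# Data reconstructed from the output shown in the answer.
df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5, 6],
                   'Col2': [-5, 10, 13, -3, -1, -2],
                   'Col3': list('abcdef')})

df_sorted = df.sort_values('Col2')
result = drop_first_half_matching(df_sorted, df_sorted.Col2 < 0)
print(result)
```

Because the frame is sorted first, "first half" here means the most negative values; on an unsorted frame the same helper would drop whichever matching rows happen to come first.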