Pandas Replace Values With Nan At Random
I am testing the performance of a machine learning algorithm, specifically how it handles missing data and what kind of performance degradation occurs when variables are miss
Solution 1:
Reassign using the sample method, and pandas will introduce NaN values due to auto-alignment:
df['var1'] = df['var1'].sample(frac=0.5)
Interactively:
In [1]: import pandas as pd
   ...: d = {'var1': [1, 2, 3, 4], 'var2': [5, 6, 7, 8]}
   ...: df = pd.DataFrame(data=d)
   ...: df
   ...:
Out[1]:
   var1  var2
0     1     5
1     2     6
2     3     7
3     4     8
In [2]: df['var1'] = df['var1'].sample(frac=0.5)
In [3]: df
Out[3]:
   var1  var2
0   1.0     5
1   NaN     6
2   3.0     7
3   NaN     8
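Note that sample(frac=...) specifies the fraction of rows to keep, so the share of missing values is the complement. A minimal sketch of that idea follows; the helper name blank_out_fraction and the seed value are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def blank_out_fraction(series: pd.Series, missing_frac: float, seed: int = 0) -> pd.Series:
    """Return a copy of `series` with about `missing_frac` of its values set to NaN.

    Hypothetical helper for illustration only.
    """
    # sample() KEEPS a fraction of the rows, so keep the complement of missing_frac.
    kept = series.sample(frac=1.0 - missing_frac, random_state=seed)
    # reindex() aligns the kept rows back onto the full index;
    # rows that were not sampled become NaN.
    return kept.reindex(series.index)

df = pd.DataFrame({'var1': [1, 2, 3, 4], 'var2': [5, 6, 7, 8]})
df['var1'] = blank_out_fraction(df['var1'], missing_frac=0.25, seed=42)
print(df)
```

Passing random_state makes the choice of rows repeatable between runs, which helps when comparing model performance at different missing-data levels.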
Solution 2:
(Note: I created this before you posted your mcve. I can edit it to include your starting code.)
Here is a solution:
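import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.random(20)})
length = len(df)
num = int(0.2 * length)  # number of rows to set to NaN (20% of the rows)
# choice(..., replace=False) draws distinct positions from the full range
# 0..length-1, so exactly `num` rows become NaN.
idx_replace = np.random.choice(length, size=num, replace=False)
df.loc[idx_replace, 'x'] = np.nan
print(df)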
Output:
           x
0   0.426642
1        NaN
2        NaN
3   0.869367
4   0.719778
5        NaN
6   0.944411
7   0.424733
8   0.246545
9   0.344444
10  0.810131
11  0.735028
12       NaN
13  0.707681
14  0.963711
15  0.420725
16  0.787127
17  0.618693
18  0.606222
19  0.022355
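If values should go missing across several columns at once, a different technique from the one above is a random Boolean mask passed to DataFrame.mask. The sketch below assumes a per-cell missing probability (p_missing), a fixed seed, and made-up column data; each cell is blanked independently, so the realized share of NaN values is only approximately p_missing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, only for repeatability
df = pd.DataFrame({'var1': rng.random(1000), 'var2': rng.random(1000)})

p_missing = 0.2  # assumed probability that any single cell goes missing

# One uniform draw per cell; True marks the cells to blank out.
mask = rng.random(df.shape) < p_missing

# mask() returns a new DataFrame with NaN wherever the condition is True,
# leaving the original df untouched.
df_missing = df.mask(mask)

print(df_missing.isna().mean())  # realized missing share per column
```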
Solution 3:
https://chartio.com/resources/tutorials/how-to-check-if-any-value-is-nan-in-a-pandas-dataframe/
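Skip down to 'Count Missing Values in DataFrame'. The expression df.isnull().sum().sum() returns the total number of missing values in the DataFrame, which is useful for checking how many values were actually replaced.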
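As a quick check after injecting NaN values, the counts can be taken per column, in total, or as a fraction. This is a small sketch on a hand-built frame that mirrors the example output from Solution 1; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [1.0, np.nan, 3.0, np.nan], 'var2': [5, 6, 7, 8]})

per_column = df.isnull().sum()   # NaN count for each column
total = df.isnull().sum().sum()  # NaN count over the whole DataFrame
fraction = df.isnull().mean()    # share of NaN values per column

print(per_column)
print(total)
print(fraction)
```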