How To Use Np.genfromtxt And Fill In Missing Columns?
Solution 1:
Pandas has more robust file readers, and you can use DataFrame
methods to handle the missing values.
You'll have to figure out how many columns to use first:
columns = max(len(line.split()) for line in open('data.txt'))
To read the file:
import pandas
df = pandas.read_table('data.txt',
delim_whitespace=True,
header=None,
usecols=range(columns),
engine='python')
To convert to a NumPy array:
import numpy
a = numpy.array(df)
The blank positions are filled with NaN. To use a different value for
blanks, apply .fillna() before converting:
filled = numpy.array(df.fillna(999))
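A self-contained sketch of Solution 1, assuming whitespace-separated numeric data. The sample text and the variable names are illustrative stand-ins for data.txt. It passes a names list with one label per column of the widest line (the same idea Solution 4 relies on) and uses sep=r'\s+', which newer pandas versions recommend in place of delim_whitespace=True.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative ragged, whitespace-separated data (stands in for data.txt).
text = """1 2 3
4 5
6 7 8 9
"""

# The widest line determines how many columns to allocate.
columns = max(len(line.split()) for line in StringIO(text))

# One name per column lets pandas pad shorter lines with NaN.
df = pd.read_csv(StringIO(text), sep=r"\s+", header=None,
                 names=list(range(columns)))

a = df.to_numpy()                   # float array, NaN in the blank positions
filled = df.fillna(999).to_numpy()  # same data with NaN replaced by 999
print(a)
print(filled)
```

df.to_numpy() gives the same result as numpy.array(df) here; both upcast the mixed integer and float columns to a single float array.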
Solution 2:
Set the filling_values argument to np.nan (a float, so you avoid the
string-conversion issue) and set the delimiter to a comma, since by default
genfromtxt splits fields on whitespace:
import numpy as np
trainData = np.genfromtxt('data.txt', usecols=range(0, 5), invalid_raise=False, missing_values="", filling_values=np.nan, delimiter=',')
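A small sketch of Solution 2 on in-memory comma-separated text; the sample data is illustrative. It shows the behavior the answer depends on: a field that is present but empty (",,") is treated as missing and filled with np.nan, while a line that is simply too short for usecols is skipped because of invalid_raise=False (NumPy emits a warning naming the skipped line).

```python
from io import StringIO

import numpy as np

# Illustrative comma-separated data: the second row marks its missing values
# with empty fields, the third row just stops early.
text = """1,2,3,4,5
6,7,,,
8,9
"""

train_data = np.genfromtxt(
    StringIO(text),
    usecols=range(0, 5),
    invalid_raise=False,    # skip (with a warning) lines that don't fit usecols
    missing_values="",      # empty fields count as missing...
    filling_values=np.nan,  # ...and are replaced by NaN
    delimiter=",",
)
print(train_data)
# The short "8,9" line is dropped, so only two rows remain.
```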
Solution 3:
I managed to figure out a solution.
import numpy as np
import pandas
df = pandas.DataFrame([line.strip().split() for line in open('data.txt', 'r')])
data = np.array(df)
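Solution 3 yields an object array that holds the fields as strings, with empty cells where a line ended early. A sketch of one way to get a float array instead, assuming every field is numeric: convert each column with pd.to_numeric, which turns the padded cells into NaN. The sample text is illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative ragged, whitespace-separated data (stands in for data.txt).
text = """0.79 0.1 0.91
0.5 0.2
1 2 3 4
"""

# As in Solution 3: one list of string fields per line; pandas pads
# shorter rows so every row is as wide as the longest one.
df = pd.DataFrame([line.strip().split() for line in StringIO(text)])

# The cells are still strings, so convert each column to numbers.
# pd.to_numeric maps the padded (missing) cells to NaN.
data = df.apply(pd.to_numeric).to_numpy(dtype=float)
print(data)
```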
Solution 4:
With the three long sample lines copied and pasted in, this pandas reader works:
In [149]: pd.read_csv(BytesIO(txt), delim_whitespace=True, header=None,
     ...:             on_bad_lines='skip', names=list(range(91)))
Out[149]:
      0     1     2     3     4     5     6     7     8     9  ...
0  0.79   0.1  0.91 -0.17   0.1  0.33  -0.9   0.1 -0.19  -0.0  ...
1  0.79   0.1  0.91 -0.17   0.1  0.33  -0.9   0.1 -0.19  -0.0  ...
2  0.79   0.1  0.91 -0.17   0.1  0.33  -0.9   0.1 -0.19  -0.0  ...

       83     84     85     86     87     88     89     90
0     535    NaN    NaN    NaN    NaN    NaN    NaN    NaN
1     509  112.0  535.0    NaN    NaN    NaN    NaN    NaN
2     412  422.0  556.0   55.0  355.0  485.0  112.0  515.0
Then use _.values to get the NumPy array.
The key is specifying a names list long enough for the widest line. Pandas
pads incomplete lines with NaN, whereas genfromtxt only recognizes missing
values when they are marked by explicit delimiters.
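To illustrate the point about names, a minimal sketch with two made-up lines where the second is longer than the first. Without names, pandas sizes the table from the first line and raises pandas.errors.ParserError on the longer one; with a names list covering the widest line, the short line is padded with NaN. The data is illustrative, and the sketch uses sep=r'\s+' in place of delim_whitespace=True.

```python
from io import BytesIO

import numpy as np
import pandas as pd

# Illustrative bytes input (Solution 4 reads from BytesIO(txt));
# the second line is longer than the first.
txt = b"1 2\n3 4 5\n"

# Without names, the column count comes from the first line,
# so the longer second line is rejected.
try:
    pd.read_csv(BytesIO(txt), sep=r"\s+", header=None)
    raised = False
except pd.errors.ParserError:
    raised = True

# With one name per column of the widest line, short lines are padded.
df = pd.read_csv(BytesIO(txt), sep=r"\s+", header=None, names=list(range(3)))
arr = df.values
print(raised)
print(arr)
```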