
How To Use np.genfromtxt And Fill In Missing Columns?

I am trying to use np.genfromtxt to load data that looks something like this into a matrix: 0.79 0.10 0.91 -0.17 0.10 0.33 -0.90 0.10 -0.19 -0.00 0.10 -0.99 -0.06 0.10 -

Solution 1:

Pandas has more robust readers and you can use the DataFrame methods to handle the missing values.

You'll have to figure out how many columns to use first:

columns = max(len(l.split()) for l in open('data.txt'))

To read the file:

import pandas
df = pandas.read_table('data.txt', 
                       delim_whitespace=True, 
                       header=None, 
                       usecols=range(columns), 
                       engine='python')

To convert to a numpy array:

import numpy
a = numpy.array(df)

This will fill in NaNs in the blank positions. You can use .fillna() to get other values for blanks.

filled = numpy.array(df.fillna(999))
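As a self-contained sketch of this approach, the snippet below reads a small in-memory sample instead of data.txt. The SAMPLE text and the load_ragged helper are hypothetical names. It also passes names=list(range(n_cols)) with sep=r"\s+" rather than usecols and delim_whitespace; that substitution is an assumption about what behaves consistently across pandas versions, not the exact call from the answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical ragged sample standing in for data.txt.
SAMPLE = "0.79 0.10 0.91 -0.17\n0.10 0.33\n-0.90 0.10 -0.19\n"


def load_ragged(text):
    """Read whitespace-separated rows of unequal length into a DataFrame."""
    # The widest row decides how many columns the frame needs.
    n_cols = max(len(line.split()) for line in text.splitlines())
    # Naming every column up front lets pandas pad short rows with NaN.
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=list(range(n_cols)),
    )


df = load_ragged(SAMPLE)
arr = df.to_numpy()                  # blanks become NaN
filled = df.fillna(999).to_numpy()   # blanks become 999 instead
```

Here arr has one row per input line and as many columns as the longest line.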

Solution 2:

Set the filling_values argument to np.nan (a float, so you avoid the string conversion issue) and set the delimiter to a comma, since by default genfromtxt splits on whitespace:

import numpy as np
trainData = np.genfromtxt('data.txt', usecols=range(0, 5), invalid_raise=False, missing_values="", filling_values=np.nan, delimiter=',')
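A minimal sketch of why the comma matters, assuming the file really is comma-separated: every row then carries the same number of delimiters, so an empty field is still a field. CSV_SAMPLE is a made-up stand-in for the real file.

```python
import io

import numpy as np

# Hypothetical comma-separated sample; empty fields mark the missing values.
CSV_SAMPLE = "0.79,0.10,0.91\n0.10,,0.33\n-0.90,0.10,\n"

# Each row splits into three fields, and the empty ones are filled with NaN.
train = np.genfromtxt(
    io.StringIO(CSV_SAMPLE),
    delimiter=",",
    filling_values=np.nan,
)
```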

Solution 3:

I managed to figure out a solution.

import numpy as np
import pandas
df = pandas.DataFrame([line.strip().split() for line in open('data.txt', 'r')])
data = np.array(df)
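Note that splitting the lines by hand leaves every field as a string, so the resulting array is not numeric. If a float matrix is the goal, one possible variant pads the rows directly in NumPy. This is a sketch, not the answerer's code; SAMPLE_LINES and pad_rows are hypothetical names.

```python
import numpy as np

# Hypothetical lines standing in for the contents of data.txt.
SAMPLE_LINES = ["0.79 0.10 0.91 -0.17", "0.10 0.33", "-0.90 0.10 -0.19"]


def pad_rows(lines, fill=np.nan):
    """Parse whitespace-separated floats and pad short rows with `fill`."""
    rows = [[float(tok) for tok in line.split()] for line in lines]
    width = max(len(row) for row in rows)
    # Start from a matrix full of the fill value, then copy each row in.
    out = np.full((len(rows), width), fill, dtype=float)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out


data = pad_rows(SAMPLE_LINES)
```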

Solution 4:

With the copy-n-paste of the 3 big lines, this pandas reader works:

In [149]: pd.read_csv(BytesIO(txt), delim_whitespace=True, header=None, error_bad_lines=False, names=list(range(91)))
Out[149]: 
     0    1     2     3    4     5    6    7     8    9   ...    81   82  \
0  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...   515  163   
1  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...   515  163   
2  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...   125   30   

    83     84     85    86     87     88     89     90  
0  535    NaN    NaN   NaN    NaN    NaN    NaN    NaN  
1  509  112.0  535.0   NaN    NaN    NaN    NaN    NaN  
2  412  422.0  556.0  55.0  355.0  485.0  112.0  515.0  

Use _.values to get the array (in IPython, _ holds the previous output).

The key is to pass a names list at least as long as the widest row. Pandas pads incomplete lines with NaN, while genfromtxt needs an explicit delimiter around each missing field.
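To illustrate the genfromtxt side of that contrast, here is a small sketch with a made-up whitespace-separated sample (RAGGED is hypothetical). With the default settings a short row is reported as an error, and with invalid_raise=False it is skipped with a warning rather than padded.

```python
import io
import warnings

import numpy as np

# Hypothetical sample: the middle row is one field short.
RAGGED = "1 2 3\n4 5\n6 7 8\n"

# Default behaviour: the inconsistent column count raises ValueError.
try:
    np.genfromtxt(io.StringIO(RAGGED))
    raised = False
except ValueError:
    raised = True

# invalid_raise=False drops the short row instead of filling it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the conversion warning
    kept = np.genfromtxt(io.StringIO(RAGGED), invalid_raise=False)
```

In this sketch kept holds only the two complete rows, which is why the pandas route is the one that actually fills in the missing columns.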
