How To Fill The Missing Record Of Pandas Dataframe In Pythonic Way?

I have a Pandas dataframe 'df' like this:

          X   Y
IX1 IX2
A   A1   20  30
    A2   20  30
    A5   20  30
B   B2   20  30
    B4   20  30

It lost some rows, and I want to fil

Solution 1:

You need to construct your full index, and then use the reindex method of the dataframe. Like so...

import io
import pandas

# Python 3: StringIO lives in the io module
datastring = io.StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'), 
              ('A', 'A4'), ('A', 'A5'), ('B', 'B1'), 
              ('B', 'B2'), ('B', 'B3'), ('B', 'B4')]
new_df = dataframe.reindex(full_index)
new_df
       C3  C4
A A1   20  30
  A2   20  30
  A3  NaN NaN
  A4  NaN NaN
  A5   20  30
B B1  NaN NaN
  B2   20  30
  B3  NaN NaN
  B4   20  30

And then you can use the fillna method to set the NaNs to whatever you want.
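As a minimal sketch of that last step, the snippet below reindexes a smaller version of the same data and then calls fillna. The fill values (0 and -1) and the names csv_text, filled and filled_per_column are arbitrary choices for illustration, not part of the original answer.

```python
import io
import pandas

# A shortened version of the example data (illustrative only).
csv_text = io.StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
B,B2,20,30""")

dataframe = pandas.read_csv(csv_text, index_col=['C1', 'C2'])

# ('A', 'A3') and ('B', 'B1') are missing from the data, so reindex
# creates them as rows of NaN.
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'),
              ('B', 'B1'), ('B', 'B2')]
new_df = dataframe.reindex(full_index)

# Replace every NaN with a single value (0 is an arbitrary choice).
filled = new_df.fillna(0)

# Or give each column its own fill value through a dict.
filled_per_column = new_df.fillna({'C3': 0, 'C4': -1})
```

Note that fillna returns a new dataframe by default, so new_df itself still contains the NaNs.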

Update (June 2014)

Just had to revisit this myself... In the current version of pandas, there is a function, MultiIndex.from_product, that builds a MultiIndex from the Cartesian product of iterables. So the above solution could become:

import io
import pandas

datastring = io.StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(1, 6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df
       C3  C4
C1 C2
A  1   20  30
   2   20  30
   3  NaN NaN
   4  NaN NaN
   5   20  30
B  1  NaN NaN
   2   20  30
   3  NaN NaN
   4   20  30
   5  NaN NaN

Pretty elegant, in my opinion.
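If it helps to see what from_product actually builds, here is a small sketch on a reduced data set. It also passes fill_value to reindex, which fills the newly created rows in the same call instead of a separate fillna. The data, the fill value of 0 and the name csv_text are assumptions made for this illustration.

```python
import io
import pandas

# Reduced example data (illustrative only).
csv_text = io.StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,3,20,30
B,2,20,30""")

dataframe = pandas.read_csv(csv_text, index_col=['C1', 'C2'])

# Every (C1, C2) combination: 2 letters x 3 numbers = 6 index entries.
full_index = pandas.MultiIndex.from_product(
    [['A', 'B'], range(1, 4)], names=['C1', 'C2'])

# fill_value sets the contents of rows that did not exist before,
# so no NaNs are introduced (0 is an arbitrary choice).
new_df = dataframe.reindex(full_index, fill_value=0)
```

This only works when the full index really is a Cartesian product; for irregular sets of keys, such as A1..A5 and B1..B4 in the first example, the explicit list of tuples is still needed.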
