How To Fill The Missing Record Of Pandas Dataframe In Pythonic Way?
I have a Pandas dataframe 'df' like this:

         X   Y
IX1 IX2
A   A1  20  30
    A2  20  30
    A5  20  30
B   B2  20  30
    B4  20  30

It lost some rows, and I want to fil
Solution 1:
You need to construct your full index, and then use the reindex method of the dataframe, like so:
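import io
import pandas

# Python 3: StringIO lives in the io module
datastring = io.StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")

# Use the first two columns as a two-level index
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])

# Every (C1, C2) pair that should exist, including the missing ones
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'),
              ('A', 'A4'), ('A', 'A5'), ('B', 'B1'),
              ('B', 'B2'), ('B', 'B3'), ('B', 'B4')]

# Rows absent from the original data come back filled with NaN
new_df = dataframe.reindex(full_index)
new_df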
       C3   C4
A A1   20   30
  A2   20   30
  A3  NaN  NaN
  A4  NaN  NaN
  A5   20   30
B B1  NaN  NaN
  B2   20   30
  B3  NaN  NaN
  B4   20   30
And then you can use the fillna method to set the NaNs to whatever you want.
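As a minimal sketch of that last step, the snippet below rebuilds the reindexed frame and fills the gaps in two ways. The fill values (0 and -1) and the names filled and filled_per_column are assumptions chosen for illustration, not part of the original question.

```python
import io

import pandas as pd

csv_text = """\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30"""
dataframe = pd.read_csv(io.StringIO(csv_text), index_col=["C1", "C2"])

full_index = [("A", "A1"), ("A", "A2"), ("A", "A3"),
              ("A", "A4"), ("A", "A5"), ("B", "B1"),
              ("B", "B2"), ("B", "B3"), ("B", "B4")]
new_df = dataframe.reindex(full_index)

# Replace every NaN with one constant (0 is an arbitrary choice here).
filled = new_df.fillna(0)

# Or give each column its own default (these defaults are hypothetical).
filled_per_column = new_df.fillna({"C3": 0, "C4": -1})
```

Passing a dict to fillna is handy when a single default does not make sense for every column.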
update (June 2014)
Just had to revisit this myself...
As of this update, pandas has a function, MultiIndex.from_product, that builds a MultiIndex from the Cartesian product of iterables. So the above solution could become:
# Imports repeated so this snippet runs on its own (Python 3)
import io
import pandas

datastring = io.StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
# range(1, 6) yields the C2 labels 1 through 5 for each of 'A' and 'B'
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(1, 6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df
        C3   C4
C1 C2
A  1    20   30
   2    20   30
   3   NaN  NaN
   4   NaN  NaN
   5    20   30
B  1   NaN  NaN
   2    20   30
   3   NaN  NaN
   4    20   30
   5   NaN  NaN
Pretty elegant, in my opinion.
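If the missing rows should get a constant straight away, reindex also accepts a fill_value argument, which skips the separate fillna call. The sketch below assumes the C2 labels run from 1 through 5 and that 0 is an acceptable default; both are illustrative choices rather than anything stated in the question.

```python
import io

import pandas as pd

csv_text = """\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30"""
dataframe = pd.read_csv(io.StringIO(csv_text), index_col=["C1", "C2"])

# Every combination of C1 in {A, B} and C2 in 1..5 (assumed label range).
full_index = pd.MultiIndex.from_product(
    [["A", "B"], range(1, 6)], names=["C1", "C2"]
)

# Rows missing from the data are created and filled with 0 in one step.
new_df = dataframe.reindex(full_index, fill_value=0)
```

Because no NaN is ever introduced this way, the rows that were already present keep their original values unchanged.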