
Fill Missing Timeseries Data Using Pandas Or Numpy

I have a list of dictionaries which looks like this: L=[ { 'timeline': '2014-10', 'total_prescriptions': 17 }, { 'timeline': '2014-11', 'total_prescriptions': 14 }, { 'timelin

Solution 1:

What you are describing is called resampling in pandas. First convert your timeline strings to datetime values with pd.to_datetime and set them as the index:

import pandas as pd
df = pd.DataFrame(L)
df.index = pd.to_datetime(df.timeline, format='%Y-%m')
df
           timeline  total_prescriptions
timeline
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3
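Before filling anything, it can help to see which months are actually absent. Here is a minimal sketch, assuming a small hypothetical sample in place of the truncated list L (its values mirror the frame printed above, with zero-padded months such as '2015-01'). It builds the complete month-start range with pd.date_range and compares that range against the index:

```python
import pandas as pd

# Hypothetical sample standing in for the truncated list L from the question
# (months are zero-padded here, e.g. '2015-01')
records = [
    {'timeline': '2014-10', 'total_prescriptions': 17},
    {'timeline': '2014-11', 'total_prescriptions': 14},
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-01', 'total_prescriptions': 4},
    {'timeline': '2015-03', 'total_prescriptions': 10},
    {'timeline': '2015-04', 'total_prescriptions': 3},
]

df = pd.DataFrame(records)
# Parse 'YYYY-MM' strings into month-start timestamps and use them as the index
df.index = pd.to_datetime(df['timeline'], format='%Y-%m')

# Every month start from the first to the last observed month
full_range = pd.date_range(df.index.min(), df.index.max(), freq='MS')

# Months in the complete range that have no row in the data
missing = full_range.difference(df.index)
print(list(missing.strftime('%Y-%m')))  # ['2015-02']
```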

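Then you can add the missing months with resample('MS') ('MS' is the pandas frequency alias for "month start"). Keep only the numeric column first, so that the string timeline column is not carried along (it would otherwise be filled with 0 as well). asfreq() inserts each missing month as NaN, and fillna(0) turns those NaN values into zeros, as your requirement asks. The column becomes float because of the temporary NaN:

df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0)
df
            total_prescriptions
timeline
2014-10-01                 17.0
2014-11-01                 14.0
2014-12-01                  8.0
2015-01-01                  4.0
2015-02-01                  0.0
2015-03-01                 10.0
2015-04-01                  3.0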
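If you would rather keep the counts as integers, one alternative is to reindex instead of resampling: reindex onto a complete month-start range with fill_value=0, so that no NaN (and therefore no float conversion) is ever introduced. Here is a sketch on the same hypothetical sample data, which stands in for the truncated L:

```python
import pandas as pd

# Hypothetical sample standing in for the truncated list L from the question
records = [
    {'timeline': '2014-10', 'total_prescriptions': 17},
    {'timeline': '2014-11', 'total_prescriptions': 14},
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-01', 'total_prescriptions': 4},
    {'timeline': '2015-03', 'total_prescriptions': 10},
    {'timeline': '2015-04', 'total_prescriptions': 3},
]

df = pd.DataFrame(records)
df.index = pd.to_datetime(df['timeline'], format='%Y-%m')
counts = df['total_prescriptions']

# Complete month-start range over the observed span, named like the original index
months = pd.date_range(counts.index.min(), counts.index.max(), freq='MS', name='timeline')

# Reindexing inserts the missing months directly as 0, so the dtype stays integer
filled = counts.reindex(months, fill_value=0)
print(filled.tolist())  # [17, 14, 8, 4, 0, 10, 3]
```

Unlike resample, reindex does not aggregate, so it assumes at most one row per month. With duplicate months it raises an error instead of summing them.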

To convert back to your original format, turn the datetime index back into 'YYYY-MM' strings with strftime (the to_native_types method has been removed in recent pandas versions). Then export with to_dict('records'), selecting the columns in the original key order:

df['timeline'] = df.index.strftime('%Y-%m')
df[['timeline', 'total_prescriptions']].to_dict('records')
[{'timeline': '2014-10', 'total_prescriptions': 17.0},
 {'timeline': '2014-11', 'total_prescriptions': 14.0},
 {'timeline': '2014-12', 'total_prescriptions': 8.0},
 {'timeline': '2015-01', 'total_prescriptions': 4.0},
 {'timeline': '2015-02', 'total_prescriptions': 0.0},
 {'timeline': '2015-03', 'total_prescriptions': 10.0},
 {'timeline': '2015-04', 'total_prescriptions': 3.0}]
