Fill Missing Timeseries Data Using Pandas Or Numpy
I have a list of dictionaries which looks like this: L=[ { 'timeline': '2014-10', 'total_prescriptions': 17 }, { 'timeline': '2014-11', 'total_prescriptions': 14 }, { 'timelin
Solution 1:
What you are describing is called "resampling" in pandas. First, convert your time strings to pandas datetimes and set them as the index:
import pandas as pd
df = pd.DataFrame(L)
df.index = pd.to_datetime(df.timeline, format='%Y-%m')
df
           timeline  total_prescriptions
timeline
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3
Then you can add the missing months with resample('MS') ('MS' is the pandas month-start frequency alias) and use fillna(0) to turn the resulting null values into zeros, as your requirement asks.
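df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0)
df
            total_prescriptions
timeline
2014-10-01                 17.0
2014-11-01                 14.0
2014-12-01                  8.0
2015-01-01                  4.0
2015-02-01                  0.0
2015-03-01                 10.0
2015-04-01                  3.0
asfreq() inserts a row for each missing month (February 2015 here) with a NaN value, which fillna(0) then replaces with zero. The string timeline column is dropped first because it has nothing sensible to put in the inserted rows, and the intermediate NaN is why the counts become floats.
To convert back to a list of dictionaries, turn the datetime index back into strings with strftime (the older to_native_types method is deprecated and was later removed), insert them as the first column, and export with to_dict('records'). Use '%Y-%m' instead of '%Y-%m-%d' if you want the year-month form, such as '2014-10', back:
df.insert(0, 'timeline', df.index.strftime('%Y-%m-%d'))
df.to_dict('records')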
[{'timeline': '2014-10-01', 'total_prescriptions': 17.0},
{'timeline': '2014-11-01', 'total_prescriptions': 14.0},
{'timeline': '2014-12-01', 'total_prescriptions': 8.0},
{'timeline': '2015-01-01', 'total_prescriptions': 4.0},
{'timeline': '2015-02-01', 'total_prescriptions': 0.0},
{'timeline': '2015-03-01', 'total_prescriptions': 10.0},
{'timeline': '2015-04-01', 'total_prescriptions': 3.0}]
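For reference, the whole round trip can be wrapped in one function. This is a minimal sketch, not the answer's exact code: fill_missing_months is a hypothetical helper name, it assumes zero-padded 'YYYY-MM' strings and the two keys from the question, and it swaps resample for the closely related reindex over pd.date_range(..., freq='MS'), which lets fill_value=0 keep the counts as integers. The sample records are illustrative, built from the values in the output above.

```python
import pandas as pd


def fill_missing_months(records):
    """Return records with a zero-count entry for every missing month.

    Hypothetical helper: assumes dicts with 'timeline' ('YYYY-MM') and
    'total_prescriptions' keys, as in the question.
    """
    df = pd.DataFrame(records)
    # Parse the year-month strings into month-start timestamps.
    df.index = pd.to_datetime(df['timeline'], format='%Y-%m')
    # Every month start between the first and last observed month.
    full_range = pd.date_range(df.index.min(), df.index.max(), freq='MS')
    # reindex adds the missing months; fill_value=0 avoids NaN, so ints stay ints.
    counts = df['total_prescriptions'].reindex(full_range, fill_value=0)
    # Convert back to the original list-of-dicts shape.
    return [
        {'timeline': ts.strftime('%Y-%m'), 'total_prescriptions': int(n)}
        for ts, n in counts.items()
    ]


# Illustrative sample data (zero-padded months assumed).
sample = [
    {'timeline': '2014-10', 'total_prescriptions': 17},
    {'timeline': '2014-11', 'total_prescriptions': 14},
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-01', 'total_prescriptions': 4},
    {'timeline': '2015-03', 'total_prescriptions': 10},
    {'timeline': '2015-04', 'total_prescriptions': 3},
]
for row in fill_missing_months(sample):
    print(row)
```

Running it prints seven rows, with {'timeline': '2015-02', 'total_prescriptions': 0} filling the gap. Choosing reindex with fill_value over resample().asfreq().fillna(0) skips the intermediate NaN, so no float conversion happens.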