Fill Missing Timeseries Data Using Pandas Or Numpy
I have a list of dictionaries which looks like this:
L=[ { 'timeline': '2014-10', 'total_prescriptions': 17 }, { 'timeline': '2014-11', 'total_prescriptions': 14 }, { 'timelin
Solution 1:
What you are describing is called "resampling" in pandas. First, convert your time strings to datetimes (a numpy datetime64 index) and set them as the index:
import pandas as pd

df = pd.DataFrame(L)
df.index = pd.to_datetime(df.timeline, format='%Y-%m')
df

            timeline  total_prescriptions
timeline
2014-10-01   2014-10                   17
2014-11-01   2014-11                   14
2014-12-01   2014-12                    8
2015-01-01    2015-1                    4
2015-03-01    2015-3                   10
2015-04-01    2015-4                    3
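A self-contained sketch of this first step, handy for checking the parse before resampling. The sample records are an assumption: they reuse the months and counts from the output above, but with zero-padded months ('2015-01' rather than '2015-1'). It passes the parsed Series to set_index, which should be equivalent to assigning df.index as in the answer.

```python
import pandas as pd

# Assumed sample data: the answer's months and counts, zero-padded,
# with February 2015 missing.
L = [
    {'timeline': '2014-10', 'total_prescriptions': 17},
    {'timeline': '2014-11', 'total_prescriptions': 14},
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-01', 'total_prescriptions': 4},
    {'timeline': '2015-03', 'total_prescriptions': 10},
    {'timeline': '2015-04', 'total_prescriptions': 3},
]

df = pd.DataFrame(L)
# Parse the 'YYYY-MM' strings; with no day given, each date falls on the 1st.
df = df.set_index(pd.to_datetime(df['timeline'], format='%Y-%m'))

print(type(df.index).__name__)  # DatetimeIndex
print(df.index[0])              # 2014-10-01 00:00:00
```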
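Then add the missing months by resampling to the 'MS' ("month start") frequency, and fill the gaps with zero as your requirement asks. Older pandas versions accepted df.resample('MS').fillna(0) directly, but in current versions resample returns a Resampler object, so call asfreq() to get one row per month (the new months hold NaN) and then fillna(0) to turn those NaN values into zeros. Select only the count column first, so the redundant string copy of timeline is not filled with 0:
df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0)
df

            total_prescriptions
timeline
2014-10-01                 17.0
2014-11-01                 14.0
2014-12-01                  8.0
2015-01-01                  4.0
2015-02-01                  0.0
2015-03-01                 10.0
2015-04-01                  3.0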
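To see why the counts turn into floats, here is a minimal sketch on a plain Series (the sample values are assumed, copied from the table above). asfreq() inserts NaN for the missing month, which forces a float dtype. For comparison it also shows a different technique, reindexing against pd.date_range with fill_value=0, which never creates NaN and so keeps the integer dtype.

```python
import pandas as pd

# Assumed sample data: monthly counts indexed by month start, Feb 2015 missing.
s = pd.Series(
    [17, 14, 8, 4, 10, 3],
    index=pd.to_datetime(['2014-10-01', '2014-11-01', '2014-12-01',
                          '2015-01-01', '2015-03-01', '2015-04-01']),
    name='total_prescriptions',
)

# Upsample: asfreq() adds the missing month as NaN (float dtype), fillna(0) zeroes it.
filled = s.resample('MS').asfreq().fillna(0)

# Alternative technique: reindex against an explicit month-start range.
# fill_value=0 is used directly, so no NaN appears and the dtype stays integer.
full_range = pd.date_range(s.index.min(), s.index.max(), freq='MS')
reindexed = s.reindex(full_range, fill_value=0)

print(filled[pd.Timestamp('2015-02-01')])     # 0.0
print(reindexed[pd.Timestamp('2015-02-01')])  # 0
```

Either approach yields the same seven months; the choice mainly decides whether the counts come back as floats (as in the answer's output) or as integers.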
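To get back a list of dictionaries, convert the datetime index back to strings with strftime (older answers use to_native_types for this, but that method has since been removed from pandas), insert them as the first column, and then export using to_dict('records'). The pattern '%Y-%m-%d' produces the output below; use '%Y-%m' instead to recover your original 'YYYY-MM' format:
df.insert(0, 'timeline', df.index.strftime('%Y-%m-%d'))  # dates as strings, first column
df.to_dict('records')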
[{'timeline': '2014-10-01', 'total_prescriptions': 17.0},
{'timeline': '2014-11-01', 'total_prescriptions': 14.0},
{'timeline': '2014-12-01', 'total_prescriptions': 8.0},
{'timeline': '2015-01-01', 'total_prescriptions': 4.0},
{'timeline': '2015-02-01', 'total_prescriptions': 0.0},
{'timeline': '2015-03-01', 'total_prescriptions': 10.0},
{'timeline': '2015-04-01', 'total_prescriptions': 3.0}]
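Putting the steps together, here is a sketch of a reusable helper. The name fill_missing_months and its out_format parameter are hypothetical, and the sample data is assumed (the same months and counts as above, zero-padded). Unlike the answer, it casts the filled counts back to int, since prescription counts are whole numbers.

```python
import pandas as pd


def fill_missing_months(records, out_format='%Y-%m'):
    """Hypothetical helper: add zero-count rows for months missing from records.

    records: list of dicts with 'timeline' ('YYYY-MM') and 'total_prescriptions'.
    out_format: strftime pattern for the returned 'timeline' strings.
    """
    df = pd.DataFrame(records)
    counts = df.set_index(pd.to_datetime(df['timeline'], format='%Y-%m'))
    counts = counts['total_prescriptions']
    # Upsample to month starts: absent months appear as NaN, then become 0.
    counts = counts.resample('MS').asfreq().fillna(0).astype(int)
    out = pd.DataFrame({
        'timeline': counts.index.strftime(out_format),
        'total_prescriptions': counts.to_numpy(),
    })
    return out.to_dict('records')


# Assumed sample data: the answer's months and counts, zero-padded.
L = [
    {'timeline': '2014-10', 'total_prescriptions': 17},
    {'timeline': '2014-11', 'total_prescriptions': 14},
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-01', 'total_prescriptions': 4},
    {'timeline': '2015-03', 'total_prescriptions': 10},
    {'timeline': '2015-04', 'total_prescriptions': 3},
]

result = fill_missing_months(L)
print(len(result))  # 7
print(result[4])    # {'timeline': '2015-02', 'total_prescriptions': 0}
```

Passing out_format='%Y-%m-%d' gives the same date strings as the answer's output.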