Sum Large Pandas Dataframe Based On Smaller Date Ranges
I have a large pandas dataframe that has hourly data associated with it. I then want to parse that into 'monthly' data that sums the hourly data. However, the months aren't neces
Solution 1:
pd.merge_asof is only available with pandas 0.19 or later.
Use a combination of pd.merge_asof + query + groupby:
pd.merge_asof(df, month, left_on='date', right_on='start') \
.query('date <= end').groupby(['start', 'end']).num.sum().reset_index()
Explanation
pd.merge_asof
From the docs:
For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.
But this only takes into account the start date.
query
I take care of the end date with query, since I now conveniently have end in my dataframe after pd.merge_asof.
groupby
I trust this part is obvious.
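To see the three steps working together, here is a minimal sketch on made-up data. The frames `df` and `month` and their columns follow the names used in the answer above, but the values are hypothetical: every hour gets `num = 1` so that each sum is simply a count of hours, and the two custom "months" are chosen so that some hours fall outside both ranges.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Hypothetical hourly data: Jan 1 00:00 through Jan 10 23:00 (240 rows), num = 1 per hour.
df = pd.DataFrame({'date': pd.date_range('2015-01-01', periods=240, freq=hour)})
df['num'] = 1

# Hypothetical custom ranges: Jan 1 to Jan 4 23:00 and Jan 6 to Jan 9 23:00.
# Jan 5 and Jan 10 are deliberately left uncovered.
month = pd.DataFrame({'start': pd.date_range('2015-01-01', periods=2,
                                             freq=pd.Timedelta(days=5))})
month['end'] = month['start'] + pd.Timedelta(days=3, hours=23)

# Step 1: attach to each hour the latest range whose start is <= date.
# Both frames must already be sorted by their keys.
merged = pd.merge_asof(df, month, left_on='date', right_on='start')

# Step 2: drop hours that fall after the end of the range they were matched to.
# Step 3: sum per range.
result = (merged.query('date <= end')
                .groupby(['start', 'end'])['num'].sum()
                .reset_index())
print(result)
```

Each range covers four full days, so both sums come out as 96. The hours on Jan 5 and Jan 10 are matched by merge_asof on start alone and are then removed by the query step.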
Solution 2:
Maybe you can add a number of days to the dates and then convert them to a period:
import numpy as np
import pandas as pd

# create data (newer pandas versions prefer freq='h' over '1H')
dates = pd.Series(pd.date_range('1/1/2015 00:00','3/31/2015 23:45',freq='1H'))
nums = np.random.randint(0,100,dates.count())
df = pd.DataFrame({'date':dates, 'num':nums})
# offset days and then create period
df['periods'] = (df.date + pd.tseries.offsets.Day(23)).dt.to_period('M')
# group and sum
df.groupby('periods')['num'].sum()
Output
periods
2015-01    10051
2015-02    34229
2015-03    37311
2015-04    26655
You can then shift the dates back and make new columns.
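One possible reading of "shift the dates back" is sketched below: take the start and end of each calendar period and subtract the same 23-day offset, which gives the boundaries of each custom month in the original dates. The column names `start` and `end`, and the choice of `num = 1` per hour (so the sums are hour counts), are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data for Jan 1 through Mar 31, 2015, with num = 1 per hour.
dates = pd.Series(pd.date_range('2015-01-01 00:00', '2015-03-31 23:00',
                                freq=pd.Timedelta(hours=1)))
df = pd.DataFrame({'date': dates, 'num': np.ones(len(dates), dtype=int)})

# Same idea as above: push the dates forward, then bucket by calendar month.
offset = pd.Timedelta(days=23)
df['periods'] = (df['date'] + offset).dt.to_period('M')

summed = df.groupby('periods')['num'].sum().reset_index()

# Shift the period boundaries back by the same offset to recover
# the custom range that each period stands for.
summed['start'] = summed['periods'].dt.start_time - offset
summed['end'] = summed['periods'].dt.end_time - offset
print(summed)
```

With this data the period 2015-02 corresponds to Jan 9 through Feb 5 in the original dates. Note that a fixed day offset makes the custom months inherit the lengths of the calendar months, so this only fits cases where the real ranges follow that pattern.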