How To Highlight Weekends For Time Series Line Plot In Python
Solution 1:
You can easily highlight areas by using axvspan. To get the areas to be highlighted, you can run through the index of your dataframe and search for the weekend days. I've also added an example for highlighting 'occupied hours' during a working week (hopefully that doesn't confuse things).
I've created dummy data for a dataframe based on days and another one for hours.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# dummy data (Days)
dates_d = pd.date_range('2017-01-01', '2017-02-01', freq='D')
df = pd.DataFrame(np.random.randint(1, 20, (dates_d.shape[0], 1)))
df.index = dates_d
# dummy data (Hours)
dates_h = pd.date_range('2017-01-01', '2017-02-01', freq='h')  # lowercase 'h': the uppercase 'H' alias is deprecated since pandas 2.2
df_h = pd.DataFrame(np.random.randint(1, 20, (dates_h.shape[0], 1)))
df_h.index = dates_h
#two graphs
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
#plot lines
dfs = [df, df_h]
# use a separate loop variable so that df keeps referring to the daily frame
for i, frame in enumerate(dfs):
    for v in frame.columns.tolist():
        axes[i].plot(frame[v], label=v, color='black', alpha=.5)

def find_weekend_indices(datetime_array):
    indices = []
    for i in range(len(datetime_array)):
        if datetime_array[i].weekday() >= 5:
            indices.append(i)
    return indices

def find_occupied_hours(datetime_array):
    indices = []
    for i in range(len(datetime_array)):
        if datetime_array[i].weekday() < 5:
            if datetime_array[i].hour >= 7 and datetime_array[i].hour <= 19:
                indices.append(i)
    return indices

def highlight_datetimes(index, indices, ax):
    # Shade from each matched timestamp to the next timestamp in the index.
    # The last matched entry is skipped so that index[i + 1] always exists,
    # which leaves the final matched period unshaded.
    for i in indices[:-1]:
        ax.axvspan(index[i], index[i + 1], facecolor='green', edgecolor='none', alpha=.5)

#find to be highlighted areas, see functions
weekend_indices = find_weekend_indices(df.index)
occupied_indices = find_occupied_hours(df_h.index)
#highlight areas
highlight_datetimes(df.index, weekend_indices, axes[0])
highlight_datetimes(df_h.index, occupied_indices, axes[1])
#formatting..
axes[0].xaxis.grid(True, which='major', color='black', linestyle='--', alpha=1) #add xaxis gridlines
axes[1].xaxis.grid(True, which='major', color='black', linestyle='--', alpha=1) #add xaxis gridlines
axes[0].set_xlim(min(dates_d), max(dates_d))
axes[0].set_title('Weekend days', fontsize=10)
axes[1].set_title('Occupied hours', fontsize=10)
plt.show()
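As a possible refinement of the approach above, consecutive matched positions can be merged so that each weekend (or each block of occupied hours) is drawn as a single span instead of one span per sample. This is only a sketch: the helper names contiguous_runs and highlight_runs are hypothetical and not part of the original answer, and the weekend positions are found with a vectorized dayofweek test rather than a Python loop.

```python
import numpy as np
import pandas as pd


def contiguous_runs(positions):
    """Group sorted integer positions into (first, last) runs of consecutive values."""
    positions = np.asarray(positions)
    if positions.size == 0:
        return []
    # A new run starts wherever the gap to the previous position exceeds 1.
    breaks = np.flatnonzero(np.diff(positions) > 1)
    firsts = np.concatenate(([0], breaks + 1))
    lasts = np.concatenate((breaks, [positions.size - 1]))
    return [(int(positions[a]), int(positions[b])) for a, b in zip(firsts, lasts)]


def highlight_runs(index, positions, ax, **span_kwargs):
    """Draw one axvspan per run (hypothetical helper).

    Each span reaches the sample following the run when there is one;
    a run that ends on the last sample stops at that sample.
    """
    for first, last in contiguous_runs(positions):
        end = index[min(last + 1, len(index) - 1)]
        ax.axvspan(index[first], end, **span_kwargs)


# Same daily range as the dummy data above
idx = pd.date_range('2017-01-01', '2017-02-01', freq='D')
weekend_positions = np.flatnonzero(idx.dayofweek >= 5)  # Saturday=5, Sunday=6
runs = contiguous_runs(weekend_positions)
print(runs)  # [(0, 0), (6, 7), (13, 14), (20, 21), (27, 28)]
```

With the dataframes and axes from the example, a call such as highlight_runs(df.index, weekend_positions, axes[0], facecolor='green', edgecolor='none', alpha=.5) would then draw five spans for the daily data.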
Solution 2:
I tried using the code in the accepted answer but the way the indices are used, the last weekend in the time series does not get highlighted entirely, despite what the image currently shown suggests (this is noticeable mainly with a frequency of 6 hours or more). Also, it does not work if the frequency of the data is higher than daily. This is why I share here a solution that uses the x-axis units so that weekends (or any other recurring time period) can be highlighted without any problem related to the index.
This solution takes only 6 lines of code and it works with any frequency. In the example below, it highlights full weekend days which makes it more efficient than the accepted answer where small frequencies (e.g. 30 minutes) will produce many polygons to cover the whole weekend.
The x-axis limits are used to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. Then a weekends mask is computed and passed to the where argument of the fill_between plotting function. The masks are processed as right-exclusive so in this case, they must contain Mondays for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
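To make the mask logic concrete, here is a small sketch that builds the mask for a handful of day numbers outside of any plot. The function name weekend_mask is hypothetical, and the dates are chosen only for illustration. As far as I understand the fill_between documentation, a segment between x[i] and x[i+1] is filled only when both where[i] and where[i+1] are true, which is why Monday has to be part of the mask.

```python
import numpy as np
import matplotlib.dates as mdates


def weekend_mask(day_numbers):
    """True for Saturday, Sunday and Monday (Monday closes the span at 00:00)."""
    weekdays = np.array([dt.weekday() for dt in mdates.num2date(day_numbers)])
    return (weekdays >= 5) | (weekdays == 0)


# Friday 2017-01-06 through Tuesday 2017-01-10, converted to matplotlib date numbers
days = mdates.date2num(np.arange('2017-01-06', '2017-01-11', dtype='datetime64[D]'))
mask = weekend_mask(days)
print(mask.tolist())  # [False, True, True, True, False]
```

The two filled segments are Saturday to Sunday and Sunday to Monday, so the highlight runs from Saturday 00:00 to Monday 00:00.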
Note that contrary to axvspan, the fill_between function needs the y1 and y2 arguments. For some reason, using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. This issue is solved by running ax.set_ylim(*ax.get_ylim()) just after creating the plot.
import numpy as np                 # v 1.19.2
import pandas as pd                # v 1.1.3
import matplotlib.pyplot as plt    # v 3.3.2
import matplotlib.dates as mdates
# Create sample dataset
rng = np.random.default_rng(seed=1234) # random number generator
dti = pd.date_range('2017-01-01', '2017-05-15', freq='D')
counts = 5000 + np.cumsum(rng.integers(-1000, 1000, size=dti.size))
df = pd.DataFrame(dict(Counts=counts), index=dti)
# Draw pandas plot: x_compat=True converts the pandas x-axis units to matplotlib
# date units (not strictly necessary when using a daily frequency like here)
ax = df.plot(x_compat=True, figsize=(10, 5), legend=None, ylabel='Counts')
ax.set_ylim(*ax.get_ylim()) # reset y limits to display highlights without gaps

# Highlight weekends based on the x-axis units
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values

# Create appropriate ticks using matplotlib date tick locators and formatters
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonthday=np.arange(5, 31, step=7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('\n%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
# Additional formatting
ax.figure.autofmt_xdate(rotation=0, ha='center')
title = 'Daily count of trips with weekends highlighted from SAT 00:00 to MON 00:00'
ax.set_title(title, pad=20, fontsize=14);
As you can see, the weekends are always highlighted to the full extent, regardless of where the data starts and ends.
You can find more examples of this solution in the answers I have posted here and here.
Solution 3:
I have another suggestion to make in this regard, which takes inspiration from previous posts by other contributors. The code is as follows:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
rng = np.random.default_rng(seed=42) # random number generator
dti = pd.date_range('2021-08-01', '2021-08-31', freq='D')
counts = 5000 + np.cumsum(rng.integers(-1000, 1000, size=dti.size))
df = pd.DataFrame(dict(Counts=counts), index=dti)
weekends = [d for d in df.index if d.isoweekday() in [6,7]]
weekend_list = []
for weekendday in weekends:
    d1 = weekendday
    d2 = weekendday + datetime.timedelta(days=1)
    weekend_list.append((d1, d2))
weekend_df = pd.DataFrame(weekend_list)
sns.set()
fig, ax = plt.subplots(figsize=(15, 10), dpi=100)
df.plot(ax=ax)  # pass ax so pandas draws on this figure instead of opening a new one
plt.legend(bbox_to_anchor=(1.02, 0), loc="lower left", borderaxespad=0)
plt.ylabel("Counts")
plt.xlabel("Date of visit")
plt.xticks(rotation = 0)
plt.title("Daily counts of shop visits with weekends highlighted in green")
ax = plt.gca()
for d in weekend_df.index:
    print(weekend_df[0][d], weekend_df[1][d])
    ax.axvspan(weekend_df[0][d], weekend_df[1][d], facecolor="g", edgecolor="none", alpha=0.5)
ax.relim()
ax.autoscale_view()
plt.savefig("junk.png", dpi=100, bbox_inches='tight', pad_inches=0.2)
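If the data is sampled more often than once a day, the list comprehension above would yield one overlapping one-day span per sample, and the stacked alpha would darken the highlight. One way around this, sketched below under the assumption that the index is a DatetimeIndex, is to reduce the index to unique calendar days first. The helper name weekend_day_spans is hypothetical.

```python
import pandas as pd


def weekend_day_spans(index):
    """Return one (start, end) pair per weekend calendar day present in a DatetimeIndex."""
    days = index.normalize().unique()         # midnight of each distinct day
    weekend_days = days[days.dayofweek >= 5]  # Saturday=5, Sunday=6
    return [(day, day + pd.Timedelta(days=1)) for day in weekend_days]


# 6-hourly samples from Friday 2021-08-06 to Monday 2021-08-09
sub_daily = pd.date_range('2021-08-06', '2021-08-09', freq='6h')
spans = weekend_day_spans(sub_daily)
for start, end in spans:
    print(start, end)
```

Each pair can then be passed to ax.axvspan(start, end, ...) exactly as in the loop above, giving one span per weekend day regardless of the sampling frequency.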