With time series data, we often need to resample to a different interval before feeding the data into an analytics model.
Pandas resample provides a built-in set of widely used aggregation methods. However, if the built-in methods are not sufficient, it is always possible to write a custom function and apply it to each resampled interval.
This post shows an example.
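Before the full example, here is a minimal sketch of the difference between a built-in method and a custom function. The readings and the helper name daily_range are made up for illustration; they are not part of the temperature example that follows.

```python
import pandas as pd

# Six made-up readings, one every 12 hours, covering three days
idx = pd.date_range("2016-10-01", periods=6, freq=pd.Timedelta(hours=12))
readings = pd.Series([10, 20, 30, 50, 40, 45], index=idx)

# Built-in aggregation: mean of each day's readings
daily_mean = readings.resample("D").mean()

# Custom aggregation: spread (max minus min) of each day's readings
def daily_range(day):
    return day.max() - day.min()

daily_spread = readings.resample("D").apply(daily_range)

print(daily_mean)
print(daily_spread)
```

The custom function receives each day's readings as a Series and returns a single value, so the result has one row per day, just like the built-in methods.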
Say we have a month's temperature data captured every hour. We shall calculate, for each day, how many of the hourly readings were above 40 degrees Celsius.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hourly timestamps (pandas 2.2+ prefers the lowercase alias 'h')
ts_index = pd.date_range(start='10/01/2016', end='10/30/2016', freq='H')
# Random integer temperatures from 20 to 59
temperature_series = pd.Series(np.random.randint(20, 60, len(ts_index)), index=ts_index)

# Built-in aggregation: daily mean
temperature_series_mean = temperature_series.resample('D').mean()

# Custom aggregation: number of hourly readings above 40 in each day
temperature_series_gt_40 = temperature_series.resample('D').apply(
    lambda day_array: (day_array > 40).sum())

# Set the figure size before creating the figure so that it takes effect
plt.rcParams['figure.figsize'] = 13, 7
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Day's mean shown as line chart
plt_mean = ax1.plot(temperature_series_mean, 'c-', label="Day's mean temperature")
# Day's count shown as dot chart
plt_gt40 = ax2.plot(temperature_series_gt_40, 'bo', label='Readings above 40')

ax1.set_xlabel('Date')
ax1.set_ylabel("Day's mean temperature")
ax2.set_ylabel('Number of readings above 40')
# Both plot calls return lists of lines, so the handles can be concatenated
plt.legend(handles=plt_mean + plt_gt40, loc='upper left')
plt.show()
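Because the data above is random, the daily counts cannot be checked by eye. A small sketch with fixed, made-up readings makes the custom count easy to verify by hand. The values and the helper name count_above are assumptions for illustration only.

```python
import pandas as pd

# Two days of hourly readings, built from a fixed one-hour step
idx = pd.date_range("2016-10-01", periods=48, freq=pd.Timedelta(hours=1))

# Day 1: 35 everywhere except three afternoon readings above 40
day1 = [35] * 24
day1[12], day1[13], day1[14] = 41, 45, 42
# Day 2: exactly 40 all day, which does not count as "above 40"
day2 = [40] * 24

temps = pd.Series(day1 + day2, index=idx)

def count_above(day, threshold):
    # Number of readings strictly greater than the threshold
    return int((day > threshold).sum())

counts = temps.resample("D").apply(lambda day: count_above(day, 40))
print(counts)
```

The comparison is strict, so a reading of exactly 40 is not counted. Use >= instead if readings at the threshold should be included.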