Thanks for raising this issue. For me it boils down to a more fundamental question: does resampling affect the data frequency or the event resolution? Hopefully these examples illustrate the question:
Let's say there is a sensor recording the average wind speed over a period of 3 seconds. That is, its event resolution is 3 seconds. And let's say a record is made every second, so we get a sliding window of measurements. That is, its data frequency is 1 second. (To be explicit, I'm following pandas terminology here, which measures frequency in units of time, say, `s` rather than `1/s`.)
Now we want to resample and save the results to a new sensor that records the maximum 3-second wind gust in a 10-minute period, for every sequential period of 10 minutes (this is actually a useful measure, e.g. reported by KNMI). That is, the new sensor has an event resolution of 10 minutes, and also a data frequency of 10 minutes.
This resampling operation is downsampling, with both the event resolution and the data frequency being downsampled to 10 minutes. Something like `.resample(pd.Timedelta("PT10M")).max()` should do the trick to transform the data, and the event resolution would be updated to 10 minutes.
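A minimal sketch of this downsampling step with pandas (the wind data here is synthetic, generated just for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sensor data: 3-second average wind speeds (m/s), recorded
# every second as a sliding window, covering 20 minutes.
index = pd.date_range("2024-01-01 00:00", periods=20 * 60, freq="1s")
rng = np.random.default_rng(42)
wind = pd.Series(rng.uniform(0, 25, size=len(index)), index=index)

# Downsample to the maximum 3-second gust per sequential 10-minute period.
# Both the data frequency and the event resolution become 10 minutes.
gusts = wind.resample(pd.Timedelta("PT10M")).max()
print(gusts)  # one value per 10-minute bin
```

Note that `pd.Timedelta` accepts ISO 8601 duration strings like `"PT10M"`, so the resampling frequency can be expressed in the same notation the sensor metadata might use.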
Now consider a sensor recording instantaneous temperature readings, once per hour. That is, its event resolution is 0 hours and its data frequency is 1 hour. What would it mean to resample to 30 minutes? I see two possibilities:
1. We measure instantaneous temperature every 30 minutes. The event resolution stays 0 hours and the data frequency becomes 30 minutes. Something like `.resample(pd.Timedelta("PT30M")).ffill()` (or `.interpolate()`) would do the trick to transform the data. The event resolution would not change. We are upsampling (with regard to the data frequency).
2. We measure average temperatures over a period of 30 minutes. Some combination of `.resample().interpolate().rolling().mean()` should do the trick to transform the data. The event resolution would be updated to 30 minutes. We are upsampling (with regard to the data frequency) and downsampling (with regard to the event resolution).
For example, given temperature readings:

- 10 °C at 1 PM
- 12 °C at 2 PM

With linear interpolation, option 1 yields:

- 10 °C at 1 PM
- 11 °C at 1:30 PM
- 12 °C at 2 PM

With linear interpolation, option 2 yields:

- 10.5 °C average between 1 PM and 1:30 PM
- 11.5 °C average between 1:30 PM and 2 PM
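Both options can be reproduced on these exact readings. For option 2, averaging each pair of consecutive interpolated instants (the trapezoidal rule) is one possible reading of the `.resample().interpolate().rolling().mean()` sketch:

```python
import pandas as pd

# The two hourly instantaneous readings from the example above.
s = pd.Series(
    [10.0, 12.0],
    index=pd.to_datetime(["2024-01-01 13:00", "2024-01-01 14:00"]),
)

# Option 1: instantaneous readings every 30 minutes
# (event resolution stays 0 hours).
option1 = s.resample("30min").interpolate()
# 13:00 -> 10.0, 13:30 -> 11.0, 14:00 -> 12.0

# Option 2: 30-minute averages (event resolution becomes 30 minutes).
# Average each pair of consecutive interpolated instants, labelling each
# average by the start of its 30-minute period.
option2 = option1.rolling(2).mean().shift(-1).dropna()
# 13:00 -> 10.5 (average over 13:00-13:30), 13:30 -> 11.5
```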
For now, I'm gravitating towards a policy like "in timely-beliefs, event resampling, by default, updates both the data frequency and the event resolution for non-instantaneous sensors, and only the data frequency for instantaneous sensors." But I'd like to support updating the event resolution for instantaneous sensors, too, maybe only when explicitly requested. For example, consider instantaneous battery SoC measurements every 5 minutes. When resampled to a day, one might not really be interested in the SoC at midnight every day (`.first()`, keeping the original zero event resolution), but rather in something like the daily min or max SoC (`.min()`/`.max()`, updating the event resolution to 1 day).
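A sketch of both aggregation choices for such an instantaneous SoC sensor (the SoC curve below is synthetic, one sine cycle per day, just to make the daily extremes easy to check):

```python
import numpy as np
import pandas as pd

# Hypothetical instantaneous SoC readings (%) every 5 minutes over two days:
# 288 five-minute periods per day, oscillating between 10% and 90%.
index = pd.date_range("2024-01-01", periods=2 * 288, freq="5min")
soc = pd.Series(
    50 + 40 * np.sin(np.arange(len(index)) / 288 * 2 * np.pi),
    index=index,
)

# Default for an instantaneous sensor: keep the zero event resolution,
# e.g. report the SoC at the start of each day.
daily_first = soc.resample("1D").first()

# Explicitly requested alternative: the daily max SoC, which updates the
# event resolution to 1 day.
daily_max = soc.resample("1D").max()
```

Here `daily_first` still describes instants (midnight snapshots), while `daily_max` describes whole days, which is exactly why the two cases call for different event resolutions.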
Originally posted by @Flix6x in #118 (comment)