-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resample beliefs about instantaneous sensors #118
Changes from 8 commits
55341e8
7f13512
53a78dc
4f5de41
13f1fbc
66ee209
a904765
f3feabc
167f007
3537784
707ea63
e262dd8
15f6429
d1bbb5b
cc00933
9330d16
b951c78
7bf2217
5395a6d
3bd3d35
1d98721
b3a70b9
bb703f9
5621506
9dec4da
ee00e40
23353ea
ed18518
f943d06
a4fca7e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -283,7 +283,9 @@ def join_beliefs( | |
if output_resolution > input_resolution: | ||
|
||
# Create new BeliefsDataFrame with downsampled event_start | ||
if output_resolution % input_resolution != timedelta(0): | ||
if input_resolution == timedelta( | ||
0 | ||
) or output_resolution % input_resolution != timedelta(0): | ||
raise NotImplementedError( | ||
"Cannot downsample from resolution %s to %s." | ||
% (input_resolution, output_resolution) | ||
|
@@ -749,3 +751,21 @@ def extreme_timedeltas_not_equal( | |
if isinstance(td_a, pd.Timedelta): | ||
td_a = td_a.to_pytimedelta() | ||
return td_a != td_b | ||
|
||
|
||
def downsample_first(df: pd.DataFrame, resolution: timedelta) -> pd.DataFrame: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function name was zero helpful to me. Comment also didn't help a lot. Is this downsampling, and in case of non-obvious cases opting for the first event? Also, handling DST correctly is so crucial, we could even consider it in to he name. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for raising this issue. For me it boils down to a more fundamental question: does resampling affect the data frequency or the event resolution? Hopefully these examples illustrate the question: Let's say there is a sensor recording the average wind speed over a period of 3 seconds. That is, its event resolution is 3 seconds. And let's say a record is made every second, so we get a sliding window of measurements. That is, its data frequency is 1 second. (To be explicit, I'm following pandas terminology here, which measures frequency in units of time, say, Now we want to resample and save the results to a new sensor that records the maximum 3-second wind gust in a 10 minute period, for every sequential period of 10 minutes (this is actually a useful measure, e.g. reported by KNMI). That is, the new sensor has an event resolution of 10 minutes, and also a data frequency of 10 minutes. This resampling operation is downsampling, with both the event resolution and the data frequency being downsampled to 10 minutes. Something like Now consider a sensor recording instantaneous temperature readings, once per hour. That is, its event resolution is 0 hours and its data frequency is 1 hour. What would it mean to resample to 30 minutes? I see two possibilities:
For example, given temperature readings:
With linear interpolation, option 1 yields:
With linear interpolation, option 2 yields:
For now, I'm gravitating towards a policy like "in timely-beliefs, event resampling, by default, updates both the data frequency and the event resolution for non-instantaneous sensors, and only the data frequency for instantaneous sensors." But I'd like to support updating the event resolution for instantaneous sensors, too, maybe only when explicitly requested. For example, consider instantaneous battery SoC measurements every 5 minutes. When resampled to a day, one might not really be interested to know the SoC at midnight every day ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand the distinction now (between treating data frequency and event resolution) W.r.t. your last paragraph,
Finally, you made no suggestion for a better function name yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I renamed the function.
I now made the util function accept different resampling methods, and restricted its use in
I opened #123 for this. |
||
"""Resample data representing instantaneous events. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This assumption (instantaneous events) is never checked. Or that the index is even a DateTimeIndex There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you reply on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm more leaning towards making this a private function. Would that take away your concern? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not quite. If it matters so much, why not add an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I opened #124 as a follow-up. |
||
|
||
Updates the data frequency, while keeping the event resolution. | ||
|
||
Note that the data frequency may not be constant due to DST transitions | ||
The duration between observations is longer for the fall DST transition, | ||
and shorter for the spring DST transition. | ||
""" | ||
ds_index = df.index.floor( | ||
resolution, ambiguous=[True] * len(df), nonexistent="shift_forward" | ||
Flix6x marked this conversation as resolved.
Show resolved
Hide resolved
|
||
).drop_duplicates() | ||
ds_df = df[df.index.isin(df.index.join(ds_index, how="inner"))] | ||
if ds_df.index.freq is None and len(ds_df) > 2: | ||
ds_df.index.freq = pd.infer_freq(ds_df.index) | ||
return ds_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't explain why your new function only applies to instantaneous sensors. Can you add that?
And why is that the one case in which we take so much care of DST transitions? (this
resample_events
function here has two other cases)Note: The NB about
keep_only_most_recent_belief=True
applies to only one case, the reader might assume it's about all of them.