Further improvements to splitting module #187
Comments
I have worked on the first two points and have a finished commit here: maxloeffler@a18a932. As we discussed, I will not open a pull request just yet, so as not to interfere with #244. I will look into the corner cases (point three) from now on :)
Thanks! I had a quick look at it, and your commit looks good to me (given that the functionality did not change, which we will hopefully see in the tests; did you already check whether the tests for the different kinds of splitting also test for the bins attribute?).
I manually stepped through the code and checked that the corner cases we discovered in my last PR do not apply to activity-based network splitting as they did to activity-based data splitting. The main reason is that the There was also an issue that was caused by the
Also, about your last question regarding tests: in the last PR, I added tests that check the validity of the
Here, we collect further ideas from PR #184 on how to improve the splitting module. Thanks to @clhunsen for your suggestions!
- A function to extract the bins (`bins.date`) from ranges: `construct.bin.labels.from.ranges`, `get.bin.dates.from.ranges`
- [edited on 2024-02-29] Set the `bins` attribute in `split.network.time.based.by.ranges` and `split.networks.by.bins`. To be able to do that, we need functions like `construct.bin.labels.from.ranges` (see above) to extract the bins from the ranges. (See also the current occurrences of `attr(nets, "bins") = bins.date`.) After adding this attribute in the mentioned functions, we do not need to set this attribute in the function `split.networks.time.based` anymore.
- If we construct ranges in `split.networks.time.based` after Line 660 and after Line 668 (both using `construct.ranges(bins.date, sliding.window = ...)`), we could omit the if-else cascade in Lines 682ff. This would decrease the complexity of the functionality below.

coronet/util-split.R, Lines 660 to 691 in 78e99a1:

```r
    if (sliding.window) {
        ranges = construct.overlapping.ranges(start = min(dates), end = max(dates),
                                              time.period = time.period, overlap = 0.5, raw = FALSE,
                                              include.end.date = TRUE)
        bins.info = construct.overlapping.ranges(start = min(dates), end = max(dates),
                                                 time.period = time.period, overlap = 0.5, raw = TRUE,
                                                 include.end.date = TRUE)
        bins.date = sort(unname(unique(get.date.from.unix.timestamp(unlist(bins.info)))))
    } else {
        bins.info = split.get.bins.time.based(dates, time.period, number.windows)
        bins.date = get.date.from.string(bins.info[["bins"]])
    }
} else {
    ## specific bins are given, do not use sliding windows
    sliding.window = FALSE
    ## set the bins to use
    bins.date = bins
}

## split all networks to the extracted bins
networks.split = lapply(networks, function(net) {
    if (sliding.window) {
        nets = split.network.time.based.by.ranges(network = net, ranges = ranges,
                                                  remove.isolates = remove.isolates)
        attr(nets, "bins") = bins.date
    } else {
        nets = split.network.time.based(network = net, bins = bins.date, sliding.window = sliding.window,
                                        remove.isolates = remove.isolates)
    }
    return(nets)
})
```
(We don't want to implement the fourth suggestion, since it would add one additional, but unnecessary, indirection: `split.network.time.based.by.ranges` would in turn call `split.network.time.based`, and this would be one more indirection in the `else` case than what the current implementation does. Therefore, I have crossed out the last suggestion.) [edited on 2023-11-15]

All three suggestions should be easy to implement and should, therefore, easily improve the code of the splitting module.