Added Filled in Column in End Of Season Dialog. #9214

Open
wants to merge 16 commits into
base: master

Conversation

MeSophie
Contributor

Fixes #9117
@rdstern @lilyclements I made some modifications to the End of Rains/Season dialog. Please have a look.

Collaborator

@rdstern rdstern left a comment

@MeSophie this is looking great. There is still one improvement to make (you may not have included that element yet?) and one question for you and @lilyclements
a) The status is still not adjusted for missing values. I used dodoma and made row 838 (17 April 1937) as missing. Then the results on the end of the rains (14 April) are given and end of the season is correctly missing. The status variable should then be NA, but it is still TRUE.
b) The question, is when you give the filled in column. Unless @lilyclements would like it always, I suggest you only give it when some years are FALSE in the status. That's when there IS something to be filled in. If there is nothing to be added - so no years are FALSE, then you don't add the filled variable.
c) Just to confirm when you do the status correctly, then the Filled in is only for status ==FALSE. If status == NA, then the filled in is also NA.

@lilyclements
Contributor

lilyclements commented Oct 29, 2024

@rdstern what parameters have you set here? I am unable to reproduce this on my end at the moment.

Just to confirm when you do the status correctly, then the Filled in is only for status ==FALSE. If status == NA, then the filled in is also NA.

Agreed. If the status is NA then that is due to missing data, so the filled would by default be NA. I can understand why this would not be happening at the moment (due to my code, of course), so a reproducible answer would help me to fix this.

@rdstern
Collaborator

rdstern commented Oct 29, 2024

@lilyclements and @MeSophie I used dodoma - as usual.
a) End of the rains from 1 Feb to end April. That works fine, but TRUE in status (correctly) each time.
b) End of season from end of rains to 15 May. That also gives TRUE each time (correctly) and also gives the filled variable after the status variable. Can't tell if correct, because no FALSE in end of season - that is the instance when I suggest we don't need the extra variable.
c) Changed the end date to end April (day 121) and ran again. Left the names as before. Now sometimes it is FALSE and I confirmed that the extra filled variable correctly picks up the 121.
d) Then made missing day as reported above 17 April 1937. So after end rains and before end season in that year. Confirmed the end season day and date now missing, but status remains TRUE - it should also become missing.

@lilyclements
Contributor

@rdstern perfect thank you. I can see the issue now. We did a similar change in start of rains.

@MeSophie can you amend the function_exp in end_season_status to be:

end_season_status <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, ifelse(dplyr::first(is.na(wb)), NA, TRUE), FALSE)", result_name="end_season_status", save=2)

@rdstern
Collaborator

rdstern commented Oct 31, 2024

@MeSophie please tell me when I can test again?

@MeSophie
Contributor Author

@rdstern I made the change according to @lilyclements code.
For this part, I'm still investigating how to implement it.
image

Moreover, there's this error that suddenly appeared on line two of the code. I don't know if @lilyclements can help to solve it. Even if we right-click to convert a variable to a factor manually, we get the same error.
data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")

image

@rdstern
Collaborator

rdstern commented Oct 31, 2024

@MeSophie yes, I have recently got that error. I think it has just been fixed (a few minutes ago), so can you update your branch to use the latest merged version? Then that should be ok, and maybe it solves the other problem too?

@MeSophie
Contributor Author

@rdstern I updated the branch and it works fine now. You can already test the change made to end_season_status.

@lilyclements
Contributor

b) The question, is when you give the filled in column. Unless @lilyclements would like it always, I suggest you only give it when some years are FALSE in the status. That's when there IS something to be filled in. If there is nothing to be added - so no years are FALSE, then you don't add the filled variable.

@rdstern from the way the calculation system is set up, we create the filled column at the same time that we create the status column.
This means that we cannot check if there are FALSE's in the status column in advance of creating the filled column as easily. (Unless I set up an if statement after running the creation of the status column, but then I don't know how we easily run that if statement in VB; Alternatively, I could create and then delete the filled column if they have the same values, but I don't like this as it can run unnecessary code!)

I believe that the easiest way to handle this is to have a checkbox option to include a filled column (I suppose this way the user can then rename it too if they have multiple of them, like the day of year). What are your thoughts on the addition of the filled column being a checkbox?

@rdstern
Collaborator

rdstern commented Oct 31, 2024

@MeSophie and @lilyclements very happy to have a checkbox. I suggest unchecked by default.

@rdstern
Collaborator

rdstern commented Nov 1, 2024

@MeSophie I did my usual sequence, so end of rains up to end April, and then end season from end of rains till end April. Now I get this error.

image

@MeSophie
Contributor Author

MeSophie commented Nov 1, 2024

@lilyclements Could you please help me investigate this error? It is caused by the line
data_book$run_instat_calculation(display=FALSE, param_list=list(drop=FALSE), calc=end_of_season_combined) I don't know what the problem may be. Thank you.

@lilyclements
Contributor

@MeSophie it is coming from having the doy attached to the end of rains column. There's a merging issue, but it is now fixed. If you call in PR 271 then it should be fixed.

Allowing checks and conversions if join type differs between two data frames
@MeSophie
Contributor Author

MeSophie commented Nov 4, 2024

@rdstern the problem is now fixed. Could you please test the dialog again?
Thank you @lilyclements.

@rdstern
Collaborator

rdstern commented Nov 4, 2024

@MeSophie You now give the status as NA, when it should be FALSE. here is my example:

Here there were many missing end of season, because it didn't happen by 30 April.

I added a missing value in the data in 1937, after the end of the rains. There the NA in the status is correct. (That's how we distinguish between the missing data and the censored data.)

Uploading image.png…

Can you also add the checkbox to Include Filled Data?

@lilyclements
Contributor

I added a missing value in the data in 1937, after the end of the rains. There the NA in the status is correct. (That's how we distinguish between the missing data and the censored data.)

@rdstern where should the status be FALSE not NA? The image is not loading for me.

I cannot open R-Instat at the moment given the most recent development version (it crashes, and shuts itself). If you could share your entire R script then that would help me out as I would then be able to debug in R!

@rdstern
Collaborator

rdstern commented Nov 8, 2024

@lilyclements and @MeSophie I thought we agreed, like the start of the rains, that FALSE is where there is no end of the season by the defined end date?
So there are 3 possibilities (say the end limit is day 121), just like the start:
a) The end day is on or before 121, then the day is given and the status is TRUE.
b) No end day by 121. Then the end day is missing and the status is FALSE
c) Missing value in the data, before the end day. Then the end day is missing and the status is missing.
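
The three cases can be sketched in plain R as follows. This is only an illustration: `classify_end_season`, its arguments and its default values are assumptions made for the example (the 0.5 threshold mirrors the `wb <= 0.5` filter in the dialog's script), and the dialog itself computes the status through `instat_calculation` rather than through a function like this.

```r
# Hypothetical helper, not part of R-Instat: classify one year's end of season.
# doy: day-of-year values in ascending order
# wb:  water balance on those days (NA where it cannot be determined)
classify_end_season <- function(doy, wb, threshold = 0.5, limit = 121) {
  in_window <- doy <= limit
  doy <- doy[in_window]
  wb <- wb[in_window]

  # Candidate days: the balance is at or below the threshold, or it is unknown.
  candidate <- which(is.na(wb) | wb <= threshold)

  if (length(candidate) == 0) {
    # Case b: nothing found by the limit, so the day is missing and status is FALSE.
    return(list(day = NA_integer_, status = FALSE))
  }
  first <- candidate[1]
  if (is.na(wb[first])) {
    # Case c: a missing value comes first, so both day and status are missing.
    return(list(day = NA_integer_, status = NA))
  }
  # Case a: the season ended on or before the limit.
  list(day = doy[first], status = TRUE)
}

days <- 100:121
case_a <- classify_end_season(days, c(rep(20, 10), 0.3, rep(0, 11)))
case_b <- classify_end_season(days, rep(20, 22))
case_c <- classify_end_season(days, c(rep(20, 5), NA, rep(0, 16)))
```

Returning the day and the status together keeps the two consistent: the day is only non-missing when the status is TRUE.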

@lilyclements
Contributor

@rdstern thanks - I was running the sequence incorrectly and so it was working for me. Very confused. But, I'm able to replicate the error now. I think Sophie just missed a message before saying it can be retested.

@MeSophie as stated here, can you amend the function_exp in end_season_status to be:

end_season_status <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, ifelse(dplyr::first(is.na(wb)), NA, TRUE), FALSE)", result_name="end_season_status", save=2)

@MeSophie
Contributor Author

MeSophie commented Nov 8, 2024

@lilyclements Sorry, I can't see the difference between the current end_season_status and the one you sent. It's the same code as the one you sent last week and it's already implemented. It's the one that @rdstern says doesn't work.

Dialog: End of Rains/Season

year_type <- data_book$get_column_data_types(data_name="dodoma", columns="year")

data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")
rain_min <- instat_calculation$new(type="calculation", function_exp="ifelse(test=is.na(x=rain), yes=0, no=rain)", result_name="rain_min", calculated_from=list("dodoma"="rain"))
wb_min <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_min - 5, n=-1), .init=0)", result_name="wb_min", sub_calculations=list(rain_min))
rain_max <- instat_calculation$new(type="calculation", function_exp="ifelse(yes=100, test=is.na(x=rain), no=rain)", result_name="rain_max", calculated_from=list("dodoma"="rain"))
wb_max <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_max - 5, n=-1), .init=0)", result_name="wb_max", sub_calculations=list(rain_max))
wb <- instat_calculation$new(type="calculation", function_exp="ifelse(test=(wb_min != wb_max) | is.na(x=rain), yes=NA, no=wb_min)", result_name="wb", sub_calculations=list(wb_min, wb_max))
conditions_filter <- instat_calculation$new(type="filter", function_exp="(wb <= 0.5) | is.na(x=rain)", sub_calculations=list(wb))
grouping_by_station_year <- instat_calculation$new(type="by", calculated_from=list("dodoma"="year"))
doy_filter <- instat_calculation$new(type="filter", function_exp="doy_366 >= 1 & doy_366 <= 366", calculated_from=calc_from_convert(x=list(dodoma="doy_366")))
end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)
end_season_date <- instat_calculation$new(type="summary", function_exp="dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date))", result_name="end_season_date", save=2)
end_season_status <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, ifelse(dplyr::first(is.na(wb)), NA, TRUE), FALSE)", result_name="end_season_status", save=2)
end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_date, end_season_status))
data_book$run_instat_calculation(display=FALSE, param_list=list(drop=FALSE), calc=end_of_season_combined)
linked_data_name <- data_book$get_linked_to_data_name("dodoma", link_cols=c("year"))

data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type=year_type)
data_book$convert_column_to_type(data_name=linked_data_name, col_names="year", to_type=year_type)
rm(list=c("end_of_season_combined", "conditions_filter", "wb", "wb_min", "rain_min", "wb_max", "rain_max", "grouping_by_station_year", "doy_filter", "end_season", "end_season_date", "end_season_status", "year_type", "linked_data_name"))

@MeSophie
Contributor Author

MeSophie commented Nov 8, 2024

@lilyclements Could you please also provide the code for this column (I assume it is a new column as end_season_filled )? Thank you.

@lilyclements
Contributor

@MeSophie thanks. It wasn’t running the full end_season_status for me but if it’s working now then great.

the filled column code is exactly what you’re currently running with your end_season_filled, this just needs to be attached to a checkbox now (and the instat_calculation$new updated)

@lilyclements
Contributor

@rdstern I am trying to understand what I am misinterpreting so I can fix the code.

a) End of the rains from 1 Feb to end April. That works fine, but TRUE in status (correctly) each time.
b) No end day by 121. Then the end day is missing and the status is FALSE
c) Missing value in the data, before the end day. Then the end day is missing and the status is missing.

In the below example, I set our row 838 to NA
I then look at End of Season for DOY 105 to 121.

  • We can see no data for 1936, so I would put this into group (b).
  • We can see only missing data for 1937, so I would put this into group (c) since we have "Missing value in the data before the end day". So, I've been assuming then that our end day is missing so the status should be missing.

image

The resulting output from the table above:
image

@lilyclements
Contributor

@MeSophie what does this new checkbox in the length dialog do? Thanks.

@rdstern
Collaborator

rdstern commented Nov 11, 2024

@lilyclements and @MeSophie your a), b) and c) above are perfect. But when you look at my output above, 1936 should be FALSE, as it is for you. You can see from my figure above (where there were no missing values) that 1936 was NA and there were no instances of FALSE in my record, though there were years, without missing values, where the season didn't finish by day 121.

@lilyclements
Contributor

Good puzzle - finally solved! We need to convert year in dodoma_by_year into a factor as well as year in dodoma. The R code was all fine, but it wasn't working since it was dropping levels in dodoma_by_year$year!

data_book$convert_column_to_type(data_name="dodoma_by_year", col_names="year", to_type="factor")
data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")

This means we have to
(a) Check if a link already exists
(b) Check the name of the linking variable

So we want to run something like this:

data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")

linked_data_name <- data_book$get_linked_to_data_name("dodoma", link_cols=c("year"))
linked_variable_name <- data_book$get_link_between("dodoma", linked_data_name )$link_columns
data_book$convert_column_to_type(data_name=linked_data_name, col_names=linked_variable_name, to_type="factor")

But then, what if there hasn't been a linked data frame created yet? We can't really do an if statement in the VB output code.

So perhaps I should write a function which:

(a) checks existence of a link
(b) gets the name of the linked variable in the new df (dodoma_by_year, here)
(c) converts the column to be the same type as its linked column (so, here, takes dodoma_by_year$year, and sets it as a factor, because dodoma$year is set as a factor in our R code).
(d) If there is no link, it does nothing

Thoughts?

# preliminary function, but needs testing -- e.g., looping through multiple variables
convert_linked_variable <- function(data, variables){
  linked_data_name <- data_book$get_linked_to_data_name(data, link_cols=c(variables))
  if (!is.null(linked_data_name)){
    linked_variable_name <- data_book$get_link_between(data, linked_data_name)$link_columns

    # TODO: loop through all columns given in variable argument
    # NOTE: variable_type is not used yet; the conversion below is hard-coded to "factor"
    variable_type <- data_book$get_column_data_types(data_name = data, columns=variables)
    data_book$convert_column_to_type(data_name=linked_data_name, col_names=linked_variable_name, to_type="factor")
  }
}
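
As a rough base R illustration of why the factor type matters for years with no qualifying rows, the sketch below compares a numeric and a factor year after filtering. The data are made up, and this is only an analogue: the dialog relies on the calculation system with `drop=FALSE`, not on `table()`.

```r
# Made-up daily records; 1936 never meets the wb <= 0.5 condition.
daily <- data.frame(
  year = c(1935, 1935, 1936, 1937),
  wb   = c(0.2, 0, 40, 0.4)
)

# Numeric year: after filtering, 1936 is simply absent from the summary.
kept_numeric <- daily[daily$wb <= 0.5, ]
counts_numeric <- table(kept_numeric$year)

# Factor year, converted before filtering: the level 1936 survives with count 0.
daily$year_f <- factor(daily$year)
kept_factor <- daily[daily$wb <= 0.5, ]
counts_factor <- table(kept_factor$year_f)

# A zero count is what lets the status be FALSE rather than a missing row.
status <- as.vector(counts_factor) > 0
names(status) <- names(counts_factor)
print(status)
```

If the year in the linked summary data frame is not a factor with the same levels, the empty year has nowhere to land when the results are merged, which is consistent with the NA seen in place of FALSE.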

@MeSophie
Contributor Author

@MeSophie what does this new checkbox in the length dialog do? Thanks.

@lilyclements The checkbox is for the Filled data.

@lilyclements
Contributor

@MeSophie OK, and what does it do? Does it fill in the data - if so, how do we know the end date? Etc.

@MeSophie
Contributor Author

@lilyclements I tested the new change with our dodoma data and it works well, but when I change to the Ghana data I get these notifications, although the code still produces the result.

image

image

year_type <- data_book$get_column_data_types(data_name="ghana", columns="year")

data_book$convert_column_to_type(data_name="ghana", col_names="year", to_type="factor")
linked_data_name <- data_book$get_linked_to_data_name("ghana", link_cols=c("year", "station"))

linked_variable_name <- data_book$get_link_between("ghana", linked_data_name)$link_columns

data_book$convert_column_to_type(data_name=linked_data_name, col_names=linked_variable_name, to_type="factor")
rain_min <- instat_calculation$new(type="calculation", function_exp="ifelse(test=is.na(x=rainfall), yes=0, no=rainfall)", result_name="rain_min", calculated_from=list("ghana"="rainfall"))
wb_min <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_min - 5, n=-1), .init=0)", result_name="wb_min", sub_calculations=list(rain_min))
rain_max <- instat_calculation$new(type="calculation", function_exp="ifelse(yes=100, test=is.na(x=rainfall), no=rainfall)", result_name="rain_max", calculated_from=list("ghana"="rainfall"))
wb_max <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_max - 5, n=-1), .init=0)", result_name="wb_max", sub_calculations=list(rain_max))
wb <- instat_calculation$new(type="calculation", function_exp="ifelse(test=(wb_min != wb_max) | is.na(x=rainfall), yes=NA, no=wb_min)", result_name="wb", sub_calculations=list(wb_min, wb_max))
conditions_filter <- instat_calculation$new(type="filter", function_exp="(wb <= 0.5) | is.na(x=rainfall)", sub_calculations=list(wb))
grouping_by_station_year <- instat_calculation$new(type="by", calculated_from=list("ghana"="station","ghana"="year"))
doy_filter <- instat_calculation$new(type="filter", function_exp="doy >= 1 & doy <= 121", calculated_from=calc_from_convert(x=list(ghana="doy")))
end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy))", result_name="end_season", calculated_from=list("ghana"="doy"), save=2)
end_season_date <- instat_calculation$new(type="summary", function_exp="dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date))", result_name="end_season_date", save=2)
end_season_status <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, ifelse(dplyr::first(is.na(wb)), NA, TRUE), FALSE)", result_name="end_season_status", save=2)
end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_date, end_season_status))
data_book$run_instat_calculation(display=FALSE, param_list=list(drop=FALSE), calc=end_of_season_combined)
linked_data_name <- data_book$get_linked_to_data_name("ghana", link_cols=c("year", "station"))

data_book$convert_column_to_type(data_name="ghana", col_names="year", to_type=year_type)
data_book$convert_column_to_type(data_name=linked_data_name, col_names="year", to_type=year_type)
rm(list=c("end_of_season_combined", "conditions_filter", "wb", "wb_min", "rain_min", "wb_max", "rain_max", "grouping_by_station_year", "doy_filter", "end_season", "end_season_date", "end_season_status", "year_type", "linked_data_name", "linked_variable_name"))

@MeSophie
Contributor Author

@MeSophie OK, and what does it do? Does it fill in the data - if so, how do we know the end date? Etc.

@lilyclements I think @rdstern is best placed to answer.

@rdstern
Collaborator

rdstern commented Nov 11, 2024

@MeSophie and @lilyclements so far great. It works perfectly - so far - for dodoma! Let me continue. There is now a nice file for Zambia with 5 stations. Let me try that.

@rdstern
Collaborator

rdstern commented Nov 12, 2024

@MeSophie and @lilyclements the problem is probably an old one I think you fixed in the start of the rains? Here is the error message with the end of the season at dodoma.

image

It isn't just a multiple stations problem, but is when there is a station variable in the command. So I added the station for dodoma and here it is.

I hope this helps.

@lilyclements
Contributor

lilyclements commented Nov 12, 2024

@MeSophie I can see this error has in it linked_variable_name, which is from a suggested function I wrote above.

I didn't mean for this to be read in yet - or to be used (yet). I would put it into a function myself, if we were happy with the suggestion I gave, and then you would just have to read in the convert_linked_variable function, when written.

Waiting for feedback on that until I create that function and go from there.

@rdstern
Collaborator

rdstern commented Nov 13, 2024

@MeSophie I hope the above is sufficient for you to make the corrections today? I'm really keen for this and the length to be included in the version at the end of this week.

@MeSophie
Contributor Author

@rdstern from my understanding of @lilyclements' comment, the error is coming from the linked_variable_name I added. So I deleted the function in stand_alone_function so that she can add it herself.

@lilyclements does this mean that I no longer need this line of code
linked_data_name <- data_book$get_linked_to_data_name("dodoma", link_cols=c("year"))
if we implement linked_data_name in the stand_alone_function?

… the same class as that of the from data frame
@lilyclements
Contributor

lilyclements commented Nov 13, 2024

@MeSophie, great point. Yes we wouldn't want to run that get_linked_to_data_name code anymore. I've made the changes to merge into this PR here. I am quite confident of this approach. I'm sorry for the confusion again, and for asking you to make another amendment on your side. It's looking great, and coming along really well.

You just need to run before the EoR/EoS code:

year_type <- data_book$get_column_data_types(data_name="ghana", columns="year")

data_book$convert_column_to_type(data_name="ghana", col_names="year", to_type="factor")

data_book$convert_linked_variable(from_data_frame = "ghana", link_cols=c("year", "station"))

Can you also change the bit after the EoR/EoS code from:

linked_data_name <- data_book$get_linked_to_data_name("ghana", link_cols=c("year", "station"))
data_book$convert_column_to_type(data_name="ghana", col_names="year", to_type=year_type)
data_book$convert_column_to_type(data_name=linked_data_name, col_names="year", to_type=year_type)

to:

data_book$convert_column_to_type(data_name="ghana", col_names = "year", to_type=year_type)
data_book$convert_linked_variable(from_data_frame = "ghana", link_cols=c("year", "station"))

Changing the "after" isn't as important, but, it would be nice for consistency.

I hope this is as straightforward as changing your get_linked_to_data_name function to convert_linked_variable, and removing its assignment.
(And having it run after we convert the year variable in the ghana data frame in both instances)

@MeSophie
Contributor Author

I'll get started straight away. Thank you @lilyclements.

Adding function in linking to convert variables in a to data frame to be the same class as that of the from data frame
@rdstern
Collaborator

rdstern commented Nov 13, 2024

@MeSophie and @lilyclements I get this error message

image

It then produces results, but the status is still NA when it should be FALSE.

I am sorry it is giving so much trouble!

@MeSophie
Contributor Author

MeSophie commented Nov 13, 2024

@rdstern I think the EOR/EOS dialog is okay now. Could you please test again?

image
@rdstern could you please also provide more information about this? Thank you. The question is about the filled checkbox to add on the Length of Season dialog.

@rdstern
Collaborator

rdstern commented Nov 14, 2024

@MeSophie and @lilyclements this is a big advance and I think is almost there now! I have been checking first with Dodoma, including an added station variable. Then with Zambia 5 stations, that is in the library. Dodoma first.
By the way I have only checked with just the end of the rains/season starting the dodoma, by station, year data frame. In practice there will usually be that data frame already, with the start, etc. We should check that situation next time.

I have never had an error message.

Here are the results for Dodoma:

image

Now just 2 small things to note.

a) First, the station linking variable is numeric, i.e. value 1 - it is labelled dodoma in the daily data.
b) I made a missing value in the data in 1938. This was after the end of the rains, and before the end of the season. The status variable is correct - it gives NA. (yippee!).
c) Second is that the filled variable should then also be NA. It should only be filled when the status is FALSE. (It is correctly filled for the instances of FALSE.)

Now the results for Zambia:

image

a) It works - no error messages.
b) The station variable goes from 1 to 5 for the stations - just as in Dodoma.
c) There are many missing values here, but the status (for both end rain and end season) is never NA, which it should be (and was for dodoma). So here there is just TRUE and FALSE in the status variables.

Thanks. Am I wrong, in thinking you guys are really close now?

@lilyclements
Contributor

lilyclements commented Nov 15, 2024

Comments

Linking Station Variable Type

a) First, the station linking variable is numeric, i.e. value 1 - it is labelled dodoma in the daily data.

b) The station variable goes from 1 to 5 for the stations - just as in Dodoma.

@rdstern this is now fixed (hopefully!)

End Season Filled - NAs

c) Second is that the filled variable should then also be NA. It should only be filled when the status is FALSE. (It is correctly filled for the instances of FALSE.)

@MeSophie apologies, can you change the end_season_filled to be:

end_season_filled <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, ifelse(dplyr::first(is.na(wb)), NA, dplyr::first(x=doy_366)), 121)", result_name="end_season_filled", calculated_from=list("dodoma"="doy_366"), save=2)

Zambia Data

@rdstern where can I find the Zambia data? I can only see Moorings in R-Instat.

I think I know what the issue might be! At least, I've found an issue to fix (which I think is what you're seeing in line 80):

  1. Take our Dodoma data and set row 838 to be NA before we do end of rains
  2. Run the End of Rains dialog
  3. We get for 1937 that the start day is NA, but the status is TRUE.

image

If we look at what's happening, we can see this is because we take the last instance of rainfall, and that happens to be an NA.
However, the penultimate is not NA.
Do we want to take that penultimate day (105) in this instance, or change the status?

image

@rdstern
Collaborator

rdstern commented Nov 15, 2024

@lilyclements in the latest version 0.8.0, which is now in the download website, it is added to Moorings in the climatic library.

I'll look at the other question when I am off the bus!

@rdstern
Collaborator

rdstern commented Nov 15, 2024

Your row 838 (17 April) is made missing. Then the end of the rains is correctly missing, because that may otherwise have been the last day up to 30 April. Any day missing up to 30 April should make the end of the rains missing.

And then status should also be NA.

@MeSophie
Contributor Author

@rdstern The end_season_filled is now fixed. Please have a look.

@rdstern
Collaborator

rdstern commented Nov 16, 2024

Here is the situation now:

image

By the way, I made 838 and 1180 missing, and the end of the rains copes well. 838 is after the end of the rains in 1937, so it doesn't know whether the date it found is the last. 1180, in 1938, is before the end of the rains, so it doesn't matter that it is missing, because there are 10mm after that date. Congratulations to both of you on the code, because that's pretty clever. And one reason I suggest a statistics package for summaries, rather than coding in a spreadsheet, is because they cope well with missing values.
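
The behaviour described for rows 838 and 1180 can be sketched in plain R. This is a simplified illustration under stated assumptions, not the dialog's code: `summarise_end_rains` is a hypothetical helper, and `rain_event` stands in for a per-day flag (TRUE, FALSE, or NA when it cannot be determined) that the rain condition is met, whereas the dialog works from `roll_sum_rain`.

```r
# Hypothetical helper, not part of R-Instat.
# doy:        day-of-year values in ascending order, within the search window
# rain_event: TRUE where the rain condition is met, FALSE where it is not,
#             NA where missing data make it impossible to tell
summarise_end_rains <- function(doy, rain_event) {
  known_true <- which(!is.na(rain_event) & rain_event)
  unknown <- which(is.na(rain_event))

  if (length(known_true) == 0) {
    # No event found: FALSE only if every day is known, otherwise missing.
    status <- if (length(unknown) > 0) NA else FALSE
    return(list(day = NA_integer_, status = status))
  }
  last_event <- max(known_true)
  if (any(unknown > last_event)) {
    # A missing day after the last known event could itself be the true end.
    return(list(day = NA_integer_, status = NA))
  }
  # Missing days before the last known event do not affect the answer.
  list(day = doy[last_event], status = TRUE)
}

days <- 92:121                       # 1 April to 30 April in a 366-day scheme
event <- rep(FALSE, length(days))
event[14] <- TRUE                    # last event on day 105

missing_after <- event
missing_after[17] <- NA              # like row 838: missing after the event
missing_before <- event
missing_before[5] <- NA              # like row 1180: missing before the event

res_after <- summarise_end_rains(days, missing_after)
res_before <- summarise_end_rains(days, missing_before)
```

Here `res_after` has a missing day and status, while `res_before` still reports day 105 with status TRUE.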

We need to write this all up and I like this example! I can't check with these events whether the end of the season having a missing value is catered for, because the end date is identical for the rains and the season. I'll make it a bit later to check that too!

So I have changed the end of the season last date to 5 May and also made day 122 (1 May) missing in 1936. Here are the new results, and they are "interesting"!

image

So they correctly have the end of the rains as ok, because the missing is in May. And they correctly have the end of the season as missing. So the code is coping brilliantly with the missing values in the day and the date variables. So our baseline is solid!

Now notice that in 1936 the season status is now correctly missing! Yippee. It is incorrect in 1937, because it should be missing whenever the day is missing because of a missing value. So 1936 and 1937 should both have the status as missing.

And the filled value now. In 1937 when the status is incorrectly FALSE, (should be NA) the filled value is correctly NA. In 1936, when the status is correctly NA, the filled value is incorrectly 126.

And the filled value, should of course, only be 126 when the status is FALSE. Otherwise it should be the end-season date!
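
The intended rule for the filled value can be sketched in plain R. `fill_end_day` and its arguments are hypothetical names for this example only; in the dialog the filled column is produced by an `instat_calculation` summary.

```r
# Hypothetical helper, not part of R-Instat.
# end_day: end-of-season day for each year (NA when none was found)
# status:  TRUE (found), FALSE (not reached by the limit), NA (missing data)
# limit:   last day of the search window, used to fill the FALSE years
fill_end_day <- function(end_day, status, limit) {
  filled <- end_day
  filled[!is.na(status) & !status] <- limit  # status FALSE: fill with the limit
  filled[is.na(status)] <- NA                # status NA: stays missing
  filled                                     # status TRUE: keeps the end day
}

filled <- fill_end_day(
  end_day = c(110, NA, NA),
  status  = c(TRUE, FALSE, NA),
  limit   = 126
)
```

With these inputs the three years come out as 110, 126 and NA respectively.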

@lilyclements
Contributor

lilyclements commented Nov 16, 2024

Problem 1:

Your row 838 (17 April) is made missing. Then the end of the rains is correctly missing, because that may otherwise have been the last day up to 30 April. Any day missing up to 30 April should make the end of the rains missing.
And then status should also be NA.

To fix this, we need to amend the end_rains_status for the end of rains side. @MeSophie can you fix the function_exp on the end_rains_status to be:

end_rains_status <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, yes = ifelse(is.na(x=dplyr::last(x=roll_sum_rain)), yes = NA, no = TRUE), no = FALSE)", result_name="end_rains_status", save=2)

Problem 2:

The issue with the end season filled values is a really simple one! It's just that the parameters are being given in the wrong order, so R is taking our "yes" and "no" the wrong way round in our "ifelse" statement, as we aren't naming the parameters. @MeSophie can you add yes and no into the ifelse statement for the end_season_filled:

end_season_filled <- instat_calculation$new(type="summary", function_exp="ifelse(n() > 0, yes = ifelse(dplyr::first(is.na(wb)), NA, dplyr::first(x=doy_366)), no = 121)", result_name="end_season_filled", calculated_from=list("dodoma"="doy_366"), save=2)
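
A small sketch of the point about argument order, using made-up values rather than the dialog's data: once `yes` and `no` are named, `ifelse` gives the same result whatever order the arguments arrive in, whereas a positional call silently changes meaning if they are swapped.

```r
# Illustrative values only (not taken from the dialog).
limit <- 121      # last day of the search window
first_day <- 110  # first qualifying day, when there is one
n_rows <- 0       # a year with no qualifying rows

# Positional call: correct only while the order is test, yes, no.
positional <- ifelse(n_rows > 0, first_day, limit)

# Named call: the same result even though the arguments are shuffled.
named <- ifelse(no = limit, test = n_rows > 0, yes = first_day)

# Positional call with yes and no swapped by mistake.
swapped <- ifelse(n_rows > 0, limit, first_day)

print(c(positional = positional, named = named, swapped = swapped))
```

Naming the arguments is a cheap safeguard when the R call is assembled by other code that may not preserve argument order.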

Everything else there is looking great! As I say, I'll get to the other issue shortly (Monday?) but this one caught my eye in the meantime as something to fix.

Development

Successfully merging this pull request may close these issues.

Small (or large) change in the start and end of the rains dialogs
3 participants