Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error or errors in Prepare > Numeric > Regular Sequence #7199

Closed
rdstern opened this issue Feb 4, 2022 · 28 comments · Fixed by #7231
Closed

Fix error or errors in Prepare > Numeric > Regular Sequence #7199

rdstern opened this issue Feb 4, 2022 · 28 comments · Fixed by #7231
Assignees
Milestone

Comments

@rdstern
Copy link
Collaborator

rdstern commented Feb 4, 2022

I first found this nasty bug with an odd data set, but I can now generate it quite easily. I am suggesting @Patowhiz to look reasonably generally at the code.
This is straight after starting R-Instat. I open the standard survey data and use the Regular Sequence dialogue. I put 40 as shown below, so the sequence needs to be truncated.

image

Damn - it works. On other occasions I have had an error that the resulting column must be divisible by the worksheet length.

No matter. I return to the dialogue and repeat - nothing changed, except R-Instat has made the name regular1. Now I get this nasty error:

image

I press Continue. Nothing has been generated into the output window.

Now when I return to the dialogue it says there is an invalid character in the name.
image

I have experimented a bit more. If I use the dialogue in an ordinary way, but put my own invalid names, then it traps this - which is good. But I then tried with 36 replaced by 15, so the sequence has to be extended - and got the nasty error again.

Tried again with multiple columns generated of right length. All ok. Then made 36 into 18 - so still a multiple, and got the nasty error immediately.

@shadrackkibet
Copy link
Collaborator

Assigning this @derekagorhom. He recently worked on this dialog, Derick could you first try to replicate the bug? then investigate the source of the bug - which line of code caused this? I would like you to report on your findings here.

@derekagorhom
Copy link
Contributor

Assigning this @derekagorhom. He recently worked on this dialog, Derick could you first try to replicate the bug? then investigate the source of the bug - which line of code caused this? I would like you to report on your findings here.

Thank you @shadrackkibet , i will look into it

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

@shadrackkibet thanks for assigning this to @derekagorhom. @derekagorhom I had already looked at this, I hope my findings below will save your time in fixing the bug.

@rdstern I agree that the 'Ok' button should not be enabled when the 'To' input is greater than the data frame length. That's a
check that should be added.

@rdstern I looked into your error, the error is thrown at the clsRScript of the Rscript library probably written by @lloyddewit.
The error is caused by an invalid R script when the dialog is reopened. This is due to the each parameter value being removed.
(@lloyddewit I hope the system exception error thrown here is intentional, of course the script passed by the dialog to the object is not valid).

Script produced when you reopen the dialog.

# Code generated by the dialog, Regular Sequence
regular1 <- rep(x=seq(from=1, to=40, by=1), each=, length.out=36)
data_book$add_columns_to_data(data_name="survey", col_name="regular1", col_data=regular1, before=FALSE)

Notice the each R parameter that lacks the value. @lloyddewit does that make the R script invalid from the clsRScript class perspective?

Error thrown
Rscript_error

@derekagorhom fixing the each R parameter value, which is passed by ucrNudRepeatValues control, should fix the bug.

@lloyddewit
Copy link
Contributor

@Patowhiz Thanks for the analysis above.
RScript is only intended to work with valid R script. If invalid R is passed to RScript, then the behaviour is undefined. RScript does not guarantee to detect all invalid R, but if does detect something invalid then it throws an exception.
I think this is the correct behaviour.

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

@lloyddewit thanks for the response. I agree that's the correct behavior.

More out of scope question below

For detected invalid R like in this case, should it have given a hint about the each parameter value missing? As a developer using the library, I would expect the class to give me a hint about the invalid R and possibly point out the exact invalid token I guess that may not have been a priority to implement, if so the undefined behavior seems fine, the comment clearly explained it.

@derekagorhom
Copy link
Contributor

@shadrackkibet thanks for assigning this to @derekagorhom. @derekagorhom I had already looked at this, I hope my findings below will save your time in fixing the bug.

@rdstern I agree that the 'Ok' button should not be enabled when the 'To' input is greater than the data frame length. That's a check that should be added.

@rdstern I looked into your error, the error is thrown at the clsRScript of the Rscript library probably written by @lloyddewit. The error is caused by an invalid R script when the dialog is reopened. This is due to the each parameter value being removed. (@lloyddewit I hope the system exception error thrown here is intentional, of course the script passed by the dialog to the object is not valid).

Script produced when you reopen the dialog.

# Code generated by the dialog, Regular Sequence
regular1 <- rep(x=seq(from=1, to=40, by=1), each=, length.out=36)
data_book$add_columns_to_data(data_name="survey", col_name="regular1", col_data=regular1, before=FALSE)

Notice the each R parameter that lacks the value. @lloyddewit does that make the R script invalid from the clsRScript class perspective?

Error thrown Rscript_error

@derekagorhom fixing the each R parameter value, which is passed by ucrNudRepeatValues control, should fix the bug.

thank you @Patowhiz , i also realized the problem was from the Repeat Values.
After performing the first regular sequence, i changed the repeat values and everything seem to be working fine. i was able to do
the same task multiple times.

@derekagorhom
Copy link
Contributor

To fix the bug i changed the original code to
ucrNudRepeatValues.SetParameter(New RParameter(" ", 1))
and it is working fine now.

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

To fix the bug i changed the original code to ucrNudRepeatValues.SetParameter(New RParameter(" ", 1)) and it is working fine now.

@derekagorhom that's interesting. I wonder why. And did the dialog produce the correct R script? I presume the each parameter name needs to be shown as part of the script.

@derekagorhom
Copy link
Contributor

To fix the bug i changed the original code to ucrNudRepeatValues.SetParameter(New RParameter(" ", 1)) and it is working fine now.

@derekagorhom that's interesting. I wonder why. And did the dialog produce the correct R script? I presume the each parameter name needs to be shown as part of the script.

sorry @Patowhiz can you give your skype ID so that we can have a meeting

@rdstern
Copy link
Collaborator Author

rdstern commented Feb 8, 2022

@derekagorhom sounds promising. Are you going to do a pull request>

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

@derekagorhom probably, I would guess that if you change the each parameter value, the script produced should be invalid. You now have a control that is writing a parameter value without a parameter name to the general script, in principle the overall script should be invalid.

@derekagorhom
Copy link
Contributor

@derekagorhom probably, I would guess that if you change the each parameter value, the script produced should be invalid. You now have a control that is writing a parameter value without a parameter name to the general script, in principle the overall script should be invalid.

yes i just realized it, i did not change the repeat value so it seem to work fine. thanks for the correction

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

@derekagorhom I can email you my skype id. thanks.

@derekagorhom
Copy link
Contributor

@derekagorhom I can email you my skype id. thanks.

[email protected]

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 8, 2022

@rdstern after discussing with @derekagorhom we found what was causing the invalid bug, it is line 182 of dlgRegularSequence;
ucrNudRepeatValues.AddAdditionalCodeParameterPair(clsSeqDateFunction, ucrNudRepeatValues.GetParameter(), iAdditionalPairNo:=1)

param_error

As shown above, the parameter pair is technically the same object(by reference). This gives the unexpected behavior that leads to the parameter value lacking in the final script produced. (@lloyddewit @dannyparsons I did a quick find by reference and found some other dialogs have similar code, Based on my understanding of the base code, I would expect the parameter passed to be a clone of itself instead of a reference of itself?).

@rdstern a quick fix would be to either passing a clone of the parameter or creating a new parameter object, see code below;

Quick fix
ucrNudRepeatValues.AddAdditionalCodeParameterPair(clsSeqDateFunction, ucrNudRepeatValues.GetParameter().Clone(), iAdditionalPairNo:=1)
or
ucrNudRepeatValues.AddAdditionalCodeParameterPair(clsSeqDateFunction, New RParameter("each", 1), iAdditionalPairNo:=1)
However after exploring the dialog further, we found out the following;
Currently the dialog produces these commands;

For numeric sequence

regular <- rep(x=seq(from=1, to=36, by=1), each=3, length.out=36)
data_book$add_columns_to_data(data_name="survey", col_name="regular", col_data=regular, before=FALSE)
rm(regular)

For date sequence

regular <- rep(x=seq.Date(from=as.Date(x="2018/1/1"), each=1, to=as.Date(x="2018/12/31"), by="1 days"), each=1, length.out=36)
data_book$add_columns_to_data(data_name="survey", col_name="regular", col_data=regular, before=FALSE)
rm(regular)

@rdstern notice the each parameter inside the seq.Date. That parameter is from the repeat values control. We looked into seq.Date R command and found it didn't have the each parameter. We wonder why it's needed in this dialog? If not needed then the fix would be to remove the code that adds it in the dialog.

@rdstern looking at the seq.Date much further, we realised it's no longer different from seq command. In fact the 2 functions accepts the from and to as dates. seq.Date seems to be deprecated when you look at the documentation, we wonder why its being used in this dialog?
The 2 commands below will have the same output, so should we just simplify the dialog and use seq function even for dates?;

regular1 <- rep(x=seq.Date(from=as.Date(x="2018/1/1"), to=as.Date(x="2018/12/31"), by="1 days"), each=1, length.out=36)
regular2 <- rep(x=seq(from=as.Date(x="2018/1/1"), to=as.Date(x="2018/12/31"), by="1 days"), each=1, length.out=36)

@rdstern your response will determine how @derekagorhom will fix the bug and improve the dialog. Thanks.
That's too complicated for me, so I need to include @dannyparsons to answer. Thanks for looking in this detailed way. Just what I hoped for.

@dannyparsons
Copy link
Contributor

@Patowhiz @derekagorhom Thanks for looking into this.

  1. Parameters being shared by controls.
    .GetParameter() is intended to be used if two controls need to refer to the same parameter (by reference). So this is in the intended behaviour. If not, you should use New RParameter("each", 1). This is safer, so unless it's certain that the parameters should always be identical, use New RParameter instead.

  2. each parameter
    each is a parameter of rep not of seq. In your commands it is correctly in rep and works as intended from my checking.

  3. seq.Date and seq
    Short answer: We probably should switch to seq for the Date option too.
    Long answer: But not for the reasons you mentioned.
    seq.Date isn't depreciated. seq.Date is the implementation of the seq function for Dates. In R terms, seq.Date is the method for seq for objects with class Date. And in R if you call seq with a Date object it dispatches to the seq.Date method, so seq and seq.Date does the same thing if from is a Date object. This is how R does OO-Programming which is very different to most other languages. If you really want to know the details you can read all about it here! https://adv-r.hadley.nz/s3.html
    It's normally advised to use the generic method and let R determine the correct method to dispatch to. So to follow standard practice we probably should use seq for the Date option too, but it makes no difference to what function is actually called.

@Patowhiz
Copy link
Contributor

@dannyparsons thanks for the response and the interesting R article (it's about time I read R in detail).

@derekagorhom please fix the bug based on @dannyparsons response.

@derekagorhom also note that, you will fix the R commands produced (the each parameter is only for rep) and also simply used the seq function for dates.

@lloyddewit
Copy link
Contributor

@lloyddewit thanks for the response. I agree that's the correct behavior.

More out of scope question below

For detected invalid R like in this case, should it have given a hint about the each parameter value missing? As a developer using the library, I would expect the class to give me a hint about the invalid R and possibly point out the exact invalid token I guess that may not have been a priority to implement, if so the undefined behavior seems fine, the comment clearly explained it.

I created issue #6 in the RScript repo.

@Patowhiz
Copy link
Contributor

@rdstern I found another inconsistency when I looked at the dialog further.

I loaded the dialog for the first time and clicked Ok, it produces the following command;

regular <- seq(from=1, to=36, by=1)
data_book$add_columns_to_data(data_name="survey", col_name="regular", col_data=regular, before=FALSE)

Then I returned to the dialog and just changed the repeat values from say 1 to 2, it produces the following command and threw the shown error. Also the preview was not shown;

rep(x=data_book$get_columns_from_data(data_name="survey", col_names="regular1"), each=2)

regular_seq_second

@derekagorhom @rdstern Could you confirm that you get the same error?
These errors are somehow related and can be fixed by PR #7217 .
@rdstern I presume that at any point, the dialog should have the rep as it's base function? I'm happy we have a skype call to explain what I mean, would be useful in helping us understand why there is a deliberate intention in the code to produce the different commands as shown above.

@derekagorhom
Copy link
Contributor

@rdstern I found another inconsistency when I looked at the dialog further.

I loaded the dialog for the first time and clicked Ok, it produces the following command;

regular <- seq(from=1, to=36, by=1)
data_book$add_columns_to_data(data_name="survey", col_name="regular", col_data=regular, before=FALSE)

Then I returned to the dialog and just changed the repeat values from say 1 to 2, it produces the following command and threw the shown error. Also the preview was not shown;

rep(x=data_book$get_columns_from_data(data_name="survey", col_names="regular1"), each=2)

regular_seq_second

@derekagorhom @rdstern Could you confirm that you get the same error? These errors are somehow related and can be fixed by PR #7217 . @rdstern I presume that at any point, the dialog should have the rep as it's base function? I'm happy we have a skype call to explain what I mean, would be useful in helping us understand why there is a deliberate intention in the code to produce the different commands as shown above.

Yes @Patowhiz i am encountering the same error you have, even when i undo the commented code in line 140 it still gives me that error.

@derekagorhom
Copy link
Contributor

derekagorhom commented Feb 11, 2022

i did alittle digging and found out where the inconsistency is coming from, it happens to be in data_object_R6.R
on line 669 to 673

#A bug in sjPlot requires removing labels when a factor column already has labels, using remove_labels for this if needed.
DataSheet$set("public", "get_columns_from_data", function(col_names, force_as_data_frame = FALSE, use_current_filter = TRUE, remove_labels = FALSE, drop_unused_filter_levels = FALSE) {
  if(missing(col_names)) stop("no col_names to return")
  #if(!all(col_names %in% self$get_column_names())) stop("Not all column names were found in data")
  if(!all(col_names %in% names(private$data))) stop("Not all column names were found in data")
  

That's a good spot. I am copying @dannyparsons to consider what we should do for the best here. Should we have our own copy of sjplot?

@dannyparsons
Copy link
Contributor

@Patowhiz This looks like a more serious issue with the dialog. If the dialog is producing this code then something is quite wrong:

rep(x=data_book$get_columns_from_data(data_name="survey", col_names="regular1"), each=2)

This isn't a problem with the data_book code. The dialog should not be producing something like this.

It seems that the dialog thinks that the column is assigned, but the input to rep should be the expression from seq. This looks like the assign to is not being reset after the dialog is run?

Can you have a look at this at the dialog level?

@Patowhiz
Copy link
Contributor

@dannyparsons yes it's at the dialog level. I just wanted to know from @rdstern if the different R commands produced was intentional, the code had suggested it was intentional From his and your response, @derekagorhom and I can now fix both bugs and refactor the code responsible for this.

@Patowhiz
Copy link
Contributor

@rdstern in the sequencer dialog, I noticed that when the 'From' is greater than the 'To' then the by value in the R command is changed to a negative value. However, the control still shows a positive value, should this change be shown to the user even before click Ok, as a user I would expect to see the negative value, right? Going by current logic, we now have an Rscript that has values different from what are being displayed to the user in dialog.

@dannyparsons
Copy link
Contributor

No that was deliberate. The user doesn't need to worry about the sign, they just specify the step and the dialog works out if it should be positive or negative.

Going by current logic, we now have an Rscript that has values different from what are being displayed to the user in dialog.

This is exactly the reason why we have R-Instat! So that the user interface can be simple even if the R code is complex. We choose the options which are logical to the user and then translate this into R code. If this doesn't match the arguments of the R function, then that's the job of the dialog to do that translation. If we only mirrored the arguments in the R code we would end up with very complicated and hard to use dialogs in some cases.

@Patowhiz
Copy link
Contributor

Patowhiz commented Feb 14, 2022

@dannyparsons thanks for your clarification.
@rdstern you had a different view from @dannyparsons above comment from our last online call. PR #7231 has those views. Let me know your new thoughts too once you have a look at it. Thanks.

@dannyparsons
Copy link
Contributor

I don't think it's a big difference, but to me "In steps of" is always a positive number, and would make most sense from a user perspective. It's sensible to use negative numbers for the by parameter as that makes the R code efficient. But I don't want to require users to know that they need to enter a negative number in this field, or change things automatically. I think keeping it positive simplifies usage.

In any case, I don't think it will be too confusing as the preview shows what's going to be in the output.

@rdstern
Copy link
Collaborator Author

rdstern commented Feb 14, 2022

@Patowhiz I am happy to accept @dannyparsons arguments. And, in general, I am really happy with his inputs to R-Instat. We are getting into a very good place with the software!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants