Closes #2526 data raw data round 2 #2527

jimrothstein · 2024-10-08T00:09:57Z

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the GitHub Actions workflows and the Pull Request will not be merged into the main branch until you have checked off each task.


bms63 · 2024-10-08T12:44:31Z

Thanks @jimrothstein - did the scripts get re-run?

I am wondering if there is some way we could apply a timestamp in the attributes of the datasets to help us keep track of when each dataset was last created. This way GitHub would at least detect the change in the timestamp, as I am hoping the data doesn't change in the future, or if it does, that we know why.

@bundfussr WDYT?

bundfussr · 2024-10-08T16:18:54Z

> I am wondering if there is some way we could apply a timestamp in the attributes of the datasets to help us keep track of when each dataset was last created. This way GitHub would at least detect the change in the timestamp, as I am hoping the data doesn't change in the future, or if it does, that we know why.
>
> @bundfussr WDYT?

@bms63, what do you want to achieve?

If we add an attribute with a timestamp, git considers the file as changed even if the dataset was just recreated without any change (except the timestamp attribute). Therefore I'm not sure if this is a good idea.

I think we should rerun the create_* scripts before each release to ensure that the datasets are up to date (with respect to the pharmaversesdtm data and the ad_ scripts).
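To make the concern about the timestamp concrete, here is a minimal base-R sketch. It assumes a made-up two-row data frame and a hypothetical `last_run` attribute; neither comes from the admiral scripts. It shows that two copies with identical data still serialize to different bytes once the timestamp differs, which is why git would report the saved file as changed.

```r
# Toy stand-in for a dataset; the values are illustrative only.
adsl <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE = c(63, 64)
)

# Two "runs" of the same script that differ only in a hypothetical
# `last_run` attribute.
run1 <- adsl
attr(run1, "last_run") <- "Tue Oct  8 11:59:00 2024"
run2 <- adsl
attr(run2, "last_run") <- "Wed Oct  9 09:30:00 2024"

# The serialized bytes differ, so a saved file would differ as well.
bytes_equal <- identical(serialize(run1, NULL), serialize(run2, NULL))

# Dropping the attribute shows that the underlying data are unchanged.
strip_stamp <- function(x) {
  attr(x, "last_run") <- NULL
  x
}
data_equal <- identical(strip_stamp(run1), strip_stamp(run2))

print(c(bytes_equal = bytes_equal, data_equal = data_equal))
```

Here `bytes_equal` is `FALSE` while `data_equal` is `TRUE`: the only difference between the two runs is the stamp itself.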

bms63 · 2024-10-08T17:15:20Z

> @bms63, what do you want to achieve?

god-like powers. :)

I want some way to track when the datasets were last run. If I see that the datasets were run a couple of days before the release, I will be happy. If I see that the datasets were run two months before the release, I will be sad and know I need to run them.
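One way to express that rule is a small check on the file's modification time. This is only a sketch: `is_stale()`, the seven-day threshold, and the release date are all hypothetical, and a temporary file stands in for a real `data/*.rda` file.

```r
# Hypothetical helper: TRUE when `path` was last modified more than
# `max_age_days` before `release_date`.
is_stale <- function(path, release_date, max_age_days = 7) {
  modified <- file.info(path)$mtime
  age_days <- as.numeric(difftime(release_date, modified, units = "days"))
  age_days > max_age_days
}

# Demonstration on a temporary file instead of a real dataset file.
rda <- tempfile(fileext = ".rda")
x <- 1
save(x, file = rda)

# Illustrative release date, not an actual admiral release.
release <- as.POSIXct("2024-10-25 00:00:00", tz = "UTC")

# Pretend the file was regenerated two days before the release ...
Sys.setFileTime(rda, release - 2 * 24 * 3600)
two_days <- is_stale(rda, release)

# ... or two months before it.
Sys.setFileTime(rda, release - 60 * 24 * 3600)
two_months <- is_stale(rda, release)

print(c(two_days = two_days, two_months = two_months))
```

Note that modification times are reset by a fresh git clone, so in practice the commit date of the file is probably the more reliable signal.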

Maybe we should just run them with a custom script in GHA on each PR? Maybe this is where diffdf could be used to give us a report of any changes occurring in vignettes where these datasets are used?

The simplest solution is just to add a checkbox to my release checklist :) #2394 (comment)

jimrothstein · 2024-10-08T19:06:26Z

@bms63 @bundfussr

Will add something like:
(also fixing some code that uses cache, what I call CACHE_DIR)

> attributes(x)
$last_run
[1] "Tue Oct  8 11:59:00 2024"
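A minimal sketch of how such a stamp could be written and read back, assuming a toy data frame and the attribute name `last_run` shown above. Base R's `date()` produces text in the same layout as the output above.

```r
# Toy data frame standing in for a generated dataset.
x <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"))

# date() returns the current time as text, e.g. "Tue Oct  8 11:59:00 2024".
attr(x, "last_run") <- date()

# The stamp survives save()/load() because attributes are stored
# together with the object.
tmp <- tempfile(fileext = ".rda")
save(x, file = tmp)
restored <- new.env()
load(tmp, envir = restored)

print(attr(restored$x, "last_run"))
```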

Just FYI, I saw this older thread about adding a log file to admiral:
#2418

bms63 · 2024-10-08T19:57:08Z

@jimrothstein

I realized I can see the timestamp in GitHub so the attributes update might be overkill

@bundfussr
Rein me in here! I'm just a bit perplexed how our datasets got out of line with our templates code and looking for a way to prevent this in the future.


bundfussr · 2024-10-09T10:03:33Z

> maybe we should just run them with a custom script with GHA with each PR???? maybe this is where diffdf could be used to give us a report of any changes occurring in vignettes where these datasets are being used?
>
> simplest solution is just to add a checkbox in my release checklist :) #2394 (comment)

@bms63, I wouldn't add an extra GHA for checking because the templates and pharmaversesdtm rarely change. Adding a checkbox to the release checklist seems the better option to me.

bundfussr · 2024-10-08T16:38:28Z

data-raw/create_admiral_adlb.R

The comparison shouldn't be removed and the code should be simplified:

    # Create dataset data/admiral_adlb.rda

    # Run template script to create adlb
    source("inst/templates/ad_adlb.R", echo = TRUE) # nolint

    # Limit rows by selecting only these USUBJIDs
    usubjids <- c(
      "01-701-1015", "01-701-1023", "01-701-1028", "01-701-1033",
      "01-701-1034", "01-701-1047", "01-701-1097", "01-705-1186",
      "01-705-1292", "01-705-1310", "01-708-1286"
    )

    admiral_adlb <- filter(adlb, USUBJID %in% usubjids)

    # Get previous dataset for comparison
    adlb_old <- admiral::admiral_adlb

    # Finally, save reduced dataset
    usethis::use_data(admiral_adlb, overwrite = TRUE)

    # Compare with previous version
    diffdf::diffdf(
      base = adlb_old,
      compare = admiral_adlb,
      keys = c("USUBJID", "PARAMCD", "AVISIT", "ADT")
    )
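The ordering in this suggestion matters: the previous dataset is copied into `adlb_old` before the file is overwritten, so the comparison is old versus new. The following base-R sketch illustrates only that pattern. A temporary file stands in for `data/admiral_adlb.rda`, `all.equal()` stands in for `diffdf::diffdf()`, and the toy values are made up.

```r
path <- tempfile(fileext = ".rda")

# Previous version of the dataset, as it would exist on disk.
admiral_adlb <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AVAL = c(1.0, 2.0)
)
save(admiral_adlb, file = path)

# 1. Capture the previous version before anything is overwritten.
old_env <- new.env()
load(path, envir = old_env)
adlb_old <- old_env$admiral_adlb

# 2. Re-create the dataset (here with one changed value) and overwrite
#    the file.
admiral_adlb$AVAL[2] <- 2.5
save(admiral_adlb, file = path)

# 3. Compare old and new. all.equal() returns TRUE or a character
#    description of the differences.
comparison <- all.equal(adlb_old, admiral_adlb)
print(comparison)
```

Because `adlb_old` is an independent copy, overwriting the file in step 2 does not affect it, and the comparison in step 3 reports the changed `AVAL` value.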

bms63 · 2024-10-17

Hey @jimrothstein did this get updated?

jimrothstein · 2024-10-17

> Hey @jimrothstein did this get updated?

Let me double-check everything ... this weekend.

data-raw/create_admiral_adsl.R


bundfussr · 2024-10-09T15:11:45Z

data-raw/create_admiral_adlb.R

@@ -7,7 +7,7 @@ library(diffdf) # nolint

 # To clarify directories (can be removed)


The code from lines 5 to 19 should be removed (also in create_admiral_adsl.R).

jimrothstein · 2024-10-22T23:49:12Z

Created separate branch for this issue/PR.
https://github.com/pharmaverse/admiral/tree/2526-general-issues-data-rawdata-round-2

jimrothstein added 3 commits October 7, 2024 16:24

2526_data_raw_round_2 Remove data-backup/ folder

4f2e686

2526_data_raw_data_round_2 Remove data-raw/admiral-adlb.R script not used

91c5934

CAUTION: cache location has changed; This code has not been well tested.

bba42c8

jimrothstein requested a review from bundfussr October 8, 2024 00:10

jimrothstein marked this pull request as draft October 8, 2024 00:10

jimrothstein self-assigned this Oct 8, 2024

jimrothstein requested a review from bms63 October 8, 2024 00:11

bms63 changed the title ~~2526 data raw data round 2~~ Closes #2526 data raw data round 2 Oct 8, 2024

Cleaned create_admiral_*.R files

1fcb431

Ran package check. Admiral templates use tools::R_user_dir() to store generated files; update scripts here to use same function.
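For reference, a minimal sketch of the cache-directory pattern this commit message refers to. `tools::R_user_dir()` is part of base R (version 4.0.0 or later); the helper `get_cache_dir()`, the package name passed to it, and the file name `adlb.rds` are assumptions for illustration, not taken from the admiral templates.

```r
# Hypothetical helper: return the per-user cache directory for a package,
# optionally creating it.
get_cache_dir <- function(pkg, create = FALSE) {
  cache_dir <- tools::R_user_dir(pkg, which = "cache")
  if (create && !dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  cache_dir
}

# Only compute the path here; nothing is written to disk.
cache_dir <- get_cache_dir("admiral")
print(cache_dir)

# A generated file would then be addressed relative to that directory.
print(file.path(cache_dir, "adlb.rds"))
```

Using the same function in the templates and in the data-raw scripts means both sides agree on where generated files live, without hard-coding a path.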

bundfussr requested changes Oct 9, 2024


Simplify create_admiral_*.R scripts (per @bundfussr); lint, styler. Question: is diffdf correctly comparing old & new datasets? (load_all() re-reads datasets)

a0f3d1c

bundfussr reviewed Oct 9, 2024


Remove unnecessary code in create_admiral_*.R

f5d538a

bms63 marked this pull request as ready for review October 17, 2024 21:30

bms63 closed this Oct 25, 2024
