
Closes #2526 data raw data round 2 #2527

Closed


jimrothstein
Collaborator

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the GitHub Actions workflows and the Pull Request will not be merged into the main branch until you have checked off each task.

  • Place Closes #<insert_issue_number> into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the tidyverse style guide. Run styler::style_file() to style R and Rmd files
  • [ ] Updated relevant unit tests or have written new unit tests, which should consider realistic data scenarios and edge cases, e.g. empty datasets, errors, boundary cases etc. - See Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you fully follow the deprecation guidance?
  • Review the Cheat Sheet. Make any required updates to it by editing the file inst/cheatsheet/admiral_cheatsheet.pptx and re-upload a PDF and a PNG version of it to the same folder. (The PNG version can be created by taking a screenshot of the PDF version.)
  • Update to all relevant roxygen headers and examples, including keywords and families. Refer to the categorization of functions to tag appropriate keyword/family.
  • [x] Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md under the header # admiral (development version) if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers). A Developer Notes section is available in NEWS.md for tracking developer-facing issues.
  • Build admiral site pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

@jimrothstein jimrothstein marked this pull request as draft October 8, 2024 00:10
@jimrothstein jimrothstein self-assigned this Oct 8, 2024
@bms63 bms63 changed the title from "2526 data raw data round 2" to "Closes #2526 data raw data round 2" Oct 8, 2024
@bms63
Collaborator

bms63 commented Oct 8, 2024

Thanks @jimrothstein - did the scripts get re-run?

I am wondering if there is some way we could apply a timestamp in the attributes of the datasets that could help us keep track of when each dataset was last created? This way GitHub would at least detect the change in the timestamp, as I am hoping the data doesn't change in the future (or if it does, we know why)!

@bundfussr WDYT?

@bundfussr
Collaborator

@bms63, what do you want to achieve?

If we add an attribute with a timestamp, git considers the file changed even if the dataset was just recreated without any change (except for the timestamp attribute). Therefore, I'm not sure this is a good idea.
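To illustrate the concern, here is a minimal base R sketch (not taken from the admiral scripts). Two runs that produce identical data serialize to identical bytes, but once each run is stamped with a hypothetical `last_run` attribute, the bytes differ even though the data did not change. Comparing `serialize()` output is only a rough stand-in for comparing the saved `.rda` files that git tracks.

```r
# Compare two objects by their serialized bytes (a rough proxy for
# what git would see in the saved .rda file)
same_bytes <- function(x, y) {
  identical(serialize(x, connection = NULL), serialize(y, connection = NULL))
}

# Two re-runs of a create_* script that yield exactly the same data
run1 <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"), AVAL = c(1.5, 2))
run2 <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"), AVAL = c(1.5, 2))
unchanged <- same_bytes(run1, run2)

# Stamp each run with its creation time via a hypothetical "last_run" attribute
attr(run1, "last_run") <- "Tue Oct  8 11:59:00 2024"
attr(run2, "last_run") <- "Wed Oct  9 09:30:00 2024"
changed <- !same_bytes(run1, run2)

print(c(unchanged = unchanged, changed_after_stamp = changed))
```

Only the attribute differs between the stamped objects, yet the serialized bytes no longer match, which is why every re-run would show up as a diff.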

I think we should rerun the create_* scripts before each release to ensure that the datasets are up to date (with respect to the pharmaversesdtm data and the ad_ scripts).
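A minimal sketch of such a release-time re-run, assuming the scripts live in `data-raw/` and follow the `create_*.R` naming used in this PR (e.g. `data-raw/create_admiral_adsl.R`). The helper `rerun_create_scripts()` is hypothetical and not part of admiral.

```r
# Hypothetical helper: source every data-raw/create_*.R script in turn.
# Top-level objects of each script go into a fresh environment; anything a
# script itself writes to the global environment (e.g. via its own
# source() calls) still ends up there.
rerun_create_scripts <- function(dir = "data-raw") {
  scripts <- sort(list.files(dir, pattern = "^create_.*\\.R$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", script)
    source(script, local = new.env(parent = globalenv()))
  }
  invisible(scripts)
}
```

Run from the package root, e.g. `rerun_create_scripts()`, as one step of the release checklist. Files that do not match `create_*.R` are skipped.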

@bms63
Collaborator

bms63 commented Oct 8, 2024

god-like powers. :)

I want some way to track when the datasets were last created. If I see that the datasets' timestamp is from a couple of days before the release, I will be happy. If I see that it is from two months before the release, I will be sad and know I need to re-run them.

Maybe we should just run them with a custom script via GHA on each PR? Maybe this is where diffdf could be used to give us a report of any changes occurring in the vignettes where these datasets are used?

The simplest solution is just to add a checkbox to my release checklist :) #2394 (comment)

@jimrothstein
Collaborator Author

@bms63 @bundfussr

I will add something like the following (I'm also fixing some code that uses the cache directory, which I call CACHE_DIR):

> attributes(x)
$last_run
[1] "Tue Oct  8 11:59:00 2024"
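A minimal sketch of how such an attribute could be added just before `usethis::use_data()` saves the dataset. The helper `stamp_last_run()` is hypothetical; base R's `date()` returns a string in the same format as the output shown above.

```r
# Hypothetical helper: record when a dataset was (re)created
stamp_last_run <- function(data) {
  attr(data, "last_run") <- date()  # e.g. "Tue Oct  8 11:59:00 2024"
  data
}

x <- stamp_last_run(data.frame(USUBJID = "01-701-1015", AVAL = 1.5))
attributes(x)$last_run
```

Stamping as the very last step before saving means no later data manipulation in the script can drop the attribute.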

Just FYI, I saw this older thread about adding a log file to admiral:
#2418

@bms63
Collaborator

bms63 commented Oct 8, 2024

@jimrothstein

I realized I can see the timestamp in GitHub, so the attributes update might be overkill:
[screenshot]

@bundfussr
Rein me in here! I'm just a bit perplexed about how our datasets got out of line with our template code, and I'm looking for a way to prevent this in the future.

Ran package check.
Admiral templates use tools::R_user_dir() to store generated files; update scripts here to use the same function.
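A minimal sketch of resolving a shared cache directory with `tools::R_user_dir()` (available since R 4.0.0). The directory name `admiral_templates_data` and the helper `get_cache_dir()` are assumptions for illustration; the point is that the templates and the data-raw scripts should pass the same name so they resolve to the same path.

```r
# Hypothetical helper: return the per-user cache directory shared by the
# ad_* templates and the data-raw/create_* scripts, creating it on request.
# "admiral_templates_data" is an assumed name, not confirmed from admiral.
get_cache_dir <- function(create = TRUE) {
  dir <- tools::R_user_dir("admiral_templates_data", which = "cache")
  if (create && !dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  dir
}

# Example: where a template might save its ADLB output (nothing is created here)
adlb_path <- file.path(get_cache_dir(create = FALSE), "adlb.rds")
```

Because `tools::R_user_dir()` picks a platform-appropriate location, this avoids hard-coding a path such as a custom CACHE_DIR.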
@bundfussr
Collaborator

@bms63, I wouldn't add an extra GHA for checking because the templates and pharmaversesdtm rarely change. Adding a checkbox to the release checklist seems the better option to me.

Collaborator

The comparison shouldn't be removed and the code should be simplified:

# Create dataset data/admiral_adlb.rda
library(dplyr) # provides filter() used below

# Run template script to create adlb
source("inst/templates/ad_adlb.R", echo = TRUE) # nolint

# Limit rows by selecting only these USUBJIDs
usubjids <-
  c(
    "01-701-1015",
    "01-701-1023",
    "01-701-1028",
    "01-701-1033",
    "01-701-1034",
    "01-701-1047",
    "01-701-1097",
    "01-705-1186",
    "01-705-1292",
    "01-705-1310",
    "01-708-1286"
  )

admiral_adlb <- filter(adlb, USUBJID %in% usubjids)

# Get previous dataset for comparison (before use_data() overwrites it)
adlb_old <- admiral::admiral_adlb

# Finally, save reduced dataset
usethis::use_data(admiral_adlb, overwrite = TRUE)

# Compare with previous version
diffdf::diffdf(
  base = adlb_old,
  compare = admiral_adlb,
  keys = c("USUBJID", "PARAMCD", "AVISIT", "ADT")
)
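The ordering in the suggestion matters: the old version has to be captured before `usethis::use_data()` overwrites `data/admiral_adlb.rda`, otherwise the comparison may end up comparing the new data with itself. The base R sketch below mimics that flow with a temporary `.rda` file (a made-up stand-in), loads the old copy into its own environment with `load()`, and uses `all.equal()` as a simple substitute for `diffdf::diffdf()`.

```r
# Hypothetical stand-in for data/admiral_adlb.rda
rda_file <- file.path(tempdir(), "admiral_adlb.rda")

# Previous release of the dataset
admiral_adlb <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"), AVAL = c(1.5, 2))
save(admiral_adlb, file = rda_file)

# 1. Snapshot the old version BEFORE overwriting the file
old_env <- new.env()
load(rda_file, envir = old_env)
adlb_old <- old_env$admiral_adlb

# 2. Re-create the dataset (here: one value changed) and overwrite the file
admiral_adlb <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"), AVAL = c(1.5, 2.5))
save(admiral_adlb, file = rda_file)

# 3. Compare old vs new: all.equal() returns TRUE or a description of differences
comparison <- all.equal(adlb_old, admiral_adlb)
print(comparison)
```

If the snapshot were taken after step 2, `all.equal()` would return `TRUE` and hide the change.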

Collaborator

Hey @jimrothstein, did this get updated?

Collaborator Author

Let me double-check everything ... this weekend.

data-raw/create_admiral_adsl.R (outdated, resolved)
…uestion: is diffdf correctly comparing old & new datasets? (load_all() re-reads datasets)
@@ -7,7 +7,7 @@ library(diffdf) # nolint

# To clarify directories (can be removed)
Collaborator

Code from line 5 to 19 should be removed (also in create_admiral_adsl.R).

@bms63 bms63 marked this pull request as ready for review October 17, 2024 21:30
@jimrothstein
Copy link
Collaborator Author

jimrothstein commented Oct 22, 2024

Created separate branch for this issue/PR.
https://github.com/pharmaverse/admiral/tree/2526-general-issues-data-rawdata-round-2

@bms63 bms63 closed this Oct 25, 2024
Labels
None yet
Projects
None yet
4 participants