-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] Execution Settings R6 Class #196
base: develop
Are you sure you want to change the base?
Changes from 4 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
|
||
# ExecutionSettings ---- | ||
|
||
#' @description | ||
#' An R6 class to define an ExecutionSettings object | ||
#' | ||
#' @export | ||
ExecutionSettings <- R6::R6Class( | ||
classname = "ExecutionSettings", | ||
public = list( | ||
initialize = function(connectionDetails = NULL, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just FYI, the from of initialize here is different than what we discussed about R6 classes in the monthly meeting:
So based on the above I was imagining the execution settings to contain user-input data about schemas, runtime options, etc, and then to execute a study you have the analysis, execution settings and connection details provided separately (especially because sometimes you pass along an actual open connection (for whatever reason) or a connection details and the code decides which to use). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the context of this class is useful for a different reason that serialization or de-serialization. This would allow us to standardize across packages how we encapsulate inputs. At the moment these are fairly inconsistent. I would say we explicitly never want to serialize connectionDetails as this adds the potential to leak security details. So the construction should be:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's a fair point, but this PR seems to set the basis for one example of implementing the other R6 classes. I was looking at the R6 classes as a means to define the analytic input API (at least at a data structure level, not a functional level). So there will be some that are pure data oriented (to define structure and handle serialization) and maybe there will be some that have behavior (for example, CI has a query builder R6 class I believe). Maybe we need to see an example of both in this PR so we see where lines are drawn between behavior classes and structure classes. |
||
connection = NULL, | ||
cdmDatabaseSchema = NULL, | ||
workDatabaseSchema = NULL, | ||
tempEmulationSchema = NULL, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could we have it as an argument? snowflake requires it for everything selfishly hope to keep it here 😃 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, keep it as an argument but the current convention across other packages is to use
So I'm proposing that this behvaiour is consistent. |
||
targetCohortTable = NULL, | ||
cdmSourceName = NULL) { | ||
stopifnot(is.null(connectionDetails) || is.null(connection)) | ||
.setClass(private = private, key = "connectionDetails", value = connectionDetails, class = "ConnectionDetails") | ||
.setClass(private = private, key = ".connection", value = connection, | ||
class = "DatabaseConnectorJdbcConnection", nullable = TRUE) | ||
.setString(private = private, key = ".cdmDatabaseSchema", value = cdmDatabaseSchema) | ||
.setString(private = private, key = ".workDatabaseSchema", value = workDatabaseSchema) | ||
.setString(private = private, key = ".tempEmulationSchema", value = tempEmulationSchema) | ||
.setString(private = private, key = ".targetCohortTable", value = targetCohortTable) | ||
.setString(private = private, key = ".cdmSourceName", value = cdmSourceName) | ||
}, | ||
|
||
getDbms = function() { | ||
dbms <- private$connectionDetails$dbms | ||
return(dbms) | ||
}, | ||
|
||
# connect to database | ||
connect = function() { | ||
|
||
# check if private$connection is NULL | ||
conObj <- private$.connection | ||
if (is.null(conObj)) { | ||
private$.connection <- DatabaseConnector::connect(private$connectionDetails) | ||
} else{ | ||
cli::cat_bullet( | ||
"Connection object already open", | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
} | ||
}, | ||
|
||
# disconnect to database | ||
disconnect = function() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps a "withConnection" withr style context can be implemented as a method see here: |
||
|
||
# check if private$connection is NULL | ||
conObj <- private$.connection | ||
if (class(conObj) == "DatabaseConnectorJdbcConnection") { | ||
# disconnect connection | ||
DatabaseConnector::disconnect(private$.connection) | ||
private$.connection <- NULL | ||
} | ||
|
||
cli::cat_bullet( | ||
"Connection object has been disconected", | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
invisible(conObj) | ||
}, | ||
|
||
#TODO make this more rigorous | ||
# add warning if no connection available | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. include a DatabaseConnector::dbIsValid call? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. makes sense. I left this open as I need to think of all checks on the database connection. |
||
getConnection = function() { | ||
conObj <- private$.connection | ||
return(conObj) | ||
} | ||
|
||
), | ||
|
||
private = list( | ||
connectionDetails = NULL, | ||
.connection = NULL, | ||
.cdmDatabaseSchema = NULL, | ||
.workDatabaseSchema = NULL, | ||
.tempEmulationSchema = NULL, | ||
.targetCohortTable = NULL, | ||
.cdmSourceName = NULL | ||
), | ||
|
||
active = list( | ||
|
||
cdmDatabaseSchema = function(value) { | ||
# return the value if nothing added | ||
if(missing(value)) { | ||
cds <- private$.cdmDatabaseSchema | ||
return(cds) | ||
} | ||
# replace the cdmDatabaseSchema | ||
.setString(private = private, key = ".cdmDatabaseSchema", value = value) | ||
cli::cat_bullet( | ||
glue::glue("Replaced {crayon::cyan('cdmDatabaseSchema')} with {crayon::green(value)}"), | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
}, | ||
|
||
workDatabaseSchema = function(value) { | ||
# return the value if nothing added | ||
if(missing(value)) { | ||
cds <- private$.workDatabaseSchema | ||
return(cds) | ||
} | ||
# replace the workDatabaseSchema | ||
.setString(private = private, key = ".workDatabaseSchema", value = value) | ||
cli::cat_bullet( | ||
glue::glue("Replaced {crayon::cyan('workDatabaseSchema')} with {crayon::green(value)}"), | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
}, | ||
|
||
|
||
tempEmulationSchema = function(value) { | ||
# return the value if nothing added | ||
if(missing(value)) { | ||
tes <- private$.tempEmulationSchema | ||
return(tes) | ||
} | ||
# replace the tempEmulationSchema | ||
.setString(private = private, key = ".tempEmulationSchema", value = value) | ||
cli::cat_bullet( | ||
glue::glue("Replaced {crayon::cyan('tempEmulationSchema')} with {crayon::green(value)}"), | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
}, | ||
|
||
targetCohortTable = function(value) { | ||
# return the value if nothing added | ||
if(missing(value)) { | ||
tct <- private$.targetCohortTable | ||
return(tct) | ||
} | ||
# replace the targetCohortTable | ||
.setString(private = private, key = ".targetCohortTable", value = value) | ||
cli::cat_bullet( | ||
glue::glue("Replaced {crayon::cyan('targetCohortTable')} with {crayon::green(value)}"), | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
}, | ||
|
||
cdmSourceName = function(value) { | ||
# return the value if nothing added | ||
if(missing(value)) { | ||
csn <- private$.cdmSourceName | ||
return(csn) | ||
} | ||
# replace the cdmSourceName | ||
.setString(private = private, key = ".cdmSourceName", value = value) | ||
cli::cat_bullet( | ||
glue::glue("Replaced {crayon::cyan('cdmSourceName')} with {crayon::green(value)}"), | ||
bullet = "info", | ||
bullet_col = "blue" | ||
) | ||
} | ||
|
||
) | ||
) | ||
|
||
|
||
#' @title | ||
#' Create an ExecutionSettings object and set its attributes | ||
#' | ||
#' @param connectionDetails A DatabaseConnector connectionDetails object (optional if connection is specified) | ||
#' @param connection A DatabaseConnector connection object (optional if connectionDetails is specified) | ||
#' @param cdmDatabaseSchema The schema of the OMOP CDM database | ||
#' @param workDatabaseSchema The schema to which results will be written | ||
#' @param tempEmulationSchema Some database platforms like Oracle and Snowflake do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created. | ||
#' @param targetCohortTable The name of the table where the target cohort(s) are stored | ||
#' @param cdmSourceName A human-readable name for the OMOP CDM source | ||
#' | ||
#' @return An ExecutionSettings object | ||
#' @export | ||
createExecutionSettings <- function(connectionDetails, | ||
connection = NULL, | ||
cdmDatabaseSchema, | ||
workDatabaseSchema, | ||
tempEmulationSchema, | ||
targetCohortTable, | ||
cdmSourceName) { | ||
executionSettings <- ExecutionSettings$new(connectionDetails = connectionDetails, | ||
connection = connection, | ||
cdmDatabaseSchema = cdmDatabaseSchema, | ||
workDatabaseSchema = workDatabaseSchema, | ||
tempEmulationSchema = tempEmulationSchema, | ||
targetCohortTable = targetCohortTable, | ||
cdmSourceName = cdmSourceName) | ||
return(executionSettings) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anthonysena I think this is a useful abstraction that can be used in almost all our packages and CohortGenerator seems an appropriate place to put it.
Some initial points that I think we need:
Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function
executeCohorts
.We will need to write a vignette
we should update all functions in this package to use this
We should look at the stratuegus execution context list to see if there is additional stuff we should replicate there
Perhaps we also move or replicate the cdm source meta data function from strategus here?
We should add a helper function to create an instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed @azimov and thanks for this contribution @mdlavallee92!
Let's aim to add this as an additional way of encapsulating the settings as you suggested and then fully adopt it when we have the opportunity to break compatibility with v1.x.