Commit

Merge pull request #64 from hubverse-org/ak/v4-output-type-ids/63
Support v4 output type specification when creating schema
annakrystalli authored Nov 11, 2024
2 parents 4e11a26 + 3e6a64e commit 01e08c3
Showing 17 changed files with 422 additions and 168 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -15,7 +15,7 @@ Description: A set of utility functions for accessing and working with
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
Config/testthat/edition: 3
URL: https://github.com/hubverse-org/hubData
BugReports: https://github.com/hubverse-org/hubData/issues
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# hubData (development version)

* Support the determination of hub schema from v4 configuration files (#63). Also fixes a bug in `create_hub_schema()` where the `output_type_id` data type was being incorrectly auto-determined as `logical` when only point estimate output types were being collected by a hub. Now a `character` data type is returned for `output_type_id` for all schema versions in such situations when auto-determined.

# hubData 1.2.3

* Fix bug in `create_hub_schema()` where `output_type_id` data type was being incorrectly determined as `Date` instead of `character` (Reported in https://github.com/reichlab/variant-nowcast-hub/pull/87#issuecomment-2387372238).
12 changes: 11 additions & 1 deletion R/create_hub_schema.R
@@ -15,6 +15,9 @@
#' the `output_type_id` data type automatically from the `tasks.json`
#' config file as the simplest data type required to represent all output
#' type ID values across all output types in the hub.
#' When only point estimate output types (where `output_type_id`s are `NA`) are
#' being collected by a hub, the `output_type_id` column is assigned a `character`
#' data type when auto-determined.
#' Other data type values can be used to override automatic determination.
#' Note that attempting to coerce `output_type_id` to a data type that is
#' not valid for the data (e.g. trying to coerce `"character"` values to
@@ -149,6 +152,8 @@ get_output_type_id_type <- function(config_tasks) {
# retired
config_tid <- hubUtils::get_config_tid(config_tasks = config_tasks)

# Get the values of all output type id values across all output types and rounds
# in the hub config
values <- purrr::map(
config_tasks[["rounds"]],
function(x) {
@@ -203,7 +208,12 @@ get_output_type_id_type <- function(config_tasks) {
# `get_data_type()` which checks characters for ISO date format.
# Should Dates be introduced as output type id values in the future,
# this will need to be revisited.
typeof(c(values, sample_values))
type <- typeof(c(values, sample_values))

if (type %in% c("NULL", "logical")) {
type <- "character"
}
type
}
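The fallback relies on how base R's `c()` coerces mixed values to a common type before `typeof()` is taken. The sketch below isolates just that logic in a hypothetical standalone helper, `resolve_output_type_id_type()` (not part of hubData), assuming `values` and `sample_values` have already been collected into atomic vectors; the real function first extracts them from `config_tasks`.

```r
# Hypothetical helper (not part of hubData) mirroring the fallback in
# get_output_type_id_type(). Assumes `values` and `sample_values` are atomic
# vectors of output_type_id values already gathered from a tasks config.
resolve_output_type_id_type <- function(values, sample_values = NULL) {
  # c() coerces to the simplest type able to hold every value
  # (roughly logical < integer < double < character); c(NULL, NULL) is NULL.
  type <- typeof(c(values, sample_values))

  # Point-estimate-only hubs yield only NA (logical) or no values (NULL);
  # neither is a useful column type, so default to character.
  if (type %in% c("NULL", "logical")) {
    type <- "character"
  }
  type
}

resolve_output_type_id_type(c(NA, NA))                       # "character" (previously "logical")
resolve_output_type_id_type(NULL)                            # "character" (previously "NULL")
resolve_output_type_id_type(c(0.1, 0.5, 0.9))                # "double"
resolve_output_type_id_type(c(1L, 2L), sample_values = 1:3)  # "integer"
resolve_output_type_id_type(0.5, sample_values = "a")        # "character"
```

Because `NA` is logical in R, taking `typeof()` directly on a point-estimate-only hub produced `logical`; the fallback keeps the column `character`, which is what the new `output_type_id: string` test expectations check.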


15 changes: 15 additions & 0 deletions R/reexports-hubUtils.R
@@ -0,0 +1,15 @@
#' @export
#' @importFrom hubUtils as_model_out_tbl
hubUtils::as_model_out_tbl

#' @export
#' @importFrom hubUtils validate_model_out_tbl
hubUtils::validate_model_out_tbl

#' @export
#' @importFrom hubUtils model_id_merge
hubUtils::model_id_merge

#' @export
#' @importFrom hubUtils model_id_split
hubUtils::model_id_split
39 changes: 0 additions & 39 deletions R/utils-hubUtils.R

This file was deleted.

23 changes: 0 additions & 23 deletions man/as_model_out_tbl.Rd

This file was deleted.

3 changes: 3 additions & 0 deletions man/coerce_to_hub_schema.Rd

3 changes: 3 additions & 0 deletions man/connect_hub.Rd

3 changes: 3 additions & 0 deletions man/create_hub_schema.Rd

39 changes: 0 additions & 39 deletions man/model_id_merge.Rd

This file was deleted.

39 changes: 0 additions & 39 deletions man/model_id_split.Rd

This file was deleted.

19 changes: 19 additions & 0 deletions man/reexports.Rd

26 changes: 0 additions & 26 deletions man/validate_model_out_tbl.Rd

This file was deleted.

35 changes: 35 additions & 0 deletions tests/testthat/test-create_hub_schema.R
@@ -61,6 +61,14 @@ test_that("create_hub_schema works correctly", {
output_type_id = "character", value = "integer", model_id = "character"
)
)

# Validate that configs with only point estimate output types return character (the default),
# not logical
config_tasks <- hubUtils::read_config_file(test_path("testdata/configs/v3-tasks-point.json"))
expect_equal(
create_hub_schema(config_tasks)$GetFieldByName("output_type_id")$ToString(),
"output_type_id: string"
)
})

test_that("create_hub_schema works with sample output types", {
@@ -135,3 +143,30 @@ test_that("create_hub_schema works with config output_type_id_datatype", {
"nowcast_date: date32[day]\ntarget_date: date32[day]\nlocation: string\nclade: string\noutput_type: string\noutput_type_id: string\nvalue: double\nmodel_id: string"
)
})

test_that("create_hub_schema works with v4 output_type_id configuration", {
config_tasks <- suppressWarnings(
hubUtils::read_config_file(test_path("testdata/configs/v4-tasks.json"))
)
expect_equal(
create_hub_schema(config_tasks)$ToString(),
"forecast_date: date32[day]\ntarget: string\nhorizon: int32\nlocation: string\ntarget_date: date32[day]\noutput_type: string\noutput_type_id: string\nvalue: double\nmodel_id: string"
)

# Validate that configs with only point estimate output types return character (the default)
config_tasks <- suppressWarnings(
hubUtils::read_config_file(test_path("testdata/configs/v4-tasks-point.json"))
)
expect_equal(
create_hub_schema(config_tasks)$GetFieldByName("output_type_id")$ToString(),
"output_type_id: string"
)
# Ensure `output_type_id_datatype` arg works with v4 configs
expect_equal(
create_hub_schema(
config_tasks,
output_type_id_datatype = "double"
)$GetFieldByName("output_type_id")$ToString(),
"output_type_id: double"
)
})