Add documentation about derived_task_ids property #193

Closed
annakrystalli opened this issue Oct 11, 2024 · 3 comments · Fixed by #217
Labels
documentation Improvements or additions to documentation higher priority work to prioritize in the near future v4.0.0

Comments

@annakrystalli
Member

When updating to v4.0.0, add a short section about derived task IDs.

This could be a rephrasing of the NEWS entry in schemas:

Introduction of optional derived_task_ids properties to enable hub administrators to define derived task IDs (i.e. task IDs whose values depend on the values of other task IDs). The higher-level derived_task_ids property sets the property globally at the hub level, but it can be overridden by the round-level derived_task_ids property. The property primarily allows validation functionality to ignore such task IDs when appropriate, which can significantly improve validation efficiency (#96). For more information, see the hubValidations documentation on ignoring derived task IDs.
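A short example could accompany this section to show the override rule. The sketch below is a hypothetical illustration, not hubverse code: it assumes a simplified `tasks.json` layout with a hub-level `derived_task_ids` list and an optional `derived_task_ids` list inside each round (rounds are keyed here by a made-up `round_id` field). The helper names `effective_derived_task_ids` and `task_ids_to_validate` are also hypothetical. A round-level value, even an empty list, takes precedence over the hub-level default.

```python
import json

# Hypothetical, simplified tasks.json fragment. The field layout is an
# assumption for illustration, not copied from the hubverse schema.
CONFIG = json.loads(
    """
    {
      "derived_task_ids": ["target_end_date"],
      "rounds": [
        {"round_id": "2024-10-12"},
        {"round_id": "2024-10-19", "derived_task_ids": []}
      ]
    }
    """
)


def effective_derived_task_ids(config: dict, round_id: str) -> list:
    """Return the derived task IDs that apply to one round.

    A round-level "derived_task_ids" entry (even an empty list) overrides
    the hub-level default; otherwise the hub-level value is used.
    """
    for rnd in config.get("rounds", []):
        if rnd.get("round_id") == round_id:
            if "derived_task_ids" in rnd:
                return list(rnd["derived_task_ids"])
            break
    # Fall back to the hub-level (global) setting, or no derived IDs at all.
    return list(config.get("derived_task_ids", []))


def task_ids_to_validate(task_ids: list, derived: list) -> list:
    """Drop derived task IDs from the task IDs a validator would check."""
    derived_set = set(derived)
    return [t for t in task_ids if t not in derived_set]


print(effective_derived_task_ids(CONFIG, "2024-10-12"))  # ['target_end_date']
print(effective_derived_task_ids(CONFIG, "2024-10-19"))  # []
```

Checking for the key's presence, rather than its truthiness, lets a round explicitly switch off the hub-level setting with an empty list.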

@annakrystalli annakrystalli added v4.0.0 documentation Improvements or additions to documentation higher priority work to prioritize in the near future labels Oct 11, 2024
@zkamvar
Member

zkamvar commented Nov 13, 2024

I think this would actually help with #208, because the language describing which task id variables are needed to generate the oracle output data is a bit dense:

  • The oracle output should include enough of the task id variables and
    columns with metadata about the outputs (output_type and
    output_type_id) to uniquely identify which oracle_values
    correspond to which predicted values. For example, this will
    typically include task id variables such as location and
    target_date (or target_end_date), since the oracle_value will be
    specific to the location and target_date.
  • Any task id variables that are not necessary to match observations
    with predictions can be omitted from the oracle output. For example,
    if target_date is included then reference_date and horizon
    variables can be omitted because the same observation will generally
    correspond to a particular target_date regardless of the forecast
    horizon. Similarly, in a scenario projection setting, the
    scenario_id can be omitted.
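The matching described above amounts to joining predictions to oracle values on the task id columns that determine the observation. The sketch below is a hypothetical illustration with made-up rows: it assumes a single `mean` output type (so `output_type_id` is not needed to match), uses `location` and `target_end_date` as the join keys, and `attach_oracle_values` is an invented helper name. The derived columns `reference_date` and `horizon` are simply not part of the key, so several forecasts for the same target date pick up the same observation.

```python
# Assumed join keys: the task id columns that identify an observation.
JOIN_KEYS = ("location", "target_end_date")

# Hypothetical model output rows (mean forecasts).
predictions = [
    {"reference_date": "2024-10-12", "horizon": 1, "location": "US",
     "target_end_date": "2024-10-19", "output_type": "mean", "value": 120.0},
    {"reference_date": "2024-10-05", "horizon": 2, "location": "US",
     "target_end_date": "2024-10-19", "output_type": "mean", "value": 110.0},
    {"reference_date": "2024-10-12", "horizon": 1, "location": "CA",
     "target_end_date": "2024-10-19", "output_type": "mean", "value": 30.0},
]

# Hypothetical oracle output: no reference_date or horizon columns.
oracle_output = [
    {"location": "US", "target_end_date": "2024-10-19", "oracle_value": 118.0},
]


def attach_oracle_values(preds, oracle, keys=JOIN_KEYS):
    """Add an "oracle_value" to each prediction by matching on `keys`.

    Raises if the oracle output does not uniquely identify an observation;
    predictions with no matching observation get None.
    """
    lookup = {}
    for row in oracle:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            raise ValueError(f"oracle output is not unique for {key}")
        lookup[key] = row["oracle_value"]
    return [{**row, "oracle_value": lookup.get(tuple(row[k] for k in keys))}
            for row in preds]


for row in attach_oracle_values(predictions, oracle_output):
    print(row["location"], row["horizon"], row["oracle_value"])
```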

@zkamvar
Member

zkamvar commented Nov 19, 2024

From my search of GitHub for derived_task_ids in validate_pr, it looks like we only need to notify @M-7th that the hub- and round-level derived_task_ids properties will be available in schema v4, so derived task IDs will no longer need to be set manually in the validation workflow.

@annakrystalli
Member Author

We should also mention this in the quick start and show an example of setting the derived_task_ids property in the config.
