Enhance TC-Gen to verify genesis probabilities from ATCF e-deck files. #1809

Closed
8 of 21 tasks
JohnHalleyGotway opened this issue May 24, 2021 · 11 comments · Fixed by #1967 or #1972
Assignees
Labels
MET: Probability Verification priority: blocker Blocker requestor: DTC/T&E General DTC Testing and Evaluation work required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project type: new feature Make it do something new
Milestone

Comments

@JohnHalleyGotway
Collaborator

JohnHalleyGotway commented May 24, 2021

Describe the New Feature

tc_gen_probabilistic_algorithm_v2.pdf

Please see the attached slides, which illustrate the 2 main changes required for TC-Genesis verification. This issue describes the first of those 2 enhancements: enhance tc_gen to verify genesis probabilities from ATCF e-deck files. NHC identifies disturbances and issues 3 probability forecasts for how likely it is that each disturbance will develop into a tropical storm. The probabilities cover 3 time windows: 48, 120, and 168 hours.

@halperin-erau has provided some sample data containing these e-deck probabilities. However, as of May 2021, their format is still under development. In the existing and historical versions of these files, both the lat/lon location and the valid timestamp are absent. If any of those columns are missing from the input data, tc_gen should print a warning message and ignore that input.

This task is to enhance tc_gen to parse those e-deck probabilities and use them to populate Nx2 probabilistic contingency tables. Create 1 table for each of the time windows (48, 120, and 168) but make that a user-configurable option. Write the resulting probabilistic output.
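
As a rough illustration of the Nx2 tables described above, here is a minimal Python sketch. The bin edges, the function name `build_nx2_tables`, and the input tuples are assumptions made for illustration; they are not taken from the tc_gen source, and the real thresholds would come from user configuration.

```python
from bisect import bisect_right

# Hypothetical probability bin edges (as fractions); the real thresholds may differ.
BIN_EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]

def build_nx2_tables(forecasts, windows=(48, 120, 168), edges=BIN_EDGES):
    """Accumulate one Nx2 table per time window.

    forecasts: iterable of (window_hours, probability_percent, event_occurred).
    Returns {window: [[event_count, nonevent_count], ...]}, one row per bin.
    """
    n_bins = len(edges) - 1
    tables = {w: [[0, 0] for _ in range(n_bins)] for w in windows}
    for window, prob_pct, occurred in forecasts:
        if window not in tables:
            continue  # this window was not requested
        p = prob_pct / 100.0
        # Locate the bin; a probability of exactly 1.0 goes in the last bin.
        row = min(bisect_right(edges, p) - 1, n_bins - 1)
        tables[window][row][0 if occurred else 1] += 1
    return tables

# Made-up forecasts, only to exercise the function.
sample = [(48, 10, False), (48, 80, True), (120, 20, False), (120, 60, True)]
tables = build_nx2_tables(sample)
```

Passing a different `windows` tuple is one way the set of tables could be made user-configurable.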

Be sure to subset output by basin, time window, and perhaps forecaster initials. Support the application of both the development and operational logic, but note that the operational logic will be used by NHC.
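
One possible way to express that subsetting, sketched in Python. The record layout and the names `subset_key` and `group_forecasts` are hypothetical, and the sample records are invented for illustration.

```python
from collections import defaultdict

def subset_key(rec, by_initials=False):
    """Build the grouping key: basin and time window, optionally forecaster initials."""
    key = (rec["basin"], rec["window"])
    return key + (rec["initials"],) if by_initials else key

def group_forecasts(records, by_initials=False):
    """Group forecast records so each subset can be verified separately."""
    groups = defaultdict(list)
    for rec in records:
        groups[subset_key(rec, by_initials)].append(rec)
    return dict(groups)

# Invented records, only to show the grouping.
records = [
    {"basin": "AL", "window": 120, "initials": "JLB", "prob": 20},
    {"basin": "AL", "window": 120, "initials": "JHT", "prob": 20},
    {"basin": "AL", "window": 48, "initials": "JLB", "prob": 10},
]
by_basin_window = group_forecasts(records)
by_forecaster = group_forecasts(records, by_initials=True)
```

Each resulting group would then feed its own contingency table.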

Acceptance Testing

List input data types and sources.
Describe tests required for new functionality.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

The project for 7790901 originally ended in August, 2021. After the no-cost extension, the updated date is February 28th, 2022.

Funding Source

7790901

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added component: application code type: new feature Make it do something new requestor: DTC/T&E General DTC Testing and Evaluation work labels May 24, 2021
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.1.0 milestone May 24, 2021
@JohnHalleyGotway
Collaborator Author

On 5/24/21, Dan H, Tara, and John HG met to discuss these details. Dan provided the additional followup information below:

Thank you for the discussion today. I've attached some sample data that may be helpful as development continues.
The al*.dat files are TWO sample files. I have included at least one developing and one non-developing disturbance during the years when NHC issued 48 h, 48/120 h, and 48/120/168 h probabilities.
eal152020.dat is the e-deck that NHC wrote without any modification by me. Only the lines with "GN" in the 4th column are relevant to us. Note that the lat/lon information is blank and there is no information regarding forecast genesis time.
eal152020-model.dat is a modified version of eal152020.dat to illustrate a hypothetical probabilistic forecast from post-processed GFSO output. This is the format that TC-Gen would use.

  • Only the GN lines are included.
  • The storm number in the 2nd column was changed to an arbitrary number.
  • Lat/lon information is included in the 7th/8th columns.
  • Forecast genesis valid time is included in the 13th column.

Let me know if you have any questions about the data.

I looked in a few e-deck files and did not see any of the "GS" (genesis shape) entries. I'll confirm with NHC regarding whether they would like to verify the shape files.

edeck-two-sample-data.tar.txt

@jprestop jprestop added priority: blocker Blocker required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project and removed priority: high labels Jul 15, 2021
@JohnHalleyGotway
Collaborator Author

@kathryn Newman I want to get going on MET #1809. Reading through the details, the first thing I want to figure out is WHERE I should do this work. It could be in tc_pairs or tc_gen.
tc_pairs already includes an -edeck command line option for verifying probability of RI and writes probabilistic vx output for that.
tc_gen already includes the genesis algorithm and is meant to handle genesis "stuff" but it does NOT currently include an -edeck command line argument.
So should we have 1 tool (tc_pairs) that processes all the -edeck data? Or split that across 2 tools (tc_pairs and tc_gen) depending on the contents of the -edeck data file? (edited)

@JohnHalleyGotway
Collaborator Author

Held a project meeting on 10/28/21 and laid out plans:

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Nov 5, 2021

@halperin-erau question about the sample files you provided:

  • From eal152020-model.dat, I understand that the line below is a forecast that genesis will occur at (31, -78.5) on 8/31/2020 at 06Z. And there's a 20% chance this will actually occur within 120 hours of the issue time of 8/29/2020 at 18Z.
AL, 77, 2020082918, GN, GFSO, 120, 310N,  785W,  20,  120, JHT, genFcst, 2020083106, ,  0, 034,
  • From eal152020.dat, I understand that the line below is also a genesis forecast but the predicted genesis location and time are NOT included. And tc_gen should just ignore these lines.
AL, 15, 2020082918, GN, OFCL, 120,     ,      ,  20,  120, JLB, genFcst, , ,  0, 034, 
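
To make that reading of the two sample lines concrete, here is a minimal Python sketch. The function names and the returned dict layout are hypothetical, and the column positions are assumed from the sample lines above (lat/lon in tenths of a degree in columns 7 and 8, genesis time in column 13). It skips a line when either the location or the genesis time is blank, following the issue description; Question 1 below may change that rule.

```python
from datetime import datetime

def parse_latlon(text):
    """Convert ATCF tenths-of-degree text such as '310N' or '785W' to signed degrees."""
    text = text.strip()
    if not text:
        return None
    value = int(text[:-1]) / 10.0
    return -value if text[-1] in "SW" else value

def parse_gn_line(line):
    """Return a dict for a usable GN line, or None if the line should be skipped."""
    cols = [c.strip() for c in line.split(",")]
    if len(cols) < 13 or cols[3] != "GN":
        return None
    lat, lon = parse_latlon(cols[6]), parse_latlon(cols[7])
    if lat is None or lon is None or not cols[12]:
        print(f"WARNING: skipping GN line without location or genesis time: {line.strip()}")
        return None
    return {
        "basin": cols[0],
        "init": datetime.strptime(cols[2], "%Y%m%d%H"),
        "lat": lat,
        "lon": lon,
        "prob": int(cols[8]),
        "window": int(cols[9]),
        "genesis_time": datetime.strptime(cols[12], "%Y%m%d%H"),
    }

model_line = "AL, 77, 2020082918, GN, GFSO, 120, 310N,  785W,  20,  120, JHT, genFcst, 2020083106, ,  0, 034,"
ofcl_line = "AL, 15, 2020082918, GN, OFCL, 120,     ,      ,  20,  120, JLB, genFcst, , ,  0, 034, "

rec = parse_gn_line(model_line)      # lat 31.0, lon -78.5, genesis 2020-08-31 06Z
skipped = parse_gn_line(ofcl_line)   # None, with a warning printed
```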

QUESTION 1:
Is it the location that really matters rather than the time? For example, if the location is given but no time, should we include it in the verification?
QUESTION 2:

  • From eal152020.dat, I'm wondering about these 2 lines:
AL, 15, 2020083018, GN, OFCL,   0,     ,      , 100,    0,    , invest, 2020083018, al902020,  1, 034, 
AL, 15, 2020083118, GN, OFCL,   0,     ,      , 100,    0,    , genesis, 2020083118, al152020,  1, 034,

These have GN in the 4th column, indicating genesis e-deck info. But column 12 has "invest" and "genesis" instead of "genFcst". I assume we want to verify only GN lines that have "genFcst" in the 12th column.
@halperin-erau can you please confirm?
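
If the answer is yes, the selection could look like the following Python sketch. The helper names are hypothetical, and the rule itself is only the assumption stated above, pending confirmation.

```python
def gen_or_dis(line):
    """Return the 12th-column value of a GN line, or None for any other line."""
    cols = [c.strip() for c in line.split(",")]
    if len(cols) < 12 or cols[3] != "GN":
        return None
    return cols[11]

def select_genesis_forecasts(lines):
    """Keep only GN lines flagged 'genFcst'; drop 'invest', 'genesis', and the rest."""
    return [ln for ln in lines if gen_or_dis(ln) == "genFcst"]

# The three GN lines quoted from eal152020.dat in this comment.
lines = [
    "AL, 15, 2020082918, GN, OFCL, 120,     ,      ,  20,  120, JLB, genFcst, , ,  0, 034, ",
    "AL, 15, 2020083018, GN, OFCL,   0,     ,      , 100,    0,    , invest, 2020083018, al902020,  1, 034, ",
    "AL, 15, 2020083118, GN, OFCL,   0,     ,      , 100,    0,    , genesis, 2020083118, al152020,  1, 034,",
]
selected = select_genesis_forecasts(lines)
```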

Checking the edeck documentation, I see the following details:

TC GENESIS PROBABILITY

ProbItem - time period, ie genesis during next xxx hours, 0 for genesis or dissipate event, 0 - 240 hrs,  4 char.
Initials - forecaster initials,  3 char.
GenOrDis - "invest", "genFcst", "genesis", "disFcst" or "dissipate"
DTG - Genesis or dissipated event Date-Time-Group, yyyymmddhhmm: 0000010100 through 9999123123,  12 char.
stormID - cyclone ID if the genesis developed into an invest area or cyclone ID of dissipated TC, e.g. al032014
min - minutes, associated with DTG in common fields (3rd field in record), 0 - 59 min
genesisNum - genesis number, if spawned from a genesis area (1-999)
undefined - TBD

@halperin-erau

halperin-erau commented Nov 5, 2021 via email

JohnHalleyGotway added a commit that referenced this issue Nov 5, 2021
…mpile it. Define ATCF offsets specific to ATCF EDECK GN lines. Update the is_match() logic slightly by moving the checking for consistent valid times from the base class to the ProbRIRWInfo class. That way we can store all probabilities for the same genesis event (across multiple lead times) in the same object.
JohnHalleyGotway added a commit that referenced this issue Nov 5, 2021
@halperin-erau

halperin-erau commented Nov 12, 2021 via email

JohnHalleyGotway added a commit that referenced this issue Nov 12, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 12, 2021
…hat the tc-gen application can modify their contents.
JohnHalleyGotway added a commit that referenced this issue Nov 12, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 12, 2021
…rifying genesis probabilities. Still need to add a probgen_mpr line type, a unit test, and documentation.
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Nov 13, 2021

Needed doc updates:

  • Added -edeck command line option.
  • New TC-Gen config options for prob_genesis_thresh and output_flag (pct, pstd, pjc, prc).
  • Added PROB_LEAD and PROB_VAL to the GENMPR line type.
  • Description of probgen vx.

JohnHalleyGotway added a commit that referenced this issue Nov 13, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 13, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 13, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 13, 2021
…ched pair objects. This is cleaner than the last kludgy solution.
This was linked to pull requests Nov 14, 2021
@JohnHalleyGotway JohnHalleyGotway removed a link to a pull request Nov 14, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 15, 2021
JohnHalleyGotway added a commit that referenced this issue Nov 15, 2021
@JohnHalleyGotway JohnHalleyGotway removed a link to a pull request Nov 16, 2021
@JohnHalleyGotway JohnHalleyGotway linked a pull request Nov 16, 2021 that will close this issue
@JohnHalleyGotway JohnHalleyGotway linked a pull request Nov 16, 2021 that will close this issue
@JohnHalleyGotway JohnHalleyGotway changed the title from "Enhance tc_gen to verify genesis probabilities from ATCF e-deck files." to "Enhance TC-Gen to verify genesis probabilities from ATCF e-deck files." Nov 16, 2021
4 participants