Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature #2395 TOTAL_DIR #2892

Merged
merged 8 commits into from
May 22, 2024
Merged

Feature #2395 TOTAL_DIR #2892

merged 8 commits into from
May 22, 2024

Conversation

JohnHalleyGotway
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway commented May 16, 2024

Expected Differences

  • Do these changes introduce new tools, command line arguments, or configuration file options? [No]

    If yes, please describe:

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [Yes]

    If yes, please describe:

This PR adds the TOTAL_DIR column to the VL1L2, VAL1L2, and VCNT line types. It adds this column BEFORE all the stats previously added for #2395. Since these new stats have not been included in a public release, adding this in the current "middle" of the line type is fine. It also updates Stat-Analysis to use those counts when aggregating the stats from those line types that begin with DIR_.

Pull Request Testing

  • Describe testing already performed for these changes:

The code for this feature branch is compiled as the met_test user in seneca:/d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-feature_2395_TOTAL_DIR.

  1. Tested aggregating VL1L2 lines:
bin/stat_analysis -lookin out/point_stat/point_stat_360000L_20070331_120000V_vl1l2.txt -vx_mask DTC165,DTC166 -interp_mthd NEAREST -job aggregate -line_type VL1L2 -dump_row dump_vl1l2.txt

Input:

TOTAL_DIR  DIR_ME  DIR_MAE    DIR_MSE
685 -1.75317 47.69438 4272.77589
2494 -4.9744  28.90333 1909.85557

Output:

TOTAL_DIR  DIR_ME  DIR_MAE    DIR_MSE
3179 -4.2803 32.95236 2419.00952

And these are the correct weighted averages.

  1. Tested printing summaries of these columns.
bin/stat_analysis -lookin out/point_stat/point_stat_360000L_20070331_120000V.stat -vx_mask DTC165,DTC166 -interp_mthd NEAREST -job summary -line_type VCNT -column TOTAL,TOTAL_DIR,DIR_ME

And it's parse the correct values from the correct columns.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

The code for this feature branch is compiled as the met_test user in seneca:/d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-feature_2395_TOTAL_DIR.

Please review code changes, doc updates, and test as you see fit.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]

Updated the tables to describe the output line types.

  • Do these changes include sufficient testing updates? [Yes]

I added to new jobs to an existing Stat-Analysis config file to demonstrate the processing of VL1L2 lines.
This generates 2 new output files:

stat_analysis_ps/CONFIG_POINT_STAT_agg_vl1l2.stat
stat_analysis_ps/CONFIG_POINT_STAT_agg_stat_vl1l2_to_vcnt.stat
  • Will this PR result in changes to the MET test suite? [Yes]

    If yes, describe the new output and/or changes to the existing output:

Differences are flagged in all VCNT, VL1L2, and VAL1L2 output lines.

I downloaded and inspected the diff and log artifacts from the GHA run.

  1. Confirmed there are no more warnings about 0-vectors.
  2. Diffs exist in 82 output files:
  • 48 .stat files and 34 .txt files.
  • 79 Point-Stat, 12 Grid-Stat, and 1 Stat-Analysis

I ran the commands listed below to confirm that all these diffs are limited to the VCNT, VL1L2, and VAL1L2 line types:

rm -f diff_line_types
for OUTPUT in `find ./ -type f | grep OUTPUT`; do
  TRUTH=`echo $OUTPUT | sed 's/OUTPUT/TRUTH/g'`;
  diff -b $TRUTH $OUTPUT | egrep "V12.0.0" | awk '{print $25}' | sort -u >> diff_line_types
done
cat diff_line_types | sort -u

Here's the result:

VAL1L2
VCNT
VL1L2
  • Will this PR result in changes to existing METplus Use Cases? [Yes]

    If yes, create a new Update Truth METplus issue to describe them.

Differences are flagged in all VCNT, VL1L2, and VAL1L2 output lines.
See this dtcenter/METplus#2592 issue to update the truth data.

  • Do these changes introduce new SonarQube findings? [Yes]

    If yes, please describe:

The SonarQube quality gate flag failed in this run with 0 new bugs but 7 new code smells.

I did modify the code to address 4 of the 7 ones flagged (Global variables should be const. and Add the "static" storage specifier to this declaration).

But chose not to address the other 3 (Extract the assignment from this expression. and Refactor this function to reduce its Cognitive Complexity from XX to the 25 allowed.)

  • Please complete this pull request review by [Tues 5/21/24].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the version that will include these changes
    Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

… superceded by the new dcount and dacount VL1L2Info members to keep track of the number of valid wind direction vectors.
…types and update the header column tables.
…in the VL1L2, VAL1L2, and VCNT line types.
…d use the values to aggregate results when needed.
@JohnHalleyGotway JohnHalleyGotway added this to the MET 12.0.0 milestone May 16, 2024
@JohnHalleyGotway JohnHalleyGotway linked an issue May 16, 2024 that may be closed by this pull request
24 tasks
…' to satisfy the finding that 'Global variables should be const.' Should probably switch from 'char char *' to strings eventually. But for now, I'm just making up for some SonarQube technical debt.
Copy link
Collaborator

@georgemccabe georgemccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look good to add an additional line to the vector line types. I see that the SonarQube scan failed but this was addressed in the PR notes.

I see that the test run fails during the difference step, but the errors seem to imply that the test function failed.

COMPARING ref_config_lead_00/point_stat/AFWAv3.4_Noahv3.3/point_stat_AFWAv3.4_Noahv3.3_F000_WINDS_000000L_20110902_000000V.stat
file1: /data/output/met_test_truth/ref_config_lead_00/point_stat/AFWAv3.4_Noahv3.3/point_stat_AFWAv3.4_Noahv3.3_F000_WINDS_000000L_20110902_000000V.stat
file2: /data/output/met_test_output/ref_config_lead_00/point_stat/AFWAv3.4_Noahv3.3/point_stat_AFWAv3.4_Noahv3.3_F000_WINDS_000000L_20110902_000000V.stat
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 100 elements
In addition: Warning message:
In system(paste(strCmd, "2>&1"), intern = T) :
running command '/usr/bin/diff /data/output/met_test_truth/ref_config_lead_00/point_stat/AFWAv3.4_Noahv3.3/point_stat_AFWAv3.4_Noahv3.3_F000_WINDS_000000L_20110902_000000V.stat /data/output/met_test_output/ref_config_lead_00/point_stat/AFWAv3.4_Noahv3.3/point_stat_AFWAv3.4_Noahv3.3_F000_WINDS_000000L_20110902_000000V.stat 2>&1' had status 1
ERROR: compareStat() failed

Does this message indicate that there were differences in the output? If so, it may be good to improve the messaging so it is more clear that there are differences found instead implying that something failed in the diff logic.

…king 'DIR_', check 'DIR_ME', 'DIR_MAE', and 'DIR_MSE' to avoid an false positive match for the 'DIR_ERR' column which is computed from the vector partial sums rather than the individual direction differences.
@JohnHalleyGotway
Copy link
Collaborator Author

Thanks for taking a look @georgemccabe. I did just find/address a "false positive" match in the is_vector_dir_stat() function.

As for the diffs, yes, the rather brittle diff logic fails when the number of columns differ. Rather than enhancing the existing Rscript diff logic, I'd rather switch over to the Python-based one used by METplus soon.

@JohnHalleyGotway
Copy link
Collaborator Author

@bikegeek, FYI, I documented the METplus-Analysis impacts with dtcenter/METdataio#307 and dtcenter/METcalcpy#384.

Copy link
Collaborator

@georgemccabe georgemccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good. I see differences in the test output that is expected. I expected some METplus use cases to produce differences after this is merged into develop. This will likely require an update to the truth data. An Update Truth METplus GitHub issue should be created to describe the expected differences.

@JohnHalleyGotway JohnHalleyGotway removed the request for review from bikegeek May 22, 2024 17:27
@JohnHalleyGotway JohnHalleyGotway merged commit 889f1b2 into develop May 22, 2024
36 of 38 checks passed
@JohnHalleyGotway JohnHalleyGotway linked an issue May 22, 2024 that may be closed by this pull request
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

Update Truth: For dtcenter/MET#2892 Add new wind direction verification statistics for RMSE, Bias, and MAE
2 participants