Fixes and adjustments to t_filters_parallel #3714

Merged

Conversation

jhendersonHDF (Collaborator)

Broadcast number of datasets to create in multi-dataset I/O cases so that interference with random number generation doesn't cause mismatches between ranks

Set fill time for datasets to "never" by default and adjust on a per-test basis to avoid writing fill values to chunks when it's unnecessary

Reduce number of loops run in some tests when performing multi-dataset I/O

Fix an issue in the "fill time never" test where data verification could fail if file space reuse causes application buffers to be filled with the chosen fill value when reading from datasets with uninitialized storage

Skip multi-chunk I/O test configurations for multi-dataset I/O configurations when the TestExpress level is > 1 since those tests can be more stressful on the file system

jhendersonHDF added the Merge - To 1.14, Priority - 2. Medium, Component - Parallel, Component - Testing, and Type - Improvement labels on Oct 19, 2023
n_dsets = (rand() % (MAX_NUM_DSETS_MULTI - 1)) + 2;

if (MAINPROCESS)
    n_dsets = (rand() % (MAX_NUM_DSETS_MULTI - 1)) + 2;
jhendersonHDF (Collaborator, Author)

On some systems, other software in the stack appears to be causing rand() calls to get out of sync between ranks, which is responsible for hangs on those systems during the multi-dataset I/O tests. Broadcast these values from rank 0 to ensure they're consistent across every rank.
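
For reference, a minimal sketch of the pattern described above, reusing the names from the snippet (MAINPROCESS, n_dsets, MAX_NUM_DSETS_MULTI); the int datatype, the MPI_COMM_WORLD communicator, and the VRFY message text are assumptions rather than the exact code from this change:

/* Rank 0 picks the random dataset count, then broadcasts it so every
 * rank uses the same value even if rand() streams diverge between ranks */
if (MAINPROCESS)
    n_dsets = (rand() % (MAX_NUM_DSETS_MULTI - 1)) + 2;

VRFY((MPI_SUCCESS == MPI_Bcast(&n_dsets, 1, MPI_INT, 0, MPI_COMM_WORLD)), "MPI_Bcast succeeded");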

/* Determine number of loops to run through */
num_loops = WRITE_UNSHARED_ONE_UNLIM_DIM_NLOOPS;
if ((test_mode == USE_MULTIPLE_DATASETS) || (test_mode == USE_MULTIPLE_DATASETS_MIXED_FILTERED))
    num_loops /= 2;
jhendersonHDF (Collaborator, Author)

Reduce the number of loops run in some of these tests for the multi-dataset I/O case, since the extra iterations only amplify the amount of testing without adding coverage.

@@ -5250,7 +5311,6 @@ test_read_filtered_dataset_all_no_selection(const char *parent_group, H5Z_filter
void *read_bufs[MAX_NUM_DSETS_MULTI] = {0};
hsize_t dataset_dims[READ_ALL_NO_SELECTION_FILTERED_CHUNKS_DATASET_DIMS];
hsize_t chunk_dims[READ_ALL_NO_SELECTION_FILTERED_CHUNKS_DATASET_DIMS];
hsize_t sel_dims[READ_ALL_NO_SELECTION_FILTERED_CHUNKS_DATASET_DIMS];
jhendersonHDF (Collaborator, Author)

unused

* a safe comparison in theory.
*/
VRFY((0 != memcmp(read_bufs[dset_idx], fill_buf, read_buf_size)), "Data verification succeeded");
}
jhendersonHDF (Collaborator, Author)

Remove two data verification sections from this particular test. They try to ensure that the chosen fill value isn't returned from reads of a dataset with uninitialized storage, but file space reuse means there's no real guarantee about what comes back from those reads, and these checks were occasionally failing on some systems.

*/
if (test_express_level_g > 1) {
    if (((test_mode == USE_MULTIPLE_DATASETS) || (test_mode == USE_MULTIPLE_DATASETS_MIXED_FILTERED)) &&
        (chunk_opt != H5FD_MPIO_CHUNK_ONE_IO))
jhendersonHDF (Collaborator, Author)

When the TestExpress level is > 1, skip testing of multi-chunk I/O in the multi-dataset I/O case since it tends to just hammer the filesystem and the multi_dset test already does a bit of testing of this case.
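
As a rough illustration of how such a skip could look inside the loop over chunk I/O optimization modes (the continue statement and the surrounding loop structure are assumptions, not the exact code):

/* With TestExpress > 1, skip the heavier multi-chunk I/O configurations
 * for the multi-dataset cases; the multi_dset test already covers some of this */
if (test_express_level_g > 1) {
    if (((test_mode == USE_MULTIPLE_DATASETS) || (test_mode == USE_MULTIPLE_DATASETS_MIXED_FILTERED)) &&
        (chunk_opt != H5FD_MPIO_CHUNK_ONE_IO))
        continue;
}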

* when they're going to be fully overwritten anyway.
* Individual tests will alter this behavior as necessary.
*/
VRFY((H5Pset_fill_time(dcpl_id, H5D_FILL_TIME_NEVER) >= 0),
jhendersonHDF (Collaborator, Author)

Set the fill time to "never" by default so that we can avoid needlessly writing fill values out to chunks in some cases
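
As a sketch of the per-test adjustment mentioned above, a test that does rely on fill values would switch the fill time back on its own DCPL; H5D_FILL_TIME_IFSET is just one possible choice here and the exact value varies by test:

/* Override the new "never" default before creating datasets in a test
 * that depends on fill values actually being written to chunks */
VRFY((H5Pset_fill_time(dcpl_id, H5D_FILL_TIME_IFSET) >= 0), "Set fill time succeeded");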

jhendersonHDF force-pushed the t_parallel_filters_mdio_adjust branch from c0e4ea0 to 9dccca5 on October 19, 2023 00:42
lrknox (Collaborator) commented Oct 19, 2023

Tested on perlmutter: passed in 13 seconds now.

jhendersonHDF force-pushed the t_parallel_filters_mdio_adjust branch from 9dccca5 to 8e46bab on October 19, 2023 19:04
jhendersonHDF (Collaborator, Author)

@lrknox Made one last small change to turn off persistent file free space management so that we don't run into the occasional infinite loop in the library's free space management code. If it's not too much trouble, could you try running on Perlmutter again just to make sure nothing's really changed there? Going to try to get a reproducer for the infinite loop issue.
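
For context, persistent free-space tracking is controlled through the file creation property list; a hedged sketch of keeping it disabled is below (the chosen strategy, the threshold value, and whether the test sets this explicitly at all are assumptions):

/* Create test files without persisting free-space managers in the file
 * (persist = false), avoiding the free space manager code path on reopen */
hid_t fcpl_id = H5Pcreate(H5P_FILE_CREATE);
VRFY((fcpl_id >= 0), "FCPL creation succeeded");
VRFY((H5Pset_file_space_strategy(fcpl_id, H5F_FSPACE_STRATEGY_FSM_AGGR, false, (hsize_t)1) >= 0),
     "H5Pset_file_space_strategy succeeded");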

Broadcast number of datasets to create in multi-dataset I/O cases
so that interference with random number generation doesn't cause
mismatches between ranks

Set fill time for datasets to "never" by default and adjust on a
per-test basis to avoid writing fill values to chunks when it's
unnecessary

Reduce number of loops run in some tests when performing multi-dataset
I/O

Fix an issue in the "fill time never" test where data verification
could fail if file space reuse causes application buffers to be
filled with the chosen fill value when reading from datasets with
uninitialized storage

Skip multi-chunk I/O test configurations for multi-dataset I/O
configurations when the TestExpress level is > 1 since those
tests can be more stressful on the file system

Disable use of persistent file free space management for now
since it occasionally runs into an infinite loop in the library's
free space management code
lrknox (Collaborator) commented Oct 19, 2023

Sure.

lrknox (Collaborator) commented Oct 19, 2023

Done. Still good - ran 39 test configurations:

All Parallel Filters tests passed - total test time was 10.150296 seconds

Test time = 44.65 sec

Test Passed.
"MPI_TEST_t_filters_parallel" end time: Oct 19 14:42 PDT
"MPI_TEST_t_filters_parallel" time elapsed: 00:00:44

derobins merged commit af56339 into HDFGroup:develop on Oct 19, 2023
40 checks passed
jhendersonHDF deleted the t_parallel_filters_mdio_adjust branch on October 20, 2023 18:03
jhendersonHDF added a commit to jhendersonHDF/hdf5 that referenced this pull request Oct 20, 2023
derobins added a commit that referenced this pull request Oct 21, 2023
* Correct ld in format strings in cmpd_dset.c (#3697)

Removes clang warnings

* Clean up comments. (#3695)

* Add NVidia compiler support and CI (#3686)

* Work around Theta system issue failure in links test (#3710)

When the Subfiling VFD is enabled, the links test may
try to initialize the Subfiling VFD and call MPI_Init_thread.
On Theta, this appears to have an issue that will cause
the links test to fail. Reworked the test to check for
the same conditions in a more roundabout way that doesn't
involve initializing the Subfiling VFD

* Fix issue with unmatched messages in ph5diff (#3719)

* provide an alternative to mapfile for older bash (#3717)

* Attempt to quiet some warnings with cray compilers. (#3724)

* Fix CMake VOL passthrough tests by copying files to correct directory (#3721)

* Develop intel split (#3722)

* Split intel compiler flags into sub-folders
* Update Intel options for warnings
* Mostly CMake, Autotools needs additional work

* Fixes and adjustments to t_filters_parallel (#3714)

* Suppress cast-qual warning in H5TB Fortran wrapper (#3728)

This interface is fundamentally broken, const-wise.

* Add new API function H5Pget_actual_select_io_mode() (#2974)

This function allows the user to determine if the library performed selection I/O, vector I/O, or scalar (legacy) I/O during the last HDF5 operation performed with the provided DXPL. Expanded existing tests to check this functionality.

* Test scripts now execute in-source with creation of tmp dir (#3723)

Fixes a few issues created in #3580:

* Fixes a problem where committed tools test files were deleted when cleaning after an in-source build
* Fixes issues with test file paths in Autotools tools test scripts

* Add -h and --help as flags in h5cc & h5fc (#3729)

Adds these common help flags in addition to -help

* Update the library version matrix for H5Pset_libver_bounds() (#3702)

* Fixed #3524

Added 1.12, 1.14, and 1.16 to the table for libver bounds in the H5Pset_libver_bounds docs.

* Remove references to LIBVER_V116

---------

Co-authored-by: H. Joe Lee <[email protected]>
Co-authored-by: Allen Byrne <[email protected]>
Co-authored-by: Scot Breitenfeld <[email protected]>
Co-authored-by: Dana Robinson <[email protected]>
Co-authored-by: Neil Fortner <[email protected]>
Co-authored-by: Glenn Song <[email protected]>
Co-authored-by: bmribler <[email protected]>