Implement selection vector I/O with collective chunk filling #3826
Conversation
…hunk filling. Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization when computing max address.
src/H5Dchunk.c
Outdated
        H5MM_xfree(io_addrs);
        H5MM_xfree(io_wbufs);
        if (!all_same_block_len)
            H5MM_xfree(io_sizes);
Minor comment: I'd prefer to just always call H5MM_xfree here to make the code more robust against future changes. It's a no-op if io_sizes is NULL, so this won't cause problems.
The C standard explicitly says that calling free() on NULL pointers is acceptable.
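A minimal sketch of the unconditional-free pattern suggested above (illustrative only; it assumes H5MM_xfree(), like free(), treats a NULL argument as a no-op):

/* Free unconditionally instead of guarding on all_same_block_len;
 * freeing NULL is harmless, per C11 7.22.3.3 for free() */
H5MM_xfree(io_addrs);
H5MM_xfree(io_wbufs);
H5MM_xfree(io_sizes); /* safe even when io_sizes was never allocated */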
Fixed
src/H5FDmpio.c
Outdated
        if (count > 0)
            max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[count - 1]);
        /* Bug: need to account for fixed size optimization */
        if (count > 0) {
A few things here:
- If count == 1 this reads off the end of the s_sizes array (maybe just change the condition to count > 1)
- The sizes can actually end anywhere in the array, not just at index 1. I'd recommend checking the last element to see if it's 0, then scanning from index 1 and up until you see a 0. So maybe something like:
if (count > 1) {
    uint32_t idx = count - 1;

    /* A 0 in the last slot means the sizes array was compacted */
    if (s_sizes[idx] == 0)
        /* Scan for the terminating 0; the entry just before it holds
         * the size that applies to the remaining elements */
        for (idx = 0; idx < count - 1; idx++)
            if (s_sizes[idx + 1] == 0)
                break;

    max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[idx]);
}
Also the comment shouldn't proclaim it as a bug, just explain what it's doing
Actually my code is wrong too; I'll try to fix it later tonight. We might need to change the H5FD__mpio_vector_build_types function.
I think the <s_sizes> returned from H5FD__mpio_vector_build_types() can be (1) or (2) as follows:
- count entries of sizes, one per element
- Two entries for the same-size optimization: entry 0 is the size and entry 1 is 0
Maybe the better way is to change H5FD__mpio_vector_build_types().
It can be any number of entries between 1 (with a 0 terminator if count > 1) and count
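A small sketch of that convention (the helper name is hypothetical and only illustrates the layout described above, assuming the array is zero-filled past its valid entries):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: count the valid entries in a compacted sizes
 * array. With count > 1, a 0 entry terminates the array and the size
 * just before it applies to all remaining elements. */
static uint32_t
sizes_array_len(const size_t *s_sizes, uint32_t count)
{
    uint32_t len;

    if (count <= 1 || s_sizes[count - 1] != 0)
        return count; /* not compacted: one size per element */

    for (len = 1; len < count; len++)
        if (s_sizes[len] == 0)
            break;

    return len; /* entries [0, len) are valid; s_sizes[len] is the terminator */
}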
If we have time it would be great to add a test for this case. We would have to use the public H5FD interface
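For reference, a rough, untested sketch of the shape such a test could take (the file name, addresses, and sizes are made up; MPI setup and error checking are omitted; it assumes the public vector I/O convention that a 0 entry in sizes[] and H5FD_MEM_NOLIST in types[] mean "rest same as previous entry"):

#include "hdf5.h"
#include "H5FDdevelop.h" /* H5FDopen(), H5FDwrite_vector(), H5FDset_eoa() */
#include <mpi.h>

static void
test_same_size_vector_write(void)
{
    hid_t         fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5FD_t       *lf;
    H5FD_mem_t    types[2] = {H5FD_MEM_DRAW, H5FD_MEM_NOLIST};
    haddr_t       addrs[3] = {0, 1024, 2048};
    size_t        sizes[2] = {512, 0}; /* 0 terminator: remaining sizes are 512 */
    unsigned char buf[512] = {0};
    const void   *bufs[3]  = {buf, buf, buf};

    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    lf = H5FDopen("vec_test.h5", H5F_ACC_RDWR | H5F_ACC_CREAT, fapl, HADDR_UNDEF);
    H5FDset_eoa(lf, H5FD_MEM_DRAW, 4096);

    /* count == 3 while sizes[] holds only 2 valid entries, exercising the
     * max-address computation that previously indexed s_sizes[count - 1] */
    H5FDwrite_vector(lf, H5P_DATASET_XFER_DEFAULT, 3, types, addrs, sizes, bufs);

    H5FDclose(lf);
    H5Pclose(fapl);
}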
Applied the patch from Neil.
I will work on adding a test for this case.
For H5Dchunk.c: fix H5MM_xfree(). For H5FDmpio.c: 1) Revert the fix to H5FD__mpio_write_vector() 2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
The code looks fine and is much simpler, but it removes the part where the old code divided up the chunks among all the MPI ranks. The current approach has every MPI rank writing all of the chunks. Some MPI implementations may optimize for this, but I think we ran into a case where that doesn't necessarily happen, which is why we implemented the rank 0 bcast feature. I'd suggest we either have only rank 0 write out the chunks, with all the other ranks setting their I/O sizes to 0 (and maybe setting the bufs to NULL and the addrs to HADDR_UNDEF), or put back the logic of dividing up the work. Ideally the latter, since rank-0-only writes may not scale well.
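A bare sketch of the rank-0-only alternative (purely illustrative; the variable names are assumed, and every rank still has to make the collective call):

/* Only rank 0 contributes entries to the collective vector write;
 * the other ranks pass an empty vector */
uint32_t io_count = (mpi_rank == 0) ? num_chunks : 0;

if (mpi_rank != 0) {
    io_sizes[0] = 0;           /* zero-size participation */
    io_wbufs[0] = NULL;        /* no buffer contributed */
    io_addrs[0] = HADDR_UNDEF; /* no address contributed */
}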
See comment for discussion
I will work on putting back the logic of dividing up the work like the original H5D__chunk_collective_fill() routine.
…imilar to the original H5D__chunk_collective_fill() routine.
src/H5Dchunk.c
Outdated
@@ -5603,156 +5607,73 @@ H5D__chunk_collective_fill(const H5D_t *dset, H5D_chunk_coll_fill_info_t *chunk_

        /* Check if we have any chunks to write on this rank */
        if (num_blocks > 0 || (leftover && leftover > mpi_rank)) {
Minor nitpick with pre-existing code: the first leftover check is redundant since mpi_rank will never be less than 0. This could be: if (num_blocks > 0 || leftover > mpi_rank) {
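For context, a toy sketch of the division being discussed (the variable names follow the diff above; the total is made up):

#include <mpi.h>
#include <stdbool.h>
#include <stddef.h>

static bool
rank_has_work(size_t total_chunks)
{
    int mpi_rank, mpi_size;

    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    /* Base share per rank, plus one extra chunk on each of the
     * first `leftover` ranks */
    size_t num_blocks = total_chunks / (size_t)mpi_size;
    size_t leftover   = total_chunks % (size_t)mpi_size;

    /* mpi_rank is never negative, so "leftover && leftover > mpi_rank"
     * reduces to "leftover > mpi_rank" */
    return num_blocks > 0 || leftover > (size_t)mpi_rank;
}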
Done with this and subsequent comments.
        for (i = 0; i < (size_t)blocks; i++) {
            size_t idx = i + (size_t)(mpi_rank * blocks);

            if (need_sort)
Could you add some more comments to the blocks that don't have them, explaining what the code is doing?
Also, for this block in particular, add something like this note (I just discussed this with Scot):
Note that we sort all of the chunks here, and not just a subset corresponding to this rank. We do this since we have found MPI I/O to work better when each rank writes blocks that are contiguous in the file, and by sorting the full list we maximize the chance of that happening.
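A bare-bones sketch of the full-list sort described in that note (the struct and comparator names are hypothetical stand-ins, not the actual H5Dchunk.c types):

#include <stdlib.h>
#include "hdf5.h" /* haddr_t */

typedef struct {
    haddr_t addr; /* chunk's address in the file */
} chunk_rec_t;

/* Order chunk records by file address so that each rank's slice of
 * the sorted list is contiguous in the file */
static int
cmp_chunk_addr(const void *a, const void *b)
{
    haddr_t aa = ((const chunk_rec_t *)a)->addr;
    haddr_t bb = ((const chunk_rec_t *)b)->addr;

    return (aa > bb) - (aa < bb);
}

/* Every rank sorts the entire list, not just its own subset:
 * qsort(chunk_recs, num_chunks, sizeof(chunk_rec_t), cmp_chunk_addr); */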
It seems like we could be making extra work for ourselves by sorting here, since the MPI I/O VFD already scans the chunk list to check for this, and we don't necessarily need to impose the monotonically non-decreasing file displacement requirement on other parallel VFDs. That said, since the sort has to happen at some point anyway if the addresses are out of order, I'm not sure the above note is really needed here.
testpar/t_filters_parallel.c
Outdated
@@ -556,10 +556,20 @@ verify_chunk_opt_status(size_t num_dsets, test_mode_t test_mode, bool any_io, bo
* elements (because no elements were raw data), which is what happens when performing I/O on a | |||
* filtered dataset with no selection. Vector I/O does report an I/O call was made if passed a raw | |||
* data element of size 0, so this is consistent. */ | |||
/* Changes: for allocation, the reported I/O is vector I/O */
No need to call out "changes" in the code comment here. Just explain why we expect the different modes under each condition as the code stands with this PR
testpar/t_filters_parallel.c
Outdated
        VRFY((H5D_SCALAR_IO | H5D_VECTOR_IO) == actual_sel_io_mode_reduced,
             "verified actual selection I/O mode was scalar and vector I/O");
    else if (unalloc_read)
        /* Changes: for allocation, the reported I/O is vector I/O */
Same here. Explain that both collective filling/allocation and filtered I/O use vector I/O so only that is reported whether or not allocation happened.
@vchoi-hdfgroup Thanks for putting back the logic for dividing up the chunks among the ranks. This looks good to me and I don't really have anything to add to @fortnern's comments.
Looks good to me.
…p#3826)
* Changes for ECP-344: Implement selection vector I/O with collective chunk filling. Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization when computing max address.
* Fixes based on PR review comments: For H5Dchunk.c: fix H5MM_xfree(). For H5FDmpio.c: 1) Revert the fix to H5FD__mpio_write_vector() 2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
* Put back the logic of dividing up the work among all the MPI ranks, similar to the original H5D__chunk_collective_fill() routine.
* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().
Addresses ECP-344