Implement selection vector I/O with collective chunk filling #3826
Conversation
…hunk filling. Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization when computing max address.
src/H5Dchunk.c
Outdated
        H5MM_xfree(io_addrs);
        H5MM_xfree(io_wbufs);
        if (!all_same_block_len)
            H5MM_xfree(io_sizes);
Minor comment: I'd prefer to just always call H5MM_xfree here to make the code more robust against future changes. It's a no-op if io_sizes is NULL, so this won't cause problems.
The C standard explicitly says that calling free() on NULL pointers is acceptable.
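A minimal sketch of the unconditional-free pattern suggested above (illustrative only; it assumes H5MM_xfree(), like free(), treats a NULL argument as a no-op):

/* Free unconditionally instead of guarding on all_same_block_len;
 * freeing NULL is harmless, per C11 7.22.3.3 for free() */
H5MM_xfree(io_addrs);
H5MM_xfree(io_wbufs);
H5MM_xfree(io_sizes); /* safe even when io_sizes was never allocated */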
Fixed
src/H5FDmpio.c
Outdated
        if (count > 0)
            max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[count - 1]);
        /* Bug: need to account for fixed size optimization */
        if (count > 0) {
A few things here:
- If count == 1 this reads off the end of the s_sizes array (maybe just change the condition to count > 1)
- The sizes can actually end anywhere in the array, not just at index 1. I'd recommend checking the last element to see if it's 0, then scanning from index 1 and up until you see a 0. So maybe something like:
if (count > 1) {
    uint32_t idx = count - 1;

    /* A 0 in the last slot means the sizes array was compacted */
    if (s_sizes[idx] == 0)
        /* Scan for the terminating 0; the entry just before it holds
         * the size that applies to the remaining elements */
        for (idx = 0; idx < count - 1; idx++)
            if (s_sizes[idx + 1] == 0)
                break;

    max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[idx]);
}
Also the comment shouldn't proclaim it as a bug, just explain what it's doing
Actually my code is wrong too; I'll try to fix it later tonight. We might need to change the H5FD__mpio_vector_build_types function.
I think the <s_sizes> returned from H5FD__mpio_vector_build_types() can be (1) or (2) as follows:
- count entries of sizes, one per element
- Two entries for the same-size optimization: entry 0 is the size and entry 1 is 0
Maybe the better way is to change H5FD__mpio_vector_build_types().
It can be any number of entries between 1 (with a 0 terminator if count > 1) and count
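A small sketch of that convention (the helper name is hypothetical and only illustrates the layout described above, assuming the array is zero-filled past its valid entries):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: count the valid entries in a compacted sizes
 * array. With count > 1, a 0 entry terminates the array and the size
 * just before it applies to all remaining elements. */
static uint32_t
sizes_array_len(const size_t *s_sizes, uint32_t count)
{
    uint32_t len;

    if (count <= 1 || s_sizes[count - 1] != 0)
        return count; /* not compacted: one size per element */

    for (len = 1; len < count; len++)
        if (s_sizes[len] == 0)
            break;

    return len; /* entries [0, len) are valid; s_sizes[len] is the terminator */
}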
If we have time it would be great to add a test for this case. We would have to use the public H5FD interface
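For reference, a rough, untested sketch of the shape such a test could take (the file name, addresses, and sizes are made up; MPI setup and error checking are omitted; it assumes the public vector I/O convention that a 0 entry in sizes[] and H5FD_MEM_NOLIST in types[] mean "rest same as previous entry"):

#include "hdf5.h"
#include "H5FDdevelop.h" /* H5FDopen(), H5FDwrite_vector(), H5FDset_eoa() */
#include <mpi.h>

static void
test_same_size_vector_write(void)
{
    hid_t         fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5FD_t       *lf;
    H5FD_mem_t    types[2] = {H5FD_MEM_DRAW, H5FD_MEM_NOLIST};
    haddr_t       addrs[3] = {0, 1024, 2048};
    size_t        sizes[2] = {512, 0}; /* 0 terminator: remaining sizes are 512 */
    unsigned char buf[512] = {0};
    const void   *bufs[3]  = {buf, buf, buf};

    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    lf = H5FDopen("vec_test.h5", H5F_ACC_RDWR | H5F_ACC_CREAT, fapl, HADDR_UNDEF);
    H5FDset_eoa(lf, H5FD_MEM_DRAW, 4096);

    /* count == 3 while sizes[] holds only 2 valid entries, exercising the
     * max-address computation that previously indexed s_sizes[count - 1] */
    H5FDwrite_vector(lf, H5P_DATASET_XFER_DEFAULT, 3, types, addrs, sizes, bufs);

    H5FDclose(lf);
    H5Pclose(fapl);
}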
Applied the patch from Neil.
I will work on adding a test for this case.
For H5Dchunk.c: fix H5MM_xfree(). For H5FDmpio.c: 1) Revert the fix to H5FD__mpio_write_vector() 2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
The code looks fine and is much simpler, but it removes the part where the old code divided up the chunks among all the MPI ranks. The current approach has every MPI rank writing all of the chunks. Some MPI implementations may optimize for this, but I think we ran into a case where that doesn't necessarily happen, which is why we implemented the rank 0 bcast feature. I'd suggest we either have only rank 0 write out the chunks, with all the other ranks setting their I/O sizes to 0 (and maybe setting the bufs to NULL and the addrs to HADDR_UNDEF), or put back the logic of dividing up the work. Ideally the latter, since rank-0-only writes may not scale well.
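A bare sketch of the rank-0-only alternative (purely illustrative; the variable names are assumed, and every rank still has to make the collective call):

/* Only rank 0 contributes entries to the collective vector write;
 * the other ranks pass an empty vector */
uint32_t io_count = (mpi_rank == 0) ? num_chunks : 0;

if (mpi_rank != 0) {
    io_sizes[0] = 0;           /* zero-size participation */
    io_wbufs[0] = NULL;        /* no buffer contributed */
    io_addrs[0] = HADDR_UNDEF; /* no address contributed */
}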
See comment for discussion
I will work on putting back the logic of dividing up the work like the original H5D__chunk_collective_fill() routine.
…imilar to the original H5D__chunk_collective_fill() routine.
src/H5Dchunk.c
Outdated
@@ -5603,156 +5607,73 @@ H5D__chunk_collective_fill(const H5D_t *dset, H5D_chunk_coll_fill_info_t *chunk_

        /* Check if we have any chunks to write on this rank */
        if (num_blocks > 0 || (leftover && leftover > mpi_rank)) {
Minor nitpick with pre-existing code: the first leftover check is redundant since mpi_rank will never be less than 0. This could be: if (num_blocks > 0 || leftover > mpi_rank) {
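For context, a toy sketch of the division being discussed (the variable names follow the diff above; the total is made up):

#include <mpi.h>
#include <stdbool.h>
#include <stddef.h>

static bool
rank_has_work(size_t total_chunks)
{
    int mpi_rank, mpi_size;

    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    /* Base share per rank, plus one extra chunk on each of the
     * first `leftover` ranks */
    size_t num_blocks = total_chunks / (size_t)mpi_size;
    size_t leftover   = total_chunks % (size_t)mpi_size;

    /* mpi_rank is never negative, so "leftover && leftover > mpi_rank"
     * reduces to "leftover > mpi_rank" */
    return num_blocks > 0 || leftover > (size_t)mpi_rank;
}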
Done with this and subsequent comments.
        for (i = 0; i < (size_t)blocks; i++) {
            size_t idx = i + (size_t)(mpi_rank * blocks);

            if (need_sort)
Could you add some more comments to the blocks that don't have them, explaining what the code is doing?
Also, for this block in particular, add something like this note (I just discussed this with Scot):
Note that we sort all of the chunks here, and not just a subset corresponding to this rank. We do this since we have found MPI I/O to work better when each rank writes blocks that are contiguous in the file, and by sorting the full list we maximize the chance of that happening.
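A bare-bones sketch of the full-list sort described in that note (the struct and comparator names are hypothetical stand-ins, not the actual H5Dchunk.c types):

#include <stdlib.h>
#include "hdf5.h" /* haddr_t */

typedef struct {
    haddr_t addr; /* chunk's address in the file */
} chunk_rec_t;

/* Order chunk records by file address so that each rank's slice of
 * the sorted list is contiguous in the file */
static int
cmp_chunk_addr(const void *a, const void *b)
{
    haddr_t aa = ((const chunk_rec_t *)a)->addr;
    haddr_t bb = ((const chunk_rec_t *)b)->addr;

    return (aa > bb) - (aa < bb);
}

/* Every rank sorts the entire list, not just its own subset:
 * qsort(chunk_recs, num_chunks, sizeof(chunk_rec_t), cmp_chunk_addr); */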
It seems like we could be making extra work for ourselves by sorting here, since the MPI I/O VFD already scans the chunk list to check for this, and we don't necessarily need to impose the monotonically non-decreasing file displacement requirement on other parallel VFDs. That said, since the sort has to happen at some point anyway if the addresses are out of order, I'm not sure the above note is really needed here.
testpar/t_filters_parallel.c
Outdated
@@ -556,10 +556,20 @@ verify_chunk_opt_status(size_t num_dsets, test_mode_t test_mode, bool any_io, bo
* elements (because no elements were raw data), which is what happens when performing I/O on a | |||
* filtered dataset with no selection. Vector I/O does report an I/O call was made if passed a raw | |||
* data element of size 0, so this is consistent. */ | |||
/* Changes: for allocation, the reported I/O is vector I/O */
No need to call out "changes" in the code comment here. Just explain why we expect the different modes under each condition as the code stands with this PR
testpar/t_filters_parallel.c
Outdated
        VRFY((H5D_SCALAR_IO | H5D_VECTOR_IO) == actual_sel_io_mode_reduced,
             "verified actual selection I/O mode was scalar and vector I/O");
    else if (unalloc_read)
        /* Changes: for allocation, the reported I/O is vector I/O */
Same here. Explain that both collective filling/allocation and filtered I/O use vector I/O so only that is reported whether or not allocation happened.
@vchoi-hdfgroup Thanks for putting back the logic for dividing up the chunks among the ranks. This looks good to me and I don't really have anything to add to @fortnern's comments.
Looks good to me.
…p#3826)
* Changes for ECP-344: Implement selection vector I/O with collective chunk filling. Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization when computing max address.
* Fixes based on PR review comments: For H5Dchunk.c: fix H5MM_xfree(). For H5FDmpio.c: 1) Revert the fix to H5FD__mpio_write_vector() 2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
* Put back the logic of dividing up the work among all the MPI ranks, similar to the original H5D__chunk_collective_fill() routine.
* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().
Addresses ECP-344