Bugfix MDTest calculation of multiple iterations was incorrect. #281

Merged
merged 3 commits into master from fix-mdtest-iter
Nov 30, 2020

Collaborator

@JulianKunkel JulianKunkel commented Nov 26, 2020

Fix the bug reported by Rick to increase clarity. Thanks!

The previous offset calculation when using multiple iterations was:
for (i = start; i < stop; i++) // i = table position == test number
  for (k = 0; k < size; k++)
    for (j = 0; j < iterations; j++)
      value = all[(k * tableSize * iterations) + (j * tableSize) + i];

Note that the mean and min/max were then computed over these values.
But because the values are stored in memory in the order iteration, rank, table,
the correct term is: value = all[j * tableSize * size + k * tableSize + i];

Assume iterations = 2 and size = 3. The value for test i=0 was then computed from:
all[0 * 2 *tbl + 0 * tbl] = 0tbl
all[0 * 2 *tbl + 1 * tbl] = 1tbl
all[1 * 2 *tbl + 0 * tbl] = 2tbl
all[1 * 2 *tbl + 1 * tbl] = 3tbl
all[2 * 2 *tbl + 0 * tbl] = 4tbl
all[2 * 2 *tbl + 1 * tbl] = 5tbl

A clearer traversal would have been:
all[0 * 3 *tbl + 0 * tbl] = 0tbl
all[0 * 3 *tbl + 1 * tbl] = 1tbl
all[0 * 3 *tbl + 2 * tbl] = 2tbl
all[1 * 3 *tbl + 0 * tbl] = 3tbl
all[1 * 3 *tbl + 1 * tbl] = 4tbl
all[1 * 3 *tbl + 2 * tbl] = 5tbl

In that sense, it wasn't a functional bug, but it decreased readability. Now that we want to print the performance of the individual ranks, it is useful to fix this.
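To make the "not a functional bug" claim concrete, here is a minimal sketch comparing the two index terms for the example above (iterations = 2, size = 3). The constants and the function names `old_index`, `new_index`, `sum_for_test`, and `count_mismatched_slots` are hypothetical and not taken from mdtest; `TABLE_SIZE = 4` is an arbitrary choice.

```c
#include <assert.h>

/* Hypothetical sizes matching the example above; not taken from mdtest. */
enum { ITERATIONS = 2, SIZE = 3, TABLE_SIZE = 4 };

/* Term used before the fix: reads the buffer as if ordered rank, iteration, table. */
static int old_index(int i, int j, int k)
{
    assert(i >= 0 && i < TABLE_SIZE && j >= 0 && j < ITERATIONS && k >= 0 && k < SIZE);
    return k * TABLE_SIZE * ITERATIONS + j * TABLE_SIZE + i;
}

/* Term after the fix: matches the storage order iteration, rank, table. */
static int new_index(int i, int j, int k)
{
    assert(i >= 0 && i < TABLE_SIZE && j >= 0 && j < ITERATIONS && k >= 0 && k < SIZE);
    return j * TABLE_SIZE * SIZE + k * TABLE_SIZE + i;
}

/* Sum of all values for test i, visiting every (rank, iteration) pair once. */
static long sum_for_test(const long *all, int i, int (*index_of)(int, int, int))
{
    long sum = 0;
    for (int k = 0; k < SIZE; k++)
        for (int j = 0; j < ITERATIONS; j++)
            sum += all[index_of(i, j, k)];
    return sum;
}

/* Number of (iteration, rank) pairs for which the old term reads a slot
 * that belongs to a different (iteration, rank) pair. */
static int count_mismatched_slots(int i)
{
    int mismatched = 0;
    for (int k = 0; k < SIZE; k++)
        for (int j = 0; j < ITERATIONS; j++)
            if (old_index(i, j, k) != new_index(i, j, k))
                mismatched++;
    return mismatched;
}
```

With these sizes, both terms visit the same six slots for every test, so any aggregate over them (sum, mean, min, max) is identical. However, four of the six (iteration, rank) pairs read a slot that belongs to a different pair, which matters as soon as per-rank values are reported.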

src/mdtest.c Outdated
for (j = 0; j < iterations; j++) {
curr = all[(k*tableSize*iterations)
+ (j*tableSize) + i];
for (j = 0; j < iterations; j++) {
Contributor

@adilger adilger Nov 28, 2020

Using more descriptive variable names than "i", "j", and "k", such as "iter" and "op", might avoid this kind of bug in the future.

Also, putting the "j*tableSize*size + k*tableSize + i" calculation into a small helper function like:

int calc_allreduce_index(int index, int iter, int op)

(or whatever) and using it in the places where all[] is accessed would also avoid similar usage bugs.
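A rough sketch of what such a helper could look like is below. This is only an illustration, not the code that was merged: the parameter list differs from the signature suggested above because `size` and `table_size` are passed explicitly here rather than assumed to be globals, and `mean_for_op` is a hypothetical caller added to show the usage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps (iteration, rank, operation) to a position in a
 * buffer stored in the order iteration, rank, table.
 * Equivalent to iter * table_size * size + rank * table_size + op. */
static size_t calc_allreduce_index(int iter, int rank, int op,
                                   int size, int table_size)
{
    assert(size > 0 && table_size > 0);
    assert(iter >= 0);
    assert(rank >= 0 && rank < size);
    assert(op >= 0 && op < table_size);
    return ((size_t)iter * size + rank) * table_size + op;
}

/* Hypothetical caller: mean of one operation over all ranks and iterations. */
static double mean_for_op(const double *all, int op,
                          int iterations, int size, int table_size)
{
    double sum = 0.0;
    for (int rank = 0; rank < size; rank++)
        for (int iter = 0; iter < iterations; iter++)
            sum += all[calc_allreduce_index(iter, rank, op, size, table_size)];
    return sum / ((double)iterations * size);
}
```

Keeping the index arithmetic in one place means the storage order is written down once, and the named parameters make it harder to swap the rank and iteration terms at a call site.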

Contributor

adilger commented Nov 28, 2020

> The previous offset calculation when using multiple iterations was:

This kind of information would be very useful in the commit message of the patch itself, rather than just the PR, since it will be easily available with the code in the future.

@JulianKunkel JulianKunkel merged commit 4a3e480 into master Nov 30, 2020
@JulianKunkel JulianKunkel deleted the fix-mdtest-iter branch November 30, 2020 14:17
@glennklockwood glennklockwood mentioned this pull request Nov 30, 2020
2 tasks
JulianKunkel added a commit that referenced this pull request Dec 2, 2020
JulianKunkel added a commit that referenced this pull request Dec 3, 2020
Backmerged #281 to fix iteration number
glennklockwood added a commit to glennklockwood/ior that referenced this pull request Dec 18, 2020
glennklockwood added a commit that referenced this pull request Dec 22, 2020