Improved version of 2a3a and 3b2b transposes #363

Merged: 28 commits merged Apr 11, 2022

@orvedahl (Contributor) commented Apr 9, 2022

This PR provides an option to use slightly faster versions of the 2a3a and 3b2b transposes. The original transpose routines (which remain the default) assume that the m values are stored as high/low pairs: m = 0, m_max, 1, m_max-1, 2, m_max-2, ... Groups of these pairs are then distributed across process rows/columns. For a process that holds the first 3 pairs of m values, the local order is m = 0, m_max, 1, m_max-1, 2, m_max-2. Accessing the m values in increasing order is therefore not contiguous, which incurs a small performance hit.
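To make the original layout concrete, here is a minimal Python sketch (not Rayleigh's Fortran code). It builds the high/low-paired ordering, takes the chunk a process would hold for the first 3 pairs, and reports which storage slot each m value lands in when the values are visited in increasing m. The helper names `paired_order`, `local_chunk`, and `storage_positions`, along with the choice `m_max = 10`, are hypothetical and exist only for illustration.

```python
def paired_order(m_max):
    """Original ordering: high/low pairs 0, m_max, 1, m_max-1, ...
    If m_max is even, the middle value has no partner and appears once."""
    order = []
    lo, hi = 0, m_max
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo += 1
        hi -= 1
    return order


def local_chunk(order, first_pair, n_pairs):
    """Slice held by one process, assuming it owns consecutive pairs
    (hypothetical assignment, for illustration only)."""
    return order[2 * first_pair : 2 * (first_pair + n_pairs)]


def storage_positions(chunk):
    """Storage slot of each m value when the m values are visited in increasing order."""
    slot = {m: idx for idx, m in enumerate(chunk)}
    return [slot[m] for m in sorted(chunk)]


m_max = 10  # hypothetical truncation, chosen for illustration
chunk = local_chunk(paired_order(m_max), 0, 3)
print(chunk)                     # [0, 10, 1, 9, 2, 8]
print(storage_positions(chunk))  # [0, 2, 4, 5, 3, 1]
```

Visiting m = 0, 1, 2, 8, 9, 10 jumps through slots 0, 2, 4, 5, 3, 1. That non-unit stride is the access pattern the PR describes as slightly slower.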

The new 2a3a transpose routine relies on a different storage order for the m values, in which each process's local chunk of m values is contiguous. For a process that holds the first 3 pairs of m values, the order is now m = 0, 1, 2, m_max, m_max-1, m_max-2, so the m values can be accessed contiguously.
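Below is a minimal Python sketch of the new local ordering, assuming (as above) that a process owns a run of consecutive high/low pairs. It places all low members of the owned pairs first, in ascending order, followed by all high members. The helper names `pairs` and `contiguous_chunk` and the value `m_max = 10` are hypothetical. The sketch only reproduces the ordering stated in the PR. It is not the actual 2a3a routine.

```python
def pairs(m_max):
    """High/low pairs (0, m_max), (1, m_max-1), ...; an unpaired middle value stands alone."""
    out = []
    lo, hi = 0, m_max
    while lo <= hi:
        out.append((lo,) if lo == hi else (lo, hi))
        lo += 1
        hi -= 1
    return out


def contiguous_chunk(pair_list):
    """New ordering: all low members first (ascending), then all high members."""
    lows = [p[0] for p in pair_list]
    highs = [p[1] for p in pair_list if len(p) == 2]
    return lows + highs


m_max = 10  # hypothetical truncation, chosen for illustration
chunk = contiguous_chunk(pairs(m_max)[:3])
print(chunk)  # [0, 1, 2, 10, 9, 8]
```

Each half is now a unit-stride run. The low m values 0, 1, 2 sit in slots 0 to 2, and the high values m_max, m_max-1, m_max-2 sit in slots 3 to 5.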

The new 3b2b transpose routine simply reorders some of the loops to match Fortran's column-major array layout, so that the innermost loop runs over the leftmost (fastest-varying) array index.
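The benefit of matching the loop order to column-major storage can be seen by computing the memory offsets a Fortran array `a(i, j, k)` would use. This Python sketch does exactly that, using 0-based indices. The array shape (2, 3, 4) and the helper names are hypothetical and are not taken from the 3b2b routine.

```python
def column_major_offset(i, j, k, ni, nj):
    """Linear offset of a(i, j, k) in a column-major (Fortran-order) array, 0-based."""
    return i + ni * (j + nj * k)


def visit_offsets(ni, nj, nk, innermost_first):
    """Memory offsets in the order a triple loop touches them.

    innermost_first=True:  i varies fastest (do k / do j / do i), Fortran-friendly.
    innermost_first=False: k varies fastest (do i / do j / do k), strided access.
    """
    offsets = []
    if innermost_first:
        for k in range(nk):
            for j in range(nj):
                for i in range(ni):
                    offsets.append(column_major_offset(i, j, k, ni, nj))
    else:
        for i in range(ni):
            for j in range(nj):
                for k in range(nk):
                    offsets.append(column_major_offset(i, j, k, ni, nj))
    return offsets


def strides(offsets):
    """Distance in memory between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


good = visit_offsets(2, 3, 4, innermost_first=True)
bad = visit_offsets(2, 3, 4, innermost_first=False)
print(strides(good)[:5])  # [1, 1, 1, 1, 1]
print(strides(bad)[:5])   # [6, 6, 6, -16, 6]
```

With the leftmost index innermost, every access is stride 1. With the loops nested the other way, the same elements are visited with large jumps through memory.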

@feathern (Contributor) left a comment

Approved. Thanks for this too, Ryan. Good bit of optimization here.

@feathern (Contributor) commented

There was a small conflict in changelog.md that I resolved. I'm merging this now.

@feathern merged commit 71f0799 into geodynamics:master on Apr 11, 2022