Interpolation to/from sub-meshes causes parallel deadlock #3173

Closed
garth-wells opened this issue Apr 26, 2024 · 3 comments · Fixed by #3177
Labels
bug (Something isn't working), ci (Continuous Integration), high-priority

Comments

@garth-wells
Member

garth-wells commented Apr 26, 2024

PR #3114 causes deadlock in parallel, see https://github.com/FEniCS/dolfinx/actions/runs/8838261829.

Also see #3114 (comment).

The logic in

    def interpolate(

and in

    def _(expr: Expression, cells: typing.Optional[np.ndarray] = None):

is muddled and conflates None and len(foo) == 0.

garth-wells added the bug, high-priority and ci labels Apr 26, 2024
@jorgensd
Member

The issue is that, since we do not associate any relationships between meshes in the objects, it is hard to distinguish between non-matching meshes and a sub-mesh/parent-mesh relation when a process has no cells, in these two cases:

  1. Have parent cells, no submesh cells

  2. No parent cells, no submesh cells (no cells for either nonmatching grid).

Case 1 is easy to fix. Case 2, however, is challenging to fix without adding a flag to interpolate stating whether we want to call non-matching interpolation (as this uses MPI communication).
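To illustrate why a purely local decision is fragile, here is a toy sketch in plain Python (no MPI). The function `guess_path` and its rule are hypothetical and are not the DOLFINx logic; they only mimic the situation where each rank guesses from its own cell counts whether to take the collective (non-matching) path. If ranks disagree, only some of them enter the collective call and the run hangs.

```python
def guess_path(num_parent_cells: int, num_submesh_cells: int) -> str:
    """Toy per-rank guess (hypothetical, not DOLFINx logic)."""
    # With cells from both meshes on this rank, a cell map can be applied locally.
    if num_parent_cells > 0 and num_submesh_cells > 0:
        return "local"
    # Otherwise this rank cannot tell a sub-mesh relation from non-matching
    # meshes, so it falls back to the path that needs collective communication.
    return "collective"


# (parent cells, sub-mesh cells) owned by each of three simulated ranks
ranks = [(4, 2), (3, 0), (0, 0)]
paths = [guess_path(p, s) for p, s in ranks]

# A collective operation only completes if every rank takes part in it.
ranks_agree = len(set(paths)) == 1
```

In this toy setup the first rank stays on the local path while the other two wait in the collective one, which is the shape of the deadlock described above.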

@garth-wells
Member Author

> The issue is that, since we do not associate any relationships between meshes in the objects, it is hard to distinguish between non-matching meshes and a sub-mesh/parent-mesh relation when a process has no cells, in these two cases:
>
>   1. Have parent cells, no submesh cells
>   2. No parent cells, no submesh cells (no cells for either nonmatching grid).
>
> Case 1 is easy to fix. Case 2, however, is challenging to fix without adding a flag to interpolate stating whether we want to call non-matching interpolation (as this uses MPI communication).

The first problem seems to be that the code conflates None and len(foo) == 0, which breaks when a sub-mesh has no cells on a rank. This is a failure in logic.
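A minimal sketch of the None versus empty distinction, in plain Python. The names `resolve_cells_buggy`, `resolve_cells` and `num_local_cells` are made up for illustration and are not taken from the DOLFINx source; plain sequences stand in for the NumPy arrays used there.

```python
from typing import List, Optional, Sequence


def resolve_cells_buggy(cells: Optional[Sequence[int]], num_local_cells: int) -> List[int]:
    # Bug: an empty sequence is falsy, so "this rank has no cells" is
    # treated the same as "the caller did not pass cells at all".
    if not cells:
        return list(range(num_local_cells))
    return list(cells)


def resolve_cells(cells: Optional[Sequence[int]], num_local_cells: int) -> List[int]:
    # Only None means "use the default"; an empty sequence is a valid
    # request to work on zero cells on this rank.
    if cells is None:
        return list(range(num_local_cells))
    return list(cells)
```

With three local cells, `resolve_cells_buggy([], 3)` silently expands to all three cells, whereas `resolve_cells([], 3)` keeps the empty selection and only `resolve_cells(None, 3)` falls back to the default.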

The second is a design failure - one function doing too many different things. And it's not clear from the argument descriptions what is going on.

Unless there is an easy and good fix, the change should probably be wound back until it can be done properly.

@jorgensd
Member

Working on a fix where a user who wants non-matching interpolation has to pass in an enum indicating this. It should not affect usage of normal interpolation.
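A rough sketch of the idea, assuming a hypothetical enum `InterpolationKind` and helper `choose_path`. Neither name is the API that was eventually merged; the point is only that the branch depends on an explicit argument that is the same on every rank, not on how many cells a rank happens to own.

```python
from enum import Enum, auto
from typing import Sequence


class InterpolationKind(Enum):
    """Hypothetical flag; not the enum used in DOLFINx."""

    MATCHING = auto()
    NONMATCHING = auto()


def choose_path(kind: InterpolationKind, local_cells: Sequence[int]) -> str:
    # The decision uses only the caller's flag. local_cells may be empty on
    # some ranks, and that must not change which branch is taken.
    if kind is InterpolationKind.NONMATCHING:
        return "collective"
    return "local"


# Simulated ranks with different local cell sets, including empty ones
cells_per_rank = [[0, 1, 2], [], [5]]
nonmatching_paths = {choose_path(InterpolationKind.NONMATCHING, c) for c in cells_per_rank}
matching_paths = {choose_path(InterpolationKind.MATCHING, c) for c in cells_per_rank}
```

Because every rank receives the same flag, all ranks either enter the collective path together or skip it together.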
