
[BUG]: VTXWriter outputs erroneous data in parallel when outputting collapsed space #2929

Closed
nate-sime opened this issue Dec 12, 2023 · 11 comments · Fixed by #3091
Labels: bug (Something isn't working)

@nate-sime (Contributor) commented Dec 12, 2023

Summarize the issue

Using VTXWriter to output a DG function and one of its collapsed sub-functions yields incorrect data in parallel.

How to reproduce the bug

The following MWE and the attached ParaView rendering illustrate the issue. The image was produced after running with 5 MPI processes.

Minimal Example (Python)

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 4, 4)

V = dolfinx.fem.VectorFunctionSpace(mesh, ("DG", 1), dim=2)
u_vec = dolfinx.fem.Function(V, name="u")

u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()
u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Fails in parallel with odd ghosting issues
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)

# Works fine
with dolfinx.io.VTXWriter(
        mesh.comm, "solution_u1.bp", [u1], engine="BP4") as vtx:
    vtx.write(0.0)

Output (Python)

No response

Version

0.7.2

DOLFINx git commit

No response

Installation

From source

Additional information

[ParaView rendering showing the erroneous parallel output]

nate-sime added the bug label on Dec 12, 2023
@jorgensd (Member) commented:

Same issue on main with 3 processes and:

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)

V = dolfinx.fem.functionspace(mesh, ("DG", 1, (2, )))
u_vec = dolfinx.fem.Function(V, name="u")

u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()
u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Fails in parallel with odd ghosting issues
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)

# Works fine
with dolfinx.io.VTXWriter(
        mesh.comm, "solution_u1.bp", [u1], engine="BP4") as vtx:
    vtx.write(0.0)

Seems like a switch in cells:

[Screenshot from 2023-12-13 08-53-46: ParaView rendering showing the swapped cell data]

Also fails with 3 processes and Lagrange:

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)

V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
u_vec = dolfinx.fem.Function(V, name="u")

u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()
u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Fails in parallel with odd ghosting issues
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)

# Works fine
with dolfinx.io.VTXWriter(
        mesh.comm, "solution_u1.bp", [u1], engine="BP4") as vtx:
    vtx.write(0.0)

@jorgensd (Member) commented:

To me it looks like it relates to the fact that the ghosts are ordered differently in the collapsed space.
Is this on purpose, @garth-wells @chrisrichardson @jpdean any comments?

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)

V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()

print(V.dofmap.index_map.ghosts,
      V0.dofmap.index_map.ghosts)

Output (one line per rank):

[1] [1]
[5 7 6 0] [0 5 7 6]
[6] [6]
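A quick check, building on the snippet above (a sketch; the index-map attributes are the ones already used in this thread), confirms that the ghosts are the same set on every rank, only permuted:

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()

parent_ghosts = V.dofmap.index_map.ghosts  # ghost blocks of the parent space
sub_ghosts = V0.dofmap.index_map.ghosts    # ghosts of the collapsed space

# Same set of ghost indices on each rank...
assert np.array_equal(np.sort(parent_ghosts), np.sort(sub_ghosts))
# ...but not necessarily in the same order (False on the middle rank above).
print(mesh.comm.rank, np.array_equal(parent_ghosts, sub_ghosts))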

@francesco-ballarin (Member) commented:

> To me it looks like it relates to the fact that the ghosts are ordered differently in the collapsed space.
> Is this on purpose, @garth-wells @chrisrichardson @jpdean, any comments?
> (from @jorgensd)

I think we agreed in #2881 that that was on purpose.

@jorgensd (Member) commented:

> > To me it looks like it relates to the fact that the ghosts are ordered differently in the collapsed space.
> > Is this on purpose, @garth-wells @chrisrichardson @jpdean, any comments?
> > (from @jorgensd)
>
> I think we agreed in #2881 that that was on purpose.

I agree that, in the case of using create_sub_map, it doesn't make sense to have that special handling.
However, I find it strange that collapsing a function space reorders the dofs, when we explicitly do not want to re-order all indices when having such a submap:
https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/fem/DofMap.cpp#L26-L83
The function states that it doesn't reorder the dofmap; however, that isn't true, as the ghost indices are reordered.

To me, this suggests that calling create_sub_map is overkill for collapsing a blocked function space (as you then retain all ghost degrees of freedom). I guess that with proper mixed spaces it will be trickier to find the subset of indices and ghosts, and some reordering would be expected. I still wonder whether, even in that case, we could do this without any communication, since you know that all local dofs are retained in the sub-space and, similarly, all ghosts are kept (in the subspace); see the sketch below.
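To illustrate why no communication should be needed for the blocked case, here is a minimal sketch (not the DOLFINx implementation; the attribute names follow the Python API used elsewhere in this thread):

import numpy as np

def collapse_blocked_index_map(imap):
    # Sketch of a communication-free collapse for a *blocked* space:
    # every local block and every ghost block is retained, so the
    # collapsed map can reuse the parent's data verbatim.
    local_size = imap.size_local        # unchanged: all local dofs kept
    ghosts = np.asarray(imap.ghosts)    # kept in the parent's order
    owners = np.asarray(imap.owners)    # owning ranks, known locally
    return local_size, ghosts, owners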

@garth-wells (Member) commented:

It's not clear to me that the ghost ordering is the problem. Re-mapping takes place when copying entries from the vector, at the line marked `// Copy values into new vector`.

@jorgensd (Member) commented Dec 13, 2023

As the issue shows, they are copied over in the correct order, i.e. if you only write the collapsed function to file, the ordering is correct. However, if you write the parent space and the collapsed sub-space to the same file, we see that the dofmaps are not the same, even though the finite elements are the same (up to block size) and the dof layout is the same for both spaces; see the sketch below.
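A sketch of that comparison, using the same MWE setup (assuming dofmap.list returns the cell-to-dof adjacency as a per-cell array, as in the Python API used above):

import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)
V = dolfinx.fem.functionspace(mesh, ("DG", 1, (2, )))
V0, _ = V.sub(0).collapse()

# For a blocked space, V.dofmap stores block indices, which should match
# the scalar dof indices of the collapsed space one-to-one if no
# reordering happened; in parallel this can fail on ranks with ghosts.
same = np.array_equal(V.dofmap.list, V0.dofmap.list)
print(f"rank {mesh.comm.rank}: dofmaps identical: {same}")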

@jorgensd (Member) commented Dec 13, 2023

> It's not clear to me that the ghost ordering is the problem. Re-mapping takes place when copying entries from the vector, at the line marked `// Copy values into new vector`.

You can also see this with the PR that I have made (#2930), as it resolves the issues described here.

@garth-wells (Member) commented:

The root problem still hasn't been properly identified. Is it that VTXWriter requires all functions to have the same dofmap, and in the MWE the maps are not the same?

@jorgensd (Member) commented:

> The root problem still hasn't been properly identified. Is it that VTXWriter requires all functions to have the same dofmap, and in the MWE the maps are not the same?

Yes, VTXWriter assumes that the dofmaps are the same (as there is a 1-1 map between the mesh geometry dofs and the function dofs).
This was previously the case for blocked spaces, but changed when `create_submap` was introduced in `dofmap.collapse()`.

There are three ways of "fixing" this:
Option 1: Change `create_submap`, as I've done in #2930.
Option 2: Do not allow sub-spaces of blocked spaces in the same VTXWriter (with the drawback that one would have to save the mesh N times instead of once).
Option 3: Remove `create_submap` from `dofmap.collapse` and go back to a less general implementation that keeps the order of the ghosts.

@garth-wells (Member) commented:

The documentation specifies that the elements for the different functions must be the same, whereas the implementation requires that the element and the dofmap are the same. Immediate issues to be fixed are:

  1. Raise an exception (at least in debug mode) in VTXWriter if the elements and/or dofmaps are not the same.
  2. Fix the documentation.

A second step could be:

  1. Support cases with the same element but different dofmaps.

It is best if this class imposes as few requirements as reasonably possible on the arguments.
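Until such a check exists in VTXWriter itself, a user-side guard along these lines is possible (a sketch; the helper name is hypothetical and the dofmap comparison follows the Python API used in the MWEs above):

import numpy as np

def assert_vtx_compatible(functions):
    # Hypothetical guard: raise if the functions do not share identical
    # cell dofmaps, mirroring the requirement discussed above.
    ref = functions[0].function_space.dofmap.list
    for f in functions[1:]:
        if not np.array_equal(ref, f.function_space.dofmap.list):
            raise RuntimeError(
                "VTXWriter requires all functions to share the same dofmap")

# e.g. assert_vtx_compatible([u_vec, u1]) would raise in the failing
# parallel MWE before any erroneous data is written.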

@jorgensd (Member) commented:

> 1. Support cases with the same element but different dofmaps.

This means writing the mesh twice, or computing the permutation/mapping from one dofmap to the other, which I am not sure is worthwhile complexity.

I still think this is something that should be fixed in the collapsed function space code. It is silly that we keep the same numbering for the local dofs but not for the ghosts.
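For reference, the ghost permutation itself is cheap to compute locally (a numpy sketch, assuming the two maps hold the same set of ghost indices, as verified earlier in the thread):

import numpy as np

def ghost_permutation(parent_ghosts, sub_ghosts):
    # Return perm such that parent_ghosts[perm] == sub_ghosts.
    # Purely local; assumes both arrays contain the same set of indices.
    order = np.argsort(parent_ghosts)
    return order[np.searchsorted(parent_ghosts[order], sub_ghosts)]

# Example with the ghost arrays printed earlier in the thread:
parent = np.array([5, 7, 6, 0])
sub = np.array([0, 5, 7, 6])
perm = ghost_permutation(parent, sub)
assert np.array_equal(parent[perm], sub)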
