Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grid.build_face_edges_connectivity #205

Closed
philipc2 opened this issue Jan 15, 2023 · 5 comments · Fixed by #191
Closed

Grid.build_face_edges_connectivity #205

philipc2 opened this issue Jan 15, 2023 · 5 comments · Fixed by #191
Assignees
Labels
new feature New feature or request
Milestone

Comments

@philipc2
Copy link
Member

Implement Grid.build_face_edges_connectivity

@philipc2 philipc2 added the new feature New feature or request label Jan 15, 2023
@philipc2 philipc2 linked a pull request Jan 15, 2023 that will close this issue
@philipc2
Copy link
Member Author

philipc2 commented Jan 15, 2023

@hongyuchen1030 Have you had a look at the xugrid implementation for this (and other) associated connectivity functions? I'm currently taking some time to review #191 and familiarize myself with xugrid since much of the initial discussion about this package was prior to me joining the team last summer.

I'll also ping @Huite for any feedback and to start up a similar discussion to #130 (comment)

Relevant References

#19 (comment)

https://deltares.github.io/xugrid/examples/connectivity.html

https://github.com/Deltares/xugrid/blob/main/xugrid/ugrid/connectivity.py#L231

@hongyuchen1030
Copy link
Contributor

Thank you for your information. I will take a look at it.

@hongyuchen1030
Copy link
Contributor

hongyuchen1030 commented Jan 17, 2023

After studying the XUGRID, I have some questions:

  1. How did it deal with the dummy nodes from the face_node_connectivity?
    After reading through the close_polygons

    I noticed that the face_node_connectivity cannot have the same Fill_Value as the def close_polygons(face_node_connectivity: IntArray, fill_value: int) used. But according to the UXARRAY documentation, the input
    face_node_connectivity will have Fill_values. The current data read-in will make the FillValue=nan instead of -1 #189

    For example
    face_node_connectivity = [[0, 1, 2, 3], [1, 4, 3, 5], [2, 4, 3 -1]] # The original face_node_connectivity, the last face has the Fill_Value as -1
    If the user called close_polygons(face_node_connectivity, fill_value=-1)Then after line 178(https://github.com/Deltares/xugrid/blob/9741f5bdc47a8f500ba1e562d54fa20b01dd1b9c/xugrid/ugrid/connectivity.py#L173) here, it will become
    closed = [[0, 1, 2, 3, -1], [1, 4, 3, 5, -1], [2, 4, 3 -1, -1]] # Fill the last indice as -1
    And after line181:
    closed = [[0, 1, 2, 3, 0], [1, 4, 3, 5, 0], [2, 4, 3 2, 2]] # It will replace every -1 with the first_node
    Which apparently is not what we want for the last face. Since it also replace its Fill_Value with the first_node

  2. The np.full() function cannot work properly when encountering nan values
    So in close_polygons function
    line 177 closed = np.full((n, m + 1), fill_value, dtype=IntDType),
    If our face_node_connectivity contains nan, it will be converted into -9223372036854775808 by the np.full function, causing problems for the following implementations

@Huite Thank you very much for your codes and it helps a lot with my python vectorizations.

However, about the questions above, I would like to know if I missed something about your implementations of these marginal cases.

@philipc2 And thank you for your help again, after a careful walk-through of the XUGRID codes, I think my implementation ways might be more suitable with the XARRAY. But I definitely will make it more python styles by referring to the usages in the XUGRID.

@Huite
Copy link

Huite commented Jan 19, 2023

Hi @hongyuchen1030

Do you have specific questions?

I'll just make a couple of remarks here, also from briefly reading through #189.

  • The face_node_connectivity is fundamentally an indexing array. As you cannot index with floating point arrays, you'd regularly have to convert to integer arrays, possibly dropping the fill values. Unfortunately, given a _FillValue, xarray will convert it to floats, with NaNs as the fill value. In my setup, I separate the UGRID variables from the other variables and stuff it into a separate topology object. I do the conversion to integer there: https://github.com/Deltares/xugrid/blob/8168ece26465b565382b85a4a07b151d5d0a9845/xugrid/ugrid/ugridbase.py#L295
    (In my case, I expected the user to really not see the connectivity arrays.)
  • I've chosen for a default value of -1. -1 is a valid indexer in Python, and I've been bitten a few times by it. Of course a fill value is just that, a fill value. As long as it's not in the range of the valid indexes (0 .. n_face), it's not a problem. Probably the best choice is something that will error: e.g. np.iinfo(np.int32).min or np.iinfo(np.int32).max (or the 64 bit version). This could be easily changed by introducing a DEFAULT_FILL_VALUE constant.
  • Note that a negative value is not a valid indexer in e.g. Fortran, so except for the fill_value, the face_node_connectivity array should never contain negative numbers.
  • I don't think you understand the purpose of the close_polygons function. The only reason I use this, is so that I can do the vectorized numpy operations afterwards. Because I sort row by row, then call call np.unique afterwards, the duplicates that are introduced here are removed anyway, but it also ensures an edge is created between the first and the last node.
  • Note that duplicating vertices works in many cases: I also use this approach to compute the face area. (Because a triangle with two duplicated vertices will have an area of 0): https://github.com/Deltares/xugrid/blob/main/xugrid/ugrid/connectivity.py#L311

You also ask about numba in #202. Numba primarily works with numpy arrays. It understands (reimplements) many numpy functions and methods. As long as your functions consume and produce numpy arrays you can easily mix vectorized (numpy) functions and numba compiled functions.

I've chosen to use mostly vectorized numpy (and scipy sparse) function. This results in a significantly shorter code, but it's not necessarily much more readable. But if you're dealing with permutations of permutations, it's not trivial to keep in mind anyway
The biggest upside of numba is that you don't allocate a lot of temporary arrays like you have to with numpy (e.g. creating an index array). The downside is that you run into compilation overhead (although you can cache with numba) and you have to write a lot of loops (more lines, more opportunity to make mistakes). However, you're mostly sorting and getting unique values (which can be easily done from sorted arrays: see https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP). I'm quite confident the numpy sorting functions are performant.

I see you're using a Python set. Python sets should be pretty fast (we'd expect pretty optimized C code in the background), but numpy.unique should find unique values faster. Interestingly though, with just ad hoc some benchmarking, it looks like the numpy_indexed unique is two times faster than just numpy.unique..., which is in turn two times faster than a set:

import numpy as np
import numpy_indexed as npi

nodes = np.arange(2_000_000)
a = np.random.choice(nodes, 500_000)
b = np.random.choice(nodes, 500_000)
edges = np.column_stack((a, b)) 

%timeit set(tuple(row) for row in edges)
%timeit np.unique(edges, axis=0)
%timeit npi.unique(edges, axis=0)

# 631 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 261 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 116 ms ± 903 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Note that you can return the index and the inverse from the unique calls as well at essentially no cost, which can be rather useful!

@hongyuchen1030
Copy link
Contributor

Hi @Huite

Thank you so much for your explanation, that helps a lot.

I think my understanding of the close_polygons function is the same as yours. Just that since the face_node_connectivity in our project will contain _Fill_Value_, some of your implementation might not be compatible with ours.

And thank you for the information about np. unique, I will think about how to better utilize the numpy function in my implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants