Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure material models with Eigen library #297

Open
1 task done
aabrown100-git opened this issue Nov 4, 2024 · 14 comments
Open
1 task done

Restructure material models with Eigen library #297

aabrown100-git opened this issue Nov 4, 2024 · 14 comments
Assignees
Labels
enhancement New feature or request

Comments

@aabrown100-git
Copy link
Collaborator

aabrown100-git commented Nov 4, 2024

Problem

mat_models.cpp and mat_models_carray.h are basically redundant. They both implement material models, except mat_models.cpp uses custom Arrays and mat_models_carray.h uses native carrays. Currently, the get_pk2cc_dev() function from mat_models.cpp is used for ustruct, while the get_pk2cc() function from mat_models_carray.h is used for struct. Also, there is a lot of redundancy between get_pk2cc() and get_pk2cc_dev().

In addition, both Arrays and carrays have drawbacks. Arrays allow to clean syntax, but lead to slower code. carrays are faster, but require clunky syntax for basic operations and are memory unsafe.

All of this makes it confusing and leads to extra work when implementing or restructuring materials models.

Solution

  • Consolidate mat_models.cpp and mat_models_carray.h and use only Arrays
  • Consolidate get_pk2cc() and get_pk2cc_dev()
  • Replace all Arrays with Eigen arrays.

Additional context

This is a useful first step for subsequent restructuring and optimizations of the material models.

Code of Conduct

  • I agree to follow this project's Code of Conduct and Contributing Guidelines
@aabrown100-git aabrown100-git added the enhancement New feature or request label Nov 4, 2024
@aabrown100-git aabrown100-git self-assigned this Nov 4, 2024
@aabrown100-git
Copy link
Collaborator Author

aabrown100-git commented Nov 7, 2024

I consolidated get_pk2cc() and get_pk2cc_dev(), and also consolidated mat_models.cpp and mat_models_carray.h (I found it more challenging to do these steps sequentially, so I just did them all together).

I decided to remove the carray implementations (in mat_models_carray.h) and make everything use Array templates (in mat_models.cpp). The material unit tests and the integration tests all pass.

You can find the latest branch here.

Next, I think we should use this branch to test out how to use Eigen arrays. After some preliminary thinking, I think a way to organize this would be to decide to use Eigen arrays "below" some level in the code. For example, we could convert all arrays in get_pk2cc() to Eigen arrays, but in struct3d() still use Array templates, and have some code to convert appropriately. Also, I think we should still use templated functions to deal with 2 or 3 spatial dimensions, so that the code is suitably performant. chatGPT suggested a nice way to implement this using a helper function.

@aabrown100-git aabrown100-git changed the title Consolidate mat_models.cpp and mat_models_carray.h Restructure material models with Eigen library Nov 7, 2024
@aabrown100-git
Copy link
Collaborator Author

Started very preliminary testing of the conversion to Eigen on the same branch.

@ktbolt
Copy link
Collaborator

ktbolt commented Nov 11, 2024

Array and Tensor4 objects were replaced with Eigen::Matrix objects in the get_pk2cc() and _get_pk2cc() functions

const Array<double>& F  replaced by const Eigen::MatrixXd& F

Array<double>& S replaced by Eigen::MatrixXd& S

Tensor4<double> replaced by  Eigen::Tensor<double, 4>

Btw, get_pk2cc() is not a helper function.

@aabrown100-git
Copy link
Collaborator Author

I got get_pk2cc() working for NeoHookean. It now passes NeoHookean unit tests and struct/ustruct integration tests that use a NeoHookean material model.

I decide to use fixed-size Eigen Matrices (e.g. Eigen::Matrix3d) as much as possible, and use fixed-size Tensors (e.g Eigen::TensorFixedSize<double, Eigen::Sizes<nsd,nsd,nsd,nsd>>) for higher-order tensors.

It should be relatively straightforward to translate the other material models. We will probably have to reimplement some tensor operations (e.g. ten_transpose, ten_ddot) for Eigen tensors.

@ktbolt
Copy link
Collaborator

ktbolt commented Nov 18, 2024

@aabrown100-git Very nice! I believe using fixed-size Eigen Matrices is the way to go.

Do you think the performance is better or about the same?

Have you replaced loops with expressions?

@aabrown100-git
Copy link
Collaborator Author

I ran https://github.com/SimVascular/svFSIplus/tree/main/tests/cases/struct/block_compression for 10 timesteps as a basic timing test. Looks like the new implementation with Eigen is roughly 50% slower than the old implementation with carrays. @ktbolt Can you recommend any profiling techniques?

Yes, at least for the NeoHookean implementation, there are no for loops.

Also, any idea why Ubuntu can't find the Eigen library?

@ktbolt
Copy link
Collaborator

ktbolt commented Nov 18, 2024

@aabrown100-git You can use gprof to profile the code on Ubuntu.

It might be easier and more informative to write some simple programs to model the differences in efficiency of C-style arrays versus how you are using Eigen.

Eigen is installed in ThirdParty but I bet the svFSIplus CMake is not finding it. We have probably been using an already installed Eigen when we test.

@aabrown100-git
Copy link
Collaborator Author

@ktbolt Got it thanks. Will work with @lpapamanolis and @msbazzi on these things!

@aabrown100-git
Copy link
Collaborator Author

aabrown100-git commented Nov 27, 2024

I finished converting get_pk2cc() to use fixed-size Eigen matrices and tensors (Eigen::Matrix<double, nsd, nsd> and Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd, nsd, nsd>>). For tensor operations in mat_fun.h, I have mostly replaced all for loops with built-in Eigen operations. On my mac, it passes all struct and ustruct integration tests, as well as the material model unit tests. Next, I think we can see how fast/slow this is compared to the current C-array implementation.

One thing that may improve performance is optimizing mat_fun::toEigenMatrix(), mat_fun::toArray(), and mat_fun::copyDm which currently copy F, S and Dm from an Eigen matrix to an Array template or vice versa.

Also, just want to mention a couple things that gave me trouble.

  1. Using the auto type keyword sometimes leads to incorrect results with no compilation or runtime error. For example, in the function
template<size_t nsd>
std::pair<EigenMatrix<nsd>, EigenTensor<nsd>> bar_to_iso(
  const EigenMatrix<nsd>& S_bar, const EigenTensor<nsd> &CC_bar, 
  const double J2d, const EigenMatrix<nsd>& C, const EigenMatrix<nsd>& Ci) 
  {

  using namespace mat_fun;

  // Useful scalar
  double r1 = J2d * mat_ddot_eigen<nsd>(C, S_bar) / nsd;

  // Compute isochoric 2nd Piola-Kirchhoff stress
  auto S_iso = J2d*S_bar - r1*Ci;

  // Compute isochoric material elasticity tensor
  EigenTensor<nsd> PP = ten_ids_eigen<nsd>() - (1.0/nsd) * ten_dyad_prod_eigen<nsd>(Ci, C); // Important: using auto here causes tests to fail
  auto CC_iso = ten_ddot_eigen<nsd>(CC_bar, PP);
  CC_iso = ten_transpose_eigen<nsd>(CC_iso);
  CC_iso = ten_ddot_eigen<nsd>(PP, CC_iso);
  CC_iso += (-2.0/nsd) * (ten_dyad_prod_eigen<nsd>(Ci, S_iso)  + ten_dyad_prod_eigen<nsd>(S_iso, Ci));
  CC_iso += 2.0 * r1 * ten_symm_prod_eigen<nsd>(Ci, Ci)  +  (- 2.0*r1/nsd) * ten_dyad_prod_eigen<nsd>(Ci, Ci);

  return std::make_pair(S_iso, CC_iso);
}

if I replace EigenTensor<nsd> PP = ... with auto PP =..., the code compiles and runs without error, but the resulting material elasticity tensor is wrong, leading to failure in the material model unit tests and poor convergence in integration tests.

  1. While it is working, I still don't quite understand how the Eigen Tensor operation shuffle works. For example, in the function
 template <int nsd>
    Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd, nsd, nsd>>
    ten_symm_prod_eigen(const Eigen::Matrix<double, nsd, nsd>& A, const Eigen::Matrix<double, nsd, nsd>& B) {
        
        // Create empty contraction dimensions
        Eigen::array<Eigen::IndexPair<int>, 0> contractionDims = {};

        // Convert the input matrices to Eigen::TensorFixedSize so we can use the contract method
        Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd>> A_tensor = Eigen::TensorMap<const Eigen::Tensor<const double, 2>>(A.data(), nsd, nsd);
        Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd>> B_tensor = Eigen::TensorMap<const Eigen::Tensor<const double, 2>>(B.data(), nsd, nsd);


        // Compute the outer product of A and B, A_ij * B_kl
        // The contraction of A and B with no dimensions specified is the outer product
        Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd, nsd, nsd>> AB = A_tensor.contract(B_tensor, contractionDims); // A_{ij} * B_{kl}

        // Compute the product term1 = A_ik * B_jl and term2 = A_il * B_jk by shuffling the indices
        Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd, nsd, nsd>> term1, term2;
        term1.shuffle(Eigen::array<int, 4>{0, 2, 1, 3}) = AB; // A_{ik} * B_{jl}
        term2.shuffle(Eigen::array<int, 4>{0, 3, 1, 2}) = AB; // A_{il} * B_{jk}

        // Compute the symmetric product: C_{ijkl} = 0.5 * (A_{ik} * B_{jl} + A_{il} * B_{jk})
        Eigen::TensorFixedSize<double, Eigen::Sizes<nsd, nsd, nsd, nsd>> C = 0.5 * (term1 + term2);

        // Return the symmetric product
        return C;
    }

term2 is apparently computed as $term2_{ijkl} = AB_{iljk} = A_{il} * B_{jk}$. I had originally written term2 = AB.shuffle(Eigen::array<int, 4>{0, 3, 1, 2}), which is actually a different operation and leads to the incorrect calculation of this symmetric product.

@ktbolt
Copy link
Collaborator

ktbolt commented Nov 27, 2024

@aabrown100-git Very nice!

For a discussion of using auto see https://eigen.tuxfamily.org/dox/TopicPitfalls.html.

@aabrown100-git
Copy link
Collaborator Author

Great thanks for that link. Just replaced most of the autos with EigenMatrix<nsd>.

@aabrown100-git
Copy link
Collaborator Author

aabrown100-git commented Dec 8, 2024

I profiled my latest Eigen implementation as well as the current C-array implementation using Xcode Instruments on my Mac.

(As an aside, I tried a whole bunch of other profiler tools, like gprof, Valgrind, Easy Profiler, and could not get any of them to work on my Mac. Instruments worked with little effort, so I recommend using it)

See revised numbers in next comment

I found that the Eigen implementation is much slower than the C-array implementation. To give an example, on the Holzapfel-Ogden Modified-Anisotropy test case, get_pk2cc() took 819 ms with C-arrays, and 10.05 s with Eigen (over 10x slower). Most of this slow down is due to tensor operations. For example, the most expensive function calls in get_pk2cc() are ten_dyad_prod(), which took a total of 279 ms with C-arrays, and 1.98 s with Eigen.

This slow down for small Eigen Tensors seems to be somewhat known. The last response from this stack overflow question provides this comparison for matrix-matrix multiplication

image

(I am currently using EigenTensorFixed)

I think we should reconsider using Eigen, or at least Eigen Tensors. Eigen Tensors are technically unsupported by the Eigen project. We could 1) Try to only use Eigen Matrices, 2) Go back to C-arrays for these performance critical tensor operations, 3) Try other tensor libraries as listed here

@aabrown100-git
Copy link
Collaborator Author

aabrown100-git commented Dec 8, 2024

UPDATE ON EIGEN PERFORMANCE TESTS

I just realized I was using the Debug CMAKE_BUILD_TYPE, which seems to slow down Eigen a lot. With the default CMAKE_BUILD_TYPE=RelWithDebInfo, the slow down is less, but still significant.

I also increased the number of timesteps in the test case to 5, to give a longer total simulation time.

The total simulation took 7.3 s with C-arrays and 8.72 s with Eigen. get_pk2cc() took 563 ms with C-arrays and 1.43 s with Eigen. ten_dyad_prod() took 212 ms with C-arrays and 433 ms with Eigen.

We should do some more timing, but in light of these new numbers, perhaps Eigen Tensors are performant enough. Although, in some preliminary testing with ustruct (which in the latest version of main branch uses the custom Array class), Eigen performance is on par with the custom Array performance.

@ktbolt
Copy link
Collaborator

ktbolt commented Dec 10, 2024

@aabrown100-git There are two primary goals of converting the Array templates to Eigen

  1. Produce a cleaner, clearer implementation (e.g. replace indexing by algebraic expressions)
  2. Improve performance

These goals are often conflicting so we need to decide which is more important. I would choose 1) with an acceptable performance hit.

Goal 1)
We don't want to replace Array with more complicated Eigen code. For example, it would be better to just use Eigen with indexing for the ten_symm_prod function rather than the complex function shown above. Anyone (even me) can understand

  for (int l = 0; l < nd; l++) {
    for (int k = 0; k < nd; k++) {
      for (int j = 0; j < nd; j++) {
        for (int i = 0; i < nd; i++) {
           C(i,j,k,l) = 0.5 * ( A(i,k)*B(j,l) + A(i,l)*B(j,k) );
        } 
      }
    }
  }

A compiler might also be able to optimize the indexing or we could do our own loop unrolling.

Goal 2)
We need to make sure that Eigen is using vector instructions. You can check this by looking at the assembly code for a test problem. Or maybe Eigen can tell you? You can try different compiler flags: -march=native -ffast-math -fno-math-errno. You should run on different processors and use different compilers.

We also need to understand how best to use Eigen, how to organize statements that can be optimized. For example, is it better to have one large algebraic statement or is it better to break it up into several statements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants