Mixed precision MD #355
Agreed, we can't reasonably assume a certain number of decimal places are needed, and double-single is really only beneficial on something like GeForce where fp64 is crippled. In contrast, even Tesla cards might still benefit from using a mixture of native fp32 and fp64 ops where appropriate.
I had thought about this a little bit as a potential project for someone. One flexible solution might be to replace the current `Scalar` type with two typedefs: one that is always double and one whose precision is selected at compile time. Then, the default build could be mixed precision, with full double and full single builds still available through the configurable type. I think it's an open question how much performance is really available to be gained from doing this (we could try to guess by comparing double to single in the current version for some representative benchmarks). It has always sounded like an awful lot of effort because someone will really need to scrape through line-by-line and make the needed changes very carefully.
This has come up again. We don't necessarily need to take on this entire project in one PR. If we implement the base data types and compilation options, then we can slowly introduce mixed precision to portions of the codebase as key bottlenecks are identified. With this in mind, while an eventual goal might be to remove `Scalar` entirely, the first thing we need (as discussed by @mphoward above) is a name for the new data types.
The alternative is to stop supporting full single precision builds and only support full double and mixed modes. In this case, only the reduced-precision type needs to be configurable.
I think we should decide how we want to approach this problem in general, as it might affect whether it makes sense to keep supporting single-precision builds. If we only want to support double and mixed modes, we would need to go through the code and identify points where doubles could safely be converted to floats. If we want to support single precision, there could still be parts where doubles are required (this is the case in MPCD, and I could imagine it is also the case in HPMC), so we would need to split all the doubles into "should-be-double" and "could-be-float" categories, each with its own typedef. In any case, I'm in favor of typenames like the first option, though I might propose an alternative naming scheme. A sketch of the two-typedef idea follows below.
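As a concrete illustration of the two-typedef scheme discussed above, here is a minimal C++ sketch. The type and macro names are hypothetical placeholders, since the naming is exactly what is still undecided:

```cpp
// Hypothetical names and build macro -- the naming is an open question.
// One typedef is always double; the other is selected at configure time.
typedef double LongReal; // coordinates, integration, force accumulation

#ifdef HOOMD_MIXED_PRECISION
typedef float ShortReal; // per-pair force math in the default mixed build
#else
typedef double ShortReal; // full double build
#endif
```

If full single precision builds were kept, `LongReal` would also need to be configurable; if they are dropped, only the reduced-precision typedef varies.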
I like that naming. The general wisdom from other projects that have implemented mixed precision MD is that the particle coordinates and integration steps need to be in double precision, while individual force computations can be performed with floats. The summation of net forces and energies either needs to be double or use something like Kahan summation (sketched below), or both (easiest to just use doubles). HPMC already has a complete mixed precision mode; we would only need to replace its existing precision typedef with the new one. I agree that fully automating this process is challenging. That's why I'm proposing a process for gradually adding support. E.g., we could convert MD pair potentials in one PR, bond potentials in another, MPCD in another, and so on. We will need to get our MD validation framework enabled for the v3 builds in order to verify that the changes don't break simulation correctness. We will also need to put together some energy conservation test scripts to further validate the changes beyond what the automated tests can check.
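For reference, a minimal sketch of Kahan (compensated) summation, the alternative to a plain double accumulator mentioned above. This is illustrative only, not HOOMD code:

```cpp
// Compensated summation: tracks the low-order bits lost by each addition,
// so a float accumulator loses far less precision than naive summation.
float kahan_sum(const float* values, int n)
{
    float sum = 0.0f;
    float c = 0.0f; // running compensation for lost low-order bits
    for (int i = 0; i < n; i++) {
        float y = values[i] - c; // apply the correction from the last step
        float t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;       // algebraically zero; captures the lost bits
        sum = t;
    }
    return sum;
}
```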
Excellent, I support this plan. MPCD already works in mixed precision as well, although in a different sense: some steps require double. I will think about whether that particle data needs to be single or double and address it in a later PR. One other core change may be needed as well.
@mphoward's proposal (two separate typedefs) is how I imagined this would be implemented, so I also support this plan. I don't have a strong opinion on the naming; I agree that avoiding names that could be confused with built-in type names is a good idea.
While I would like to get this started, I'm going to focus on completing the 3.0 release first. I will start work on mixed precision in 3.1+.
When adding mixed precision MD, consider adding single-precision optimized GPU code to the fat binary: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
Also:
https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html#improved_fp32
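For reference, a fat binary carrying code for several architectures is built with repeated `-gencode` flags; the architectures below are chosen only for illustration:

```
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_86,code=sm_86 \
     -c force_kernels.cu -o force_kernels.o
```

Per-architecture specialization (e.g., single-precision optimized paths on sm_86) would then be selected with `__CUDA_ARCH__` guards in the device code.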
The current single precision optimized GPUs have a 1:64 double:single ALU ratio. This makes double operations so expensive that even a small number of double ops per particle pair drastically drops performance.

I implemented the mixed precision mode discussed here. I found that the main bottleneck kernels are 1) the neighbor list build and 2) the pair force evaluation. The net force summation, integration, cell list, and other kernels on A40 were comparable to timings on A100, as the time to run these kernels is dominated by launch latency.

To support large and/or dilute systems, we must compute the delta r and box minimum image convention for particle pairs in double. Otherwise, particles near each other in large boxes will lose precision: for example, single precision coordinates in the 1000's have only 2 sig figs left in the delta. Computing these quantities in double on such GPUs dropped throughput to roughly 1/10, which is far below usable performance, so I am not going to pursue this work further at this time. I will open a PR.
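To make the precision loss described above concrete, here is a small standalone illustration (values chosen only for demonstration). At coordinates of magnitude ~1000, a float's spacing is 2^-14 ≈ 6.1e-5, so a separation of ~1e-3 survives with only about two significant figures:

```cpp
#include <cstdio>

int main()
{
    // each coordinate rounds to the nearest multiple of 2^-14
    float xi = 1000.0012f;
    float xj = 1000.0f;
    printf("float  dx = %.7g\n", xi - xj);            // ~0.001220703
    printf("double dx = %.7g\n", 1000.0012 - 1000.0); // 0.0012
    return 0;
}
```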
Description
Implement mixed precision for MD simulations.
Motivation
Compared to a full double precision build, mixed precision may offer performance benefits, especially on GeForce cards. Compared to a full single precision build, mixed precision simulations will conserve energy and momentum significantly better.
Implementation details
The most reasonable mixed precision model for HOOMD is to maintain particle coordinates in double, compute forces in single, and accumulate forces in double. HOOMD is too general for fixed precision force calculations, and double-single implementations do not offer enough benefits given the implementation complexity.
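A minimal sketch of this precision pattern (not HOOMD's actual API; the Lennard-Jones form and one-dimensional setup are just for illustration):

```cpp
#include <cmath>

// Mixed precision pattern: double coordinates and minimum image,
// float per-pair force math, double accumulation of the net force.
double net_force_x(const double* x, int n, int i, double box_L)
{
    double net_fx = 0.0; // accumulate in double
    for (int j = 0; j < n; j++) {
        if (j == i)
            continue;

        // delta r and minimum image in double (see the benchmarking
        // comment above for why this cannot be done in float)
        double dx = x[i] - x[j];
        dx -= box_L * std::nearbyint(dx / box_L);

        // per-pair math in float (LJ with sigma = epsilon = 1)
        float r2 = static_cast<float>(dx * dx);
        float r2inv = 1.0f / r2;
        float r6inv = r2inv * r2inv * r2inv;
        float f_divr = 24.0f * r2inv * r6inv * (2.0f * r6inv - 1.0f);

        net_fx += static_cast<double>(f_divr) * dx;
    }
    return net_fx;
}
```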
Care must be taken in testing a new mixed precision implementation: evaluate the energy and momentum conservation of a number of test systems that exercise the relevant code paths (pair potentials, bond potentials, ...).
The existing HPMC mixed precision type and a new MD mixed precision type should be merged and controlled by the same compile-time option.
Question for debate
Should we continue to support single, double, and mixed precision builds? Certainly this will be helpful in developer testing, but it may require significant maintenance overhead in the future. If we decide to keep supporting more than one combination, which do we test and validate for users? Only mixed?