Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Towards Type Stability #101

Closed
wants to merge 5 commits into from
Closed

WIP: Towards Type Stability #101

wants to merge 5 commits into from

Conversation

cortner
Copy link
Member

@cortner cortner commented May 18, 2024

This PR is to address a number of itches I have with the current implementation of the interface and its implementation. So far the goal is to keep it entirely backward compatible. I think this may be possible. I believe I am still very close to backward compatible in the interface, though no longer in the internals.

The plan is to address the following issues:

This is now ready for comments.

Bugs and TODOs

  • make all tests pass
  • write tests to show equivalent when using FastAtom
  • write tests that show performance when using FastAtom
  • cannot reinterpret ChemicalElement as UInt8
  • I would like to delete or deprecate DirichletZero
  • I would like to replace AtomView with FastAtom for fast_system[i]. The latter is "isbits" and hence performant. Also much much easier to use and implement, basically working like an SVector.

@cortner
Copy link
Member Author

cortner commented May 18, 2024

Recording changes

  • Introduce PCell type to encode a computational cell with pbc flags
  • Introduce SystemWithCell{D, TCELL} to help with dispatch
  • Adapt FlexibleSystem and FastSystem to accept PCell instead of bounding_box and boundary_conditions
  • Introduce FastAtom to be compatible with FastSystem
  • Made FlexibleSystem mutable.
  • make Bounding box and bc tuples
  • introduce OpenBC to replace DirichletZero

More details:

  • The P in PCell can stand for periodic or for "parallelepiped" (cc Rachel). I don't care really. But I didn't want to call it Cell. If somebody has a better name I will be happy to change it.
  • bounding box and bc should be tuples instead of vectors since (i) inferrable length and (ii) there should be no arithmetic performed on them.
  • The behaviour of isinfinite has changed, it now returns a D-tuple since the system can be infinite in some dimension but not others.

@cortner
Copy link
Member Author

cortner commented May 18, 2024

Question for the community: If I define a system that has open bc in some or all directions (i.e. is infinite in those directions) then the cell-vector pointing in that direction technically doesn't really specify a bounding box anymore. Should the call to bounding box in that case compute the actual extent of the system as specified by the positions of the particles?

My intuition is that this is something that the neighbourlist should do and not the system. I would consider returning the cell vector or NaN or Inf in those cases. But I do not have a strong view yet.

@cortner
Copy link
Member Author

cortner commented May 19, 2024

@mfherbst Do you mind explaining this test to me:

@testset "Chemical formula with system" begin
    lattice     = [12u"bohr" * rand(3) for _ in 1:3]
    atoms       = [Atom(6, randn(3)u"Å"; atomic_symbol=:C1),
                   Atom(6, randn(3)u"Å"; atomic_symbol=:C2),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                  ]
    system      = periodic_system(atoms, lattice)
    @test atomic_symbol(system) == [:C1, :C2, :D, :D, :D]
    @test element_symbol(system) == [:C, :C, :H, :H, :H]
    @test chemical_formula(system) == "C₂H₃"
end

I take it the point is to test use of isotopes. I thought we had agreed this needed to be done differently. But I may well misremember that. This seems to be the only thing that prevents me from completing this PR.

Should there be another particle type Isotope?

struct Isotope
   atomic_number::UInt8
   nneutron::Int
end

How many people in this community care about isotopes?

EDIT: I think there are two alternative solutions:

  • one would be to simply add the nneutron field to ChemicalElement
  • the second would be to maintain a list of isotopes in chem.jl and give those isotopes ids starting with 1000. E.g. 1001 => D. Then the chemical element type gets modified to become
struct ChemicalElement
    id::Int 
end 

with id no longer specifying atomic number but an index into the list of possible atom types. This isn't without danger. One would need to maintain permanent backward compatibility of ids I think and I probably prefer the isotope .

@cortner
Copy link
Member Author

cortner commented May 19, 2024

My last question for now: Are the printing tests really needed? Why would we expect the ASCII output to remain unchanged when we make changes to the code?

@tjjarvinen
Copy link
Collaborator

tjjarvinen commented May 20, 2024

I would add isotopes to ChemicalElement.

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

One way to add this would add extra field to chemical element. Albeit it could be moved to somewhere else too.

stuct ChemicalElement
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

Note, I prefer using number of nucleons over number of neutrons, as then number 0 could be interpret, as field ignored. While with number of neutrons this is not possible.

The total bit size of the structure should be 64, 32, 16 or 128. For 64-bit system the structure is machine level will always be multiple of 64, so that size would be good target.

@cortner
Copy link
Member Author

cortner commented May 20, 2024

I am not too keen to make ChemicalElement too rich - that goes against the idea of a simple prototype implementation. We could have an extended chemical element type. Anyway, I'm happy to go with whatever the majority thinks is useful.

I'm not too sure the size matters too much at this stage, since it will normally be paired with positions, velocities, mass, etc. ... but it's an interesting suggestion for performance tweaks down the road?

@tjjarvinen
Copy link
Collaborator

tjjarvinen commented May 20, 2024

Cache line is 64 bytes. So, 3*Float64 for positions, mass is Float64, velosity is another 3*Float64. If on top of this chemical element is 64-bits then it totals 64 bytes. Meaning that it would very efficient to move in memory. I would definitely go for this.

We could simply implement

 stuct ChemicalElement
   Z::UInt64
end

stuct ChemicalElementExt
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

With these you can reinterpret the simple ChemicalElement to ChemicalElementExt that has option to have the extra features. Thus we have the default option to go for the simple ChemicalElement. While still have the option to define isotopes and additional features.

@cortner
Copy link
Member Author

cortner commented May 20, 2024

I understood the idea. I think it is interesting and I'm happy to be convinced that this is the way to go if enough people agree ... I doubt though that this will be the bottleneck for many interesting applications. Ie anything beyond pair interactions...

@mfherbst
Copy link
Member

I take it the point is to test use of isotopes.

Yes and the fact that some input files actually do not store atom types not just based on the chemical symbols, but add numbers in the end, e.g. in cif files you frequently have C1, C2 etc. instead of just C, which not necessarily means that the carbons are different isotopes. This needs to be somehow dealt with (e.g. by separating element from atomic symbol).

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

Yes that is exactly the case that we need to cover. The isotopes are not so relevant indeed.

In that sense the most simple implementation would be perhaps

struct c
   Z::UInt16
   N::UInt16
   id::UInt32
end

or even without the N property ?

My idea here is that the id is used to render otherwise identical objects different. What distinguish them is sort of out of scope at this simplest interface and is up to the implementation (e.g. by having a Dict{ChemicalElement,Any} to store additional information ?)

Are the printing tests really needed?

Not necessarily to test the precise display from my point of view but to ensure the code runs and does not crash. Crashing code on printing is super annoying and should not happen.

@cortner
Copy link
Member Author

cortner commented Jun 27, 2024

In view of discussions in other channels I think this PR will likely be closed unmerged and some of the points raised here brought back after the interface update.

@cortner
Copy link
Member Author

cortner commented Aug 13, 2024

closing since #107 is now at about the same stage as this old PR

@cortner cortner closed this Aug 13, 2024
@mfherbst mfherbst deleted the co/atom branch October 23, 2024 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants