Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create SolutionBase object in C++ #696

Merged
merged 17 commits into from
Oct 23, 2019
Merged

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Aug 16, 2019

Please fill in the issue number this pull request is fixing:

Fixes #695, fixes #691 .

Changes proposed in this pull request:

  • Create representation of _SolutionBase object in C++ layer (named Solution)
  • Link object into Cython interface
  • Assign phase name in YAML (see Incorrect Solution.ID for YAML generated objects #691) and resolve conflation of gas.ID and gas.name in unit tests
  • Replace ID by more meaningful phase_id attribute. The changed attribute is PEP8 compliant and corresponds to the YAML entry
  • Deprecate Phase::id and Phase::setID
  • Details:
    • New Solution object is defined in Solution.h/Solution.cpp
    • New object collects ThermoPhase, Kinetics and Transport managers
    • Preserves information how managers were constructed as type (e.g. Solution, Interface)
    • Move/share name attribute from Phase to _SolutionBase and ensure unique default names generated in Cython (anticipates uses for serialization)
    • Deprecate ID

Illustration

There are only minor additions to the user interface; changes mostly benefit "work under the hood".

In [2]: gas = ct.Solution('nDodecane_Reitz.yaml')

In [4]: gas.name # fix of #691 
Out[4]: 'nDodecane_RK'

In [5]: gas.composite
Out[5]: ('RedlichKwong', 'Gas', 'Transport')

In [6]: gas.thermo_model, gas.kinetics_model, gas.transport_model
Out[6]: ('RedlichKwong', 'Gas', 'Transport')

In [7]: gas.ID
DeprecationWarning: To be removed after Cantera 2.5. Use `name` attribute instead
Out[7]: 'nDodecane_RK'

In [9]: gas = ct.Solution('nDodecane_Reitz.yaml', phaseid='nDodecane_IG')
FutureWarning: Keyword `name` replaces `phaseid`

Comments

The following advantages are anticipated:

Beyond this PR ... moving creation of objects from Phase/ThermoPhase to Solution would enable switching of thermo models on the fly (similar to transport) and thus eliminate the need for multiple entries in yaml for multiple thermo models (e.g. nDodecane_RK and nDodecane_IG differ in exactly one entry, i.e. thermo, which is either ideal-gas or Redlich-Kwong)

@ischoegl ischoegl changed the title Cpp composite object Create SolutionBase object in C++ Aug 16, 2019
@ischoegl ischoegl closed this Aug 16, 2019
@ischoegl ischoegl reopened this Aug 16, 2019
@codecov
Copy link

codecov bot commented Aug 16, 2019

Codecov Report

Merging #696 into master will decrease coverage by 0.01%.
The diff coverage is 73.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #696      +/-   ##
==========================================
- Coverage   70.63%   70.61%   -0.02%     
==========================================
  Files         372      374       +2     
  Lines       43567    43590      +23     
==========================================
+ Hits        30773    30783      +10     
- Misses      12794    12807      +13
Impacted Files Coverage Δ
include/cantera/base/Base.h 72.72% <72.72%> (ø)
src/base/Base.cpp 75% <75%> (ø)
src/tpx/Hydrogen.h 75% <0%> (-25%) ⬇️
src/tpx/Oxygen.h 75% <0%> (-25%) ⬇️
src/tpx/Methane.h 75% <0%> (-25%) ⬇️
src/tpx/Heptane.h 75% <0%> (-25%) ⬇️
include/cantera/transport/WaterTransport.h 0% <0%> (-20%) ⬇️
include/cantera/kinetics/EdgeKinetics.h 80% <0%> (-20%) ⬇️
src/tpx/HFC134a.h 83.33% <0%> (-16.67%) ⬇️
src/transport/GasTransport.cpp 90.58% <0%> (ø) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 265a186...348cbc3. Read the comment docs.

@speth
Copy link
Member

speth commented Sep 28, 2019

I had some some high-level comments on this, which it might be worth discussing before going further on the implementation of this. I'll provide a more detailed review once we work out the big picture.

I'm coming around to the idea that a class of this sort would be useful, especially for the second capability you mentioned, where it would serve as the thing that can be serialized to a YAML phase definition.

Regarding the name / ID / phase attributes: Despite the description of this in Phase.h (and now in Base.h) I've never quite understood the need for two distinct identifiers for a phase. I'm pretty sure we could just get rid of "ID" and skip adding the "phase" identifier.

I don't quite understand why the SolutionBase object doesn't provide access to the thermo / kinetics / transport objects. Is this not something you see as useful, or just something to defer until later?

Why is this new class SolutionBase and not just Solution? Are you anticipating other classes derived from this, and what would their role be? The Python _SolutionBase class only for the purpose of dealing with the special inheritance limitations of Cython classes.

serve as a starting point for C++ copy constructors of SolutionBase and associated managers via YAML, which would be useful for true parallel processing.

Perhaps I'm missing something, but I don't think C++ copy constructors are useful for parallel processing. They only provide a way of copying an object within the memory space of a single process, which doesn't help with transmitting objects to other parallel processes (but of course, YAML
serialization could be used for this).

moving creation of objects from Phase/ThermoPhase to SolutionBase would enable switching of thermo models on the fly

I'm not sure I follow - in Python, the creation of all the underlying C++ objects is done in _SolutionBase. And in C++, the creation of the ThermoPhase / Kinetics / Transport objects is independent. There's nothing preventing you from creating a new ThermoPhase with a different model and linking that to a previously-existing Kinetics and Transport object.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 28, 2019

@speth ... thanks for your comments! I agree that a high-level discussion is warranted before it makes sense to start a detailed review.

I'm coming around to the idea that a class of this sort would be useful, ...

👍 Really happy to hear!

Regarding the name / ID / phase attributes: [...] I've never quite understood the need for two distinct identifiers for a phase.

My opinion on this is that within the context of serialization, name would be the solution type (class identifier), and phase would be an unique identifier Edit: ... ahem ... phase would be the same as specified in YAML, and name would be the unique identifier (at least that's how this PR handles this distinction).

I don't quite understand why the SolutionBase object doesn't provide access to the thermo / kinetics / transport objects. Is this not something you see as useful, or just something to defer until later?

It's the last: defer until later. In the current implementation, there isn't a need, as I mainly looked into the plumbing from Python back to C++. I am anticipating that the object could become important in C++ as a one-stop shop access point for various managers, where this type of access would become essential.

Why is this new class SolutionBase and not just Solution? Are you anticipating other classes derived from this [...]?

My main rationale here was to directly map python nomenclature back to C++. Honestly, I have thought about (ultimately) renaming Base.h to Solution.h, but that requires a bit of work as that name is already taken. There may be future applications for inherited C++ objects, but I haven't looked into this yet (#652 may uncover something eventually, although I don't necessarily see a need either Edit: On second thought, some surface chemistry applications could benefit). I guess the Base in SolutionBase makes sense as it is defined in Base.h.

I don't think C++ copy constructors are useful for parallel processing

Could be, but we don't know how users may deploy the Cantera code in custom applications. At some point, I had started looking into wrapping Cantera into deal.II for 2D simulations (ultimately LaminarSmoke put that effort to an end). Based on that experience, being able to easily replicate a Cantera kernel object (i.e. SolutionBase) would be really convenient.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 28, 2019

PS: ad C++ ramifications. If this object were to be adopted for use within the C++ layer, additional thought needs to go into factories, etc.. I didn't want to go there with this PR (and doubt that it needs to be added immediately).

Ultimately, I think that an almost exact replication of object structures in Python and C++ would benefit maintenance and simplify overall understanding (figuring out where managers are actually instantiated took a while).

@speth
Copy link
Member

speth commented Sep 28, 2019

Honestly, I have thought about (ultimately) renaming Base.h to Solution.h, but that requires a bit of work as that name is already taken.

There's no file in Cantera named Solution.h that I'm aware of. While there are some exceptions to this, I generally prefer the correspondence of Foo.h -> declaration for class Foo.

Ultimately, I think that an almost exact replication of object structures in Python and C++ would benefit maintenance and simplify overall understanding (figuring out where managers are actually instantiated took a while).

I think the challenge in figuring out where things are in the C++ layer is more an indication that how Cantera objects are created in C++ is currently a bit of a mess. While some general correspondence of C++ and Python features is useful, I think we shouldn't expect one language interface to be just a transliteration of the other. Doing that just ends up with each interface being the worst of both worlds based on the limitations of each language and fitting in as part of an idiomatic Python or C++ application. And this quickly becomes unmanageable if you consider additional language interfaces.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 28, 2019

There's no file in Cantera named Solution.h that I'm aware of.

My bad 🤦‍♂️ : it's been a while since I submitted the PR. The (potential) renaming conflict I was recalling was Phase.h, so this one is moot. Still, base.h corresponds to base.pyx, so that's where I tried to keep consistency in the current suggestion.

I think the challenge in figuring out where things are in the C++ layer is more an indication that how Cantera objects are created in C++ is currently a bit of a mess.

🤣 Well, I would have called it 'challenging'. It works well, but it is not straight-forward.

While some general correspondence of C++ and Python features is useful, I think we shouldn't expect one language interface to be just a transliteration of the other.

I do agree on your comments about exact replications, but as far as SolutionBase (or whatever name we may ultimately settle on) goes, my personal inclination would be to eventually have a slender SolutionBase constructor called within a factory, which delegates creation of the other three managers in the current fashion. This - at least in my opinion - would allow for straight-forward serialization.

PS: Cython currently not supporting diamond inheritance (I believe this is still true) is one good example where exact replication is limiting. In this context, I also don't think it to be necessary (nor useful) to replicate the diamond structure in composite.py in C++, as C++ allows for much simpler handling of native objects.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 29, 2019

@speth ... Thought about it some and am 👍 with defining Solution in Solution.h in C++ (instead of SolutionBase in Base.h). While the structure of objects is different, Python's Solution object has the same functionality as the C++ variant In terms of tying the managers together (the Python Solution is, however, much more powerful as it inherits from managers, which I don’t believe makes sense in C++). Let me know if you have a better solution in mind ...

PS: an alternative would be to just call those objects Base. It isn’t very informative, though. Another more consistent version would simply rename Base.h to SolutionBase.h, I.e. ignoring the analogy to Cython.

Ingmar Schoegl and others added 4 commits September 29, 2019 08:18
* Add Base.h/Base.cpp with definition of SolutionBase
* Link C++ object into Cython interface
* Add unique_name and type attributes to Cython _SolutionBase
* Resolve conflation of gas.ID and gas.name in unit tests
* Also fixes Cantera#691
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, I'm satisfied that a composite class of some sort will provide a useful interface for serialization, as well as a few other purposes.

I think I'd like to defer the resolution of these issues relating to the multiple name-like attributes and whether they need to be unique until we start to flesh out the serialization interface.

src/base/Solution.cpp Show resolved Hide resolved
include/cantera/base/Solution.h Show resolved Hide resolved
src/thermo/Phase.cpp Outdated Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Show resolved Hide resolved
interfaces/cython/cantera/composite.py Outdated Show resolved Hide resolved
src/base/Solution.cpp Outdated Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/test/test_mixture.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Outdated Show resolved Hide resolved
@speth
Copy link
Member

speth commented Oct 6, 2019

Regarding phase/ID and name. It looks like there is another issue I was not considering, see Phase.h's documentation:

 * A species name may be referred to via three methods:
 *
 *    -   "speciesName"
 *    -   "PhaseId:speciesName"
 *    -   "phaseName:speciesName"

I was trying to avoid dragging this into this PR, but I was reminded of this capability while reviewing your earlier PR on enabling case sensitivity in species names. I believe that these latter two methods are not actually in use within Cantera, and I'm really not sure that we need or want them. For multiphase mixtures (C++ class MultiPhase, Python class Mixture, you always need to provide both a phase name/index as well as species name in order to look up information about a species in a particular phase. And for surfaces, the usual approach is that the species names are unique among the phases, e.g. H2O is a gas species and H2O(s) is the same species absorbed onto the surface. So at some point, I was planning to deprecate and remove this capability anyway.

@ischoegl
Copy link
Member Author

ischoegl commented Oct 6, 2019

@speth ... thanks for the prompt review! I think most of your comments are addressed, but I left things that may need further discussions unresolved.

I was trying to avoid dragging this into this PR, but I was reminded of this capability while reviewing your earlier PR on enabling case sensitivity in species names. I believe that these latter two methods are not actually in use within Cantera, and I'm really not sure that we need or want them.

This makes sense. I wasn't really questioning the use, just trying not to break anything 😉 ... it certainly affects the PR, as both name and phase_id are directly involved.

@bryanwweber
Copy link
Member

Hi @ischoegl... sorry to jump in late (again), I just wanted to add my 2 cents on the distinction between name and id semantically. You said

I tried to be consistent with past descriptions, which identified phase_id as non-unique and name as unique (also: usage of --id in ck2cti, --phase-id in ck2yaml, and phase_id as keyword argument in Solution).

To me, name sounds more like something a user would set (people have names by which they introduce themselves and can be changed), whereas an id is fixed and unique, like a Social Security Number. I'm not that familiar with how these terms have been used/mixed in the past based on the existing code, but to the extent that their meanings are changing with this work (and other stuff that you're doing), to me it makes more sense to allow the user to set/change a field called name and let the unique one be called id (or phase_id or whatever). Note also that in YAML, the identifier for a phase has the key name. Of course, if this is already too far along to make this change, that's fine, it is only a small difference and both ways will work.

@ischoegl
Copy link
Member Author

ischoegl commented Oct 8, 2019

@bryanwweber ... from Phase.h (on master):

    /*! @name Name and ID
     * Class Phase contains two strings that identify a phase. The ID is the
     * value of the ID attribute of the XML phase node that is used to
     * initialize a phase when it is read. The name field is also initialized
     * to the value of the ID attribute of the XML phase node.
     *
     * However, the name field may be changed to another value during the
     * course of a calculation. For example, if a phase is located in two
     * places, but has the same constitutive input, the IDs of the two phases
     * will be the same, but the names of the two phases may be different.
     *
     * It is an error to have two phases in a single problem with the same name
     * and ID (or the name from one phase being the same as the id of
     * another phase). Thus, it is expected that there is a 1-1 correspondence
     * between names and unique phases within a Cantera problem.
     */

to me it makes more sense to allow the user to set/change a field called name and let the unique one be called id

This is the current (intended) behavior of this PR 😉 ... phase_id is automatically assigned and (while it can be changed) should be left alone. It is in a way unique, as it refers to an assembled Phase within a Solution (via YAML or CTI/XML). The non-uniqueness arises whenever a user decides to use a duplicate object in multiple reactors, or similar (i.e. it is a strict serialization issue).

The option --phase is consistent with the resulting yaml entry in 'phases'.
The --id option is still supported, with a warning being issued.
@ischoegl
Copy link
Member Author

ischoegl commented Oct 8, 2019

@bryanwweber ... copying some thoughts over here: name does indeed indicate that the user is welcome to change it, whereas id should be left alone (or used with caution). Despite the YAML implementation, I believe that the name field in YAML is an example for the latter, as it probably should support round-trip serialization (YAML->Solution->YAML) where it should remain unchanged.

I view the name of the Solution as temporary, i.e. it can be reassigned as needed by the user (unique/human-readable shorthand, ...). Those local changes shouldn't affect the original YAML entry.

Finally, I believe the differentiation suggested by this PR is consistent with other Cantera interfaces (i.e. there is a long tradition of the keyword argument phaseid in ct.Solution and --id in ck2cti/ck2yaml, despite renaming those to phase_id and --phase-id in this PR)

@ischoegl ischoegl requested a review from speth October 8, 2019 18:34
@ischoegl
Copy link
Member Author

ischoegl commented Oct 8, 2019

Ugh. It looks like I force-pushed a stale branch, losing most of my updates over the last couple of days. I don't think that I have the work elsewhere, so I'll have to retrace.

@ischoegl
Copy link
Member Author

ischoegl commented Oct 8, 2019

🎉 retraced all changes that were lost in a force-push.

* rename C++ object to 'Solution' (from 'SolutionBase')
* remove 'phaseID' from 'Solution' ('id' remains assigned to 'Phase')
* remove 'type' from C++ object (no polymorphism anticipated)
* assign 'name' to 'Solution' (link back from 'Phase' until deprecated)
* clarify 'phase' as 'phase_id' in Python interface
* address various feedback in review comments
@ischoegl
Copy link
Member Author

ischoegl commented Oct 9, 2019

To add an alternative approach (following a discussion with @bryanwweber): instead of trying to enforce Phase:id following a long tradition (and the original intent of some docstrings), the introduction of YAML (which uses the field name) could be used as a reason to switch over to use Phase:name consistently and abandon id and associated names instead (i.e. deprecate it without a substitute).

I.e. use name as a key in Solution('gri30.yaml', name='gri30'), use --name=... in ck2yaml, etc.. It is probably not the end of the world if a user changes the name of a Phase/Solution, and the original identifier (i.e. what was used in the YAML file) gets lost. There is only one field in YAML, and everything else may be excessive ...

I guess I am finally coming around. I still would like to use this opportunity to ensure that the naming convention is being used consistently. The change is not overly complicated ...

* 'name' corresponds to the YAML entry
* rename Solution keyword 'phaseid' to 'name' (instead of 'phase_id')
* rename ck2yaml argument '--id' to '--name' (instead of '--phase-id')
* ensure that C++ Phase::m_id is always the same as Phase::m_name
@ischoegl
Copy link
Member Author

ischoegl commented Oct 10, 2019

I went ahead as the change was quick and straight-forward. Getting rid of ID altogether certainly feels more consistent. I'll wait for a re-review before adjusting #724 (which will be switched to a - much less intrusive - deprecation of Phase::id and Phase::setID)

Ingmar Schoegl added 2 commits October 10, 2019 06:37
* Merge usage of 'id' and 'name' in the context of Phase objects
* Raise deprecation warnings for Phase::id and Phase::setID
@ischoegl
Copy link
Member Author

ischoegl commented Oct 10, 2019

I added deprecation of Phase::id and Phase::setID to this PR to illustrate that there is indeed no need to maintain two string identifiers. I also cherry-picked deprecation of name/id support for Phase::speciesIndex from #724 to consolidate all related efforts (i.e. #724 and #727 are now moot and closed).

I cannot think of anything else at this point, so this PR is ready for review.

Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these updates. I think converging on name as the sole phase identifier and removal of the extra phase_name:species_name access are both useful simplifications to the user interface, and make the introduction of the Solution class simpler.

I think there are just a few minor points left to address before this can be merged.

include/cantera/base/Solution.h Outdated Show resolved Hide resolved
include/cantera/thermo/Phase.h Outdated Show resolved Hide resolved
interfaces/cython/cantera/base.pyx Outdated Show resolved Hide resolved
def __set__(self, name):
self.base.setName(stringify(name))

property composite:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I suggest something along the lines of composite_type or composite_model for this property?

Copy link
Member Author

@ischoegl ischoegl Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tuple of types/models, so I'd prefer to leave it as is (an alternate would be composite_composition, which sounds redundant).

To clarify, composite_type (or composite_model) to me would indicate Solution, Interface, etc. (e.g. a duplicate of type(gas), with gas being a composite object). I wanted to differentiate (and used the name of the declaring module as a guide). I'm open to alternatives though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see your point on that being a bit misleading. Just composite feels somehow incomplete to me, but I don't have a better suggestion, so we can leave this as is for now.

interfaces/cython/cantera/transport.pyx Outdated Show resolved Hide resolved
src/thermo/Phase.cpp Outdated Show resolved Hide resolved
@ischoegl ischoegl requested a review from speth October 22, 2019 22:46
@speth speth merged commit dc66407 into Cantera:master Oct 23, 2019
@ischoegl ischoegl deleted the cpp-composite-object branch December 16, 2019 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Question: equivalent of composite Python classes in C++ Incorrect Solution.ID for YAML generated objects
3 participants