Enable saving the entire simulation state to disk and restoring it again #757

kyllingstad · 2024-03-04T14:48:19Z

This feature is desired in the OptiStress project, where we will need to simulate the same system many times in a loop with parameter variations. It will save a lot of time since we can start each simulation from a “warmed up” state.

Depends on #756 and #768.

kyllingstad · 2024-03-04T14:50:48Z

Suggestions for suitable storage formats are welcome. Note that according to the FMI spec, each subsimulator does its own serialisation and deserialisation, and all the co-simulator sees are binary blobs. So the format needs to support storage of arbitrary binary data.

This is a follow-up to #765 and the final step to close #757. Here, I've implemented functionality to export the internal state of individual subsimulators in a generic, structured form, and to import them again later. This exported form is intended as an intermediate step before serialisation and disk storage. The idea was to create a type that can be inspected and serialised to almost any file format we'd like. The type is defined by `cosim::serialization::node` in `cosim/serialization.hpp`. It is a hierarchical, dynamic data type with support for a variety of primitive scalar types and a few aggregate types: strings, arrays of nodes, dictionaries of nodes, and binary blobs. (Think JSON, only with more types.)
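To illustrate what "a hierarchical, dynamic data type" means here, below is a minimal, self-contained sketch built on `std::variant`. It is not the actual definition in `cosim/serialization.hpp`; the names `sketch_node`, `sketch_blob`, `sketch_array`, `sketch_dict`, `total_blob_bytes` and `make_example_state`, as well as the selection of scalar types and the example keys, are assumptions made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a hierarchical, dynamically typed node.
// This is NOT cosim::serialization::node; all names are illustrative.
struct sketch_node;

using sketch_blob = std::vector<std::byte>;              // opaque binary data
using sketch_array = std::vector<sketch_node>;           // array of nodes
using sketch_dict = std::map<std::string, sketch_node>;  // dictionary of nodes

struct sketch_node
{
    std::variant<
        std::nullptr_t, bool,
        std::int32_t, std::uint32_t, std::int64_t, std::uint64_t,
        double, std::string,
        sketch_blob, sketch_array, sketch_dict> value;
};

// Recursively sums the sizes of all binary blobs in the tree, as an
// example of how such a type can be inspected.
std::size_t total_blob_bytes(const sketch_node& n)
{
    return std::visit([](const auto& v) -> std::size_t {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, sketch_blob>) {
            return v.size();
        } else if constexpr (std::is_same_v<T, sketch_array>) {
            std::size_t sum = 0;
            for (const auto& child : v) sum += total_blob_bytes(child);
            return sum;
        } else if constexpr (std::is_same_v<T, sketch_dict>) {
            std::size_t sum = 0;
            for (const auto& entry : v) sum += total_blob_bytes(entry.second);
            return sum;
        } else {
            return 0; // scalars and strings carry no blobs
        }
    }, n.value);
}

// Builds a small example tree: one subsimulator entry holding a time
// value, a step counter and an opaque blob (keys are made up).
sketch_node make_example_state()
{
    sketch_dict sim;
    sim["current_time"] = sketch_node{2.5};
    sim["step_count"] = sketch_node{std::uint64_t{250}};
    sim["fmu_state"] = sketch_node{sketch_blob(16)}; // 16 zero bytes as placeholder
    sketch_array sims;
    sims.push_back(sketch_node{std::move(sim)});
    sketch_dict root;
    root["subsimulators"] = sketch_node{std::move(sims)};
    return sketch_node{std::move(root)};
}
```

Any file format that can represent each alternative of such a variant, including the blob, could in principle serve as the on-disk form.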

kyllingstad · 2024-10-25T07:41:13Z

@davidhjp01, you asked in another issue discussion whether you should start working on this issue. But as noted in the issue description, this depends on #768, which is currently a work in progress, so there is a limit to how much can be done on this yet.

It might be good to start looking into suitable file formats for the saved state, though. We need some format which can store the contents of a `cosim::serialization::node`, i.e., a hierarchical data structure with numerical, textual, and binary data types (see `node_data` for a list of the types).

Personally, I would prefer something which is lightweight in terms of features, complexity, and additional dependencies, but efficiency is also a factor. I guess we can discuss where the best trade-off lies when we have some alternatives on the table.

Once we've decided on a storage format, it is also possible to write the functions to save/load a generic cosim::serialization::node to/from a file even if #768 is not completely done yet.

davidhjp01 · 2024-11-11T07:03:38Z

AI generated list 😅:

- Protocol Buffers (Protobuf)
- MessagePack
- HDF5 (Hierarchical Data Format)
- CBOR (Concise Binary Object Representation)
- Avro
- BSON (Binary JSON)
- FlatBuffers

| Format | Efficiency | Memory Usage | Ease of Use | Library Size |
|---|---|---|---|---|
| Protocol Buffers (Protobuf) | Highly efficient in terms of serialization and deserialization speed | Designed to be memory-efficient, especially with arena allocation | Requires defining a schema using proto files, which can be a learning curve | Relatively lightweight, no built-in compression |
| MessagePack | Known for high efficiency, providing fast serialization and deserialization | Memory-efficient, reduces size of serialized data significantly compared to JSON | Easy to use and integrates well with various programming languages | Small and compact, suitable for limited resource environments |
| HDF5 (Hierarchical Data Format) | Highly efficient for handling large datasets, supports parallel I/O operations | Designed to manage large amounts of data efficiently, though files can be large | Steeper learning curve due to complexity, offers extensive features | Relatively large due to comprehensive feature set |
| CBOR (Concise Binary Object Representation) | Efficient in terms of serialization speed and compactness of data | Designed to be memory-efficient, suitable for devices with limited resources | Easy to use, does not require a schema, similar to JSON | Small and lightweight |
| Avro | Efficient for serialization and deserialization, especially in big data environments | Space-efficient, does not store field type information with each field | Requires defining a schema in JSON, which adds an extra step | Moderately sized, balancing features and performance |
| BSON (Binary JSON) | Efficient for storage and scan-speed, though less efficient than JSON in some cases | Uses more memory than JSON due to length prefixes and explicit array indices | Easy to use, especially for those familiar with JSON | Relatively small, integrates well with MongoDB |
| FlatBuffers | Designed for maximum memory efficiency, allows direct access to serialized data | Highly memory-efficient, requires minimal allocations | More complex to use due to schema definition and direct memory access | Small and optimized for performance |

kyllingstad · 2024-11-11T08:32:45Z

Nice summary. Without having spent a lot of time thinking about this, I immediately lean towards the simple and efficient schema-less formats, i.e., MessagePack, CBOR, or BSON. I don't have hands-on experience with any of them, but having read a bit about them, I think CBOR looks most promising. It seems to have been designed as an improvement of MessagePack, is an IETF standard (which is good for stability and third-party support), and has multiple C++ implementations.
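To give a feel for how little machinery the CBOR wire format needs, here is a hand-rolled sketch of the encoding rules from RFC 8949 for unsigned integers, byte strings and text strings. It is only meant to show the format; in practice we would presumably use an existing library. The names `byte_buffer`, `write_head`, `write_uint`, `write_bytes` and `write_text` are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using byte_buffer = std::vector<std::uint8_t>;

// Appends a CBOR "head": the 3-bit major type followed by an unsigned
// argument, using the shortest of the encodings defined in RFC 8949.
void write_head(byte_buffer& out, unsigned major, std::uint64_t arg)
{
    assert(major <= 7);
    const unsigned type_bits = major << 5;
    unsigned info = 0;
    int extra_bytes = 0;
    if (arg < 24) {
        info = static_cast<unsigned>(arg); // small values fit in the head byte
    } else if (arg <= 0xFFu) {
        info = 24; extra_bytes = 1;
    } else if (arg <= 0xFFFFu) {
        info = 25; extra_bytes = 2;
    } else if (arg <= 0xFFFFFFFFu) {
        info = 26; extra_bytes = 4;
    } else {
        info = 27; extra_bytes = 8;
    }
    out.push_back(static_cast<std::uint8_t>(type_bits | info));
    // The argument follows in network byte order (big-endian).
    for (int i = extra_bytes - 1; i >= 0; --i) {
        out.push_back(static_cast<std::uint8_t>(arg >> (8 * i)));
    }
}

// Major type 0: unsigned integer.
void write_uint(byte_buffer& out, std::uint64_t value)
{
    write_head(out, 0, value);
}

// Major type 2: byte string, i.e. an arbitrary binary blob.
void write_bytes(byte_buffer& out, const byte_buffer& blob)
{
    write_head(out, 2, blob.size());
    out.insert(out.end(), blob.begin(), blob.end());
}

// Major type 3: text string (the caller must supply valid UTF-8).
void write_text(byte_buffer& out, const std::string& text)
{
    write_head(out, 3, text.size());
    out.insert(out.end(), text.begin(), text.end());
}
```

Note that byte strings are a first-class type, which matches the requirement that the format must store arbitrary binary data. Note also that integers are written in their shortest form, so the wire format by itself does not record the original integer width.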

kyllingstad · 2024-11-11T09:36:30Z

I do have experience with Protocol Buffers, though, and while it is good in terms of performance and built-in versioning, I think I'd prefer to avoid the extra compilation step and use of machine-generated source code.

davidhjp01 · 2024-11-22T12:50:39Z

I can try some of the options to find out potential candidates :)

kyllingstad · 2024-11-25T19:01:43Z

@davidhjp01 wrote (in what I suspect was the wrong thread?):

> Just leaving some ideas here - I have used nlohmann_json before and I liked it, and it seems to support encoding/decoding JSON data to/from CBOR. It may be worth considering switching to it from Boost ptree? Also I like its documentation :) https://json.nlohmann.me/api/basic_json/

I like the idea. We actually used nlohmann_json in the past, but it was replaced with yaml-cpp because we wanted to support writing scenario files in YAML too (#275). YAML is a superset of JSON, so one library could support both.

I'm not sure it can replace Boost PropertyTree, though. We're mainly using that to parse XML right now (in addition to Xerces-C++), and I started using it as a generic data type in #769. I don't think we're using it for JSON anywhere.

> Update: There are some limitations when using nlohmann_json for CBOR encoding/decoding, documented here: https://json.nlohmann.me/features/binary_formats/cbor/, but they do not seem critical for our use case?

Hmm… that may be problematic. From a cursory look, it seems that the data would need to pass through a JSON-based data structure before they get converted to CBOR, and thus they lose type information. For example, JSON doesn't differentiate between signed and unsigned integer types, nor between different integer widths. Am I right, or can we keep the type information intact somehow?
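To make the concern concrete, here is a small standard-library-only sketch. It does not use nlohmann_json's actual API. It assumes a JSON-like intermediate model that only knows three numeric kinds (signed 64-bit, unsigned 64-bit, and double), and all type and function names (`typed_number`, `json_like_number`, `to_json_like`, `from_json_like`) are invented for the example.

```cpp
#include <cstdint>
#include <type_traits>
#include <variant>

// Numeric types we would like to preserve exactly (illustrative subset).
using typed_number = std::variant<
    std::int32_t, std::uint32_t, std::int64_t, std::uint64_t, double>;

// Assumed JSON-like model: only three numeric kinds remain.
using json_like_number = std::variant<std::int64_t, std::uint64_t, double>;

// Converting to the JSON-like model widens every integer to 64 bits.
json_like_number to_json_like(const typed_number& n)
{
    return std::visit([](auto v) -> json_like_number {
        using T = decltype(v);
        if constexpr (std::is_floating_point_v<T>) {
            return static_cast<double>(v);
        } else if constexpr (std::is_signed_v<T>) {
            return static_cast<std::int64_t>(v);
        } else {
            return static_cast<std::uint64_t>(v);
        }
    }, n);
}

// Converting back cannot know the original width, so the best it can do
// is to return the 64-bit (or double) alternative it was given.
typed_number from_json_like(const json_like_number& n)
{
    return std::visit([](auto v) -> typed_number { return v; }, n);
}
```

With these definitions, a value stored as `std::int32_t` comes back as `std::int64_t` after a round trip: the number is the same, but the type tag is not.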

kyllingstad · 2024-11-25T19:37:35Z

> I'm not sure it can replace Boost PropertyTree, though. […] I started using it as a generic data type in #769.

Or was that actually where you suggested replacing it, i.e. turn cosim::serialization::node into an alias for nlohmann::json instead of boost::ptree? If so, that is absolutely something we can and should discuss. The lack of distinct numeric types and the inability to extend it with more types in the future worries me, but maybe it shouldn't?

kyllingstad added the enhancement New feature or request label Mar 4, 2024

kyllingstad self-assigned this Mar 4, 2024

kyllingstad mentioned this issue Mar 4, 2024

Add interfaces to FMI functions for saving/serialising/deserialising/restoring FMU state #756

Closed

2 tasks

kyllingstad mentioned this issue Mar 4, 2024

Implement partial FMI 3.0 support #741

Open

2 tasks

kyllingstad mentioned this issue Jun 22, 2024

Enable saving and restoring subsimulator state #765

Merged

kyllingstad mentioned this issue Jul 4, 2024

Enable saving the entire simulation state in memory and restoring it again #768

Closed

kyllingstad mentioned this issue Jul 4, 2024

Enable exporting and importing subsimulator state #769

Merged

kyllingstad mentioned this issue Jul 18, 2024

Simplify API and code by dropping (alleged) support for changing system structure during simulation #771

Open

4 tasks

kyllingstad assigned davidhjp01 and unassigned kyllingstad Nov 27, 2024
