Storage format for Network.java objects #9
I do not have any preference regarding the underlying technology. I know that @scriptkitty was in favor of JSON. I briefly read up on JAXB on Wikipedia. To me, the big advantage over JSON libraries seems to be that "JAXB allows storing and retrieving data in memory in any XML format, without the need to implement a specific set of XML loading and saving routines", i.e., more out-of-the-box functionality.
Exactly. With JAXB, parsing and saving take one line each, and the Java code is generated at compile time from the XSD model. We no longer have to code the model, just draw it :). We can even use annotations on existing classes.
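For contrast, here is a minimal sketch of the kind of hand-written XML loading routine that JAXB is meant to make unnecessary. It uses only the JDK's built-in DOM parser (`javax.xml.parsers`), not JAXB, and the element and attribute names (`network`, `server`, `flow`, `path`) as well as the `FlowEntry` type are hypothetical, not an agreed schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical minimal model: a flow is a name plus the ordered server names on its path.
record FlowEntry(String name, List<String> path) {}

final class ManualNetworkLoader {
    // Hand-written loading routine: walk the DOM and copy each field into the model.
    // This per-field mapping is the boilerplate that JAXB would take over.
    static List<FlowEntry> loadFlows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<FlowEntry> flows = new ArrayList<>();
            NodeList flowNodes = doc.getElementsByTagName("flow");
            for (int i = 0; i < flowNodes.getLength(); i++) {
                Element flow = (Element) flowNodes.item(i);
                // The path is stored as a space-separated list of server names.
                List<String> path = List.of(flow.getAttribute("path").trim().split("\\s+"));
                flows.add(new FlowEntry(flow.getAttribute("name"), path));
            }
            return flows;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid network XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<network><server name=\"s1\"/><server name=\"s2\"/>"
                + "<flow name=\"f1\" path=\"s1 s2\"/></network>";
        System.out.println(loadFlows(xml)); // [FlowEntry[name=f1, path=[s1, s2]]]
    }
}
```

Every new model field would need another line like the `getAttribute` calls above; with JAXB-generated or annotated classes, that mapping is derived from the schema instead.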
Currently, we can store and load curves independently of the DiscoDNC configuration. E.g., you can store a network you used with the RTC curve backend and later load it with DNC Curves and the Rational BigInteger number backend. This is achieved by storing our String representation of curves as well as the corresponding parsers for curves and numbers. If I understand JAXB correctly, we would lose this feature, as we would then store an XML representation of the actual instances of all the involved classes. Is that correct? In fact, we currently store a NetworkFactory (interface in de.uni_kl.cs.discodnc.network) rather than a network, i.e., there is considerable overhead when it is stored again with every network.
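A minimal sketch of why storing the String form keeps a stored network backend-independent: only the text is persisted, and the number type is chosen when the network is loaded. The `TB(rate=...,burst=...)` syntax and the `TokenBucket`/`CurveText` names are hypothetical, not DiscoDNC's actual curve string format; `Double` and `BigDecimal` merely stand in for the real number backends.

```java
import java.math.BigDecimal;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token-bucket curve, generic in the number type of the active backend.
record TokenBucket<N>(N rate, N burst) {}

final class CurveText {
    // Hypothetical backend-neutral text form, e.g. "TB(rate=10,burst=2.5)".
    private static final Pattern TB = Pattern.compile("TB\\(rate=([^,]+),burst=([^)]+)\\)");

    // Only the text is stored; the number parser of the configured backend is applied on load.
    static <N> TokenBucket<N> parse(String text, Function<String, N> numberParser) {
        Matcher m = TB.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a token-bucket curve: " + text);
        }
        return new TokenBucket<>(numberParser.apply(m.group(1)), numberParser.apply(m.group(2)));
    }

    public static void main(String[] args) {
        String stored = "TB(rate=10,burst=2.5)";
        // The same stored string, loaded with two different number backends.
        TokenBucket<Double> asDouble = parse(stored, Double::valueOf);
        TokenBucket<BigDecimal> asDecimal = parse(stored, BigDecimal::new);
        System.out.println(asDouble + " / " + asDecimal);
    }
}
```

Serializing the backend objects themselves (e.g., via generated XML classes) would bind the stored file to one backend; storing text plus a parser choice avoids that.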
No, we can decide what we store in the XML, and I want to store only network data, nothing calculation-specific. For that, we first have to come up with a clear model design.
This clear separation sounds good. More thoughts on it in #8's thread. Assuming this separation is implemented, we only need to decide on the storage format here: XML, JSON or something else. I do not have any preference as long as the programmatic creation you mention is still supported.
I'd like to add another input to this issue. While static files are a good approach for static networks, such as networks existing in real life, a way to define dynamic networks with specific parameters is quite useful when evaluating algorithms against many networks. This is a relevant use case for researchers. One idea is to use a Domain Specific Language (DSL) for defining a network. Thanks to the JVM, various options are already available (e.g., Clojure) and could be extended for the use case of networks. Such an approach would enable not only dynamic parameters, such as the arrival or service curve parameters, but also, more generally, a parameterized network topology.
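Clojure is one option; to stay within the project's language, a minimal sketch of an internal Java DSL (a fluent builder) could look as follows. The `Tandem` builder and the `ServerSpec`/`FlowSpec`/`NetworkSpec` types are hypothetical, and a tandem is only one example topology. The point is that parameters are functions rather than constants, so one description yields many networks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical minimal network description produced by the DSL.
record ServerSpec(String name, double rate, double latency) {}
record FlowSpec(String name, List<String> path, double rate, double burst) {}
record NetworkSpec(List<ServerSpec> servers, List<FlowSpec> flows) {}

// Tiny fluent "internal DSL": server parameters are functions of the server index,
// so one description generates networks of any size.
final class Tandem {
    private final int length;
    private IntToDoubleFunction serverRate = i -> 1.0;
    private double latency = 0.0;
    private double flowRate = 0.1;
    private double flowBurst = 1.0;

    private Tandem(int length) {
        if (length < 1) throw new IllegalArgumentException("length must be >= 1");
        this.length = length;
    }

    static Tandem of(int length) { return new Tandem(length); }

    Tandem serverRate(IntToDoubleFunction rateOfIndex) { this.serverRate = rateOfIndex; return this; }
    Tandem latency(double latency) { this.latency = latency; return this; }
    Tandem flow(double rate, double burst) { this.flowRate = rate; this.flowBurst = burst; return this; }

    // Servers s0..s(n-1) in a line, plus one flow traversing all of them.
    NetworkSpec build() {
        List<ServerSpec> servers = new ArrayList<>();
        List<String> path = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            servers.add(new ServerSpec("s" + i, serverRate.applyAsDouble(i), latency));
            path.add("s" + i);
        }
        FlowSpec flow = new FlowSpec("f0", List.copyOf(path), flowRate, flowBurst);
        return new NetworkSpec(List.copyOf(servers), List.of(flow));
    }

    public static void main(String[] args) {
        // One description, a sweep over network sizes.
        for (int n = 2; n <= 8; n *= 2) {
            NetworkSpec net = Tandem.of(n).serverRate(i -> 10.0 + i).latency(0.01).flow(1.0, 5.0).build();
            System.out.println(n + " servers, flow path " + net.flows().get(0).path());
        }
    }
}
```

An external DSL (e.g., in Clojure) would add a separate syntax to maintain; an internal one reuses Java tooling at the cost of being less concise.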
I am also constantly dealing with the challenge of generating reasonable networks for evaluation myself. Not restricting ourselves to a 1:1 mapping between a stored network and the network object to evaluate sounds like a nice relief for this problem. For example, I mentioned in #17 that the networks used for one of my publications already take up 37MB in the DiscoDNC v2.4.0 storage format. However, these stored networks are a 1:1 mapping that encodes arrival curves, service curves and maximum service curves, a suboptimal solution we want to improve on. My current understanding of the ultimate goal is to store resource descriptions (curves) and the network topology in separate files. To instantiate a network, we then need to load the network topology file (only servers, links, flows) and a resource parameter file whose entries can be mapped to the network instances. This mapping creates the network instances, and it should, of course, be flexible, such that parameter alternatives/ranges given in the resource description lead to multiple network instances. For example, if we want a homogeneous network, we should not be required to define the same service/arrival curve explicitly for all servers/flows. We could then also use such a generic resource configuration across multiple network topologies. Now, there is a scenario not covered by my current view on this: the creation of multiple network topologies from a parameterized description. I did not think of this, thank you for pointing it out!
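A minimal sketch of the topology/resource split described above, under simplifying assumptions: curves are reduced to a single rate per server or flow, links are omitted, and all type names are hypothetical. One value range applies to every server (homogeneous by default, with optional overrides), and every combination of ranges yields one network instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical topology file content: only entity names (links omitted for brevity).
record Topology(List<String> servers, List<String> flows) {}

// Hypothetical resource description: value ranges applied to all servers/flows
// (homogeneous by default), plus optional per-server overrides.
record ResourceSpec(List<Double> serviceRates, List<Double> arrivalRates,
                    Map<String, Double> serviceOverrides) {}

// One concrete network instance: a service rate per server, an arrival rate per flow.
record Instance(Map<String, Double> serviceRates, Map<String, Double> arrivalRates) {}

final class Instantiator {
    // Maps one resource description onto one topology; every combination of
    // the given ranges yields its own network instance (cartesian product).
    static List<Instance> instantiate(Topology topo, ResourceSpec res) {
        List<Instance> instances = new ArrayList<>();
        for (double serviceRate : res.serviceRates()) {
            for (double arrivalRate : res.arrivalRates()) {
                Map<String, Double> service = new LinkedHashMap<>();
                for (String server : topo.servers()) {
                    service.put(server, res.serviceOverrides().getOrDefault(server, serviceRate));
                }
                Map<String, Double> arrival = new LinkedHashMap<>();
                for (String flow : topo.flows()) {
                    arrival.put(flow, arrivalRate);
                }
                instances.add(new Instance(service, arrival));
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        Topology topo = new Topology(List.of("s1", "s2", "s3"), List.of("f1", "f2"));
        ResourceSpec res = new ResourceSpec(List.of(10.0, 20.0), List.of(1.0, 2.0, 3.0), Map.of("s3", 5.0));
        // 2 service rates x 3 arrival rates = 6 instances from one topology file.
        instantiate(topo, res).forEach(System.out::println);
    }
}
```

Because the resource description names no topology-specific entities except optional overrides, the same `ResourceSpec` could be applied to other topologies as well.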
OK.
From the discussion, I see that the model representation/format can be used for two different use cases:
While researchers see networks in terms of service and arrival curves, industry knows little about these. They see networks based on the protocol (AFDX, TTEthernet, ...). If we decide to stay on the calculation level, arrival and service curve representations are totally OK, but for industry-type descriptions, the protocol and some additional parameters (BAG, bandwidth, frame length) can be expected. In any case, the lower-level elements (arrival and service curves) can be calculated from these parameters, based on a lot of theoretical papers (each protocol has its own). Also, on the very low level, we do not have elements like Virtual Links or Queues. We just have flows with their paths and servers (several servers on each device; on a switch, each queue is a server); each flow has an arrival curve to the next server, and each server has its service curve. IMO, let's decide which way to go, as I feel that we somehow want to create a mixture of these. On representation: XML is a standard, and you can produce XML extracts from databases very quickly. DSLs are usually closed formats that are hard to create. I remember that Vector has the special CANDB format that is used by a lot of companies with extensions, but there is no standard parser for it, causing a lot of problems for all the companies. I have also seen network descriptions in Excel format at Airbus.
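As a sketch of how protocol parameters map to the low-level curves, consider an AFDX virtual link: it may send at most one maximum-size frame per BAG, which gives a token-bucket arrival curve with burst equal to one frame and rate equal to frame size divided by BAG. The output port is modeled here as a rate-latency server. This ignores frame overhead (preamble, inter-frame gap) and jitter, the class names are hypothetical, and the numbers in `main` are example values only.

```java
// Token-bucket arrival curve alpha(t) = burstBits + rateBps * t for t > 0.
record TokenBucketCurve(double rateBps, double burstBits) {}

// Rate-latency service curve beta(t) = rateBps * max(0, t - latencySec).
record RateLatencyCurve(double rateBps, double latencySec) {}

final class AfdxToCurves {
    // A virtual link may send at most one frame of maxFrameBits per BAG, so its
    // arrival curve has burst = one maximum-size frame and rate = maxFrameBits / BAG.
    static TokenBucketCurve virtualLink(double maxFrameBits, double bagSec) {
        if (maxFrameBits <= 0 || bagSec <= 0) {
            throw new IllegalArgumentException("parameters must be positive");
        }
        return new TokenBucketCurve(maxFrameBits / bagSec, maxFrameBits);
    }

    // An output port (one queue = one server) modeled by link bandwidth and a fixed latency.
    static RateLatencyCurve outputPort(double linkBps, double techLatencySec) {
        return new RateLatencyCurve(linkBps, techLatencySec);
    }

    // Standard network-calculus delay bound for a token bucket crossing a rate-latency
    // server, valid when arrival rate <= service rate: D = T + b / R.
    static double delayBound(TokenBucketCurve a, RateLatencyCurve s) {
        if (a.rateBps() > s.rateBps()) {
            throw new IllegalArgumentException("arrival rate exceeds service rate");
        }
        return s.latencySec() + a.burstBits() / s.rateBps();
    }

    public static void main(String[] args) {
        // Example values: 1518-byte frames, BAG of 2 ms, 100 Mbit/s port, 16 us latency.
        TokenBucketCurve vl = virtualLink(1518 * 8, 0.002);
        RateLatencyCurve port = outputPort(100e6, 16e-6);
        System.out.printf("rate = %.0f bit/s, delay bound = %.2f us%n",
                vl.rateBps(), delayBound(vl, port) * 1e6);
    }
}
```

With these example values, the virtual link's long-term rate is 12144 bit / 2 ms = 6,072,000 bit/s, and the single-hop delay bound is 16 µs + 12144 bit / 100 Mbit/s = 137.44 µs.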
In general, I think @matyesz's wrap-up is quite correct; it is just that the difference between the expectations of researchers and industry is often not that strict. I have received emails from academics asking how to use the DiscoDNC. They often stopped considering the tool when they were confronted with deriving service and arrival curves manually. Long story short, a higher layer that allows users to put in their familiar network representation potentially benefits many interested parties. Of course, this representation then needs to be converted to the feed-forwardized server graph for analysis with the DiscoDNC. This is basically the brainstorming I intended to spark in #8 (but I am happy about comments here as well :-) ).
This is basically the same as presented in @fabgeyer's paper [2]. To get from 1 to 2 to 3 in the publication, we created glue code that uses instances of our single network object outside of its intended server graph context; that's why it did not make it into a public release yet. About the complexity, and thus the size, of the eventual storage format, this is how I see it evolving in the longer term: Regarding the storage format: To summarize, I think development on the network backend will work its way up from our current layer 3 to layer 1, step by step. Thus, I suggest starting with storing our current single feed-forwardized server graph network and one separate resource description file with fixed parameters for curves. This can already be extended without progress on #8, e.g., to a resource file with parameter ranges. Having this foundation, the format should, of course, keep up with the development of the network backend in order to store upper-layer information as well as metadata about their relations (mapping from devices to servers, etc.; see the labels in [2] for an example).
[1] Iterative Design Space Exploration for Networks Requiring Performance Guarantees (Bruno Cattelan, Steffen Bondorf), In IEEE/AIAA 36th Digital Avionics Systems Conference (DASC 2017), 2017.
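A minimal sketch of the suggested starting point: one file for the server graph and one separate resource file with fixed curve parameters, joined by entity name when loading. Both line-based syntaxes and the `TwoFileFormat` class are hypothetical illustrations, not a proposed final format; curves are kept as backend-neutral text.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class TwoFileFormat {
    // Topology file (hypothetical syntax): "server <name>", "link <from> <to>",
    // "flow <name> <server> <server> ...". Collects the names that need resources.
    static Set<String> entities(String topology) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : topology.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            boolean declares = tokens[0].equals("server") || tokens[0].equals("flow");
            if (declares && tokens.length >= 2) {
                names.add(tokens[1]);
            }
        }
        return names;
    }

    // Resource file (hypothetical syntax): "<entity> <curve text>", one curve per line,
    // e.g. "s1 RL(rate=10,latency=0.01)". Curves stay as backend-neutral text.
    static Map<String, String> resources(String resourceText) {
        Map<String, String> curves = new LinkedHashMap<>();
        for (String line : resourceText.split("\\R")) {
            String[] kv = line.trim().split("\\s+", 2);
            if (kv.length == 2) {
                curves.put(kv[0], kv[1]);
            }
        }
        return curves;
    }

    // Joining the two files: every server and flow must be covered by the resource file.
    static List<String> missing(String topology, String resourceText) {
        Map<String, String> curves = resources(resourceText);
        return entities(topology).stream()
                .filter(name -> !curves.containsKey(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String topology = "server s1\nserver s2\nlink s1 s2\nflow f1 s1 s2\n";
        String resourceText = "s1 RL(rate=10,latency=0.01)\nf1 TB(rate=1,burst=5)\n";
        System.out.println("missing resources: " + missing(topology, resourceText)); // [s2]
    }
}
```

Parameter ranges or device-to-server labels could later be added as further line types without changing the topology file.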
Testing ground for storage format options
Yes, with JAXB you have to create the XSD schema for the result and the XML file that contains the data. Add some Maven dependencies and goals to create the Java classes from the schema, and everything else comes automatically.
I.e., other than writing Java classes to the file system.
Discussion started in Issue #8, with @matyesz raising the following point:
"An idea (please evaluate):
we could use XSD to create the network model and generate Java code with JAXB.
Pros:
XSD has visual editors, so you edit the network model like a UML diagram; easy to change
XML support out of the box: the network can be described either in Java or via XML (users have their data in a DB, so it is easier for them to have an XML extract)"