context for measurements #35
Comments
Hi @ameinhardt, do you mean allowing both possibilities: a URL to another description, or an inlined description? |
I'd vote for an (optional) external reference. Inlined data is overly redundant. Architecturally, a URL would normally not point to the device itself, as a PPMP source is not a server in most cases. So, this is somewhat outside the domain of a payload protocol. For the metadata we may also have some overlap with Vorto, but I am not an expert with that. |
@fpatz absolutely, it would be weird for the URL to point to the device itself. A URL, or any other kind of reference to a resource holding the description, could work, although it still seems clunky to me. I can imagine many scenarios in which the consumer of the messages does not have access to that "description server", or the description is missing, outdated, etc. Moreover, for very intensive data transfer scenarios, sending this reference over and over again hundreds of times per second could be undesirable. And of course, inlining it would be an absolute waste of bandwidth. In the end, this description would be used only for configuring data consumers, which should happen extremely seldom. I don't know... I don't think this feature looks like a bright idea... |
Retooling (Umrüsten) is not so seldom. In that case, units and especially limits might change. The accuracy of a measurement could also vary as a machine heats up, etc. I agree with the redundancy problem and the architectural preference. On the other hand, in order to avoid complexity, PLCs might just send such information every time.
If needed, this
It's up to the receiver, if he accepts references. He decides whether to resolve them every time, via a cache or any other registry that is not part of the scope of PPMP (vorto etc.). |
@ameinhardt I assumed that this description is just a help for humans (the "integrators"?) to understand the messages. If this is not the case, please correct me. Under this assumption, and regarding your comment:
Seldom is actually a very ambiguous term :). What I wanted to say is that this description is only meaningful while the integrator is deciding what to do with this particular kind of message. Once that is working, he does not really need to know that the unit is e.g. Fahrenheit until the next time the machine is retrofitted or modified. How often could that happen? Once after millions of messages? |
Hi, Given that I was one of the people who originally requested this change, I'd like to clarify why it's needed. We often have analog sensors in use, which deliver only unqualified values in the range 0000-FFFF, so there's no obvious way of converting that to a useful value with a unit. Even IO-Link sensors can differ in their output, depending on the configured mode. The system that handles collected data is typically managed by completely different people to those who set up the sensors. The additional coordination effort between groups is considerable, so it is simpler if the data qualification can be included as part of the PPMP message. It also automatically resolves the problem of how to handle configuration changes - especially with large numbers of sensors. The additional data bytes are not a consideration in our case, as the local network is nowhere near capacity. However, external references would be a problem as external access is generally blocked. Using a reference to a locally available server only works until the message is passed to a different network segment, or even the cloud, where the reference again cannot be accessed. Our requirement is to be able to have fully qualified self-contained packets of data, where the data can also be non-numeric. I should also note that the entire As far as units go: in our discussion (Bosch, Sick, Balluff), we came to the conclusion that treating the unit as a label would be more pragmatic, as use cases often arise where the unit ends up being something domain specific that won't convert to an SI form. That said, I do like the IETF SenML suggestion. Steve |
About units: The SenML IETF draft says this:
To our knowledge, IANA has not yet done this, and it's not clear when this will become available. Until that time, SenML is effectively unusable unfortunately. The method used by OPC/UA to define units might be worth looking at. Here a short overview of their fields from OPC UA Part 8, S. 15-16:
The suggested The unit ID is only relevant when a numbered list of units is referenced by the URI. For example, the IO-Link consortium does that. As before, the whole thing must remain optional. Those who need context should be able to select between either a reference or inline (just not both at the same time). That covers the use cases that have been mentioned here. Should we use the same field names from the OPC/UA spec? Steve PS. @ameinhardt: I hope I'm not too late with this suggestion! |
Hi @bf-bryants , I think the OPC UA approach still has the problem you pointed out before: people managing the receiving part of the system may not be able to get the description/semantics referred by the I'd suggest a simplified schema:
I would not care at all about the OPC UA displayable names ( Not sure about the Then, another detail would be how important it is for us to have those gradients and offsets that were suggested at the beginning. My first idea would be that the device sending this information should send the values already corrected instead of sending them along with the correction factors... but I don't know if this is possible in all cases. What do you guys think? |
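To make the gradient/offset question concrete, here is a minimal sketch of the correction a receiver would have to apply if the device sent raw values plus correction factors. The function name, the linear form `value = raw * gradient + offset`, and the 0..100 scaling are assumptions for illustration only; the thread does not define them.

```python
def apply_linear_correction(raw: int, gradient: float, offset: float) -> float:
    """Map a raw sensor reading to an engineering value (assumed linear model)."""
    return raw * gradient + offset


# Hypothetical 16-bit analog input (0000-FFFF) scaled onto 0..100.
GRADIENT = 100.0 / 0xFFFF
OFFSET = 0.0

raw_readings = [0x0000, 0x8000, 0xFFFF]
corrected = [apply_linear_correction(r, GRADIENT, OFFSET) for r in raw_readings]
```

If the device applies this itself, the message only needs the corrected values and a unit; if not, both factors have to travel in the context.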
@bgusach: your suggestions regarding unit makes sense, in my opinion. @bf-bryants: do you agree with the 'in-house' default and simplification of displayName?
Should we offer either a simplified form of:
or a complex object with mandatory id & namespace URI:
|
@ameinhardt, that'd work, but what about always having a flat object for each dimension?
No namespace:
context: {
  temperature: {
    unitID: "C"
  }
}
Namespace and possibly other stuff:
context: {
  temperature: {
    unitID: "C",
    namespace: "...",
    otherStuff: "..."
  }
}
I think that makes parsing and validating easier. |
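As a rough illustration of why a fixed shape is easier to validate, the sketch below checks a context where every dimension must be an object. The function name is hypothetical, and treating `unitID` as mandatory is an assumption of this sketch, not something agreed in the thread.

```python
from typing import Any


def validate_context(context: Any) -> list[str]:
    """Return a list of problems; an empty list means the context looks valid."""
    if not isinstance(context, dict):
        return ["context must be an object"]
    errors = []
    for name, entry in context.items():
        # One shape only: each dimension is an object, never a bare string.
        if not isinstance(entry, dict):
            errors.append(f"{name}: expected an object, got {type(entry).__name__}")
            continue
        if "unitID" not in entry:  # assumption: unitID is required
            errors.append(f"{name}: missing unitID")
        namespace = entry.get("namespace")
        if namespace is not None and not isinstance(namespace, str):
            errors.append(f"{name}: namespace must be a string")
    return errors
```

With a schema that allowed either a string or an object per dimension, every consumer would need an extra branch before it could read `unitID`.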
I agree with @bgusach:
However, the previous two comments have inadvertently made a case for allowing a numeric identifier where the namespace has them (ie: in addition to the label): You've both named the unit This leaves me with the following per measurement field:
The last two of these could be merged if we allow the unit to be a string or a number - but I don't know if the schema will support that. While I'm here: would it be sensible to allow a per-message default unit namespace? If a namespace is used at all, it's likely that multiple fields will use the same namespace, so we could avoid some text duplication. The Base 64 was chosen because it's the de facto standard, with MIME et al. Is there something else that should be included here? It should still be explicitly stated that all of a field's values must be of the same type for implicitly typed values. Thus you could never have this: "temperature": [ 1.0, false, "3" ] Steve |
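The same-type rule for implicitly typed values can be checked mechanically. The following is a sketch only; the function name and the three type labels are assumptions, chosen to mirror the JSON value types used in the example above.

```python
def series_type(values):
    """Return the single JSON type shared by all values, or raise ValueError."""

    def json_type(value):
        # bool is a subclass of int in Python, so it must be tested first.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, (int, float)):
            return "number"
        if isinstance(value, str):
            return "string"
        raise ValueError(f"unsupported value: {value!r}")

    kinds = {json_type(v) for v in values}
    if len(kinds) > 1:
        raise ValueError(f"mixed types in series: {sorted(kinds)}")
    return kinds.pop() if kinds else None
```

Called on `[1.0, False, "3"]` this raises `ValueError`, while a list of plain numbers is reported as `"number"`.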
Hi @bf-bryants ,
I didn't think too much about that and copied/modified @ameinhardt's example (that I'd say it is a bad idea to have two possible identifiers ( Side note: we should restrict the string ID to something like [a-zA-Z0-9]+
I'd say it is not very elegant to have type information both in the JSON format and within the payload. Moreover from my limited point of view, binary data is not very useful if you don't have a proper description of what it is (e.g. "it is a jpeg", or "the first byte means this, the second one that", etc) , so I'd personally stick to using the description (either inlined or in the namespace documentation) and saying something like However, I don't really know all the use cases from the real world, and if it is a must for you that base64 strings are automatically converted to binary data on the consumer side, I guess there is no way around using an extra field (or maybe using some kind of prefix like
Technically, other encodings could be used to embed binary data in JSON, such as base85 or base91 (which are more bandwidth-efficient), but you are right there: base64 is the de facto standard, and the improvements of other encodings are probably not worth the hassle.
That's a good idea. Default, and if some units want to use another namespace, they're free to override the default one. Probably the context object is the right place to define this default. |
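A per-message default with per-field override could be resolved as sketched below. Where the default is stored is still open in this thread, so the sketch simply passes it in as an argument; the function name and the `urn:example:...` URIs are hypothetical.

```python
from typing import Optional


def effective_namespace(entry: dict, default_namespace: Optional[str] = None) -> Optional[str]:
    """Return the field's own namespace if it has one, else the message default."""
    return entry.get("namespace", default_namespace)


# Hypothetical message-level default and per-field context entries.
DEFAULT_NS = "urn:example:units:plant-a"
context = {
    "temperature": {"unitID": "C"},
    "pressure": {"unitID": "bar", "namespace": "urn:example:units:vendor-b"},
}

resolved = {name: effective_namespace(entry, DEFAULT_NS) for name, entry in context.items()}
```

Here `temperature` inherits the default while `pressure` keeps its own namespace, which avoids repeating the same URI for every field.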
Hi, After letting it bounce about in my head for a couple of days, I think you're right about using only one field for a unit. As you point out, putting a number into a string is an option; it's easy to detect. I'm not so sure about restricting content. I feel that it should be possible to specify at least SI units directly, which means you'd also need superscript numbers, the degree and slash symbols, eg: it should be possible to write "km/h" etc. Here's an example of a binary data use case: sending a current tag ID from an RFID reader along with other measurement data such as conveyor speed etc. There's no format or meaning to the data, other than that it's an ID. We know it's a tag ID from the combination of the field name and the PPMP message's device ID (we use UUIDs); we use that to look up configuration information. Note that the conversion to PPMP/JSON is often done by a little embedded field device (eg: IO-Link master), with limited context and resources. While it can read configuration information about its connected devices, interpretation of data is generally not possible. My suggestion would be to use an additional property (which is outside of the standard's scope). An alternative would be to add an optional field for the MIME type to the context. Neither is suitable for complex data descriptions though. As it's not clear how far we should go with data description details, I am inclined not to do it at all; I therefore expect that the receiver either ignores such fields, or they know how to deal with the data somehow. I think it would be good to omit explicit type information if it's implicitly and non-ambiguously available. Whether something is a number or a string is clear, but interpreting string content can be a source of problems. I would avoid using prefixes inside string data, as we then have a lot more effort to ensure that we're not looking at a string that happened to start with the same characters.
The same problem occurs if a string happens to start with the same characters that base64 uses (or bases 85, 91 or 122). I must assume a string is just a string unless explicitly marked as being something else. Note that this can be optional (in my opinion). If we can mark a specific string field as using a certain encoding by using out-of-band configuration, then the PPMP message can omit that information. As above, anybody who can't interpret the string content can ignore it. BTW: the only reason for this is because JSON has no binary data type of its own. :-( TL;DR:
Steve |
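The ambiguity described above is easy to reproduce: some ordinary strings are also syntactically valid base64. The sketch below uses Python's standard `base64` module with strict validation; the helper name is an assumption.

```python
import base64
import binascii


def looks_like_base64(text: str) -> bool:
    """True if text decodes as strict base64, whether or not it was meant to be."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        # binascii.Error: bad characters or padding; ValueError: non-ASCII input.
        return False
```

Both `"km/h"` and `"Temp"` pass this check even though they are plain labels, so content sniffing cannot tell a string from encoded binary data. An explicit marker, in-band or via out-of-band configuration, is the only reliable signal.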
Hi @bf-bryants
I think the unit IDs should be analogous to variable names in a programming language: among other things, they should be readable and hard to confuse. You suggested using numbers as IDs for the engineering units, and that makes sense from the "hard to confuse" point of view,
I'm not sure I understand what you meant in that paragraph. Could you try to explain it in another way and maybe give examples?
Yup, that with the prefixes was rather a dummy "brainstormy" idea. Could work, but to make it fast we should prefix all the strings to know if they are strings, base-xx or some other exotic stuff. Meh... :)
That is true. In the FAQ it is stated that PPMP happens to be JSON, but could be changed to something else if necessary. Although that was said regarding size, this problem could be a reason to move to some other standard (something like messagepack, protobuf, BSON, ...?). What do you think @ameinhardt ? Thanks, |
Hi,
That won't work because SI units are case sensitive - for example with mega (M) and milli (m). I see only two choices here - either we use a correct set of rules for validating the content, or we don't validate it and live with the fact some people will put rubbish in that field. I am currently leaning towards the second, as getting the first one right will delay the V3 release too long. :-)
That was about how to describe binary content sent with measurement data. Short answer: "Don't!" It's also possible to use an "additional property" (aka custom field), but such a field is not part of the PPMP spec by virtue of it being a custom field. If the receiver doesn't know what's in the binary data field, they should ignore it.
We're currently looking at binary formats for lower-level direct data exchanging where performance is more important. I like how protobuf has an explicit definition layer. We're also looking at CBOR, which is a strict JSON superset but explicitly supports various number types and binary data. However, as a general interchange format, JSON is very widely accepted and is human readable. I would actually be surprised if PPMP starts using something else. Best regards, Steve |
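The two constraints debated here can be shown side by side: the restricted pattern `[a-zA-Z0-9]+` suggested earlier in the thread rejects common SI notations, and any case-insensitive comparison would conflate SI prefixes. This is a sketch only; the helper names are hypothetical.

```python
import re

# The restricted ID pattern suggested earlier in this thread.
RESTRICTED_ID = re.compile(r"[a-zA-Z0-9]+")


def is_restricted_id(unit: str) -> bool:
    """True if the whole unit string matches the restricted pattern."""
    return RESTRICTED_ID.fullmatch(unit) is not None


def same_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive comparison, shown only to illustrate the SI problem."""
    return a.casefold() == b.casefold()


candidates = ["C", "kmh", "km/h", "°C", "m²"]
accepted = [unit for unit in candidates if is_restricted_id(unit)]
```

Only `"C"` and `"kmh"` are accepted, and `same_ignoring_case("mW", "MW")` is true although milliwatt and megawatt differ by nine orders of magnitude.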
Hi,
You suggested using |
Yes - that was an idea from the OPC/UA spec. A namespace URI gives us a complete enumerated set of possibilities, and the unit ID refers to one of them. It seemed likely that devices will already have this information (eg: for OPC/UA, IO-link etc), so re-using it for PPMP would make life easier for them. The problem is what to do when no namespace is used, as the numeric ID has no meaning. A string allows at least something to be set - but as you quite rightly point out, people will put all sorts of rubbish in there. If we're going to go with a strict validation, I'd prefer a numeric ID from a specific namespace.
You may find that other people do have a problem with that, given that the symbol As I mentioned previously, I'd prefer not to validate strings at all. A numeric ID from an external namespace can't get messy. My opinion is that we are either very strict (namespace+number) or we leave it open (unvalidated string). Steve |
Hi @bf-bryants, @bgusach,
flatten({
  "content-spec": "urn:spec://eclipse.org/unide/measurement-message#v3",
  "device": {
    "id": "a4927dad-58d4-4580-b460-79cefd56775b"
  },
  "measurements": [
    {
      "ts": "2018-05-28T07:41:31.603Z",
      "series": {
        "time": [
          0,
          23,
          24
        ],
        "temp.1": [
          45.4231,
          46.4222,
          44.2432
        ],
        "temp.2": [
          42
        ]
      }
    }
  ]
})
results in
{
  "content-spec": "urn:spec://eclipse.org/unide/measurement-message#v3",
  "device.id": "a4927dad-58d4-4580-b460-79cefd56775b",
  "measurements.0.ts": "2018-05-28T07:41:31.603Z",
  "measurements.0.series.time.0": 0,
  "measurements.0.series.time.1": 23,
  "measurements.0.series.time.2": 24,
  "measurements.0.series.temp\\.1.0": 45.4231,
  "measurements.0.series.temp\\.1.1": 46.4222,
  "measurements.0.series.temp\\.1.2": 44.2432,
  "measurements.0.series.temp\\.2.0": 42
}
In the same way, one could apply additional transformations such as CBOR, gzip or others to the standard payload. @bf-bryants, @bgusach, can you give a (preferably final) example, taking the discussion into account? |
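For readers who want to reproduce the transformation, here is a minimal Python sketch of such a flattening step. It is not the `flatten` implementation used above, which is not shown in this thread. It joins path segments with dots and escapes dots inside keys with a backslash, which matches the output shown; backslashes already present in keys are left untouched, and empty objects or arrays simply disappear.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict with dot-separated path keys."""
    if isinstance(value, dict):
        # Escape literal dots in keys so they are not mistaken for separators.
        items = [(str(key).replace(".", "\\."), child) for key, child in value.items()]
    elif isinstance(value, list):
        items = [(str(index), child) for index, child in enumerate(value)]
    else:
        return {prefix: value}

    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat


# Shortened version of the message above.
message = {
    "device": {"id": "a4927dad-58d4-4580-b460-79cefd56775b"},
    "measurements": [
        {"series": {"time": [0, 23], "temp.1": [45.4231, 46.4222]}}
    ],
}
flat = flatten(message)
```

Because the step is a pure function of the standard payload, it can live outside the message definition, like the other transformations mentioned.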
@bf-bryants I stand by my opinion: a restricted string has the same reliability as an integer, and offers "good enough" readability. A free string offers great readability but loses all reliability for programming purposes. Just an example, the following two strings are different: And I still think something like That's my opinion, I guess it's up to our BDFL @ameinhardt to decide 😄 |
in my opinion that's up to the namespace. The ids in a custom namespace could be numbers as well as clearly defined strings. |
The most important aspect for me is not about grouping or not, but having a variable schema, i.e. having either a string or an object for the Then, almost as a taste thing, I don't see the benefit of the grouping. Having In other words, my first choice would be:
And then this:
But I'm against allowing both a string or an object under the |
I don't agree with that. If you allow a "free string", you allow it for every case: with or without namespace. And free strings are terrible IDs. |
Hi,
I'll combine my replies into one message.
I understand that flat maps could be easier to parse by computers.
I'm not aware of JSON parsers having this problem. On the contrary, the structure (as in V2) allowed us to directly reference sub-elements as complete objects. I also prefer the grouped structure.
Transformations such as flattening, CBOR, Gzip etc would be better left separate from V3 in my opinion - so that we can get the V3 content field definitions concluded in the very close future!
It would, however, be an interesting discussion point for V3.1 or V4.
I keep proposing: The unit ID is a loosely defined label if no namespace is defined. If a namespace is given, it shall be treated as an unambiguous ID in that namespace.
This is still my position.
It doesn't matter that the key is a free string, as it must still exactly match a key in the namespace. We use 'free strings' as key names in other places in PPMP, and that's not causing any problems.
A unit ID without a namespace is not a useful definition, and is at best only of use as an advisory label; I see no point in adding restrictions to it.
BTW: I have no preference as to whether a unit definition is in its own object or flat.
Steve
|
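The proposal above (advisory label without a namespace, exact key match with one) could be implemented along these lines. This is a sketch under assumptions: the registry, its URI and its entries are made up for illustration, and how a receiver actually obtains a namespace's contents is outside the scope discussed here.

```python
from typing import Optional

# Hypothetical namespace registry; keys may be numeric strings or labels.
NAMESPACES = {
    "urn:example:units": {"C": "degree Celsius", "17": "kilometre per hour"},
}


def resolve_unit(unit_id: str, namespace: Optional[str] = None) -> dict:
    """Treat unit_id as a label without a namespace, else as an exact key in it."""
    if namespace is None:
        # No namespace: advisory only, no validation applied.
        return {"kind": "label", "value": unit_id}
    units = NAMESPACES.get(namespace)
    if units is None:
        raise LookupError(f"unknown namespace: {namespace}")
    if unit_id not in units:  # exact, case-sensitive match
        raise LookupError(f"{unit_id!r} is not defined in {namespace}")
    return {"kind": "id", "value": units[unit_id]}
```

A free-form string such as `"km/h"` is accepted as a label, but with a namespace only keys that namespace defines will resolve, so the "rubbish in the field" concern applies only to the unvalidated case.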
Probably this discussion is not yet concluded, so I would second
|
As a system integrator, I want to get context information alongside the measurements, in order to facilitate the interpretation of the data.
I'm not so happy about sending redundant information and suggested a manifest/schema link for that. Nevertheless, I also understand that inline context is simpler, and that such context might change when machines are retooled (Umrüsten). In that case inline context does make sense. Maybe we could allow either inline context or a context reference like JSON Schema's "$ref"?
This is also requested by Balluff and Trumpf. Former discussion here:
https://www.eclipse.org/forums/index.php/t/1084951/
Previously discussed example context: