Definition of bitwise enums (flags) #24

Open
queequac opened this issue Dec 4, 2017 · 23 comments

@queequac

queequac commented Dec 4, 2017

So far enumerations had two drawbacks for me:

  1. Values can be anything. Typically I have a name and a numerical value in most programming languages that support enums as a built-in type. So I have to decide, go with numbers or go with the name.

  2. There seems to be no support for combination of enum values, so I cannot realize flags or a bitwise enumeration. Most likely this is because the value is not defined by a single built-in type (number or string for example). In case of strings a comma or pipe would be good for multiple flags. In case of numbers simply adding the value.

How could this be addressed? Unfortunately I guess this will be possible through a custom type only, if this can be expressed in JSON Schema at the moment at all.

@handrews
Contributor

handrews commented Dec 5, 2017

@queequac I'm not entirely sure I follow your question. Are you trying to generate code? Validate data? Something else? Some of these things are better-supported than others by the current JSON Schema vocabulary. The best answer for you will depend on what you're trying to do.

The validation vocabulary is actually not well-suited to code generation (although many people use it for that, usually with various restrictions and/or extensions). We hope to address code generation separately from validation in the near future somehow. The features overlap but are not identical.

@gregsdennis
Member

gregsdennis commented Dec 5, 2017

@handrews I think this is best explained with an example.

An enumeration in C# can be defined as follows:

[Flags]
enum MyEnumeration
{
    Value1 = 1,
    Value2 = 2,
    Value3 = 4,
    Value4 = 8,
    ...
}

These values can then be combined using the bit-wise OR operator:

var myValue = MyEnumeration.Value1 | MyEnumeration.Value2;

What's interesting about this is that myValue now contains a valid MyEnumeration instance/value that isn't defined within the context of the enumeration itself. .Net (and many other languages) accomplishes this by defining that enumerations are merely integers under the covers.
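
For readers less familiar with C#, here is a rough Python analogue of the same behaviour, using the standard library's enum.IntFlag. Python is not part of the original discussion; the class simply mirrors the C# example above as an illustration.

```python
from enum import IntFlag


class MyEnumeration(IntFlag):
    # Same values as the C# example; each member occupies exactly one bit.
    Value1 = 1
    Value2 = 2
    Value3 = 4
    Value4 = 8


# Bitwise OR yields a value that is not one of the declared members,
# yet it is still an instance of the enumeration type.
my_value = MyEnumeration.Value1 | MyEnumeration.Value2

print(int(my_value))                     # the underlying integer
print(MyEnumeration.Value1 in my_value)  # is this single flag set?
print(MyEnumeration.Value3 in my_value)
```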

Bit-wise enumerations in JSON

A problem arises when attempting to serialize these values to/from JSON when using names for enumeration values because there is no explicit label for the combined value. Serializers can approach this in different ways. For example, Manatee.Json serializes into a delimited string.

Enumerations in Schema

To validate a string representation of an enumeration value in JSON, the native enumeration is typically translated to an enumeration schema.

{ "enum" : [ "Value1", "Value2", "Value3", "Value4" ] }

While this works for enumerations that are merely explicit values, it fails to support these bit-wise enumerations. (For example, there is no JSON Schema mechanism to support a value that is Value1 combined with Value2.)

The interesting twist, however, is that JSON Schema does not define that enumeration values MUST or even SHOULD be strings. Instead they just need to be unique JSON values. The JSON being validated simply needs to be equal to one of these values.

This gives rise to a question: how would one combine arbitrary JSON values in the way that these other languages combine their enumeration values?

@handrews
Contributor

handrews commented Dec 19, 2017

This is a good question, I'm really not sure how to handle it, or even if it is in-scope for JSON Schema.

One thing that we've recommended recently for people who want to associate more descriptive strings with enumerated values is to use oneOf + const instead of enum:

{
  "oneOf": [
    {"const": 1, "title": "Value1"},
    {"const": 2, "title": "Value2"},
    {"const": 4, "title": "Value3"},
    {"const": 8, "title": "Value4"}
  ]
}

This gets all of the information into the schema, but "title" is not a validation keyword. It's just an annotation: a bit of information that an application can use if the instance is valid against that subschema.

Playing with this a bit more, we could come up with

{
  "oneOf": [
    {"enum": [1, "Value1"]},
    {"enum": [2, "Value2"]},
    {"enum": [4, "Value3"]},
    {"enum": [8, "Value4"]}
  ]
}

Now we can validate the instance in either string or integer form. There is nothing in JSON Schema that says that those two forms are interchangeable; enum is generally used for mutually exclusive values. But an application could decide to use them that way. It just can't expect all applications to behave the same way.

Hmm... I'm just tossing out some ideas here. No clear solution yet.

@gregsdennis
Member

This still doesn't address the topic of the issue, which is values that consist of two or more discrete values. For example, Value1 combined with Value2 (Value1 | Value2), which numerically is 3.

You'd still have to list out all possible combinations.

@handrews
Contributor

@gregsdennis yeah that's why I said "No clear solution yet." I probably should have said "no solution at all yet" :-)

Just trying to poke around and see what constructs we might be able to build off of.

This may end up being outside of the scope of validation spec (meaning that you could add an extension keyword of some sort, but it wouldn't be standardized as part of what all validators need to implement).

This is the sort of thing that you could easily implement as a preprocessing build step: add your own keyword, then have a script that goes through and works out all of the combinations and dumps them into an enum to produce a standards-compliant schema.
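
A minimal sketch of what such a preprocessing step might look like, in Python. The "flags" keyword and the function name expand_flags are hypothetical, made up for this illustration. The sketch only looks at the top level of the schema, and it assumes that the empty combination (0) should count as valid, which is still an open question in this thread.

```python
from itertools import combinations


def expand_flags(schema):
    """Replace a hypothetical "flags" keyword with an equivalent "enum"."""
    if "flags" not in schema:
        return schema
    expanded = {key: value for key, value in schema.items() if key != "flags"}
    constants = list(schema["flags"].values())
    values = set()
    # OR together every subset of the constants, including the empty one (0).
    for size in range(len(constants) + 1):
        for subset in combinations(constants, size):
            combined = 0
            for constant in subset:
                combined |= constant
            values.add(combined)
    expanded["enum"] = sorted(values)
    return expanded


print(expand_flags({"flags": {"Earth": 1, "Wind": 2, "Fire": 8}}))
```

The output schema contains only the standard enum keyword, so any compliant validator can consume it; the cost is that the list doubles with every additional flag.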

I'm not saying it's out of scope yet, but it's an option worth considering. This specific feature is a shorthand for how certain languages use enumerated values, so it's not as universal as most existing keywords (arguments to the contrary are welcome, though).

@gregsdennis
Member

gregsdennis commented Dec 20, 2017

The only thing I can think to do that would be within the domain of the spec would be to allow for an array of values where the items each were one of the enum values.

Schema:

{
  "properties" : {
    "flagEnumProp" : {
      "type" : "array",
      "items" : { "enum" : [ "Value1", "Value2", "Value3", "Value4" ] }
    }
  }
}

Instance:

{
  "flagEnumProp" : [ "Value1", "Value3" ]
}

but then, the spec wouldn't have to explicitly allow for this since that's currently supported.

I think what's being asked is to have the enum keyword validate an array of the values defined as well as individual values. I wonder if that leads into any kind of recursive nightmares (thinking of mathematics where a set of sets contains the empty set...)

@gregsdennis
Member

@queequac would the array solution above allow you to do what you want? You can likely configure your serializer to (de)serialize flag enums as an array of named values.
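
As an illustration of the (de)serialization idea, here is a small Python sketch that converts a flags value to an array of names and back. The helper names to_names and from_names are made up for this example and do not refer to any particular serializer's API.

```python
from enum import IntFlag


class MyEnumeration(IntFlag):
    Value1 = 1
    Value2 = 2
    Value3 = 4
    Value4 = 8


def to_names(value):
    """Encode a flags value as a list of member names, in declaration order."""
    return [member.name for member in MyEnumeration if member in value]


def from_names(names):
    """Decode a list of member names back into a combined flags value."""
    result = MyEnumeration(0)
    for name in names:
        result |= MyEnumeration[name]  # raises KeyError for unknown names
    return result


print(to_names(MyEnumeration.Value1 | MyEnumeration.Value3))
print(int(from_names(["Value1", "Value3"])))
```

The array produced by to_names is exactly the kind of instance that the array schema above would accept.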

@queequac
Author

queequac commented Dec 20, 2017

@gregsdennis To be honest, I am not 100% sure. Okay, I do see the point of having an option how to combine values.

But that's just one half of the story. Personally I feel much more uncomfortable with the fact that enumerations in JSON Schema are just a bunch of anything. For me this is too theoretical. Why should someone use arbitrary objects? Having this absolute degree of freedom for enum values might only be plausible for people that come from languages that do not have a built-in enum type. But even then it's not handy since it's still anything and not an enumeration.
The vast majority of languages use a keyword (which is closest to the string most people use in JSON Schema) plus a value. And usually this value is numerical - either as internal representation (if you don't care for the actual value) or explicitly assignable by the developer.

Even in JavaScript projects you very often see enumerations expressed like the following:
var CoolEnum = { None: 0, Value1: 1, Value2: 2, Value3: 4 };
being used in a way people are used to from other languages, such as
var myValue = CoolEnum.Value1;

To sum things up: The current enum in JSON Schema does not really help people with enumerations; it is too simplistic and does not address real-world problems. To me it smells like it is easier to mimic the current schema definition of enum with oneOf or other onboard means than to express a "real" enum. Not everybody lives in a world where no built-in enum exists, so I'd prefer to have a standardized way that fits better for languages that do have one. :)

@queequac
Author

By the way, I like TypeScript's definition of enums: Enums allow us to define a set of named constants.

This is pretty close to what you have in most languages.

Would also be fine for me to leave open the constants' actual type and having a fallback strategy if no values got explicitly defined for an enumeration.
But all constants within one enum should be of the same type. (While string and number should be good enough for 99.9% of all developers, I guess. Don't make the story more complex than necessary.)

Nevertheless: I do not think it's appealing to try to solve this with proprietary constructs in today's JSON Schema. And if we'd find a solution that allows us to express sets of named numerical or string constants within the standard, this would be a huge improvement. Also the move to bitwise combinations would be within reach. (Restricting this capability to numerical only, of course.)

@queequac
Author

So here it is, a maximum backward-compatible proposal... just as some first idea how we could extend today's specification while not breaking it.

enum

An enum is a set of constants, each defined through a key. An instance validates successfully if its value is equal to one of the constants in the set.

If keys and constants are identical, the key SHOULD be a string but might be of any other type. In this case the enum keyword SHOULD be an array holding constants only.
{"enum": ["None", "Some", "Thing", "Else"]}
Note: The above is backward-compatible with current JSON Schema. The array would be the shorthand notation for keys and constants being identical. And it would be the relaxed format that allows any types and null.

If keys and constants shall be different, keys MUST be strings and the enum keyword MUST be an object. The object represents the set, where each child is a key with its associated constant.

{"enum": {"None": 0, "Some": 1, "Thing": 2, "Else": 3}}

Note: You are still free to use numbers, strings (or any other type) for constants.* But with the schema at hand you no longer lose the meaning of a constant that would otherwise be just a magic number (or object).

* While I have to admit, it feels still strange allowing mixed types for constants instead of restricting them to a uniform type per set… but doing so it might no longer make sense to support Boolean, null and so on. And in case of objects, is it equal object types according to some schema? Personally, I would restrict constants here to a uniform type of either number or string per set.

Option: In case of enum, instances could also validate against key or constant, both being interchangeable.
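
To make the proposed enum behaviour concrete, here is a hedged Python sketch. The function validate_enum and its allow_keys switch are hypothetical; they only illustrate the array form, the object form, and the optional "key or constant" behaviour described above, and the equality check is deliberately shallow.

```python
def json_equal(a, b):
    # Python treats True == 1, but JSON booleans and numbers are distinct.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def validate_enum(enum_keyword, instance, allow_keys=False):
    """Check an instance against the array or object form of "enum"."""
    if isinstance(enum_keyword, list):
        # Array form: keys and constants are identical (today's behaviour).
        return any(json_equal(instance, c) for c in enum_keyword)
    # Object form: each key maps to its constant.
    if any(json_equal(instance, c) for c in enum_keyword.values()):
        return True
    # Optional behaviour: also accept the key itself.
    return allow_keys and isinstance(instance, str) and instance in enum_keyword


constants = {"None": 0, "Some": 1, "Thing": 2, "Else": 3}
print(validate_enum(constants, 2))
print(validate_enum(constants, "Thing"))
print(validate_enum(constants, "Thing", allow_keys=True))
```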

flags

The keyword flags refers to a set of textual keys with numerical constants that are either zero or powers of two (that is: 0, 1, 2, 4, 8, and so on). An instance validates successfully if its value can be the result of bitwise operations (AND, OR, EXCLUSIVE OR) performed on the set's constants.

{"flags": {"None": 0, "Earth": 1, "Wind": 2, "Fire": 4}}

Note: Since instances are dealing with constants only, validation is quite easy. While Greg's sample of combining Value1 and Value2 by key is more the programming style, it is usually not the persistence format (that being the number 3). But having the keys in the schema, we can transfer this information from data to code.

@awwright
Member

Is this just for validating instances, or is it for something else (user interfaces)?

What would be an example of a schema, and valid and invalid instances?

@queequac
Author

@awwright Primarily for validation, secondly for type introspection (and maybe code generation). If you'd leave out the keys, you'd end up with magic numbers only.

As I said, these are just some first ideas and I gave some options to be considered.

Samples have been included, but I will give some more with valid and invalid instances (while leaving out the first one with the array, since this is as of today).

{
   "type": { 
      "enum": {
         "cat": 1,
         "dog": 2,
         "cow": 100
      }
   }
}

Valid instances: 1, 2, 100
Invalid instances: 0, 3, 5, 101, foo, null

{
   "type": { 
      "flags": {
         "None": 0,
         "Circulatory": 1,
         "Digestive": 2,
         "Endocrine": 4,
         "Immune": 8,
         "Integumentary": 16,
         "Lymphatic": 32,
         "Musculoskeletal": 64,
         "Nervous": 128,
         "Reproductive": 256,
         "Respiratory": 512,
         "Urinary": 1024
      }
   }
}

Valid instances: 0, 1, 7 (being Circulatory, Digestive and Endocrine), 2047 (being all of them), 68 (being Endocrine and Musculoskeletal), ...
Invalid instances: -1, Urinary, Respiratory,Urinary, foo, null, 5000

If I am not totally wrong, flags instances could be validated like the following:

  • compute cumulative bitwise OR over all constants
  • compute "instance & ~allConstants"
  • the result must be zero (otherwise the instance cannot be formed through the available constants)
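
The three steps above can be sketched in Python as follows. The function name validate_flags is hypothetical, and rejecting non-integer instances up front is an assumption of this sketch rather than part of the proposal.

```python
def validate_flags(flags, instance):
    """Check whether an instance can be formed from the given flag constants."""
    # Only integers can carry bit flags; bool is excluded even though
    # Python treats it as a subclass of int.
    if isinstance(instance, bool) or not isinstance(instance, int):
        return False
    # Step 1: cumulative bitwise OR over all constants.
    all_constants = 0
    for constant in flags.values():
        all_constants |= constant
    # Steps 2 and 3: no bit outside the combined constants may be set.
    return (instance & ~all_constants) == 0


body_systems = {
    "None": 0, "Circulatory": 1, "Digestive": 2, "Endocrine": 4,
    "Immune": 8, "Integumentary": 16, "Lymphatic": 32,
    "Musculoskeletal": 64, "Nervous": 128, "Reproductive": 256,
    "Respiratory": 512, "Urinary": 1024,
}
print(validate_flags(body_systems, 68))
print(validate_flags(body_systems, 5000))
```

Note that with this check 0 always validates, whether or not the set defines a zero constant. Negative integers fail as well, because their high bits fall outside any finite set of constants.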

Just thinking out loud: It could make sense to have the new stuff under one new keyword, instead of getting it mixed up with enum. Something like a keyword set (or using flags for both), where one could set an additional attribute for bitwise (default: false).

{
   "type": { 
      "set": {
         "Earth": 1,
         "Wind": 2,
         "Fire": 8
      },
      "bitwise": false
   }
}

Valid instances: 1, 2, 8 (anything else invalid)

{
   "type": { 
      "set": {
         "Earth": 1,
         "Wind": 2,
         "Fire": 8
      },
      "bitwise": true
   }
}

Valid instances: 1, 2, 3, 8, 9, 10, 11 (anything else invalid)

@awwright
Member

How would this be different than specifying something like {"mask": 2047}, where instance & mask == instance in order to validate integer instances?

@queequac
Author

Which aspect are you questioning? Just the note on how to validate flags and/or the keyword bitwise?
Well, it seems not different from a technical point of view, but why should I explicitly define the mask if the value can be calculated through the constants anyway?
Also, it bears the risk that someone adds or removes a constant later on and forgets to adjust the mask. There might even be "holes" in the list of flags, so making people calculate their own mask might not be the best usability. But maybe that's just a matter of taste.

@gregsdennis
Member

gregsdennis commented Dec 23, 2017

@queequac

I realize that your example is exploratory, but I really am not a fan of contextual keywords, as you show bitwise to be.

Secondarily, you have these keywords under the type keyword (maybe you intended to extend enum?), which may only be a string or an array of strings, where each string is one of the seven basic types (object, array, string, number, integer, boolean, null). Your keyword(s) would need to be under something else. I think your proposal of using a dedicated flags keyword for this functionality would be best.

That said:

  1. Many people serialize/persist enumerations by name rather than number. The solution needs to account for that. I suggest that valid values include arrays of values defined by the flags set as well as the numeric value. For example, ["earth", "wind"] should be valid as well as merely 3.
  2. In the case that the flags are strictly ascending powers of two, would it be necessary for the author to list the values? For example, would ["earth", "wind", "fire"] work just as well as {"earth": 1, "wind": 2, "fire": 4}?
  3. Would the author be required to define a zero value? If we use the syntax in 2, would the values be zero-indexed (would earth be assumed as the zero value)?

@gregsdennis
Member

gregsdennis commented Dec 28, 2017

It should be noted that with bitwise enums, sequencing has no impact on the overall value.

[ "earth", "wind", "fire" ]

is the same as

[ "fire", "earth", "wind" ]

@gregsdennis
Member

gregsdennis commented Jan 1, 2018

@handrews I think that the meta-schema may benefit from this. Currently type is defined as any of a number of strings or an array of those strings.

from draft 7 meta-schema

{
  "definitions" : {
    ...
    "simpleTypes" : { "enum" : [ "array", "boolean", ... ] },
    ...
  },
  "properties" : {
    ...
    "type" : {
      "anyOf" : [
        { "$ref" : "#/definitions/simpleTypes" },
        {
          "type" : "array",
          "items" : { "$ref" : "#/definitions/simpleTypes" },
          "minItems" : 1,
          "uniqueItems" : true
        }
      ]
    },
    ...
  }
}

Really the functionality you want is a set of values that may also be combined in a non-repetitive manner by use of an array. I think this could be simplified with a flags keyword that implies this behavior:

{
  "definitions" : {
    ...
    "simpleTypes" : { "flags" : [ "array", "boolean", ... ] },
    ...
  },
  "properties" : {
    ...
    "type" : { "$ref" : "#/definitions/simpleTypes" },
    ...
  }
}

(This is really a return to the simplicity of the definition of enum in draft 4.)

Within the meta-schema, flags could be defined the same as enum. In the validation spec, the wording could be

The value of this keyword MUST be an array. This array SHOULD have at least one element. Elements in the array SHOULD be unique.

An instance validates successfully against this keyword if its value is:

  • equal to one of the elements in this keyword's array value, or
  • an array containing only elements in this keyword's array value

Elements in the array might be of any value, including null.

(This assumes a simplistic interpretation of the values that don't have numerics behind the scenes, thus a numeric value would not be implicitly valid in an instance.)
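
A short Python sketch of the wording proposed above, assuming the array form of a hypothetical flags keyword. The function name validate_flags_array is made up for illustration. Following the wording literally, the sketch does not reject repeated elements, and it accepts an empty array.

```python
def validate_flags_array(flags, instance):
    """Validate an instance against the array form of a "flags" keyword."""
    if isinstance(instance, list):
        # An array is valid if it contains only elements of the keyword's array.
        return all(item in flags for item in instance)
    # Otherwise the instance must equal one of the elements.
    return instance in flags


elements = ["earth", "wind", "fire"]
print(validate_flags_array(elements, "earth"))
print(validate_flags_array(elements, ["earth", "wind"]))
print(validate_flags_array(elements, 3))
```

Because no numbers are attached to the names here, the numeric form 3 is not implicitly valid, which matches the parenthetical remark above.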

@queequac
Author

queequac commented Jan 4, 2018

@gregsdennis Sorry that it took me that long to answer, was really offline during the holidays. :)

Your proposal for flags is really simple and addresses one of the issues very well.

Nevertheless, I'd still prefer not to keep the numbers totally behind the scenes. As we discussed above, nearly all programming languages are dealing with numbers in this case. We should at least keep the OPTION to have numbers for those who prefer to work with them (or even need to).
In documents and on the wire a single number is much better than a verbose list of values. Just imagine a type that has 25 flag values and you want all of them to be set... why should people persist the whole list over and over again?

Looking at your three questions above:
1.: Agree
2. and 3.: I would like the option to use either an array (resulting in automatic numbering starting with 1, leaving out the zero value) or to assign numbers on your own by specifying an object instead of the array. The latter also allows "holes" in your flags.
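
A hedged Python sketch of this option: an array definition gets automatic powers of two starting with 1, while an object definition keeps its explicit numbers. The names normalize_flags and encode are hypothetical and only illustrate the idea.

```python
def normalize_flags(definition):
    """Return a name -> constant mapping for either form of the keyword."""
    if isinstance(definition, list):
        # Array form: automatic powers of two, starting with 1 (no zero value).
        return {name: 1 << index for index, name in enumerate(definition)}
    # Object form: explicit constants, which may leave "holes".
    return dict(definition)


def encode(names, definition):
    """Collapse a list of flag names into a single number for the wire."""
    constants = normalize_flags(definition)
    number = 0
    for name in names:
        number |= constants[name]
    return number


print(normalize_flags(["earth", "wind", "fire"]))
print(encode(["Earth", "Fire"], {"Earth": 1, "Wind": 2, "Fire": 8}))
```

With 25 automatically numbered flags all set, encode returns the single integer 33554431 (2^25 - 1) instead of a list of 25 names.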

Having this option on today's enum would also be great... just counting by one instead of having powers of two.

Final note: Having numbers (maybe even just defined implicitly), ordering as mentioned in #21 would be easy.

@handrews
Contributor

handrews commented Jan 6, 2018

@gregsdennis @queequac I'm still following this, but can't shake the feeling that this is very specific to how strongly typed low-level languages work with this sort of data. And I also have a vague feeling that that puts this outside of the standard validation vocabulary. But I don't have a clear argument in support of that, so I want to leave this open for further discussion.

Since draft-08 is dealing with a lot of issues around extensibility and what it means to have the different standardized vocabularies, and how additional vocabularies can be added, I'm going to move this issue into the draft-future milestone. I think that once we sort out draft-08, it will become more clear where and how we should draw the line between standard and extension vocabularies.

I think it will also provide practical guidance on how to successfully build and distribute an extension vocabulary, which we don't have right now. That makes people reluctant to do so, and therefore people want all of their ideas in the standard vocabulary. Right now, there's no good interoperability story for extensions.

So definitely continue discussing ideas, but let's wait until we have a better feel on how this might work with, for instance, a set of strongly typed or low-level-storage keywords as an additional vocabulary.

Note: the draft-future milestone doesn't preclude this being addressed in draft-09. The draft-09 milestone is, at this point, just for things defining the scope of draft-09. As we start that draft, we'll move in other things that seem like they'll fit.

@gregsdennis
Member

gregsdennis commented Jan 7, 2018

can't shake the feeling that this is very specific to how strongly typed low-level languages work with this sort of data

Yes, this is a language-specific thing. Not all languages support enums, and only a subset of those support bitwise operations on them. But then, enums are still included in the spec, so why not support bitwise operations, too? That some languages don't support a feature isn't necessarily grounds for JSON schema to not support that feature.

I know for C# it's just syntactic sugar because underlying the enum is an integer. This is how bitwise operations are supported. That said, there is no checking for invalid values. If I can't define an integer value of 7 using bitwise operations on declared values, there's nothing preventing me from casting a 7 to my enum type. It's still valid, both at compile time and run time. (I'm not sure what my point is here. It works both as an argument against [any integer is valid] and for [not all integers should be valid] this feature.)

Even considering that, with the current spec:

Regarding names

I think that the way that the type keyword is defined in the draft 6+ meta schema works well enough, even if it is rather verbose.

Regarding numerical equivalents

One would have two options:

  1. Supplement the run time checking of the schema implementor in order to validate that a given integer is a valid value. This defeats the purpose of having the schema other than to validate it's an integer.

  2. Evaluate all possible bitwise combinations and create an enum with those values.

    enum MyEnum { Value1 = 1, Value2 = 2, Value8 = 8 }
    "MyEnum" : { "enum" : [ 0, 1, 2, 3, 8, 9, 10, 11 ] }

    This array grows exponentially (count = 2^n) with more enum values and can be prone to calculation errors.

@handrews
Contributor

handrews commented Jan 7, 2018

@gregsdennis it's not so much that I'm skeptical of this topic in particular, but that I don't think that we have a good general heuristic for what should and shouldn't go in the primary validation spec. In fact, most of the purely validation proposals that are still open are kind of marginal in some way:

They apply to only some languages, or they are shortcuts that may break desirable schema design properties, or they get into very complex and rare scenarios. I feel like we've more or less covered the obvious, reasonably universal things, and now we need to decide how broadly relevant a concept needs to be before it is included in the main validation spec.

In order to do that, I think we need to have a good feel for how easy it is to make extended vocabularies and have reasonable expectations of interoperability.

Once we know those things, then I think it will be obvious whether this concept belongs in the main validation spec, or whether it is better thought of as the first proposal in some sort of extension vocabulary.

I need to think about how to motivate that discussion and decision. But first we need to get through 512-515.

@gregsdennis
Member

I don't mind putting this off until post-draft-8. I think that we've nailed down what it is we're looking to support for this feature.

I definitely agree with figuring out 512-515 first.

@handrews
Contributor

This seems like a good candidate for an extension keyword. Moving to the vocabularies repository.

@handrews handrews transferred this issue from json-schema-org/json-schema-spec Feb 28, 2020