Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum case sensitivity #193

Open
iamdanfox opened this issue Jan 10, 2019 · 15 comments
Open

Enum case sensitivity #193

iamdanfox opened this issue Jan 10, 2019 · 15 comments

Comments

@iamdanfox
Copy link
Contributor

iamdanfox commented Jan 10, 2019

Motivation

From discussion on palantir/conjure-java#134, it appears that the case sensitivity of Conjure enums is not defined precisely enough in the wire format spec. @markelliot provided context that early adopters may still be relying on the case insensitivity of enums. This needs to be defined precisely so that new languages (e.g. Rust) can be implemented correctly and precise positive and negative test cases can be added to the conjure-verification server.

Given an enum definition:

Direction:
  values:
  - UP
  - DOWN
  - LEFT
  - RIGHT

Should a message of {"direction": "up"} be accepted by a Conjure server as equivalent to {"direction": "UP"}? Also should this lowercase presentation of "up" be preserved when re-serialized?

(Previously discussed #184)

Option 1: lenient deserialization, round-trip preserves case

The enum values RIGHT, right, rIgHt should all deserialize without error and be equivalent to a canonical enum value of Direction.RIGHT. Re-serializing these values should produce the same input strings: RIGHT, right and rIgHt.

Clearly this option minimizes the chance of servers rejecting an input. However, it does require all Conjure language implementations to store the original casing information for each enum to enable re-serialization, requiring slightly higher memory overhead for each enum instance. This is not currently supported by conjure-java and would be a significant undertaking to add to conjure-typescript (as typescript enums are currently pure strings and there isn't a convenient way of overriding equality in typescript so that a switch statement would match both rIgHt and RIGHT).

Option 2: lenient deserialization, no round-trip

This is conjure-java's behaviour, as of 2.6.2.

    @Test
    public void testEnumCasingDeserializationInvariantToInputCase() throws Exception {
        assertThat(mapper.readValue("\"ONE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"one\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"onE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"oNE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
    }

    @Test
    public void enumSerializationDoesntPreserveCase() throws Exception {
        assertThat(mapper.writeValueAsString(mapper.readValue("\"one\"", EnumExample.class))).isEqualTo("\"ONE\"");
        assertThat(mapper.writeValueAsString(mapper.readValue("\"ONE\"", EnumExample.class))).isEqualTo("\"ONE\"");
        assertThat(mapper.writeValueAsString(mapper.readValue("\"onE\"", EnumExample.class))).isEqualTo("\"ONE\"");
    }

Notably conjure-typescript does not support this lenient deserialization as there isn't a serialization layer yet (palantir/conjure-typescript#48). As such, I think we can be sure that no conjure-typescript clients tolerate receiving lowercase enums from a server, even if they may somehow still be sending them.

Option 3: strict deserialization

A request containing right would be considered invalid, and would be rejected by the server. Clients could be certain that all enum responses would be uppercase. This minimizes the amount of memory necessary to store enums internally and also makes new language implementations simpler.

Implementation for conjure-java: palantir/conjure-java#143

@iamdanfox
Copy link
Contributor Author

If we go with the strict deserialization (option 3):

  • I don't expect this to impact frontends much as they have always received uppercase enums from servers (this is how conjure-java's serialization works and TypeScript switch statements only work with uppercase enums.
  • We need to be very careful of the scenario where lowercase enums have been persisted to a database. If we change the behaviour of conjure-java then products might start silently deserializing these objects into 'UNKNOWN' variants instead of the expected real values. (Making conjure-java throw if a lowercase character is encountered would probably ameliorate this)

@sfackler
Copy link
Member

I'd lean towards specifying option 3, personally. However, it seems reasonable to have conjure-java continue deviating from that behavior until we can be confident that its current case-conversion behavior isn't being used in practice in the field.

@carterkozak
Copy link
Contributor

I'm in favor of option 3 as well -- I don't like complicated things.

carterkozak added a commit to palantir/conjure-verification that referenced this issue Jan 10, 2019
Enum values are not allowed to use spaces per the spec.
This change does not add the negative test while we wait on
a resolution to palantir/conjure#193
carterkozak added a commit to palantir/conjure-verification that referenced this issue Jan 10, 2019
Enum values are not allowed to use spaces per the spec.
This change does not add the negative test while we wait on
a resolution to palantir/conjure#193
bulldozer-bot bot pushed a commit to palantir/conjure-verification that referenced this issue Jan 11, 2019
Enum values are not allowed to use spaces per the spec.
This change does not add the negative test while we wait on
a resolution to palantir/conjure#193
@markelliot
Copy link
Contributor

I'm also a fan of simple things -- and believe it's simpler in implementation to say enum values are case-insensitive, that implementations will always naturally produce values in uppercase, than it is to special-case handling in all languages that the deserialized values are uppercase, and may fill an UNKNOWN value.

@carterkozak
Copy link
Contributor

Should we differentiate between unknown and unknowable?
The value SOME_ENUM_VALUE could potentially match across api versions (unknown) but Some Enum 💯 value is unknowable and we are confident that it is not, nor ever will be part of the enum.

I would prefer not to deviate handling enum and union key interpretation. It would be odd if one were case senseitive and the other not. Do we have any prior art in the conjure spec for case insensitivity?

@iamdanfox
Copy link
Contributor Author

I'd like to close this out and go with option (3), as implemented palantir/conjure-java#143 - we've been shoring up the spec throughout the open-sourcing process and I think this is just one more under-defined topic to iron out.

I'm confident that this won't cause lots of disruption for users because TypeScript users have never been able to switch on lowercase enum values, so I don't think they've ever been used.

@markelliot
Copy link
Contributor

markelliot commented Jan 14, 2019 via email

@sfackler
Copy link
Member

Yeah, I think it makes sense to limit the unknown variant handling to strings which could actually be valid but unknown variants. That's just an ASCII SCREAMING_SNAKE_CASE check, right?

@sfackler
Copy link
Member

The same goes for unknown union variant names too, I suppose.

@markelliot
Copy link
Contributor

markelliot commented Jan 14, 2019 via email

@sfackler
Copy link
Member

Oh yeah - I meant that we should make sure that unknown union variant names are eligible to be union variant names. The check would be for snakeCase for those I believe rather than SCREAMING_SNAKE_CASE.

@carterkozak
Copy link
Contributor

lowerCamelCase

@sfackler
Copy link
Member

Er yeah, that >_>

@carterkozak
Copy link
Contributor

I’m also unsure how we’d cleanly coalesce union fields and enums to have the same casing.

We're already blocked from supporting this by existing typescript clients which require exact matches on both enum values and union type names.

@iamdanfox
Copy link
Contributor Author

For context, @markelliot raised concerns in a slack thread that the newly imposed enum strictness imposes an unfair burden on any existing user-facing services we have that are relying on the old lenient behaviour.

palantir/conjure-java#232 is a feature flag which would allow existing services to keep the old behaviour around.

bulldozer-bot bot pushed a commit to palantir/conjure-java that referenced this issue Feb 18, 2019
## Before this PR

Pre-2.8.0, conjure-java generated very lenient code: an enum defined as `[BLUE, GREEN]` would accept values `blue`, `bLuE` and `bLUE` as BLUE. It would also tolerate unknown values, such as "RED", "", "!@%^£".

When looking into tightening up the [Conjure spec](palantir/conjure#193), we decided that all conjure languages should not be required to implement this behaviour. In 2.8.0, we tightened up the validation of conjure-generated enums, I expected this to be safe because conjure-typescript only ever emitted uppercase values. Unfortunately, this caused P0s at some products where bad values had already been persisted into databases, from non-typescript origins.

To alleviate some of this pain, we added a [`--useInsensitiveEnums`](#232) flag to bring back the old behaviour. Unfortunately, this would still throw when deserializing strange values like `""`, which have already been persisted to databases. Also, existing products do not know whether they have nonsensical values in their database already, and realistically do not have serialization tests that exercise every enum edge case, so they didn't know whether they needed to opt-in.

## After this PR

To avoid causing lots more P0s and keeping feature flags forever, this PR reverts conjure-java's codegen to the old pre-2.8.0 behaviour.  The Conjure spec still says only deserialize and serialize UPPERCASE values, but this PR intentionally diverges from the spec in order to avoid causing pain for existing products.

We think this will be safe for inter-language communication because a string `"bLuE"` is deserialized as `MyEnum.BLUE` and re-serialized as `"BLUE"` so typescript clients keep working.

In the future, we may want to implement a feature where servers can reject all 'unknown' values in incoming requests.


cc @diogoholanda @markelliot @cakofony
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants