Enum case sensitivity #193

iamdanfox · 2019-01-10T11:34:43Z

Motivation

From discussion on palantir/conjure-java#134, it appears that the case sensitivity of Conjure enums is not defined precisely enough in the wire format spec. @markelliot provided context that early adopters may still be relying on the case insensitivity of enums. This needs to be defined precisely so that new languages (e.g. Rust) can be implemented correctly and precise positive and negative test cases can be added to the conjure-verification server.

Given an enum definition:

Direction:
  values:
  - UP
  - DOWN
  - LEFT
  - RIGHT

Should a message of {"direction": "up"} be accepted by a Conjure server as equivalent to {"direction": "UP"}? Also should this lowercase presentation of "up" be preserved when re-serialized?

(Previously discussed #184)

Option 1: lenient deserialization, round-trip preserves case

The enum values RIGHT, right, rIgHt should all deserialize without error and be equivalent to a canonical enum value of Direction.RIGHT. Re-serializing these values should produce the same input strings: RIGHT, right and rIgHt.

Clearly this option minimizes the chance of servers rejecting an input. However, it does require all Conjure language implementations to store the original casing information for each enum to enable re-serialization, requiring slightly higher memory overhead for each enum instance. This is not currently supported by conjure-java and would be a significant undertaking to add to conjure-typescript (as typescript enums are currently pure strings and there isn't a convenient way of overriding equality in typescript so that a switch statement would match both rIgHt and RIGHT).

Option 2: lenient deserialization, no round-trip

This is conjure-java's behaviour, as of 2.6.2.

    @Test
    public void testEnumCasingDeserializationInvariantToInputCase() throws Exception {
        assertThat(mapper.readValue("\"ONE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"one\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"onE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
        assertThat(mapper.readValue("\"oNE\"", EnumExample.class)).isEqualTo(EnumExample.ONE);
    }

    @Test
    public void enumSerializationDoesntPreserveCase() throws Exception {
        assertThat(mapper.writeValueAsString(mapper.readValue("\"one\"", EnumExample.class))).isEqualTo("\"ONE\"");
        assertThat(mapper.writeValueAsString(mapper.readValue("\"ONE\"", EnumExample.class))).isEqualTo("\"ONE\"");
        assertThat(mapper.writeValueAsString(mapper.readValue("\"onE\"", EnumExample.class))).isEqualTo("\"ONE\"");
    }

Notably conjure-typescript does not support this lenient deserialization as there isn't a serialization layer yet (palantir/conjure-typescript#48). As such, I think we can be sure that no conjure-typescript clients tolerate receiving lowercase enums from a server, even if they may somehow still be sending them.

Option 3: strict deserialization

A request containing right would be considered invalid, and would be rejected by the server. Clients could be certain that all enum responses would be uppercase. This minimizes the amount of memory necessary to store enums internally and also makes new language implementations simpler.

Implementation for conjure-java: palantir/conjure-java#143

The text was updated successfully, but these errors were encountered:

iamdanfox · 2019-01-10T15:26:13Z

If we go with the strict deserialization (option 3):

I don't expect this to impact frontends much as they have always received uppercase enums from servers (this is how conjure-java's serialization works and TypeScript switch statements only work with uppercase enums.
We need to be very careful of the scenario where lowercase enums have been persisted to a database. If we change the behaviour of conjure-java then products might start silently deserializing these objects into 'UNKNOWN' variants instead of the expected real values. (Making conjure-java throw if a lowercase character is encountered would probably ameliorate this)

sfackler · 2019-01-10T18:27:14Z

I'd lean towards specifying option 3, personally. However, it seems reasonable to have conjure-java continue deviating from that behavior until we can be confident that its current case-conversion behavior isn't being used in practice in the field.

carterkozak · 2019-01-10T19:14:03Z

I'm in favor of option 3 as well -- I don't like complicated things.

Enum values are not allowed to use spaces per the spec. This change does not add the negative test while we wait on a resolution to palantir/conjure#193

markelliot · 2019-01-11T16:25:56Z

I'm also a fan of simple things -- and believe it's simpler in implementation to say enum values are case-insensitive, that implementations will always naturally produce values in uppercase, than it is to special-case handling in all languages that the deserialized values are uppercase, and may fill an UNKNOWN value.

carterkozak · 2019-01-11T18:06:29Z

Should we differentiate between unknown and unknowable?
The value SOME_ENUM_VALUE could potentially match across api versions (unknown) but Some Enum 💯 value is unknowable and we are confident that it is not, nor ever will be part of the enum.

I would prefer not to deviate handling enum and union key interpretation. It would be odd if one were case senseitive and the other not. Do we have any prior art in the conjure spec for case insensitivity?

iamdanfox · 2019-01-14T17:10:44Z

I'd like to close this out and go with option (3), as implemented palantir/conjure-java#143 - we've been shoring up the spec throughout the open-sourcing process and I think this is just one more under-defined topic to iron out.

I'm confident that this won't cause lots of disruption for users because TypeScript users have never been able to switch on lowercase enum values, so I don't think they've ever been used.

markelliot · 2019-01-14T19:46:19Z

Will closing this out also involve updating all implementations to explicitly treat received lowercase values with an explicit, differentiated error from unknown but uppercase values?

sfackler · 2019-01-14T21:15:10Z

Yeah, I think it makes sense to limit the unknown variant handling to strings which could actually be valid but unknown variants. That's just an ASCII SCREAMING_SNAKE_CASE check, right?

sfackler · 2019-01-14T21:16:35Z

The same goes for unknown union variant names too, I suppose.

markelliot · 2019-01-14T21:24:37Z

I don’t think union fields have quite the same set of restrictions. As a result, if we error on casing (vs choosing to make enums case-insensitive), I’m also unsure how we’d cleanly coalesce union fields and enums to have the same casing.

…

________________________________ From: Steven Fackler <[email protected]> Sent: Monday, January 14, 2019 9:17 PM To: palantir/conjure Cc: Mark Elliot; Mention Subject: Re: [palantir/conjure] Enum case sensitivity (#193) The same goes for unknown union variant names too, I suppose. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_palantir_conjure_issues_193-23issuecomment-2D454164444&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=Vp81aOxWZvuVgVK_wp-VF3pIYG92B19LcCw6XKeYC0U&m=oYhXl0Ia97r8mYlsy_I5YTXdfg07pkH3B_aZ9cBI3bs&s=SaLjxFY2Gz4XNs8nMEgp2NLTSTLlS6qenSdt-PrPXEY&e=>, or mute the thread [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAp3Qz2fPIvwoHHBGVPa9MRiR-2D8uNOcZks5vDPOzgaJpZM4Z5UMK&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=Vp81aOxWZvuVgVK_wp-VF3pIYG92B19LcCw6XKeYC0U&m=oYhXl0Ia97r8mYlsy_I5YTXdfg07pkH3B_aZ9cBI3bs&s=tYH7oO4rvzZEJd4WJoRlHnNlGEJpszv09NBDHNqo0wQ&e=>.

sfackler · 2019-01-14T21:38:08Z

Oh yeah - I meant that we should make sure that unknown union variant names are eligible to be union variant names. The check would be for snakeCase for those I believe rather than SCREAMING_SNAKE_CASE.

carterkozak · 2019-01-14T21:38:53Z

lowerCamelCase

sfackler · 2019-01-14T21:41:42Z

Er yeah, that >_>

carterkozak · 2019-01-15T16:16:17Z

I’m also unsure how we’d cleanly coalesce union fields and enums to have the same casing.

We're already blocked from supporting this by existing typescript clients which require exact matches on both enum values and union type names.

iamdanfox · 2019-02-11T18:19:27Z

For context, @markelliot raised concerns in a slack thread that the newly imposed enum strictness imposes an unfair burden on any existing user-facing services we have that are relying on the old lenient behaviour.

palantir/conjure-java#232 is a feature flag which would allow existing services to keep the old behaviour around.

@diogoholanda

## Before this PR Pre-2.8.0, conjure-java generated very lenient code: an enum defined as `[BLUE, GREEN]` would accept values `blue`, `bLuE` and `bLUE` as BLUE. It would also tolerate unknown values, such as "RED", "", "!@%^£". When looking into tightening up the [Conjure spec](palantir/conjure#193), we decided that all conjure languages should not be required to implement this behaviour. In 2.8.0, we tightened up the validation of conjure-generated enums, I expected this to be safe because conjure-typescript only ever emitted uppercase values. Unfortunately, this caused P0s at some products where bad values had already been persisted into databases, from non-typescript origins. To alleviate some of this pain, we added a [`--useInsensitiveEnums`](#232) flag to bring back the old behaviour. Unfortunately, this would still throw when deserializing strange values like `""`, which have already been persisted to databases. Also, existing products do not know whether they have nonsensical values in their database already, and realistically do not have serialization tests that exercise every enum edge case, so they didn't know whether they needed to opt-in. ## After this PR To avoid causing lots more P0s and keeping feature flags forever, this PR reverts conjure-java's codegen to the old pre-2.8.0 behaviour. The Conjure spec still says only deserialize and serialize UPPERCASE values, but this PR intentionally diverges from the spec in order to avoid causing pain for existing products. We think this will be safe for inter-language communication because a string `"bLuE"` is deserialized as `MyEnum.BLUE` and re-serialized as `"BLUE"` so typescript clients keep working. In the future, we may want to implement a feature where servers can reject all 'unknown' values in incoming requests. cc @diogoholanda @markelliot @cakofony

iamdanfox added the wire format label Jan 10, 2019

iamdanfox mentioned this issue Jan 10, 2019

Remove broken enum test cases palantir/conjure-verification#111

Merged

carterkozak mentioned this issue Jan 10, 2019

Fix MapEnumExampleAlias test case palantir/conjure-verification#114

Merged

iamdanfox mentioned this issue Jan 21, 2019

Generated enum valueOf method is incorrectly case-insensitive palantir/conjure-java#134

Closed

This was referenced Feb 12, 2019

Add the '--useInsensitiveEnums' flag to allow case-insensitive enum values palantir/conjure-java#232

Merged

[fix] bring back old enum behaviour palantir/conjure-java#238

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enum case sensitivity #193

Enum case sensitivity #193

iamdanfox commented Jan 10, 2019 •

edited

Loading

iamdanfox commented Jan 10, 2019

sfackler commented Jan 10, 2019

carterkozak commented Jan 10, 2019

markelliot commented Jan 11, 2019

carterkozak commented Jan 11, 2019

iamdanfox commented Jan 14, 2019

markelliot commented Jan 14, 2019 via email

sfackler commented Jan 14, 2019

sfackler commented Jan 14, 2019

markelliot commented Jan 14, 2019 via email

sfackler commented Jan 14, 2019

carterkozak commented Jan 14, 2019

sfackler commented Jan 14, 2019

carterkozak commented Jan 15, 2019

iamdanfox commented Feb 11, 2019

Enum case sensitivity #193

Enum case sensitivity #193

Comments

iamdanfox commented Jan 10, 2019 • edited Loading

Motivation

Option 1: lenient deserialization, round-trip preserves case

Option 2: lenient deserialization, no round-trip

Option 3: strict deserialization

iamdanfox commented Jan 10, 2019

sfackler commented Jan 10, 2019

carterkozak commented Jan 10, 2019

markelliot commented Jan 11, 2019

carterkozak commented Jan 11, 2019

iamdanfox commented Jan 14, 2019

markelliot commented Jan 14, 2019 via email

sfackler commented Jan 14, 2019

sfackler commented Jan 14, 2019

markelliot commented Jan 14, 2019 via email

sfackler commented Jan 14, 2019

carterkozak commented Jan 14, 2019

sfackler commented Jan 14, 2019

carterkozak commented Jan 15, 2019

iamdanfox commented Feb 11, 2019

iamdanfox commented Jan 10, 2019 •

edited

Loading