Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union of Enum and String types throws an error #345

Closed
damion-add-on-depot opened this issue Apr 28, 2021 · 4 comments
Closed

Union of Enum and String types throws an error #345

damion-add-on-depot opened this issue Apr 28, 2021 · 4 comments

Comments

@damion-add-on-depot
Copy link

damion-add-on-depot commented Apr 28, 2021

I created an avro schema with an enumerated type, but in some cases, JSON data for the property associated with the type is set as an empty string. I figured I could work around that by creating a union as follows:

{
    "type":[
        {
            "type": "enum",
            "name": "WeekDay",
            "symbols": ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]
        },
        "string"
    ],
    "name": "weekday"
}

But I got the following error when parsing the source data, where the weekday property was set to an empty string:

Error: invalid [{"name":"WeekDay","type":"enum","symbols":["sunday","monday","tuesday","wednesday","thursday","friday","saturday"]},"string"]: ""

Is that the intended behavior?

I read the Avro spec and it appears that it should be possible to create a union of types enum and string.

@damion-add-on-depot
Copy link
Author

Did some debugging and narrowed in on the following function:

WrappedUnionType.prototype._write = function (tap, val) {
    var index, keys, name;
    if (val === null) {
        index = this._branchIndices["null"];
        if (index === undefined) {
            throwInvalidError(val, this)
        }
        tap.writeLong(index)
    } else {
        keys = Object.keys(val);
        if (keys.length === 1) {
            name = keys[0];
            index = this._branchIndices[name]
        }
        if (index === undefined) {
            throwInvalidError(val, this)
        }
        tap.writeLong(index);
        this.types[index]._write(tap, val[name])
    }
};

This function/method throws an error for unions with enum and string types.

@damion-add-on-depot
Copy link
Author

damion-add-on-depot commented Apr 28, 2021

Came up with a workaround.

I abandoned trying to get a union of types enum and string to work and focused on making empty strings a valid symbol for enums.

Updated the lib/util.js module with the following changes:

// add a new regex pattern specific to enum symbols that matches names and the empty string
var SYMBOL_PATTERN = /^$|^[A-Za-z_][A-Za-z0-9_]*$/;
.
.
.
// add a new function to validate symbol names
/** Check whether a string is a valid Avro enum symbol identifier.
function isValidSymbol(str) { return SYMBOL_PATTERN.test(str); }
.
.
.
module.exports = {
   .
   .
   .
   isValidSymbol: isValidSymbol
   .
   .
   .
};

And finally, update the lib/types.js module to use the isValidSymbol function for the EnumType class as follows:

function EnumType(schema, opts) {
    Type.call(this, schema, opts);
    if (!Array.isArray(schema.symbols) || !schema.symbols.length) {
        throw new Error(f("invalid enum symbols: %j", schema.symbols))
    }
    this.symbols = Object.freeze(schema.symbols.slice());
    this._indices = {};
    this.symbols.forEach(function (symbol, i) {
        if (!utils.isValidSymbol(symbol)) {
            throw new Error(f("invalid %s symbol: %j", this, symbol))
        }
        if (this._indices[symbol] !== undefined) {
            throw new Error(f("duplicate %s symbol: %j", this, symbol))
        }
        this._indices[symbol] = i
    }, this);
    this._branchConstructor = this._createBranchConstructor();
    Object.freeze(this)
}

Again, this is a hack but it will suffice until the issue with the union of enum and string types is resolved.

@mtth
Copy link
Owner

mtth commented Apr 29, 2021

Unions with string and enum types are supported. Their values just need to be wrapped to ensure that we don't lose branch information. This is necessary as the default unwrapped representation is identical for both types.

Here are examples of valid values for the union above:

const str = {string: ''}; // Empty string.
const day = {WeekDay: 'monday'}; // Enum value.

Take a look at this comment for more context as well as a couple examples.

@damion-add-on-depot
Copy link
Author

Thank you for the solutions! The explanation about logical types from the comment you referenced should really come in handy for more complex types.

@mtth mtth closed this as completed May 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants