Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] StringReader.readUnquotedString() does not support non-ASCII characters #103

Open
Pante opened this issue Nov 7, 2021 · 4 comments · May be fixed by #131 or #145
Open

[BUG] StringReader.readUnquotedString() does not support non-ASCII characters #103

Pante opened this issue Nov 7, 2021 · 4 comments · May be fixed by #131 or #145

Comments

@Pante
Copy link

Pante commented Nov 7, 2021

I'm the maintainer of Chimera, a library that allows the Brigadier command framework to be used in Spigot plugins. Recently, a developer opened an interesting issue in which they reported that argument parsing failed when non-ASCII characters were specified.

Sample error

Looking into the issue, the cause of the argument parsing failing for non-ASCII characters can be traced to:

    // StringArgumentType's parse() method
    @Override
    public String parse(final StringReader reader) throws CommandSyntaxException {
        if (type == StringType.GREEDY_PHRASE) {
            final String text = reader.getRemaining();
            reader.setCursor(reader.getTotalLength());
            return text;
        } else if (type == StringType.SINGLE_WORD) {
            return reader.readUnquotedString(); // <-- calls this
        } else {
            return reader.readString();
        }
    }
    
    // StringReader's readUnquotedString() method
    public String readUnquotedString() {
        final int start = cursor;
        while (canRead() && isAllowedInUnquotedString(peek())) { // <-- calls this
            skip();
        }
        return string.substring(start, cursor);
    }
    
    // Cause of failure
    public static boolean isAllowedInUnquotedString(final char c) {
        return c >= '0' && c <= '9'
            || c >= 'A' && c <= 'Z'
            || c >= 'a' && c <= 'z'
            || c == '_' || c == '-'
            || c == '.' || c == '+';
    }

This affects StringArgumentType.word(), StringArgumentType.string(), StringReader.readString() and StringReader.readUnquotedString() and any other dependencies on these classes/methods.

Maybe I'm missing some context but it seems weird to only allow a small subset of ASCII characters. In my opinion, this implementation of StringReader.readUnquotedString() is extremely wonky and should be refactored to support non-ASCII characters.

In the interim, I decided to create a simple utility method that reads strings until a whitespace is encountered. It behaves similarly to StringReader.readUnquotedString() while supporting special characters.

Source

    public static String unquoted(StringReader reader) {
        var start = reader.getCursor();
        while (reader.canRead() && reader.peek() != ' ') {
            reader.skip();
        }
        
        return reader.getString().substring(start, reader.getCursor());
    }
@vdvman1
Copy link

vdvman1 commented Nov 7, 2021

Unquoted strings are intentionally only a certain set of characters, anything beyond those are considered "special characters" and require quoted strings

This is to avoid unquoted strings greedily parsing things that are meant to be parsed as other syntax

@Pante
Copy link
Author

Pante commented Nov 8, 2021

Unquoted strings are intentionally only a certain set of characters, anything beyond those are considered "special characters" and require quoted strings

This is to avoid unquoted strings greedily parsing things that are meant to be parsed as other syntax

Would you mind elaborating on the second point please? I can't think of a case where parsing special characters may unintentionally greedily parse arguments. Even if that was the case, it seems like a abuse of StringReader.readStringQuote() where StringReader.readStringUntil(char) should be used instead.

It also seems extremely unintuitive that "топчик" will be rejected, it doesn't really contain any "special characters" that will need to be avoided to prevent greedy parsing.

@vdvman1
Copy link

vdvman1 commented Nov 8, 2021

I agree that it should be expanded to better support international text.

Unquoted strings are used in SNBT values in compounds, where the character after could be either a , for another key:value pair, or a } for ending the compound.
That could indeed be fixed by a more general readStringUntil that can accept multiple characters, this is just the reason for the original definition of readUnquotedString

@ghost
Copy link

ghost commented Nov 8, 2021

I believe that it is necessary to add support for at least Cyrillic, since many servers in post-Soviet countries use Cyrillic commands and a lot of players complain that they have to write " characters so that the commands works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants