Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem defining string with literal '\' #1895

Closed
mvhensbergen opened this issue Oct 16, 2024 · 4 comments · Fixed by #1907
Closed

Problem defining string with literal '\' #1895

mvhensbergen opened this issue Oct 16, 2024 · 4 comments · Fixed by #1907
Assignees

Comments

@mvhensbergen
Copy link

Hi team,

I am new to spicy and am not sure if I am doing something wrong or if there is an issue with spicy.

I am trying to attempt to work on zeek/zeek#3955 where I need to implement escaping. In short, I need to escape the character '(' to the three byte string "\28". So this is a literal "\" and must not be interpreted as an escape code.

This made me create an escaping function where I need to define the replacement string "\28" but no matter what I try, I can't seem to get this to work. Consider the example script below:

module test;

global a : string = "a";
global b : string = "\\";
global c : string = "\a" + "test";
global d : string = "\\a";
global e : string = "\\"+ "28";
global f : string = "\\28";

print(a);
print(b);
print(c);
print(d);
print(e);
print(f);

I would assume that the variable b would be printed on the console as \ but it is printed as \\ (even though it is c++ compliant escaping). Also, e and f output \\28.

Trying other type of syntaxes such as:

global b : string = "\";
global b : string = "\\";
global b : string = '\';
global b : string = "\28";

all either do not compile or give me the double \\ result.

Eventually these double literal slashes also end up in the zeek logs and thus are erroneous.

Am I doing something wrong?

@evantypanski
Copy link
Contributor

evantypanski commented Oct 16, 2024

I think this is a little unintuitive on Spicy's end, but try putting this in a parser. For example, I made this:

# i1895.spicy
module I1895;

public type Data = unit {
    before: bytes &until=b"\\28";
    after: bytes &until=b"\\29";

    on %done { print self; }
};

Notice that it's bytes as the type and just the slash is escaped.

Try with some data, I grabbed what came from that issue and escaped it to get:

$ printf "name=Test\\\\28Admin\!\\\\29"
name=Test\28Admin!\29

The escaping is weird because you have to double-escape some things when it goes through bash, but you can see the actual bytes in prints are on the inside of the parens in that issue (so the inside part of (&(name=Test\28Admin!\29)))

Then just shove that into the spicy parser:

$ printf "name=Test\\\\28Admin\!\\\\29" | spicy-driver i1895.spicy
[$before=b"name=Test", $after=b"Admin!"]

That seems like what you want to me, does that seem accurate?

EDIT: I just reread and see you're making a function to do that escaping, here's an example of the parser that uses that escaping in order to find the lparen and rparen. Basically, print may lie to you (since it escapes when printing even if the underlying bytes aren't), but the parser won't :)

# i1895.spicy
module I1895;

function escape(my_bytes: bytes): bytes {
    local escaped: bytes;
    for ( byte in my_bytes ) {
        switch (byte) {
            case '(': escaped += b"\\28";
            case ')': escaped += b"\\29";
            default: escaped += byte;
        }
    }
    return escaped;
}

public type Data = unit {
    before: bytes &until=escape(b"(");
    after: bytes &until=escape(b")");

    on %done { print self; }
};

@rsmmr
Copy link
Member

rsmmr commented Oct 17, 2024

Basically, print may lie to you

Yeah, this indeed seems like the underlying issue here, Spicy is storing things correctly, but is "overescaping" on output.

Compare these two prints:

module test;

import spicy;

global x = b"\\";

print x;
print spicy::bytes_to_hexstring(x);
# spicyc -j a.spicy
\\
5C

We should fix this but it probably needs a bit of care to understand when we do need to escape and when not (because string rendering is used in different contexts).

@mvhensbergen
Copy link
Author

Hi all,

thank you for your swift reactions to this issue. I think I will postpone my attempt at the original escaping issue for LDAP until this issue has been resolved. I don't think I currently have an idea on how to fix this properly.

Let me know if you require any more input!

Regards

Martin

@mvhensbergen
Copy link
Author

In relation to the above discussion, I find the following line suspect in the bytes.h:

rval += Bytes(hilti::rt::to_string_for_print(parts[i]));

I would expect the join function of bytes return the joined bytes, but instead it returns printable strings (as is also indicated in the docstring).

If a function uses join on bytes that contain special characters there is a risk of them getting escaped implicitly.

@rsmmr rsmmr self-assigned this Oct 22, 2024
@rsmmr rsmmr closed this as completed in edf96cb Oct 25, 2024
rsmmr added a commit that referenced this issue Oct 25, 2024
* origin/topic/robin/gh-1895-escaping:
  Do no longer escape backslashes when printing strings or bytes.
  Rename `expandEscapes()` to `expandUTF8Escapes`.
  Introduce style flags for functions rendering values into strings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants