Feat: phishing detection #9610
}

private function isLink(string $text): bool {
	$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
+1 for not using a custom regexp.
But if I understand correctly, this method isn't supposed to match only fully qualified URLs, but any text that looks roughly like a URL (as part of a link text).
If I'm correct, then FILTER_VALIDATE_URL would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something. Additionally, it doesn't accept URLs containing non-ASCII characters.
I'd suggest using a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs; maybe some of its methods can be used?
Also, maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()") and get a comment about its purpose?
Maybe https://gist.github.com/gruber/8891611 would be a more robust solution?
Text_Filter is based on it
You could use the original source, yes. But a library might bring the benefit that possible changes (there's a list of TLDs in there, which may change) are looked after by multiple people and projects. I'm unsure about the Horde libs in this regard, though.
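To make the FILTER_VALIDATE_URL point in this thread concrete, here is a minimal sketch comparing the strict built-in filter with a loose "looks like a link" check. The function name `textLooksLikeALink` is the hypothetical rename proposed above, and the pattern is simply the one from the diff, so this is not the library-based approach the thread recommends.

```php
<?php
declare(strict_types=1);

// Hypothetical helper named after the rename suggestion in this thread.
// It reuses the pattern from the diff: true when the text *starts* with
// something URL-like (scheme, "www.", or "name.tld").
function textLooksLikeALink(string $text): bool {
	$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
	return preg_match($pattern, trim($text)) === 1;
}

// FILTER_VALIDATE_URL requires a scheme, so scheme-less link text is rejected.
$strict = filter_var('cloud.nextcloud.com/p/something', FILTER_VALIDATE_URL);

// The loose check accepts the same text.
$loose = textLooksLikeALink('cloud.nextcloud.com/p/something');

var_dump($strict); // bool(false)
var_dump($loose);  // bool(true)
```

The loose check is deliberately permissive: it only decides whether a link text deserves a closer comparison with its href, not whether it is a valid URL.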
use OCA\Mail\AddressList;
use OCA\Mail\Contracts\ITrustedSenderService;

class PhishingDetectionService {
this new class needs tests
}
foreach ($zippedArray as $zipped) {
	if ($this->isLink($zipped['linkText'])) {
		if (str_contains($zipped['linkText'], $zipped['href']) === false) {
I would suggest using a library to normalize the href content, because encodings and funny details (e.g. data URLs) can easily break this comparison even if both parts actually reference the same resource.
Also, this check currently flags a link whose link text omits the URL scheme, which seems a bit harsh to me. Example that would be flagged: <a href="https://cloud.nextcloud.com/">cloud.nextcloud.com</a>.
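As an illustration of the scheme problem described above, the following sketch compares only the host parts of the link text and the href instead of using a raw substring test. Both function names (`hostOf`, `linkTextMatchesHref`) are hypothetical, and the sketch relies on PHP's built-in `parse_url` rather than the normalization library the comment asks for, so encodings, IDNs and data URLs are still not handled.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: extract a lower-cased host from URL-like text.
// Text without a scheme gets "https://" prepended so parse_url() can find the host.
function hostOf(string $urlish): ?string {
	$urlish = trim($urlish);
	if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $urlish) !== 1) {
		$urlish = 'https://' . $urlish;
	}
	$host = parse_url($urlish, PHP_URL_HOST);
	// parse_url() yields null or false when no host can be determined.
	return is_string($host) ? strtolower($host) : null;
}

// Hypothetical check: the link text and the href point at the same host.
function linkTextMatchesHref(string $linkText, string $href): bool {
	$textHost = hostOf($linkText);
	return $textHost !== null && $textHost === hostOf($href);
}

// The example from the review comment is no longer flagged ...
var_dump(linkTextMatchesHref('cloud.nextcloud.com', 'https://cloud.nextcloud.com/'));
// ... while a link text that names a different host than the href still is.
var_dump(linkTextMatchesHref('https://cloud.nextcloud.com/login', 'https://evil.example/login'));
```

Comparing hosts rather than whole strings is one possible design; a stricter variant could also compare paths once both sides are normalized.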
	];
}
foreach ($zippedArray as $zipped) {
	if ($this->isLink($zipped['linkText'])) {
This matches only if the link text only contains one URL-like "word". An anchor element with a link text like "login at cloud.nextcloud.com" would not be checked in this code.
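One way to cover link texts such as "login at cloud.nextcloud.com" would be to split the text into words and test each word on its own. The sketch below does that with the pattern from the diff; the function name `urlLikeWords` is hypothetical, and a URL-finding library as suggested elsewhere in this review would likely be more robust.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return every whitespace-separated word of the link
 * text that starts like a URL, using the pattern from the diff.
 *
 * @return string[]
 */
function urlLikeWords(string $linkText): array {
	$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
	$words = preg_split('/\s+/', trim($linkText), -1, PREG_SPLIT_NO_EMPTY) ?: [];
	return array_values(array_filter(
		$words,
		static fn (string $word): bool => preg_match($pattern, $word) === 1
	));
}

// Each word found here could then be compared against the href.
print_r(urlLikeWords('login at cloud.nextcloud.com'));
```

Note that the pattern is still loose: a word like "file.txt" would also be reported, so the result is a list of candidates rather than confirmed URLs.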
Great presentation today 👍 Sorry, I didn't have the opportunity to review the PR.
Force-pushed from 71db744 to 88e87cd.
Did not test but left some comments on the code.
Force-pushed from 815468d to 450ceaf.
Looks good
tests/Integration/Service/Phishing/PhishingDetectionServiceIntegrationTest.php
Could you check the test coverage of the new classes?
Can we go to 100? :)
👏
Force-pushed from 5756d51 to a12538d.
Force-pushed from 648cf22 to 3690030.
Signed-off-by: Hamza Mahjoubi <[email protected]>
Force-pushed from 3690030 to 9b87b23.
Ref #9453