Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: phishing detection #9610

Merged
merged 1 commit into from
Jun 27, 2024
Merged

Feat: phishing detection #9610

merged 1 commit into from
Jun 27, 2024

Conversation

hamza221
Copy link
Contributor

@hamza221 hamza221 commented Apr 29, 2024

Ref #9453

}

private function isLink(string $text): bool {
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not using a custom Regexp.

But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).

If I'm correct then FILTER_VALIDATE_URL would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something. Additionally it doesn't accept URLs containing non-ASCII characters.

I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?

Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/gruber/8891611 maybe this can be a more robust solution ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text_Filter is based on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the original source, yes. But a library might bring the benefit that possible changes (there's a list of TLDs in there, which may change) are tended for by multiple people/projects – I'm unsure about the Horde libs in this regard, though.

use OCA\Mail\AddressList;
use OCA\Mail\Contracts\ITrustedSenderService;

class PhishingDetectionService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this new class needs tests

lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
}
foreach ($zippedArray as $zipped) {
if($this->isLink($zipped['linkText'])) {
if (str_contains($zipped['linkText'], $zipped['href']) === false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to use a library to normalize the href-content, because encodings and funny details (e.g. data-URLs) can easily break this comparison even if both parts actually reference the same resource.

Also currently this check would flag a link that e.g. doesn't contain its URL-scheme in its link text, which seems a bit harsh to me? (Example that would be flagged: <a href="https://cloud.nextcloud.com/">cloud.nextcloud.com</a>).

}

private function isLink(string $text): bool {
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not using a custom Regexp.

But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).

If I'm correct then FILTER_VALIDATE_URL would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something. Additionally it doesn't accept URLs containing non-ASCII characters.

I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?

Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?

lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
];
}
foreach ($zippedArray as $zipped) {
if($this->isLink($zipped['linkText'])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches only if the link text only contains one URL-like "word". An anchor element with a link text like "login at cloud.nextcloud.com" would not be checked in this code.

@kesselb
Copy link
Contributor

kesselb commented May 28, 2024

Great presentation today 👍

Sorry, I didn't have the opportunity to review the pr.

  1. I'm uncertain about the reply-to warning. The purpose of reply-to is to send the response to a different email. I wonder if we want to use a less scary warning for such a case. For example, our calendar invitations are also using reply-to.

  2. The PhishingDetectionService reminds me a bit about our CheckSetupController in server. I think we could move every check to an own class. Testing is also easier than. Wdyt @ChristophWurst?

@hamza221 hamza221 mentioned this pull request May 29, 2024
17 tasks
Copy link
Member

@st3iny st3iny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test but left some comments on the code.

lib/IMAP/ImapMessageFetcher.php Outdated Show resolved Hide resolved
lib/PhishingDetectionResult.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/ReplyToCheck.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Show resolved Hide resolved
lib/IMAP/ImapMessageFetcher.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Outdated Show resolved Hide resolved
lib/PhishingDetectionResult.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/DateCheck.php Outdated Show resolved Hide resolved
lib/Address.php Show resolved Hide resolved
@hamza221 hamza221 force-pushed the feat/phishing-detection branch 2 times, most recently from 815468d to 450ceaf Compare June 3, 2024 17:32
Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

lib/Service/PhishingDetection/ContactCheck.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/DateCheck.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/DateCheck.php Show resolved Hide resolved
@ChristophWurst
Copy link
Member

codecov/project — 43.34% (-6.87%) compared to c6e28fd

could you check test coverage of new classes?

@hamza221
Copy link
Contributor Author

codecov/project — 43.34% (-6.87%) compared to c6e28fd

could you check test coverage of new classes?

image

@ChristophWurst
Copy link
Member

Can we go to 100? :)

@hamza221
Copy link
Contributor Author

image

Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

lib/IMAP/ImapMessageFetcher.php Outdated Show resolved Hide resolved
Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Signed-off-by: Hamza Mahjoubi <[email protected]>
@hamza221 hamza221 merged commit 5c86fe4 into main Jun 27, 2024
36 checks passed
@hamza221 hamza221 deleted the feat/phishing-detection branch June 27, 2024 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants