Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict validation must be able to be turned off #146

Open
rikvdh opened this issue Mar 24, 2021 · 13 comments
Open

Strict validation must be able to be turned off #146

rikvdh opened this issue Mar 24, 2021 · 13 comments

Comments

@rikvdh
Copy link

rikvdh commented Mar 24, 2021

Bug Report

I'm using this library for parsing mails in different mailboxes for a ticketing system. While reading mail from mailboxes I face all kinds of "invalid" mails. Malformed headers, malformed e-mail addresses, invalid mime-types for multi-part mails, invalid subjects (containing non ASCII characters). But, the most important thing is, I want to recover as much as I can from these message since most of the time it is irrelevant or is read correctly.

Since I cannot control what people throw at me I want to be able to disable the 'strict' validation.

Q A
Version(s) 2.14.0

Summary

See above.

Current behavior

See above.

How to reproduce

I can provide some mails which I got, but these contain personal information. Let me know if this is needed.

Expected behavior

Give me the e-mail (for as far as it can be parsed)

@rikvdh rikvdh added the Bug Something isn't working label Mar 24, 2021
@Slamdunk Slamdunk added Enhancement and removed Bug Something isn't working labels Mar 25, 2021
@Slamdunk
Copy link
Contributor

Hi, this seems to me more of an Enhancement than a Bug.
Please send me an example email to my address (you can find it here) so I can purge it from sensitive information and post it here as an example.

If you have more than one failing emails with different error, send me all of them in a zip please.

@glensc
Copy link
Contributor

glensc commented Mar 25, 2021

@Slamdunk it's really not that you need a specific email, figure what ever invalid crap, and try to load it.

I'm Eventum, I've created a separate loader for this, it tries to split headers in different ways (\r\n, \n, \r and combination), also renames the broken headers to X-Broken, but at least finishes loading the mail.

It took several iterations to get where it is now because people are very creative about what and how to break:

The header splitter has also own implementation because zend\mail was not able to handle variations.

It probably has test input examples too of broken emails, need to find each for the corresponding PR they were added, i.e find PR's of each MailLoader.php change;

Maybe this gets some ideas boiling, maybe the outcome is really not this library concern, maybe it instead could add a similar loader here with recovery mode like in eventum.

So far I did not have to patch invalid mime types, maybe because the app doesn't care about them so much.

@glensc
Copy link
Contributor

glensc commented Mar 25, 2021

@rikvdh as for email privacy, obfuscate them in the sense that it doesn't lose the value. usually not needed the exact value of the mail line. for mime body base64 parts, you can replace the content with base64_encode('test') or the like.

@rikvdh
Copy link
Author

rikvdh commented Mar 26, 2021

@glensc @Slamdunk I shall try to find them next week or so, obfuscate them en upload them here. :-)

@glensc Having a 'external mail loader' is an option. But I think it should be nice to have this feature included into 'mainline'.

@glensc
Copy link
Contributor

glensc commented Mar 26, 2021

@rikvdh please also include a minimal code how you load emails.

@glensc
Copy link
Contributor

glensc commented Apr 13, 2021

@rikvdh perhaps add Storage\Message to the title to limit scope of reading messages from storage systems (imap, pop3)

@rikvdh
Copy link
Author

rikvdh commented Apr 27, 2021

It took a while, sorry for that. @glensc I really don't load emails myself, I use IMAP and call the getMessage($id) method on the Storage\Imap class. But for the sake of completeness I've uploaded an example on how to load to Gist. Sorry for the idiotic GIFs in the message, I have no control of what people send us and don't want to alter the email too much.

https://gist.github.com/rikvdh/c6d30d4d98d326445babb22026dbdf7f

This is just one with an invalid header. I shall try to find more.

@vjainsugar
Copy link

We've been running into this problem too where Laminas throws exceptions for things like malformed headers. We would prefer to just get as much of the email as we can since we can't control what the sender sends. We've found that email clients like gmail and outlook accept these emails and display whatever they can. Would there be a way through configuration to disable throwing exceptions (and instead maybe log that an email has some problems) and still continue reading/parsing whatever it can?

@Slamdunk
Copy link
Contributor

Hi, sorry for the late reply.
This is indeed an issue that would involve a lot of added flags to turn off validations, and with no guarantee that the new code has valid behavior.
Note that I'm also the maintainer of ddeboer/imap so I also face a lot of invalid mails and know the issue.
@glensc indeed use in Eventum an effective approach.

I have a quick and extra-dirty solution, that not everyone will like, but can get you through: introduce a specific exception for each case with a reference to the original Header, so you can easily flatten or strip it out and re-try the parse:

// Given a new laminas-mail exception
class InvalidHeaderException
    extends \Laminas\Mail\Header\Exception\InvalidArgumentException
    implements \Laminas\Mail\Header\Exception\ExceptionInterface
{
    private $originalHeader;

    public function __construct($originalHeader, $message = "", $code = 0, Throwable $previous = null)
    {
        parent::__construct($message, $code, $previous);
        $this->originalHeader = $originalHeader;
    }
    
    public function getOriginalHeader(): string
    {
        return $this->originalHeader;
    }
}

// The user code can look like
function parseMessage(string $message): \Laminas\Mail\Storage\Message {
    try {
        return new \Laminas\Mail\Storage\Message(['raw' => $message]);
    } catch (InvalidHeaderException $invalidHeaderException) {
        $originalHeader = $invalidHeaderException->getOriginalHeader();
        // Record it
        $flattenedMessage = preg_replace(
            sprintf('s/^%s$/', preg_quote($originalHeader)),
            'X-Broken-Header: ' . preg_replace('s/\W/', '-', $originalHeader),
            $message
        );
        // OR remove it
        $flattenedMessage = preg_replace(
            sprintf('s/^%s$/', preg_quote($originalHeader)),
            '',
            $message
        );

        return parseMessage($flattenedMessage);
    }
};

$message = parseMessage(file_get_contents('foo.eml'));

WDYT?

@jbostoen
Copy link

jbostoen commented Nov 7, 2022

Also came across an issue like this with a non-compliant email address in the header, eg

Cc: "IMCEAEX-_o=ExchangeLabs_ou=Exchange+20Administrative+20Group+20+123_cn=Recipients_cn=user123@EURPRD10.PROD.OUTLOOK.COM"
 <IMCEAEX-_o=ExchangeLabs_ou=Exchange+20Administrative+20Group+20+123_cn=Recipients_cn=user123@EURPRD10.PROD.OUTLOOK.COM>

The idea would be to still be able to process this message; and ignore this invalid address.

I notice the laminas-validator does have some bypass options, but we need to be able to set them using laminas-mail.

@georgique
Copy link
Contributor

I was struggling with this too, so I had to extend storage (Pop3 and Imap), Storage\Message, Headers and Mime\Decode classes, so that I could override methods to try to parse headers and proceed in case of an error instead of breaking.
I have an idea of how much work it would be to make this entire process configurable as there are a lot of various components involved.
However, as an interim solution, what if we introduce an ability to appoint classes for handling various processes? For example, instead of calling Headers::fromString(), we introduce a property $headersClass and in the code do $this->headersClass::fromString(). That would make things easier for extending until the whole thing is reworked.
I can work on this and make a PR.

@maxiwheat
Copy link

I'm having issues transitioning to Laminas Mail from a very old Zend_Mail implementation (1.12). I am doing this because I began receiving deprecated warnings when using some Zend_Mail methods, so I tried migrating to Laminas, but this is preventing us to do so.

I read the content of an IMAP mailbox, and it can contain all sort of things, including badly formatted emails in headers. I would like to be able to still parse and use the Message object even if some email addresses are not "standard". Currently Here are the 2 types of exception I got :

  • [Laminas\Mail\Exception\InvalidArgumentException] The input is not a valid email address. Use the basic format local-part@hostname
  • [Laminas\Mail\Exception\InvalidArgumentException] The input exceeds the allowed length

Example of header causing the second type of Exception :

To: SRS0=7qz2=CI=cyberimpact.com=6ohVGXCfJz-WCBCweBT3KJqleSrKwEu2Q8IdqoSmty4_pKu6kQInaNU1-EFbB92C@notimportant.com

@Xerkus
Copy link
Member

Xerkus commented Jun 21, 2023

This might not have full email parsing. I vaguely remember mime body content parsing was not implemented and I do not think anyone was working on that since.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants