Fast, robust, standards-compliant MIME decoder. Ships with extensive tests and fuzz tests.
npm install @ronomon/mime
- Decodes on demand only as much as necessary to access a particular property. For example, if you need mime.subject, then MIME will search for the CRLF pair marking the end of the headers and decode only the subject header, without decoding any other headers and without decoding the body. This works well with the first few layers of spam defenses, which often only need to decode particular headers to reject an email.
- Caches decoded properties for subsequent use.
- Uses native fuzz-tested C++ Base64 and Quoted-Printable bindings. MIME's Base64 decoder in particular was developed for decoding wrapped Base64 more efficiently and detecting obvious corruption and character truncation.
- Uses custom lookup tables to minimize the cost of branching through too many conditionals when decoding.
- Avoids unnecessary string and buffer allocations. Algorithms accept and work with buffers directly, and allocate and copy buffers only when necessary.
- Avoids regular expression decoders.
- Provides detailed error messages which refer to the relevant RFCs to assist debugging, and which can be used directly as part of an SMTP reply.
- Accepts CRLF and LF line-endings (which are common) but not CR line-endings (which are rare).
- Accepts illegal transport padding frequently added by intermediaries (e.g. within the angle brackets of a msg-id or angle-addr, and between tokens in an encoded-word).
- Decodes a variety of malformed but common mailbox syntaxes (e.g. no angle brackets around the addr-spec, with a display-name present on the left or right).
- Removes balanced single quotes around the display-name or addr-spec in an email address (sometimes added by Outlook).
- Decodes encoded-words not separated by WSP.
- Decodes encoded-words with empty encoded-text.
- Decodes encoded-words in Content-Type and Content-Disposition parameters (encoded by Outlook and Gmail contrary to RFC 2047 section 5, "Use of encoded-words in message headers").
- Rejects encoded-words containing malicious "mailsploit" control characters.
- Removes any directory path components from an attachment name or filename (when accessed via mime.filename, see Usage below).
- Decodes msg-ids not separated by whitespace or commas (i.e. separated only by angle brackets).
- Rejects unrecognized Content-Transfer-Encoding mechanisms contrary to RFC 2045 section 6.4 (e.g. anything other than 7bit, 8bit, binary, base64, or quoted-printable). This is to avoid accepting responsibility for content which will not display correctly, if at all. In contrast, the spec advocates silently altering the Content-Type.
- Rejects malicious RFC 2231 continuation indices designed to cause overallocation.
- Rejects Base64 data containing illegal characters (anything which is not a valid Base64 or whitespace character, e.g. null bytes which could cause security issues).
- Rejects Base64 data which is clearly truncated (as opposed to just missing padding).
- Corrects Quoted-Printable data containing illegal characters (anything which is not a valid Quoted-Printable character, e.g. null bytes which could cause security issues).
- Rejects illegal character sequences according to the specified charset.
- Rejects truncated character sequences according to the specified charset.
- Normalizes and aliases a variety of character sets to the canonical character set (e.g. ks_c_5601-1987 is sometimes used by Outlook and is aliased to CP949 - Korean, otherwise the characters would decode from the wrong character set and be unintelligible).
- Rejects unknown character sets not supported by iconv.
- Decodes text/* body parts to UTF-8 buffers if the Content-Type indicates that the body is encoded in any other character set.
- Rejects unterminated comments and quoted-strings.
- Rejects invalid Content-Type syntax.
- Detects missing multipart parts (e.g. no terminating boundary delimiter).
- Rejects dangerous message/external-body and message/partial media types.
- Decodes a variety of time zones and year formats.
- Accepts missing time zone and assumes UTC to support email clients such as Blackberry which do not provide the required time zone in the Date header.
- Rejects invalid Date header syntax.
- Rejects missing From header.
- Rejects headers containing forbidden characters.
- Rejects folded header lines which exceed the 998 character line length limit, but only after allowing for clients such as Outlook.com which exclude the field-name and colon from their character count, and which mistake the limit to be 1000 characters excluding the CRLF. The limit is in fact 998 characters excluding the CRLF.
- Rejects multipart boundaries containing forbidden characters.
- Rejects malicious data designed to cause CPU-intensive decoding and stack overflows.
- Rejects malicious multiple occurrences of crucial headers and parameters, which could cause clients to render an email differently from that scanned by anti-virus software.
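The on-demand decoding and caching described at the top of the list above can be pictured with a small sketch. This is not the library's implementation: the LazyMessage class, its cache map, and the decodes counter are hypothetical names invented for illustration, and the sketch simplifies heavily (CRLF line endings only, no folded headers, no encoded-words).

```javascript
'use strict';

// Hypothetical sketch; names and logic are illustrative, not the library's code.
class LazyMessage {
  constructor(buffer) {
    this.buffer = buffer;   // raw message bytes, left untouched until needed
    this.cache = new Map(); // decoded properties, filled on first access
    this.decodes = 0;       // number of real decodes, to make caching visible
  }

  // Decode the subject on first access only, then serve it from the cache.
  get subject() {
    if (!this.cache.has('subject')) {
      this.decodes++;
      this.cache.set('subject', this.decodeHeader('subject'));
    }
    return this.cache.get('subject');
  }

  // Scan only the header block and return the value of one header.
  decodeHeader(name) {
    // Headers end at the first empty line (CRLF CRLF); the body is never read.
    let end = this.buffer.indexOf('\r\n\r\n');
    if (end === -1) end = this.buffer.length;
    const lines = this.buffer.toString('latin1', 0, end).split('\r\n');
    const prefix = name + ':';
    for (const line of lines) {
      // Header names are case-insensitive, so compare in lowercase.
      if (line.slice(0, prefix.length).toLowerCase() === prefix) {
        return line.slice(prefix.length).trim();
      }
    }
    return undefined;
  }
}

const message = new LazyMessage(Buffer.from(
  'From: a@example.com\r\nSubject: Hello\r\n\r\nBody text'
));
console.log(message.subject); // first access decodes the header
console.log(message.subject); // second access is served from the cache
console.log(message.decodes); // counts how many times decoding actually ran
```

The getter pays the decoding cost at most once per property, and a caller that only ever asks for the subject never touches the body.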
- RFC 5322 - Internet Message Format.
- RFC 5321 - Simple Mail Transfer Protocol.
- RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies.
- RFC 2046 - Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types.
- RFC 2047 - MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.
- RFC 2183 - The Content-Disposition Header Field.
- RFC 2231 - MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations.
var MIME = require('@ronomon/mime');
// Instantiate a new mime instance (no decoding will take place):
var mime = new MIME.Message(buffer);
// Decoding will take place when the following getter properties are accessed.
// These getter properties may throw an exception for malformed MIME data.
mime.headers; // { 'received': [<Buffer>] }
mime.body; // <Buffer>
mime.from; // [ { name: <String>, email: <String> } ]
mime.sender; // { name: <String>, email: <String> } / undefined
mime.replyTo; // [ { name: <String>, email: <String> } ]
mime.to; // [ { name: <String>, email: <String> } ]
mime.cc; // [ { name: <String>, email: <String> } ]
mime.bcc; // [ { name: <String>, email: <String> } ]
mime.messageID; // <String> / undefined
mime.references; // [ <String>, <String> ]
mime.inReplyTo; // [ <String>, <String> ]
mime.date; // <Unix Timestamp Integer>
mime.subject; // <String>
mime.contentDisposition; // { value: <String>, parameters: {} }
mime.contentType; // { value: <String>, parameters: {} }
mime.contentID; // <String> / undefined
mime.filename; // <String> / undefined
mime.parts; // [ <MIME.Message>, <MIME.Message> ]
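Because the getters may throw for malformed data, callers will usually want to wrap property access. The following is a minimal sketch of one way to do that, not an API provided by the package: screenMessage and StubMessage are hypothetical names, and the sketch assumes the thrown value is an Error whose message is descriptive. The message class is passed in as a parameter so the example runs without the package installed.

```javascript
'use strict';

// Hypothetical helper: accept or reject a message using only a few headers.
// MessageClass is expected to behave like MIME.Message from the usage above.
function screenMessage(MessageClass, buffer) {
  try {
    const mime = new MessageClass(buffer);
    // Accessing the getters triggers decoding and may throw.
    return { accepted: true, from: mime.from, subject: mime.subject };
  } catch (error) {
    // Assumption: the thrown value is an Error with a descriptive message.
    return { accepted: false, reason: String((error && error.message) || error) };
  }
}

// Stand-in class so the sketch runs without the real package installed.
class StubMessage {
  constructor(buffer) {
    this.buffer = buffer;
  }
  get from() {
    if (this.buffer.length === 0) throw new Error('missing From header');
    return [{ name: 'Alice', email: 'alice@example.com' }];
  }
  get subject() {
    return 'Hello';
  }
}

console.log(screenMessage(StubMessage, Buffer.from('From: alice@example.com\r\n\r\n')));
console.log(screenMessage(StubMessage, Buffer.alloc(0)));
```

With the real package, the call would presumably be screenMessage(MIME.Message, buffer), using the MIME variable from the usage above.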
To run all included tests and fuzz tests:
node test.js