1 changed file with 33 additions and 8 deletions.
@@ -6,6 +6,28 @@

 /**
  * Defines a set of restriction flags for email address validation. To remain completely true to RFC 2822, all flags should be included (see {@link #RFC_COMPLIANT}).
+ * <p>
+ * There are a few basic use cases:
+ * <ol>
+ * <li>
+ * User wants to scrape as much data from a possibly-ugly address as they can and make a sensible address from it; these users typically allow all
+ * kinds of addresses (except perhaps for single-domain addresses) because in the wild, legitimate senders often violate RFC 2822. E.g., if your goal is to
+ * parse spammy emails for analysis, you may want to allow every variation out there just so you can parse something useful.
+ * </li>
+ * <li>
+ * User wants to check whether an email address has proper, normal syntax, e.g. when checking the value entered in a form. These users typically make
+ * everything strict, since what most people consider a "valid" email address is a small subset of RFC 2822. For users with the strictest requirements,
+ * this library may not be enough: although it checks most of RFC 2822, it might still be too 'tolerant' for their needs (on the other side of
+ * the spectrum, most libraries use a simple [email protected] type regex, which, as we know, is
+ * <a href="http://www.troyhunt.com/2013/11/dont-trust-net-web-forms-email-regex.html">rarely a good idea</a>).
+ * </li>
+ * <li>
+ * User wants to intelligently parse a possibly-ugly address, with the goal of producing a cleaned-up, usable address that other software
+ * (MTAs, databases, whatever) can use / parse without breaking; {@link #DEFAULT} is tailored to this use case (with the possible exception of
+ * {@link #ALLOW_DOT_IN_A_TEXT}, to taste). In our experience, these settings accepted "real" addresses most often, and the addresses they
+ * rejected were almost all ridiculous.
+ * </li>
+ * </ol>
  *
  * @author Benny Bottema
  */
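The three use cases above map fairly directly onto criteria sets. The following is a minimal, self-contained sketch. It assumes a trimmed copy of the enum with the javadoc dropped, and it takes the constants and both sets from the "+" side of this diff. The `UseCase` enum and the `criteriaFor` helper are hypothetical and are not part of the library. Reading "except perhaps for single-domain addresses" as "drop `ALLOW_DOMAIN_LITERALS`" is an assumption based on the `DEFAULT` javadoc in the last hunk.

```java
import java.util.EnumSet;

public class UseCaseCriteria {

    // Trimmed copy of the enum from this diff so the sketch compiles on its own (javadoc dropped).
    enum EmailAddressCriteria {
        ALLOW_DOMAIN_LITERALS,
        ALLOW_QUOTED_IDENTIFIERS,
        ALLOW_DOT_IN_A_TEXT,
        ALLOW_SQUARE_BRACKETS_IN_A_TEXT,
        ALLOW_PARENS_IN_LOCALPART;

        // Values from the "+" side of the diff.
        static final EnumSet<EmailAddressCriteria> DEFAULT = EnumSet.of(ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART);
        static final EnumSet<EmailAddressCriteria> RFC_COMPLIANT = EnumSet.allOf(EmailAddressCriteria.class);
    }

    // Hypothetical labels for the three use cases in the class javadoc; not part of the library.
    enum UseCase { SCRAPE, FORM_CHECK, CLEAN_UP }

    // Hypothetical helper: always returns a fresh, caller-owned set so the shared constants are never mutated.
    static EnumSet<EmailAddressCriteria> criteriaFor(UseCase useCase) {
        switch (useCase) {
            case SCRAPE:
                // Allow every variation. A scraper that wants to skip single-domain addresses could
                // additionally remove ALLOW_DOMAIN_LITERALS (assumption, based on DEFAULT's javadoc).
                return EnumSet.copyOf(EmailAddressCriteria.RFC_COMPLIANT);
            case FORM_CHECK:
                // "Make everything strict": none of the relaxations are allowed.
                return EnumSet.noneOf(EmailAddressCriteria.class);
            case CLEAN_UP:
            default:
                // DEFAULT is tailored to this case; callers may add ALLOW_DOT_IN_A_TEXT "to taste".
                return EnumSet.copyOf(EmailAddressCriteria.DEFAULT);
        }
    }

    public static void main(String[] args) {
        for (UseCase useCase : UseCase.values()) {
            System.out.println(useCase + " -> " + criteriaFor(useCase));
        }
    }
}
```

Returning copies matters here. The `public static final` sets in the diff are ordinary, mutable `EnumSet`s, so a single `DEFAULT.add(...)` anywhere would silently change the defaults for every caller.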
@@ -20,6 +42,7 @@ public enum EmailAddressCriteria {
      * ("example.com"), then don't include this criterion.
      */
     ALLOW_DOMAIN_LITERALS,
+
     /**
      * This criterion states that, as per RFC 2822, quoted identifiers are allowed (using quotes and angle brackets around the raw address), e.g.:
      * <p>
@@ -29,6 +52,7 @@ public enum EmailAddressCriteria {
      * (<tt>[email protected]</tt> - no quotes or angle brackets), then don't include this criterion.
      */
     ALLOW_QUOTED_IDENTIFIERS,
+
     /**
      * This criterion allows "." to appear in atext (note: only atext that appears in the RFC 2822 "name-addr" part of the address, not the
      * other instances)
@@ -42,6 +66,7 @@ public enum EmailAddressCriteria {
      * quotes.
      */
     ALLOW_DOT_IN_A_TEXT,
+
     /**
      * This criterion allows "[" or "]" to appear in atext. Not very useful, maybe, but there it is.
      * <p>
@@ -58,12 +83,12 @@ public enum EmailAddressCriteria {
      * you.
      */
     ALLOW_SQUARE_BRACKETS_IN_A_TEXT,
+
     /**
      * As per RFC 2822, this criterion allows "(" or ")" to appear in quoted versions of the localpart (they are never allowed in unquoted
      * versions).
      * <p>
-     * You can disallow it, but it is better to include this criterion. I left this hanging around (from an earlier incarnation of the code) as a random option you
-     * can
+     * You can disallow it, but it is better to include this criterion. I left this hanging around (from an earlier incarnation of the code) as a random option you can
      * switch off. No, it's not necessarily useful. Long story.
      * <p>
      * If this criterion is not included, addresses such as "bob(hi)smith"@test.com are rejected, even though they are valid.
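The rule described for `ALLOW_PARENS_IN_LOCALPART` can be sketched in isolation. This is a minimal illustration, not the library's validator. The `parensPermitted` helper is hypothetical. It also assumes that a localpart counts as "quoted" simply when it is wrapped in double quotes, which ignores escaping and the rest of the RFC 2822 grammar.

```java
import java.util.EnumSet;
import java.util.Set;

public class ParensRuleSketch {

    // Minimal stand-in for the enum in the diff: only the constant this sketch needs.
    enum EmailAddressCriteria { ALLOW_PARENS_IN_LOCALPART }

    // Hypothetical helper: decides only the "parens in localpart" rule from the javadoc.
    static boolean parensPermitted(String localPart, Set<EmailAddressCriteria> criteria) {
        boolean hasParens = localPart.indexOf('(') >= 0 || localPart.indexOf(')') >= 0;
        if (!hasParens) {
            return true; // nothing for this rule to reject
        }
        // Simplification: "quoted" means wrapped in a pair of double quotes.
        boolean quoted = localPart.length() >= 2 && localPart.startsWith("\"") && localPart.endsWith("\"");
        if (!quoted) {
            return false; // parens are never allowed in an unquoted localpart
        }
        return criteria.contains(EmailAddressCriteria.ALLOW_PARENS_IN_LOCALPART);
    }

    public static void main(String[] args) {
        Set<EmailAddressCriteria> withParens = EnumSet.of(EmailAddressCriteria.ALLOW_PARENS_IN_LOCALPART);
        Set<EmailAddressCriteria> without = EnumSet.noneOf(EmailAddressCriteria.class);

        System.out.println(parensPermitted("\"bob(hi)smith\"", withParens)); // true
        System.out.println(parensPermitted("\"bob(hi)smith\"", without));    // false: criterion not included
        System.out.println(parensPermitted("bob(hi)smith", withParens));     // false: unquoted localpart
    }
}
```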
@@ -72,15 +97,15 @@ public enum EmailAddressCriteria {

     /**
      * The default setting is not strictly RFC 2822 compliant. For example, it does not include the {@link #ALLOW_DOMAIN_LITERALS} criterion, which results in
-     * exclusions on single domains.
+     * exclusions on single domains. Useful for cleaning up email strings that other middleware (i.e. the next server) will be able to understand.
      * <p>
-     * Included in the defaults are: <ul> <li>{@link #ALLOW_QUOTED_IDENTIFIERS}</li> <li>{@link #ALLOW_PARENS_IN_LOCALPART}</li> </ul>
+     * Included in the defaults are: <ul> <li>{@link #ALLOW_QUOTED_IDENTIFIERS}</li> <li>{@link #ALLOW_PARENS_IN_LOCALPART}</li> </ul>.
      */
-    public static final EnumSet<EmailAddressCriteria> DEFAULT = of(ALLOW_DOMAIN_LITERALS);
+    public static final EnumSet<EmailAddressCriteria> DEFAULT = of(ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART);

     /**
-     * Criteria which is most RFC 2822 compliant and allows all compliant address forms, including the more exotic ones.
+     * Criteria which is most RFC 2822 compliant and allows all compliant address forms, including the more exotic ones. Most useful for validating the broadest
+     * range of email addresses that should be allowed within the boundaries of RFC compliance.
      */
-    public static final EnumSet<EmailAddressCriteria> RFC_COMPLIANT = of(ALLOW_DOMAIN_LITERALS, ALLOW_QUOTED_IDENTIFIERS, ALLOW_DOT_IN_A_TEXT,
-            ALLOW_SQUARE_BRACKETS_IN_A_TEXT, ALLOW_PARENS_IN_LOCALPART);
+    public static final EnumSet<EmailAddressCriteria> RFC_COMPLIANT = EnumSet.allOf(EmailAddressCriteria.class);
 }
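The two constant changes in the last hunk can be compared side by side. The sketch assumes a trimmed copy of the enum that holds only the five constants visible in this diff. The `OLD_`/`NEW_` field names are hypothetical and exist only to contrast the "-" and "+" definitions.

```java
import java.util.EnumSet;

public class CriteriaSetsSketch {

    // Trimmed copy of the five constants shown in the diff (javadoc dropped).
    enum EmailAddressCriteria {
        ALLOW_DOMAIN_LITERALS,
        ALLOW_QUOTED_IDENTIFIERS,
        ALLOW_DOT_IN_A_TEXT,
        ALLOW_SQUARE_BRACKETS_IN_A_TEXT,
        ALLOW_PARENS_IN_LOCALPART;

        // "-" side of the diff
        static final EnumSet<EmailAddressCriteria> OLD_DEFAULT = EnumSet.of(ALLOW_DOMAIN_LITERALS);
        static final EnumSet<EmailAddressCriteria> OLD_RFC_COMPLIANT = EnumSet.of(ALLOW_DOMAIN_LITERALS, ALLOW_QUOTED_IDENTIFIERS,
                ALLOW_DOT_IN_A_TEXT, ALLOW_SQUARE_BRACKETS_IN_A_TEXT, ALLOW_PARENS_IN_LOCALPART);

        // "+" side of the diff
        static final EnumSet<EmailAddressCriteria> NEW_DEFAULT = EnumSet.of(ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART);
        static final EnumSet<EmailAddressCriteria> NEW_RFC_COMPLIANT = EnumSet.allOf(EmailAddressCriteria.class);
    }

    public static void main(String[] args) {
        // With the current five constants, allOf(...) and the old explicit list are the same set.
        System.out.println(EmailAddressCriteria.OLD_RFC_COMPLIANT.equals(EmailAddressCriteria.NEW_RFC_COMPLIANT)); // true
        // The old DEFAULT included domain literals despite its javadoc; the new one does not.
        System.out.println(EmailAddressCriteria.OLD_DEFAULT.contains(EmailAddressCriteria.ALLOW_DOMAIN_LITERALS)); // true
        System.out.println(EmailAddressCriteria.NEW_DEFAULT.contains(EmailAddressCriteria.ALLOW_DOMAIN_LITERALS)); // false
        // Everything the default allows is also allowed by the RFC-compliant set.
        System.out.println(EmailAddressCriteria.NEW_RFC_COMPLIANT.containsAll(EmailAddressCriteria.NEW_DEFAULT)); // true
        // What the new DEFAULT leaves out, in declaration order.
        System.out.println(EnumSet.complementOf(EmailAddressCriteria.NEW_DEFAULT));
    }
}
```

Because of `EnumSet.allOf`, `RFC_COMPLIANT` automatically picks up any criterion added to the enum later. The old explicit `of(...)` list had to be kept in sync by hand. The old `DEFAULT` also contradicted its own javadoc by containing `ALLOW_DOMAIN_LITERALS`, while the new value matches the documented list.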