Skip to content

Commit

Permalink
publicsuffix: hold icann-ness until wildcards fully match
Browse files Browse the repository at this point in the history
Consider the "uberspace.de" domain. The relevant rules in the Public
Suffix List are "*.uberspace.de" (in the PRVIATE DOMAIN section) and
"de" (in the ICANN DOMAIN section).

The PublicSuffix function returns a string and a bool. Both before and
after this commit, the string returned is "de", which is correct
according to a literal interpretation of the formal algorithm. But the
bool returned, icann-ness, is false before and true after. The correct
answer is true, since the matching rule, "de", is in the ICANN DOMAIN
section of the PSL.

Before this commit, the two-stage match for "*.uberspace" set the icann
bit when matching the back part, "uberspace", before checking that the
front part, the "*" wildcard, also matched.

A couple more examples, for the "bd" and "ck" domains. The relevant
rules are "*.bd" and "*.ck", with no non-wildcard "bd" or "ck" rule.
Before this commit, the PublicSuffix function would return (icann ==
true), when the correct result is (icann == false), the same as for
"nosuchtld".

Benchmarks get worse, but correctness trumps performance. Future commits
may be able to recover some of the loss. In any case, in absolute terms,
15µs is still pretty fast.

name             old time/op  new time/op  delta
PublicSuffix-56  11.0µs ± 0%  14.8µs ± 2%  +34.57%  (p=0.000 n=9+10)

Change-Id: I85ca6ab57a31308af5a29c46313197897eab5ab6
Reviewed-on: https://go-review.googlesource.com/c/154977
Reviewed-by: Nigel Tao <[email protected]>
  • Loading branch information
nigeltao committed Dec 20, 2018
1 parent 45fec1d commit 927f977
Show file tree
Hide file tree
Showing 4 changed files with 191 additions and 96 deletions.
8 changes: 6 additions & 2 deletions publicsuffix/gen.go
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ var (
labelsList = []string{}
labelsMap = map[string]bool{}
rules = []string{}
numICANNRules = 0

// validSuffixRE is used to check that the entries in the public suffix
// list are in canonical form (after Punycode encoding). Specifically,
Expand Down Expand Up @@ -167,11 +168,14 @@ func main1() error {
}
s = strings.TrimSpace(s)
if strings.Contains(s, "BEGIN ICANN DOMAINS") {
if len(rules) != 0 {
return fmt.Errorf(`expected no rules before "BEGIN ICANN DOMAINS"`)
}
icann = true
continue
}
if strings.Contains(s, "END ICANN DOMAINS") {
icann = false
icann, numICANNRules = false, len(rules)
continue
}
if s == "" || strings.HasPrefix(s, "//") {
Expand Down Expand Up @@ -287,7 +291,7 @@ func gitCommit() (sha, date string, retErr error) {

func printTest(w io.Writer, n *node) error {
fmt.Fprintf(w, "// generated by go run gen.go; DO NOT EDIT\n\n")
fmt.Fprintf(w, "package publicsuffix\n\nvar rules = [...]string{\n")
fmt.Fprintf(w, "package publicsuffix\n\nconst numICANNRules = %d\n\nvar rules = [...]string{\n", numICANNRules)
for _, rule := range rules {
fmt.Fprintf(w, "%q,\n", rule)
}
Expand Down
8 changes: 6 additions & 2 deletions publicsuffix/list.go
Original file line number Diff line number Diff line change
Expand Up @@ -84,11 +84,12 @@ func (list) String() string {
// https://wiki.mozilla.org/Public_Suffix_List/Use_Cases
func PublicSuffix(domain string) (publicSuffix string, icann bool) {
lo, hi := uint32(0), uint32(numTLD)
s, suffix, wildcard := domain, len(domain), false
s, suffix, icannNode, wildcard := domain, len(domain), false, false
loop:
for {
dot := strings.LastIndex(s, ".")
if wildcard {
icann = icannNode
suffix = 1 + dot
}
if lo == hi {
Expand All @@ -100,7 +101,7 @@ loop:
}

u := nodes[f] >> (nodesBitsTextOffset + nodesBitsTextLength)
icann = u&(1<<nodesBitsICANN-1) != 0
icannNode = u&(1<<nodesBitsICANN-1) != 0
u >>= nodesBitsICANN
u = children[u&(1<<nodesBitsChildren-1)]
lo = u & (1<<childrenBitsLo - 1)
Expand All @@ -116,6 +117,9 @@ loop:
}
u >>= childrenBitsNodeType
wildcard = u&(1<<childrenBitsWildcard-1) != 0
if !wildcard {
icann = icannNode
}

if dot == -1 {
break
Expand Down
Loading

0 comments on commit 927f977

Please sign in to comment.