This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Delete multiple default_off rulesets (^Z) #10516

Closed
wants to merge 1 commit into from

Conversation

@cschanaj
Collaborator

cschanaj commented Jun 20, 2017

Following #9906: all targets not working over HTTPS.

I would like to delete more rulesets in a single yet relatively small PR. Thanks!

Remarks:

  • No wildcard targets
  • No more than 2 targets
  • No linked rulesets; no mismatches, problematic, etc. in the filename
  • All targets not working over HTTPS
  • Not in the Alexa top 1M

Source code: https://github.com/cschanaj/covfefe
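A rough sketch of the filtering above (the rules directory path is an assumption, and the HTTPS-availability and Alexa checks are omitted; the actual implementation is the covfefe tool linked above):

```sh
# Sketch only: walk the rules directory and keep default_off rulesets that
# match the criteria above. Detects left-side wildcards only, for brevity.
cd src/chrome/content/rules
for f in *.xml; do
  grep -q 'default_off' "$f" || continue               # default_off only
  grep -q '<target host="\*' "$f" && continue          # no wildcard targets
  [ "$(grep -c '<target ' "$f")" -le 2 ] || continue   # at most 2 targets
  case "$f" in *mismatches*|*problematic*) continue ;; esac
  echo "$f"
done
```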

@J0WI
Contributor

J0WI commented Jun 20, 2017

I'm not comfortable with just deleting default_off rules. Especially for cert-related issues, a user could have different concerns than a browser-trusted CA. #9906 (comment)

I have no doubts about server-related issues, like refused or redirect.

@jeremyn
Contributor

jeremyn commented Jun 21, 2017

I've already merged a lot of these deletion pull requests, for what it's worth. If the ruleset is bad then there's no point in keeping it around.

For your point about cert errors, 99.9% of the time an invalid certificate, for whatever reason, will be understood as broken/buggy by the user. For the 0.1% case where the user has a special relationship with the site and is willing to trust its self-signed certificate or whatever, they can make a custom rule.

I do think that a ruleset should be default_off for some amount of time before it's deleted. In #10289 (comment) I wrote

I don't know if there's a "policy", but my own opinion is that rulesets are sometimes marked default_off instead of deleted because we want to give the site owner some time to fix their site before we just delete the work that a ruleset represents.

How do we know when a ruleset should be deleted after being default_off? If the site is still broken the next time someone, like you, happens to look at the ruleset, then it's probably ready for deletion.

@cschanaj
Collaborator Author

@J0WI In addition to the reasons @jeremyn pointed out, the rulesets I delete are trivial ones which only rewrite ^ and www. They can easily be reproduced at any time using make-trivial-rule. I see little value in keeping such default_off rulesets.
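For illustration, such a trivial ruleset has essentially this shape (hostname hypothetical), which make-trivial-rule can regenerate on demand:

```sh
# Hypothetical example of a trivial ruleset that only rewrites ^ and www.
cat > Example.com.xml <<'EOF'
<ruleset name="Example.com">
	<target host="example.com" />
	<target host="www.example.com" />

	<rule from="^http:" to="https:" />
</ruleset>
EOF
```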

@J0WI
Contributor

J0WI commented Jun 21, 2017

@cschanaj would it be too much overhead to distinguish between rules that have been turned off and rules that were created with the default_off flag?

@cschanaj
Collaborator Author

@J0WI I do not know which rules were created with the default_off attribute, so I currently do not distinguish them. If a list of such rulesets is given, I can remove them from the candidates pending removal.

@J0WI
Contributor

J0WI commented Jun 21, 2017

I fear git log/git blame is the only way to do this.

@cschanaj
Collaborator Author

cschanaj commented Jun 21, 2017

git log and git blame take a long time to generate output; I would like to avoid that overhead.

@cschanaj
Collaborator Author

@J0WI a relatively quick way to check whether a default_off ruleset was created as such can be:

$ git log --oneline -- Example.com.xml | head -n 2 | wc -l

If the output is 1, then the ruleset was created with default_off, but I am afraid this will give a large number of false positives.
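A batch version of the same heuristic might look like this (the rules directory path is an assumption, and a single-commit history is only a rough proxy for "created with default_off"):

```sh
# Sketch: list default_off rulesets whose entire git history is one commit.
# Any ruleset that simply was never touched again is a false positive.
cd src/chrome/content/rules
grep -l 'default_off' -- *.xml | while read -r f; do
  commits=$(git log --oneline -- "$f" | head -n 2 | wc -l)
  [ "$commits" -eq 1 ] && echo "$f"
done
```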

@jeremyn
Contributor

jeremyn commented Jun 21, 2017

I'm still not sure why we would want to keep a default_off ruleset around, other than the case of giving the site owner time to fix the problem. I'm even less sure why we would create a ruleset with default_off.

@J0WI
Contributor

J0WI commented Jun 22, 2017

e.g. because of default_off="Only available in China"?

@jeremyn
Contributor

jeremyn commented Jun 22, 2017

@J0WI I agree with that specific example. Looking at the list in #9906 (comment):

regional // let's use this temporarily
refused
timeout
ssl-error
cert-invalid // any other certificate problem
cors // CORS issues
legacy // failed ruleset test
at-website-admin-request
other // details should be stated in the ruleset comment

regional/only available in China is the only one I really agree with keeping around forever, and that should be a platform anyway.

@cschanaj
Collaborator Author

Note that Baidu_CN.xml is the only "Only available in China" ruleset now.

I believe the number of rulesets created with default_off is relatively small compared to all default_off rulesets. The effect of deleting such rulesets, if any, should be minimal. Besides, I only submit mass-delete PRs for rulesets with 2 or fewer non-wildcard targets.

It should be totally fine in terms of the level of protection we provide.

@jeremyn
Contributor

jeremyn commented Jun 27, 2017

Given the discussion here and my discovery of the full-fetch-test tool that can reactivate default_off rulesets, I'm less comfortable deleting rulesets than I was before. I would like some clarification from @Hainish about when this is okay to do.

There have been several pages of new deletion PRs and it would be great to get clarification before we get too many more.

@cschanaj
Collaborator Author

cschanaj commented Jun 27, 2017

AFAIK, the script can only re-activate default_off='failed ruleset test'; see test.sh#L4. @Hainish, can you clarify the behavior of the script?
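In other words, the selection step is roughly the following (the path and exact attribute spelling are assumptions; the authoritative check is the line in test.sh referenced above):

```sh
# Sketch: only rulesets the checker itself disabled are candidates for
# re-activation; manually disabled rulesets are left alone.
grep -rl 'default_off="failed ruleset test"' src/chrome/content/rules/
```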

Note that a number of deletion PRs come from #10595, where the candidate list is generated such that:

  • the keyword Disabled by https-everywhere-checker appears exactly once
  • the number of occurrences of the keyword Could not resolve host: equals the number of non-wildcard targets
  • unlike Delete multiple default_off rulesets (^Z) #10516 (this PR), a secondary check with curl is NOT performed

as described in #10595 (comment). A rough sketch of this filter follows below.

For reference: #10378.
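The sketch (the log file name and format are assumptions):

```sh
# Sketch of the #10595 filter: a ruleset qualifies when the checker disabled
# it exactly once and every non-wildcard target failed DNS resolution.
f=Example.com.xml          # hypothetical ruleset
log=fetch-test.log         # hypothetical per-ruleset checker log
disabled=$(grep -c 'Disabled by https-everywhere-checker' "$log")
nxdomain=$(grep -c 'Could not resolve host:' "$log")
targets=$(grep -c '<target host="[^*]' "$f")   # non-wildcard targets only
if [ "$disabled" -eq 1 ] && [ "$nxdomain" -eq "$targets" ]; then
  echo "$f: candidate for deletion (no curl re-check, per #10595)"
fi
```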

@Bisaloo
Collaborator

Bisaloo commented Jun 28, 2017

I am interested in this conversation. For your information, here is how I have been picking rules to remove until now:

  • Gone from DNS
  • Domain for sale
  • Website moved to a new domain (the domain listed in the rule only redirects to this new domain) and the certificate is invalid. I believe that in this case it is unlikely that webmasters will go through the process of renewing/creating new certificates for domains that only exist for legacy purposes.

I was also pondering the possibility of removing rulesets that have failed for the exact same reason in the two https-everywhere-checker tests, as this demonstrates a lack of intent from webmasters to fix them.

I think rules outside those cases should not be removed but simply set to default_off.

@cschanaj
Collaborator Author

For reference, there are around 230 rules that failed for the exact same reason in the comprehensive fetch test; see disabled-twice-same-reason-list.txt.

@J0WI
Contributor

J0WI commented Jun 29, 2017

Linking this with #2088

@cschanaj
Collaborator Author

cschanaj commented Jul 6, 2017

ping @Hainish

@Hainish
Member

Hainish commented Jul 6, 2017

When the full fetch test tool is run on a regular/continual basis, it will be easier to follow a standardized rule for deleting rulesets that fail subsequent runs. If we decide to run it every week, for instance, two failures for the same reason would be a hasty criterion for deletion.

Deleting rulesets which fail the fetch test over an extended period of time (say 6 months) makes a lot of sense to me. I think 6 months would also be a good default for how often we run the full fetch test, so streamlining the deletion of rules with the periodic runs of the tool strikes me as reasonable, especially if we have a fetch whitelist (#10227) in place.

If this seems like a reasonable course of action, we should not only implement it but document it in CONTRIBUTING.md.

@Hainish
Member

Hainish commented Jul 6, 2017

Note that in the above comment, when I say "run the full fetch test" I mean follow the "3-colocation" methodology I outline in #10538 (comment), so we don't get overaggressive in disabling rulesets.

@jeremyn
Contributor

jeremyn commented Jul 6, 2017

It would be nice if full-fetch-test stored the default_off enabling date somewhere in the ruleset with a timestamp, perhaps in a special tag just for that purpose. It could then check the date when deciding when to delete the ruleset entirely.

@Hainish What do you want to do with the many recent open PRs that delete rulesets? Do you want to say that only full-fetch-test should delete rules, other than in really exceptional cases, in which case we'll close most/all of these PRs?
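A sketch of the timestamp idea, with an invented <disabled_on> tag (no such tag exists in the ruleset format today; the 6-month cutoff follows the suggestion above):

```sh
# Hypothetical: stamp the disable date in the ruleset, then delete only after
# it has aged past the cutoff. <disabled_on> is invented for this sketch.
f=Example.com.xml
disabled_on=$(sed -n 's|.*<disabled_on>\(.*\)</disabled_on>.*|\1|p' "$f")
cutoff=$(date -d '6 months ago' +%s)   # GNU date
if [ -n "$disabled_on" ] && [ "$(date -d "$disabled_on" +%s)" -lt "$cutoff" ]; then
  echo "$f: default_off for over 6 months, candidate for deletion"
fi
```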

@cschanaj
Collaborator Author

cschanaj commented Jul 7, 2017

@Hainish I agree that rulesets that fail the fetch test for the exact same reason can be removed safely. Since we have had the new whitelist from #10900, I think we can run the full fetch test quite often. Some rulesets got disabled because some of their target hosts are gone from DNS; removing such rulesets just because they failed multiple fetch test sessions could be problematic.

As mentioned in #10538 (comment), for default_off rulesets the fetch test only checks and re-activates default_off='failed ruleset test'. Is there any guideline for removing disabled rulesets that failed for other reasons, e.g. mismatched or no longer supports SSL (this is an actual one)?

Related: #9906.

@Hainish
Member

Hainish commented Jul 8, 2017

@jeremyn I think there is some value in manually reviewing the deletion of rules, but it may not be worth the time if we're going to auto-disable rulesets. Does human oversight add enough value, in your experience, to warrant continued close inspection? If not, we shouldn't keep doing it. I haven't been watching the process closely enough to have insight here.

One thing we may want to do is, after every run of the fetch tester, create a list of the rulesets that would be deleted due to consecutive disables with the same reason, and then manually review whether they actually should be deleted. Instead of auto-deleting these rulesets, this would give us a chance to fix, for example, a failure with just one host. This would allay some of @cschanaj's concerns.

@Hainish
Member

Hainish commented Jul 8, 2017

@cschanaj no, the full-fetch-test was intended to be self-correcting upon subsequent runs, rather than corrective of rulesets that had been manually disabled, so as not to interfere with the human oversight process. If needed, we can create a new default_off value that indicates that the full-fetch-test should re-enable a ruleset before checking it again. But if this is manually added, I would rather it be different from failed ruleset test; maybe we can add the value ruleset test must recheck instead.

@jeremyn
Contributor

jeremyn commented Jul 9, 2017

@Hainish I've made the very occasional comment on deletion PRs. I don't think human review is critical for the simple cases.

@Hainish
Member

Hainish commented Jul 25, 2017

@jeremyn in that case we can probably safely close the deletion PRs and leave that to the full-fetch-test upon subsequent runs.
