[scanner integration] Support landing page redirects on the same domain #494

Closed
eloquence opened this issue May 25, 2018 · 4 comments

@eloquence
Member

eloquence commented May 25, 2018

Some news organizations prefer to advertise URLs like https://nytimes.com/tips, which they'd also like us to use for their directory entries, and which redirect to the ultimate landing page destination. To avoid false scan results, the scanner should:

  • follow 301/302 redirects on the same domain
  • store the redirect destination in the scan result
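
A minimal sketch of the two requirements above, under stated assumptions: `follow_redirects` and the `fetch` callable are hypothetical names, not part of the existing scanner. `fetch` stands in for whatever HTTP client is used and returns a `(status, location)` pair, which keeps the example runnable without network access. The same-domain check is left out here and would be applied to the returned chain.

```python
from urllib.parse import urljoin

# Only the statuses named in this issue are followed (assumption: 303/307/308
# are out of scope for this sketch).
REDIRECT_STATUSES = {301, 302}


def follow_redirects(url, fetch, max_hops=10):
    """Return the list of URLs visited, starting with `url`.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location),
    where `location` is the Location header value or None.
    """
    chain = [url]
    # The hop limit also guards against redirect loops.
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_STATUSES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


# Usage with canned responses (example.com is a placeholder, not a real entry).
canned = {
    "https://example.com/tips": (301, "/securedrop"),
    "https://example.com/securedrop": (200, None),
}
chain = follow_redirects("https://example.com/tips", lambda u: canned[u])
redirect_destination = chain[-1]  # the value to store in the scan result
```

Returning the whole chain rather than only the final URL keeps the intermediate hops available for later checks.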

This is part of epic #488, since non-200 statuses may otherwise trigger the delisting of a SecureDrop instance (#493).

@harrislapiroff
Contributor

What's the desired behavior if it redirects to a different domain? Should we count redirects to subdomains as the same domain?

@eloquence
Member Author

> What's the desired behavior if it redirects to a different domain?

IMO this should be a hard failure, i.e. in the same category as, say, the site not being on HTTPS. We should not include such redirects in the directory.

> Should we count redirects to subdomains as the same domain?

I think so. Of course we would penalize the use of subdomains as usual (#497).
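
To make the rule concrete, here is a rough sketch of a check that treats subdomains as the same domain and anything else as a hard failure. `is_same_domain` is a hypothetical helper, not existing scanner code. It assumes that "same domain" means one hostname equals the other or is a subdomain of it; sibling subdomains (say, `a.example.com` and `b.example.com`) would be rejected by this version, and accepting them would need a Public Suffix List lookup.

```python
from urllib.parse import urlsplit


def is_same_domain(landing_url, target_url):
    """True if the two hosts are equal or one is a subdomain of the other."""
    # urlsplit().hostname is already lowercased; it is None if no host is present.
    landing = urlsplit(landing_url).hostname or ""
    target = urlsplit(target_url).hostname or ""
    if not landing or not target:
        return False
    return (
        target == landing
        or target.endswith("." + landing)  # redirect to a subdomain
        or landing.endswith("." + target)  # redirect to the parent domain
    )


# Placeholder URLs for illustration only.
to_subdomain = is_same_domain("https://example.com/tips",
                              "https://www.example.com/securedrop")
to_other_site = is_same_domain("https://example.com/tips",
                               "https://example.org/securedrop")
```

Matching on a leading dot is what keeps a lookalike host such as `notexample.com` from passing as `example.com`.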

@chigby
Contributor

chigby commented Sep 18, 2018

I'm thinking that a good solution for this is:

  1. The landing_page_url field remains as it is now. If the NYT wants to set this to https://nytimes.com/tips then that is fine.

  2. We add a field to the Scan Result called, say, redirection_target.

  3. When scanning a directory entry, we will follow redirects and save the URL from the final destination as the redirection target.

  4. Use pshtt to scan the domain for the redirection target.

  5. Replace the http_no_redirects scan result field with something like no_cross_domain_redirects, which would be set to False if any of the intermediate redirects is not on the same top-level domain as the landing_page_url. This would also trigger a delisting.
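
Steps 2, 3 and 5 could be sketched roughly as follows. This is an illustration under assumptions, not the actual model code: the `ScanResult` dataclass is a stand-in for the real scan result, `summarize_redirects` and `_related` are hypothetical names, and "same domain" is approximated as "equal to, or a subdomain/parent of, the landing page host". The pshtt scan in step 4 is left out.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class ScanResult:
    # Stand-in for the real scan result; field names follow the proposal above.
    landing_page_url: str
    redirection_target: str
    no_cross_domain_redirects: bool


def _related(host_a, host_b):
    """Hosts are equal, or one is a subdomain of the other."""
    return (
        host_a == host_b
        or host_a.endswith("." + host_b)
        or host_b.endswith("." + host_a)
    )


def summarize_redirects(landing_page_url, chain):
    """`chain` is the ordered list of URLs visited, starting at the landing page."""
    landing_host = urlsplit(landing_page_url).hostname or ""
    hosts = [urlsplit(url).hostname or "" for url in chain]
    # Every hop, including intermediate ones, must stay on the landing domain.
    same_domain = all(h and _related(landing_host, h) for h in hosts)
    target = chain[-1] if chain else landing_page_url
    return ScanResult(landing_page_url, target, same_domain)


# Placeholder chain with an off-domain intermediate hop.
result = summarize_redirects(
    "https://example.com/tips",
    [
        "https://example.com/tips",
        "https://tracker.example.net/r",
        "https://www.example.com/securedrop",
    ],
)
```

In this example the final destination is back on the landing domain, but `no_cross_domain_redirects` still comes out `False` because of the intermediate hop, which matches the "any of the intermediate redirects" wording in step 5.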

Can anyone think of something I might be missing here?

@chigby chigby self-assigned this Sep 18, 2018
@chigby
Contributor

chigby commented Oct 15, 2018

This issue is resolved by #549.

@chigby chigby closed this as completed Oct 15, 2018