This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Unnecessary overlapping targets in rulesets #12322

Closed
RReverser opened this issue Sep 1, 2017 · 9 comments

@RReverser
Contributor

RReverser commented Sep 1, 2017

#12320 helped find a list of rulesets whose targets overlap and so unnecessarily duplicate each other. Reducing the number of these might help with #12232:

  • 16163.com.xml: *.16163.com also covers www.16163.com, app.16163.com, bbs.16163.com, m.16163.com
  • 3min.xml: *.3min.de also covers www.3min.de
  • Acenet.xml: *.acenet-inc.net also covers *.esupport.acenet-inc.net
  • Advertising.com.xml: *.ace.advertising.com also covers secure.ace.advertising.com
  • African-Network-Information-Center.xml: *.afrinic.net also covers *.meeting.afrinic.net
  • Airtricity.xml: *.airtricity.com also covers www.airtricity.com
  • AliceDSL.xml: *.alice-dsl.de also covers www.alice-dsl.de
  • American-University.xml: *.american.edu also covers *.wcl.american.edu
  • American_Society_of_Media_Photographers.xml: *.asmp.org also covers www.admin.asmp.org
  • Argonne-National-Laboratory.xml: *.anl.gov also covers www.*.anl.gov
  • Argos.xml: *.argos.co.uk also covers image.email.argos.co.uk
  • Audiko.xml: *.audiko.net also covers css.cdn.audiko.net, jpg.st.audiko.net
  • Bauhaus-University_Weimar.xml: *.uni-weimar.de also covers *.webmail.uni-weimar.de
  • Bennetts.xml: *.bennetts.co.uk also covers *.quotes.bennetts.co.uk
  • BreNet.xml: *.brenet.de also covers *.webmail.brenet.de
  • Caller.com.xml: *.caller.com also covers www.caller.com, login.caller.com
  • Catalog-of-Domestic-Federal-Assistance.xml: *.cfda.gov also covers *.www.cfda.gov
  • Champs_Sports.xml: *.champssports.com also covers *.e.champssports.com, *.www.champssports.com
  • City_University_London.xml: *.city.ac.uk also covers www.soi.city.ac.uk, *.www.city.ac.uk
  • Claranet.xml: *.clara.net also covers webmail.bln.de.clara.net, nswebmail.uk.clara.net, portal.uk.clara.net
  • Cloudhexa_Network.xml: *.cloudhexa.com also covers *.www.cloudhexa.com
  • Columbia_University-problematic.xml: *.hr.columbia.edu also covers *.managers.hr.columbia.edu
  • Compendium.xml: *.compendiumblog.com also covers cdn2.content.compendiumblog.com, cdn.content.compendiumblog.com, global.content.compendiumblog.com
  • Council_on_Foreign_Relations.xml: *.cfr.org also covers secure.www.cfr.org
  • Cox_Communications.xml: *.cox.com also covers *.store.cox.com
  • Cox_Communications.xml: *.cox.net also covers idm.east.cox.net
  • Crain-Communications.xml: *.adage.com also covers www.amiga.adage.com
  • Dailymotion.xml: *.dailymotion.com also covers ak2.static.dailymotion.com
  • Digia.xml: *.digia.com also covers blog.qt.digia.com
  • Digital_Photography_Review.xml: *.img-dpreview.com also covers *.static.img-dpreview.com
  • Digitaria.xml: *.insightmgr.com also covers *.digi.insightmgr.com, *.www.insightmgr.com
  • DotCOM-host.xml: *.dotcomhost.com also covers *.webmail.dotcomhost.com
  • dpfile.com.xml: *.dpfile.com also covers qcloud.dpfile.com, www.dpfile.com
  • Eastbay.xml: *.eastbay.com also covers *.teamsales.eastbay.com, *.www.eastbay.com
  • Economic-Policy-Institute.xml: *.epi.org also covers *.secure.epi.org
  • Electronic-Arts.xml: *.thesims3.com also covers *.store.thesims3.com
  • Epson.xml: *.epson.com also covers global.latin.epson.com
  • Eyeviewads.com.xml: *.eyeviewads.com also covers track.eyeviewads.com
  • Familie-redlich.xml: *.familie-redlich.de also covers www.systeme.familie-redlich.de
  • Federal_Business_Opportunities.xml: *.fbo.gov also covers *.www.fbo.gov
  • Film-Threat.xml: *.filmthreat.com also covers *.www.filmthreat.com
  • Final-Score.xml: *.final-score.com also covers *.e.final-score.com, *.www.final-score.com
  • Footaction_USA.xml: *.footaction.com also covers *.www.footaction.com
  • Freelancer.xml: *.freelancer.com also covers *.www.freelancer.com
  • Freelancer.xml: *.freelancer.co.uk also covers *.www.freelancer.co.uk
  • FreeWheel.xml: *.fwmrm.net also covers 2912a.v.fwmrm.net
  • FutureQuest.net.xml: *.futurequest.net also covers www.service.futurequest.net
  • GaiaOnline.xml: *.gaiaonline.com also covers *.cdn.gaiaonline.com
  • Game-Show-Network.xml: *.gsn.com also covers www.tv.gsn.com
  • garmin.xml: *.garmin.com also covers www.garmin.com
  • Gigaserver.xml: *.gigaserver.cz also covers *.www.gigaserver.cz
  • Gizmodo.com.xml: *.gizmodo.com also covers 20khz.gizmodo.com, es.gizmodo.com, factually.gizmodo.com, fieldguide.gizmodo.com, homeofthefuture.gizmodo.com, indefinitelywild.gizmodo.com, io9.gizmodo.com, lego.gizmodo.com, offworld.gizmodo.com, paleofuture.gizmodo.com, reframe.gizmodo.com, space.gizmodo.com, sploid.gizmodo.com, throb.gizmodo.com, toyland.gizmodo.com, us.gizmodo.com, www.gizmodo.com, cache.gizmodo.com
  • GoogleImages.xml: google.* also covers google.com
  • GoogleImages.xml: images.google.* also covers images.google.com
  • GoogleServices_Complex.xml: *.googleusercontent.com also covers *.corp.googleusercontent.com
  • GoogleVideos.xml: google.* also covers google.com
  • GoStats.xml: *.gostats.com also covers www.ssl.gostats.com
  • Griffith-University.xml: *.griffith.edu.au also covers *.secure.griffith.edu.au
  • HitBTC.com.xml: *.hitbtc.com also covers affiliate.hitbtc.com, auth.hitbtc.com, blog.hitbtc.com, demo.hitbtc.com, forum.hitbtc.com, www.hitbtc.com
  • Home.pl.xml: *.home.pl also covers *.akcje.home.pl, *.panel.home.pl, *.poczta.home.pl, *.m.poczta.home.pl
  • Home.pl.xml: *.poczta.home.pl also covers *.m.poczta.home.pl
  • Honest.com.xml: *.honest.com also covers blog.honest.com, img.honest.com, www.honest.com
  • Hot_Pics_Amateur.xml: *.hotpics-amateur.com also covers www.collection.hotpics-amateur.com
  • ImageShack.xml: *.imageshack.us also covers www.imageshack.us, imagizer.imageshack.us, post.imageshack.us, a.imageshack.us
  • kantonalbanken.xml: *.bkb.ch also covers integration.quotes.bkb.ch
  • Kids_Foot_Locker.xml: *.kidsfootlocker.com also covers *.www.kidsfootlocker.com
  • KIXEYE.xml: *.kixeye.com also covers *.cdn.kixeye.com
  • Lady_Foot_Locker.xml: *.ladyfootlocker.com also covers *.www.ladyfootlocker.com
  • LastPass.com.xml: *.lastpass.com also covers www.lastpass.com, 0.lastpass.com, account.lastpass.com, accounts.lastpass.com, blog.lastpass.com, download.lastpass.com, enterprise.lastpass.com, forums.lastpass.com, helpdesk.lastpass.com, localvault.lastpass.com, m.lastpass.com, manda.lastpass.com, pollserver.lastpass.com, portable.lastpass.com, rodan.lastpass.com, service.lastpass.com, teams.lastpass.com, uber.lastpass.com, vaul.lastpass.com
  • Linux-New-Media.xml: *.linuxnewmedia.com also covers *.shop.linuxnewmedia.com
  • Linux-New-Media.xml: *.linuxnewmedia.de also covers *.shop.linuxnewmedia.de
  • Liquid-Web.xml: *.liquidweb.com also covers media.cdn.liquidweb.com
  • London-2012.xml: *.london2012.com also covers www.festival.london2012.com, tickets.london2012.com, www.tickets.london2012.com
  • Loopia.xml: *.loopia.se also covers *.www.loopia.se
  • Lumosity.xml: *.lumosity.com also covers static.sl.lumosity.com
  • Mail.ru.xml: *.foto.mail.ru also covers avt.foto.mail.ru, content.foto.mail.ru
  • Mail.ru.xml: *.my.mail.ru also covers content.foto.my.mail.ru, stat.my.mail.ru, videoapi.my.mail.ru
  • Maricopa-Community-Colleges.xml: *.maricopa.edu also covers *.sis.maricopa.edu
  • Maxymiser.xml: *.maxymiser.com also covers *.www.maxymiser.com
  • MediaFire.com.xml: *.mediafire.com also covers staticcdn.mediafire.com, www.mediafire.com, www1.mediafire.com, www2.mediafire.com, cdn.mediafire.com, cdnssl.mediafire.com, m.mediafire.com
  • MeiTuan.com.xml: *.meituan.com also covers www.meituan.com, analytics.meituan.com, b.meituan.com, daili.meituan.com, hotel.meituan.com, mos.meituan.com, p0.meituan.com, p1.meituan.com, passport.meituan.com, report.meituan.com, s0.meituan.com, s1.meituan.com, waimaie.meituan.com, i.meituan.com, waimai.meituan.com, kaidian.waimai.meituan.com
  • Mentor-Graphics.xml: *.mentor.com also covers *.store1.mentor.com
  • MIC-Gadget.xml: *.micgadget.com also covers *.store.micgadget.com
  • MSN-mismatches.xml: *.msnbc.msn.com also covers *.today.msnbc.msn.com
  • mytalkdesk.com.xml: *.mytalkdesk.com also covers www.mytalkdesk.com
  • National_Park_Service.xml: *.nature.nps.gov also covers science.nature.nps.gov
  • NAVTEQ.xml: *.navteq.com also covers css.mapreporter.navteq.com
  • NetMediaEurope.xml: *.itespresso.fr also covers quiz.itespresso.fr
  • Oberlin_College.xml: *.oberlin.edu also covers *.cs.oberlin.edu, oncampus.csr.oberlin.edu
  • ONEsite.xml: *.onesite.com also covers *.admin.onesite.com
  • OnSugar.xml: *.onsugar.com also covers secure.*.onsugar.com, www.*.onsugar.com
  • Pair-Networks.xml: *.pair.com also covers *.webmail.pair.com
  • Polytechnic-University-of-Catalonia.xml: *.upc.edu also covers *.blog.upc.edu
  • Poppy-Sports.xml: *.poppysports.com also covers *.www.poppysports.com
  • Prxy.com.xml: *.prxy.com also covers www.prxy.com
  • Radboud-University-Nijmegen.xml: *.ru.nl also covers *.cmbi.ru.nl, *.hosting.ru.nl, *.portalhelp.hosting.ru.nl
  • Radboud-University-Nijmegen.xml: *.hosting.ru.nl also covers *.portalhelp.hosting.ru.nl
  • Royal_Mail.xml: *.royalmail.com also covers *.shop.royalmail.com
  • RTEMS.xml: *.rtems.org also covers devel.rtems.org, docs.rtems.org, git.rtems.org, lists.rtems.org, wiki.rtems.org, www.rtems.org
  • Secunet.xml: *.secunet.com also covers www.secunet.com
  • SexNarod.xml: *.superforum.org also covers *.dating.superforum.org
  • SexNarod.xml: *.sxnarod.com also covers wap.dating.sxnarod.com
  • Sheet-Music-Plus.xml: *.sheetmusicplus.com also covers ssl.assets.sheetmusicplus.com
  • Skrill.com.xml: *.skrill.com also covers account.skrill.com, help.skrill.com, sso.skrill.com, www.skrill.com
  • Snagajob.xml: *.snagajob.com also covers www.snagajob.com
  • Spoki.xml: *.spoki.lv also covers *.www.spoki.lv
  • StumbleUpon.xml: *.stumbleupon.com also covers *.b9.stumbleupon.com
  • Symantec.xml: *.symanteccloud.com also covers buy.symanteccloud.com, static.symanteccloud.com, static1.symanteccloud.com, static2.symanteccloud.com, static3.symanteccloud.com
  • Target_Performance.xml: *.ad-srv.net also covers *.ad.ad-srv.net
  • Telefonica.xml: *.o2.cz also covers *.www.o2.cz
  • Textbooks.com.xml: *.textbooks.com also covers *.www.textbooks.com
  • The-Escapist-Expo.xml: *.escapistexpo.com also covers www.sec.escapistexpo.com
  • UBS.xml: *.ubs.com also covers *.ibb.ubs.com
  • UCSD.edu.xml: *.ucsd.edu also covers a4.ucsd.edu, acs-webmail.ucsd.edu, altng.ucsd.edu, aventeur.ucsd.edu, cinfo.ucsd.edu, facilities.ucsd.edu, gradapply.ucsd.edu, graduateapp.ucsd.edu, jacobsstudent.ucsd.edu, myucsdchart.ucsd.edu, sdacs.ucsd.edu, shs.ucsd.edu, ted.ucsd.edu, ucsdbkst.ucsd.edu, a.ucsd.edu, acms.ucsd.edu, bookstore.ucsd.edu, www.bookstore.ucsd.edu, cs.ucsd.edu, www.cs.ucsd.edu, cse.ucsd.edu, www.cse.ucsd.edu, ece.ucsd.edu, www.ece.ucsd.edu, hdh.ucsd.edu, www.hdh.ucsd.edu, hds.ucsd.edu, www.hds.ucsd.edu, maeweb.ucsd.edu, nanoengineering.ucsd.edu, www.nanoengineering.ucsd.edu, ne-web.ucsd.edu, ne.ucsd.edu, neweb.ucsd.edu, roger.ucsd.edu, se.ucsd.edu, structures.ucsd.edu, www.structures.ucsd.edu, uxt.ucsd.edu, www-cs.ucsd.edu, www-cse.ucsd.edu, www-ne.ucsd.edu, www-structures.ucsd.edu, act.ucsd.edu, health.ucsd.edu, libraries.ucsd.edu, studenthealth.ucsd.edu, www-act.ucsd.edu, accesslink.ucsd.edu, acs.ucsd.edu, cri.ucsd.edu, desktop.ucsd.edu, financiallink.ucsd.edu, iwdc.ucsd.edu, marketplace.ucsd.edu, mytritonlink.ucsd.edu, www.mytritonlink.ucsd.edu, resnet.ucsd.edu, software.ucsd.edu, sysstaff.ucsd.edu, tritonlink.ucsd.edu, www.tritonlink.ucsd.edu, uclearning.ucsd.edu, webmail.ucsd.edu, www-acs.ucsd.edu
  • United-States-Department-of-Energy.xml: *.doe.gov also covers www.*.doe.gov
  • UniversalSubtitles.xml: *.universalsubtitles.org also covers s3.www.universalsubtitles.org
  • University-of-Alaska.xml: *.alaska.edu also covers www.*.alaska.edu, biotech.inbre.alaska.edu, lib.uaa.alaska.edu, *.vpn.alaska.edu
  • University-of-Bern.xml: *.unibe.ch also covers www.*.unibe.ch
  • University-of-Delaware.xml: *.udel.edu also covers *.facilities.udel.edu, *.nss.udel.edu
  • University-of-Groningen.xml: *.rug.nl also covers www.astro.rug.nl
  • University-of-Idaho.xml: *.uidaho.edu also covers www.*.uidaho.edu, www2.sites.uidaho.edu
  • University-of-Massachusetts-Amherst.xml: *.umass.edu also covers *.oit.umass.edu, *.spire.umass.edu, *.umii.umass.edu
  • University-of-South-Florida.xml: *.usf.edu also covers *.stpete.usf.edu
  • University-of-Southampton.xml: *.soton.ac.uk also covers www.*.soton.ac.uk
  • University_of_Houston.xml: *.uh.edu also covers www.*.uh.edu, fp.my.uh.edu, *.nsm.uh.edu
  • University_of_Maine.xml: *.umaine.edu also covers www.*.umaine.edu
  • University_of_Salford.xml: *.salford.ac.uk also covers *.www.salford.ac.uk
  • University_of_Waikato.xml: *.waikato.ac.nz also covers tools.its.waikato.ac.nz, www.mngt.waikato.ac.nz
  • University_of_Wisconsin-Madison.xml: *.wisc.edu also covers *.library.wisc.edu
  • uptodown.com.xml: *.uptodown.com also covers www.uptodown.com, api.uptodown.com, blog.uptodown.com, dw.uptodown.com, feeds.uptodown.com, gstatic.uptodown.com, img.uptodown.com, stat.uptodown.com, stc.uptodown.com
  • US-Dept-of-Veterans-Affairs.xml: *.vaforvets.va.gov also covers www.*.vaforvets.va.gov
  • US-Dept-of-Veterans-Affairs.xml: *.vba.va.gov also covers www.*.vba.va.gov
  • US_State_Department.xml: *.history.state.gov also covers www.history.state.gov
  • UserEcho.com.xml: *.userecho.com also covers www.userecho.com, blog.userecho.com, feedback.userecho.com
  • Vdopia.xml: *.vdopia.com also covers mobile.sb.vdopia.com
  • Wikidot.xml: *.wdfiles.com also covers 1.*.wdfiles.com, 2.*.wdfiles.com, 3.*.wdfiles.com, 4.*.wdfiles.com, 5.*.wdfiles.com, 6.*.wdfiles.com, 7.*.wdfiles.com, 8.*.wdfiles.com, 9.*.wdfiles.com
  • Wikinvest.xml: *.wikinvest.com also covers *.www.wikinvest.com
  • Wiley.xml: *.wiley.com also covers onlinelibrarystatic.wiley.com, sp.onlinelibrary.wiley.com
  • Wolfram_Alpha.xml: *.wolframalpha.com also covers api.wolframalpha.com, api-cn.wolframalpha.com, api-maps.wolframalpha.com, api-tw.wolframalpha.com, developer.wolframalpha.com, m.wolframalpha.com, preview.wolframalpha.com, products.wolframalpha.com, volunteer.wolframalpha.com, wc.wolframalpha.com, www1.wolframalpha.com, www3.wolframalpha.com, www4b.wolframalpha.com, www4c.wolframalpha.com, www4d.wolframalpha.com, www4f.wolframalpha.com, www5a.wolframalpha.com, www5b.wolframalpha.com
  • Woot.xml: *.woot.com also covers images.deals.woot.com, gzip.static.woot.com
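A list like the one above could be produced with a check along these lines. This is a minimal sketch, not the script used in #12320: it assumes only left wildcards (`*.example.com`) are of interest and treats a target as covered when it ends with the wildcard's dotted suffix. The function name and the sample target list (reassembled from the Home.pl.xml entries above, not read from the actual file) are illustrative.

```python
def find_left_wildcard_overlaps(targets):
    """Map each left-wildcard target to the other targets it already covers.

    Assumption: "*.example.com" covers any target whose text ends with
    ".example.com", at any depth. Right wildcards are ignored here.
    """
    overlaps = {}
    for wildcard in targets:
        if not wildcard.startswith("*."):
            continue
        suffix = wildcard[1:]  # "*.example.com" -> ".example.com"
        covered = [t for t in targets if t != wildcard and t.endswith(suffix)]
        if covered:
            overlaps[wildcard] = covered
    return overlaps


# Illustrative input built from the Home.pl.xml lines in the list above.
home_pl_targets = [
    "*.home.pl",
    "*.akcje.home.pl",
    "*.panel.home.pl",
    "*.poczta.home.pl",
    "*.m.poczta.home.pl",
]

for wildcard, covered in find_left_wildcard_overlaps(home_pl_targets).items():
    print(f"{wildcard} also covers {', '.join(covered)}")
```

Because the check is a plain suffix comparison, it also flags middle-wildcard entries such as `www.*.anl.gov` under `*.anl.gov`, and it never reports the bare domain (`home.pl`) as covered by `*.home.pl`.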

cc @Hainish @cschanaj @koops76 @Bisaloo

@RReverser
Contributor Author

Updated to group the output; it's less noisy this way.

@RReverser
Contributor Author

Not sure whether I should attempt to remove these automatically, or whether going through them manually one by one would be easier...

@Bisaloo
Collaborator

Bisaloo commented Sep 1, 2017

For some of them, conceptually, it kind of makes sense to be that way. For example, UserEcho.com.xml lists "actual" subdomains first and then the wildcard is used to cover customer subdomains.

What I like about it is that the day custom subdomain support is removed, we are just one line away from fixing this ruleset; we don't have to find and check every "regular" subdomain again.

A similar alternative would be:

	<target host="userecho.com"/>
	<target host="*.userecho.com"/>

		<!-- All regular subdomains -->
		<test url="http://www.userecho.com/"/>
		<test url="http://blog.userecho.com/"/>
		<test url="http://feedback.userecho.com/"/>

		<!-- Customer subdomains examples -->
		<test url="http://imgur.userecho.com/"/>
		<test url="http://unchecky.userecho.com/"/>

But I have no idea how big an impact these overlapping targets have on performance.

@Bisaloo
Collaborator

Bisaloo commented Sep 1, 2017

That's interesting, looking at cdc90c4, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.
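To illustrate the point about left wildcards reaching past first-level subdomains, here is a small sketch that enumerates every left-wildcard pattern able to match a given host. It assumes the suffix semantics discussed in this thread; the function names and the host `mail.esupport.acenet-inc.net` are hypothetical, and this is not the extension's lookup code.

```python
def left_wildcard_patterns(host):
    """List every "*.suffix" pattern that would match host, deepest first."""
    labels = host.split(".")
    return ["*." + ".".join(labels[i:]) for i in range(1, len(labels))]


def matching_targets(host, targets):
    """Return the targets (exact or left-wildcard) that apply to host."""
    candidates = [host] + left_wildcard_patterns(host)
    return [c for c in candidates if c in targets]


# Targets taken from the Acenet.xml entry in the list; the host is made up.
targets = {"*.acenet-inc.net", "*.esupport.acenet-inc.net"}
print(matching_targets("mail.esupport.acenet-inc.net", targets))
```

Both targets match the same host here, which is why the deeper `*.esupport.acenet-inc.net` entry adds nothing once `*.acenet-inc.net` is present.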

@RReverser
Contributor Author

Well, I noticed them as part of a hunt for wildcard-in-the-middle regexps, and thought it made sense to remove the overlaps completely, as they're somewhat confusing.

On the other hand, I see your point too, even though 1) *.somehost targets are better for both CPU and memory performance than long lists of precise subdomains (especially with the reverse-trie approach) and 2) they're future-proof for cases when a domain gets new subdomains that also need the http: -> https: rewrite. So I'd rather keep these than long lists of specific targets.

But I know not everyone on the HTTPS Everywhere team agrees with this 😄

Anyway, in cases where they overlap, I think using <test url /> makes more sense if it's just for documentation purposes. By the time custom subdomains are gone, the ruleset will likely need its list of targets updated anyway, as new subdomains might appear, the host might get HSTS, etc.
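For readers unfamiliar with the reverse-trie idea mentioned above, the following is a rough sketch of one way such a structure could work. It is an assumption-laden illustration, not the data structure HTTPS Everywhere uses: the `HostTrie` class, its methods, and the end-marker convention are all hypothetical, and only exact hosts and left wildcards are handled.

```python
class HostTrie:
    """Trie keyed by host labels in reverse order (com -> userecho -> www)."""

    def __init__(self):
        self.root = {}

    def add(self, target):
        node = self.root
        for label in reversed(target.split(".")):
            node = node.setdefault(label, {})
        node[""] = target  # end marker; "" can never be a real label

    def lookup(self, host):
        """Return every stored target that matches host."""
        matches = []
        node = self.root
        labels = host.split(".")
        for depth, label in enumerate(reversed(labels)):
            node = node.get(label)
            if node is None:
                break
            remaining = len(labels) - depth - 1
            # A "*" child matches if at least one more label follows.
            if remaining > 0 and "*" in node and "" in node["*"]:
                matches.append(node["*"][""])
            # An exact target matches only when all labels are consumed.
            if remaining == 0 and "" in node:
                matches.append(node[""])
        return matches


trie = HostTrie()
trie.add("*.userecho.com")
trie.add("www.userecho.com")
print(trie.lookup("www.userecho.com"))    # wildcard and exact target both match
print(trie.lookup("imgur.userecho.com"))  # only the wildcard matches
```

In this sketch the wildcard occupies a single node regardless of how many subdomains exist, while each explicitly listed subdomain adds a node of its own, which is the memory argument made in point 1.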

@RReverser
Contributor Author

RReverser commented Sep 1, 2017

> That's interesting, looking at cdc90c4, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.

Yup, I got the same impression.

@cschanaj
Collaborator

cschanaj commented Sep 2, 2017

> But I have no idea how big an impact this overlapping targets have on performance.

AFAIK, the lookup performance will improve, as the complicated operations in rules.js#L347-L354 can be avoided.

> That's interesting, looking at cdc90c4, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.

I did not know this until recently, when I created #11998 while looking at rules.js. Maybe we should document this behavior.

@RReverser
Contributor Author

It is documented at https://www.eff.org/https-everywhere/rulesets, like some other details, but yes, it looks like that information is out of sync with the contributing guidelines in the repo.

@RReverser
Contributor Author

The automated fix was merged, so closing this as resolved.
