Wappalyzer - Cookie Compliance results observations #2292

Closed
rockeynebhwani opened this issue Aug 4, 2021 · 12 comments
Labels: bug (Something isn't working), question (Further information is requested)

@rockeynebhwani
Contributor

While browsing some sites using the Wappalyzer Chrome extension, I saw multiple cookie consent solutions being reported for a single site. So I ran this query on the July 2021 mobile table:

  SELECT
    url,
    COUNT(*) AS count
  FROM
    `httparchive.technologies.2021_07_01_mobile`
  WHERE
    category = 'Cookie compliance'
  GROUP BY
    url
  HAVING
    count > 1
  ORDER BY
    count DESC

This found 15,109 sites where more than one cookie consent solution is reported, with a maximum of 4 such solutions on a single site. The output of the query is attached below:

More than 1 Cookie consent solutions.xlsx
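
As a follow-up to the query above, it may help to see which solutions are reported together, since a pair that co-occurs very often hints at an over-broad detection rule. The following is only a minimal sketch: the rows are hypothetical sample data shaped like `(url, category, app)`, not real HTTP Archive results, and `cmp_pairs` is an illustrative helper name.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical rows shaped like (url, category, app); not real HTTP Archive data.
rows = [
    ("https://a.example/", "Cookie compliance", "Quantcast Choice"),
    ("https://a.example/", "Cookie compliance", "HubSpot Cookie"),
    ("https://a.example/", "Analytics", "HubSpot Analytics"),
    ("https://b.example/", "Cookie compliance", "Quantcast Choice"),
    ("https://c.example/", "Cookie compliance", "Quantcast Choice"),
    ("https://c.example/", "Cookie compliance", "HubSpot Cookie"),
    ("https://c.example/", "Cookie compliance", "Borlabs Cookie"),
]


def cmp_pairs(rows):
    """Count how often each pair of cookie compliance apps shares a URL."""
    apps_by_url = defaultdict(set)
    for url, category, app in rows:
        if category == "Cookie compliance":
            apps_by_url[url].add(app)

    pairs = Counter()
    for apps in apps_by_url.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(apps), 2):
            pairs[pair] += 1
    return pairs


pair_counts = cmp_pairs(rows)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Sites with a single solution contribute no pairs, so only the multi-solution sites from the query would show up in this breakdown.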

It would be unusual for a site to have more than one cookie consent management solution. This suggests two possibilities:

  1. Detection in Wappalyzer is incorrect

OR

  2. Sites are including multiple consent management solutions by mistake, which may be hurting their performance metrics. I have seen this happen on Shopify sites, where uninstalling an app doesn't delete its code, and the residual code/scripts left behind hurt performance. In any case, if that's what we are seeing in the wild, it would be good to call this out.

At this stage, I don't know the exact reason behind this and need more time to look into this... but this is what I found so far.

Let's take an example of a site which is reporting 4 CMPs - http://www.formadoresit.es/

image

Looking at this, I think the detection for 'Quantcast Choice' may be incorrect (https://github.com/AliasIO/wappalyzer/pull/4115/commits/0c761309c1b0ae8f124b8c5fc9973a98523bb6e4), as it's just looking for __tcfapi. I had reported the original issue that this commit addressed, but I didn't realise that the presence of '__tcfapi' had been added to the detection.

I am raising this issue in case others have observations or knowledge of these CMPs. If I notice anything else, I will comment.

Tagging @max-ostapenko and @ydimova as they plan to cover CMPs in the Privacy chapter.

@rockeynebhwani rockeynebhwani added bug Something isn't working writing Related to wording and content labels Aug 4, 2021
@rockeynebhwani rockeynebhwani changed the title Wappalyzer - Cookie Compliance results Questions Wappalyzer - Cookie Compliance results observations Aug 4, 2021
@rockeynebhwani rockeynebhwani added question Further information is requested and removed writing Related to wording and content labels Aug 4, 2021
@max-ostapenko
Contributor

@rockeynebhwani Thanks for raising it.

Indeed I see the following issues:

  1. Quantcast Choice: __tcfapi is not specific to this technology
  2. HubSpot Consent Banner: _hsp and the script don't indicate that the consent banner is activated; they are always loaded with the HubSpot tracking library. Could you try something like document.getElementById('hs-eu-cookie-confirmation') instead?
  3. An obvious UX issue with NextRoll: it loads its own consent banner on a page in parallel with other integrated CMP solutions. I'm not sure whether it can be integrated into existing CMP policies.

Let me know if you need assistance fixing 1 and 2.
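
To illustrate point 2, here is a rough sketch of the difference between "the HubSpot script is loaded" and "the banner element is present". It assumes the check runs against a serialized copy of the rendered DOM (a banner injected by JavaScript would not appear in the raw HTML response), and `BannerFinder` and `has_hubspot_banner` are hypothetical names, not part of Wappalyzer.

```python
from html.parser import HTMLParser


class BannerFinder(HTMLParser):
    """Records whether any element with the given id attribute is present."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.element_id:
            self.found = True


def has_hubspot_banner(html):
    """True only if the banner element itself exists in the markup."""
    finder = BannerFinder("hs-eu-cookie-confirmation")
    finder.feed(html)
    return finder.found


# Hypothetical pages: one only loads the tracking library, one shows the banner.
library_only = "<html><body><script>window._hsp = [];</script></body></html>"
with_banner = (
    "<html><body><script>window._hsp = [];</script>"
    '<div id="hs-eu-cookie-confirmation">We use cookies</div></body></html>'
)

print(has_hubspot_banner(library_only))
print(has_hubspot_banner(with_banner))
```

Both sample pages define `_hsp`, so a rule keyed on that variable would match both, while the element check separates them.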

We will consider these in our analysis in #2149.

@rockeynebhwani
Contributor Author

Thanks @max-ostapenko

1 - I agree, and I have raised an issue to fix this - https://github.com/AliasIO/wappalyzer/issues/4301
2 - Thanks for the insights on HubSpot, but I am in two minds here. Clearly the JS being loaded is for cookie consent management, and it's not ideal for HubSpot to load this script when it's not in use... This insight actually helps us identify issues like this, and maybe we can highlight it as part of our analysis. What do you say?

@rockeynebhwani
Contributor Author

@max-ostapenko

When I query for 'HubSpot Analytics', I get 72,813 results -

  SELECT
    COUNT(*) AS count
  FROM
    `httparchive.technologies.2021_07_01_mobile`
  WHERE
    category = 'Analytics'
    AND app = 'HubSpot Analytics'

But querying for 'HubSpot Cookie', I get 15,246 results -

  SELECT
    COUNT(*) AS count
  FROM
    `httparchive.technologies.2021_07_01_mobile`
  WHERE
    category = 'Cookie compliance'
    AND app = 'HubSpot Cookie'

This makes me think that HubSpot by default doesn't include the 'consent' script. I am not familiar with HubSpot but is it a possibility that site owners are leaving some setting in HubSpot ON by mistake which results in inclusion of cookie consent script without site owners realising this? Just a thought..
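
For scale, the two counts above can be turned into a rough share. This is a back-of-the-envelope sketch only: it assumes each row corresponds to one site and that sites reporting 'HubSpot Cookie' are a subset of those reporting 'HubSpot Analytics', neither of which the queries verify.

```python
# Row counts quoted above for the 2021_07_01 mobile table.
hubspot_analytics = 72_813
hubspot_cookie = 15_246

# Share of HubSpot Analytics detections that also report HubSpot Cookie,
# under the subset assumption stated in the lead-in.
share = hubspot_cookie / hubspot_analytics
print(f"{share:.1%}")  # prints 20.9%
```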

@max-ostapenko
Contributor

@rockeynebhwani I still see the library is included by default and provides some core consent management API.
If we look into the websites that don't list 'HubSpot Cookie' technology we will still have the _hsp variable and banner script loaded:

For the needs of the privacy chapter, I would be interested in what consent management is actually used on the website, not what functionality is loaded and the JS performance insights.

I also see that even when the policy banner is shown to visitors it's not identified correctly:

Please comment on the adjustments proposed: https://github.com/AliasIO/wappalyzer/pull/4335

@rockeynebhwani
Contributor Author

@max-ostapenko - I am seeing 'HubSpot Cookie banner' being identified by Wappalyzer for following sites -

https://blog.ardesia.it/
https://holitrees.lt/
https://www.infoguard.ch/

Sometimes it just takes time for the Chrome extension to pick this up, depending on how the script is loaded. I typically wait a while before clicking on the extension.

@max-ostapenko
Contributor

I didn't know there is a Wappalyzer extension, thanks.

So hopefully the state of the technology has simply changed on all those websites since July, which would explain the difference between the extension and the dataset.

P.S. Have you seen a case where the Wappalyzer extension shows HubSpot Analytics but doesn't show HubSpot Cookie?

@rviscomi
Member

rviscomi commented Oct 1, 2021

Thanks for raising this issue @rockeynebhwani. Is there any remaining work on our end or is it ok to close it?

@rviscomi rviscomi added this to the 2021 Backlog milestone Oct 1, 2021
@rockeynebhwani
Contributor Author

@rviscomi - I think @max-ostapenko should query the Sep-21 tables and check if he is still seeing anything odd. If not, we can close this.

@rockeynebhwani
Contributor Author

@max-ostapenko - I ran my original query against Sep-21 tables and now I see 5,374 origins with more than one cookie compliance solution compared to 15,109 origins against Jul-21 tables

@max-ostapenko
Contributor

max-ostapenko commented Oct 5, 2021

@rockeynebhwani I've checked the apps in the Cookie Compliance category:

Most apps have the expected statistics:

  • Hubspot Cookie (now HubSpot Cookie Policy Banner): usage declined by ~3K websites since July. The definition is still up to date.
  • Quantcast Choice declined due to the fix we discussed above (88K to 25K)
  • AdRoll CMP System grew from 6K to 27K
  • Usercentrics: increased 2K to 14K

The only issue:

  • Borlabs Cookie had 12K websites in Jul, but I don't see anything in Sept, though extension is able to identify it successfully.
    SELECT * FROM `httparchive.technologies.2021_09_01_desktop` WHERE app='Borlabs Cookie'
    I'd like to push the fix if there is some bug, but I don't understand the cause...
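
A check like the one above can be sketched as a comparison of per-app counts between two crawls, flagging apps that were common in one month and absent in the next. The figures below are the approximate counts quoted in this comment; the `vanished` helper and its `min_before` threshold are illustrative assumptions.

```python
# Approximate per-app website counts quoted in this comment (July vs. September 2021).
july = {
    "Quantcast Choice": 88_000,
    "AdRoll CMP System": 6_000,
    "Usercentrics": 2_000,
    "Borlabs Cookie": 12_000,
}
september = {
    "Quantcast Choice": 25_000,
    "AdRoll CMP System": 27_000,
    "Usercentrics": 14_000,
    # Borlabs Cookie returned no rows in September.
}


def vanished(before, after, min_before=1_000):
    """Apps with at least min_before sites earlier and none later."""
    return sorted(
        app
        for app, count in before.items()
        if count >= min_before and after.get(app, 0) == 0
    )


print(vanished(july, september))
```

A drop to zero for a previously common app is more likely a detection or pipeline regression than a real change in usage, which is why it is treated separately from ordinary growth or decline.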

@rviscomi could this be an issue on the parsing or BigQuery side?

@rviscomi
Member

rviscomi commented Oct 5, 2021

Doesn't seem like a BigQuery issue. For example https://www.apk-ag.de/ has this app in August but not in September. Loading the page manually with the extension shows the app as expected. The rest of its detections seem to be working fine.

Seems like an upstream detection bug with WebPageTest. In this test of the page I'm seeing similar results as the September data, with Borlabs Cookie omitted.

@pmeenan could you investigate if anything has changed between Wappalyzer and WPT detections?

@max-ostapenko
Contributor

Borlabs Cookie data is available starting Nov 2022.

Other issues discussed above also seem fine throughout past months.
@rockeynebhwani @pmeenan Can we close this?
