Alexa sunset. Priority rethinking. #3656
Comments
It would be worth adding this to 2022H1. |
One thing to add: since we'll need to rethink this anyway, I wonder if it makes sense to change priority importance to priority importance per country (it could be a different label too). This could help us prioritize diagnosis. Two examples: |
Yes, it's a good opportunity to revise and improve the script here, which was trying to play with locales too. webcompat.com/tools/topsites.py Lines 131 to 143 in 408d803
|
https://tranco-list.eu/ can be used - and it's free and has a nice Python API (https://pypi.org/project/tranco/) - there's also an HTTP API if that's preferred.
I guess it will lose Alexa data eventually, but the data will still provide some signal. |
Thanks for the suggestion, Mike :) I've looked at 2 Tranco lists (with and without Alexa) and I'm going to document my findings here. There are two things I've noticed so far that are worth considering when making the switch to Tranco. I'm also keeping in mind this rule in #1533 (comment) from @MDTsai :

1) Ranking per country

The countries that we fetch rankings for are: 'US', 'FR', 'IN', 'DE', 'TW', 'ID', 'HK', 'SG', 'PL', 'GB', 'RU'. It's worth mentioning that there is a checkbox "Only include domains included in the Chrome User Experience Report of February 2022" on https://tranco-list.eu/configure, which allows filtering by country. It doesn't seem to be accurate though: it's weighted towards global sites rather than local ones.

2) For some sites, the ranking is lower than the current Alexa ranking and the "perceived" ranking

For certain sites, the ranking is lower on the global level (especially in the list without Alexa). This matters once a site falls out of the top 100 / top 1000. In this screenshot, in the Tranco list without Alexa, all the sites that are considered to be in the Alexa top 100 are ranked lower. In the Tranco list with Alexa, some are still in the top 100 and etsy.com is out.

Also, I just saw a message on Tranco's website (so my observations may not be relevant soon 🙂 ):
|
An update here: I've looked at https://pypi.org/project/tranco/ and it downloads a csv with 1 million domains, which seems a bit too much for our needs, as we only require 10000 at most. So I wrote a script that queries Tranco's API to get a recent list by date (for example https://tranco-list.eu/api/lists/date/2022-03-26) and then downloads a csv with the top 10k (https://tranco-list.eu/download/GZ6NK/10000). As for storing it, I'm thinking of creating 2 tables: one would contain the data from this csv (domain and ranking) and the second one would contain the top domains per country from Alexa (domain, ranking, country). There is probably no need to store the priority, as it can be determined in the code, since it primarily depends on the rank. So the ranking would be determined as follows (we will have to join the two tables to get the rank):
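The fetching step described above (look up a list by date, then download its top 10k as a csv) could look roughly like the sketch below. This is not the actual script: the function names are made up for illustration, the `list_id` field in the API response is an assumption, and so is the `rank,domain` csv layout without a header row.

```python
import csv
import io
import json
from urllib.request import urlopen

TRANCO_BASE = "https://tranco-list.eu"


def list_metadata_url(date):
    """URL of the API endpoint describing the list for a YYYY-MM-DD date."""
    return f"{TRANCO_BASE}/api/lists/date/{date}"


def download_url(list_id, limit=10000):
    """URL of the csv holding the top `limit` domains of a given list."""
    return f"{TRANCO_BASE}/download/{list_id}/{limit}"


def parse_ranking_csv(text):
    """Turn csv text into (domain, rank) tuples.

    Assumes each row is `rank,domain`; rows with another shape are skipped.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != 2:
            continue
        rank, domain = record
        rows.append((domain.strip().lower(), int(rank)))
    return rows


def fetch_top_sites(date, limit=10000):
    """Fetch the list id for `date`, then download and parse its top sites."""
    with urlopen(list_metadata_url(date)) as resp:
        metadata = json.load(resp)
    list_id = metadata["list_id"]  # assumed field name in the API response
    with urlopen(download_url(list_id, limit)) as resp:
        return parse_ranking_csv(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the network calls makes it possible to test the csv handling without hitting Tranco.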
I'm thinking of going the 2 tables route because, when fetching updates from Tranco, it would be easy to archive the old table and just create a new one (the same way it's done right now) without the need to search for and update the rank of each domain. And the per country rank table will not be updated once Alexa turns off their API. Regions that we're fetching at the moment: 'US', 'FR', 'IN', 'DE', 'TW', 'ID', 'HK', 'SG', 'PL', 'GB', 'RU'. I'm probably missing something, so any insight or correction is appreciated :) |
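To make the 2 tables idea a bit more concrete, here is a minimal sketch using SQLite. The table names (`global_rank`, `country_rank`), column names and function names are all hypothetical, and the real storage may differ. It only shows the archive-and-recreate update and a join that returns both ranks for a domain; how a priority is derived from those ranks is left out.

```python
import sqlite3


def create_schema(conn):
    """Create both (hypothetical) tables if they do not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS global_rank (domain TEXT, rank INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS country_rank "
        "(domain TEXT, rank INTEGER, country TEXT)"
    )


def replace_global_ranks(conn, rows, archive_name):
    """Archive the current global table and fill a fresh one.

    `archive_name` must come from trusted code, since table names
    cannot be passed as bound SQL parameters.
    """
    if not archive_name.isidentifier():
        raise ValueError("archive_name must be a plain identifier")
    conn.execute(f"ALTER TABLE global_rank RENAME TO {archive_name}")
    create_schema(conn)  # recreates the now missing global_rank table
    conn.executemany("INSERT INTO global_rank (domain, rank) VALUES (?, ?)", rows)
    conn.commit()


def lookup_ranks(conn, domain):
    """Return (domain, global rank, best per country rank), or None if unknown."""
    return conn.execute(
        """
        SELECT d.domain, g.rank, MIN(c.rank)
        FROM (SELECT domain FROM global_rank
              UNION
              SELECT domain FROM country_rank) AS d
        LEFT JOIN global_rank AS g ON g.domain = d.domain
        LEFT JOIN country_rank AS c ON c.domain = d.domain
        WHERE d.domain = ?
        GROUP BY d.domain, g.rank
        """,
        (domain,),
    ).fetchone()
```

A domain that only appears in the per country table still gets a result, with `None` as its global rank, so the calling code can decide which of the two ranks to use.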
Fixes #3656 - Top sites priority change
On May 1st, 2022, Amazon will sunset Alexa.
https://support.alexa.com/hc/en-us/articles/4410503838999
The priority flag for our bugs is defined according to the Alexa ranking. We need to rethink this strategy.