[meta] alexa webhook to prioritize the issues based on their domain name #1533

Closed
3 tasks done
MDTsai opened this issue Apr 26, 2017 · 15 comments

Comments

@MDTsai
Contributor

MDTsai commented Apr 26, 2017

I would like to implement an Alexa webhook. When a new issue is created, the webhook can fetch the website's Alexa ranking and attach that information to the issue. This could help us prioritize issues during triage.

@MDTsai MDTsai self-assigned this Apr 26, 2017
@MDTsai
Contributor Author

MDTsai commented May 3, 2017

I think in helpers.py, I can extend parse_and_set_label to get the base URL from the URL, then query the Alexa ranking and set it as a new label (?)

@zoepage
Member

zoepage commented May 3, 2017

@karlcow ^ ?

@karlcow
Member

karlcow commented May 3, 2017

@MDTsai this is a good idea, and an interesting project. We have reached the point where a flood of tests needs handling every day, so we really need to prioritize. This will require unit tests and probably performance tests.

First - Evaluation

As a first test, we should grab the list of all individual domains we currently have and make a one-time script querying Alexa for all these domains, then define how we would meaningfully set the priority for each domain. That way we have a better idea of whether it really helps us prioritize.
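
A minimal sketch of the first half of that one-time script: reducing a list of reported URLs to unique domains. The function name and the sample URLs are hypothetical, and the Alexa query itself is left out because its details are not settled in this thread.

```python
from urllib.parse import urlsplit


def unique_domains(urls):
    """Return the sorted, de-duplicated hostnames found in an iterable of URLs."""
    domains = set()
    for url in urls:
        host = urlsplit(url.strip()).hostname
        if not host:
            # Skip values that are not absolute URLs.
            continue
        # Assumption: www.example.com and example.com count as the same site.
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return sorted(domains)


if __name__ == "__main__":
    reported = [
        "https://www.example.com/a",
        "http://example.com/b",
        "https://m.example.org",
    ]
    for domain in unique_domains(reported):
        print(domain)
```

Each resulting domain would then be looked up once, so the team can see how the ranks are distributed before wiring anything into the site.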

Implementation ideas

There are probably two ways of doing this.
These services might require request queue management.
It has to be async so we don't wait on the Alexa answer before publishing the information.

WebHooks

  1. a new issue is created
  2. Webhook listening on IssueEvent for the action: opened.
  3. Grab the URI of the issue and the payload (both are in the event)
  4. Send a request to Alexa for the rank (listen for success and failure). Log failures.
  5. Convert the rank into a priority scale (probably a scale of 2 "It's important./It's not important")
  6. Ask webcompat-bot to add the label.
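
The webhook steps above could be sketched roughly as follows. This is an illustration, not the project's code: `handle_issue_event`, `lookup_rank`, `add_label`, the `**URL**:` body pattern, and the 1000 cutoff are all assumptions, and the Alexa request and the bot call are passed in as plain callables so the flow can be tested without network access.

```python
import logging
import re

log = logging.getLogger(__name__)

# Assumption: the reported site appears in the issue body as "**URL**: <url>".
URL_LINE = re.compile(r"\*\*URL\*\*:\s*(\S+)")


def handle_issue_event(payload, lookup_rank, add_label):
    """Handle a GitHub issues event; return the label applied, or None."""
    # Step 2: only react to newly opened issues.
    if payload.get("action") != "opened":
        return None
    issue = payload["issue"]
    # Step 3: pull the reported URL out of the issue body.
    match = URL_LINE.search(issue.get("body") or "")
    if not match:
        return None
    # Step 4: ask for the rank and log failures instead of raising.
    try:
        rank = lookup_rank(match.group(1))
    except Exception:
        log.exception("Rank lookup failed for issue %s", issue["number"])
        return None
    # Step 5: two-level scale; the cutoff of 1000 is a placeholder.
    label = "important" if rank <= 1000 else "not-important"
    # Step 6: hand the label to whatever talks to webcompat-bot.
    add_label(issue["number"], label)
    return label
```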

Internally

  1. A new issue is created
  2. we get the URI back from GitHub and we know the payload.
  3. we send a query to an async function in the current Flask (using threading? async? or maybe Flask Signals TO THINK)
  4. Send a request to Alexa for the rank (listen for success and failure). Log failures.
  5. Convert the rank into a priority scale (probably a scale of 2 "It's important./It's not important")
  6. Ask webcompat-bot to add the label.
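
For the internal variant, one possible way to avoid waiting on the rank lookup is a small thread pool. This is only a sketch of the threading option; `rank_in_background`, `lookup_rank`, and `on_rank` are hypothetical names, and Flask signals or another async mechanism might turn out to fit better.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger(__name__)

# A small shared pool; the worker count is an arbitrary placeholder.
executor = ThreadPoolExecutor(max_workers=2)


def rank_in_background(issue_number, url, lookup_rank, on_rank):
    """Schedule the rank lookup and return a Future without blocking."""

    def task():
        try:
            rank = lookup_rank(url)
        except Exception:
            # Failures are logged and do not affect the request that created the issue.
            log.exception("Rank lookup failed for issue %s", issue_number)
            return None
        # For example, convert the rank and ask the bot to label the issue.
        on_rank(issue_number, rank)
        return rank

    return executor.submit(task)
```

The view that creates the issue would call `rank_in_background(...)` and return its response right away; the label arrives whenever the lookup finishes.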

@karlcow karlcow changed the title [meta] alexa webhook [meta] alexa webhook to prioritize the issues based on their domain name May 3, 2017
@miketaylr
Member

I think in helpers.py, I can extend parse_and_set_label to get the base URL from URL, then query the alexa ranking, set to a new label (?)

You could do it this way, or just create a new webhook endpoint and operate only on the URL.

@miketaylr
Member

miketaylr commented May 4, 2017

Convert the rank in a priority scale (probably a scale of 2 "It's important./It's not important")

I think we're getting ahead of ourselves. First step is just to leave a comment, or add a label that reflects what the Alexa ranking is. I think we need a human process to figure out what the priorities are, and once that's understood, teach robots how to help.

@karlcow
Member

karlcow commented May 5, 2017

I think we're getting ahead of ourselves. First step is just to leave a comment, or add a label that reflects what the Alexa ranking is.

But to get there you are already moving too fast in developing the mechanics. A label is not practical. Let's say you get an alexa-000001, alexa-000002, etc. A comment would be the sensible thing to do. But as I said in my initial comment, it's too soon already. Small steps. 👣

@miketaylr About

I think we need a human process to figure out what the priorities are, and once that's understood, teach robots how to help.

And that's why I was saying:

As a first test, we should grab the list of all individual domains we have currently and make a one time script querying alexa for all these domains and define how we would set meaningfully the priority for each domain. So we have a better idea if it really help us to prioritize.

This means: create a script, put nothing into our mechanics, and test with the current list of domains so we can learn something. I don't even think we should go ahead without an idea of what our data looks like and whether having an Alexa rank helps. :)

@miketaylr
Member

Let's say you get an alexa-000001, alexa-000002, etc. A comment would be the sensible thing to do.

Yeah, that kind of label wouldn't be interesting, it's too granular. Something more like alexa-top-100, alexa-top-1000, alexa-top-10000, alexa-top-100-mexico, or whatever.
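
As an illustration of that kind of coarse bucketing, a rank could be mapped to a label like this. The function name and the choice to return `None` for ranks beyond 10000 are assumptions; the label names follow the examples above.

```python
def rank_label(rank, country=None):
    """Map a rank to a coarse label such as alexa-top-1000, or None if unranked."""
    for limit in (100, 1000, 10000):
        if rank <= limit:
            label = f"alexa-top-{limit}"
            # Optional country suffix, as in alexa-top-100-mexico.
            return f"{label}-{country}" if country else label
    return None


print(rank_label(57))              # alexa-top-100
print(rank_label(5000, "mexico"))  # alexa-top-10000-mexico
```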

That said, I think we should let @MDTsai have the freedom to experiment and do research (some script like you're describing could be useful). Let's discuss f2f in our next team meeting about priority triage, I have some other thoughts on how it might be done.

@MDTsai
Contributor Author

MDTsai commented May 5, 2017

There are 2 Alexa APIs provided by Amazon. The first is Alexa Top Sites. It returns a list per request; we can specify the start ranking, count, and country code. Each URL returned costs $0.0025.

The second API is the Alexa Web Information Service. This API provides detailed information, as mentioned here. I don't think it's a good idea to query each website and then cache it; we don't need that much detail. This API has no minimum fee, and the first 1000 requests are free.

The purpose of this idea is to save time handling issues, so my idea is to give these priorities:

  1. Critical: Alexa top 100 worldwide
  2. Important: Alexa top 101-1000 worldwide or Alexa top 100 in tier 1 countries/regions
  3. Normal: Alexa top 1001-10000 worldwide or Alexa top 101-1000 in tier 1 countries/regions
  4. Others: everything else

The numbers are not fixed; it's just a concept and we can change them as we wish. We can cache the sites for the first 3 priorities and update the cache every week or month, to improve the response time.
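
The proposed tiers could be expressed as a small function. This is a sketch using the placeholder thresholds from the list above; the function and parameter names are hypothetical, and `tier1_rank` is assumed to be the site's best rank in any tier 1 country/region.

```python
def priority(global_rank=None, tier1_rank=None):
    """Classify a site from its worldwide rank and its best tier 1 rank.

    A rank of None means the site is not ranked in that list.
    """

    def within(rank, limit):
        return rank is not None and rank <= limit

    if within(global_rank, 100):
        return "critical"
    if within(global_rank, 1000) or within(tier1_rank, 100):
        return "important"
    if within(global_rank, 10000) or within(tier1_rank, 1000):
        return "normal"
    return "others"
```

For instance, a site ranked 50000 worldwide but 80 in a tier 1 country would come out as `important`.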

@karlcow
Member

karlcow commented May 5, 2017

@MDTsai thanks for the clarification. The first API does indeed make caching possible.
Currently we have close to 6000 bugs, maybe 10000 with Tech Evangelism/Mozilla, with some duplicates.
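
A rough sketch of what such a cache might look like, assuming a hypothetical `fetch_top_sites` callable that returns domains in rank order (for example, a wrapper around the first API). The weekly refresh interval is just the figure suggested earlier in the thread.

```python
import time

WEEK_SECONDS = 7 * 24 * 3600


class RankCache:
    """Keep a top-sites list in memory and refresh it once it is too old."""

    def __init__(self, fetch_top_sites, max_age_seconds=WEEK_SECONDS, clock=time.time):
        self._fetch = fetch_top_sites  # hypothetical: returns domains in rank order
        self._max_age = max_age_seconds
        self._clock = clock
        self._ranks = None
        self._loaded_at = 0.0

    def rank(self, domain):
        """Return the cached rank of a domain, or None if it is not in the list."""
        now = self._clock()
        if self._ranks is None or now - self._loaded_at >= self._max_age:
            # One paid list request replaces many per-site lookups.
            self._ranks = {d: i for i, d in enumerate(self._fetch(), start=1)}
            self._loaded_at = now
        return self._ranks.get(domain)
```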

@softvision-sergiulogigan
Collaborator

A question from our meeting on May 9th:
How will the Alexa thing work? Sites may be on a low position on a global scale, but very high in a country list.

@MDTsai
Contributor Author

MDTsai commented May 10, 2017

@softvision-sergiulogigan thanks for the question. As for how the Alexa thing works: when an issue is opened, the webhook will add a label or leave a comment with the Alexa ranking. That's not decided yet. I prefer labels, since they are easy to filter on during diagnosis.
As for the second question: as in my previous comment, if a site is in the top 100 in tier 1 countries/regions, it's also important for us, and we could handle it faster than others. This reminds me to find the tier 1 list.

@karlcow
Member

karlcow commented May 11, 2017

Related to this discussion, here are the minutes of this week's meeting.
https://wiki.mozilla.org/Compatibility/Meetings/2017-05-09#webcompat_Priority_triage_.28miketaylr.29

 Mike: we still need a bit more information. Do you think we should add the labels today, or should we wait until we have stuff like the alexa bot? (Team agrees that we can start now)
Sergiu: I have a question: How does the Alexa thing work? Sites may be on a low position on a global scale, but very high in a country list.
Mike: I think it's unknown. We have a specific GitHub issue (https://github.com/webcompat/webcompat.com/issues/1533), can you raise that question in the issue?
Sergiu: Sure.

@karlcow
Member

karlcow commented Jun 14, 2017

@MDTsai I arranged a list of the issues you opened in your first comment. It will be easier to see the progress and whether we missed anything.

@MDTsai
Contributor Author

MDTsai commented Jun 15, 2017

Thanks @karlcow !

@miketaylr
Member

This seems done!
