Filter out irrelevant feed URLs #3780

Merged: 1 commit, Dec 14, 2022

Conversation

Eakam1007
Contributor

Issue This PR Addresses

Resolves #3688

Type of Change

  • Bugfix: Change which fixes an issue
  • New Feature: Change which adds functionality
  • Documentation Update: Change which improves documentation
  • UI: Change which improves UI

Description

The feed discovery service can return feed URLs that are not relevant. One example is the wordpress.com comments feed: https://blog.wordpress.com/comments/feed.
We want to filter out irrelevant URLs for four hosts: dev.to, medium.com, wordpress.com, and blogspot.com.
This PR adds filters to remove the following:

  • WordPress comments feed
  • WordPress JSON oEmbed feed: WordPress already returns an rss+xml feed URL, so the json+oembed feed URL is skipped
  • blogspot.com service.post URL: the two other feed URLs, for atom+xml and rss+xml, are still returned
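
As a rough illustration of the three filters listed above, here is a minimal TypeScript sketch. It is not the code from this PR: the `DiscoveredFeed` shape, the function names, and the URL patterns are all assumptions inferred from the example URLs in this conversation. In particular, the `service.post` link is approximated here by matching the www.blogger.com host.

```typescript
// Hypothetical shape, modelled on the "feedUrls" entries shown in this PR.
interface DiscoveredFeed {
  feedUrl: string;
  type: string;
}

// Parse a URL string, returning null instead of throwing on invalid input.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// True when the URL matches one of the three cases listed above.
// All patterns are assumptions inferred from the example URLs in this thread.
function isIrrelevantFeed(feedUrl: string): boolean {
  const url = parseUrl(feedUrl);
  if (url === null) {
    return false; // Leave unparseable values for other validation to handle.
  }

  const host = url.hostname;
  const path = url.pathname;

  // WordPress comments feed, e.g. https://blog.wordpress.com/comments/feed
  if (/(^|\.)wordpress\.com$/.test(host) && /^\/comments\/feed\/?$/.test(path)) {
    return true;
  }

  // WordPress oEmbed discovery link, which is not a feed of posts.
  if (host === "public-api.wordpress.com" && /^\/oembed\/?$/.test(path)) {
    return true;
  }

  // Blogger service.post endpoint, assumed here to be the www.blogger.com URL.
  if (host === "www.blogger.com" && /^\/feeds\//.test(path)) {
    return true;
  }

  return false;
}

// Keep only the feeds that do not match any of the filters.
function removeIrrelevantFeeds(feeds: DiscoveredFeed[]): DiscoveredFeed[] {
  return feeds.filter((feed) => !isIrrelevantFeed(feed.feedUrl));
}
```

Matching on the parsed hostname and path, rather than on the raw string, keeps a URL that merely mentions one of these hosts in its query string from being filtered by accident.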

Steps to test the PR

  • Ensure you have the login component running locally
  • Go through the sign-up process until you reach the blog URL form
  • Test different blog URLs from different hosts and check if the feed URLs mentioned above are filtered out

Checklist

  • Quality: This PR builds and passes our npm test and works locally
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not (if applicable)
  • Documentation: This PR includes updated/added documentation for user-exposed functionality or configuration variables that are added/changed, or an explanation of why it does not (if applicable)

manekenpix
manekenpix previously approved these changes Dec 13, 2022
Member

LGTM 👍

Blogspot URLs still return an extra feedUrl that I'm not sure we need.
cc @humphd, who may be able to clarify this.

  • WordPress:
    Before:
{
    "feedUrls": [
        {
            "feedUrl": "https://neilong31.wordpress.com/feed/",
            "type": "blog"
        },
        {
            "feedUrl": "https://neilong31.wordpress.com/comments/feed/",
            "type": "blog"
        },
        {
            "feedUrl": "https://public-api.wordpress.com/oembed/?format=json&url=https%3A%2F%2Fneilong31.wordpress.com%2F&for=wpcom-auto-discovery",
            "type": "blog"
        },
        {
            "feedUrl": "https://public-api.wordpress.com/oembed/?format=xml&url=https%3A%2F%2Fneilong31.wordpress.com%2F&for=wpcom-auto-discovery",
            "type": "blog"
        }
    ]
}

After:

{
    "feedUrls": [
        {
            "feedUrl": "https://neilong31.wordpress.com/feed/",
            "type": "blog"
        }
    ]
}
  • Blogspot:
    Before:
{
    "feedUrls": [
        {
            "feedUrl": "https://whataboutopensource.blogspot.com/feeds/posts/default",
            "type": "blog"
        },
        {
            "feedUrl": "https://whataboutopensource.blogspot.com/feeds/posts/default?alt=rss",
            "type": "blog"
        },
        {
            "feedUrl": "https://www.blogger.com/feeds/6561714874239648778/posts/default",
            "type": "blog"
        }
    ]
}

After:

{
    "feedUrls": [
        {
            "feedUrl": "https://whataboutopensource.blogspot.com/feeds/posts/default",
            "type": "blog"
        },
        {
            "feedUrl": "https://whataboutopensource.blogspot.com/feeds/posts/default?alt=rss",
            "type": "blog"
        }
    ]
}

@humphd
Contributor

humphd commented Dec 13, 2022

So it's giving both Atom and RSS versions of the feed. I guess we could filter out the ?alt=rss version if there are two?
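
The "only if there are two" idea can be sketched roughly as follows. This is a hedged illustration, not the filter that was merged: the `DiscoveredFeed` shape and all function names are hypothetical, and it assumes the RSS variant differs from the default (Atom) feed only by the `alt=rss` query parameter, as in the Blogspot output above.

```typescript
// Hypothetical shape, modelled on the "feedUrls" entries shown in this PR.
interface DiscoveredFeed {
  feedUrl: string;
  type: string;
}

// Normalize a URL string so equivalent URLs compare equal; null if unparseable.
function normalizeUrl(value: string): string | null {
  try {
    return new URL(value).toString();
  } catch {
    return null;
  }
}

// For a blogspot.com "?alt=rss" URL, return the URL of the default feed
// (the same URL without the alt parameter). Returns null for anything else.
function defaultFeedFor(feedUrl: string): string | null {
  const normalized = normalizeUrl(feedUrl);
  if (normalized === null) {
    return null;
  }
  const url = new URL(normalized);
  if (!/(^|\.)blogspot\.com$/.test(url.hostname)) {
    return null;
  }
  if (url.searchParams.get("alt") !== "rss") {
    return null;
  }
  url.searchParams.delete("alt");
  return url.toString();
}

// Drop an "?alt=rss" entry only when its default counterpart is also present.
function dropRedundantRss(feeds: DiscoveredFeed[]): DiscoveredFeed[] {
  const present = feeds.map((feed) => normalizeUrl(feed.feedUrl));
  return feeds.filter((feed) => {
    const counterpart = defaultFeedFor(feed.feedUrl);
    return counterpart === null || present.indexOf(counterpart) === -1;
  });
}
```

Checking for the counterpart first means a blog that exposes only the `?alt=rss` URL still ends up with one usable feed.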

@Eakam1007
Contributor Author

Yeah, I ended up leaving both the Atom and RSS versions in. I can add another filter to remove the ?alt=rss version.

Add tests for feed URL filtering

Add filter for blogspot.com alternate rss feed
@Eakam1007
Contributor Author

I have added a filter to skip the ?alt=rss feed URL.

@humphd
Contributor

humphd commented Dec 14, 2022

@manekenpix let me know if you want to review this, otherwise I'll merge this week.

@manekenpix manekenpix merged commit f2ae958 into Seneca-CDOT:master Dec 14, 2022
Development

Successfully merging this pull request may close these issues.

Skip returning irrelevant feeds from feed-discovery service