Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how the New tab is generated on the Pull screen for external connections #811

Merged
merged 5 commits into from
Nov 8, 2021

Conversation

dkotter
Copy link
Collaborator

@dkotter dkotter commented Oct 18, 2021

Description of the Change

When building the content that shows on the New tab of the Pull screen, we determine what content has already been pulled and what content has already been skipped. This is then trimmed down to 200 items and used in a post__not_in query.

This can cause two different problems:

  1. Since we limit this to 200 items, if you have more than 200 items, you could end up seeing already pulled or skipped items on the New tab
  2. You might see performance problems if you have lots of items that need excluded. This will mostly impact external connections and even more so, WordPress.com VIP sites, since we use a lower timeout value in those requests

This PR attempts to fix both of those problems in the following ways:

  1. Create a new custom endpoint that is used to generate the content for the New tab on the Pull screen
  2. Utilize this endpoint if we are on the Pull screen of an external connection and we have no previously pulled or skipped items
  3. Make a POST request to this endpoint (instead of GET), which allows us to pass data via the body parameter instead of in the URL
  4. Remove the 200 item limit on excluded items, since those are now passed in the body

Because we remove that 200 item limit, we won't run into issues with previously pulled or skipped items showing on the New tab. Fixes #808

Because we are using a custom endpoint with a POST request, fixes potential timeout issues when on a WordPress.com VIP site because we no longer have the strict 3 second timeout limit. Fixes #809

Alternate Designs

Ultimately we are still running a post__not_in query, which may not be the most performant. Other approaches were considered but ultimately rejected:

  1. Instead of using post__not_in, increase our posts_per_page value and then remove excluded items after. This can still cause performance issues and causes pagination issues
  2. To get around the performance issues of a large posts_per_page, could set an upper limit and break out into new queries once that limit is reached. Still have pagination problems and could still have performance issues because we are running more queries
  3. To fix pagination issues, would have to potentially run multiple extra queries to determine proper offset. For instance, if someone wants page 10 of the results, we can't assume page 10 of the query matches what we want. We would have to run queries for each page leading up to page 10 to determine what offset we need. This would have performance implications for higher paginated pages
  4. Instead of excluding items based on their post ID, have a piece of meta associated with all external pulled items that we can use to exclude. This would fix things going forward but isn't backwards compatible

Benefits

All excluded items are properly excluded from the New tab. Better performance in certain situations

Possible Drawbacks

I can't think of any but maybe some drawback around switching from a GET request to a POST request?

Verification Process

  1. Set up a site with at least one internal connection and one external connection
  2. Ensure the Pull screen works for both
  3. Try pulling and skipping content from both
  4. Ensure things still work. Be sure to test pagination, search, skipping, pulling, etc.
  5. Ideal is to test with a large amount of content pulled in, though this can be tedious to do

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

#808
#809

Changelog Entry

Fixed - change how the Pull screen New tab is populated for external connections

Darin Kotter added 3 commits October 15, 2021 16:58
…l connections. When making a request for new content with previously pulled or skipped items and when using an external connection, use this new endpoint via a POST request
@dkotter dkotter self-assigned this Oct 18, 2021
@dkotter dkotter requested a review from dinhtungdu October 25, 2021 22:42
@jeffpaul jeffpaul added this to the 1.6.7 milestone Oct 26, 2021
@jeffpaul jeffpaul requested a review from helen October 28, 2021 16:51
@jeffpaul
Copy link
Member

@dkotter its probably worthwhile to add some explicit comments in here as to why we took this approach especially for the case where VIP or someone else might flag it as a concern (thus there would be some comments for them to reference and better understand the why maybe even with a link to the PR or issue with our analysis and stats on the topic).

Copy link
Contributor

@dinhtungdu dinhtungdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkotter Awesome work! 🎉 This is looking good to me. I just have some minor comments, please take a look : )

}

// Filter documented in includes/classes/ExternalConnections/WordPressExternalConnection.php
$query = new \WP_Query( apply_filters( 'dt_remote_get_query_args', $args, $request ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why do we need this filter in the callback, which will be executed on the remote sites, not from the site making the request?

Also, how can we distinguish between normal remote_get calls and pull content calls? I'm thinking about checking for the existence of the third argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and removed this filter as it isn't needed. We run the same filter before we end up making our remote POST request, so this would just result in filtering the same values twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And after thinking about it a bit more, not sure it makes sense to run the first filter, as that's meant to filter the remote_get query args. I've moved that code to run after our remote post call and added a new filter in the remote post method around those arguments. So to answer your second question, if you only want to filter the arguments for a Pull call, you'd use this new filter instead

@jeffpaul jeffpaul requested a review from dinhtungdu November 1, 2021 18:53
Copy link
Contributor

@dinhtungdu dinhtungdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

Copy link
Contributor

@helen helen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@dinhtungdu dinhtungdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I didn't approve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants