Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cron event to validate individual URLs #6520

Merged
merged 49 commits into from
Oct 21, 2021

Conversation

dhaval-parekh
Copy link
Collaborator

@dhaval-parekh dhaval-parekh commented Aug 9, 2021

Summary

  • Update URLValidationCron to implement a queue for validating URLs on an hourly basis. Once a week the URLs returned by ScannableURLProvider are queued up, and then on an hourly basis each URL is validated one-by-one until there are none left. If the validated environment changes at any point (e.g. changing the template mode) then the queue is cleared and new URLs are queued up for validation.
  • Update RecurringBackgroundTask to support re-scheduling when the interval changes.
  • Allow passing $options into AMP_Theme_Support::get_supportable_templates() to obtain results for options which are not saved in the DB.
  • Add more options to the validated environment, including what templates are suppoerted and what Reader theme is selected.
  • Deprecate the obsolete --force flag on wp amp validation run which no longer has any effect.
  • Refactor ScannableURLProvider to eliminate the need for URLScanningContext. Ensure that only URLs for supported templates (and AMP-enabled posts) are returned by get_urls.
  • Eliminate the unused amp_url_validation_sleep_time filter.
  • Use the year archive URL as returned by get_year_link() rather than manually constructing it with query parameters. Ensure it will have posts on it.

Fixes #5750

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@dhaval-parekh dhaval-parekh requested a review from pierlon August 9, 2021 15:11
@dhaval-parekh dhaval-parekh marked this pull request as ready for review August 16, 2021 07:23
@github-actions
Copy link
Contributor

github-actions bot commented Aug 16, 2021

Plugin builds for 284e489 are ready 🛎️!

@westonruter westonruter added this to the v2.2 milestone Sep 10, 2021
src/BackgroundTask/CronBasedBackgroundTask.php Outdated Show resolved Hide resolved
src/BackgroundTask/CronBasedBackgroundTask.php Outdated Show resolved Hide resolved
src/BackgroundTask/CronBasedBackgroundTask.php Outdated Show resolved Hide resolved
src/Validation/URLValidationQueueCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationQueueCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationQueueCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationQueueCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationQueueCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationCron.php Outdated Show resolved Hide resolved
src/Validation/URLValidationCron.php Outdated Show resolved Hide resolved
@dhaval-parekh dhaval-parekh force-pushed the dev-tool/5750-refactor-url-validation-cron branch from ad654b9 to 2ce519a Compare October 1, 2021 09:14
@dhaval-parekh dhaval-parekh self-assigned this Oct 7, 2021
dhaval-parekh and others added 8 commits October 7, 2021 20:17
…5750-refactor-url-validation-cron

* 'develop' of github.com:ampproject/amp-wp: (28 commits)
  Use assertContains for array
  Use DependencyInjectedTestCase
  Add test assertions
  Update comments
  Update unit test cases
  Remove unwanted style, Fix notice style in mobile
  Revert some changes and update unit test case
  Remove changes from .github
  Remove template change notice from other pages
  Remove template change notice from other pages
  Fix unefined index issue in test case
  Fix unit test cases
  Fix unit test cases
  Fix unit test cases
  Fix unit test cases
  Cache result of page cache check and use that when needed
  Fix unit test cases
  Update test cases
  Remove unwanted localize variable
  Only show notice for flush page cache if page cache is enable
  ...
@@ -124,7 +134,7 @@ public function test_get_event_name() {
/** @covers ::get_interval() */
public function test_get_interval() {
$this->assertEquals(
URLValidationCron::DEFAULT_INTERVAL_DAILY,
URLValidationCron::DEFAULT_INTERVAL_HOURLY,
Copy link
Member

@westonruter westonruter Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this change is problematic because there is an existing daily schedule, and because of that the change to hourly is not resulting in the re-scheduling. It seems we need to do a call to wp_get_scheduled_event() instead of using wp_next_scheduled() in RecurringBackgroundTask::schedule_event(), and then re-schedule it if the schedule does not match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took my first stab at addressing this in 422d0f2. I'm not sure it's best, however. Namely, if there was a daily schedule that is going to happen in 23 hours, but then the schedule gets changed to hourly, then the initial timestamp remains 23 hours in the future. I guess it would be better to unschedule it enturely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There, I think this is better: bf25b0c.

@dhaval-parekh dhaval-parekh changed the title Dev tool/5750 refactor url validation cron Refactor cron event to validate individual URLs Oct 13, 2021
}

// @todo The URLs should be contextual based on the selected template mode, in particular only singular URLs should be included if using legacy Reader mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also relates to site scanning in general in #6610. Only the available templates should be scanned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5bbadea

Comment on lines +88 to +89
// If there are no URLs queued, then obtain a new set.
if ( empty( $data['urls'] ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change in the validated environment (and enabled post types and templates) should also trigger this, since we need to obtain a new set of URLs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 55c0ea7 and ab2bff1

…5750-refactor-url-validation-cron

* 'develop' of github.com:ampproject/amp-wp: (25 commits)
  Skip test_get_theme_info when no active theme (in WP 4.9)
  Fix logic in get_theme_info to return parent and child theme both
  Use wp_get_theme() to obtain theme version on demand; account for sources with deleted themes/plugins
  Only obtain theme versions for parent theme and child theme
  Omit parent_theme field if child theme not active
  Use closure to call normalize_plugin_info
  Use class name constant
  Add since tag and sort imports
  Use imported WP_Query
  Ensure data is array before sending it through array_map
  Update message in AMP Support page
  Apply suggestions from PR
  Use class injector instead of fetching from Service class
  Add delayed in SupportLink class and update the unit test cases
  Add more unit test cases for SupportData()
  Apply suggestions from PR and update unit test cases
  Update unit test case
  Fix unit test cases
  Fix fatal error in WP 4.9
  Update logic to get error log
  ...
* @param bool $enabled Enabled.
* @internal
*/
return apply_filters( 'amp_temp_validation_cron_tasks_enabled', false );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking in a subsequent PR to just remove the SavePostValidationEvent entirely. We'll have a weekly cron to re-check the most recently-published post aside from the user-initiated site scan in #6610. If a user has dev tools turned off, this will do a periodic health check without the overhead of doing validation for every single post when that data may not ever be used. What's more is that there is currently no garbage collection of validated URLs (although there needs to be), and without this the database would just keep silently getting larger and larger with validation data.

cc @delawski

@dhaval-parekh
Copy link
Collaborator Author

Changes look good to me.

…5750-refactor-url-validation-cron

* 'develop' of github.com:ampproject/amp-wp:
  Reset ini config at tearDown
  Use temp filename for error log and reset ini before assertion
  Add test case for SupportData()
  Cover missing last line of error log
  Fix spacing issue
  Optimize function that reads error log.
  Remove unused variable
  Add test for amp-story-captions
  Update tests to fix failures
  Update amphtml spec to 530a457
@westonruter
Copy link
Member

Not sure why E2E would be failing here.

@westonruter
Copy link
Member

The E2E error:

TimeoutError: Element .done__links-container a:not([class*="--active"]) not found

This element is missing in the test:

image

@westonruter
Copy link
Member

I haven't gotten the tests to run locally myself, so I'm shooting in the dark a bit. I suspect the issue is that now that \AmpProject\AmpWP\Validation\ScannableURLProvider::get_urls() only returns URLs on which AMP is enabled given the current template mode, if legacy Reader mode was initially selected then only singular posts will be returned among the URLs.

@delawski So this is something else that needs to be added to #6650. In addition to re-fetching the scannable URLs for site scanning after changing the template mode, the Done screen of the Onboarding Wizard will also need to fetch the preview URLs after the settings have been saved instead of exporting them up-front here:

'PREVIEW_URLS' => $this->get_preview_urls( $this->scannable_url_provider->get_urls() ),

@westonruter
Copy link
Member

Alas, my supposed fix in fce116d didn't work. @delawski I believe you'd be able to quickly identify what is going on.

@westonruter westonruter requested a review from delawski October 20, 2021 18:38
There needs to be at least one post or page for the "Browse Site" button to redirect to an AMP document. We're creating a test post before all tests and clean up the environment afterwards.
@delawski
Copy link
Collaborator

@delawski So this is something else that needs to be added to #6650. In addition to re-fetching the scannable URLs for site scanning after changing the template mode, the Done screen of the Onboarding Wizard will also need to fetch the preview URLs after the settings have been saved instead of exporting them up-front here

This has been already addressed in #6610 (ffe1c0e). We'll just have to combine changes done there with the update (43899d3) I pushed to this PR. What is does is it makes sure that no preview-related elements are rendered if there are no preview URLs (e.g. when there are no posts or pages).

Alas, my supposed fix in fce116d didn't work. @delawski I believe you'd be able to quickly identify what is going on.

Your intuition here was correct. However, I just learnt that createNewPost doesn't actually publish a post - it just opens up an editor and set some defaults.

We create and publish posts using REST API and that's the approach I took in 8f2985a.

Note that besides creating a test post, we also create a test page. It's because in the Reader mode only singular templates are listed in the nav menu. It means that it would contain only a single element that would be active by default. It would lead to the same E2E error:

TimeoutError: Element .done__links-container a:not([class*="--active"]) not found

The changes I've introduced above (trashing all the post and pages, in particular) resulted in side-effects in the AMP Settings Review panel. They've been fixed similarly to the ones in the Onboarding Wizard (8376583).

@westonruter
Copy link
Member

Thank you!

@@ -40,7 +40,7 @@ export function SiteReview() {
return null;
}

const previewPermalink = STANDARD === themeSupport ? HOME_URL : pairedUrlExamples[ pairedUrlStructure ][ 0 ];
const previewPermalink = STANDARD === themeSupport ? HOME_URL : pairedUrlExamples?.[ pairedUrlStructure ]?.[ 0 ] ?? HOME_URL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be addressed in another PR, but even when in Standard mode we shouldn't assume that the HOME_URL has AMP enabled for it. The user could turn off the homepage template.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the way we get the preview permalink changes in #6610 quite a bit. I guess this will likely need to be changed in context of your comment.

const previewPermalink = useMemo( () => {
const pageTypes = themeSupport === READER ? [ 'post', 'page' ] : [ 'home' ];
return scannableUrls.find( ( { type } ) => pageTypes.includes( type ) )?.[ urlType ] || homeUrl;
}, [ homeUrl, scannableUrls, themeSupport, urlType ] );

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, since scannableUrls that we're now getting are context-aware, I think we won't have to guess which page to use for a preview. We could just grab the first one from the list, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right!

Co-authored-by: Piotr Delawski <[email protected]>
@westonruter westonruter enabled auto-merge October 21, 2021 14:21
@westonruter westonruter merged commit ef0548b into develop Oct 21, 2021
@westonruter westonruter deleted the dev-tool/5750-refactor-url-validation-cron branch October 21, 2021 14:25
@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelogged Whether the issue/PR has been added to release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor URLValidationCron to schedule events to validate individual URLs
3 participants