Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor URLValidationCron to schedule events to validate individual URLs #5750

Closed
westonruter opened this issue Dec 30, 2020 · 2 comments · Fixed by #6520
Closed

Refactor URLValidationCron to schedule events to validate individual URLs #5750

westonruter opened this issue Dec 30, 2020 · 2 comments · Fixed by #6520
Assignees
Labels
Changelogged Whether the issue/PR has been added to release notes. Developer Tools Groomed P0 High priority Site Scanning
Milestone

Comments

@westonruter
Copy link
Member

westonruter commented Dec 30, 2020

Feature description

As noted in #5515 (comment):

We can address this in a subsequent PR, but I have an idea for how this could be modified to not make use of sleep(). Instead of there being one single event that scans all of the URLs, we can instead register a separate event for each for each URL, and offset the time() for each event by 10 minute intervals (for example). In this way only one URL would be validated during any given cron request, and there would be no need for sleep().

In other words, this would be similar to SavePostValidationEvent in which one URL is scheduled for validation. In fact, that class could be refactored to pass the URL as the schedule argument as opposed to the $post_id. It could then be used for scheduling validation after editing a post as well as for individually re-validating the sample set of URLs.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • Only one URL is validated per process (cron run). This will prevent timeouts from occurring.

Implementation brief

  1. Create a new option to store a queue of URLs to validate (e.g. amp_url_validation_queue).
  2. Once a day (or week), the URLValidationCron service should enqueue the URLs returned by \AmpProject\AmpWP\Validation\ScannableURLProvider::get_urls() into this stored option.
  3. Register a recurring event (every 10 minutes) to dequeue a URL from the stored option. If there is no URL, then abort.
  4. Fetch the validation results for the URL and store the results.
  5. Remove Conditional from the URLValidationCron service (and thus remove the amp_temp_validation_cron_tasks_enabled filter).

QA testing instructions

Demo

Changelog entry

@westonruter westonruter added Developer Tools WS:UX Work stream for UX/Front-end labels Dec 30, 2020
@westonruter westonruter added this to the v2.1 milestone Dec 30, 2020
@westonruter westonruter added the P1 Medium priority label Jan 15, 2021
@kmyram kmyram added the Groomed label Jan 19, 2021
@westonruter
Copy link
Member Author

Per @schlessera: We can rather use a queue instead of staggering the scheduled events. We can have a scheduled event that runs every tick, and if there is a something in the queue, then validate.

We should also have better logging.

@westonruter westonruter modified the milestones: v2.1, v2.2 Feb 19, 2021
@westonruter westonruter added P0 High priority and removed P1 Medium priority labels Jul 3, 2021
@pierlon pierlon removed the WS:UX Work stream for UX/Front-end label Jul 29, 2021
@westonruter westonruter changed the title Refactor URLValidationCron to schedule events to validate individual URLs Refactor URLValidationCron to schedule events to validate individual URLs Jul 29, 2021
@dhaval-parekh dhaval-parekh self-assigned this Aug 5, 2021
@milindmore22
Copy link
Collaborator

QA Passed
Testing a default run on a site populated with the theme unit test data (264 URLs) The runs were all completed successfully.

wp amp validation run
Crawling the site for AMP validity.
Validating 264 URLs...  100% [=======================================================================================================================================] 6:26 / 5:22
Success: 0 crawled URLs have invalid markup kept out of 3 total with AMP validation issue(s); 264 URLs were crawled.
+-------------------------+-----------+---------------+
| Template                | URL Count | Validity Rate |
+-------------------------+-----------+---------------+
| is_home                 | 1         | 100%          |
| is_singular[post]       | 66        | 100%          |
| is_singular[page]       | 24        | 100%          |
| is_singular[attachment] | 38        | 100%          |
| is_tax[category]        | 67        | 100%          |
| is_tax[post_tag]        | 63        | 100%          |
| is_author               | 3         | 100%          |
| is_date                 | 1         | 100%          |
| is_search               | 1         | 100%          |
+-------------------------+-----------+---------------+
For more details, please see: https://amp-local.test/wp-admin/edit.php?post_type=amp_validated_url

Testing individual URLs was also completed successfully.

Noticed Hourly Cron event for

wp cron event list
+------------------------------------+---------------------+-----------------------+---------------+
| hook                               | next_run_gmt        | next_run_relative     | recurrence    |
+------------------------------------+---------------------+-----------------------+---------------+
| wp_privacy_delete_old_export_files | 2021-11-25 14:00:03 | 51 minutes 58 seconds | 1 hour        |
| amp_validate_urls                  | 2021-11-25 14:00:04 | 51 minutes 59 seconds | 1 hour        |
| amp_monitor_css_transient_caching  | 2021-11-25 14:04:44 | 56 minutes 39 seconds | 1 day         |
| amp_validated_url_stylesheet_gc    | 2021-11-25 14:04:44 | 56 minutes 39 seconds | 1 hour        |
| wp_https_detection                 | 2021-11-25 14:49:43 | 1 hour 41 minutes     | 12 hours      |
| wpseo-reindex                      | 2021-11-25 16:33:55 | 3 hours 25 minutes    | 1 day         |
| wp_version_check                   | 2021-11-25 22:00:03 | 8 hours 51 minutes    | 12 hours      |
| wp_update_plugins                  | 2021-11-25 22:00:03 | 8 hours 51 minutes    | 12 hours      |
| wp_update_themes                   | 2021-11-25 22:00:03 | 8 hours 51 minutes    | 12 hours      |
| recovery_mode_clean_expired_keys   | 2021-11-26 10:00:03 | 20 hours 51 minutes   | 1 day         |
| wp_scheduled_delete                | 2021-11-26 10:00:22 | 20 hours 52 minutes   | 1 day         |
| delete_expired_transients          | 2021-11-26 10:00:22 | 20 hours 52 minutes   | 1 day         |
| wp_scheduled_auto_draft_delete     | 2021-11-26 10:00:25 | 20 hours 52 minutes   | 1 day         |
| wp_site_health_scheduled_check     | 2021-11-30 10:00:03 | 4 days 20 hours       | 1 week        |
| wpseo_ryte_fetch                   | 2021-11-30 16:33:56 | 5 days 3 hours        | 1 week        |
| publish_future_post                | 2030-01-01 12:00:18 | 8 years 1 month       | Non-repeating |
+------------------------------------+---------------------+-----------------------+---------------+

@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelogged Whether the issue/PR has been added to release notes. Developer Tools Groomed P0 High priority Site Scanning
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants