-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor URL validation CLI script to make parts reusable #5306
Conversation
Plugin builds for 9c627b3 are ready 🛎️!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement. I think the naming should be revised between ValidationURLProvider
and URLValidationProvider
, as these are almost guaranteed to confuse people. I think you've even confused yourself in the tests, as you've used the constructor from one on the other... ;)
There are lots of suggestions in the review, but a large part of it is just nit-picking.
* | ||
* @since 2.1 | ||
*/ | ||
final class ValidationURLProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid confusion with the UrlValidationProvider
, I suggest renaming this to ScannableUrlProvider
.
|
||
/** @coversDefaultClass URLValidationProvider */ | ||
final class URLValidationProviderTest extends WP_UnitTestCase { | ||
use PrivateAccess, AssertContainsCompatibility; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, both of these traits are not needed, as you're also not using assertStringContains()
.
use PrivateAccess, AssertContainsCompatibility; |
@@ -8,11 +8,14 @@ | |||
use AmpProject\AmpWP\Option; | |||
use AmpProject\AmpWP\Tests\Helpers\AssertContainsCompatibility; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use AmpProject\AmpWP\Tests\Helpers\AssertContainsCompatibility; |
This is not needed.
You need to remove the use
statement for this as well below here, I can't reach it with the PR review tools.
Co-authored-by: Alain Schlesser <[email protected]>
Co-authored-by: Alain Schlesser <[email protected]>
…ampproject/amp-wp into feature/refactor-url-validation
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
Co-authored-by: Alain Schlesser <[email protected]>
@pierlon @westonruter @schlessera I've addressed all feedback and gotten the new GHA workflow to pass. This is ready for another round of review. My not-quite-finished PR adding the validation cron events (#5515) depends on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some misc. changes to be made and then all should be well.
*/ | ||
public $force_crawl_urls = false; | ||
private $validation_url_provider; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if $validation_url_provider
and $validation_provider
are misnomers.
I would expect $validation_url_provider
to be named something like $scannable_url_provider
since it is an instance of ScannableURLProvider
. Likewise, I would expect $validation_provider
to be named something like $url_validation_provider
since it is an instance of URLValidationProvider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. This was a product of shifting class names. Updated in 4c0238d
Thanks, @pierlon. Updated and re-requesting review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
…efactor-url-validation * 'develop' of github.com:ampproject/amp-wp: (33 commits) Ensure global post is set for test_decorate_shortcode_and_filter_source Extract to plugins dir when unzipping PWA plugin Add tests for new markup Use fixed layout for amp-story-360 example Undo intrinsic layout for video since WP core uses responsive Use intrinsic layout of video Update amphtml spec to 2011140244000 Update stable tag and tested up to Bump stable tag to 2.0.6; update docs Bump autoprefixer from 10.0.1 to 10.0.2 Bump babel-loader from 8.1.0 to 8.2.1 Setup composer via the shivammathur/setup-php action Bump core-js from 3.6.5 to 3.7.0 Bump copy-webpack-plugin from 6.3.0 to 6.3.1 Remove deprecated option "--no-suggest" Update composer.lock Bump eslint-plugin-jest from 24.1.0 to 24.1.3 Bump eslint from 7.12.1 to 7.13.0 Bump @wordpress/block-editor from 5.1.2 to 5.1.3 Bump mini-css-extract-plugin from 1.2.1 to 1.3.1 ...
$result = $url_validation_provider->with_lock( | ||
function () { | ||
$this->validate_urls(); | ||
} | ||
); | ||
|
||
if ( is_wp_error( $result ) ) { | ||
WP_CLI::error( 'The site cannot be crawled at this time because validation is running in another process.' ); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why is a lock necessary? What problems will occur if validation is happening in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the validate_urls
method ultimately leads to AMP_Validation_Manager::validate_url
, which is an expensive function using wp_remote_get
. In theory, without the lock, if someone runs the CLI command (or even multiple instances of the command) there could be multiple processes running a string of wp_remote_get
calls at the same time. I suspect some hosting providers wouldn't like that. Do you think the lock is overcautious? Removing it would certainly simplify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to note here, the lock is looking ahead to having the cron task in place, as well as any URL validation taking place within the plugins settings screen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm still not sure on the need for locking but we can revisit that later. It may be preferable for any new invocation of the validation provider to abort any existing validation tasks. Then you wouldn't have the experience of validation being performed in cron, and then being blocked when attempting to initiate validation from the admin screen.
Nevertheless, validation happening concurrently generally should not have negative consequences other than additional server load. Even if the same URL is being validated at the exact same time, what would go wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be following up shortly with the cron PR that uses the lock more. That will be a good time to revisit. My fundamental assumption is that hosting platforms like VIP might see multiple background processes running wp_remote_get
and not liking that too much, but I might be overcomplicating things because I'm sure this is not the only plugin that does things like that.
|
||
$number_urls_to_crawl = $this->count_urls_to_validate(); | ||
$number_urls_to_crawl = count( $scannable_url_provider->get_urls() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the obtained URLs be stored in a variable and check the count()
? The URLs could then be passed down to the validate_urls()
method. Otherwise, as it stands right now the get_urls()
method is being called twice, with the same data (or not) being queried twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in de33111
* | ||
* @param int $timeout Time in seconds. Default 300 seconds. | ||
*/ | ||
return apply_filters( 'amp_validation_lock_timeout', 5 * MINUTE_IN_SECONDS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for the filter? Should this be @internal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I don't actually see an immediate use case for this filter, so I removed it in favor of a class constant in 13917b3
* | ||
* @param string $url The URL to validate. | ||
* @param string $type The type of template, post, or taxonomy. | ||
* @param string $flag Flag determining whether the URL should be revalidated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A flag makes me think it is going to be a bitmask. Maybe $mode
would be a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, passing an arg for FLAG_FORCE_REVALIDATE
vs FLAG_NO_REVALIDATE
behavior may not be the most ergonomic. What if there were other methods for the other behaviors that wrap this one? If the flag is empty, then what is the expected behavior? Validate if it doesn't already exist? I think I see that now, but I had to look at it awhile to understand how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised this in d87ad9d. My solution for now is to simply use a third argument for $force_revalidate
. The reason for this is I found FLAG_NO_REVALIDATE
is not being used on this branch at all. The no-revalidate use case came up in one of the subsequent features that depend on this branch, and if needed I will revisit when I get back to that feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great.
(Sorry for the delay in reviewing.) |
Co-authored-by: Weston Ruter <[email protected]>
Co-authored-by: Weston Ruter <[email protected]>
…ampproject/amp-wp into feature/refactor-url-validation
Summary
This refactors the URL validation CLI script. The functions that generate URLs to validate are in the new class
ScannableURLProvider
, and the functions that actually validate URLs are in the new classURLValidationProvider
.This fixes #5307 and is a precursor to several open issues, including anything that involves validating URLs without the loopback request -- e.g. via cron tasks or in client-side requests via REST. In the closed PR #5228, where most of this code comes from, I attempted to refactor the CLI script and create REST wrappers and create a Cron service. In hindsight, that was doing too much, and I think it would be better to get this change reviewed and merged before doing any work that depends on it.
While most of the code in this PR is from #5228, it is in turn mostly from the CLI script. Even the PHPUnit tests were mostly copied into new classes. For that reason, the new classes are still architected with the CLI script in mind, so I believe they may evolve more as we work on the features that depend on them.
Testing
As mentioned above, most of the unit tests here are slightly modified versions of the tests covering the CLI scripts, so their passing is a big indicator that nothing has broken. For QA steps, see #5307.
Checklist