-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track and rebuild sitemaps for deleted posts #135
base: main
Are you sure you want to change the base?
Conversation
Note: This needs unit tests, but I'm afk this week so I wanted to give the team a chance to review this without it being blocked until I wrote tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments
$sitemap_last_run = get_option( 'msm_sitemap_update_last_run', false ); | ||
|
||
$date = date( 'Y-m-d H:i:s', ( current_time( 'timestamp', 1 ) - 3600 ) ); // posts changed within the last hour | ||
$date = date( 'Y-m-d H:i:s', ( $start_time - 3600 ) ); // posts changed within the last hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a comment to why we're adding the start time. Either inline or with the doc block, whatever makes most sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 414ce0e
|
||
if ( $sitemap_last_run ) { | ||
$date = date( 'Y-m-d H:i:s', $sitemap_last_run ); | ||
} | ||
|
||
$post_types_in = self::get_supported_post_types_in(); | ||
|
||
$modified_posts = $wpdb->get_results( $wpdb->prepare( "SELECT ID, post_date FROM $wpdb->posts WHERE post_type IN ( {$post_types_in} ) AND post_modified_gmt >= %s LIMIT 1000", $date ) ); | ||
$modified_posts = $wpdb->get_results( $wpdb->prepare( "SELECT ID, post_date FROM $wpdb->posts WHERE post_type IN ( {$post_types_in} ) AND post_modified_gmt >= %s LIMIT 1000", $date ), ARRAY_A ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would changing this to ARRAY_A cause problems with
Line 602 in d4b04cc
$dates[] = date( 'Y-m-d', strtotime( $post->post_date ) ); |
@@ -599,7 +626,7 @@ public static function get_last_modified_posts() { | |||
public static function get_post_dates( $posts ) { | |||
$dates = array(); | |||
foreach ( $posts as $post ) { | |||
$dates[] = date( 'Y-m-d', strtotime( $post->post_date ) ); | |||
$dates[] = date( 'Y-m-d', strtotime( $post['post_date'] ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol. I see that's fixed here. I'm still unclear as to why we're changing the behaviour :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When storing the deleted posts for future processing ( https://github.com/Automattic/msm-sitemap/pull/135/files#diff-67d63994628dd99d09072eee489c7487R585 ), we're using an array for ease of processing. It all in all matches the logic already used, but switching the modified post query to output an array means we're able to match the data types.
Since the posts are deleted, we can't pull a WP_Post object without mocking up one, but we can change the query return to use arrays. In effect, I'm looking and using the least common denominator.
Doing this to me seems like a more direct route of combining stored deleted post identifiers and a query return. While we could also just store all modified posts, that introduces more possibilities of race conditions and creates a second, less reliable source of truth on what posts have been modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, perhaps we should mention it in the commit message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll be sure to include that in the merge commit.
Just thinking out loud about some edge cases here @kraftbj ... what happens if there are a very large number of posts deleted? Like perhaps a site is trimming its archive or something and deletes 10s of 1000s of posts? I'm asking in the sense of "would anything bad happen" but also in the sense of would the plugin work under those conditions and regenerate the right sitemaps? Otherwise this looks like a good approach to me. |
515e321
to
8df6975
Compare
When a post is deleted, the post ID and date is added to an option which is used to augment the existing process of seeking sitemaps to regenerate based on a post's last modified time. In this instance, we are opting for keeping the post_modified process since it requires less processing as posts are moved—no added filters or functions during that action.
Avoids potential situations where post actions occur exactly during processing being left out of future operations.
8df6975
to
c4e97f4
Compare
Previously, there was an array item for each post deleted. Since the purpose of tracking deleted posts is to know which daily sitemaps to regenerate, we only need to keep one item per day. On sites with a large number of posts, the previous behavior could result in an incredibly large option being saved with excessive number of db writes.
@mattoperry That's a good point. I reworked it a little in c3fccc1 . Previously if we had 10,000 posts deleted covering 30 days, we would have a 10k item array to inform us of 30 days we needed to regenerate. The revised approach only keeps one array item per date, keeping the option down to 30 under this scenario. The solution we have right now is 5.5+. What is VIP's policy on doing something like a Composer dependency or just including a library? https://github.com/ramsey/array_column provides a backfill and is what WP-CLI uses to polyfill |
Opted to not polyfill and have a suboptimal experience pre-5.5. MSM should be declared "5.5+ recommended", but will still function on 5.2+ |
f2bd76b
to
c3fccc1
Compare
Fixes #112
When a post is deleted, the post ID and date is added to an option which is used to augment the existing process of seeking sitemaps to regenerate based on a post's last modified time.
In this instance, we are opting for keeping the post_modified process since it requires less processing as posts are moved—no added filters or functions during that action.
Additionally, this PR adds a specified start time in a couple of places to remove the possibility that a post is modified in the short time between when processing started and when processing ended on rebuilding for modified posts.