Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync: Full Sync: Send immediately. #13963

Merged
merged 126 commits into from
Nov 29, 2019
Merged

Sync: Full Sync: Send immediately. #13963

merged 126 commits into from
Nov 29, 2019

Conversation

gravityrail
Copy link
Contributor

@gravityrail gravityrail commented Nov 5, 2019

Full Sync has until this PR relied on the "full_sync" "queue" to organise and keep track of which objects have been sent. The queue mechanism has its benefits but it is also complex and prone for errors.
This PR intends to introduce a simpler way of "full syncing" the content of a site.

Changes proposed in this Pull Request:

New Full Sync immediately mode.

Testing instructions:

  • turn on the new queueless sync by adding a filter to your site like this:
	foreach ( $modules as $key => $module ) {
		if ( in_array( $module, [ 'Automattic\\Jetpack\\Sync\\Modules\\Full_Sync', 'Jetpack_Sync_Modules_Full_Sync' ] ) ) {
			$modules[ $key ] = 'Automattic\\Jetpack\\Sync\\Modules\\Full_Sync_Immediately';
		}
	}
	return $modules;
}

add_filter( 'jetpack_sync_modules', 'jetpack_full_sync_immediately_on' );
  • play with Full Sync, if possible Full Sync against your wpcom sandbox and tail for errors on both sides.

@gravityrail gravityrail requested a review from a team November 5, 2019 17:38
@gravityrail gravityrail requested a review from a team as a code owner November 5, 2019 17:38
@github-actions github-actions bot added the [Status] Needs Package Release This PR made changes to a package. Let's update that package now. label Nov 5, 2019
@jetpackbot
Copy link

jetpackbot commented Nov 5, 2019

Warnings
⚠️

pre-commit hook was skipped for one or more commits

⚠️ "Proposed changelog entry" is missing for this PR. Please include any meaningful changes

This is an automated check which relies on PULL_REQUEST_TEMPLATE. We encourage you to follow that template as it helps Jetpack maintainers do their job. If you think 'Testing instructions' or 'Proposed changelog entry' are not needed for your PR - please explain why you think so. Thanks for cooperation 🤖

Generated by 🚫 dangerJS against 3461fa2

@lezama lezama force-pushed the try/full-sync-send-immediately branch from bbd0630 to 2d3811f Compare November 6, 2019 17:00
@lezama lezama force-pushed the try/full-sync-send-immediately branch from 2b8d913 to 49ee683 Compare November 7, 2019 14:46
@lezama lezama changed the title full sync send immediately Sync: full sync send immediately Nov 7, 2019
@kraftbj kraftbj force-pushed the try/full-sync-send-immediately branch from bbcd97f to 7f537d5 Compare November 7, 2019 15:37
@lezama lezama requested a review from mdbitz November 7, 2019 19:56
@lezama lezama changed the title Sync: full sync send immediately Sync: Full Sync: Send immediately. Nov 7, 2019
packages/sync/src/modules/Module.php Outdated Show resolved Hide resolved
packages/sync/src/modules/Full_Sync.php Outdated Show resolved Hide resolved
@mdbitz
Copy link
Contributor

mdbitz commented Nov 7, 2019

Miguel / Dan, I'll run through the code tomorrow AM. Conceptionally we are bypassing the queue so on shutdown or a request to pull content we are using the current state to build the chunks and send that data package. Do we maintain a checkout/checkin approach to verify the data sent is processed as well or are there any checks built in to make sure that data is being processed.

json-endpoints/jetpack/json-api-jetpack-endpoints.php Outdated Show resolved Hide resolved
class.jetpack-cli.php Outdated Show resolved Hide resolved
packages/sync/src/Listener.php Outdated Show resolved Hide resolved
packages/sync/src/Settings.php Outdated Show resolved Hide resolved
packages/sync/src/modules/Callables.php Outdated Show resolved Hide resolved
packages/sync/src/modules/Full_Sync.php Outdated Show resolved Hide resolved
packages/sync/src/modules/Full_Sync.php Outdated Show resolved Hide resolved
@gravityrail
Copy link
Contributor Author

I have a concern that this patch adds a lot of complexity. For example, we have a lot of duplication between send_ids... and enqueue_ids...

I think this is really valuable, but maybe could use some refactoring before we merge. What do you think? Or maybe we say that if this works, we're going to switch completely to the new technique and so refactoring would be a waste of time and we'll just end up deleting the old code later.

@lezama
Copy link
Contributor

lezama commented Nov 8, 2019

Or maybe we say that if this works, we're going to switch completely to the new technique and so refactoring would be a waste of time and we'll just end up deleting the old code later.

☝️

@lezama
Copy link
Contributor

lezama commented Nov 8, 2019

@gravityrail what do you think about merging to master while making the new way the default? and ask friends for testing?

@lezama
Copy link
Contributor

lezama commented Nov 8, 2019

Miguel / Dan, I'll run through the code tomorrow AM. Conceptionally we are bypassing the queue so on shutdown or a request to pull content we are using the current state to build the chunks and send that data package.

Correct. 😁

Do we maintain a checkout/checkin approach to verify the data sent is processed as well or are there any checks built in to make sure that data is being processed.

In this new mode we are sending one chunk/item at a time, if it fails we don't update the status, so we retry in the next attempt. It should give the same results as the "queue mode", but I think it gives us room to go more granular and maybe retry per object instead of per chunk.

@lezama
Copy link
Contributor

lezama commented Nov 8, 2019

TODO: update the checkout endpoint.

@gravityrail
Copy link
Contributor Author

@gravityrail what do you think about merging to master while making the new way the default? and ask friends for testing?

I would like to see a couple of really big sites sync using the new setting before I'm willing to make it the default in master. We are not completely certain of the approach until that happens.

@gravityrail
Copy link
Contributor Author

@lezama can you please add a parameter to the sync API request so that we know when we receive it whether this is a queued sync request or an immediate sync request?

We may end up changing the behaviour on the WPCOM side when it's an immediate sync request, e.g. putting the request into a queue to be processed in the background. In this way, we can provide the benefits of both async and queue-less sending.

An unrelated concern: What do you think about 100 posts as the upper limit? Might we face some memory issues doing that, with sites that have lots of large posts? How can we detect that we have hit a memory limit and perhaps back off? Should the number of rows we're sending be a setting the DB?

@lezama
Copy link
Contributor

lezama commented Nov 8, 2019

Should the number of rows we're sending be a setting the DB?

I think it should be a setting per module 👍

@lezama
Copy link
Contributor

lezama commented Nov 10, 2019

@lezama can you please add a parameter to the sync API request so that we know when we receive it whether this is a queued sync request or an immediate sync request

they already have queue=immediate-send 👍

@lezama lezama force-pushed the try/full-sync-send-immediately branch from 3dd4150 to bcaac91 Compare November 11, 2019 18:10
@mdbitz
Copy link
Contributor

mdbitz commented Nov 13, 2019

@lezama how do you feel about stability of the above?

Is it at a place where I can apply this PR to the latest JP release on VIP and do some Full Sync tests? If so i'll work to get testing going so we can get some data.

@lezama lezama force-pushed the try/full-sync-send-immediately branch from 69933ce to d3ce760 Compare November 28, 2019 16:01
@zinigor
Copy link
Member

zinigor commented Nov 29, 2019

Testing right now by updating my site, sandboxing it to a wpcom sandbox that's up to date with current SVN, installing the snippet and scheduling a Full Sync using the JP debugger. I'm getting these errors on the sandbox side:

[29-Nov-2019 10:53:26 UTC] WordPress database error Duplicate entry '3027-13' for key 'PRIMARY' for query INSERT INTO `wp_102812540_term_relationships` (`object_id`, `term_taxonomy_id`) VALUES (3027, 13) made by wp_set_object_terms from SANDBOXURL jetpack.wordpress.com/xmlrpc.php?sync=1&codec=deflate-json-array&timestamp=1575024706.176&queue=immediate-send&home=https%3A%2F%2Flousy.site&siteurl=https%3A%2F%2Flousy.site&cd=0.0000&pd=0.0281&idc=1&timeout=30&for=jetpack&wpcom_blog_id=102812540

What should I try? Is it my site, or is it the patch?

Edit: thanks to @david-binda 's help we figured out that the problem was caused by several jobs running simultaneously.

@lezama
Copy link
Contributor

lezama commented Nov 29, 2019

@zinigor seems unrelated with this PR, I have also seen those randomly, can you reproduce?

@lezama lezama merged commit 01b62d9 into master Nov 29, 2019
@lezama lezama deleted the try/full-sync-send-immediately branch November 29, 2019 15:42
@matticbot matticbot added [Status] Needs Changelog and removed [Status] Needs Review To request a review from fellow Jetpack developers. Label will be renamed soon. labels Nov 29, 2019
@jeherve jeherve removed the [Status] Needs Package Release This PR made changes to a package. Let's update that package now. label Dec 2, 2019
@jeherve
Copy link
Member

jeherve commented Dec 2, 2019

Cherry-picked to branch-8.sync in b150939

jeherve added a commit that referenced this pull request Dec 13, 2019
zinigor added a commit that referenced this pull request Dec 30, 2019
* Changelog: 8.1 additions

* Changelog: add #13858

* Changelog: add #13963

* Changelog: add #14174

* Changelog: add #14178

* Changelog: add #14175

* Changelog: add #14192

* Changelog: add #14196

* Changelog: add #14182

* Changelog: add #14218

* Changelog: add #14214

* Changelog: add #13757

* Changelog: add #14190

* Changelog: add #14131

* Changelog: add #14101

* Changelog: add #14203

* Changelog: add #14211

* Changelog: add #14224

* Changelog: add #14230

* Changelog: add #14241

* Changelog: add #14249

* Changelog: add #14264

* Changelog: add #14263

* Changelog: add #14256

* Changelog: add #10189

* Changelog: add #14240

* Changelog: add #14239

Also added some new entries to the testing file.

Co-authored-by: Igor Zinovyev <[email protected]>
zinigor added a commit that referenced this pull request Dec 30, 2019
* Changelog: 8.1 additions

* Changelog: add #13858

* Changelog: add #13963

* Changelog: add #14174

* Changelog: add #14178

* Changelog: add #14175

* Changelog: add #14192

* Changelog: add #14196

* Changelog: add #14182

* Changelog: add #14218

* Changelog: add #14214

* Changelog: add #13757

* Changelog: add #14190

* Changelog: add #14131

* Changelog: add #14101

* Changelog: add #14203

* Changelog: add #14211

* Changelog: add #14224

* Changelog: add #14230

* Changelog: add #14241

* Changelog: add #14249

* Changelog: add #14264

* Changelog: add #14263

* Changelog: add #14256

* Changelog: add #10189

* Changelog: add #14240

* Changelog: add #14239

Also added some new entries to the testing file.

Co-authored-by: Igor Zinovyev <[email protected]>
roccotripaldi pushed a commit that referenced this pull request Jan 15, 2020
The new full sync pattern introduced in #13963 should now be the default way to perform a full sync. We will still support the original full sync by allow sites to use a filter.
roccotripaldi added a commit that referenced this pull request Jan 16, 2020
* Sync Package: use Full_Sync_Immediately by default

The new full sync pattern introduced in #13963 should now be the default way to perform a full sync. We will still support the original full sync by allow sites to use a filter.

* [not verified] Sync Unit Tests

Use the legacy full sync module for default tests.
@jeherve jeherve removed the [Status] Needs Testing We need to add this change to the testing call for this month's release label Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.