Skip to content
This repository has been archived by the owner on Dec 14, 2021. It is now read-only.

Handle SyncTelemetryPing in DataStore #870

Open
2 tasks
eliserichards opened this issue Aug 13, 2019 · 3 comments
Open
2 tasks

Handle SyncTelemetryPing in DataStore #870

eliserichards opened this issue Aug 13, 2019 · 3 comments
Labels
blocked effort-XS Expected to take less than a day for engineering to complete. feature-sync metrics

Comments

@eliserichards
Copy link
Contributor

Requires #865

#865 updated AS and A-C to the latest versions with all required Megazord changes. The sync call in DataStore.kt now returns a SyncTelemetryPing. This should be returned in the case of a successful completion.

Todo

  • open question: in which cases will we receive the SyncTelemetryPing? Only on a successful sync?
  • send SyncTelemetryPing as value or extra in all applicable telemetry events
@eliserichards eliserichards added feature-sync metrics effort-XS Expected to take less than a day for engineering to complete. labels Aug 13, 2019
@eliserichards eliserichards added this to the 1.2.0 - Q3 milestone Aug 13, 2019
@jhugman
Copy link
Contributor

jhugman commented Sep 9, 2019

@irrationalagent

a) do we still need this? especially considering #862— TL;DR: we found the missing sync_start events!— and that we're not able to add a sync id to the sync start, only successful sync_ends.
b) do we need one per SyncTelemetryPing? i.e. uid; or the one per SyncInfo?
c) where should this go in the event ping? as a value or an extra?

@irrationalagent
Copy link
Contributor

irrationalagent commented Sep 12, 2019

Sorry for the late response on this. I'm a bit out of the loop here, so bear with me.

it looks like you have the whole sync ping payload accessible here? if that's the case its going to be a pretty gnarly bit of json. I'm reasonably certain it will be too long to fit into the extra field of an event, and I'm not sure we would want to do that anyway (it def wont fit in the value field).

So, if you found the sync start events that would be a good start towards helping us understand frequency of syncing and its success rate, and depending on a couple things that might be all we need here. a few questions about that:

  1. should there be at most one one sync start event per sync_end/sync_timeout/sync_error event
  2. conversely: for every sync start, will there always be at least one corresponding sync_end/sync_timeout/sync_error event? if not, could we know with some certainty what a missing "sync end" event would mean?
  3. would sync_end fire only on success, or would it fire on failure/error as well? e.g. would we expect a sync error event to generate both sync_end and sync_error, or just the latter?

Without a sync_id we would need to be sure that the above conditions hold in order for the sync events to be useful in analysis. What I'm getting at - we need to ensure that we (somewhat) reliably have a one-to-one mapping of events indicating sync is starting to events indicating its done (whether successfully or not). If we had a sync_id, meeting these requirements would be less important.

IF we can ensure that ^ then I think we will at least have enough data to understand high-level sync success and failure rates. Of course, we still won't know why. some of that info might be in the sync ping, e.g. engine_failure_reason. But we would need to get that out somehow and into the existing sync_error events somehow, e.g. into their extra field.

@ddurst ddurst modified the milestones: 1.2.0 - Q3 , 2019 Q4 backlog Sep 13, 2019
@jhugman
Copy link
Contributor

jhugman commented Sep 16, 2019

@irrationalagent

My read of the code is that:

  • count(sync_start) == count(sync_end) + count(sync_error). If and only if exactly-one-(sync_end OR sync_error) is produced for each sync_start, unless the app is killed before sync stops.
  • we do not have a specific sync_timeout events. We have removed this as we were generating our own timeouts, not relying on the megazord to do it.
  • the sync_error has a value of unlocalized detail message of the exception.
  • we only get a SyncTelemetryPing on a successful sync (i.e. a sync_end). Otherwise we get a null.
  • we, i.e. lockwise, could generate a UUID for each sync start, and pair it with sync_end and sync_error. However, this is platform specific work. It isn't an insignificant amount of work: the telemetry for sync_start happens someway away from the actual sync, so their relative positions would have to be rethought.

Perhaps we should re-look at this once the next release has bedded in a little? (it's just gone out)

@jhugman jhugman removed their assignment Oct 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocked effort-XS Expected to take less than a day for engineering to complete. feature-sync metrics
Projects
None yet
Development

No branches or pull requests

4 participants