Add CE to Eng handover docs #1521

dadlerj · 2020-09-04T21:18:46Z

Given the recent discussion, I wanted to get this written down. This is a WIP, living document that will likely keep changing for some time, especially as the roles and responsibilities of the CE team become clearer. However, I thought it would be beneficial to document the "current state", even if it's not long-lived.

I only added one "new" thing to the process here: a priority tagging system that CE can use to communicate priority to Eng. Curious to hear your thoughts.

CC @nicksnyder @christinelovett @tistru for visibility

nicksnyder · 2020-09-04T21:26:44Z

handbook/ce/ce_to_eng_handover.md

+
+**Engineering should only feel a responsibility to get involved if tagged in by CE.**
+
+**However, once someone is "assigned" to a ticket (whether formally, or they informally take over the conversation), it is up to them to either (1) see the issue through to resolution or (2) assign a new owner.**


Can we change the wording so it is clear that if someone "informally" takes over the conversation, then they should be formally assigned the ticket? I want to avoid grey areas on who is responsible.

nicksnyder · 2020-09-04T21:28:19Z

handbook/ce/ce_to_eng_handover.md

+1. If the issue is clearly a bug or a feature request (rather than a question that can be clarified or answered on the spot), [the CE will file or add on to a GitHub issue](customer_issues.md).
+1. The CE will add a prioritization label to the issue, from `user/p0` to `user/p4`, based on a combination of (1) the severity of the issue, and (2) the prioritization of the reporting company. These labels mean the following:
+  1. `user/p0`: The issue results in the company's Sourcegraph instance being unusable and the company is a [Tier 1 prospect or customer](../sales/index.md#segmentation).
+  1. `user/p1`: The issue results in partial loss of functionality or serious disruption and the company is a [Tier 1 or Tier 2 prospect or customer](../sales/index.md#segmentation).
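Read together, the two definitions in this hunk act as a decision rule on (severity, company tier). The Python sketch below encodes only `user/p0` and `user/p1`, since those are the only levels quoted here. `Severity`, `priority_label`, and the integer `tier` argument are hypothetical names, not existing tooling. Treating an unusable instance as also counting as a "serious disruption" is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical ordering of the severities named in the p0/p1 definitions;
    # a higher value means more severe.
    OTHER = 0
    SERIOUS_DISRUPTION = 1  # partial loss of functionality or serious disruption
    UNUSABLE = 2            # the company's Sourcegraph instance is unusable


def priority_label(severity: Severity, tier: int) -> Optional[str]:
    """Label implied by the user/p0 and user/p1 definitions, or None if neither applies.

    tier: the company's segmentation tier (1 = Tier 1, 2 = Tier 2, ...).
    Assumption: an unusable instance also counts as a "serious disruption".
    """
    if severity >= Severity.UNUSABLE and tier == 1:
        return "user/p0"
    if severity >= Severity.SERIOUS_DISRUPTION and tier in (1, 2):
        return "user/p1"
    # user/p2..user/p4 are not defined in this hunk, so the CE decides.
    return None


if __name__ == "__main__":
    print(priority_label(Severity.UNUSABLE, 1))  # user/p0
    print(priority_label(Severity.UNUSABLE, 2))  # user/p1
    print(priority_label(Severity.OTHER, 1))     # None
```

Under this reading, a Tier 2 company tops out at `user/p1`.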


So a tier 2 customer (and below) can never cause a p0?

These are quick and dirty definitions, though, and Julia needs to weigh in. What the eng team is supposed to do when a p0 lands is also not defined here, so Gonza should weigh in on that side! This is a first attempt at giving new CE team members guidance, based on how I currently think about communicating prioritization internally.

I believe this list is an extension or update to the one described here https://github.com/sourcegraph/about/blob/main/handbook/ce/support.md#slas.

Here is a draft idea for how we could handle this; I would PR this to the incident document:

p0: All hands on deck; notify on #dev-ops. The incident is the highest priority for all engineering teams, who should drop other tasks to work on it as required by the incident owner (not all at once).

p1: An engineer should prioritize this over all other tasks and transfer it to other teams as required.

p2: We will resolve this on a best-effort basis while we continue to work on our planned release.

p3: We should consider prioritizing this over other planned work, or schedule it for our next iteration.

p4: We will evaluate this along with the other features being planned for a future release.
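One way to keep these draft responses next to the labels CE applies is a plain lookup table. This is a minimal Python sketch, assuming the `user/pN` label names from the handover doc; `RESPONSES` and `expected_response` are hypothetical names, and the response text paraphrases the draft above rather than any adopted policy.

```python
from typing import Optional

# Draft engineering response per label, paraphrased from the p0-p4 proposal.
# This is a discussion draft, not adopted policy.
RESPONSES = {
    "user/p0": "All hands on deck: notify #dev-ops; teams drop other work as the incident owner requires.",
    "user/p1": "An engineer prioritizes this over all other tasks and transfers it to other teams as required.",
    "user/p2": "Resolve on a best-effort basis while continuing work on the planned release.",
    "user/p3": "Consider prioritizing over other planned work, or schedule it for the next iteration.",
    "user/p4": "Evaluate alongside other features being planned for a future release.",
}


def expected_response(label: str) -> Optional[str]:
    """Return the draft response for a user/p0..user/p4 label, or None for any other label."""
    return RESPONSES.get(label)


if __name__ == "__main__":
    print(expected_response("user/p2"))
    print(expected_response("bug"))  # None: not a priority label
```

Keeping the table in one place means that if the wording changes in the incident document, only one copy needs updating.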

I would also suggest extending our incident response process to include some ideas from Google and PagerDuty.

In particular, the different roles for p0 incidents (PagerDuty, Google), and the idea I mentioned of having a /on-call teamX, which I found perfectly described here.

The incident response page by PagerDuty is a great resource for general ideas and guidelines.

This is a great and necessary convo, and the goal of this initial PR is to document how we do this today. So for now I support @dadlerj's documented process as a starting point, since this is how he's been operating. A clear next step is to iterate on it based on @pecigonzalo's suggestions.

Thanks Julia! Agreed with that approach.

To be transparent, I'm not directly using this methodology yet, but when I started documenting my process, this is roughly how I found I think about communicating the priority of issues. This just adds an official tag to it (versus my handling it in a one-off way). I'll merge this now and get feedback from Tion and Christine once it's live.

@pecigonzalo thanks for the feedback. A few notes:

> I believe this list is an extension or update to the one described here https://github.com/sourcegraph/about/blob/main/handbook/ce/support.md#slas.

It's actually quite different... That page reflects the promises we make to customers (i.e., the minimum service level required per our contract), while this list reflects our internal prioritization. There clearly is an intimate connection between the two, but they're not quite the same in terms of what they specify and how we want to describe them (and never quite will be).

> Here is a draft idea for how we could handle this; I would PR this to the incident document:

This is a great conversation ("what do we do about an issue that CE has described as a given level of seriousness?"), but it is slightly different from what this doc adds. I'd like to start with our definition of how serious an issue is, and you (and Julia and others, of course) can nail down how that translates into the "so what".

> It's actually quite different... That page reflects the promises we make to customers (i.e., the minimum service level required per our contract), while this list reflects our internal prioritization. There clearly is an intimate connection between the two, but they're not quite the same in terms of what they specify and how we want to describe them (and never quite will be).

I disagree; they are not that different, given that the agreed response and resolution times are directly tied to the priority communicated as ce/pX and to how we respond to that priority internally. The intended audiences differ, but even the descriptions on both pages are worded quite similarly; this document is just more detailed because it has more levels.
We could reflect this with a single matrix, by linking the two, or simply by using the same names across both pages while holding different information suited to each audience.

pecigonzalo

Thanks for taking on this initial draft, Dan. I will start working on the incident response updates and link back here from the PR.

pecigonzalo · 2020-09-07T14:12:33Z

handbook/ce/ce_to_eng_handover.md

+Exceptions to the principles above:
+
+- This does not apply to our public GitHub issue tracker; instead, it only applies to [official support channels](support.md). Issues filed in GitHub are the responsibility of Engineering and/or Product.
+- Certain customers pay for dedicated support from a member of the Engineering team. Responding to issues filed by these customers is a shared responsibility for the assigned Engineer and CE (whoever sees it first).


We should add a link to the document that lists these customers.
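The principle and its exceptions in this hunk amount to a small routing rule for who responds. A minimal Python sketch, assuming a hypothetical `Report` record with a `has_dedicated_engineer` flag (how that flag would be looked up is exactly the missing link requested above); none of these names come from existing tooling, and checking the public GitHub case first is an assumption about precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    PUBLIC_GITHUB = "public GitHub issue tracker"
    OFFICIAL_SUPPORT = "official support channel"


@dataclass
class Report:
    channel: Channel
    # Hypothetical flag: the customer pays for dedicated support from an Engineer.
    has_dedicated_engineer: bool = False


def first_responder(report: Report) -> str:
    """Who is responsible for responding, per the principles and exceptions quoted above."""
    # Assumed precedence: the public tracker is outside this process entirely.
    if report.channel is Channel.PUBLIC_GITHUB:
        return "Engineering and/or Product"
    if report.has_dedicated_engineer:
        return "assigned Engineer and CE (whoever sees it first)"
    # Default: CE owns it; Engineering gets involved only when tagged in by CE.
    return "CE"


if __name__ == "__main__":
    print(first_responder(Report(Channel.OFFICIAL_SUPPORT)))  # CE
```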

pecigonzalo · 2020-09-07T14:58:45Z

handbook/ce/ce_to_eng_handover.md

+1. If the issue is clearly a bug or a feature request (rather than a question that can be clarified or answered on the spot), [the CE will file or add on to a GitHub issue](customer_issues.md).
+1. The CE will add a prioritization label to the issue, from `user/p0` to `user/p4`, based on a combination of (1) the severity of the issue, and (2) the prioritization of the reporting company. These labels mean the following:
+  1. `user/p0`: The issue results in the company's Sourcegraph instance being unusable and the company is a [Tier 1 prospect or customer](../sales/index.md#segmentation).
+  1. `user/p1`: The issue results in partial loss of functionality or serious disruption and the company is a [Tier 1 or Tier 2 prospect or customer](../sales/index.md#segmentation).


juliasourceress · 2020-09-29T22:03:27Z

handbook/ce/ce_to_eng_handover.md

+
+- This does not apply to our public GitHub issue tracker; instead, it only applies to [official support channels](support.md). Issues filed in GitHub are the responsibility of Engineering and/or Product.
+- Certain customers pay for dedicated support from a member of the Engineering team. Responding to issues filed by these customers is a shared responsibility for the assigned Engineer and CE (whoever sees it first).
+- If an engineer sees a new question or issue come in from a company that they've already been introduced to, or if the question is in their direct area of expertise, they are encouraged to jump in directly.


An engineer should always check in with a CE before replying to a new thread or ticket that isn't part of a conversation they're already having. If it's an existing thread, the engineer should triage it and either continue responding or let the CE owner know that, although it's the same ticket or thread, it's a new request that they'd like to hand back to the CE for ownership.

juliasourceress · 2020-09-29T22:10:30Z

handbook/ce/ce_to_eng_handover.md

+
+## Engineering responsibilities
+
+1. If an Engineer agrees to take on an issue or a ticket, they must be willing to follow-through on the problem until it is addressed. If they are not willing or able to do so, they must notify the CE as soon as possible so someone else can be assigned.


Alternatively, they can find someone else to tap in and let the CE know.

tistru · 2020-09-30T18:12:24Z

Hi @dadlerj. Great work putting this together; I believe it will be really helpful. I don't have any feedback on the current version of the document. I understand we will iterate on it, but I am perfectly fine operating under these guidelines going forward :)

dadlerj requested review from pecigonzalo and juliasourceress September 4, 2020 21:18

dadlerj requested a review from sqs as a code owner September 4, 2020 21:18

nicksnyder reviewed Sep 4, 2020

dadlerj mentioned this pull request Sep 4, 2020

Update support docs #1522

Merged

sqs changed the base branch from master to main September 5, 2020 04:37

pecigonzalo reviewed Sep 7, 2020

pecigonzalo mentioned this pull request Sep 14, 2020

Distribution 3.20 Tracking issue sourcegraph/sourcegraph-public-snapshot#12836

Closed

49 tasks

juliasourceress reviewed Sep 29, 2020

juliasourceress approved these changes Sep 29, 2020

dadlerj added 4 commits September 29, 2020 22:13

Add CE to Eng handover docs · e31467f
Add to index · 721624e
Update · 1f6b1e9
Tweaks for comments · 5555cfa

dadlerj force-pushed the ce-eng branch from 33cea9c to 5555cfa September 30, 2020 05:29

Tweak · 1f1f4f8

dadlerj merged commit 8605d07 into main Sep 30, 2020

dadlerj deleted the ce-eng branch September 30, 2020 05:36
