Distribution 3.21 Tracking issue #13675
Comments
This week
I spent most of my time providing support internally and to customers, much more than usual. I did not make much headway against my planned work, but did add issues to this milestone for everything that came up. For customers, I provided extensive resource allocation advice to two major customers, and followed up extensively on ~7-8 more medium-sized customer issues before ultimately passing them off to other individuals or teams in order to reduce the number assigned to me. Internally, I created a dev/testing managed instance and shared knowledge of managed instances with the rest of the team in the form of updated docs, a recorded screencast, and improved tooling. I investigated ops issues with sourcegraph.com and multiple dev deployments with the team. My 1:1s ran much longer than usual, leading to longer-form ongoing conversations. I also wrote a high-level progress summary on the Dhall work.
Next week
I am hoping to be more heads-down and make substantial headway against my planned work, but acknowledge I have many more extensive conversations ahead of me, which will be time-consuming. Focus is key.
|
Last week
This was an extra short week for me because I took one of the mental health day things. I got k8s.sgdev.org running smoothly, and helped a bit with migrating campaigns over from the old deployment. During this I found that the deploy-sourcegraph overlay for namespaces wasn't set up for cAdvisor, so I made a PR to add one and to try to improve the docs around that a bit. Also found and fixed a bug in prom-wrapper that was causing custom alert usernames to not be set correctly.
This week
I'm a little behind on getting started with 3.21 stuff, so I'll be spending extra time this week to make up for that. I'll also ping #dev-chat to ask for objections about spinning down the old k8s.sgdev.org and go ahead and do that.
|
This week
Deployed demo.sourcegraph.com (the last step is awaiting #ce followup), and made some docs updates for managed instances while at it. Opened up a couple of PRs related to 1-day releases and reducing the steps required there. Discussed the future of Cloud deployment in this thread and RFC 239.
Next week
Find out who to ping for review for release-tool PRs (would still like one for https://github.com/sourcegraph/sourcegraph/pull/14240) and use that to start working out the rest of the tasks I've picked up for the 1-day releases project. Given the frequency of requests for clarification regarding Cloud deployments, would also like to help @daxmc99 if possible with polishing up RFC 239.
|
Last week
e2e is now running in a non-blocking capacity on main, which I hope is now just a case of ironing out the last few bugs with some help from web (I am confident in the infra and base image setup now). Helped out with a security scare, and the rest of my time was spent helping out on a big customer issue. Also a quick quality of life PR to manage AWS service accounts with Terraform. A bit of other troubleshooting here and there.
Next week
Finish e2e with the help of the web team. I am also going to sync with Uwe around regression testing to see how different the two test suites are, and what effort is required to get regression testing into a pipeline as well. I predict some significant time spent helping on customer issues too.
|
This week
Was sick from Sat through Thu. On Friday I spent 90% of my time catching up on things, and did other minor work like adjusting 1password permissions for managed instances, helping to debug one customer issue, and investigating critical alerts at https://app.hubspot.com/contacts/2762526/company/407948923/
Next week
Hoping to get to what I did not this week, i.e. heads-down on my planned work with >=50% of my time.
|
this week
one quality of life issue (https://github.com/sourcegraph/sourcegraph/issues/13191) done, one dhall issue almost done (https://github.com/sourcegraph/sourcegraph/issues/14133), pitched in on token rotation and had debug sessions for a customer issue
next week
all the stars will align and i will work on dhall code
|
Last week
I mostly worked on the GCP Split project: deleted BigCluster and cleaned up its disks, deleted Megakube, and moved Tooling resources to the Dogfood cluster as they are used there (Phabricator, GHE, Bitbucket, Gitolite). This included porting a bunch of infrastructure to Terraform.
This week
Finish the Tooling cluster/resources move cleanup and update any relevant documentation. I need to switch back to updating our long-term goals, integrating the roadmap provided by Stephen into our goals, and finishing the Distribution growth PR.
|
This week
I played catch-up on PRs, reviews, etc. after being out sick last week. I followed up on minor tasks, like setting up demo.sourcegraph.com with Robert and restructuring our 1password vaults. I had lots of 1:1 / career growth discussions, etc. I then began to hammer out my actual planned work, removing non-OSS syntax highlighting languages and creating a super extensive/tedious license report on syntect_server and dealing with some update pains/segfaults there. To finish off my week, I took a deep dive into the QA (formerly "e2e regression") test suite and pulled in others to help address 3 release blockers I identified in the process.
Next week
We are seeing lots of QA test suite failures, some of which look like real release-blocking regressions. I will be isolating those, filing issues, and pulling in more people to fix them. At the same time, I will be focused on 3.22 planning and working with Dave and Uwe to improve QA test suite reliability.
|
Last week
Some small contributions to the CNCF repopage project: blackbox, CSS change to the logo. Landed improvements to changelog automation, deploy-sourcegraph release automation, and general release step reductions and dry-runs for the release tool. Added support for regex silencing in
This week
Main thing I have in mind this week is to keep an eye on the release process and see if any of the changes need clarification/improvement.
|
@uwedeportivo could you add which components? |
last week
this week Hopefully customer issues will settle down and we can focus on internal issues. Last week Uwe ran us through e2e/regression testing and I gained a lot of insight into which failures are infra-related vs. caused by the tests themselves. This week the plan will be to get as much running as we can, then identify which issues are for other teams to fix. |
This week
A lot of conversations: changing my direction/focus, interviewing candidates, syncing with https://app.hubspot.com/contacts/2762526/company/407948923/ (alerts, upgrades, etc.) and https://app.hubspot.com/contacts/2762526/company/557692805/ (search, stability), syncing with Christina about state of product & opportunities. A fair amount of time spent heads-down trying to debug/improve QA tests, but with few results. It's been hard for me to make progress here with lots of interruptions throughout my day and the test suite itself being so dang confusing (but also quite extensive). I caught up with Uwe and did some pairing on it with him. Wanting to feel as though I made some progress other than just conversations, I switched away from QA tests mid-Thur and put my thoughts/questions around Cloud on paper, documented when to introduce new services, and merged some updates from Rob and Rijnard to improve syntax highlighting colors + add back GraphQL support.
Next week
Focus, get more heads-down time on QA tests and push the release through ASAP with Uwe and Dave.
|
This week
Some last-minute tweaks and adjustments to the release process for 3.21 (both on the release tool and the checklist), debugged the deploy-sourcegraph CI pipeline, and reviewed some monitoring PRs after noticing some flaky critical alerts on k8s.sgdev.org.
Next week
Keep tabs on the release process, start exploring other parts of the release pipeline (e2e, etc.) and the possibilities there. Will also be exploring our options with the upcoming deployment UX project meeting. Am also adjusting my work schedule a bit, but no major changes to meeting availability for the most part.
|
this week
Chased down a couple of issues with a big customer (https://github.com/sourcegraph/customer/issues/111, disk space distribution of index space, https://github.com/sourcegraph/customer/issues/116). Pitched in on release process by running regression tests and fixing them up. My Dhall language proposal hit a roadblock (dhall-lang/dhall-lang#1081) :-). Still working on Dhall components, progress has not been as fast as I would like.
next week
Getting 3.21 out the door is the priority for the beginning of the week. Afterwards I will probably go on vacation.
|
Last week
Fighting fires with Uwe on a large customer (sourcegraph/customer#111) and really battling with regression tests. Uwe and Stephen have been a big help in digging through some of this with me. I have the infrastructure in a good working state, with automation now to set up the Sourcegraph instance prior to running the tests. I am still confused as to why things don't work consistently between environments, and why some tests need to be run twice in order to work.
Next week
Top priority will be to get 3.21 released, however the regression tests are run (local or in CI). After that, a write-up that really identifies where the gaps are, what is broken, and what can be automated.
|
Last Week Vacation 🌴 🚵♂️ This Week Finish up remaining Cloud SQL work https://github.com/sourcegraph/sourcegraph/issues/11496, |
Plan
Support new and existing deployments
This is an ongoing expense; we anticipate this taking no more than 10d of work spread across the entire team.
Support Security in deploying a log analysis tool
Security is planning to deploy a centralized logging and analysis system and will require our assistance to set up and review this new infrastructure.
Implement 2+ sourcegraph.com services using dhall
sourcegraph.com sees the highest number of Kubernetes changes out of all of our deployments + deploy-sourcegraph. Scoping a single component limits the customizations that we need to implement and makes it easier to onboard other engineers.
Releases are created in a single day
We have a goal of reducing the time it takes to create releases. The current several-day system has encouraged us to view releases as “baked” rather than “snapshots of the main branch”, leading to situations where main is broken and we have to retrospectively fix it or add last-minute features.
Split infrastructure into separate GCP projects
GCP uses project-wide roles and permissions. To ensure resources are isolated from each other and to reduce the blast radius of changes, we should split resources into separate projects. Additionally, this will give us more insight into our infrastructure costs, which will become more important as our infrastructure grows.
Availability
Period is from September 20th to October 19th (21 working days). Please write the days you won't be working and the number of working days for the period.
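As a rough illustration of the arithmetic behind the working-day count, the sketch below counts Monday to Friday days in the period and subtracts days off. The year 2020 is an assumption (the text above gives only month and day), public holidays are ignored, and the function names are hypothetical.

```python
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-Friday days from start to end, inclusive (holidays ignored)."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


def available_days(start: date, end: date, days_off: int) -> int:
    """Working days left after subtracting the days a person won't be working."""
    return max(working_days(start, end) - days_off, 0)


# Assumption: the iteration falls in 2020; the year is not stated in the issue.
PERIOD_START = date(2020, 9, 20)
PERIOD_END = date(2020, 10, 19)

if __name__ == "__main__":
    print(working_days(PERIOD_START, PERIOD_END))
    print(available_days(PERIOD_START, PERIOD_END, days_off=3))
```

Under that assumed year, the count agrees with the 21 working days stated for the period.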
Tracked issues
@unassigned: 5.00d
Completed: 5.00d
(#13876) 5.00d
@bobheadxi: 8.50d
on-call: document actions to follow up on critical alerts (#1468)
Completed: 8.50d
(#13842)
(#13871) 2.00d
(#13792) 1.00d
(#13873) 2.00d
(#13872) 0.50d
yarn run release release:publish (#14242) 1.00d
(#13604) 1.00d
(#13875) 0.50d
(#13869) 0.50d
(#14623)
@davejrt
Run QA tests on bare-metal Buildkite agents on every commit to master (non-blocking) (#12340)
blackbox exporter & site 24/7 next steps (#13627) 🧶
sourcegraph/customer (#111) 👩
Completed
(#11717)
(#12339)
@daxmc99: 4.00d
explore making it easier to run Kubernetes cluster QA tests (or relax to just smoke tests) (#13878) 4.00d
@efritz
@ggilmore
dhall: use dhall on sourcegraph.com (#13340)
Completed
(#110) 👩
@pecigonzalo: 23.00d
blackbox exporter & site 24/7 next steps (#13627) 🧶
sourcegraph/customer (#108) 👩
Completed: 23.00d
(#13919) 1.00d
(#13920) 3.00d
(#13916) 2.00d
(#13918) 5.00d
(#13792) 1.00d
-tooling cluster from the production project (#13917; PRs: #1719) 3.00d
(#105) 8.00d 👩
@slimsag: 15.00d
sourcegraph/customer (#71) 👩
sourcegraph/customer (#49) 0.50d 👩
sourcegraph/customer (#97) 👩
Completed: 14.50d
sourcegraph/customer (#104) 👩
(#14075)
(#12339)
(#13933)
(#13868) 0.50d
(#11269) 1.00d 👩
Document when to introduce new services or not (#5487)
(#1769)
(#1221)
(#13880) 12.00d
(#14632) 1.00d
@uwedeportivo: 9.50d
sourcegraph.com: write bot to incorporate image tag updates in dhall pipeline (#14133) 1.50d
add gitserver to deploy-sourcegraph-dhall, with support for sourcegraph.com customizations (#14131) 4.00d
dhall: generate separate yaml files for each "component" instead of one large one (#13338) 2.00d
deploy-sourcegraph: restricted integration test fails with Kubernetes 1.16+ (#14728)
dhall: use dhall on sourcegraph.com (#13340)
Completed: 1.00d
(#13191) 1.00d 👩🎩
Legend