Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PVC] Prebuild without PVC, user A account with PVC and trigger rebuild, user B account without PVC #12718

Closed
Tracked by #7901
jenting opened this issue Sep 7, 2022 · 16 comments · Fixed by #13117
Closed
Tracked by #7901
Assignees
Labels
team: workspace Issue belongs to the Workspace team type: bug Something isn't working

Comments

@jenting
Copy link
Contributor

jenting commented Sep 7, 2022

Bug description

The prebuild project without PVC.

The user A account with PVC, add new branch X and commit, push branch X to the repo, trigger prebuild on Gitpod. The prebuild runs without PVC.
The user B account without PVC, open that branch X, the user B uses PVC.

Steps to reproduce

The prebuild projects without PVC.

  • The user A account with PVC.
  • The user B account without PVC.
  • User A adds new branch X and commit, pushes branch X to the repo, and triggers prebuild on Gitpod. The prebuild runs without PVC. 👍
  • User B opens branch X, the user B uses PVC (The prebuild project and the user's account are both without PVC).

Workspace affected

No response

Expected behavior

User B should open the workspace without PVC because the prebuild project is configured without PVC, also the user's account does not enable PVC.

Example repository

No response

Anything else?

#7901

@jenting jenting added the type: bug Something isn't working label Sep 7, 2022
@jenting jenting added the team: workspace Issue belongs to the Workspace team label Sep 7, 2022
@kylos101
Copy link
Contributor

kylos101 commented Sep 8, 2022

👋 @jenting I just tested this, and got same result. Here are supporting details:

project has pvc disabled, https://gitpod.io/t/prebuild-experiment/prebuild-experiment

user has PVC enabled, https://gitpod.io/admin/users/8df3495b-685d-46e0-9820-009cc3b4afd8

push a code change to a branch

prebuild will not use PVC, which is expected, because it's disabled on the project. Details:

node: headless-ws-us64-internal-xl-pool-nhjr
workspace: kylos101-prebuildexperi-5p1e6ngpkxf
pod: prebuild-880d87db-2e38-45e1-8ada-5bfa79f94210

Same user has PVC disabled now

PVC is still disabled on the project

A new workspace is started from the branch's prebuild, but has PVC. Details:

node: workspace-ws-us64-large-pvc-pool-kfwz
workspace: kylos101-prebuildexperi-biberrbv8vk
pod: ws-f3d2657c-4e47-4ee1-aca3-c557e6bcc9e3 

I think there are two contributing factors:

  1. We're sending the persistent_volume_claim feature flag when triggering prebuild
  2. We're persisting the persistent_volume_claim feature flag to the prebuild workspace

This query shows the feature flag is being persisted:

SELECT JSON_EXTRACT(dbw.config, "$._featureFlags") 
FROM gitpod.d_b_workspace dbw 
where dbw.id = 'kylos101-prebuildexperi-5p1e6ngpkxf';

If we avoid sending the persistent_volume_claim feature flag on the startWorkspace request for prebuild (unless it is enabled on the project), would that help? Or do you think it would be better to avoid persisting that particular feature flag to the DB and continue sending? Or something else? @sagor999 for 👀

Either way, this is a good catch, nice find! 💪

@sagor999 sagor999 self-assigned this Sep 8, 2022
@sagor999 sagor999 moved this to In Progress in 🌌 Workspace Team Sep 8, 2022
@jenting
Copy link
Contributor Author

jenting commented Sep 9, 2022

I think when open the workspace, we should check the prebuild project PVC enabled or the user's PVC enabled.

If one of the condition hits, we send start workspace request with feature flag persistent_volume_claim. Otherwise, the start workspace request must not contains feature flag persistent_volume_claim.

@kylos101
Copy link
Contributor

kylos101 commented Sep 9, 2022

👋 @jenting ,

I think when we open the workspace, we should check the prebuild project PVC enabled or the user's PVC enabled.

A couple questions:

  1. Do you mean prebuild workspace or regular workspace?
  2. Why check prebuild project PVC setting, or user PVC feature flag?

Considerations:

  1. the PVC setting on the project (we check this now, it controls if the prebuild runs with PVC or not)
  2. the PVC feature flag on the user (we check this now, it controls if new regular workspaces start with PVC or not)
  3. the feature flag persisted to the prebuild workspace config field in the database (if this is set, we should start regular workspaces with PVC, I'd argue the problem is we're wrongly setting this...especially given the scenario I shared ☝️, and to fix, maybe we have to avoid persisting it in the first place when the project has PVC disabled )
  4. the feature flag persisted to the regular workspace config field in the database (this should control whether a regular workspace restart uses PVC or not 🤞 )

It seems like the problem is, when we run prebuilds, if the startWorkspace request contains persistent_volume_claim, and the project has PVC disabled, we're persisting persistent_volume_claim to the workspace's config JSON object in the database. So, as a result, when we start regular workspaces from the prebuild, they use PVC (because the config object says it was used).

@jenting
Copy link
Contributor Author

jenting commented Sep 9, 2022

  1. Do you mean prebuild workspace or regular workspace?

regular workspace

  1. Why check prebuild project PVC setting, or user PVC feature flag?

Because if the project was built with PVC, then for any user who open the regular workspace from that prebuild will force to use PVC.

Because the user enables PVC, open workspace from prebuild (with PVC or without PVC) or from git clone, the regular workspace should use PVC.


From my point of view, we should save the persistent_volume_claim feature flag in DB (d_b_workspace?) if the prebuild workspace was built with the PVC. Otherwise, we do not save persistent_volume_claim feature flag in DB (d_b_workspace?).

Then, the start workspace request persistent_volumn_claim feature flag determine by

  • the prebuild workspace with PVC, add it.
  • the prebuild workspace without PVC:
    • the user account with PVC enabled, add it.
    • the user account without PVC enabled, do not add it.

@kylos101
Copy link
Contributor

kylos101 commented Sep 9, 2022

I like your idea, @jenting . My only concern with user feature flag in its current form, is that it is hard to roll out. For example, we have to set it via Admin page, or SQL query.

Ideally we'd set the persistent_volume_claim feature flag for regular workspace start from the server component using rules that we define in configcat. This way its not exclusively at the user level.

Checkout #12745 for the idea, should help us rollout in a more flexible way.

@sagor999 sagor999 moved this from In Progress to Scheduled in 🌌 Workspace Team Sep 12, 2022
@sagor999 sagor999 assigned sagor999 and unassigned sagor999 Sep 12, 2022
@sagor999 sagor999 moved this from Scheduled to In Progress in 🌌 Workspace Team Sep 14, 2022
@sagor999 sagor999 moved this from In Progress to Scheduled in 🌌 Workspace Team Sep 14, 2022
@sagor999 sagor999 moved this from Scheduled to In Progress in 🌌 Workspace Team Sep 14, 2022
@sagor999
Copy link
Contributor

sagor999 commented Sep 14, 2022

Hey @geropl
We need webapp help on this one.
tl;dr
starting prebuild via gitpod.io/#prebuild/
incorrectly (in my opinion) bundles prebuild and workspace together, specifically, feature flags from workspace are somehow getting coalesced with prebuild itself.

Here is a correct scenario:
User pushes a commit into repo
User triggers prebuild (or it gets triggered automatically).
Prebuild completes. All is good.

Broken scenario:
User pushes a commit into repo
User triggers prebuild via #prebuild/ url
It runs prebuild, and then immediately opens workspace from that prebuild.
Another user opens that prebuild, and feature flags from workspace get embedded into prebuild.

This is where it happens:

const prebuiltWorkspace = await this.findPrebuiltWorkspace(ctx, user, context, mode);

This seems to me like a bug, and we should not tangle prebuild and workspace together in that scenario, as we are leaking feature flags of a user into prebuild.

Can you provide insight? If it is indeed a bug, would you mind scheduling a fix for it?

@geropl
Copy link
Member

geropl commented Sep 16, 2022

Hey @sagor999, 👋

let me re-phrase, to make sure we're on the same page:

  • you see a prebuild configured by - and running with credentials from - user A
  • that prebuilds is using all features flags/config from user A (<-- incl. PVC flag)
  • when user B gets around and opens a workspace based on that prebuild, it breaks for them, because they cannot use the PVC workspace class

If that's a correct summary this is expected behavior. 😕 I don't say it's ideal, and we need to fix the hard prebuild <-> user coupling (see "domain model" discussion where having soon).

I'm not sure if we support the PVC -> old workspace backup path, but from the context I assume not. Then I see two ways:

  • we limit the PVC workspace flag to regular workspace
  • we allow "propagation" of the PVC flag, e.g. it becomes infectious (I guess that's what we don't want to do to not loose control over the rollout)

@sagor999
Copy link
Contributor

sagor999 commented Sep 16, 2022

Yeah, correct summary.
Actually, I don't think this is entirely correct.
If prebuild is triggered automatically, then we don't have that coupling.
It is only when prebuild is triggered via #prebuild url path.

@kylos101
Copy link
Contributor

kylos101 commented Sep 16, 2022

@geropl @sagor999

Given this block

if (workspace.type === "prebuild") {
if (pvcEnabledForPrebuilds === true) {
featureFlags = featureFlags.concat(["persistent_volume_claim"]);
} else {
// If PVC is disabled for prebuilds, we need to remove the PVC feature flag.
// This is necessary to ensure if user has PVC enabled on their account, that they
// will not hijack prebuild with PVC and make everyone who use this prebuild to auto enroll into PVC feature.
featureFlags = featureFlags.filter((f) => f !== "persistent_volume_claim");
}
}

It seems like the problem could be that we're passing pvcEnabledForPrebuilds as true in the options, when really, it's false for the project, but true the user has the persistent_volume_claim feature flag set.

I see options are set here:

But the only place where they are being set explicitly is here:

pvcEnabledForPrebuilds: usePVC,

And @sagor999 I assume you were testing with a project that had PVC disabled, so this should return false:

protected shouldUsePersistentVolumeClaim(project?: Project): boolean {

🤔 I am confused as to why we're persisting the PVC feature flag in this case. What am I missing?

@sagor999
Copy link
Contributor

@kylos101 no, this is not correct.
pvcEnabledForPrebuilds is set to true ONLY when project has PVC enabled.

So in our case prebuild is correctly ran without PVC (since project has it disabled), but then it would open workspace with PVC enabled since user has PVC flag enabled, and then that flag would get baked into prebuild somehow. And this only happens when running prebuild via #prebuild url.

@kylos101
Copy link
Contributor

👍 ah, okay, I was unsure if I had the right block/path/flow, thank you @sagor999 for clarifying! @sagor999 it might help to sync with @geropl or another on webapp team. Could you share a calendly link with Gero?

@jankeromnes
Copy link
Contributor

jankeromnes commented Sep 19, 2022

And this only happens when running prebuild via #prebuild url.

Whoops, the #prebuild URL prefix is deprecated and should be removed soon, because it bypasses pretty much every prebuild feature we've built in the past few years: #4353

Using that prefix is unsupported, and might result in undefined behavior / a broken experience.

The supported ways to trigger prebuilds are (by order of preference):

  1. Pushing a new commit to a branch with prebuilds enabled
  2. Clicking on the "Run Prebuild" button in Gitpod's UI on a failed or cancelled prebuild

@jenting
Copy link
Contributor Author

jenting commented Sep 19, 2022

🤔 but I remember my reproduce steps are

  • User A with PVC feature flag, User B without PVC feature flag, prebuild project without PVC feature flag.
  • User A pushes a new commit to a branch X (it did not trigger prebuild automatically)
  • User A clicks the "Run Prebuild" button in Gitpod's UI to trigger prebuild
  • User B opens the workspace with the branch X URL

@sagor999
Copy link
Contributor

@jenting can you try to repro this? I was only able to repro this when using #prebuild url. 🤔

@geropl
Copy link
Member

geropl commented Sep 20, 2022

While removing the #prebuild/ prefix might be the ideal solution, we're still blocked by other issues, so I don't think we can solve this soon enough.
To unblock you here, now, I will be looking into putting a bandaid on for this particular problem.

@geropl geropl moved this to Scheduled in 🍎 WebApp Team Sep 20, 2022
@geropl geropl moved this from Scheduled to In Progress in 🍎 WebApp Team Sep 20, 2022
@geropl geropl self-assigned this Sep 20, 2022
@geropl
Copy link
Member

geropl commented Sep 20, 2022

I think I found the bug; it is as @jenting described.

The problem:

  • here, we define persistent_volume_claim as "persistent" for a workspace
  • here it (consequently) copies over from prebuild -> regular workspace (based on that prebuild)
  • ( here we break the idea of "feature flags persistent for the lifetime of a workspace" ... but fixing this might be out-of-scope for in this context )
  • here we don't filter that PVC feature flag from regular workspaces

Just to make sure I understand how user.featureFlags and project?.settings?.usePersistentVolumeClaim are meant to interact:

  • user's control how whether they want to use PVC or not
  • projects control whether prebuilds use PVC or not
  • we support the transition "bucket backup" -> "PVC backup"
  • we do not support the reverse

D'accord? 🤔

Based on this I'll create PR.

Update: here it is. #13117

@sagor999 I think it make sense to talk through the PR in sync to make sure we do not miss anything. Let's figure out a time in Slack.

Repository owner moved this from In Progress to Done in 🍎 WebApp Team Sep 22, 2022
Repository owner moved this from In Progress to Awaiting Deployment in 🌌 Workspace Team Sep 22, 2022
@jenting jenting moved this from Awaiting Deployment to In Validation in 🌌 Workspace Team Sep 30, 2022
@jenting jenting moved this from In Validation to Done in 🌌 Workspace Team Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team: workspace Issue belongs to the Workspace team type: bug Something isn't working
Projects
Status: Done
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants