Load test Single Cluster Reference Architecture #10869

Closed
lucasvaltl opened this issue Jun 23, 2022 · 7 comments
Assignees: Pothulapati
Labels: self-hosted: reference-architecture · team: delivery (Issue belongs to the self-hosted team)

Comments

@lucasvaltl (Contributor) commented Jun 23, 2022

Summary

Scale test the production reference architecture so that we can give concrete information on how far it can scale.

Context

We are using our reference architectures as the way to communicate what Gitpod can and cannot do given a specific set of infrastructure. It makes sense to also talk about scale in this context.

Value

  • Assurance about the scale that the prod reference architecture can handle

Acceptance Criteria

  • We have a measure of how far the reference architecture can scale that is relevant to our users. E.g. it could be the number of workspaces of size X (given N nodes)

Measurement

  • We have a metric that tracks scale, and we can re-run these tests to see if something has changed (one possible form is sketched below).
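
One possible form such a metric could take, sketched as a Prometheus recording rule. The metric name and the pod-name pattern below are illustrative assumptions, not Gitpod's actual metrics; it assumes kube-state-metrics is installed and that workspace pods are named `ws-...`.

```yaml
# Hypothetical recording rule: counts workspace pods currently in the Running
# phase, so repeated load-test runs can be compared over time.
groups:
  - name: reference-architecture-scale
    rules:
      - record: gitpod:running_workspaces:count
        expr: sum(kube_pod_status_phase{phase="Running", pod=~"ws-.*"})
```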

Implementation Ideas:

Additional Context

@lucasvaltl (Contributor, Author)

This is likely not just a team self-hosted thing but a Gitpod thing :)

@lucasvaltl lucasvaltl changed the title Epic: Load test Prod Reference Architcture Load test Prod Reference Architcture Jun 24, 2022
@lucasvaltl lucasvaltl moved this from 🤝Proposed to 🧊Backlog in 🚚 Security, Infrastructure, and Delivery Team (SID) Jul 6, 2022
@lucasvaltl (Contributor, Author)

We should move this to Scheduled once we have the Terraform scripts (#11027) and maybe a werft command to spin up a self-hosted environment.

@lucasvaltl lucasvaltl moved this from 🧊Backlog to 📓Scheduled in 🚚 Security, Infrastructure, and Delivery Team (SID) Aug 5, 2022
@lucasvaltl lucasvaltl changed the title Load test Prod Reference Architcture Load test Single Cluster Reference Architcture Aug 10, 2022
@Pothulapati Pothulapati self-assigned this Aug 29, 2022
@adrienthebo adrienthebo added the team: delivery Issue belongs to the self-hosted team label Aug 29, 2022
@lucasvaltl lucasvaltl moved this to In Progress in 🌌 Workspace Team Aug 29, 2022
@atduarte atduarte changed the title Load test Single Cluster Reference Architcture Load test Single Cluster Reference Architecture Aug 29, 2022
@lucasvaltl lucasvaltl moved this from 📓Scheduled to ⚒In Progress in 🚚 Security, Infrastructure, and Delivery Team (SID) Aug 30, 2022
@Pothulapati (Contributor)

👋🏼

Update:

This week I worked on the EKS side of the single-cluster reference architecture. This required working on #12577, since we need to scale the cluster based on the number of workspaces. Once that change was merged, loadgen worked just as you would expect.
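
For illustration only, this is the kind of autoscaling node-group sizing that change relies on, written as an eksctl-style fragment. The reference architecture itself is defined in Terraform, and the cluster name, region, instance type, and sizes below are assumptions, not the actual configuration.

```yaml
# Hypothetical eksctl node group for workspace nodes. With the cluster
# autoscaler enabled, nodes are added as loadgen schedules more workspace
# pods, instead of the pool being fixed in size.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: gitpod-single-cluster   # assumed cluster name
  region: eu-west-1             # assumed region
managedNodeGroups:
  - name: workspaces
    instanceType: m6i.2xlarge   # assumed instance type
    minSize: 1
    maxSize: 20                 # headroom for the ~16 nodes observed in the 100-workspace run below
    iam:
      withAddonPolicies:
        autoScaler: true        # lets the cluster autoscaler manage this group
```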

Because the images are part of the gitpod-dev registry (which is private), we used the recently added #12174 to pass Docker credentials into the cluster. :)

Result: we started with the config in prod-benchmark.yaml, which spins up 100 workspaces. We saw a success rate of around 0.95; the remaining 5 workspaces were terminated spuriously (their nodes were marked as not ready). Because workspaces are plain pods, they are not re-applied automatically and are therefore lost. For this scale, the autoscaler spun up 16 nodes in total. https://www.loom.com/share/b7fa5beef4134051984f4c157ae47552

@lucasvaltl asked whether we could go further, i.e. 500 workspaces. I will post an update on that here.

@kylos101 (Contributor) commented Sep 2, 2022

@Pothulapati could you share a link to the loadgen config that you are using? 🙏

@Pothulapati (Contributor)

@kylos101 I'm using the configs in the repo, specifically the prod-benchmark.yaml

For the 500-workspace test, I'm running the same config but with a higher workspace count.
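
A sketch of what that bump could look like; the field names below are assumptions for illustration, not the actual prod-benchmark.yaml schema.

```yaml
# Illustrative loadgen-style config fragment (field names are assumptions,
# not the real prod-benchmark.yaml schema). Only the workspace count changes
# between the 100- and 500-workspace runs, so results stay comparable.
workspaces:
  count: 500        # 100 in the first run
```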

@Pothulapati (Contributor)

Update on the 500-workspace load suggestion:

We started out well until 250 workspaces.

[image]

But once we reached the 60 minute mark, a bunch of workspaces timed out. As I can't find any errors in the components, this is probably a timeout issue. So even though loadgen applied 500 workspaces, we ended up with only 350 running by the end.

[screenshot: 2022-09-02, 7:12 PM]

So, results on EKS seem pretty good! 👍🏼

@kylos101 (Contributor)

👋 Hey there, I am going to load test the autoscaling for GKE in https://github.com/gitpod-io/website/issues/2888. Closing this issue in favor of that one (a load test was also recently done as part of the September release for GKE and EKS).

Repository owner moved this from In Progress to Awaiting Deployment in 🌌 Workspace Team Oct 17, 2022
Repository owner moved this from 🕶In Review / Measuring to ✨Done in 🚚 Security, Infrastructure, and Delivery Team (SID) Oct 17, 2022