
Request: @JGAntunes to share some of his wisdom on kubernetes, mininet and most importantly, containernet #250

Closed
daviddias opened this issue Dec 12, 2019 · 4 comments
Assignees
Labels
kind/discussion Kind: Discussion

Comments


daviddias commented Dec 12, 2019

Hi @JGAntunes! I have a bunch of questions for you, and I thought it would be better to just open an issue and invite you to shed some light on our thinking. tl;dr:

  • We started with Nomad, then switched to Docker Swarm.
  • We created a utility, sidecar, that enables us to write tests that request network configurations (create network interfaces, subnets, traffic shaping (latency, jitter, bandwidth)).
  • We then realized that Docker Swarm is rather slow for large numbers of nodes (10K++).
  • We are now experimenting with Kubernetes; we already have tests running on DO with 10K++ nodes in less than 10 minutes.
  • We are now wondering how to provide the same functionality we achieved with sidecar, using Kubernetes.
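For context on the kind of traffic shaping sidecar handles, here is a minimal sketch of how a test's network request might be translated into `tc`/`netem` invocations. The `shape` dict layout and the interface name are illustrative assumptions, not testground's actual API:

```python
# Sketch: translate a hypothetical per-test network request into the
# `tc qdisc ... netem` command a sidecar-style utility could execute.
# The `shape` dict keys below are made up for illustration.

def netem_commands(iface, shape):
    """Build tc invocations for latency/jitter/bandwidth shaping."""
    opts = []
    if "latency_ms" in shape:
        jitter = shape.get("jitter_ms", 0)
        opts.append(f"delay {shape['latency_ms']}ms {jitter}ms")
    if "bandwidth_kbit" in shape:
        opts.append(f"rate {shape['bandwidth_kbit']}kbit")
    # `replace` installs the qdisc whether or not one already exists
    return [f"tc qdisc replace dev {iface} root netem " + " ".join(opts)]

print(netem_commands("eth0", {"latency_ms": 100, "jitter_ms": 10, "bandwidth_kbit": 1024})[0])
```

In a real sidecar the command would run inside the test container's network namespace; here it is only rendered as a string.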

I know you built https://github.com/JGAntunes/js-pulsarcast and to test it, you also built:

Can you help us understand how we can achieve the requirements described at https://github.com/ipfs/testground/blob/master/docs/SPEC.md#sidecar-agent ?

Huge THANK YOU in advance!


raulk commented Dec 12, 2019

We started with Nomad, then switch to Docker Swarm

Clarification: we picked Nomad as the endgame scheduler, but we didn't want to incur a learning curve so early in the project, during the PoC phase. Docker Swarm was very easy to set up and hit the ground running with, and it allowed us to break through the next frontier and build the primitives we needed to synchronise distributed workloads and execute remote tests -- regardless of what the underlying scheduler/orchestration engine happens to be.

@JGAntunes

Hey @daviddias (and everyone) 👋

I feel like I need to provide a bit of history to the whole thing 😅 so bear with me through these first few paragraphs.

History

So when I started working on Pulsarcast, I also started working on a way to test it properly. I began by looking at mininet, and afterwards containernet, as it was easier to deal with containerised hosts (my pulsarcast-test-harness used containernet at the time, together with the nodejs wrapper for containernet). This, however, presented a challenge in terms of orchestration, running specific workloads, collecting relevant output data, monitoring, etc. The solutions I found for running containernet at scale (multiple nodes) also seemed fairly complex and not that straightforward. Finally, a lot of what containernet provided was not that relevant to the scope of my project (I essentially needed to simulate latency, jitter and other relevant network faults), so going through the process of simulating a whole virtual network seemed like a bit too much.

Finally, I settled on Kubernetes as the orchestration engine for my testbed. I created a custom helm chart which consists of a deployment of js-ipfs with a toxiproxy sidecar that proxies all the TCP traffic coming to the js-ipfs container. I then created ipfs-testbed, which has all the configurations needed to bootstrap a Kubernetes cluster ready to run my test workloads. The testbed setup consists of N ipfs deployments and an ELK cluster responsible for collecting all the relevant metrics and logs from them. Finally, I created a cli tool that interacts with the IPFS HTTP API and the Toxiproxy HTTP API running in each pod, and runs commands/injects faults and latency. The pulsarcast test harness ended up being just the datasets/scripts that I use to test pulsarcast specifically, using all of the above (documentation is not up to date 😄)
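To make the Toxiproxy side of this concrete, below is a rough sketch of the JSON payloads a cli tool like the one described might send to Toxiproxy's HTTP API: one to create a proxy in front of the js-ipfs API, one to add a latency toxic. Proxy names and ports are illustrative assumptions, and the payload shapes should be checked against the Toxiproxy docs for your version:

```python
import json

# Sketch: payloads for Toxiproxy's HTTP API (names/ports are made up).

def create_proxy_payload(name, listen, upstream):
    # Body for POST /proxies -- forwards `listen` traffic to `upstream`
    return json.dumps({"name": name, "listen": listen, "upstream": upstream})

def latency_toxic_payload(latency_ms, jitter_ms=0):
    # Body for POST /proxies/<name>/toxics -- injects latency downstream
    return json.dumps({
        "type": "latency",
        "stream": "downstream",
        "attributes": {"latency": latency_ms, "jitter": jitter_ms},
    })

print(create_proxy_payload("ipfs-api", "0.0.0.0:26001", "127.0.0.1:5001"))
print(latency_toxic_payload(200, 50))
```

A cli tool would then POST these to the Toxiproxy admin port running in each pod.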

Actual answer/help

So, for what it's worth, and if I may, I would say Kubernetes is the way to go in terms of orchestration here (community support, maintainability, tooling, etc.), so you're definitely on the right path. As for your actual question, I guess what you would be looking for is to create your own custom operator which would then, through the Kube API, be able to monitor and, if need be, take action on the IPFS nodes running in the cluster. This would run as part of the lifecycle of your cluster, which means your controller gets access to all of its internal state. You get the ability to define custom resources that become first-class citizens in the Kube API, and consequently you can interact with them through the kubectl cli. Any metrics that you see as relevant can be exposed in your controller and will be scraped as part of the global metricset exposed by the cluster, meaning you'll only need to monitor your cluster in order to monitor your testbed. If you're interested, there are projects like Kubebuilder that are focused on making this process easier. Tbh, I've personally considered building something like this myself, but it turned out to be too time-consuming an endeavour, taking into account that Pulsarcast is my main focus 😅 I have, however, started working on an operator for Toxiproxy, where the idea would be, based on annotations in deployments, to inject Toxiproxy sidecars and allow configuring faults through the Kube API (it is mostly WIP though).
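The heart of such an operator is the reconcile step: compare the desired state declared in a custom resource against what is observed in the cluster, and emit actions to converge. A toy sketch of that logic, with a hypothetical resource shape (a real operator would build on the Kubernetes client and Kubebuilder/controller-runtime instead):

```python
# Sketch: the reconcile loop idea behind a custom operator, stripped of
# all Kubernetes machinery. Resource/pod names here are hypothetical.

def reconcile(desired_replicas, observed_pods):
    """Return (action, pod) pairs that converge observed onto desired."""
    diff = desired_replicas - len(observed_pods)
    if diff > 0:
        # Not enough pods: schedule creations
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: delete the surplus
        return [("delete", pod) for pod in observed_pods[desired_replicas:]]
    return []  # already converged

print(reconcile(3, ["ipfs-0"]))            # two pods missing
print(reconcile(1, ["ipfs-0", "ipfs-1"]))  # one pod too many
```

In a real controller this function would be triggered by watch events on the custom resource, and the returned actions would become Kube API calls.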

Other (probably) relevant stuff

While speaking with @daviddias, he asked me if there's anything similar to containernet/mininet for Kubernetes. Short answer: afaik, nothing gives us this straight out of the box. Kubernetes does not internally define a network standard; it does, however, define a set of requirements that must be fulfilled for the cluster to function. These requirements are usually implemented through network overlays such as calico, flannel and others. So I take it that some overlays, through configuration and fine-tuning, may provide some mininet-like properties. Or, given that mininet/containernet are OpenFlow based, some kind of integration might be possible. However, I would say some investigation would need to take place (I watched a talk at Kubecon a while back about running OpenFlow-based virtual network setups and test runs in k8s; I'll see if I can find something and leave it here).

Hope I was able to help, shout if you need anything else 👍

@daviddias
Contributor Author

Thank you so much for answering my prompt, @JGAntunes! Lots of valuable insights here. //cc @raulk @Stebalien @nonsense .

I checked with the current Containernet maintainer (containernet/containernet#165) and it does seem that, as you experienced, Containernet is not ready to run in a multi-VM setup. There is a project that enables you to do that, http://maxinet.github.io, but it isn't necessarily trivial to configure.

We have some more testing to do. So far, K8s does seem to be the way to go with regards to scaling to multiple thousands of nodes, and with the notes from your experience using Toxiproxy, we should be able to hit the majority of our goals for now. Will keep you posted, if you are interested :)


raulk commented Mar 9, 2020

Thank you for your input, @JGAntunes! We have adopted k8s as our orchestrator, and flannel+weave orchestrated by CNI Genie for our network.
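For anyone following along, the multi-overlay setup mentioned above is selected per pod via an annotation that CNI Genie reads. A hedged config sketch (the pod/image names are illustrative, and the exact annotation key/values should be verified against the CNI Genie docs for your version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testplan-node
  annotations:
    # CNI Genie picks the network(s) for this pod from this annotation;
    # multiple overlays can be listed, comma-separated.
    cni: "flannel,weave"
spec:
  containers:
    - name: ipfs
      image: ipfs/go-ipfs
```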
