[RTC-230] Net condition simulation POC #280
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master     #280      +/-   ##
==========================================
- Coverage   63.08%   62.79%   -0.29%
==========================================
  Files          44       44
  Lines        2129     2129
==========================================
- Hits         1343     1337       -6
- Misses        786      792       +6

see 3 files with indirect coverage changes
x-browser-template: &browser-template
  image: test_videoroom_browser_image
  environment:
    ERL_COOKIE: "panuozzo-pollo-e-pancetta"
😄
@@ -0,0 +1,49 @@
#!/bin/bash
What about managing packet loss, starting the Docker containers etc. from Elixir? I don't like the idea of creating a file to trigger packet loss.
You mean creating another TestController Elixir app instead of the shell script? I guess that could be done, but I don't think it'd solve the issue you're referring to.
Some means of synchronisation (i.e. creation of a file) is necessary because we don't know how long the browser action :get_stats takes to complete. Before applying packet loss, there are supposed to be 15 such actions at 1-second intervals -- however, they usually take around 20-30 seconds to complete. Therefore, the test controller can't know in advance when it's supposed to apply the network effect, as it MUST be done after the first batch of stat collecting and before the second one.
There are limited ways to accomplish this -- having one of the containers create a file at the right moment seemed like the easiest one. I can change the controller code to sync in another way (e.g. keep polling the containers for certain logs and enable the loss whenever it finds them; or, possibly, another solution) if you believe it to be a better option...
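Since the thread discusses moving this logic into Elixir, here is a minimal sketch of what the file-based sync point could look like on the Elixir side. The module name, default path, and timings are assumptions, not code from this PR:

```elixir
# Hypothetical sketch: the controller polls for a marker file that one
# of the containers creates once the first batch of stat collection is
# done. Module name and intervals are assumptions.
defmodule SyncPoint do
  @poll_interval_ms 1_000

  # Block until `path` exists, or give up after `timeout_ms`.
  def await(path, timeout_ms \\ 300_000)
  def await(_path, timeout_ms) when timeout_ms <= 0, do: {:error, :timeout}

  def await(path, timeout_ms) do
    if File.exists?(path) do
      :ok
    else
      Process.sleep(@poll_interval_ms)
      await(path, timeout_ms - @poll_interval_ms)
    end
  end
end
```

The controller would call `SyncPoint.await/2` right before applying the network effect, while the container would touch the marker file after its first round of `:get_stats` actions.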
I guess that applying the packet loss is not possible from inside the container? If so, the thing that comes to my mind is to create another Elixir node, something like a test runner, outside the containers but connected to the nodes in the containers via distributed Erlang (like the containers between themselves), that's responsible for running the test and applying the packet loss when notified via a simple message from the server node/container. I'm not sure if that's doable with the docker-compose network etc., and it might be overkill -- just something to consider.
Well, technically, we could run pumba netem in one of the containers (the one to apply loss on, or maybe even the server), but that would require us to expose the Docker daemon socket to the container itself, which to me seems like an anti-pattern at best, and a major security risk at worst...
I've thought about the approach with nodes; perhaps it really is the best one of the bunch. There'd need to be some fidgeting around with exposing epmd ports and such, but I've no doubt it's doable. I'll try to implement that.
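For reference, a distributed Erlang mesh across docker-compose needs the epmd port (4369) reachable between the nodes, plus a pinned distribution port range. A hypothetical compose fragment (service name, cookie, and port choice are assumptions, not part of this PR):

```yaml
# Hypothetical sketch -- not part of this PR.
services:
  server:
    image: test_videoroom_server_image
    environment:
      ERL_COOKIE: "panuozzo-pollo-e-pancetta"
      # Pin the distribution port so it can be exposed deterministically
      ERL_FLAGS: "-kernel inet_dist_listen_min 9000 inet_dist_listen_max 9000"
    ports:
      - "4369:4369"   # epmd
      - "9000:9000"   # Erlang distribution
```

With the ports exposed, an external test-runner node could join the cluster with `Node.connect/1` and wait for a plain message before triggering the packet loss.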
First of all, we can move this to another PR.
I think I would create a central point responsible for managing the whole test. This point would also be responsible for asking browsers for statistics 🤔
We can discuss this in more detail before implementing.
Fine by me, we can tackle this problem later on.
What comes to my mind is to create a release with mix release and use a multi-stage Docker image like we do e.g. here. I'm not sure how it would play with Playwright and the browsers (or if it's possible at all), but it might make the image a little bit smaller, and you could use the new jammy membrane_docker in the build stage and not bother with installing some of the dependencies yourself. Similar story with the server Dockerfile.
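A multi-stage mix release build along the lines of this suggestion might look roughly like the sketch below. The image tags and release name are assumptions, and Playwright's system dependencies would still have to be installed in the runtime stage:

```dockerfile
# Hypothetical sketch of the suggested multi-stage build; image tags
# and the release/app name are assumptions, not code from this PR.
FROM membraneframework/docker_membrane:latest AS build
WORKDIR /app
COPY mix.exs mix.lock ./
RUN mix deps.get --only prod
COPY . .
RUN MIX_ENV=prod mix release

FROM ubuntu:jammy
WORKDIR /app
# Playwright's browsers and their system packages would still need to
# be installed here, which is the fragile part for this project.
COPY --from=build /app/_build/prod/rel/test_browser ./
CMD ["bin/test_browser", "start"]
```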
Yeah, I've tried doing that... obviously didn't succeed, seeing as I left the 'bloated' image Dockerfiles. Unfortunately, when installing Playwright, there are a lot of moving parts and a lot of places where things can break.
For now, I think I'd rather leave them as-is, and the container images will have to be made slimmer sometime in the future.
@@ -0,0 +1,21 @@
# TestBrowser

**TODO: Add description**
Probably something should be added here 😅
Force-pushed from 41f9c8a to 5157189
I have a question: when I run the script locally, it finishes with logs like this:
FATA[0060] error running netem loss command: error running chaos command: failed to add packet loss for one or more containers: failed to stop netem container: failed to stop netem: failed to create exec configuration to check if command exists: Error response from daemon: Container f4f93f5eecf6bff6de163c295153f3390cdef1142ac7f38a593e89162c13567f is not running
Network condition simulation over. Waiting for the docker-compose job to complete...
Doesn't it mean that we stop the containers too early?
Besides that, the PR looks good.
@Rados13 Well, it depends on how you look at it 😄... It's unfortunate, but we'd need another synchronisation point to know when to stop/kill the netem command. This is what the script comment was referring to:
It's possible that this issue could be resolved by running the netem cmd as a job (along with other modifications), but I fear that it would make the script even more convoluted and confusing. I sincerely hope a better solution can be found once we rewrite the test controller in Elixir...
spawn_link(fn ->
  Process.sleep(@max_test_duration)
  raise("Test duration exceeded!")
end)
I would use Process.send_after
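A minimal sketch of that suggestion, with the duration and message atom as assumptions: instead of a linked process that sleeps and raises, schedule a timeout message that the test process can handle explicitly, and cancel if the test finishes early:

```elixir
# Hypothetical sketch of the Process.send_after suggestion; duration
# and message atom are assumptions, not code from this PR.
max_test_duration = :timer.minutes(5)

timer_ref = Process.send_after(self(), :test_timeout, max_test_duration)

# The test process can then handle the timeout in its receive loop:
#
#   receive do
#     :test_timeout -> raise "Test duration exceeded!"
#   end
#
# and, if the test finishes in time, cancel the timer instead of
# leaving a sleeping linked process around:
Process.cancel_timer(timer_ref)
```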
POC: right now, the test doesn't check anything.
Current issues:
Resources for future reference: