Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats/cloud: stop sending cloud metrics when limit reached #1130

Merged
merged 1 commit into from
Sep 24, 2019
Merged

Conversation

cuonglm
Copy link
Contributor

@cuonglm cuonglm commented Aug 29, 2019

Once the cloud test gets aborted by limit, we should not send metrics
anymore. The test and collector will still running, though.

Close #1074

@codecov-io
Copy link

codecov-io commented Aug 29, 2019

Codecov Report

Merging #1130 into master will increase coverage by 0.01%.
The diff coverage is 76.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1130      +/-   ##
==========================================
+ Coverage   73.61%   73.62%   +0.01%     
==========================================
  Files         144      144              
  Lines       10527    10547      +20     
==========================================
+ Hits         7749     7765      +16     
- Misses       2321     2325       +4     
  Partials      457      457
Impacted Files Coverage Δ
stats/cloud/collector.go 71% <76.92%> (+0.62%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2077dd...450fa1e. Read the comment docs.

stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've written I don't think the way you are doing is ... good. I think you should stop collecting and pushing metric in general not just checking that you should no longer push metrics when it gets to it.

Also I don't see how this PR implements the other part of the #1074 mainly there being a way for this to stop the execution of the test. This probably should be with an option but still ...

stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
@cuonglm cuonglm force-pushed the fix/1074 branch 4 times, most recently from b57925e to 8dc8de7 Compare August 30, 2019 07:54
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good apart from the unlikely panic ;)

Also are we going to leave the way to tell k6 that we won't write for a later commit ? I think that will definitely require an interface change and probably will be required for a release:
In most cases where people will be writing to cloud if they stop writing for whatever reason they will probably want to end the tests if there is no other output.

stats/cloud/collector.go Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the minor nitpicks

stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
na--
na-- previously approved these changes Sep 3, 2019
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again a possible panic .. I am thinking the design of this output isn't that great as well :(

stats/cloud/collector.go Show resolved Hide resolved
imiric
imiric previously approved these changes Sep 3, 2019
stats/cloud/collector.go Show resolved Hide resolved
stats/cloud/collector_test.go Show resolved Hide resolved
@cuonglm cuonglm dismissed stale reviews from imiric and na-- via 76a1c63 September 3, 2019 11:48
@cuonglm cuonglm force-pushed the fix/1074 branch 2 times, most recently from 76a1c63 to d479867 Compare September 4, 2019 06:37
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On top of the comments that I left:
Looking at the tests I don't see you testing, that we actually stop after the max writes ... if anything this require seems to indicate that the collector didn't get stopped ... which is expected because you are collecting metrics and than canceling the context, but you are nowhere sleeping in order for there to be at least 5 pushes.

stats/cloud/collector.go Outdated Show resolved Hide resolved
stats/cloud/collector.go Show resolved Hide resolved
stats/cloud/collector.go Outdated Show resolved Hide resolved
@cuonglm
Copy link
Contributor Author

cuonglm commented Sep 4, 2019

On top of the comments that I left:
Looking at the tests I don't see you testing, that we actually stop after the max writes ... if anything this require seems to indicate that the collector didn't get stopped ... which is expected because you are collecting metrics and than canceling the context, but you are nowhere sleeping in order for there to be at least 5 pushes.

The test is done with count and max, if we push more than max, then count will be greater than max.

@mstoykov
Copy link
Contributor

mstoykov commented Sep 4, 2019

Okay .. I guess this is fine for the test ...
although I would prefer if you specifically test that the channel is closed instead of that you will receive it's nil value.

Also .. I suppose the reason for this test to work at all without sleeps is that the generation of metrics takes long enough? So I would prefer if there is a sleep in one of two cycles that makes it impossible for the loop to end before the needed amount of pushes is done ...

@cuonglm
Copy link
Contributor Author

cuonglm commented Sep 4, 2019

although I would prefer if you specifically test that the channel is closed instead of that you will receive it's nil value.

Good idea, will fix.

Also .. I suppose the reason for this test to work at all without sleeps is that the generation of metrics takes long enough? So I would prefer if there is a sleep in one of two cycles that makes it impossible for the loop to end before the needed amount of pushes is done ...

Let me check.

mstoykov
mstoykov previously approved these changes Sep 4, 2019
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now .. still maybe adding time.Sleep in the cycle generating metrics will make the test more consistent but it doesn't look like a problem atm

@na-- na-- added this to the v0.26.0 milestone Sep 4, 2019
imiric
imiric previously approved these changes Sep 4, 2019
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please wait for approval from @na-- before merging.

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code is broken when aggregation is not enabled, i.e. when AggregationPeriod is 0. Try to run a simple script with k6 run -o cloud and see that k6 won't ever exit, you'd have to kill the process. The reason for that behavior is that the quit channel won't be closed...

I didn't follow the previous discussion closely and I am not certain why that quit channel is necessary, but the current implementation is broken.

@cuonglm
Copy link
Contributor Author

cuonglm commented Sep 5, 2019

The current code is broken when aggregation is not enabled, i.e. when AggregationPeriod is 0. Try to run a simple script with k6 run -o cloud and see that k6 won't ever exit, you'd have to kill the process. The reason for that behavior is that the quit channel won't be closed...

I didn't follow the previous discussion closely and I am not certain why that quit channel is necessary, but the current implementation is broken.

FIxed it.

stats/cloud/collector.go Outdated Show resolved Hide resolved
Once the cloud test gets aborted by limit, we should not send metrics
anymore.

Close #1074
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
but update or create new issue for adding the option to abort the test by the output. This will also be useful in cases where any other output has problems or is not efficient enough and the user wants to abort the test in such cases.

@cuonglm
Copy link
Contributor Author

cuonglm commented Sep 16, 2019

This will also be useful in cases where any other output has problems or is not efficient enough and the user wants to abort the test in such cases.

Filed in #1160

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cuonglm cuonglm requested a review from na-- September 23, 2019 11:27
@cuonglm cuonglm merged commit f2377a6 into master Sep 24, 2019
@cuonglm cuonglm deleted the fix/1074 branch September 24, 2019 10:25
cuonglm added a commit that referenced this pull request Oct 7, 2019
In #1130, we stop sending metrics to cloud when limit reached. However,
after stop sending metrics, collector still collects data, causing the
memory usage quickly increase.

To fix it, do not collect data if stop sending metrics triggered.

While at it, also remove some useless condition checking that slice
length is greater than 0 before looping.

Fixes #1187
cuonglm added a commit that referenced this pull request Oct 7, 2019
* fix memory leak after stop sending metrics to cloud

In #1130, we stop sending metrics to cloud when limit reached. However,
after stop sending metrics, collector still collects data, causing the
memory usage quickly increase.

To fix it, do not collect data if stop sending metrics triggered.

While at it, also remove some useless condition checking that slice
length is greater than 0 before looping.

Fixes #1187
cuonglm added a commit that referenced this pull request Oct 15, 2019
* fix memory leak after stop sending metrics to cloud

In #1130, we stop sending metrics to cloud when limit reached. However,
after stop sending metrics, collector still collects data, causing the
memory usage quickly increase.

To fix it, do not collect data if stop sending metrics triggered.

While at it, also remove some useless condition checking that slice
length is greater than 0 before looping.

Fixes #1187
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop sending data to the Load Impact cloud when the test gets aborted
6 participants