
Refactoring HTTP downloader progress reporter to accept multiple observers #3542

Merged

Conversation

@ycombinator ycombinator commented Oct 5, 2023

What does this PR do?

Prior to this PR, the HTTP downloader's progressReporter would report download progress in Agent logs. This functionality was implemented entirely within the progressReporter code.

This PR refactors the progressReporter code to accept multiple observers and introduces a new observer, loggingProgressObserver, that logs the observed progress in Agent logs, preserving the behavior prior to this PR.
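For orientation, a rough sketch of the observer pattern described above; the interface's method names and the reporter's fields are assumptions based on this conversation, not the PR's exact code:

// Illustrative sketch only, not the actual PR code.
type progressObserver interface {
	Report(downloadedBytes, totalBytes float64) // periodic progress update
	ReportComplete()                            // download finished successfully
	ReportFailed(err error)                     // download failed
}

type downloadProgressReporter struct {
	length     float64            // total bytes expected
	downloaded float64            // bytes downloaded so far
	observers  []progressObserver // e.g. a loggingProgressObserver, plus future observers
}

// notifyObservers fans the current progress out to every registered observer.
func (dp *downloadProgressReporter) notifyObservers() {
	for _, o := range dp.observers {
		o.Report(dp.downloaded, dp.length)
	}
}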

Why is it important?

This is purely a refactoring PR; it does not change any behavior. The HTTP downloader's progress will continue to be reported in Agent logs.

However, in #3527, we will need the progressReporter to also report progress to another observer. This PR makes it easy to accomplish that in a clean manner. More context here: #3527 (comment)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Related issues

@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@ycombinator ycombinator mentioned this pull request Oct 5, 2023

elasticmachine commented Oct 5, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-10-12T21:14:45.038+0000

  • Duration: 27 min 14 sec

Test stats 🧪

Test Results
Failed 0
Passed 6489
Skipped 59
Total 6548

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


elasticmachine commented Oct 5, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.78% (81/82) 👍
Files 67.224% (201/299) 👍 0.221
Classes 65.827% (366/556) 👍 0.185
Methods 53.06% (1153/2173) 👍 0.087
Lines 38.619% (13163/34084) 👍 0.041
Conditionals 100.0% (0/0) 💚

@ycombinator
Contributor Author

While working on this PR, I found a bug with the progress reporter. It was reporting progress just once (after a delay) instead of reporting it periodically. As such, this PR is now blocked on the bugfix PR: #3548.

@ycombinator ycombinator marked this pull request as draft October 5, 2023 18:36
@ycombinator ycombinator removed request for faec and pchila October 5, 2023 18:36
@ycombinator ycombinator removed the Team:Elastic-Agent label Oct 5, 2023
@ycombinator ycombinator force-pushed the refactor-upgrade-download-progress-tracker branch 2 times, most recently from f20ba32 to 3335534 Compare October 5, 2023 19:29

mergify bot commented Oct 10, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b refactor-upgrade-download-progress-tracker upstream/refactor-upgrade-download-progress-tracker
git merge upstream/main
git push upstream refactor-upgrade-download-progress-tracker

@ycombinator ycombinator force-pushed the refactor-upgrade-download-progress-tracker branch from 3335534 to c7618c9 Compare October 10, 2023 20:48
@ycombinator ycombinator added the Team:Elastic-Agent label Oct 10, 2023
@ycombinator ycombinator marked this pull request as ready for review October 10, 2023 20:52
@ycombinator ycombinator added and removed the Team:Elastic-Agent label Oct 10, 2023
@ycombinator ycombinator force-pushed the refactor-upgrade-download-progress-tracker branch from 22ddf67 to 9c04408 Compare October 12, 2023 12:06
Contributor

@michalpristas michalpristas left a comment

just a couple of very tiny comments, it looks good otherwise

@@ -208,8 +206,9 @@ func (e *Downloader) downloadFile(ctx context.Context, artifactName, filename, f
}
}

lpObs := newLoggingProgressObserver(e.log, e.config.HTTPTransportSettings.Timeout)
Contributor

wild name, i had to go up here to figure out what it means

Contributor Author

Fixed in e9fc097.

length := dp.length
interval := dp.interval

go func() {
Contributor

very unnecessary optimization from my side: no need to spin up a goroutine or start tickers if progressObservers is empty. As we have at least the log observer now, I consider this just a small optimization from the component's point of view.

Contributor Author

Good point, will add, thanks!

Contributor Author

@ycombinator ycombinator Oct 12, 2023

Added in da06f8c and c81477f.
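For reference, a minimal sketch of the guard being discussed; the progressObservers field name is taken from this thread, the rest is illustrative:

func (dp *downloadProgressReporter) Report(ctx context.Context) {
	// With no observers registered there is nothing to report, so skip
	// creating the ticker and the reporting goroutine entirely.
	if len(dp.progressObservers) == 0 {
		return
	}
	// ... otherwise start the ticker and the goroutine as before ...
}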

defer t.Stop()
for {
select {
case <-ctx.Done():
Contributor

we're counting on external elements to cancel this. we could introduce Done chan internally which would be closed on ReportComplete or ReportFailed

Contributor Author

What about the case where, for some reason, the consumer wants to cancel while the download is in progress (so before it's completed or failed)? I guess we could make it required as part of the interface's contract that either ReportComplete or ReportFailed MUST be called? Then we could safely remove the ctx from Report and handle cancellation as you suggested.

Contributor Author

Implemented your suggestion in 67f80f6. Let me know what you think.

Member

what about creating a subcontext that can be closed when the request is complete or cancelled (ofc we need to hold a ref to its cancellation function) ?

Contributor Author

what about creating a subcontext that can be closed when the request is complete or cancelled (ofc we need to hold a ref to its cancellation function) ?

I like this pattern better as it doesn't require the consumer to remember to necessarily call either ReportComplete or ReportFailed. Basically, it gives the consumer two options on how to cancel: either cancel the ctx passed to Report or call ReportComplete/ReportFailed.

As far as implementation goes, we could do a subcontext or we use an internal done channel as implemented in 67f80f6. Is one implementation necessarily better than the other?

Member

The context cancellation is more "abandon processing" and the done channel is the simple solution to unblock a go routine.

The context has the advantage of propagating through the subcontexts if the parent context is cancelled/closed/expires; the done channel has to be managed/closed explicitly.

Since you have already implemented the done channel you can keep that, I am just quite surprised to see that we only send an empty struct instead of closing the channel and removing the reference

Contributor Author

@ycombinator ycombinator Oct 12, 2023

Right, except I'm not seeing the benefit of using a subcontext over a done channel in this case, because (if I understood your suggestion correctly) we would do something like this with a subcontext:

  • create the subcontext with cancellation in the Report method,
  • add a case for it to be done in the select in the Report method, replacing the current case for the ctx (parent context),
  • hold a reference to the cancellation function on the progressReporter struct,
  • and call this cancellation function from the ReportComplete and ReportFailed methods.

Whereas, using a done channel pattern, we:

  • create the done channel in the constructor
  • add a case for it to be done in the select in the Report method,
  • hold a reference to it on the progressReporter struct,
  • send an empty struct to the done channel or close it from the ReportComplete and ReportFailed methods.

So I'm not seeing much of a benefit in this case of using a subcontext instead of a done channel pattern. In particular, I don't see us taking advantage of the context propagation. Or am I missing something?

Contributor Author

@pchila @michalpristas I updated the implementation such that there are now two ways to return from the goroutine inside Report:

  • Report is passed a context. When the context is done, the goroutine will return.
  • When the consumer calls ReportComplete or ReportFailed, we close an internal done channel, which will also cause the goroutine in Report to return.
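A condensed sketch of that shape; apart from Report, ReportComplete, and the progressObservers field mentioned in this thread, the names are assumptions and the real code lives in the linked commits:

// The goroutine exits when either the caller cancels ctx or when
// ReportComplete/ReportFailed closes the internal done channel.
// (Sketch assumes ReportComplete/ReportFailed is called at most once.)
func (dp *downloadProgressReporter) Report(ctx context.Context) {
	go func() {
		t := time.NewTicker(dp.interval)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-dp.done:
				return
			case <-t.C:
				for _, o := range dp.progressObservers {
					o.Report(dp.downloaded, dp.length)
				}
			}
		}
	}()
}

func (dp *downloadProgressReporter) ReportComplete() {
	defer close(dp.done) // unblocks the goroutine started by Report
	for _, o := range dp.progressObservers {
		o.ReportComplete()
	}
}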

return n, nil
}

func (dp *downloadProgressReporter) Report(ctx context.Context) {
Contributor

@michalpristas michalpristas Oct 12, 2023

make a comment that caller is responsible for cancelling context, otherwise we're leaking goroutine and ticker

Contributor Author

Done in 89ca86b.

Member

@pchila pchila left a comment

A few nits, nothing major.
Left a couple of questions...

@@ -46,13 +44,13 @@ const (

// Downloader is a downloader able to fetch artifacts from elastic.co web page.
type Downloader struct {
log progressLogger
Member

Nit: I always prefer the logger as an interface for mocking and loose coupling, instead of relying on a more concrete type

Contributor Author

In general, I agree but I'm also a fan of YAGNI. We can easily change this back to an interface if/when we need it, i.e. have more than one implementation.
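For illustration only, the kind of narrow interface being suggested might look like the following; the method set is an assumption, and the PR keeps the concrete type per the YAGNI argument above:

// Hypothetical narrow logging interface for the downloader; useful mainly for
// swapping in a mock in tests. Not what the PR does.
type downloadLogger interface {
	Infof(format string, args ...interface{})
	Warnf(format string, args ...interface{})
}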

@@ -253,21 +237,21 @@ func assertLogs(t *testing.T, logs []logMessage, minExpectedProgressLogs int, ex
// Verify that the first minExpectedProgressLogs messages are about the download progress (for the first file).
i := 0
for ; i < minExpectedProgressLogs; i++ {
assert.Equal(t, logs[i].record, expectedProgressMsg)
assert.Regexp(t, expectedProgressRegexp, logs[i].Message)
Member

If there are other logs mixed in with the progress logs, this assert will fail the test. Is this intentional?

Contributor Author

Good point. As things stand, the only other log that could get mixed up is this one:

if err := os.Remove(path); err != nil {
e.log.Warnf("failed to cleanup %s: %v", path, err)
}

This log should never get triggered but I suppose in some strange circumstances the os.Remove could fail.

Any suggestions on how to make this more robust to only consider progress logs? Maybe the logs could first be filtered to only keep ones that match expectedProgressRegexp or expectedCompletedRegexp before running any assertions?

Member

Filtering and then asserting length is surely one option. The exact semantics depend on what we are trying to assert here: from the code you are asserting that at least the first minExpectedProgressLogs log entries must be the ones you expect: this will break as soon as an extra log entry is in the mix (it may even be added in the future).

Did you mean to assert that among the collected log entries you have at least minExpectedProgressLogs expected logs? In that case, filter/count and then assert.

Contributor Author

Let me put up a commit showing what I meant by filtering and see what you think.

Contributor Author

@pchila I implemented the filtering in a464077. Let me know what you think.
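A sketch of the filter-then-assert idea; the helper name and the exact matching logic are illustrative, the actual change is in a464077:

// Hypothetical helper: keep only the log records matching the progress or
// completion patterns, then assert on the filtered slice.
func filterLogs(logs []logMessage, patterns ...*regexp.Regexp) []logMessage {
	var kept []logMessage
	for _, l := range logs {
		for _, p := range patterns {
			if p.MatchString(l.Message) {
				kept = append(kept, l)
				break
			}
		}
	}
	return kept
}

// Usage in the test (sketch):
//   progressLogs := filterLogs(logs, expectedProgressRegexp, expectedCompletedRegexp)
//   require.GreaterOrEqual(t, len(progressLogs), minExpectedProgressLogs)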

@ycombinator ycombinator requested a review from pchila October 12, 2023 13:11
Contributor

@blakerouse blakerouse left a comment

This looks good to me. I tried to find something to complain about, I was unsuccessful. ;-)

@ycombinator ycombinator force-pushed the refactor-upgrade-download-progress-tracker branch from a464077 to ecc4201 Compare October 12, 2023 16:56
@ycombinator ycombinator changed the title from "Refactoring HTTP downlader progress reporter to accept multiple observers" to "Refactoring HTTP downloader progress reporter to accept multiple observers" Oct 12, 2023
@ycombinator ycombinator merged commit 8284ce2 into elastic:main Oct 13, 2023
8 checks passed
@ycombinator ycombinator deleted the refactor-upgrade-download-progress-tracker branch October 13, 2023 12:32