Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Improved verify experience #21573

Merged
merged 3 commits into from
Oct 6, 2020

Conversation

michalpristas
Copy link
Contributor

@michalpristas michalpristas commented Oct 6, 2020

What does this PR do?

In this PR agent removes downloaded bits even if asc signature does not match (it was only the case for hash up until now)
Then fetch-verify is retried.

This helps mainly with sceanrio when artifacts are built without .asc files. So asc file downloaded from snapshot repository matches other build as self build binaries included in agent package.
IN this case after initial failure, agent removes self build binaries and downloads snapshot from repository including sha512 and asc files.

BUT there's a bit more going on in this PR. there's added lock for update method in emitter because when dynamic input called Set and Config was Loaded it resulted in two concurrent processing of configuration which tried to setup and run own set of beats and they were fighting over path.data location, crashing...

This problem sometimes manifested sometimes not, more probably during standalone scenario.

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@michalpristas michalpristas self-assigned this Oct 6, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 6, 2020
@michalpristas michalpristas marked this pull request as ready for review October 6, 2020 14:23
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@michalpristas
Copy link
Contributor Author

/package

@@ -117,7 +117,7 @@ func (a *Application) Name() string {

// Started returns true if the application is started.
func (a *Application) Started() bool {
return a.state.Status != state.Stopped
return a.state.Status != state.Stopped && a.state.Status != state.Crashed && a.state.Status != state.Failed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

I think Started() is trying to say if the application was started at all. If it is crashed or failed you don't want Start() to be called again, because that state is handled internally in the application abstraction.

Copy link
Contributor Author

@michalpristas michalpristas Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes in case verification fails, it sets state to FAILING
when it passes in second round is tries to start the app but according to this check it is already running so it tries to configure it right away.

i just unified this check with the one in start.go so start.go should be unaffected but operation_start.Check should start functioning as intended (not skipping operation)

@@ -39,7 +39,7 @@ func (a *Application) start(ctx context.Context, t app.Taggable, cfg map[string]
}()

// already started if not stopped or crashed
if a.state.Status != state.Stopped && a.state.Status != state.Crashed && a.state.Status != state.Failed {
if a.Started() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal start() is called to restart on a crash. We sure this change with the above change is not going to affect that behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ok we check it only 2 places here (which stays unaffected, it just looks more readable)
and operation_start.Check which is called on config resolution

@blakerouse
Copy link
Contributor

I think this change will fix #21120. As that issue has to do with concurrency of updates, that this added mutex will solve.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 6, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #21573 updated]

  • Start Time: 2020-10-06T16:06:57.779+0000

  • Duration: 42 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 1386
Skipped 4
Total 1390

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. Looks good.

@michalpristas michalpristas merged commit 306332f into elastic:master Oct 6, 2020
michalpristas added a commit to michalpristas/beats that referenced this pull request Oct 6, 2020
michalpristas added a commit that referenced this pull request Oct 6, 2020
[Ingest Manager] Improved verify experience (#21573)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug enhancement Ingest Management:beta2 Group issues for ingest management beta2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants