pbench-sync-satellite problems #1010

Open
ndokos opened this issue Dec 17, 2018 · 2 comments

Comments


ndokos commented Dec 17, 2018

There are several problems with pbench-sync-satellite:

  • The initial problem (and the one that drove the following ones) is that there is no coordination between the satellite specification in the server's config file and the actual configuration of the satellite. In one particular case, the satellite-archive specification in the server's config file was pointing to an existing but non-functional place in the satellite's hierarchy. We need to correlate and sanity-check these settings or generate them from a common configuration.

  • Investigate whether there is any need for the satellite-archive spec in the server config. It is passed as an argument by pbench-remote-satellite-state-change, but the remote script that is invoked on the satellite could retrieve that from its own configuration, rather than depending on the server to pass that information.

  • If pbench-sync-satellite is unable to perform the remote state change (as in the case above), the state change file is left in place and causes subsequent invocations to fail, even before the script has a chance to discover whether there is anything it needs to do.

  • The script creates a directory for each run where it stores various working files. These directories persist. We need to get rid of them and summarize the working files into a single log file (perhaps one per satellite). DONE: see note below.

  • The calls to pbench-report-status use --type error and apparently send the same text with the same id to ES. It's not quite clear what exactly fails, but the log contains messages like these:

run-2018-12-14T20:12:04-UTC: start - 2018-12-14T20:12:04-UTC
pbench-report-status: curl failed (exit code: "0", HTTP code: "200")
Response body:
{"_index":"ndk.pbench.pbench-sync-satellite.2018-12","_type":"error","_id":"69abec5f935303a662cbbe4003d54c7c","_version":2,"created":false}
run-2018-12-14T20:13:04-UTC: start - 2018-12-14T20:13:04-UTC
pbench-report-status: curl failed (exit code: "0", HTTP code: "200")
Response body:
{"_index":"ndk.pbench.pbench-sync-satellite.2018-12","_type":"error","_id":"69abec5f935303a662cbbe4003d54c7c","_version":3,"created":false}
...

and so on, ad infinitum. Note that the script reports "curl failed" even though the exit code is 0 and the HTTP code is 200. This may be standard ES behavior, but we probably need to avoid sending the exact same payload multiple times, or check the codes that curl returns more carefully and report more accurately.

  • It might be a good idea to split the sync by size (similarly to how we now handle the unpacking of tarballs): small tarballs are synced by one process, medium ones by another, large ones by another. We might even be able to run these in parallel, although it's not clear to me whether that would be a net win: we'd have to be careful not to overtax the satellite server with excessive demands on its memory, its disk or its network interface.
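
To make the size-split idea in the last bullet concrete, here is a minimal sketch of how tarballs might be grouped into small, medium and large buckets before each bucket is handed to its own sync process. The thresholds and the function names (`size_bucket`, `partition_by_size`) are assumptions for illustration only; nothing here is taken from the pbench code, and the issue does not say where the cut-offs should be.

```python
# Hypothetical thresholds in bytes; the issue does not specify any.
MEDIUM_MIN = 100 * 1024 * 1024   # 100 MiB: at or above this is "medium"
LARGE_MIN = 1024 * 1024 * 1024   # 1 GiB: at or above this is "large"


def size_bucket(size_bytes):
    """Return 'small', 'medium' or 'large' for a tarball size in bytes."""
    if size_bytes >= LARGE_MIN:
        return "large"
    if size_bytes >= MEDIUM_MIN:
        return "medium"
    return "small"


def partition_by_size(tarballs):
    """Group (name, size_bytes) pairs into one list of names per bucket."""
    buckets = {"small": [], "medium": [], "large": []}
    for name, size in tarballs:
        buckets[size_bucket(size)].append(name)
    return buckets
```

Each bucket could then be synced by a separate process, serially or in parallel. Whether running them in parallel is a net win would still depend on the load the satellite server can take.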
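
For the pbench-report-status bullet, the log excerpt shows a curl exit code of 0, an HTTP code of 200, and a response body with "created":false and an increasing "_version". That reads like ES overwriting an existing document with the same id rather than a transport failure. The sketch below shows one way the result might be classified more precisely. The function name `classify_index_response` and the three result labels are hypothetical, and the reading of the "created" and "_version" fields is an assumption based only on the response bodies quoted above.

```python
import json


def classify_index_response(curl_exit, http_code, body):
    """Classify the outcome of sending a status document to ES.

    Returns 'created' for a new document, 'updated' when a document with
    the same _id was overwritten, and 'failed' for anything else.
    """
    # A non-zero curl exit code or a non-2xx HTTP code is a real failure.
    if curl_exit != 0 or not 200 <= http_code < 300:
        return "failed"
    try:
        doc = json.loads(body)
    except ValueError:
        return "failed"
    if not isinstance(doc, dict):
        return "failed"
    if doc.get("created") is True:
        return "created"
    # Assumption: created=false with _version > 1 means the same _id was re-sent.
    version = doc.get("_version")
    if doc.get("created") is False and isinstance(version, int) and version > 1:
        return "updated"
    return "failed"
```

Under these assumptions, the response bodies from the log excerpt would be classified as 'updated' rather than reported as a curl failure, which would let the script distinguish a duplicate payload from a genuine error.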
@ndokos ndokos added this to the v0.55 milestone Dec 17, 2018
@ndokos ndokos self-assigned this Dec 17, 2018

ndokos commented Dec 17, 2018

For bullet no. 2 above, the remote script would have to know where the server config file is: Yet Another Chicken And Egg problem.

@portante portante modified the milestones: v0.55, v0.56 Jan 25, 2019
@portante portante modified the milestones: v0.56, v0.57 Feb 6, 2019
@portante portante modified the milestones: v0.57, v0.58 Mar 8, 2019
@portante portante modified the milestones: v0.58, v0.59 Mar 18, 2019
@portante portante modified the milestones: v0.59, v0.60 Apr 29, 2019
@portante portante modified the milestones: v0.60, v0.61 May 24, 2019
@portante portante modified the milestones: v0.61, v0.62 Jul 8, 2019
@ndokos ndokos assigned tenstormavi and unassigned ndokos Jul 9, 2019
@portante portante modified the milestones: v0.62, v0.63 Jul 11, 2019
@portante portante removed this from the v0.63 milestone Aug 1, 2019
@portante portante added this to the v0.64 milestone Aug 1, 2019
@portante portante removed this from the v0.64 milestone Aug 9, 2019

ndokos commented Oct 17, 2019

Bullet #4 above has been dealt with already by PR #1354.
