Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove --verbose from rock entrypoint #495

Merged

Conversation

linostar
Copy link
Collaborator

@linostar linostar commented Feb 26, 2024

  • Have you signed the CLA?

This was triggered by the existing pebble bug at canonical/pebble#339.

Action to be taken in Rockcraft:

  • remove --verbose from the rock’s entrypoint. That should always have been the default behavior, also because pebble doesn’t have a --quiet option. Those who want verbose mode, can always still pass --verbose at container deployment time.

@linostar linostar marked this pull request as ready for review February 27, 2024 08:30
@linostar
Copy link
Collaborator Author

@cjdcordeiro This should be ready for review.

@tigarmo
Copy link
Collaborator

tigarmo commented Feb 27, 2024

@linostar @cjdcordeiro there are many spread failures due to this change that need to be addressed. In particular, a concerning kind of failure is that many tests check docker logs and now the output of the services isn't going to the log at all, apparently (e.g. this check is failing).

I think this is a breaking change to pull on the users - the logs for their rocks will suddenly be empty and they won't know until they check them (presumably because something broke and they need to check the logs)

Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to fix the spread tests that expect logs to be coming out by default. I suggest you fix them in a way that 1) in some cases you explicitly pass the --verbose to then assert the output, and 2) in other cases there's simply no service output

@cjdcordeiro
Copy link
Collaborator

I think this is a breaking change to pull on the users - the logs for their rocks will suddenly be empty and they won't know until they check them (presumably because something broke and they need to check the logs)

Not necessarily a breaking change, but indeed something that is easy to miss. @linostar is giving people a heads-up on Discourse and later on in Matrix

The --verbose was introduced in response to an original request from @simskij (Simon please be aware of this incoming change and speak out if it's gonna break something for you).

TBH, this was an oversight because:

  1. for logs, there's the pebble logs command that can be used at any time after container deployment
  2. there's the --verbose flag that can also be passed at container deployment, but we don't have a --quiet flag. So by passing --verbose by default, we're not giving a choice to silence the output
  3. in cases where you'd like to do something like docker run ... exec, the default verbosity is just polluting the output

@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from b2bf3aa to 79caab3 Compare March 8, 2024 08:59
@simskij
Copy link
Member

simskij commented Mar 8, 2024

I might very well be mistaken as I'm not too familiar with the internals of pebble and the integration with juju, and in that case, please enlighten me, but if I'm not completely mistaken this change potentially breaks the entire charm log flow, as logs no longer will be picked up in juju debug-logs since they get their logs from pebble's stdout.

I'm also interested in knowing whether this change affects the pebble log forwarding feature at all or whether that will remain unaffected by the change prior to this change landing in main.

@cjdcordeiro
Copy link
Collaborator

thanks @simskij . that's important.

let me add @tonyandrewmeyer and @benhoyt here, since this is also related to Pebble and the operator framework.

@benhoyt, in short, the idea is that rocks would have pebble run without the --verbose from now on. Seeing @simskij's message above, do you see any way a charmer could enable the --verbose at deployment time?

@benhoyt
Copy link
Contributor

benhoyt commented Mar 12, 2024

This is definitely important, however ... when Juju starts the workload containers it starts pebble with these args (code here):

/charm/bin/pebble run --create-dirs --hold --http=:<api-port> --verbose

So --verbose is explicitly specified to K8s in the command args, and as such, I don't think this breaks the Juju logging flow.

In addition, does juju debug-log pick up workload container logs now? I don't think it does, just logs from the charm container (and hence Ops framework).

This change also does not affect Pebble's log forwarding feature at all. That's configured in the Pebble config layer and is internal to Pebble.

So I think we're good here as far as charming goes. Unless I'm missing something, it seems to me this would only apply when you're running a Rock via docker at the command line.

That said, I think it's worth testing end-to-end with Juju and a Rock built with this non-verbose rockcraft change, just to be sure we're not regressing here!

@cjdcordeiro
Copy link
Collaborator

Tyvm @benhoyt ! That's useful.

@linostar will take this opportunity to get started with Juju and charms, and give it a try.

But @simskij , it would be paramount to have your appraisal, since you already have proper use cases. On your side, I think the test could be quite simple. Just take an existing rock, and do

echo '''
FROM <your-rock>
ENTRYPOINT ["pebble", "enter"]
''' | docker build -t <your-quiet-rock> -

Then use this new rock in your charm and assess the impact.

@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch 5 times, most recently from 5bd2e84 to e29459f Compare March 27, 2024 13:10
@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from e29459f to d094e94 Compare March 27, 2024 14:42
@linostar
Copy link
Collaborator Author

linostar commented Apr 2, 2024

@benhoyt
So I built 2 hello-world rocks, one called hello that has --verbose pebble option, and the other called hola without --verbose option.
Next, I created a charm for each of those 2 rocks, using the rock as a charm resource. I created a pebble layer in the charms that has a service that just runs the hello binary.
After deploying each charm, the juju debug-log output looks similar. The error logged is just because the hello binary don't stay running in the background and exits quickly.

hello charm debug logs

----- Logs from task 0 -----
2024-04-02T14:23:54Z INFO Most recent service output:
    Hello, world!
2024-04-02T14:23:54Z ERROR cannot start service: exited quickly with code 0
-----
unit-hello-charm-0: 17:23:55 ERROR juju.worker.uniter.operation hook "hello-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-hello-charm-0: 17:23:55 ERROR juju.worker.uniter pebble poll failed for container "hello": failed to send pebble-ready event: hook failed
unit-hello-charm-0: 17:23:59 DEBUG unit.hello-charm/0.juju-log ops 2.12.0 up and running.
unit-hello-charm-0: 17:23:59 DEBUG unit.hello-charm/0.juju-log Emitting Juju event hello_pebble_ready.
unit-hello-charm-0: 17:23:59 ERROR unit.hello-charm/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/./src/charm.py", line 46, in <module>
    ops.main(HelloCharm)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 555, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/./src/charm.py", line 25, in _on_hello_pebble_ready
    container.replan()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/model.py", line 2129, in replan
    self._pebble.replan_services()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/pebble.py", line 1905, in replan_services
    return self._services_action('replan', [], timeout, delay)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/pebble.py", line 1989, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "hello-service" (cannot start service: exited quickly with code 0)

hola charm debug logs

----- Logs from task 0 -----
2024-04-02T14:05:01Z INFO Most recent service output:
    Hello, world!
2024-04-02T14:05:01Z ERROR cannot start service: exited quickly with code 0
-----
unit-hello-charm-0: 17:05:02 ERROR juju.worker.uniter.operation hook "hello-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-hello-charm-0: 17:05:02 ERROR juju.worker.uniter pebble poll failed for container "hello": failed to send pebble-ready event: hook failed
unit-hello-charm-0: 17:05:06 DEBUG unit.hello-charm/0.juju-log ops 2.12.0 up and running.
unit-hello-charm-0: 17:05:06 DEBUG unit.hello-charm/0.juju-log Emitting Juju event hello_pebble_ready.
unit-hello-charm-0: 17:05:06 ERROR unit.hello-charm/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/./src/charm.py", line 46, in <module>
    ops.main(HelloCharm)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 555, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/./src/charm.py", line 25, in _on_hello_pebble_ready
    container.replan()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/model.py", line 2129, in replan
    self._pebble.replan_services()
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/pebble.py", line 1905, in replan_services
    return self._services_action('replan', [], timeout, delay)
  File "/var/lib/juju/agents/unit-hello-charm-0/charm/venv/ops/pebble.py", line 1989, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "hello-service" (cannot start service: exited quickly with code 0)

So basically, juju debug-log doesn't seem to log the output of pebble enter --verbose anyway.
Let me know if there is another use case you want to test.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 4, 2024
@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from 4f4f25c to 8c7b5bb Compare June 4, 2024 14:02
@linostar
Copy link
Collaborator Author

linostar commented Jun 4, 2024

@tigarmo @cjdcordeiro Ready for review.

@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from 3a0e5d3 to 16e6e71 Compare June 6, 2024 08:29
Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks

@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from 16e6e71 to 27e5257 Compare June 6, 2024 14:21
@cjdcordeiro cjdcordeiro self-requested a review June 6, 2024 17:37
Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a test is still failing. pls re-request review once it passes

@linostar
Copy link
Collaborator Author

linostar commented Jun 6, 2024

I see a test is still failing. pls re-request review once it passes

The error in question is unrelated and is due to fedora 37 not available for workers any more. I'm trying to fix that in PR #580.

@linostar linostar requested a review from cjdcordeiro June 6, 2024 18:28
@linostar
Copy link
Collaborator Author

linostar commented Jun 7, 2024

All related spread tests seem to pass, but some tests are being killed to timeout due to some slowness in spread workers (the exception is the fedora worker addressed in PR #580). Even when after re-running the spread tests several times, different tests fail every time.

@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch 3 times, most recently from 7fb0a41 to 2043e90 Compare June 7, 2024 14:08
@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from 2043e90 to 2ac7088 Compare June 7, 2024 15:52
@linostar linostar force-pushed the ROCKS-1019/drop-verbose-from-rock-entrypoint branch from 14d0add to 0a255d8 Compare June 7, 2024 19:21
@linostar
Copy link
Collaborator Author

linostar commented Jun 10, 2024

@tigarmo can you please merge this before something else get merged in main and cause a new conflict?

@lengau lengau merged commit af0ee81 into canonical:main Jun 11, 2024
13 checks passed
@claudiubelu
Copy link

Hello.

I was wondering why some rocks randomly had service logs and other did not. It seems that the difference was the rockcraft version being used, and I eventually found the reason: this PR.

This is a bit late at this point, but I'd like the --verbose flag to be back, or some rockcraft.yaml option to add it (False by default).

We are using rockcraft to build rocks meant to be used in Kubernetes directly, used in various Helm charts for various operators, all without any juju involved at any stage. Because the entrypoint is always Pebble, when you execute kubectl logs, you'd always get the Pebble logs instead of the actual service you'd actually want to see. Even while creating and testing our rocks, we've had plenty of mishaps because of this. For example, in one case we had a rock built and was testing in Kubernetes, the service was starting, the liveness probe was working, however, the Pod would eventually die and end up in a CrashLoopBackOff state... and we'd have no way of actually knowing why that happened in the first place. In Kubernetes, you can run kubectl logs to see the Pod logs, even if that Pod is not Running. However, in our case we would only see the Pebble logs, which were of no use. Running kubectl exec into the Pod to run pebble logs was not option because the Pod has to be Running to do that. Changing the Pebble service's on-failure / on-success options would be a very time-consuming option to debug this further (the rock would have to be rebuilt and pushed to a registry), and it wouldn't always work anyways (some Pods may have Readiness probes, which, if if they fail, the Pod is killed). Instead, we edited the daemonset's / deployment's command to bypass Pebble entirely, just so we can debug this issue further. But in terms of user experience, something like this may not be the first thing that would come to mind for different users that would try to use that image, they might not even know what a rock or Pebble is, all they would see would be some confusing logs from some unknown service that seems to be running instead of what they were expecting to see running, especially since the rocks we're building are meant to be drop-in replacements for other images (or at least as close as it can be). This wouldn't have been the case if the rocks were --verbose in the first place.

I see a lot of comments here are related to rockcraft being used in juju-related cases, that juju is already adding this flag anyways. IMO, that shouldn't have been a reason to remove this flag, especially since someone requested it in the first place, but an option to enable / disable it would have been great.

I am willing to work on adding that option to rockcraft, especially since I've been looking into rockcraft internals for a bit.

Best regards,

Claudiu

@cjdcordeiro
Copy link
Collaborator

Hi Claudiu,

thank you for the detailed feedback and for sharing your use case.

This was indeed an intentional change, that despite its relation to Juju, was meant to fix what we believed to be an undesired default behavior -> being too verbose.

While it might sound obvious that one always wants their service to print out its logs to STDOUT by default, in the case of a rock we must consider other factors:

  • we're using an init system - Pebble - and thus, as you correctly mentioned, some logs would be coming from Pebble while others come from services
  • having --verbose by default might sound obvious for a single service, but can quickly become too noisy if one has multiple services
  • using --verbose with Pebble commands other than run would also cause unnecessary noise.

The goal here is that one can always add --verbose at runtime, but you cannot "silence" Pebble if --verbose is hardcoded into the entrypoint.

In the near future, we might be considering some new features that may be useful for specific logging cases, like Log Forwarding and the ability to add more control, at build time, to what the OCI cmd should be.

In the meantime, adding -v at runtime is the only way to reproduce the old behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants