Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve step logging #3722

Merged
merged 23 commits into from
Jun 13, 2024
Merged

Improve step logging #3722

merged 23 commits into from
Jun 13, 2024

Conversation

anbraten
Copy link
Member

@anbraten anbraten commented May 23, 2024

Some testing similar to #2072

closes #2064
closes #3468
closes #2253
closes #3716
related #2072

Enhancements

  • don't store all logs in array (a left over from previous refactorings)
  • use read buffer
  • limit max size per log line to 3mb
  • read actually line by line (don't only trust that writer.Write is called once per line)
  • allow opening stream for pending steps to get first log message quicker
  • change grpc log data from string to bytes to be more robust with non-utf8 data
  • send multiple log lines as batch (would this even help and how does it behave with large log lines 🤔)

Some inspiration from

@anbraten anbraten added server agent refactor delete or replace old code build_pr_images If set, the CI will build images for this PR and push to Dockerhub labels May 23, 2024
@woodpecker-bot
Copy link
Collaborator

woodpecker-bot commented May 23, 2024

Deployment of preview was successful: https://woodpecker-ci-woodpecker-pr-3722.surge.sh

@anbraten
Copy link
Member Author

anbraten commented May 23, 2024

@zc-devs @fernandrone @eliasscosta @praveen-livspace @davidhazell-tst (Hope all of you are actually interested in this topic, sorry if not)

I did some refactoring and hopefully achieved some improvements regarding the log streaming. Using the below workflow for testing I haven't received any more errors now. Could you verify that the :pull_3722 image solves your issues as well.

skip_clone: true

steps:
    - name: hello
      image: alpine
      commands: 
        - echo "hello world"
        - date +%s%N
        - apk add openssl
        - openssl rand 10000000 | base64 -w 0
        - date +%s%N
        - echo done

CC @6543 Thanks for your initial work in #2072

@anbraten anbraten changed the title Improve logger Improve step logging May 23, 2024
@zc-devs
Copy link
Contributor

zc-devs commented May 24, 2024

It does not always work for me.

  1. Often, the end of log doesn't appear until page refreshing
    Screenshot 2024-05-24 121548
    F5
    Screenshot 2024-05-24 121632
    Even though page refreshes automatically a couple of times (step list and logs components disappear/appear).
  2. Sometimes step hasn't started yet
    Screenshot 2024-05-24 123628
    F5
    Screenshot 2024-05-24 123702
  3. Non-UTF stuff doesn't work (openssl rand 1000)
    Screenshot 2024-05-24 125549
    Cancel
    Screenshot 2024-05-24 125704
    agent.log

1, 2 probably are not related here, cause it is UI-Server interaction. 3 is mentioned in the PR, but I don't know if you actually want to fix it in this PR though.


limit max size per log line to 3mb

Seems, it works. Previously it was over 400 lines. Currently it is near 20:

woodpecker=# SELECT t.line, left(convert_from(t.data, 'UTF-8'), 30), octet_length(t.data::text) FROM log_entries AS t WHERE step_id = 535;

 line |              left              | octet_length
------+--------------------------------+--------------
    0 | + echo "hello world"           |           42
    1 | hello world                    |           24
    2 | + date +%s%N                   |           26
    3 | 1716544320                     |           22
    4 | + apk add openssl              |           36
    5 | fetch https://dl-cdn.alpinelin |          156
    6 | fetch https://dl-cdn.alpinelin |          166
    7 | (1/1) Installing openssl (3.3. |           72
    8 | Executing busybox-1.36.1-r28.t |           74
    9 | OK: 9 MiB in 15 packages       |           50
   10 | + openssl rand 10000000 | base |           76
   11 | nR/CT1Cy5ikpYMdOX9fqHrR6JNxX9V |      6291458
   12 | 6L0fZCCf5fufzoxtxGarbwstEyL1UH |      6291458
   13 | zHy74s1DcMm6SCTUuMC5D3/qC/38e1 |      6291458
   14 | SFgfKlnu3/WJxs+XCAZcZLpfLk24bh |      6291458
   15 | BYJdQCUwsmpImaAbz4Mwdzk4OoNWYk |      1500874
   16 | 1716544324                     |           22
   17 | + echo done                    |           24
   18 | done                           |           10
(19 rows)

@anbraten
Copy link
Member Author

anbraten commented May 24, 2024

Thanks a lot for the testing and detailed reporting.

Often, the end of log doesn't appear until page refreshing

Strange. I haven't noticed such mismatches in the UI (like showing logs and the step hasn't started yet at the same time) yet. It seems that it is somehow missing data send via the SSE stream. Could you check if the browser console shows any errors?

The bug that previous logs are not in the list when reloading with f5 is still a thing I have to fix (maybe in another PR).

Non-UTF stuff doesn't work (openssl rand 1000)

Will check that one again.

@anbraten
Copy link
Member Author

Changing grpc log messages to use bytes instead of string seems to have fixed openssl rand 1000 now as well.

@zc-devs
Copy link
Contributor

zc-devs commented May 24, 2024

Yeeaa, it works now! Thank you!

Screenshot 2024-05-24 211854
woodpecker-agent.log

@zc-devs
Copy link
Contributor

zc-devs commented May 24, 2024

Could you check if the browser console shows any errors?

2 is partially my fault.

I open the test repo in WP and then manually run the pipeline. To gather clean logs, I often restart the Server and Agent pods before run the pipeline. Then download the logs from OpenLens.

So, I ran the pipeline a couple of times. All went well, there were no errors in console. Then I restarted the Server and got them. Then I manually ran the pipeline and got step hasn't started yet.
Screenshot 2024-05-24 213033
Screenshot 2024-05-24 213135
The event call was 404'ed and didn't try to reconnect. After refreshing the page all goes OK again.

However, look at four failed api/stream/logs/... requests. This is from ran the pipeline a couple of times mentioned above. Seems, streams do not close after leaving pipeline execution page (to repo page).
Edit: and even to repo list page - Screenshot 2024-05-24 221819.

1 reproduces sometimes with 10M log.
I run pipeline. Some piece of logs displayed, fill the display at least. Then step list and logs components disappear/appear again. Then I scroll down to the end - log is trimmed. There are no failed HTTP requests, but an error in the console unexpected non-whitespace character after JSON data
Screenshot 2024-05-24 221205

@anbraten
Copy link
Member Author

The event call was 404'ed and didn't try to reconnect. After refreshing the page all goes OK again.

I think that is basically how SSE works. I will probably have a look at the migration to a websocket at some point.

Seems, streams do not close after leaving pipeline execution page (to repo page).

That should be fixed now.

1 reproduces sometimes with 10M log.
I run pipeline. Some piece of logs displayed, fill the display at least. Then step list and logs components disappear/appear again. Then I scroll down to the end - log is trimmed. There are no failed HTTP requests, but an error in the console unexpected non-whitespace character after JSON data

Might have fixed that one now.

@davidhazell-tst
Copy link

davidhazell-tst commented May 28, 2024

@anbraten I tried upgrading my agents to :pull_3722 and am seeing a GRPC version mismatch error:

{"level":"info","time":"2024-05-24T18:32:23Z","caller":"/src/shared/logger/logger.go:101","message":"log level: debug"}
{"level":"error","error":"GRPC version mismatch","time":"2024-05-24T18:32:23Z","caller":"/src/cmd/agent/core/agent.go:143","message":"server version 2.4.1 does report grpc version 7 but we only understand 8"}
{"level":"fatal","error":"GRPC version mismatch","time":"2024-05-24T18:32:23Z","caller":"/src/cmd/agent/core/run.go:51","message":"error running agent"}

Do I need to also update my server version to :pull_3722?

I am currently running server version docker.io/woodpeckerci/woodpecker-server:v2.4.1

@anbraten
Copy link
Member Author

@davidhazell-tst Yes I later on updated some grpc related code and you therefore need to update the server as well. (It only has a tiny change, so there wont be any data migration and you can simply switch back if wanted)

@zc-devs
Copy link
Contributor

zc-devs commented May 28, 2024

docker.io/woodpeckerci/woodpecker-agent@sha256:328e3662285ce545e4081ef30299c039af72cd09a269bc642c24a51fc479acb2

streams do not close after leaving pipeline execution page (to repo page)

still reproduces
Screenshot 2024-05-28 183709

Might have fixed that one now

As well as 1
Screenshot 2024-05-28 184941
two times out of five it fails
Screenshot 2024-05-28 185715

Besides, I've got This step hasn't started yet at sixth run without restarting the server
Screenshot 2024-05-28 190140

woodpecker-server.log

@davidhazell-tst
Copy link

@anbraten We applied the new builds to both the server and agent and our issue appears to be resolved.

@anbraten
Copy link
Member Author

anbraten commented Jun 5, 2024

I still had no success replicating the mentioned issues with streams being not stopped, re-connected, etc. Might that even be related to some reverse-proxy setup 🤔 However as the general feedback seems that the changes are already quite an improvement I would move further improvements to new PRs to merge this one.

@anbraten anbraten marked this pull request as ready for review June 5, 2024 16:54
@anbraten anbraten requested a review from a team June 6, 2024 13:29
@qwerty287 qwerty287 added this to the 2.6.0 milestone Jun 6, 2024
cli/exec/exec.go Show resolved Hide resolved
agent/logger.go Outdated Show resolved Hide resolved
pipeline/log/utils.go Show resolved Hide resolved
Copy link
Contributor

@qwerty287 qwerty287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, untested from my side. I'm still not gappy with splitting long lines, but I understand that it's hard to change that

@anbraten anbraten enabled auto-merge (squash) June 13, 2024 14:35
Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 22.38267% with 215 lines in your changes missing coverage. Please review.

Project coverage is 25.84%. Comparing base (044c3a6) to head (041c2c3).
Report is 5 commits behind head on main.

Files Patch % Lines
pipeline/rpc/mocks/peer.go 7.78% 151 Missing and 3 partials ⚠️
pipeline/rpc/proto/woodpecker_grpc.pb.go 0.00% 22 Missing ⚠️
pipeline/log/utils.go 67.74% 7 Missing and 3 partials ⚠️
agent/logger.go 0.00% 7 Missing ⚠️
cli/exec/line.go 0.00% 7 Missing ⚠️
server/api/stream.go 0.00% 7 Missing ⚠️
pipeline/log/line_writer.go 90.32% 2 Missing and 1 partial ⚠️
cli/exec/exec.go 0.00% 2 Missing ⚠️
pipeline/rpc/proto/woodpecker.pb.go 0.00% 2 Missing ⚠️
server/grpc/rpc.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3722      +/-   ##
==========================================
+ Coverage   25.81%   25.84%   +0.03%     
==========================================
  Files         362      365       +3     
  Lines       26725    26913     +188     
==========================================
+ Hits         6899     6956      +57     
- Misses      19293    19417     +124     
- Partials      533      540       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anbraten anbraten merged commit 8b387e7 into main Jun 13, 2024
8 of 9 checks passed
@anbraten anbraten deleted the improve-logger branch June 13, 2024 15:18
@woodpecker-bot woodpecker-bot mentioned this pull request Jun 13, 2024
1 task
@anbraten anbraten mentioned this pull request Jun 13, 2024
8 tasks
6543 pushed a commit to 6543-forks/woodpecker that referenced this pull request Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent build_pr_images If set, the CI will build images for this PR and push to Dockerhub refactor delete or replace old code server
Projects
None yet
5 participants