Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test reporter logging is owned by root if first work item is docker work item #9208

Closed
MattGal opened this issue Apr 27, 2022 · 8 comments · Fixed by dotnet/dotnet-buildtools-prereqs-docker#614 or dotnet/runtime#68859

Comments

@MattGal
Copy link
Member

MattGal commented Apr 27, 2022

It seems in the scenario where the first work item a machine ever runs is in a docker scenario and uses the reporter, that machine will not be able to appened to the runner logs, as noted below:

bad machine: run.py (reporter log) is owned by root:

helixbot@a00C88M:~/dotnetbuild/logs$ ls -al
total 14048
drwxrwxrwx 2 helixbot root         4096 Apr 27 22:32 .
drwxr-xr-x 4 root     root         4096 Apr 27 18:14 ..
-rw-r--r-- 1 helixbot helixbot        0 Apr 27 22:20 client_0_last_started
-rw-r--r-- 1 helixbot helixbot     6415 Apr 27 18:14 installer.log
-rw-r--r-- 1 root     root            0 Apr 27 18:17 run.py.err
-rw-r--r-- 1 root     root         2951 Apr 27 21:43 run.py.log
-rw-r--r-- 1 helixbot helixbot     6377 Apr 27 21:43 run_client.py.err
-rw-r--r-- 1 helixbot helixbot 14279979 Apr 27 22:55 run_client.py.log
-rw-r--r-- 1 helixbot helixbot        0 Apr 27 18:14 run_heartbeat.py.err
-rw-r--r-- 1 helixbot helixbot    61350 Apr 27 22:55 run_heartbeat.py.log
-rw-r--r-- 1 helixbot helixbot        0 Apr 27 18:14 start_helix.py.err
-rw-r--r-- 1 helixbot helixbot      194 Apr 27 18:14 start_helix.py.log

Good machine: run.py (reporter log) is owned by helixbot:

helixbot@a00C8C3:~/dotnetbuild/logs$ ls -al
total 2244
drwxrwxrwx 2 helixbot root        4096 Apr 27 22:57 .
drwxr-xr-x 4 root     root        4096 Apr 27 22:47 ..
-rw-r--r-- 1 helixbot helixbot       0 Apr 27 22:56 client_0_last_started
-rw-r--r-- 1 helixbot helixbot    6415 Apr 27 22:47 installer.log
-rw-r--r-- 1 helixbot helixbot       0 Apr 27 22:48 run.py.err
-rw-r--r-- 1 helixbot helixbot    8019 Apr 27 22:57 run.py.log
-rw-r--r-- 1 helixbot helixbot       0 Apr 27 22:47 run_client.py.err
-rw-r--r-- 1 helixbot helixbot 2261292 Apr 27 22:57 run_client.py.log
-rw-r--r-- 1 helixbot helixbot       0 Apr 27 22:47 run_heartbeat.py.err
-rw-r--r-- 1 helixbot helixbot    2390 Apr 27 22:57 run_heartbeat.py.log
-rw-r--r-- 1 helixbot helixbot       0 Apr 27 22:47 start_helix.py.err
-rw-r--r-- 1 helixbot helixbot     194 Apr 27 22:47 start_helix.py.log

We could just change ownership of all log files every work item, but as a shorter term fix that's much safer I will explore making the packing test reporter only log to the console; this will still be logged, and will not create files across the mounted volume with incorrect user ownership.

Adding Error message to track impact

{ "ErrorMessage":"PermissionError: [Errno 13] Permission denied: '/root/helix/logs/run.py.log'" } 

Report

Summary

Day Hit Count Week Hit Count Month Hit Count
0 0 0
@MattGal MattGal changed the title Remove helix logging APIs from test reporter Remove helix logging APIs from test reporter (use console) Apr 27, 2022
@MattGal MattGal changed the title Remove helix logging APIs from test reporter (use console) Test reporter logging is owned by root if first work item is docker work item Apr 27, 2022
@MattGal
Copy link
Member Author

MattGal commented Apr 27, 2022

@MattGal
Copy link
Member Author

MattGal commented Apr 28, 2022

PR is merged and should roll out next week. Changing the reporter to only use console logging involves stripping out logging from the regular helix client so I left it alone.

@premun
Copy link
Member

premun commented May 2, 2022

The fix from Matt was reverted because it was causing Linux images to get into a boot loop. I reverted the change and the build is green again but I don't know the real cause. More details in #9217

I moved this issue back into backlog

@MattGal
Copy link
Member Author

MattGal commented May 2, 2022

The fix from Matt was reverted because it was causing Linux images to get into a boot loop. I reverted the change and the build is green again but I don't know the real cause. More details in #9217

I moved this issue back into backlog

Color me skeptical because a linux-boot-related change was happening before this, but I'll work on investigating today.

@MattGal
Copy link
Member Author

MattGal commented May 2, 2022

ah it was old linux versions, drats. https://dnceng.visualstudio.com/internal/_git/dotnet-helix-machines/pullrequest/22617 should fix this and add some logging for why reboots occur.

@MattGal
Copy link
Member Author

MattGal commented May 3, 2022

I just discussed this with @ilyas1974 and since Image Factory is broken (they've updated a bunch of stuff that broke us and the Image Factory team has not been able to roll back yet) and this is hitting a broad swath of machines I will prepare a hotfix for it.

@MattGal
Copy link
Member Author

MattGal commented May 3, 2022

@danmoseley FYI

@MattGal
Copy link
Member Author

MattGal commented May 3, 2022

I figured out where this comes from. Mostly the docker tag mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-webassembly-20220408155625-5bc1463 is missing this bit, which causes the user inside the container to "match" outside the container:

# create helixbot user and give rights to sudo without password
RUN /usr/sbin/adduser --disabled-password --gecos '' --uid 1000 --shell /bin/bash --ingroup adm helixbot && \
    chmod 755 /root && \
    echo "helixbot ALL=(ALL)       NOPASSWD: ALL" > /etc/sudoers

Hunting down the addition of this docker tag, I see it was added by thaystg seven days ago in dotnet/runtime#61776. @thaystg, we're trying to make this not break others in the future but ideally we should prepare a "user is UID 1000, and named helixbot" version of this for Helix testing if at all possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment