-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Check init tasks #478
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before going on with the PR, I think we should clarify this:
If we don't find a default user set, than we allow fallback to the interative user creation, i.e. the default upstream prompting.
I think we said that if we don’t find anything in wsl.conf, we still list all users and take the first one with UID >= 1000? I can imagine people using cloud-init use the standard user creation creation but don’t know about wsl.conf and we want to support them.
My suggestion would be, once the launcher waited on cloud-init as you did in that PR:
- look at
wsl.conf
, if there is a value there, ensure the user exists on the system (withgetent passwd
), update the windows registry - if nothing,
getent passwd
list all users with UID >= 1000. If found, update wsl.conf and the windows registry - otherwise, fallback to interactive user creation.
Does this make sense?
While it does make sense, I've been resisting to move towards that direction because it becomes harder (IMHO) to explain to users why certain cloud-init user-data behaves in one way when the distro is set up via the launcher and differently when through |
It does not depend on cloud-init directly as we don’t really care the way the user was created on the machine. I can be a pre-created image with a default user and they didn’t create/know about We don’t want those users to grow that knowledge and the launcher should do the right thing silently. @jibel opinions? |
Looking at the order of priority to consider a default user for WSL it's: So we should check:
Then create an entry in wsl.conf based on this discovery if there is none so wsl -d starts with the default user |
bef09f7
to
8e73ede
Compare
I changed the PR description to match the current implementation after our discussion. The changes in the description are marked as shown below: EDIT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are getting there! I still see some discrepancy with what was discussed on MM though which I think we should fix before merging:
- We told that the order is (from what wsl command is considering):
- wsl.conf
- key in registry
If I am correct, we don’t check the registry here, only set it. I think similarly than wsl.conf, we should consider it as "a default has been set".
-
You are considering a range rather than user >= 1000. I think you are right to not consider the nobody user, but there are still some other potentials users with a way larger uid when they are registered through Active Directory for instance or any other dynamic "bindings". I think the approach to ignore some users is correct and we should filter by the shell associated with it.
-
I see that you try to "fix" any invalid wsl.conf (and if the registry is checked, I think you would do the same). I am more in favor of "so be it". If wsl.conf or the registry are configured, that was an intent by the rootfs owner or by the cloud-init which was set. If this is set, let’s not do anything, no double checking, no fix and error out if this is incorrect.
That way, we don’t diverge from "we have a cloud-init" behavior to "there is none" with the wsl command itself?
So in a nutshell, I think we should:
- if there is
wsl.conf
user entry -> no additional check, no user creation, let’s then wsl run with this user (even if it doesn’t exist) - else if there is a registry key -> no additional check, no user creation, let’s then wsl run with this user (even if it doesn’t exist)
- else if there is one: get first user with uid >= 1000 and an associated shell (no /bin/false, no /usr/sbin/nologin) -> set it as default in wsl.conf (do we need to update the registry if wsl.conf always takes precedence?)
- else -> default to interactive prompting
Does it make sense?
NB: I feel sorry you had to write our own ini parser here. I tried to review the C++ part to the best of my knowledge but I fully trust you on that side :)
DistroLauncher/Ubuntu/InitTasks.cpp
Outdated
// Decomposes a line such as `key = value` into its [key] and [value] components. | ||
std::pair<std::string_view, std::string_view> matchKeyValuePair(std::string_view line); | ||
|
||
// A limited INI "parser" capable of retrieving the value of a [section].key entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a Windows dev expert here, but why do not you use the Windows primitive to read/write the ini file?
For reference: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getprivateprofilestring
It would reduce the size of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That INI-related Win32 functions works will all sorts of local paths, with or without Windows line terminations, but unfortunately not with WSL paths (or network paths in general I assume).
You can check a proof-of-concept on GitHub
If we really want to leverage it. we could copy the file to some temporary location on Windows to parse it using that function. That will certainly be shorter, though it seems a bit strange to me having to copy a file to another location to be able to read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want to leverage it. we could copy the file to some temporary location on Windows to parse it using that function. That will certainly be shorter, though it seems a bit strange to me having to copy a file to another location to be able to read it.
That's what I ended up implementing.
Let me interject on this:
The function
Step 3 needs a comment in the code to clarify that behavior, but in fact it's checking for the registry by its effects. If one manually sets the relevant registry key or runs So, consider that a rootfs comes with I'll clarify that in the code, but otherwise I don't believe we need to read the registry directly. Even because that would be more prone to races than what we're doing now. |
About us writing into
I think we can even skip writing that file, because if we set the default user via WSL API (which is the same as writing into the registry) then By avoiding writing it we avoid "fixing" it at the same time. What do you guys think? @didrocks @jibel |
Ack on reading the API indirectly from the registry (yes, a comment on this will help to clarify it) :)
If I am correct, this is still checking the user exists and fix the consistency issue no by changing the registry? Again, I’m not close to the code, but I think basically the source of truth should be:
Basically, we only update one or the two of them, depending on what you told before, if both of them are unset. This is my 2 cents. |
I totally agree on error out if |
I think it is te correct behaviour. If one user is created by cloud-init and at the same time wsl.conf is created from cloud-init but with a different user, we would have the first user in the registry, but wsl.conf would take the precedence and use another user. This is the behaviour we want. What happends if several users are created by cloud-init? Which one would be in the registry? |
Assuming nothing else sets the default user by then, the launcher will iterate over the list of users collected from the passwd database and set as default the lowest UID greater than 1000, which matches the first user written in the cloud-config user-data, per my observation when writing tests (assuming only cloud-init would create non-system users in that scenario). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I really prefer this implementation which is easier to track (partially because of the ini parsing removal, but the comments helps too!)
I only have some leftover comments on the tests themselves:
- I think some condition are coming from the previous implementation
- Also, I feel that maybe we can augment the tests with pre-shipped /etc/passwd, where we have users with uid, but no shell, and various corner cases like this.
Do you mind investigating those?
Note: I don’t think the edit of the description is up to date though. You are still saying:
"user in the NSS passwd database (by launching getent), i.e. any user with UID between 1000 and 65534 (the nobody)."
Also, please remember that the description is used as the commit content of the project, so some care is needed before doing the finale merge to remove the old description to not mislead the user. :)
Replied inline, the condition has meaning to the current test cases leveraging it.
I added a few more test cases in which cloud-init plays a double-agent role, doing its expected job but also deliverying broken
I updated the PR description. I think this covers all the remaining concerns. I'm still being careful with the amount of times we register/unregister WSL instances to avoid too long tests. |
Looks like I need to rebase due doc changes. I'm marking the new commits as usual. |
This affects only the registration time. Subsequent boots don't check for cloud-init in any way. We check if there is a `cloud-init` command in PATH, and launch `cloud-init status --wait` right after registration is complete. This will wait until cloud-init completes what's doing or report error/disabled states. Cloud-init could have created Linux user accounts. Users could have been created in many other ways, even pre-created during image build. We expect the desired default user to be set via `/etc/wsl.conf`. That would work even without the distro launcher, so it's reasonable to assume the desired default to be present there. We "parse" that file and, if there is the default user entry, then we set it via WSL API (which is faster than terminanting and restarting the instance). When "parsing" /etc/wsl.conf (which is INI like syntax), we deliberately stop on the first occurence of the [section].key we're looking for. That matches the behavior observed in WSL itself. If we don't find a default user set, we check if the current default user is still root, in which case we look for any non-system user in the NSS passwd database (by launching getent), i.e. any user with UID between 1000 and 65534 (the nobody). Finally, we fallback to the interative user creation, i.e. the default upstream prompting. We do some effort to keep /etc/wsl.conf and the Windows registry (via WSL API) in sync, in the sense that whatever user we set as default via one means we also set in the other. I originally intended to put this implementation inside DistributionInfo.h/cpp to ensure less files being touched (thus less diff compared to upstream) but that doesn't play nice with CI 'Refresh releases and assets' workflow. So I placed the code in its own component, and inside a subdirectory, so if there comes a time we need to revert that change (it becomes obsolete or so) it remains easy to drop. Also, being inside a subdirectory allows me to drop a .clang-format in there without worrying of reformatting upstream code :) Lessons learned from the launcher old times: - We can put cpp files inside a subdirectory and still work with precompiled headers in Visual Studio if we either adjust the include path for every cpp file or adjust the path to the precompiled header for every cpp file (no winner :) )
And unnecessary assertion for empty stdout That's never the case.
For now that's all we need from the golden testutils we have everywhere else.
It assumed distroName to match the AppID Which is not the case. Ubuntu24.04LTS is the AppID for the distroName Ubuntu-24.04. Thus we need to check for Ubuntu-WX.YZ :) This is yet a bit tricky because nowadays we set the policy during image build, So a rootfs meant to build an LTS app (thus Prompt=never) could be recycled to build Ubuntu-Preview during tests and that would break the assertion. TODO: Verify if CI is happy with just this change.
To check whether a file exists in the distro filesystem.
s/testUserNotRoot/testWhetherUserIsRoot Parameterized :)
To allow reusing those test helpers inside testcases It pessimizes the existing TestBasicSetup a little bit But it's for a good reason.
Every testcase wipes out the WSL instance So not a lot can run in a reasonable way in CI Let's see how long those two will take. We use `<launcher.exe> install` to avoid the interactive shell in the end. If we append `--root` then the launcher should still wait for cloud-init but skip checking the creation of the default user. We check for the specific user set as default, different combinations of cloud-config may create users in different order and set or not in /etc/wsl.conf. This way we check if the launcher does the right thing even without /etc/wsl.conf and if it skips doing anything else when `install --root`.
We have more to do It's reasonable to assume it will take longer :)
Temporarily at least, as we don't have oracular images yet.
once we have backported everything and Ubuntu is transitionned to 24.04
Making sure to explain how we indirectly look for the registry.
With a caveat: that function can only read local files, i.e. in the Windows OS filesystem. No shared mounts. Thus we make an effort to copy the instance's /etc/wsl.conf to a private (best effort) temporary path, using OS primitives to guaratee automatic deletion. From that place, it's just a function call to read the key we are interested in. Let's trust the OS can deliver us a suitable temporary file under %TEMP%. In production that's more likely, as the %TEMP% dir is private to the app when FS virtualization is ON.
If the username is wrong, i.e. there is no matching UID in passwd ignore it and exit success. As a consequence we can totally remove writing into /etc/wsl.conf: 1. We won't fix any mistakes 2. If a default user is set from outside (WSL API/registry), it suffices to make such user the default, no need to keep both in sync. 3. If no default user is set, we do so via the WSL API, which unfolds into the reasoning above. And just to clarify, if both are set to different users, registry takes precedence over /etc/wsl.conf even after reboot.
When approaching cloud-init, run it unconditionally Added a not-so-beautiful trick to avoid printing /bin/bash: error 1: cloud-init command not found. Finally, if it doesn't exit 0, show the long status cloud-init status --long.
Relying only on the UID may be misleading So we wanted to check the login shell (the last field of the line) to determine whether an account is a system or real user. That required parsing the passwd more carefully Using istringstreams as before would incur in potentially lots of small memory allocations. Writing an iteration scheme by hand with string_views prevents most of the unnecessary allocations while offering an elegant, idiomatic iteration and parsing on-the-flight. Most of that code wouldn't be needed in C++20, std::ranges::split would do most of this job, but with C++17 we need to craft the iteration by hand.
For that we need GoWSL, so let's import it. This whole idea is very prone to races. Let's see how it behaves in CI.
3d81952
to
787f78d
Compare
We could break passwd in one way at a time, but I'm affraid of having tons of test cases making CI run for too long. So, as the system happily ignores broken lines, I'm all for breaking them in all the ways we want in the same tests and have few tests running with broken passwd.
CI was failing because bash.exe didn't know how to handle symlinks. Powershell knows by default, but the resulting script would be uglier. It turns out it's just an MSYS toggle to fix bash.exe.
Plus comments about the order we check for registry settnig errors.
Apparently quoting is necessary for globing with a single '*'
ebdee97
to
2d40dbd
Compare
When trying to download a rootfs
2d40dbd
to
acfdc4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! Thanks for working on those tests! :)
This affects only the registration time. Subsequent boots don't check for cloud-init or depend on it in any way.
If the distro launcher is used to register a WSL instance, we check if there is a
cloud-init
command in PATH, and if so we launchcloud-init status --wait
right after registration is complete.This will wait until cloud-init completes what's doing or report error/disabled states. By having a Win32 process mapped to that command seems enough to make WSL consider the boot is complete, thus preventing WSL boot time watchdog to kick in.
EDIT:
Cloud-init could have created Linux user accounts. Users could have been created in many other ways, even pre-created during image build. We expect the desired default user to be set via
/etc/wsl.conf
. That would work even without the distro launcher, so it's reasonable to assume the desired default to be present there.We "parse" that file and, if there is the default user entry, then we set it via WSL API (which is faster than terminanting and restarting the instance). In order to leverage Windows primitives for parsing INI, we have to make a temporary copy of that file into the Windows filesystem.
If we don't find a default user set, we check if the current default user is still root, in which case we look for any non-system
user in the NSS passwd database (by launching getent), i.e. any user with the lowest UID greather than or equals to 1000 and have a real login shell (no
/bin/sync
,/bin/false
or/usr/bin/nologin
).Finally, we fallback to the interative user creation, i.e. the default upstream prompting.
I originally intended to put this implementation inside DistributionInfo.h/cpp to ensure less files being touched (thus less
diff compared to upstream) but that doesn't play nice with CI 'Refresh releases and assets' workflow.
So I placed the code in its own component, and inside a subdirectory, so if there comes a time we need to revert that change (it becomes obsolete or so) it remains easy to drop. Also, being inside a subdirectory allows me to drop a .clang-format in
there without worrying of reformatting upstream code :)
Lessons learned from the launcher's old times:
I didn't add too many test cases because I was afraid of taking too long to register/unregister in between tests.
I also avoided adding stuff that could make the tests brittle such as installing huge apt packages (in fact I'm installing nothing at all at this moment), attaching to Ubuntu Pro, enrolling into Landscape and what not. Up for discussion on what makes sense to add here. I also didn't play with cloud-init user-data file selection. Although I could, I believe that's better tested as a unit test of the datasource in the upstream repository itself. Up for discussion as well.
Finally we cannot use private directories to behave as
%USERPROFILE%
, we need to put stuff inside the real user profile directory on Windows, which is sad. We can set%USERPROFILE%
in the Windows shell appropriately and subprocesses of the launcher (either Linux or Windows processes) find the correct value, but systemd services won't. They will still match up the global%USERPROFILE%
, one reason why I struggled to implement those tests.UDENG-2768