-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows support #559
Windows support #559
Changes from 11 commits
b4e70b6
1719638
322afb5
f8f7ce4
8cf9e0b
8eb15dd
76ce349
1f095cb
abb13ed
cf4951e
80d9ea8
877f816
59c40db
e5fb32c
d87ad02
d612642
c08690b
17d3b3a
be633b3
d753976
a06679c
e639e22
00f193a
36c99d3
b7b0cc6
07dc9e8
dfc3f5b
dab4a66
c036b27
0c65c05
780753a
98ba0e9
52f25c7
b492c27
26bc094
58c25ec
cd66987
1afb0fc
5d647ea
c20d612
2d7c953
d759fe9
871f219
5a3ef3a
e083cc8
4918a08
9bb4317
d47c588
d863809
d993d43
99ec7b2
eb31b84
2216f47
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
from conda_store_server import conda_utils | ||
from pydantic import BaseModel, Field, constr, validator | ||
|
||
ON_WIN = sys.platform.startswith("win") | ||
|
||
def _datetime_factory(offset: datetime.timedelta): | ||
"""utcnow datetime + timezone as string""" | ||
|
@@ -194,20 +195,20 @@ class Settings(BaseModel): | |
metadata={"global": True}, | ||
) | ||
|
||
default_uid: int = Field( | ||
os.getuid(), | ||
default_uid: int | None = Field( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This syntax is Python 3.10+ IIUC. Let's use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually this should be fine, but we need to add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know what the minimal version is. The point I was trying to make: we don't use it anywhere else. I don't see why we should introduce this and immediately bump our lowest supported version to 3.10+ or require an extra import. Let's just use Optional, like everywhere else. |
||
None if ON_WIN else os.getuid(), | ||
description="default uid to assign to built environments", | ||
metadata={"global": True}, | ||
) | ||
|
||
default_gid: int = Field( | ||
os.getgid(), | ||
default_gid: int | None = Field( | ||
None if ON_WIN else os.getgid(), | ||
description="default gid to assign to built environments", | ||
metadata={"global": True}, | ||
) | ||
|
||
default_permissions: str = Field( | ||
"775", | ||
default_permissions: str | None = Field( | ||
None if ON_WIN else "775", | ||
description="default file permissions to assign to built environments", | ||
metadata={"global": True}, | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
import io | ||
import os | ||
import posixpath | ||
import shutil | ||
|
||
import minio | ||
|
@@ -223,7 +224,7 @@ def get(self, key): | |
return f.read() | ||
|
||
def get_url(self, key): | ||
return os.path.join(self.storage_url, key) | ||
return posixpath.join(self.storage_url, key) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you remember what failed without this? The effect of this change will be converting Windows paths to posixpaths: >>> import posixpath
>>>
>>> posixpath.join("foo", "bar")
'foo/bar'
>>>
>>> import os
>>> os.path.join("foo","bar")
'foo\\bar' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Windows backslash paths are not correct for file URLs. Without this change, some of the buttons in the UI don't work because they use |
||
|
||
def delete(self, db, build_id, key): | ||
filename = os.path.join(self.storage_path, key) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,11 +50,48 @@ def chdir(directory: pathlib.Path): | |
os.chdir(current_directory) | ||
|
||
|
||
def du(path): | ||
""" | ||
Pure Python equivalent of du -sb | ||
|
||
Based on https://stackoverflow.com/a/55648984/161801 | ||
""" | ||
if os.path.islink(path): | ||
return os.lstat(path).st_size | ||
if os.path.isfile(path): | ||
st = os.lstat(path) | ||
return st.st_size | ||
apparent_total_bytes = 0 | ||
have = set() | ||
for dirpath, dirnames, filenames in os.walk(path): | ||
apparent_total_bytes += os.lstat(dirpath).st_size | ||
for f in filenames: | ||
fp = os.path.join(dirpath, f) | ||
if os.path.islink(fp): | ||
apparent_total_bytes += os.lstat(fp).st_size | ||
continue | ||
st = os.lstat(fp) | ||
if st.st_ino in have: | ||
continue | ||
have.add(st.st_ino) | ||
apparent_total_bytes += st.st_size | ||
for d in dirnames: | ||
dp = os.path.join(dirpath, d) | ||
if os.path.islink(dp): | ||
apparent_total_bytes += os.lstat(dp).st_size | ||
|
||
# Round up | ||
n_blocks = (apparent_total_bytes + 511) // 512 | ||
return n_blocks | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do more tests tomorrow. So far: tested on Linux against There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, this is a bit tricky. First, we incorrectly calculate file size on macOS with I also had to remove the round up calculation from the original script. Also, simplified it. I also haven't tested this script with symlink/filesystem loops. Is this a safe assumption to make that we won't run into this where this is supposed to be used? My updated code + some tests: https://gist.github.com/nkaretnikov/1a66b90a74fa805f1022e90252e54c87
On Windows, I've also attempted to test against du from SysInternals, but it also shows different info. It hasn't been updated in a while, so I just ignored that. In general, I suggest we rely on what OS built-in "File info" tools return to debug this. On Linux, the updated script matches the du command we're using. Windows (updated Python du matches native File info size, Sysinternals du shows different info): macOS (updated Python du roughly matches native File info, built-in du returns different size): macOS (updated Python du matches du from GNU coreutils, installed via brew): Note: creating symlinks on Windows requires having Developer Mode on:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When I tested it on my Mac, du and this function gave the same numbers. I'm not sure if the subtle differences of blocks matter much (it matters if there are sparse or compressed files, but I doubt those would show up). The main thing is that we treat hard links correctly. Note that in my tests, this function is significantly slower than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good point. On which dir did you test, can you post the numbers?
How did you test it? Did you test against the built-in du? What filesystem did you test on? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tested it on my conda-store-state with a couple of environments:
That's on my Mac. I also can confirm that some of the environments do share files via hard-links:
I don't remember if I tested it on Linux, so it's possible there's a discrepancy there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I pushed a fix for this. Note that the numbers will be off a little bit (by less than 512) on Mac because However, we still need to figure out something for this There is a Windows There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually even regular But I just noticed that the way conda-store is using this, it gathers the stats for each prefix as it is created. I need to double check it, but I think it actually isn't accounting for hard links across environments correctly anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Honestly, I'd appreciate some guidance from @costrouc or someone else on how disk_usage is actually used in conda-store before I feel comfortable with the du stuff here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I've already tried it. See above where I post my du test results. Sysinternals du printed wrong results for me, compared to Windows file explorer. The output also differs from Linux/macOS du visually. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh I didn't notice you tested it already. Regarding what the Finder returns, I would be curious to know exactly why the discrepancy exists. You can access it programmatically with AppleScript
(I highly recommend using an LLM to help you write AppleScript) |
||
|
||
|
||
def disk_usage(path: pathlib.Path): | ||
if sys.platform == "darwin": | ||
cmd = ["du", "-sAB1", str(path)] | ||
else: | ||
elif sys.platform == "linux": | ||
cmd = ["du", "-sb", str(path)] | ||
else: | ||
return str(du(path)) | ||
|
||
return subprocess.check_output(cmd, encoding="utf-8").split()[0] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -71,9 +71,18 @@ def start(self): | |
argv = [ | ||
"worker", | ||
"--loglevel=INFO", | ||
"--beat", | ||
] | ||
|
||
# The default Celery pool requires this on Windows. See | ||
# https://stackoverflow.com/questions/37255548/how-to-run-celery-on-windows | ||
if sys.platform == "win32": | ||
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It makes me worried that we rely on a library that has no official Windows support. It's worth discussing the level of Windows support we expect to provide ourselves and possibly consider alternatives to celery. I did attempt to test this by running my concurrency test from 3fc0e14. I also parameterized it by all pools from Before we start talking about celery or concurrency, I suggest we get the rest of the testsuite working on Windows. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Most of the other pools fail because they aren't actually concurrent on Windows, and one of the tasks blocks all the others (I think the watch task, but I'm not completely sure). But even if that weren't the case, conda-store would be very slow without concurrent tasks due to some slow tasks like updating channels. |
||
else: | ||
# --beat does not work on Windows | ||
argv += [ | ||
"--beat", | ||
] | ||
|
||
if self.concurrency: | ||
argv.append(f"--concurrency={self.concurrency}") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already declared in another file. Can we move that definition somewhere where it makes the most sense and import everywhere? Ideally, this should be done in a way that minimizes potential import cycles.