Windows support #559

Closed
wants to merge 53 commits into from
Changes from 11 commits
b4e70b6
Fix pip install -e . on Windows
asmeurer Sep 5, 2023
1719638
Use None for permissions when on Windows
asmeurer Sep 6, 2023
322afb5
Use the Python API for alembic rather than a subprocess
asmeurer Sep 6, 2023
f8f7ce4
Use python -m conda_lock
asmeurer Sep 12, 2023
8cf9e0b
Lower the batch size for updating packages from 1000 to 990
asmeurer Sep 15, 2023
8eb15dd
Merge branch 'main' into windows
asmeurer Sep 15, 2023
76ce349
Use a pure Python equivalent of du on Windows
asmeurer Sep 18, 2023
1f095cb
Merge branch 'main' into windows
asmeurer Sep 25, 2023
abb13ed
Fix missing parenthesis
asmeurer Sep 25, 2023
cf4951e
Fix the worker for Windows
asmeurer Sep 25, 2023
80d9ea8
Use posixpath to construct URLs
asmeurer Sep 25, 2023
877f816
Fix login not working consistently on Windows
asmeurer Oct 5, 2023
59c40db
Fix test_action_decorator to work on Windows
asmeurer Oct 5, 2023
e5fb32c
Fix test_action_decorator on Windows
asmeurer Oct 5, 2023
d87ad02
Skip test_set_conda_prefix_permissions on Windows
asmeurer Oct 5, 2023
d612642
Fix du() on Mac and Windows to return bytes instead of blocks
asmeurer Oct 5, 2023
c08690b
Merge branch 'main' into windows
asmeurer Oct 9, 2023
17d3b3a
Add Windows to CI
asmeurer Oct 9, 2023
be633b3
Run pre-commit
asmeurer Oct 9, 2023
d753976
Try using python -m on CI
asmeurer Oct 9, 2023
a06679c
Revert "Try using python -m on CI"
asmeurer Oct 9, 2023
e639e22
Try activating environment base on Windows
asmeurer Oct 9, 2023
00f193a
Remove mamba from linux CI too
asmeurer Oct 9, 2023
36c99d3
Try using channel-priority: strict
asmeurer Oct 10, 2023
b7b0cc6
CI test
asmeurer Oct 10, 2023
07dc9e8
Trigger CI
asmeurer Oct 10, 2023
dfc3f5b
Try using only conda-forge on Windows
asmeurer Oct 10, 2023
dab4a66
Try using miniforge
asmeurer Oct 10, 2023
c036b27
Set conda-forge as the only channel in ~/.condarc
asmeurer Oct 10, 2023
0c65c05
Fix command
asmeurer Oct 10, 2023
780753a
Fix shell command
asmeurer Oct 10, 2023
98ba0e9
Try force reinstalling Python on Windows
asmeurer Oct 10, 2023
52f25c7
Use the correct environment
asmeurer Oct 10, 2023
b492c27
Revert "Use the correct environment"
asmeurer Oct 10, 2023
26bc094
Add a comment
asmeurer Oct 10, 2023
58c25ec
Try removing extra setup-miniconda config
asmeurer Oct 10, 2023
cd66987
Run tests in verbose mode
asmeurer Oct 10, 2023
1afb0fc
Add an option to not redirect stderr in context.run
asmeurer Oct 10, 2023
5d647ea
Don't capture stderr with conda env export --json
asmeurer Oct 10, 2023
c20d612
Disable mamba in the integration tests too
asmeurer Oct 10, 2023
2d7c953
Print conda-store server address to the terminal when using --standalone
asmeurer Oct 11, 2023
d759fe9
Use "localhost" instead of "127.0.0.1" for consistency with the docs
asmeurer Oct 11, 2023
871f219
Add a note to the docs that Docker image creation only works on Linux
asmeurer Oct 11, 2023
5a3ef3a
Document that filesystem permissions options aren't supported on Windows
asmeurer Oct 11, 2023
e083cc8
Fix a formatting issue in the docs
asmeurer Oct 11, 2023
4918a08
Don't document a config file for using --standalone
asmeurer Oct 11, 2023
9bb4317
Add FAQ entry for long paths on Windows
asmeurer Oct 11, 2023
d47c588
Add a basic test for disk_usage()/du()
asmeurer Oct 11, 2023
d863809
Run black
asmeurer Oct 11, 2023
d993d43
Document that there are different environment files for Mac and Windows
asmeurer Oct 11, 2023
99ec7b2
Fix filenames
asmeurer Oct 11, 2023
eb31b84
Fix filename
asmeurer Oct 11, 2023
2216f47
Account for the size of the directory itself (which is large on Linux)
asmeurer Oct 12, 2023
@@ -1,3 +1,4 @@
+import sys
 import json
 import pathlib
 import typing
@@ -16,7 +17,9 @@ def action_install_lockfile(
         json.dump(conda_lock_spec, f)

     command = [
-        "conda-lock",
+        sys.executable,
+        "-m",
+        "conda_lock",
         "install",
         "--validate-platform",
         "--log-level",
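Running a tool as `python -m <module>` with the current interpreter avoids relying on a console script being discoverable on PATH, which is presumably why the hunk above replaces `conda-lock` with `sys.executable -m conda_lock`. A minimal sketch of the pattern follows; `module_command` is a hypothetical helper, not part of conda-store.

```python
import subprocess
import sys


def module_command(module, *args):
    """Build an argv list that runs ``module`` with the current interpreter."""
    return [sys.executable, "-m", module, *args]


# Hypothetical usage: the arguments mirror the ones visible in the diff.
command = module_command("conda_lock", "install", "--validate-platform")

# Demonstrate with a stdlib module so the sketch runs without conda-lock installed.
result = subprocess.run(
    module_command("platform"), capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```

Because the interpreter path comes from `sys.executable`, the subprocess uses the same environment as the caller, whichever shell or activation state started it.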
11 changes: 8 additions & 3 deletions conda-store-server/conda_store_server/app.py
@@ -32,6 +32,8 @@
 from traitlets.config import LoggingConfigurable


+ON_WIN = sys.platform.startswith("win")
+
 def conda_store_validate_specification(
     db: Session,
     conda_store: "CondaStore",
@@ -301,21 +303,24 @@ def _default_celery_results_backend(self):
     )

     default_uid = Integer(
-        os.getuid(),
+        None if ON_WIN else os.getuid(),
         help="default uid to assign to built environments",
         config=True,
+        allow_none=True,
     )

     default_gid = Integer(
-        os.getgid(),
+        None if ON_WIN else os.getgid(),
         help="default gid to assign to built environments",
         config=True,
+        allow_none=True,
     )

     default_permissions = Unicode(
-        "775",
+        None if ON_WIN else "775",
         help="default file permissions to assign to built environments",
         config=True,
+        allow_none=True,
     )

     default_docker_base_image = Union(
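With `None` defaults on Windows, any code that applies ownership or permissions has to treat `None` as "leave unchanged". A minimal sketch under that assumption follows; `apply_prefix_permissions` and the `DEFAULT_*` names are hypothetical and are not conda-store's actual implementation.

```python
import os
import sys
from typing import Optional

ON_WIN = sys.platform.startswith("win")

# os.getuid() and os.getgid() do not exist on Windows, hence the None defaults.
DEFAULT_UID: Optional[int] = None if ON_WIN else os.getuid()
DEFAULT_GID: Optional[int] = None if ON_WIN else os.getgid()
DEFAULT_PERMISSIONS: Optional[str] = None if ON_WIN else "775"


def apply_prefix_permissions(
    path,
    permissions=DEFAULT_PERMISSIONS,
    uid=DEFAULT_UID,
    gid=DEFAULT_GID,
):
    """Apply mode and ownership to ``path``; ``None`` means "leave as is"."""
    if permissions is not None:
        os.chmod(path, int(permissions, 8))  # "775" is parsed as octal 0o775
    if uid is not None and gid is not None:
        os.chown(path, uid, gid)  # only reached when both IDs are set
```

Keeping the `None` check next to the call that would fail means callers never need their own platform test.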
8 changes: 5 additions & 3 deletions conda-store-server/conda_store_server/dbutil.py
@@ -1,6 +1,5 @@
 import os
 from contextlib import contextmanager
-from subprocess import check_call
 from tempfile import TemporaryDirectory

 from alembic import command
@@ -78,6 +77,9 @@ def upgrade(db_url, revision="head"):
     current_table_names = set(inspect(engine).get_table_names())

     with _temp_alembic_ini(db_url) as alembic_ini:
+
+        alembic_cfg = Config(alembic_ini)
+
         if (
             "alembic_version" not in current_table_names
             and len(current_table_names) > 0
@@ -86,10 +88,10 @@ def upgrade(db_url, revision="head"):
             # we stamp the revision at the first one, that introduces the alembic revisions.
             # I chose the leave the revision number hardcoded as it's not something
             # dynamic, not something we want to change, and tightly related to the codebase
-            command.stamp(Config(alembic_ini), "48be4072fe58")
+            command.stamp(alembic_cfg, "48be4072fe58")
             # After this point, whatever is in the database, Alembic will
             # believe it's at the first revision. If there are more upgrades/migrations
             # to run, they'll be at the next step :

         # run the upgrade.
-        check_call(["alembic", "-c", alembic_ini, "upgrade", revision])
+        command.upgrade(config=alembic_cfg, revision=revision)
5 changes: 3 additions & 2 deletions conda-store-server/conda_store_server/orm.py
@@ -513,10 +513,11 @@ def update_packages(self, db, subdirs=None):
                 package_builds[package_key].append(new_package_build_dict)
         logger.info("CondaPackageBuild objects created")

-        batch_size = 1000
+        # sqlite3 has a max expression depth of 1000
+        batch_size = 990
         all_package_keys = list(package_builds.keys())
         for i in range(0, len(all_package_keys), batch_size):
-            logger.info(f"handling subset at index {i} (batch size {batch_size}")
+            logger.info(f"handling subset at index {i} (batch size {batch_size})")
             subset_keys = all_package_keys[i : i + batch_size]

             # retrieve the parent packages for the subset
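The hunk above keeps each batch below the sqlite3 expression depth limit mentioned in its comment by slicing the key list in steps of 990. A minimal sketch of the slicing on its own, with a hypothetical `batched` helper and made-up keys:

```python
def batched(items, batch_size=990):
    """Yield consecutive slices of ``items`` with at most ``batch_size`` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


keys = [f"pkg-{n}" for n in range(2000)]
sizes = [len(batch) for batch in batched(keys)]
print(sizes)  # [990, 990, 20]
```

Each batch then translates into one query whose condition list stays under the limit.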
13 changes: 7 additions & 6 deletions conda-store-server/conda_store_server/schema.py
@@ -9,6 +9,7 @@
 from conda_store_server import conda_utils
 from pydantic import BaseModel, Field, constr, validator

+ON_WIN = sys.platform.startswith("win")
Contributor

This is already declared in another file. Can we move that definition somewhere where it makes the most sense and import everywhere? Ideally, this should be done in a way that minimizes potential import cycles.


 def _datetime_factory(offset: datetime.timedelta):
     """utcnow datetime + timezone as string"""
@@ -194,20 +195,20 @@ class Settings(BaseModel):
         metadata={"global": True},
     )

-    default_uid: int = Field(
-        os.getuid(),
+    default_uid: int | None = Field(
Contributor

This syntax is Python 3.10+ IIUC. Let's use Optional[int] here.

Contributor Author

Actually this should be fine, but we need to add from __future__ import annotations. I guess we aren't testing old Python versions anywhere. What's the oldest version conda-store should support?

Contributor

I don't know what the minimal version is. The point I was trying to make: we don't use it anywhere else. I don't see why we should introduce this and immediately bump our lowest supported version to 3.10+ or require an extra import. Let's just use Optional, like everywhere else.

+        None if ON_WIN else os.getuid(),
         description="default uid to assign to built environments",
         metadata={"global": True},
     )

-    default_gid: int = Field(
-        os.getgid(),
+    default_gid: int | None = Field(
+        None if ON_WIN else os.getgid(),
         description="default gid to assign to built environments",
         metadata={"global": True},
     )

-    default_permissions: str = Field(
-        "775",
+    default_permissions: str | None = Field(
+        None if ON_WIN else "775",
         description="default file permissions to assign to built environments",
         metadata={"global": True},
     )
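The `Optional[...]` spelling requested in the review thread above can be sketched as follows. To stay dependency-free, a stdlib dataclass stands in for the pydantic `Settings` model, which is an assumption made only for illustration. As far as I know, `from __future__ import annotations` only defers evaluation, so a library that resolves annotations at runtime would still have to evaluate `int | None`, and that expression fails on Python older than 3.10.

```python
import os
import sys
from dataclasses import dataclass
from typing import Optional, get_type_hints

ON_WIN = sys.platform.startswith("win")


@dataclass
class Settings:
    # Stand-in for the pydantic model; Optional[...] needs no Python 3.10 syntax.
    default_uid: Optional[int] = None if ON_WIN else os.getuid()
    default_gid: Optional[int] = None if ON_WIN else os.getgid()
    default_permissions: Optional[str] = None if ON_WIN else "775"


# Resolving the annotations at runtime works on every supported interpreter.
hints = get_type_hints(Settings)
print(hints["default_uid"])
```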
7 changes: 4 additions & 3 deletions conda-store-server/conda_store_server/server/app.py
@@ -1,5 +1,6 @@
 import logging
 import os
+import posixpath
 import sys

 import conda_store_server
@@ -198,9 +199,9 @@ def trim_slash(url):
         app = FastAPI(
             title="conda-store",
             version=__version__,
-            openapi_url=os.path.join(self.url_prefix, "openapi.json"),
-            docs_url=os.path.join(self.url_prefix, "docs"),
-            redoc_url=os.path.join(self.url_prefix, "redoc"),
+            openapi_url=posixpath.join(self.url_prefix, "openapi.json"),
+            docs_url=posixpath.join(self.url_prefix, "docs"),
+            redoc_url=posixpath.join(self.url_prefix, "redoc"),
             contact={
                 "name": "Quansight",
                 "url": "https://quansight.com",
3 changes: 2 additions & 1 deletion conda-store-server/conda_store_server/storage.py
@@ -1,5 +1,6 @@
 import io
 import os
+import posixpath
 import shutil

 import minio
@@ -223,7 +224,7 @@ def get(self, key):
             return f.read()

     def get_url(self, key):
-        return os.path.join(self.storage_url, key)
+        return posixpath.join(self.storage_url, key)
Contributor

Do you remember what failed without this? The effect of this change will be converting Windows paths to posixpaths:

>>> import posixpath
>>>
>>> posixpath.join("foo", "bar")
'foo/bar'
>>>
>>> import os
>>> os.path.join("foo","bar")
'foo\\bar'

Contributor Author

Windows backslash paths are not correct for file URLs. Without this change, some of the buttons in the UI don't work because they use %5C (URL encoded \) instead of /.


     def delete(self, db, build_id, key):
         filename = os.path.join(self.storage_path, key)
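The `%5C` problem described in the thread above can be reproduced on any platform, because `ntpath` is the module that `os.path` resolves to on Windows. A minimal sketch with made-up path components:

```python
import ntpath
import posixpath
from urllib.parse import quote

prefix, key = "storage", "build.yaml"

# ntpath reproduces what os.path.join does on Windows.
windows_joined = ntpath.join(prefix, key)
url_joined = posixpath.join(prefix, key)

print(windows_joined)         # storage\build.yaml
print(quote(windows_joined))  # storage%5Cbuild.yaml
print(url_joined)             # storage/build.yaml
```

Using `posixpath.join` for URLs and `os.path.join` for filesystem paths keeps the two concerns separate, which matches the hunk above leaving `delete()` on `os.path.join`.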
39 changes: 38 additions & 1 deletion conda-store-server/conda_store_server/utils.py
@@ -50,11 +50,48 @@ def chdir(directory: pathlib.Path):
         os.chdir(current_directory)


+def du(path):
+    """
+    Pure Python equivalent of du -sb
+
+    Based on https://stackoverflow.com/a/55648984/161801
+    """
+    if os.path.islink(path):
+        return os.lstat(path).st_size
+    if os.path.isfile(path):
+        st = os.lstat(path)
+        return st.st_size
+    apparent_total_bytes = 0
+    have = set()
+    for dirpath, dirnames, filenames in os.walk(path):
+        apparent_total_bytes += os.lstat(dirpath).st_size
+        for f in filenames:
+            fp = os.path.join(dirpath, f)
+            if os.path.islink(fp):
+                apparent_total_bytes += os.lstat(fp).st_size
+                continue
+            st = os.lstat(fp)
+            if st.st_ino in have:
+                continue
+            have.add(st.st_ino)
+            apparent_total_bytes += st.st_size
+        for d in dirnames:
+            dp = os.path.join(dirpath, d)
+            if os.path.islink(dp):
+                apparent_total_bytes += os.lstat(dp).st_size
+
+    # Round up
+    n_blocks = (apparent_total_bytes + 511) // 512
+    return n_blocks
Contributor

Will do more tests tomorrow. So far: tested on Linux against du -sb, returns wrong results for /bin. For /bin/ls it seems to work. I'll write a test.

Contributor

OK, this is a bit tricky.

First, we incorrectly calculate file size on macOS with du. There's no built-in set of options that can give us the equivalent of -sb AFAICT. We could install du from GNU coreutils, which is what is used on Linux, and then we'd get the same size as our script. But I'd suggest we just use our Python code to calculate the size on all systems. It would still not match what Finder shows, but I think it's close enough to ignore. Different tools calculate this differently, and it's not hugely important to users as long as we're in the same ballpark.

I also had to remove the round-up calculation from the original script, and I simplified it.

I also haven't tested this script with symlink/filesystem loops. Is it safe to assume that we won't run into those where this is supposed to be used?

My updated code + some tests: https://gist.github.com/nkaretnikov/1a66b90a74fa805f1022e90252e54c87
Note: uncomment the TemporaryDirectory line again and remove two lines after it, which were added for testing on Windows:

    # with tempfile.TemporaryDirectory() as dir:
        dir = "c:\\tmpdir"
        os.makedirs(dir)

On Windows, I've also attempted to test against du from SysInternals, but it also shows different info. It hasn't been updated in a while, so I just ignored that.

In general, I suggest we rely on what OS built-in "File info" tools return to debug this.

On Linux, the updated script matches the du command we're using.

Windows (updated Python du matches native File info size, Sysinternals du shows different info):
[Screenshot: Screen Shot 2023-10-01 at 12 55 55]

macOS (updated Python du roughly matches native File info, built-in du returns different size):
[Screenshot: Screen Shot 2023-10-01 at 13 12 57]

macOS (updated Python du matches du from GNU coreutils, installed via brew):
[Screenshot: Screen Shot 2023-10-01 at 13 19 23]

Note: creating symlinks on Windows requires having Developer Mode on:

Settings > Privacy & Security > For Developers and turn Developer Mode to on

Contributor Author

When I tested it on my Mac, du and this function gave the same numbers. I'm not sure if the subtle differences of blocks matter much (it matters if there are sparse or compressed files, but I doubt those would show up). The main thing is that we treat hard links correctly.

Note that in my tests, this function is significantly slower than du, so we should consider whether it's worth using it over du.

Contributor

> Note that in my tests, this function is significantly slower than du, so we should consider whether it's worth using it over du.

Good point. On which dir did you test, can you post the numbers?

> When I tested it on my Mac, du and this function gave the same numbers.

How did you test it? Did you test against the built-in du? What filesystem did you test on?

Contributor Author

I tested it on my conda-store-state with a couple of environments:

$ du -sAB1 conda-store-state/
1111657	conda-store-state/
>>> from conda_store_server.utils import du
>>> du('conda-store-state')
1111657

That's on my Mac. I also can confirm that some of the environments do share files via hard-links:

$ ls -i conda-store-state/admin/b82cde5b5489ceffd8a8589ebd73f20a9f4836260b18295f49e057c441b235dc-20231005-213031-558981-2-test2/lib/libsqlite3.0.dylib
256048028 conda-store-state/admin/b82cde5b5489ceffd8a8589ebd73f20a9f4836260b18295f49e057c441b235dc-20231005-213031-558981-2-test2/lib/libsqlite3.0.dylib
$ ls -i conda-store-state/admin/99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20231005-211948-613237-1-test/lib/libsqlite3.0.dylib
256048028 conda-store-state/admin/99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20231005-211948-613237-1-test/lib/libsqlite3.0.dylib

I don't remember if I tested it on Linux, so it's possible there's a discrepancy there.

Contributor Author

I pushed a fix for this. Note that the numbers will be off a little bit (by less than 512) on Mac because du rounds up to the nearest multiple of 512 when converting bytes to blocks.

However, we still need to figure out something for this du() function. I just tested it against my main ~/anaconda directory and I had to exit out of it after several minutes. I think it might be accidentally quadratic.

There is a Windows du command at https://learn.microsoft.com/en-us/sysinternals/downloads/du. Maybe we should package it as a conda package so we can use it directly (or maybe it is already packaged?).

Contributor Author

Actually even regular du -sAB1 ~/anaconda is slow on my machine. Maybe it's just too hard to deal with that many hard links.

But I just noticed that the way conda-store is using this, it gathers the stats for each prefix as it is created. I need to double check it, but I think it actually isn't accounting for hard links across environments correctly anyway.

Contributor Author

Honestly, I'd appreciate some guidance from @costrouc or someone else on how disk_usage is actually used in conda-store before I feel comfortable with the du stuff here.

Contributor @nkaretnikov, Oct 6, 2023

> There is a Windows du command at https://learn.microsoft.com/en-us/sysinternals/downloads/du. Maybe we should just package that as a conda package so we can just use it directly (or maybe it already is packaged?).

I've already tried it. See above where I post my du test results. Sysinternals du printed wrong results for me, compared to Windows file explorer. The output also differs from Linux/macOS du visually.

Contributor Author

Oh I didn't notice you tested it already.

Regarding what the Finder returns, I would be curious to know exactly why the discrepancy exists. You can access it programmatically with AppleScript:

osascript -e 'tell application "Finder" to get physical size of folder "Macintosh HD:Users:aaronmeurer:Documents:conda-store:conda-store-server:conda-store-state"'
6.6859008E+8

(I highly recommend using an LLM to help you write AppleScript)



 def disk_usage(path: pathlib.Path):
     if sys.platform == "darwin":
         cmd = ["du", "-sAB1", str(path)]
-    else:
+    elif sys.platform == "linux":
         cmd = ["du", "-sb", str(path)]
+    else:
+        return str(du(path))

     return subprocess.check_output(cmd, encoding="utf-8").split()[0]
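The review thread above converges on a byte-returning walk that counts hard-linked files once and includes the size of directories themselves. A minimal sketch of that idea follows. It is not the code from the linked gist or from the later commits, `du_bytes` is a hypothetical name, and the result should only be expected to approximate `du -sb`.

```python
import os
import stat


def du_bytes(path):
    """Apparent size of ``path`` in bytes.

    Symlinks are not followed, and an inode reachable through several
    hard links is counted once.
    """
    root = os.lstat(path)
    if not stat.S_ISDIR(root.st_mode):
        return root.st_size

    total = root.st_size  # the directory entry itself has a size too
    seen = set()
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_nlink > 1:
                inode = (st.st_dev, st.st_ino)
                if inode in seen:
                    continue  # another hard link to an inode already counted
                seen.add(inode)
            total += st.st_size
    return total


if __name__ == "__main__":
    print(du_bytes("."))
```

Because `seen` is local to one call, files hard-linked between two separately measured prefixes are counted in both, which is the cross-environment concern raised in the thread.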
11 changes: 10 additions & 1 deletion conda-store-server/conda_store_server/worker/app.py
@@ -71,9 +71,18 @@ def start(self):
         argv = [
             "worker",
             "--loglevel=INFO",
-            "--beat",
         ]

+        # The default Celery pool requires this on Windows. See
+        # https://stackoverflow.com/questions/37255548/how-to-run-celery-on-windows
+        if sys.platform == "win32":
+            os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')
Contributor

It makes me worried that we rely on a library that has no official Windows support.

It's worth discussing the level of Windows support we expect to provide ourselves and possibly consider alternatives to celery.

I did attempt to test this by running my concurrency test from 3fc0e14. I also parameterized it by all pools from get_available_pool_names. All of them failed. This could be for a number of reasons, but it's not worth investigating at the moment due to 50% of the testsuite failing for me. Also: pytest celery fixture might require additional tweaks to get working.

Before we start talking about celery or concurrency, I suggest we get the rest of the testsuite working on Windows.

Contributor Author

Most of the other pools fail because they aren't actually concurrent on Windows, and one of the tasks blocks all the others (I think the watch task, but I'm not completely sure). But even if that weren't the case, conda-store would be very slow without concurrent tasks due to some slow tasks like updating channels.

+        else:
+            # --beat does not work on Windows
+            argv += [
+                "--beat",
+            ]
+
         if self.concurrency:
             argv.append(f"--concurrency={self.concurrency}")
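The platform branch in the hunk above can be isolated into a small function that is easy to test without starting Celery. This is a sketch only: `build_worker_argv` and its parameters are hypothetical, and the real code sets the variable on `os.environ` inside `start()`.

```python
import os
import sys


def build_worker_argv(platform=sys.platform, concurrency=None, environ=os.environ):
    """Return Celery worker arguments, adjusting for Windows as in the diff."""
    argv = ["worker", "--loglevel=INFO"]
    if platform == "win32":
        # Per the diff comment, the default Celery pool requires this on Windows.
        environ.setdefault("FORKED_BY_MULTIPROCESSING", "1")
    else:
        # Per the diff comment, --beat does not work on Windows.
        argv.append("--beat")
    if concurrency:
        argv.append(f"--concurrency={concurrency}")
    return argv


fake_env = {}
print(build_worker_argv("win32", 4, fake_env), fake_env)
print(build_worker_argv("linux", None, {}))
```

Passing the platform and environment in as arguments lets both branches be exercised on a single operating system.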
4 changes: 2 additions & 2 deletions conda-store-server/hatch_build.py
@@ -61,10 +61,10 @@ def initialize(self, version: str, build_data: Dict[str, Any]) -> None:
             # main.js to enable easy configuration see
             # conda_store_server/server/templates/conda-store-ui.html
             # for global variable set
-            with (source_directory / "main.js").open("r") as source_f:
+            with (source_directory / "main.js").open("r", encoding='utf-8') as source_f:
                 content = source_f.read()
                 content = re.sub(
                     '"MISSING_ENV_VAR"', "GLOBAL_CONDA_STORE_STATE", content
                 )
-            with (destination_directory / "main.js").open("w") as dest_f:
+            with (destination_directory / "main.js").open("w", encoding='utf-8') as dest_f:
                 dest_f.write(content)
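Without an explicit `encoding`, `open()` in text mode falls back to the locale's preferred encoding, which on Windows is often a legacy code page rather than UTF-8. That is presumably what broke `pip install -e .` there. A minimal sketch of the same read, substitute, and write sequence with explicit UTF-8, using a temporary directory and made-up file contents:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "main.js"
    destination = pathlib.Path(tmp) / "main.out.js"

    # Non-ASCII content is where a locale-dependent default encoding bites.
    source.write_text('const label = "MISSING_ENV_VAR"; // café ✓\n', encoding="utf-8")

    with source.open("r", encoding="utf-8") as source_f:
        content = source_f.read()
    content = content.replace('"MISSING_ENV_VAR"', "GLOBAL_CONDA_STORE_STATE")
    with destination.open("w", encoding="utf-8") as dest_f:
        dest_f.write(content)

    round_trip = destination.read_text(encoding="utf-8")
    print(round_trip)
```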