add numpy 2 hotfix to main #227

Merged on Aug 13, 2024
48 commits
7f0f367
add numpy 2 hotfix to main
JamesRobertsonGames Jun 14, 2024
39ec561
ensure the lint passes
JamesRobertsonGames Jun 14, 2024
013d769
Apply suggestions from code review
JamesRobertsonGames Jun 18, 2024
b011dcc
re-add numpy old hotfix
JamesRobertsonGames Jun 18, 2024
e38027f
ensure find packages with lock without ==
JamesRobertsonGames Jun 18, 2024
67417d8
propogate find changes to be made
JamesRobertsonGames Jun 18, 2024
0a93cab
ensure correct application of upper bound hotfix
JamesRobertsonGames Jun 19, 2024
d0d073b
avoid anaconda_depends for repin of numpy
JamesRobertsonGames Jun 19, 2024
d35eab8
add protection list and logic to the n2 hotfix
JamesRobertsonGames Jun 25, 2024
513846a
lint issues
JamesRobertsonGames Jun 25, 2024
9e1ab7b
lint issue
JamesRobertsonGames Jun 25, 2024
b0fd63e
pyyaml added
JamesRobertsonGames Jun 25, 2024
71306c3
pyyaml in testenv
JamesRobertsonGames Jun 25, 2024
ce7e20b
modify yaml and ensure upto date filtering of edge cases
JamesRobertsonGames Jul 16, 2024
0863898
remove need for yaml
JamesRobertsonGames Jul 16, 2024
c979b12
remove pyyaml
JamesRobertsonGames Jul 16, 2024
881cf75
add numpy 2 hotfix rework
JamesRobertsonGames Jul 25, 2024
18e3c4d
formatting changes
JamesRobertsonGames Jul 25, 2024
e3cb04b
main trimmed whitespace
JamesRobertsonGames Jul 25, 2024
86c54a1
repair lint issues
JamesRobertsonGames Jul 25, 2024
0027dab
remove all the lint errors from flake
JamesRobertsonGames Jul 25, 2024
571832b
revert test-hotfix
JamesRobertsonGames Jul 25, 2024
34f2d32
py-rattler adding
JamesRobertsonGames Jul 25, 2024
fec2680
add to readme the new processes
JamesRobertsonGames Jul 25, 2024
ed2659a
remove dep none remover
JamesRobertsonGames Jul 25, 2024
4d654e5
remove yaml
JamesRobertsonGames Jul 25, 2024
c26a1fd
upload proposed changes
JamesRobertsonGames Jul 25, 2024
71e30cd
Apply suggestions from code review
JamesRobertsonGames Jul 26, 2024
d75edb5
Update numpy2.py
JamesRobertsonGames Jul 26, 2024
411223c
change code for clearer changes for numpy 2
JamesRobertsonGames Jul 26, 2024
6716a8f
remove rattler goodness
JamesRobertsonGames Jul 26, 2024
8dd4c11
change to n2
JamesRobertsonGames Jul 26, 2024
c896b1b
flake8
JamesRobertsonGames Jul 26, 2024
d2ab4c9
remove logging for items not being updated
JamesRobertsonGames Jul 26, 2024
2d59ddb
regenerate n2 patch
JamesRobertsonGames Jul 26, 2024
7f237ee
Merge branch 'master' into numpy2-hotfix
ryanskeith Jul 27, 2024
c1b7795
make changes suggested in previous review
JamesRobertsonGames Aug 1, 2024
7639c1a
linting errors fixed
JamesRobertsonGames Aug 1, 2024
c09fa2f
delete numpy2 config and lint modifications
JamesRobertsonGames Aug 1, 2024
4326c27
speed up patching process with better data handling
JamesRobertsonGames Aug 1, 2024
1eb8cd5
README updated with changes
JamesRobertsonGames Aug 1, 2024
e5bb64f
make changes for legibility to per review
JamesRobertsonGames Aug 8, 2024
3b6625c
linting correctors
JamesRobertsonGames Aug 8, 2024
c54945f
remove correction code
JamesRobertsonGames Aug 13, 2024
36222d0
revert to force git to pick up the change
JamesRobertsonGames Aug 13, 2024
4d8e248
revert depends and contrains
JamesRobertsonGames Aug 13, 2024
b6595af
ensure return case is not needed
JamesRobertsonGames Aug 13, 2024
4d468b1
remove protect dict to await issues
JamesRobertsonGames Aug 13, 2024
64 changes: 51 additions & 13 deletions README.md
Review comment (Contributor):

Looks like you need to update this to reflect current code changes.

@@ -1,7 +1,7 @@
# repodata-hotfixes
## Changes to package metadata to fix behavior

When packages are created, authors do their best to specify constraints that make their package work. Sometimes things change, and their constraints are not accurate for making things work. This results in broken environments. People need to be able to patch the package metadata long after the packages are built, so that we can prevent conda from creating broken environments. This repository holds python scripts that generate JSON files, which are then applied on top of the repodata.json index files that are generated from the original package content.

## Things that may require a metadata hotfix:

@@ -13,25 +13,25 @@ When packages are created, authors do their best to specify constraints that mak

### Dependency and Constraint updates

Changing dependencies and constraints is the primary reason hotfixes are applied. There
may be reasons why you need to change a longstanding package, but rebuilding may not be
feasible or perhaps not worth the time. By changing dependencies and constraints,
the data used to solve for dependencies can be modified while leaving the larger ecosystem
unharmed.

NOTE: Hotfixes are applied in an overwrite manner, so any change that is implemented
will affect the entire dependency or constraint list (i.e., if someone
changes one of the ten dependencies for a single package, all ten should still be in the
"patch-instructions", as patching is an overwriting operation).
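
To make the overwrite behavior in the note above concrete, here is a minimal sketch. It assumes a patch is a mapping from field name to replacement value for one package record. The helper `apply_patch` and the sample record are hypothetical and are not code from this repository.

```python
# Hypothetical helper illustrating overwrite semantics; not part of this repository.
def apply_patch(record, instructions):
    """Return a copy of `record` with every patched field replaced wholesale."""
    patched = dict(record)
    patched.update(instructions)
    return patched


record = {
    "name": "example-pkg",
    "depends": ["python >=3.9,<3.10.0a0", "numpy >=1.21", "scipy"],
}

# Only the numpy pin changes, but the full list has to be restated ...
good = apply_patch(
    record,
    {"depends": ["python >=3.9,<3.10.0a0", "numpy >=1.21,<2.0a0", "scipy"]},
)
# ... because listing only the changed entry silently drops the other two.
bad = apply_patch(record, {"depends": ["numpy >=1.21,<2.0a0"]})

print(len(good["depends"]))
print(len(bad["depends"]))
```

The second call shows the failure mode the note warns about: the patched record keeps only the entries that were restated.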

### Removal

Adding a package to the removal list will remove the entire entry from the repodata.json. It will no longer be searchable by conda search.

We should put things on the remove list when:
- We need a quick fix to stop consumers from downloading a bad package.

Another approach might be to move the package into the broken package directory (see directions in perseverance-skills). This will cause it not to be indexed in the first place.

### Revoked

@@ -45,12 +45,51 @@ We should put things on the revoke list when:
- We feel we want a customer to still have access but not the whole consumer population by default
- ?

## Numpy 2.0 Compatibility Checks and Updates

### Running `generate_numpy2_patch.py`

The `generate_numpy2_patch.py` script is used to check and update package dependencies for compatibility with numpy 2.0. To run the script, use the following command:

```
python generate_numpy2_patch.py
```

### What `generate_numpy2_patch.py` does

`generate_numpy2_patch.py` performs the following tasks:
1. Scans through the repodata for packages depending on numpy.
2. Checks if these dependencies need updates to ensure compatibility with numpy 2.0.
3. Proposes changes to add upper bounds to numpy dependencies where necessary.
4. Generates a `numpy2_patch.json` file containing all proposed changes.
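
The bounding step in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the script's exact logic: `add_numpy_upper_bound` is a hypothetical name, and only the cases commented below are handled.

```python
# Hypothetical, simplified version of the upper-bound step.
def add_numpy_upper_bound(dependency, bound="<2.0a0"):
    """Append `bound` to a numpy dependency string that has no upper bound yet."""
    parts = dependency.split()
    if len(parts) == 1:
        # Bare "numpy": the bound becomes the whole version spec.
        return f"{dependency} {bound}"
    version_spec = parts[1]
    if any(p.strip().startswith("<") for p in version_spec.split(",")):
        return dependency  # already has an upper bound
    if version_spec.startswith(">"):
        # Covers both ">" and ">=": extend the version spec with a comma.
        parts[1] = f"{version_spec},{bound}"
        return " ".join(parts)
    return dependency  # exact pins and other forms are left alone


for spec in ["numpy", "numpy >=1.21", "numpy >=1.21,<1.27", "numpy ==1.26.4"]:
    print(spec, "->", add_numpy_upper_bound(spec))
```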

### When to use `generate_numpy2_patch.py`

Use `generate_numpy2_patch.py` when:
- Preparing for a major numpy version update (e.g., transitioning to numpy 2.0).
- You need to audit and update numpy dependencies across many packages.
- You want to ensure compatibility of the ecosystem with upcoming numpy versions.

### Running main.py with numpy2_patch.json

After running `generate_numpy2_patch.py`, you'll have a `numpy2_patch.json` file. To apply these changes:

1. Ensure `numpy2_patch.json` is in the same directory as `main.py`.
2. Run `main.py` as usual:

```
python main.py
```

`main.py` will automatically detect and incorporate the changes from `numpy2_patch.json` into the hotfix process.
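
As an illustration of what such a file holds, the sketch below uses the shape that `collect_proposed_change` in `generate_numpy2_patch.py` writes: subdir, then filename, then `type`, `original`, and `updated`. The filename and versions are made up, and `apply_numpy2_change` is a hypothetical helper. How `main.py` actually consumes the file is not shown in this diff, so treat this as an assumption.

```python
import json

# Sample with the shape written by generate_numpy2_patch.py; values are made up.
numpy2_patch = json.loads("""
{
  "linux-64": {
    "example-pkg-1.0-py311_0.conda": {
      "type": "depends",
      "original": "numpy >=1.21",
      "updated": "numpy >=1.21,<2.0a0"
    }
  }
}
""")


def apply_numpy2_change(record, change):
    """Hypothetical: swap the original spec for the updated one in the named list."""
    field = change["type"]  # "depends" or "constrains"
    record[field] = [
        change["updated"] if spec == change["original"] else spec
        for spec in record.get(field, [])
    ]
    return record


record = {"name": "example-pkg", "depends": ["python >=3.11,<3.12.0a0", "numpy >=1.21"]}
change = numpy2_patch["linux-64"]["example-pkg-1.0-py311_0.conda"]
print(apply_numpy2_change(record, change)["depends"])
```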

## Utility scripts:

### Seeing current hotfixes with `gen-current-hotfix-report.py`:

It can be quite difficult to grok what the hotfix scripts are doing. The script, `gen-current-hotfix-report.py`, attempts to make it easier to see what the current state of the applied hotfixes looks like.

The script downloads the current repodata. It then shows you a diff. Example usage of this script:

```
python gen-current-hotfix-report.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
@@ -60,16 +99,15 @@ For repeated runs add `--use-cache` to avoid downloading the repodata files.

### Testing hotfixes with `test-hotfix.py`:

The script, `test-hotfix.py`, downloads the current repodata and runs your instructions against it. It then shows you a diff.
This is useful for testing out changes before they are committed and deployed. It shows the differences between the current state of the hotfixes
and the ones you are working on.

Example usage of this script:

```
python test-hotfix.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
```
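
The kind of before/after diff described above can be sketched with the standard library. This is not how `test-hotfix.py` is implemented; `diff_records` and the sample records are hypothetical and only show the idea of comparing a record before and after patching.

```python
import difflib
import json


def diff_records(before, after, name):
    """Return unified-diff lines between two versions of a repodata record."""
    a = json.dumps(before, indent=2, sort_keys=True).splitlines()
    b = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            a, b,
            fromfile=f"{name} (current)",
            tofile=f"{name} (patched)",
            lineterm="",
        )
    )


before = {"depends": ["numpy >=1.21"]}
after = {"depends": ["numpy >=1.21,<2.0a0"]}
lines = diff_records(before, after, "example-pkg-1.0-py311_0.conda")
for line in lines:
    print(line)
```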

Use the `--color` or `--show-pkgs` options for different outputs.

For repeated runs add `--use-cache` to avoid downloading the repodata files.
246 changes: 246 additions & 0 deletions generate_numpy2_patch.py
@@ -0,0 +1,246 @@
import requests
import logging
import json
import re
from collections import defaultdict
from pathlib import Path
from conda.models.version import VersionOrder

numpy2_protect_dict = {
    # add any numpy dependencies that need to be protected here
    # "package_name": "protected_version"
}

proposed_changes = []

# Configure the logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Create a logger object
logger = logging.getLogger(__name__)

CHANNEL_NAME = "main"
CHANNEL_ALIAS = "https://repo.anaconda.com/pkgs"
SUBDIRS = (
    "noarch",
    "linux-64",
    "linux-aarch64",
    "linux-s390x",
    "osx-64",
    "osx-arm64",
    "win-64",
)


# Initialize NUMPY_2_CHANGES with a nested defaultdict structure
NUMPY_2_CHANGES = defaultdict(lambda: defaultdict(dict))


def collect_proposed_change(subdirectory, filename, change_type, original_dependency, updated_dependency, reason):
    """
    Collects a proposed change to a dependency for later processing.
    Parameters:
    - subdirectory: The subdirectory where the file is located.
    - filename: The name of the file being modified.
    - change_type: The type of change (e.g., 'dep', 'constr').
    - original_dependency: The original dependency string.
    - updated_dependency: The updated dependency string.
    - reason: The reason for the change.
    """
    NUMPY_2_CHANGES[subdirectory][filename] = {
        "type": change_type,
        "original": original_dependency,
        "updated": updated_dependency
    }

    logger.info(f"numpy 2.0.0: {reason} for {filename}. "
                f"Original: '{original_dependency}' -> New: '{updated_dependency}' ({reason})")


def parse_version(version_str):
    """
    Extracts the version number from a version string.
    Parameters:
    - version_str: The version string to parse.
    Returns:
    The extracted version number or None if not found.
    """
    match = re.search(r'(\d+(\.\d+)*)', version_str)
    return match.group(1) if match else None


def has_upper_bound(dependency):
    """
    Checks if a dependency string contains an upper bound.
    Parameters:
    - dependency: The dependency string to check.
    Returns:
    True if an upper bound is found, False otherwise.
    """
    return any(part.strip().startswith('<') for part in dependency.split(','))


def patch_record_with_fixed_deps(dependency, parts):
    """
    Adds an upper bound to a dependency if necessary.
    Parameters:
    - dependency: The original dependency string.
    - parts: The parts of the dependency string, split by spaces.
    Returns:
    The potentially modified dependency string.
    """
    version_str = parts[1]
    version = parse_version(version_str)
    if version:
        # Exact pins, existing upper bounds, and bare versions are left untouched
        if version_str.startswith('==') or version_str.startswith('<') or version_str[0].isdigit():
            return dependency
        # startswith('>') also matches '>=', so one check covers both lower-bound forms
        if version_str.startswith('>'):
            return f"{dependency},<2.0a0"
        return f"{dependency} <2.0a0"
    return dependency


def update_numpy_dependencies(dependencies_list, package_record, dependency_type, package_subdir, filename):
    """
    Adds upper bounds to numpy dependencies as needed.
    Iterates through dependencies, modifying those without upper bounds and meeting specific criteria.
    Parameters:
    - dependencies_list: Dependencies to check and modify.
    - package_record: Metadata about the current package.
    - dependency_type: Type of dependency ('depends', 'constrains').
    - package_subdir: Package location subdirectory.
    - filename: Package filename.
    """
    # Flag to determine if unspecified dependencies should get an upper bound
    add_bound_to_unspecified = True

    # Iterate through each dependency in the list
    for dependency in dependencies_list:
        parts = dependency.split()
        package_name = parts[0]

        # Check if the dependency is for numpy and does not have an upper bound
        if "numpy" in package_name and not has_upper_bound(dependency):
            if package_name in numpy2_protect_dict:
                # Handle dependencies that are in the protection dictionary
                _handle_protected_dependency(parts, dependency, package_subdir, filename, dependency_type)
            elif add_bound_to_unspecified:
                # Handle dependencies that are unspecified and need an upper bound
                _handle_unspecified_dependency(parts, dependency, package_subdir, filename, dependency_type)


def _handle_protected_dependency(parts, dependency, package_subdir, filename, dependency_type):
    """
    Handles dependencies that are in the protection dictionary.
    """
    version_str = parts[1] if len(parts) > 1 else None
    version = parse_version(version_str) if version_str else None
    protected_version = parse_version(numpy2_protect_dict[parts[0]])

    if version and protected_version:
        try:
            # Compare the version with the protected version
            if VersionOrder(version) <= VersionOrder(protected_version):
                # Add an upper bound to the dependency if the version is less than or equal to the protected version
                new_dependency = f"{dependency},<2.0a0" if len(parts) > 1 else f"{dependency} <2.0a0"
                collect_proposed_change(package_subdir, filename, dependency_type,
                                        dependency, new_dependency, "Version <= protected_version")
        except ValueError:
            # Handle version comparison errors
            new_dependency = f"{dependency},<2.0a0" if len(parts) > 1 else f"{dependency} <2.0a0"
            collect_proposed_change(package_subdir, filename, dependency_type,
                                    dependency, new_dependency, "Version comparison failed")


def _handle_unspecified_dependency(parts, dependency, package_subdir, filename, dependency_type):
    """
    Handles dependencies that are unspecified and need an upper bound.
    """
    if len(parts) > 1:
        # Patch the record with fixed dependencies if there are multiple parts
        new_dependency = patch_record_with_fixed_deps(dependency, parts)
        if new_dependency != dependency:
            collect_proposed_change(package_subdir, filename, dependency_type,
                                    dependency, new_dependency, "Upper bound added")
    else:
        # Add an upper bound to the dependency if there is only one part
        new_dependency = f"{dependency} <2.0a0"
        collect_proposed_change(package_subdir, filename, dependency_type,
                                dependency, new_dependency, "Upper bound added")


def main():
    base_dir = Path(__file__).parent / CHANNEL_NAME
    repodatas = {}

    # Iterate over each subdir to load or fetch repodata
    for subdir in SUBDIRS:
        repodata_path = base_dir / subdir / "repodata_from_packages.json"

        # Check if the repodata file exists locally
        if repodata_path.is_file():
            with repodata_path.open() as fh:
                repodatas[subdir] = json.load(fh)  # Load repodata from local file
        else:
            # Fetch repodata from the remote URL if not available locally
            repodata_url = f"{CHANNEL_ALIAS}/{CHANNEL_NAME}/{subdir}/repodata_from_packages.json"
            response = requests.get(repodata_url)
            response.raise_for_status()
            repodatas[subdir] = response.json()  # Load repodata from the response
            repodata_path.parent.mkdir(parents=True, exist_ok=True)  # Ensure the directory exists
            with repodata_path.open('w') as fh:
                # Save the fetched repodata to a local file
                json.dump(
                    repodatas[subdir],
                    fh,
                    indent=2,
                    sort_keys=True,
                    separators=(",", ": "),
                )

    # Process each subdir's repodata to update numpy dependencies
    for subdir in SUBDIRS:
        index = repodatas[subdir]["packages"]
        for fn, record in index.items():
            name = record["name"]
            depends = record["depends"]
            constrains = record.get("constrains", [])

            # Filter out None dependencies
            depends = [dep for dep in depends if dep is not None]

            # Check if the package is for specific Python versions
            if any(py_ver in fn for py_ver in ["py39", "py310", "py311", "py312"]):
                # Exclude certain package names from processing
                if name not in ["anaconda", "_anaconda_depends", "__anaconda_core_depends", "_anaconda_core"]:
                    try:
                        # Update numpy dependencies in the 'depends' list
                        for dep in depends:
                            if dep.split()[0] in ["numpy", "numpy-base"]:
                                update_numpy_dependencies(depends, record, "depends", subdir, fn)
                        # Update numpy dependencies in the 'constrains' list
                        for constrain in constrains:
                            if constrain.split()[0] in ["numpy", "numpy-base"]:
                                update_numpy_dependencies(constrains, record, "constrains", subdir, fn)
                    except Exception as e:
                        # Log any errors encountered during the update process
                        logger.error(f"numpy 2.0.0 error {fn}: {e}")

    # Write the proposed changes to a JSON file
    json_filename = Path("numpy2_patch.json")
    json_filename.write_text(json.dumps(dict(NUMPY_2_CHANGES), indent=2))

    logger.info(f"Proposed changes have been written to {json_filename}")


if __name__ == "__main__":
    main()