-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add numpy 2 hotfix to main #227
Conversation
Co-authored-by: Charles Bousseau <[email protected]>
@cbouss thanks for the review, made the suggestions and reuploaded the most upto date output |
Much better output now in the output.txt @cbouss |
Besides the diff it would be great to have some analysis on the repodata before and after the patch. I feel this would help ensure all is covered. Also, I have been thinking, what if we publish this patch, then start building python packages with no upper bounds (because they support numpy 2), and someone regenerate a patch? A <1 bound would be wrongfully added. So perhaps instead we could have a configuration file like:
Which could be used to patch like (pseudo code):
This way if needed we can tweak the configuration file and retain more control. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking on this over the weekend, I feel there are two major things to ask for (if I understand the code correctly).
First, let's simplify numpy2.py code.
- Let's rename to generate_numpy2.py changes
- Remove all code not being used and most reporting type code.
- Simplify the data structure in the numpy2_patch.json
- Move testing code to the tests folder
Second, let's simplify the incorporation of the numpy2 patches into main.py
- Remove any logging or side product reporting
- Use standard dep replacement code
main.py
Outdated
import requests | ||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a good idea to add in logging into this script. This script is feed into a conda-index process.
main.py
Outdated
return json.load(f) | ||
except FileNotFoundError: | ||
logger.error("numpy2_patch.json not found. Aborting hotfixes.") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just remove this block and use my oneliner below.
main.py
Outdated
sys.exit(1) | ||
|
||
|
||
NUMPY_2_CHANGES = load_numpy2_changes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NUMPY_2_CHANGES = load_numpy2_changes() | |
NUMPY_2_CHANGES = json.loads(Path("numpy2_patch.json").read_text()) |
Of course, add in from pathlib import Path
above.
test_numpy.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code that tests code should be put in the tests directory.
main.py
Outdated
@@ -1534,6 +1635,8 @@ def do_hotfixes(base_dir): | |||
def main(): | |||
base_dir = join(dirname(__file__), CHANNEL_NAME) | |||
do_hotfixes(base_dir) | |||
if NUMPY_2_CHANGES != {}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be creating csv files as a by product of the hotfixing process.
main.py
Outdated
@@ -671,6 +771,9 @@ def patch_record_in_place(fn, record, subdir): | |||
depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,") | |||
break | |||
|
|||
if NUMPY_2_CHANGES is not {}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if NUMPY_2_CHANGES is not {}: | |
if NUMPY_2_CHANGES: |
Python truthiness. Depending on how you generated the patch. We should either apply all the numpy patches at the very beginning or at the very end. There are other places in main where numpy is touched and gets an upper bound. We don't want to overwrite them or redo them.
main.py
Outdated
""" | ||
if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]: | ||
return | ||
changes = NUMPY_2_CHANGES[subdir][filename] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can then refer to subdir_changes from above.
main.py
Outdated
return None | ||
|
||
|
||
def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'): | |
def _apply_changes_to_dependencies(depends_or_constraints, change, record, filename, sort_type='reason'): |
main.py
Outdated
- filename (str): The name of the file being processed. | ||
- sort_type (str, optional): The key in the 'change' dictionary to sort the CSV data by. Defaults to 'reason'. | ||
""" | ||
for i, dep in enumerate(depends): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i, dep in enumerate(depends): | |
replace_dep(depends, change["original"], change["updated"]) |
replace_dep is the standard usage for replacing deps in main.py.
Should have addressed the comments, the output is looking good on the test-hotfix.py to me, removed all non-essential logging and simplified the solution based on the suggestions. Let me know if I have missed anything or anything new is observed :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so close.
My biggest recommendation is to change how you are storing your data for faster data accesses to avoid full searches of numpy 2 changes for every single file in repodata.json.
generate_numpy2_patch.py
Outdated
@@ -0,0 +1,211 @@ | |||
from os.path import dirname, isdir, isfile, join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always recommend using pathlib but this is a small point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you need to update this to reflect current code changes.
generate_numpy2_patch.py
Outdated
import re | ||
|
||
numpy2_protect_dict = { | ||
'add_bound_to_unspecified': True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be more of a flag than belonging to this dictionary. Putting Like with Like to make data easy.
generate_numpy2_patch.py
Outdated
new_dependency = f"{dependency},<2.0a0" if len(parts) > 1 else f"{dependency} <2.0a0" | ||
collect_proposed_change(package_subdir, filename, dependency_type, dependency, | ||
new_dependency, "Version comparison failed") | ||
elif numpy2_protect_dict.get('add_bound_to_unspecified', False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this as a flag seems like a better idea here as suggested above.
main.py
Outdated
@@ -5,12 +5,12 @@ | |||
import os | |||
import re | |||
import sys | |||
import requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file seems to follow the isort order for imports:
Python Standard lib import
Python Standard from lib import
Then third party imports.
main.py
Outdated
- subdir: The subdirectory of the record. | ||
- filename: The filename of the record. | ||
""" | ||
relevant_changes = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you did this but we can make this much much easier if we change the data structure. I think we should have a mapping of mappings to avoid having to do a full search on the numpy list for every single file in the repodata.json.
Before you had a dictionary which was a dictionary of subdirs containing a dictionary of filenames containing a list of changes. I am suggesting you go back to that in part. Have a dictionary of subdirs, each containing a dictionary of filenames as keys storing the dictionary of values.
This will make this search go much much faster.
if subdir not in NUMPY_2_CHANGES:
return
relavant_changes = NUMPY_2_changes["subdirectory"].get(filename, {})
main.py
Outdated
replace_dep(depends, change["original"], change["updated"]) | ||
|
||
|
||
def _get_dependency_list(record, change_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this warrants its own function. If you store "depends" and "constrains" directly in the numpy2_patch.json file, the replace_dep line becomes:
replace_dep(record[change["change_type"]], change["original"], change["updated"]
Your process already did all the cleaning so we shouldn't have any weird third cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also gets rid of the unnecessary for loop.
main.py
Outdated
@@ -686,6 +731,9 @@ def patch_record_in_place(fn, record, subdir): | |||
depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,") | |||
break | |||
|
|||
if NUMPY_2_CHANGES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated before, as this block changes many different files and NOT a targeted file. It should be run last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is progressing quite well. Here are a few things.
- I think we can safely drop "numpy-base" from the list as it has never been triggered
- This should simplify the logic in places so you can directly ask and not have to do the reverse.
- I think as this code has some complex nested if/else statements (5 deep in one severe case!), you need more inline comments as it is hard to follow and will be hard to follow for future devs. My gut tells me there is still some simplifications that should occur.
- Specifically more comments for generate_numpy2_patch:update_numpy_dependencies and :main
- Lastly, how are you checking for no arch again? the filename will not work in the noarch case.
main.py
Outdated
if not change: | ||
return | ||
else: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case do you feel that you will have a failure when you have been given a filename the record will fail? I don't think you need to be so protective here.
generate_numpy2_patch.py
Outdated
try: | ||
for dep in depends: | ||
if dep.split()[0] in ["numpy", "numpy-base"]: | ||
update_numpy_dependencies(depends, record, "dep", subdir, fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update_numpy_dependencies(depends, record, "dep", subdir, fn) | |
update_numpy_dependencies(depends, record, "depends", subdir, fn) |
generate_numpy2_patch.py
Outdated
update_numpy_dependencies(depends, record, "dep", subdir, fn) | ||
for constrain in constrains: | ||
if constrain.split()[0] in ["numpy", "numpy-base"]: | ||
update_numpy_dependencies(constrains, record, "constr", subdir, fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update_numpy_dependencies(constrains, record, "constr", subdir, fn) | |
update_numpy_dependencies(constrains, record, "constrains", subdir, fn) |
generate_numpy2_patch.py
Outdated
- reason: The reason for the change. | ||
""" | ||
# change dep and constr to dependency and constraint | ||
if change_type == "dep": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By giving the full change_type on creation, you don't need this if else statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of lines 55-58 now.
main.py
Outdated
change = NUMPY_2_CHANGES[subdir].get(filename) | ||
if not change: | ||
return | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need an else in this case.
generate_numpy2_patch.py
Outdated
logger.error(f"numpy 2.0.0 error {fn}: {e}") | ||
|
||
json_filename = Path("numpy2_patch.json") | ||
with json_filename.open('w') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary but this can be a one liner.
with json_filename.open('w') as f: | |
json_filename.write_text(json.dumps(dict(NUMPY_2_CHANGES), indent=2) |
@ryanskeith done your suggestions, on the noarch I think a previous review concluded that there is little point doing noarch in this instance? @cbouss if you have an opinion on this feel free to wade in! Happy to handle this edge case, or we could deploy without this and then rectify the noarch packages in another PR? |
Yes I believe it is best to not patch noarch packages, at least for now. I am not sure of which edge cases this would uncover, and it is now time to release numpy 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. You could remove the dead code in generate_numpy2_patch.py
, but it isn't necessary.
generate_numpy2_patch.py
Outdated
- reason: The reason for the change. | ||
""" | ||
# change dep and constr to dependency and constraint | ||
if change_type == "dep": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of lines 55-58 now.
This PR introduces new functionality for checking and updating package dependencies for numpy 2.0 compatibility. It also includes significant updates to the documentation to reflect these changes and provide guidance on using the new features.
Links
Explanation of changes:
numpy2.py
for checking and proposing numpy 2.0 compatibility updatesmain.py
to incorporate changes fromproposed_numpy_changes.json
numpy2.py
and the updatedmain.py
numpy2.py
andmain.py
Note to Reviewers:
This implementation has been carefully designed. However, I value your opinions and welcome any feedback on the design or implementation. This is the best attempt currently at a balanced approach considering future hotfix applications will need to be done carefully.
Also the packages in the "protect_dict" are just placeholder for now, once this is approved as an approach there will be work to refine this list