Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add numpy 2 hotfix to main #227

Merged
merged 48 commits into from
Aug 13, 2024
Merged

add numpy 2 hotfix to main #227

merged 48 commits into from
Aug 13, 2024

Conversation

JamesRobertsonGames
Copy link
Contributor

@JamesRobertsonGames JamesRobertsonGames commented Jun 14, 2024

This PR introduces new functionality for checking and updating package dependencies for numpy 2.0 compatibility. It also includes significant updates to the documentation to reflect these changes and provide guidance on using the new features.

Links

Explanation of changes:

  • Added new script numpy2.py for checking and proposing numpy 2.0 compatibility updates
  • Modified main.py to incorporate changes from proposed_numpy_changes.json
  • Updated README with instructions for using numpy2.py and the updated main.py
  • Added functionality to generate CSV files for reviewing proposed changes
  • Included guidance in README for reviewing CSV updates using various tools
  • Refactored existing code for better readability and compliance with flake8 standards
  • Added error handling and logging improvements in the scripts
  • Updated documentation to reflect the new workflow involving numpy2.py and main.py

Note to Reviewers:

This implementation has been carefully designed. However, I value your opinions and welcome any feedback on the design or implementation. This is the best attempt currently at a balanced approach considering future hotfix applications will need to be done carefully.

Also the packages in the "protect_dict" are just placeholder for now, once this is approved as an approach there will be work to refine this list

@cbouss cbouss self-requested a review June 17, 2024 17:38
main.py Outdated Show resolved Hide resolved
main.py Outdated Show resolved Hide resolved
main.py Outdated Show resolved Hide resolved
main.py Outdated Show resolved Hide resolved
main.py Outdated Show resolved Hide resolved
@JamesRobertsonGames
Copy link
Contributor Author

@cbouss thanks for the review, made the suggestions and reuploaded the most upto date output

main.py Outdated Show resolved Hide resolved
@JamesRobertsonGames
Copy link
Contributor Author

Much better output now in the output.txt @cbouss

@cbouss
Copy link
Contributor

cbouss commented Jun 20, 2024

Besides the diff it would be great to have some analysis on the repodata before and after the patch. I feel this would help ensure all is covered.

Also, I have been thinking, what if we publish this patch, then start building python packages with no upper bounds (because they support numpy 2), and someone regenerate a patch? A <1 bound would be wrongfully added. So perhaps instead we could have a configuration file like:

pandas: 2.2.2,
scikit-lear: 1.4.2,
...

Which could be used to patch like (pseudo code):

if pkg in numpy2_protect_dict:
  if VersionOrder(version) <= VersionOrder(numpy2_protect_dict['version']):
    add_dep(numpy < 2.0a0)

This way if needed we can tweak the configuration file and retain more control.

main.py Outdated Show resolved Hide resolved
@JamesRobertsonGames JamesRobertsonGames marked this pull request as draft July 4, 2024 09:30
numpy2_patch.json Outdated Show resolved Hide resolved
@cbouss cbouss requested a review from ryanskeith July 26, 2024 18:41
Copy link
Contributor

@ryanskeith ryanskeith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking on this over the weekend, I feel there are two major things to ask for (if I understand the code correctly).

First, let's simplify numpy2.py code.

  • Let's rename to generate_numpy2.py changes
  • Remove all code not being used and most reporting type code.
  • Simplify the data structure in the numpy2_patch.json
  • Move testing code to the tests folder

Second, let's simplify the incorporation of the numpy2 patches into main.py

  • Remove any logging or side product reporting
  • Use standard dep replacement code

main.py Outdated
import requests
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a good idea to add in logging into this script. This script is feed into a conda-index process.

main.py Outdated
return json.load(f)
except FileNotFoundError:
logger.error("numpy2_patch.json not found. Aborting hotfixes.")
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just remove this block and use my oneliner below.

main.py Outdated
sys.exit(1)


NUMPY_2_CHANGES = load_numpy2_changes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NUMPY_2_CHANGES = load_numpy2_changes()
NUMPY_2_CHANGES = json.loads(Path("numpy2_patch.json").read_text())

Of course, add in from pathlib import Path above.

test_numpy.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code that tests code should be put in the tests directory.

numpy2.py Outdated Show resolved Hide resolved
main.py Outdated
@@ -1534,6 +1635,8 @@ def do_hotfixes(base_dir):
def main():
base_dir = join(dirname(__file__), CHANNEL_NAME)
do_hotfixes(base_dir)
if NUMPY_2_CHANGES != {}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be creating csv files as a by product of the hotfixing process.

main.py Outdated
@@ -671,6 +771,9 @@ def patch_record_in_place(fn, record, subdir):
depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
break

if NUMPY_2_CHANGES is not {}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if NUMPY_2_CHANGES is not {}:
if NUMPY_2_CHANGES:

Python truthiness. Depending on how you generated the patch. We should either apply all the numpy patches at the very beginning or at the very end. There are other places in main where numpy is touched and gets an upper bound. We don't want to overwrite them or redo them.

main.py Outdated
"""
if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]:
return
changes = NUMPY_2_CHANGES[subdir][filename]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can then refer to subdir_changes from above.

main.py Outdated
return None


def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
def _apply_changes_to_dependencies(depends_or_constraints, change, record, filename, sort_type='reason'):

main.py Outdated
- filename (str): The name of the file being processed.
- sort_type (str, optional): The key in the 'change' dictionary to sort the CSV data by. Defaults to 'reason'.
"""
for i, dep in enumerate(depends):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i, dep in enumerate(depends):
replace_dep(depends, change["original"], change["updated"])

replace_dep is the standard usage for replacing deps in main.py.

@JamesRobertsonGames
Copy link
Contributor Author

@cbouss @ryanskeith

Should have addressed the comments, the output is looking good on the test-hotfix.py to me, removed all non-essential logging and simplified the solution based on the suggestions. Let me know if I have missed anything or anything new is observed :)

Copy link
Contributor

@ryanskeith ryanskeith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so close.

My biggest recommendation is to change how you are storing your data for faster data accesses to avoid full searches of numpy 2 changes for every single file in repodata.json.

@@ -0,0 +1,211 @@
from os.path import dirname, isdir, isfile, join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always recommend using pathlib but this is a small point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you need to update this to reflect current code changes.

import re

numpy2_protect_dict = {
'add_bound_to_unspecified': True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be more of a flag than belonging to this dictionary. Putting Like with Like to make data easy.

new_dependency = f"{dependency},<2.0a0" if len(parts) > 1 else f"{dependency} <2.0a0"
collect_proposed_change(package_subdir, filename, dependency_type, dependency,
new_dependency, "Version comparison failed")
elif numpy2_protect_dict.get('add_bound_to_unspecified', False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this as a flag seems like a better idea here as suggested above.

generate_numpy2_patch.py Outdated Show resolved Hide resolved
main.py Outdated
@@ -5,12 +5,12 @@
import os
import re
import sys
import requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file seems to follow the isort order for imports:
Python Standard lib import
Python Standard from lib import

Then third party imports.

main.py Outdated
- subdir: The subdirectory of the record.
- filename: The filename of the record.
"""
relevant_changes = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you did this but we can make this much much easier if we change the data structure. I think we should have a mapping of mappings to avoid having to do a full search on the numpy list for every single file in the repodata.json.

Before you had a dictionary which was a dictionary of subdirs containing a dictionary of filenames containing a list of changes. I am suggesting you go back to that in part. Have a dictionary of subdirs, each containing a dictionary of filenames as keys storing the dictionary of values.

This will make this search go much much faster.

if subdir not in NUMPY_2_CHANGES:
    return

relavant_changes = NUMPY_2_changes["subdirectory"].get(filename, {})

main.py Outdated
replace_dep(depends, change["original"], change["updated"])


def _get_dependency_list(record, change_type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this warrants its own function. If you store "depends" and "constrains" directly in the numpy2_patch.json file, the replace_dep line becomes:

replace_dep(record[change["change_type"]], change["original"], change["updated"]

Your process already did all the cleaning so we shouldn't have any weird third cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also gets rid of the unnecessary for loop.

main.py Outdated
@@ -686,6 +731,9 @@ def patch_record_in_place(fn, record, subdir):
depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
break

if NUMPY_2_CHANGES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated before, as this block changes many different files and NOT a targeted file. It should be run last.

Copy link
Contributor

@ryanskeith ryanskeith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is progressing quite well. Here are a few things.

  • I think we can safely drop "numpy-base" from the list as it has never been triggered
    • This should simplify the logic in places so you can directly ask and not have to do the reverse.
  • I think as this code has some complex nested if/else statements (5 deep in one severe case!), you need more inline comments as it is hard to follow and will be hard to follow for future devs. My gut tells me there is still some simplifications that should occur.
    • Specifically more comments for generate_numpy2_patch:update_numpy_dependencies and :main
  • Lastly, how are you checking for no arch again? the filename will not work in the noarch case.

generate_numpy2_patch.py Outdated Show resolved Hide resolved
main.py Outdated
if not change:
return
else:
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case do you feel that you will have a failure when you have been given a filename the record will fail? I don't think you need to be so protective here.

try:
for dep in depends:
if dep.split()[0] in ["numpy", "numpy-base"]:
update_numpy_dependencies(depends, record, "dep", subdir, fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
update_numpy_dependencies(depends, record, "dep", subdir, fn)
update_numpy_dependencies(depends, record, "depends", subdir, fn)

update_numpy_dependencies(depends, record, "dep", subdir, fn)
for constrain in constrains:
if constrain.split()[0] in ["numpy", "numpy-base"]:
update_numpy_dependencies(constrains, record, "constr", subdir, fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
update_numpy_dependencies(constrains, record, "constr", subdir, fn)
update_numpy_dependencies(constrains, record, "constrains", subdir, fn)

- reason: The reason for the change.
"""
# change dep and constr to dependency and constraint
if change_type == "dep":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By giving the full change_type on creation, you don't need this if else statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of lines 55-58 now.

main.py Show resolved Hide resolved
main.py Outdated
change = NUMPY_2_CHANGES[subdir].get(filename)
if not change:
return
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need an else in this case.

logger.error(f"numpy 2.0.0 error {fn}: {e}")

json_filename = Path("numpy2_patch.json")
with json_filename.open('w') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary but this can be a one liner.

Suggested change
with json_filename.open('w') as f:
json_filename.write_text(json.dumps(dict(NUMPY_2_CHANGES), indent=2)

@JamesRobertsonGames
Copy link
Contributor Author

@ryanskeith done your suggestions, on the noarch I think a previous review concluded that there is little point doing noarch in this instance? @cbouss if you have an opinion on this feel free to wade in! Happy to handle this edge case, or we could deploy without this and then rectify the noarch packages in another PR?

@cbouss
Copy link
Contributor

cbouss commented Aug 9, 2024

@ryanskeith done your suggestions, on the noarch I think a previous review concluded that there is little point doing noarch in this instance? @cbouss if you have an opinion on this feel free to wade in! Happy to handle this edge case, or we could deploy without this and then rectify the noarch packages in another PR?

Yes I believe it is best to not patch noarch packages, at least for now. I am not sure of which edge cases this would uncover, and it is now time to release numpy 2.

Copy link
Contributor

@cbouss cbouss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cbouss cbouss requested a review from ryanskeith August 12, 2024 17:40
Copy link
Contributor

@ryanskeith ryanskeith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. You could remove the dead code in generate_numpy2_patch.py, but it isn't necessary.

- reason: The reason for the change.
"""
# change dep and constr to dependency and constraint
if change_type == "dep":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of lines 55-58 now.

@JamesRobertsonGames JamesRobertsonGames merged commit c0924de into master Aug 13, 2024
2 of 8 checks passed
@JamesRobertsonGames JamesRobertsonGames deleted the numpy2-hotfix branch August 13, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants