add numpy 2 hotfix to main #227

JamesRobertsonGames · 2024-06-14T13:31:23Z

This PR introduces new functionality for checking and updating package dependencies for numpy 2.0 compatibility. It also includes significant updates to the documentation to reflect these changes and provide guidance on using the new features.

Links

PKG-4837

Explanation of changes:

Added new script numpy2.py for checking and proposing numpy 2.0 compatibility updates
Modified main.py to incorporate changes from proposed_numpy_changes.json
Updated README with instructions for using numpy2.py and the updated main.py
Added functionality to generate CSV files for reviewing proposed changes
Included guidance in README for reviewing CSV updates using various tools
Refactored existing code for better readability and compliance with flake8 standards
Added error handling and logging improvements in the scripts
Updated documentation to reflect the new workflow involving numpy2.py and main.py

Note to Reviewers:

This implementation has been carefully designed. However, I value your opinions and welcome any feedback on the design or implementation. This is currently my best attempt at a balanced approach, bearing in mind that future hotfix applications will need to be done carefully.

Also, the packages in the "protect_dict" are just placeholders for now. Once this is approved as an approach, there will be work to refine this list.

main.py

Co-authored-by: Charles Bousseau <[email protected]>

JamesRobertsonGames · 2024-06-18T11:19:24Z

@cbouss thanks for the review, made the suggestions and re-uploaded the most up-to-date output

main.py

JamesRobertsonGames · 2024-06-19T13:14:12Z

Much better output now in the output.txt @cbouss

cbouss · 2024-06-20T14:51:25Z

Besides the diff it would be great to have some analysis on the repodata before and after the patch. I feel this would help ensure all is covered.

Also, I have been thinking: what if we publish this patch, then start building Python packages with no upper bounds (because they support numpy 2), and someone regenerates the patch? A <2 bound would be wrongfully added. So perhaps instead we could have a configuration file like:

pandas: 2.2.2,
scikit-learn: 1.4.2,
...

Which could be used to patch like (pseudo code):

if pkg in numpy2_protect_dict:
  if VersionOrder(version) <= VersionOrder(numpy2_protect_dict[pkg]):
    add_dep(numpy < 2.0a0)

This way if needed we can tweak the configuration file and retain more control.
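A minimal Python sketch of the protect-list idea above. It is not the code from this PR: parse_version is a hypothetical stand-in for conda's VersionOrder that only handles dotted numeric versions, and NUMPY2_PROTECT and needs_numpy_upper_bound are names invented for illustration.

```python
# Hypothetical stand-in for conda's VersionOrder: handles dotted numeric
# versions only (no epochs, pre-release tags, or local versions).
def parse_version(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


# Assumed configuration: the last release of each package that still
# needs the numpy <2 bound (values follow the example above).
NUMPY2_PROTECT = {
    "pandas": "2.2.2",
    "scikit-learn": "1.4.2",
}


def needs_numpy_upper_bound(name: str, version: str) -> bool:
    """Return True when this build should receive a numpy <2.0a0 bound."""
    last_protected = NUMPY2_PROTECT.get(name)
    if last_protected is None:
        # Package is not listed, so the patch leaves it alone.
        return False
    return parse_version(version) <= parse_version(last_protected)
```

With this shape, a later release built against numpy 2 (for example a hypothetical pandas 2.3.0) would not get the bound even if the patch were regenerated.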

main.py

numpy2_patch.json

ryanskeith

After thinking on this over the weekend, I feel there are two major things to ask for (if I understand the code correctly).

First, let's simplify numpy2.py code.

Let's rename to generate_numpy2.py changes
Remove all code not being used and most reporting type code.
Simplify the data structure in the numpy2_patch.json
Move testing code to the tests folder

Second, let's simplify the incorporation of the numpy2 patches into main.py

Remove any logging or side product reporting
Use standard dep replacement code

ryanskeith · 2024-07-26T19:10:00Z

main.py

 import requests
+import logging


I don't think it is a good idea to add logging to this script. This script is fed into a conda-index process.

ryanskeith · 2024-07-26T19:49:58Z

main.py

+            return json.load(f)
+    except FileNotFoundError:
+        logger.error("numpy2_patch.json not found. Aborting hotfixes.")
+        sys.exit(1)


I would just remove this block and use my one-liner below.

ryanskeith · 2024-07-26T19:51:05Z

main.py

+        sys.exit(1)
+
+
+NUMPY_2_CHANGES = load_numpy2_changes()


Suggested change

-NUMPY_2_CHANGES = load_numpy2_changes()
+NUMPY_2_CHANGES = json.loads(Path("numpy2_patch.json").read_text())

Of course, add in from pathlib import Path above.
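For illustration, a small sketch of how the one-liner behaves, assuming nothing beyond the standard library. The load_changes wrapper and the temporary directory are hypothetical and exist only so the example can run anywhere. One consequence worth noting: a missing file raises FileNotFoundError instead of logging and calling sys.exit(1).

```python
import json
import tempfile
from pathlib import Path


def load_changes(patch_path: Path) -> dict:
    # read_text raises FileNotFoundError when the file is missing, so the
    # run still stops, just without custom logging or sys.exit.
    return json.loads(patch_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    patch_file = Path(tmp) / "numpy2_patch.json"
    patch_file.write_text('{"linux-64": {}}')
    changes = load_changes(patch_file)

    try:
        load_changes(Path(tmp) / "missing.json")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True
```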

ryanskeith · 2024-07-26T20:46:51Z

test_numpy.py

code that tests code should be put in the tests directory.

numpy2.py

ryanskeith · 2024-07-26T23:54:41Z

main.py

@@ -1534,6 +1635,8 @@ def do_hotfixes(base_dir):
 def main():
    base_dir = join(dirname(__file__), CHANNEL_NAME)
    do_hotfixes(base_dir)
+    if NUMPY_2_CHANGES != {}:


We shouldn't be creating CSV files as a by-product of the hotfixing process.

ryanskeith · 2024-07-27T00:29:13Z

main.py

@@ -671,6 +771,9 @@ def patch_record_in_place(fn, record, subdir):
                depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
                break

+    if NUMPY_2_CHANGES is not {}:


Suggested change

-if NUMPY_2_CHANGES is not {}:
+if NUMPY_2_CHANGES:

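Use Python truthiness here. The identity check "is not {}" compares against a newly created dict, so it is true even when NUMPY_2_CHANGES is empty. Separately, depending on how you generated the patch, we should either apply all the numpy patches at the very beginning or at the very end. There are other places in main where numpy is touched and gets an upper bound. We don't want to overwrite them or redo them.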
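The difference between the two checks can be seen in a short sketch. The function names are made up for illustration; only standard Python semantics are assumed.

```python
def is_not_empty_literal(changes: dict) -> bool:
    # Identity comparison against a freshly created dict: the two objects
    # are never the same, so this is True for every input.
    return changes is not {}


def has_changes(changes: dict) -> bool:
    # Truthiness: False for an empty dict, True otherwise.
    return bool(changes)


identity_results = [is_not_empty_literal({}), is_not_empty_literal({"linux-64": {}})]
truthy_results = [has_changes({}), has_changes({"linux-64": {}})]
```

Only the truthiness check actually skips the numpy 2 block when the patch file is empty.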

ryanskeith · 2024-07-29T14:04:10Z

main.py

+    """
+    if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]:
+        return
+    changes = NUMPY_2_CHANGES[subdir][filename]


You can then refer to subdir_changes from above.

ryanskeith · 2024-07-29T14:05:26Z

main.py

+    return None
+
+
+def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):


Suggested change

-def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
+def _apply_changes_to_dependencies(depends_or_constraints, change, record, filename, sort_type='reason'):

ryanskeith · 2024-07-29T14:07:28Z

main.py

+    - filename (str): The name of the file being processed.
+    - sort_type (str, optional): The key in the 'change' dictionary to sort the CSV data by. Defaults to 'reason'.
+    """
+    for i, dep in enumerate(depends):


Suggested change

-for i, dep in enumerate(depends):
+replace_dep(depends, change["original"], change["updated"])

replace_dep is the standard usage for replacing deps in main.py.

JamesRobertsonGames · 2024-08-01T11:14:45Z

@cbouss @ryanskeith

Should have addressed the comments, the output is looking good on the test-hotfix.py to me, removed all non-essential logging and simplified the solution based on the suggestions. Let me know if I have missed anything or anything new is observed :)

ryanskeith

This is so close.

My biggest recommendation is to change how you are storing your data for faster data accesses to avoid full searches of numpy 2 changes for every single file in repodata.json.

ryanskeith · 2024-08-01T15:13:29Z

generate_numpy2_patch.py

@@ -0,0 +1,211 @@
+from os.path import dirname, isdir, isfile, join


I always recommend using pathlib but this is a small point.

ryanskeith · 2024-08-01T15:15:12Z

README.md

Looks like you need to update this to reflect current code changes.

ryanskeith · 2024-08-01T15:16:16Z

generate_numpy2_patch.py

+import re
+
+numpy2_protect_dict = {
+    'add_bound_to_unspecified': True,


This seems to be more of a flag than something that belongs in this dictionary. Keeping like with like makes the data easier to work with.

ryanskeith · 2024-08-01T15:38:14Z

generate_numpy2_patch.py

+                            new_dependency = f"{dependency},<2.0a0" if len(parts) > 1 else f"{dependency} <2.0a0"
+                            collect_proposed_change(package_subdir, filename, dependency_type, dependency,
+                                                    new_dependency, "Version comparison failed")
+                elif numpy2_protect_dict.get('add_bound_to_unspecified', False):


Using this as a flag seems like a better idea here as suggested above.

ryanskeith · 2024-08-01T15:44:29Z

main.py

@@ -5,12 +5,12 @@
 import os
 import re
 import sys
+import requests


This file seems to follow the isort order for imports:
Python Standard lib import
Python Standard from lib import

Then third party imports.

ryanskeith · 2024-08-01T15:59:19Z

main.py

+    - subdir: The subdirectory of the record.
+    - filename: The filename of the record.
+    """
+    relevant_changes = [


I see why you did this but we can make this much much easier if we change the data structure. I think we should have a mapping of mappings to avoid having to do a full search on the numpy list for every single file in the repodata.json.

Before you had a dictionary which was a dictionary of subdirs containing a dictionary of filenames containing a list of changes. I am suggesting you go back to that in part. Have a dictionary of subdirs, each containing a dictionary of filenames as keys storing the dictionary of values.

This will make this search go much much faster.

if subdir not in NUMPY_2_CHANGES:
    return
relevant_changes = NUMPY_2_CHANGES[subdir].get(filename, {})
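A sketch of the mapping-of-mappings layout described above. The filename, the keys inside each change, and the find_changes helper are hypothetical; the real numpy2_patch.json may use different field names.

```python
# Assumed layout: subdir -> filename -> change. The entry is invented.
NUMPY_2_CHANGES = {
    "linux-64": {
        "example-1.0-py311_0.tar.bz2": {
            "type": "depends",
            "original": "numpy >=1.23",
            "updated": "numpy >=1.23,<2.0a0",
        },
    },
}


def find_changes(subdir: str, filename: str) -> dict:
    """Two dictionary lookups instead of a scan over every recorded change."""
    if subdir not in NUMPY_2_CHANGES:
        return {}
    return NUMPY_2_CHANGES[subdir].get(filename, {})
```

Each lookup is constant time on average, so the cost per record no longer grows with the size of the patch file.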

ryanskeith · 2024-08-01T16:04:02Z

main.py

+        replace_dep(depends, change["original"], change["updated"])
+
+
+def _get_dependency_list(record, change_type):


I don't think this warrants its own function. If you store "depends" and "constrains" directly in the numpy2_patch.json file, the replace_dep line becomes:

replace_dep(record[change["change_type"]], change["original"], change["updated"])

Your process already did all the cleaning so we shouldn't have any weird third cases.

This also gets rid of the unnecessary for loop.
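To make the suggestion concrete, here is a sketch under stated assumptions. The replace_dep below is a hypothetical stand-in that swaps exact matches in place; the real helper in main.py may behave differently. The record and change values are invented.

```python
def replace_dep(depends, old, new):
    """Hypothetical stand-in for main.py's replace_dep: swap exact matches in place."""
    for i, dep in enumerate(depends):
        if dep == old:
            depends[i] = new


record = {
    "depends": ["python >=3.11,<3.12.0a0", "numpy >=1.23"],
    "constrains": ["numpy-base >=1.23"],
}

# Because "change_type" already holds the record key ("depends" or
# "constrains"), no helper is needed to translate it into a list.
change = {
    "change_type": "depends",
    "original": "numpy >=1.23",
    "updated": "numpy >=1.23,<2.0a0",
}

replace_dep(record[change["change_type"]], change["original"], change["updated"])
```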

ryanskeith · 2024-08-01T16:06:00Z

main.py

@@ -686,6 +731,9 @@ def patch_record_in_place(fn, record, subdir):
                depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
                break

+    if NUMPY_2_CHANGES:


As stated before, this block changes many different files and NOT a targeted file, so it should be run last.

ryanskeith

This is progressing quite well. Here are a few things.

I think we can safely drop "numpy-base" from the list as it has never been triggered.
- This should simplify the logic in places so you can ask directly and not have to do the reverse.
As this code has some complex nested if/else statements (5 deep in one severe case!), I think you need more inline comments; it is hard to follow now and will be hard to follow for future devs. My gut tells me there are still some simplifications that should occur.
- Specifically, more comments for generate_numpy2_patch:update_numpy_dependencies and :main
Lastly, how are you checking for noarch again? The filename will not work in the noarch case.

ryanskeith · 2024-08-02T15:20:13Z

main.py

+    if not change:
+        return
+    else:
+        try:


In what case do you expect a failure here, given that you have already been handed a filename and its record? I don't think you need to be so protective here.

ryanskeith · 2024-08-02T15:22:41Z

generate_numpy2_patch.py

+                    try:
+                        for dep in depends:
+                            if dep.split()[0] in ["numpy", "numpy-base"]:
+                                update_numpy_dependencies(depends, record, "dep", subdir, fn)


Suggested change

-update_numpy_dependencies(depends, record, "dep", subdir, fn)
+update_numpy_dependencies(depends, record, "depends", subdir, fn)

ryanskeith · 2024-08-02T15:22:55Z

generate_numpy2_patch.py

+                                update_numpy_dependencies(depends, record, "dep", subdir, fn)
+                        for constrain in constrains:
+                            if constrain.split()[0] in ["numpy", "numpy-base"]:
+                                update_numpy_dependencies(constrains, record, "constr", subdir, fn)


Suggested change

-update_numpy_dependencies(constrains, record, "constr", subdir, fn)
+update_numpy_dependencies(constrains, record, "constrains", subdir, fn)

ryanskeith · 2024-08-02T15:25:06Z

generate_numpy2_patch.py

+    - reason: The reason for the change.
+    """
+    # change dep and constr to dependency and constraint
+    if change_type == "dep":


By giving the full change_type on creation, you don't need this if else statement.

You can get rid of lines 55-58 now.

ryanskeith · 2024-08-02T15:31:55Z

main.py

+    change = NUMPY_2_CHANGES[subdir].get(filename)
+    if not change:
+        return
+    else:


We don't need an else in this case.

ryanskeith · 2024-08-02T15:35:36Z

generate_numpy2_patch.py

+                        logger.error(f"numpy 2.0.0 error {fn}: {e}")
+
+    json_filename = Path("numpy2_patch.json")
+    with json_filename.open('w') as f:


Not necessary but this can be a one liner.

Suggested change

-with json_filename.open('w') as f:
+json_filename.write_text(json.dumps(dict(NUMPY_2_CHANGES), indent=2))
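A runnable sketch of the write side, assuming NUMPY_2_CHANGES is a defaultdict keyed by subdir (which would explain the dict(...) call in the suggestion). The entry and the temporary directory are invented for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed shape: subdir -> filename -> change.
NUMPY_2_CHANGES = defaultdict(dict)
NUMPY_2_CHANGES["linux-64"]["example-1.0-0.tar.bz2"] = {
    "type": "depends",
    "original": "numpy >=1.23",
    "updated": "numpy >=1.23,<2.0a0",
}

with tempfile.TemporaryDirectory() as tmp:
    json_filename = Path(tmp) / "numpy2_patch.json"
    # write_text opens, writes, and closes the file in a single call.
    json_filename.write_text(json.dumps(dict(NUMPY_2_CHANGES), indent=2))
    written = json.loads(json_filename.read_text())
```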

JamesRobertsonGames · 2024-08-08T09:44:29Z

@ryanskeith done your suggestions, on the noarch I think a previous review concluded that there is little point doing noarch in this instance? @cbouss if you have an opinion on this feel free to wade in! Happy to handle this edge case, or we could deploy without this and then rectify the noarch packages in another PR?

cbouss · 2024-08-09T14:04:17Z

@ryanskeith done your suggestions, on the noarch I think a previous review concluded that there is little point doing noarch in this instance? @cbouss if you have an opinion on this feel free to wade in! Happy to handle this edge case, or we could deploy without this and then rectify the noarch packages in another PR?

Yes I believe it is best to not patch noarch packages, at least for now. I am not sure of which edge cases this would uncover, and it is now time to release numpy 2.

cbouss

LGTM!

ryanskeith

Looks good. You could remove the dead code in generate_numpy2_patch.py, but it isn't necessary.

ryanskeith · 2024-08-12T18:33:26Z

generate_numpy2_patch.py

+    - reason: The reason for the change.
+    """
+    # change dep and constr to dependency and constraint
+    if change_type == "dep":


You can get rid of lines 55-58 now.

JamesRobertsonGames added 2 commits June 14, 2024 14:29

add numpy 2 hotfix to main

7f0f367

ensure the lint passes

39ec561

cbouss self-requested a review June 17, 2024 17:38

cbouss reviewed Jun 17, 2024

JamesRobertsonGames and others added 3 commits June 18, 2024 11:07

Apply suggestions from code review

013d769

Co-authored-by: Charles Bousseau <[email protected]>

re-add numpy old hotfix

b011dcc

ensure find packages with lock without ==

e38027f

propogate find changes to be made

67417d8

cbouss reviewed Jun 18, 2024

ensure correct application of upper bound hotfix

0a93cab

JamesRobertsonGames requested a review from cbouss June 19, 2024 13:25

avoid anaconda_depends for repin of numpy

d0d073b

JamesRobertsonGames added 5 commits June 25, 2024 17:55

add protection list and logic to the n2 hotfix

d35eab8

lint issues

513846a

lint issue

9e1ab7b

pyyaml added

b0fd63e

pyyaml in testenv

71306c3

skupr-anaconda reviewed Jun 27, 2024

JamesRobertsonGames marked this pull request as draft July 4, 2024 09:30

JamesRobertsonGames added 8 commits July 16, 2024 13:55

modify yaml and ensure upto date filtering of edge cases

ce7e20b

remove need for yaml

0863898

remove pyyaml

c979b12

add numpy 2 hotfix rework

881cf75

formatting changes

18e3c4d

main trimmed whitespace

e3cb04b

repair lint issues

86c54a1

remove all the lint errors from flake

0027dab

cbouss reviewed Jul 26, 2024

JamesRobertsonGames added 2 commits July 26, 2024 17:43

remove logging for items not being updated

d2ab4c9

regenerate n2 patch

2d59ddb

cbouss requested a review from ryanskeith July 26, 2024 18:41

Merge branch 'master' into numpy2-hotfix

7f237ee

ryanskeith requested changes Jul 29, 2024

JamesRobertsonGames added 3 commits August 1, 2024 08:53

make changes suggested in previous review

c1b7795

linting errors fixed

7639c1a

delete numpy2 config and lint modifications

c09fa2f

JamesRobertsonGames requested review from ryanskeith and cbouss August 1, 2024 08:05

ryanskeith requested changes Aug 1, 2024

JamesRobertsonGames added 2 commits August 1, 2024 20:41

speed up patching process with better data handling

4326c27

README updated with changes

1eb8cd5

ryanskeith requested changes Aug 2, 2024

make changes for legibility to per review

e5bb64f

linting correctors

3b6625c

cbouss approved these changes Aug 9, 2024

cbouss requested a review from ryanskeith August 12, 2024 17:40

ryanskeith approved these changes Aug 12, 2024

JamesRobertsonGames added 5 commits August 13, 2024 11:39

remove correction code

c54945f

revert to force git to pick up the change

36222d0

revert depends and contrains

4d8e248

ensure return case is not needed

b6595af

remove protect dict to await issues

4d468b1

JamesRobertsonGames merged commit c0924de into master Aug 13, 2024
2 of 8 checks passed

JamesRobertsonGames deleted the numpy2-hotfix branch August 13, 2024 13:55
