Transformation function raises FileNotFoundError regarding data YAML file #1310

Closed
eecavanna opened this issue Nov 8, 2023 · 5 comments · Fixed by #1340
Labels: bug (Something isn't working), high priority, jupyter, X SMALL (Less than 8 hours, less than 1 day)

Comments

@eecavanna
Collaborator

Summary

While running a transformation function named fix_award_dois from within a Python notebook that imports the containing Python class from the nmdc-schema Python package on PyPI, Python raised a FileNotFoundError exception because it couldn't find a file at the path specified.

Screenshot

image

Related snippets

Invocation of load_yaml_file function:

study_doi_data = load_yaml_file(
    filename='assets/misc/study_dois_changes.yaml')

Definition of load_yaml_file function:

def load_yaml_file(filename):
    """Loads a YAML file into a Python dict."""
    with open(filename, "r") as f:
        data = yaml.safe_load(f)
    return data

Proposal

Reference the file in a way that works regardless of where the function is being invoked.

Suggestion (untested):

+ from importlib import resources

# ...

 def load_yaml_file(filename): 
     """Loads a YAML file into a Python dict.""" 
-    with open(filename, "r") as f: 
+    with resources.open(filename, "r") as f: 
        data = yaml.safe_load(f) 
     return data 

Docs:
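A note on the suggestion above: `importlib.resources` does not appear to provide a function named `open`, so the diff would need a different call. Below is a minimal sketch using `resources.files()` (Python 3.9+). It assumes the YAML file is shipped inside the installed package as package data, which this thread does not confirm. The helper name `read_package_text` and the throwaway package `demo_pkg` are hypothetical and exist only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def read_package_text(package, resource_path):
    """Read a text resource bundled inside an importable package.

    The lookup is anchored at the package's location, not at the
    current working directory. (Hypothetical helper, not nmdc-schema code.)
    """
    resource = resources.files(package).joinpath(resource_path)
    with resource.open("r", encoding="utf-8") as f:
        return f.read()


# Build a throwaway package named "demo_pkg" (an assumption for this demo)
# that contains a data file at assets/misc/study_dois_changes.yaml.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    data_dir = pkg_dir / "assets" / "misc"
    data_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (data_dir / "study_dois_changes.yaml").write_text("example_key: example_value\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Works no matter what the current working directory is.
        text = read_package_text("demo_pkg", "assets/misc/study_dois_changes.yaml")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(text)
```

The returned text could then be handed to `yaml.safe_load`, which accepts a string as well as a stream.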

eecavanna added the bug, high priority, and X SMALL labels on Nov 8, 2023
eecavanna changed the title from "Failed to run migration function that references data file" to "Transformation function raises FileNotFoundError regarding data YAML file" on Nov 8, 2023
@eecavanna
Collaborator Author

As a workaround for the current migration, I can download the referenced file from the nmdc-schema GitHub repo and put it at the path assets/misc/study_dois_changes.yaml relative to the Python notebook.


@turbomam
Member

turbomam commented Nov 8, 2023

Thanks @eecavanna

@brynnz22 would it be OK for me to change the assignee to you?

@eecavanna
Collaborator Author

eecavanna commented Nov 8, 2023

The underlying issue is still present in nmdc-schema, and will come into play again the next time code that imports the nmdc-schema Python package calls the load_yaml_file function (or calls something that calls that function). That's because that function passes a relative path to Python's open() function, and open() resolves relative paths against the current working directory, not against the location of the installed package.

Python docs about open(): https://docs.python.org/3/library/functions.html#open
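To illustrate the behavior described above, here is a small stdlib-only sketch. The same relative path is opened from two different working directories and is found in only one of them. The two temporary directories and the stand-in function `read_text` are hypothetical; they mimic a "repo checkout" and a "notebook directory" and are not part of nmdc-schema.

```python
import os
import tempfile
from pathlib import Path


def read_text(filename):
    """Stand-in for load_yaml_file: opens the path exactly as given."""
    with open(filename, "r") as f:
        return f.read()


relative_path = "assets/misc/study_dois_changes.yaml"
results = {}
original_cwd = os.getcwd()

with tempfile.TemporaryDirectory() as project_dir, \
        tempfile.TemporaryDirectory() as notebook_dir:
    # Hypothetical layout: the data file exists only under project_dir.
    data_file = Path(project_dir) / relative_path
    data_file.parent.mkdir(parents=True)
    data_file.write_text("example_key: example_value\n")

    try:
        # Working directory contains assets/misc/...: the open succeeds.
        os.chdir(project_dir)
        results["from_project_dir"] = read_text(relative_path)

        # Working directory is somewhere else (e.g. next to a notebook):
        # the very same call now fails.
        os.chdir(notebook_dir)
        try:
            read_text(relative_path)
            results["from_notebook_dir"] = "found"
        except FileNotFoundError:
            results["from_notebook_dir"] = "FileNotFoundError"
    finally:
        # Restore the working directory before the temp dirs are removed.
        os.chdir(original_cwd)

print(results)
```

This matches the workaround mentioned earlier in the thread: placing a copy of the file at that relative path next to the notebook makes the call succeed, but only for that working directory.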

@turbomam
Member

turbomam commented Nov 8, 2023

apologies for closing

eecavanna assigned eecavanna and unassigned turbomam on Nov 8, 2023
@eecavanna
Collaborator Author

eecavanna commented Nov 9, 2023

For future reference: I found this code in the nmdc-schema repo (while working on something else). Maybe the same approach (i.e. way of getting the contents of a file) can be used in this case.

return io.BytesIO(pkgutil.get_data(__name__, "nmdc_schema_merged.yaml"))

Docs: https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
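A minimal sketch of how that `pkgutil.get_data` approach might carry over to this case, assuming the YAML file is included in the installed package as package data (not confirmed in this thread). The helper name `open_package_data` and the throwaway package `demo_pkg2` are hypothetical and only make the example runnable on its own.

```python
import importlib
import io
import pkgutil
import sys
import tempfile
from pathlib import Path


def open_package_data(package, resource_path):
    """Return a binary stream over a resource bundled with a package.

    pkgutil.get_data resolves the resource relative to the package,
    so the current working directory does not matter.
    (Hypothetical helper, not nmdc-schema code.)
    """
    data = pkgutil.get_data(package, resource_path)
    if data is None:
        # get_data returns None when the package's loader cannot supply data.
        raise FileNotFoundError(f"{resource_path!r} not available from {package!r}")
    return io.BytesIO(data)


# Build a throwaway package named "demo_pkg2" (an assumption for this demo).
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg2"
    data_dir = pkg_dir / "assets" / "misc"
    data_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (data_dir / "study_dois_changes.yaml").write_bytes(b"example_key: example_value\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        stream = open_package_data("demo_pkg2", "assets/misc/study_dois_changes.yaml")
        loaded = stream.read()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg2", None)

print(loaded)
```

Since `yaml.safe_load` accepts a byte stream, the object returned by such a helper could be passed to it directly, as the existing nmdc-schema snippet above seems to do.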
