[Bug]: pydantic causing simple parse script to fail on build #141
FYI, it seems like this works just fine if I revert back to the previous release. Although I would prefer to use the new release, since I'd rather pass in a list of files to the builder and do the directory crawling on my own. In most cases, I'll be traversing an S3 directory.
Sorry for these streams of consciousness. I'm realizing that even the new version uses the base paths to crawl and find the files, which has been breaking for me when trying to traverse an S3 directory. I'd like an option to just pass in raw S3 files. I hacked this together with a local install of the package via:
```python
def get_assets(self):
    # Skip crawling entirely: treat the user-supplied paths as the final assets.
    self.assets = self.paths
    return self
```
Probably (1) and (4) are unsustainable, because I'm sure the crawling step does something important, but I'm in a rush. (2) could be fixed, of course, with a focus on argument validation (which might be related to the issues with (1)). (3) could be some sort of switch in the builder.

Happy to lead an effort on a PR here if you find this valuable. This would be huge for my work, since I could just do the crawling myself and pass in a list of Zarr files from a private S3 store. Please let me know if there's some other way to make the crawler work with an S3 store, though. Given the size of our datasets on S3, it's not feasible for me to build the catalog locally (we're deriving/downloading/publishing the datasets natively through AWS).
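To make the hack above concrete, here is a minimal, self-contained sketch of what the patched `get_assets` does. Note that the `Builder` class below is a hypothetical stand-in, not `ecgtools.Builder`; it only mimics the `paths`/`assets` attributes the patch touches:

```python
# Hypothetical stand-in illustrating the no-crawl behavior of the patched
# get_assets: the caller's paths ARE the asset list, no directory walking.
class Builder:
    def __init__(self, paths):
        self.paths = paths
        self.assets = None

    def get_assets(self):
        # Patched behavior: skip crawling and use the supplied paths directly.
        self.assets = self.paths
        return self


builder = Builder(["s3://bucket/a.zarr", "s3://bucket/b.zarr"]).get_assets()
print(builder.assets)  # the exact list that was passed in
```

This is why the patch "works": everything downstream only consumes `self.assets`, so bypassing the crawl leaves the rest of the build pipeline untouched.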
Thank you for your patience @riley-brady! I just realized I accidentally stopped watching this repository a while ago.
It turns out that guide is a bit outdated ;( `build` and `save` now require keyword-only arguments:

```python
cat_builder.build(parsing_func=parse_dummy)
cat_builder.save(name=..., another_argument=..., another_argument=...)
```
👍🏽 for providing an option to disable the crawling or for allowing users to provide custom crawlers. If you are looking for a quick workaround, the following might work:

```python
import joblib
import pandas as pd

def parsing_func(file):
    ...

cat_builder = Builder(...)
cat_builder.assets = assets  # assets here is a list of files to parse
cat_builder.entries = joblib.Parallel(**cat_builder.joblib_parallel_kwargs)(
    joblib.delayed(parsing_func)(asset, **parsing_func_kwargs) for asset in cat_builder.assets
)
cat_builder.df = pd.DataFrame(cat_builder.entries)
cat_builder = cat_builder.clean_dataframe()
```

After this, you should be able to save the catalog:

```python
cat_builder.save(name=..., another_argument=..., another_argument=...)
```

👍🏽 👍🏽 for a PR when you have time...
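For the `assets` list in the workaround above, one way to enumerate Zarr stores in a private S3 bucket yourself is to glob with `fsspec` (which `s3fs` plugs into). This is a sketch under assumptions: `list_zarr_stores` is a hypothetical helper, and the bucket/prefix names are placeholders, not anything from this issue:

```python
import fsspec

def list_zarr_stores(protocol, pattern, **storage_options):
    """Glob a remote filesystem and return fully qualified paths.

    fsspec's glob returns paths without the protocol prefix, so we
    add it back before handing the list to the builder.
    """
    fs = fsspec.filesystem(protocol, **storage_options)
    return [f"{protocol}://{path}" for path in fs.glob(pattern)]

# For a private S3 store this might look like (bucket/prefix hypothetical):
# assets = list_zarr_stores("s3", "my-bucket/derived/*.zarr", anon=False)
```

Because the workaround assigns `cat_builder.assets` directly, any listing strategy works here, which sidesteps the base-path crawling that was breaking against S3.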
What happened?
When attempting to run

```python
Build.build(custom_parser)
```

I get a `pydantic` error that kills the catalog build, despite following the tutorial and successfully running the parser on test files.

What did you expect to happen?
A catalog to build successfully
Minimal Complete Verifiable Example
Relevant log output
Anything else we need to know?
pydantic version: '1.10.2'
ecgtools version: '2022.10.7'