We should have some utilities to validate and re-cast to the target schema given they exist in functional forms. #37
Comments
Here is a similar function for the data schema: https://github.com/mmcdermott/MEDS_transforms/blob/fd8ffccf7823bd957165d99a1a2aa5e97943fd0c/src/MEDS_transforms/extract/finalize_MEDS_data.py#L16
@EthanSteinberg, thoughts on this? If you think this would be useful, I'd be a proponent of bringing it over now rather than later.
I agree, it would be useful to put these validation checks here.
I think we want this validation code to use pyarrow, though. I don't want to add another dependency (polars) to this repository.
I agree that we don't want to add a dependency, and I think the main validation or re-typing code should be in pyarrow. We could consider having code that only runs if polars is importable, e.g.:
    try:
        import polars as pl
        # validation code here...
    except ImportError:
        pass
but I think starting with pyarrow would still be very helpful.
Sample code for the label schema as well, in case it is helpful once we decide to implement this: https://github.com/justin13601/ACES/blob/e9655390f25bf79167370a802176bcf671cefa44/src/aces/__main__.py#L35
Some possibly related libraries:
E.g., see this function: https://github.com/mmcdermott/MEDS_transforms/blob/573816cbf3f6005a8fc25eb25424706ca0c97b6e/src/MEDS_transforms/extract/finalize_MEDS_metadata.py#L28
This is polars-specific, obviously, which we don't want to be, but the ability to identify whether a codes.parquet or a data/*.parquet file meets a valid extended schema, and to convert it to the right pyarrow schema, is very useful (especially because there are minor differences we should be cognizant of, like large_string vs. string, etc.).
Tagging @EthanSteinberg for your input.