normalize package names #24
Comments
My workflow for normalizing the water package names was like this:
I put the mapped env_package values in a field named 'norm_env_package'. Perhaps we should reference the MIxS package label; e.g., 'mixs5_env_package'. That would make it clearer which env_package name we are normalizing on. Also, I left some env_package values as their original value in the norm_env_package field; e.g., 'sea water', 'waste water'. On reflection, I think I should have mapped these to 'water' because that is the name of the MIxS package. The original values are still in the env_package field. But do we also want to normalize the subsets of 'water' packages to a normalized name? For example, do we want to normalize 'wastewater' and 'waste water'? My proposal for normalization mappings:
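For illustration only, here is a minimal Python sketch of the kind of mapping discussed above. It assumes that the values named in the comment ('sea water', 'waste water', 'wastewater') all collapse to the MIxS package name 'water'. The dictionary `WATER_SYNONYMS` and the function `normalize_env_package` are hypothetical names, and the entries are not the proposed mapping table.

```python
import re

# Hypothetical mapping: only values named in the discussion, all mapped to
# the MIxS package name 'water' (an assumption, not the proposed table).
WATER_SYNONYMS = {
    "water": "water",
    "sea water": "water",
    "waste water": "water",
    "wastewater": "water",
}


def normalize_env_package(value):
    """Return the mapped package name, or the cleaned value if unmapped."""
    # Lowercase, trim, and collapse runs of whitespace before the lookup.
    cleaned = re.sub(r"\s+", " ", value.strip().lower())
    return WATER_SYNONYMS.get(cleaned, cleaned)
```

Unmapped values come back lowercased and whitespace-collapsed rather than rejected, so the original env_package field can stay untouched next to the normalized one.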
@cmungall In the short term, we can normalize on the controlled terms in the MIxS standard. In the long term, though, it would be good to normalize the package names by referencing URIs in the mixs-rdf project. We haven't created URIs for package names yet, but this seems like the next logical step.
Ignore the link to the ENVO issue about medical infrastructure. I accidentally posted it here.
Results of package names provided by @cmungall in file
This should be done as a pre-processing step, as part of the overall ETL pipeline, so that each individual analysis does not need to do its own normalization.
Currently done for water packages here:
https://nbviewer.jupyter.org/github/INCATools/biosample-analysis/blob/master/src/notebooks/water-package-profiling.ipynb
I am envisioning a general toolkit that performs this kind of repair on the whole TSV.
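A pre-processing pass of this kind could be sketched as follows. This is only a rough illustration, assuming the TSV has a header row with an 'env_package' column and that the normalized value goes into a new 'norm_env_package' column. The function name `normalize_tsv` and its parameters are hypothetical, and the mapping is supplied by the caller.

```python
import csv


def normalize_tsv(src, dst, mapping,
                  source_col="env_package", target_col="norm_env_package"):
    """Copy a TSV from src to dst, adding a normalized package column."""
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if target_col not in fieldnames:
        fieldnames.append(target_col)
    # extrasaction="ignore" drops cells beyond the header width instead of
    # raising an error on malformed rows.
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        raw = row.get(source_col) or ""
        # Lowercase and collapse whitespace before looking up the mapping.
        key = " ".join(raw.lower().split())
        # Unmapped values keep their original text in the new column.
        row[target_col] = mapping.get(key, raw)
        writer.writerow(row)
```

Because the function takes open file-like objects, it can run once over the whole file at the start of the pipeline, for example `normalize_tsv(open("in.tsv", newline=""), open("out.tsv", "w", newline=""), mapping)` with placeholder file names. Downstream analyses then read only the normalized column.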