Update water content validation #143
Comments
@turbomam FYI |
Thanks for the examples. MIxS specifies that values for
DataHarmonizer takes input that's flattened, not structured, so we have translated the MIxS
Informally speaking
That allows us to parse the flattened string from DataHarmonizer into the
But it's not compatible with values that you and other scientists use! I think you are suggesting that we turn all validation off, allowing any string. That would be a quick fix, but it would lead to worse search and unit conversion results. I would prefer to globally revise
Here's what your examples (plus one of my own) get parsed into if we send them directly to the quantulum3 parser without any additional validation. Most but not all of them can be parsed faithfully into values and units.
from quantulum3 import parser
examples = [
    "75%",
    "75 %",
    ".75",
    "5 g water / g dry soil",
    "5 cc per cc",
    "5 cc/cc",
    ".75",
    "75% water",
    ".75 g water per g soil WHC",
    "60% WFPS",
    "5 g/g",
]
# Send each raw string to quantulum3 and print what it extracts.
for ex in examples:
    ex_parsed = parser.parse(ex)
    print(f'"{ex}" is parsed into {ex_parsed}')
PS |
@turbomam catching up on this issue. I think your proposed solution here makes a ton of sense.
Am I reading this correctly that these two would incorrectly parse? Is there a way to address this?
Agree that this would definitely also fix #140 |
I'll make a regexr test page containing my proposed validation and you can try some values that you think should pass and values that you think shouldn't pass. Even better, you could make a list of three or four of each in advance. The two parsing results you provided are the real output from the value/unit parser we use, quantulum3. Getting those compound units to parse out would require us to write our own custom NMDC value/unit parser, or to retrain the quantulum3 parser. Note that unit parsing and value/unit validation are two different things. |
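To illustrate the distinction between validation and parsing, here is a minimal sketch in Python. The pattern is hypothetical and is not the validation proposed in this issue; it only assumes that a valid value is a number followed by optional free-text units. The names `WATER_CONTENT_PATTERN`, `is_valid_water_content` and `split_value_unit` are made up for this example. Validation only answers yes or no, and the unit text is returned as an uninterpreted string, which is why compound units would still need a real unit parser.

```python
import re

# Hypothetical pattern (an assumption, not the NMDC or MIxS rule):
# a number, optional whitespace, then either "%" or unit text that
# starts with a letter. The unit part is optional.
WATER_CONTENT_PATTERN = re.compile(
    r"^\s*(?P<value>\d*\.?\d+)\s*(?P<unit>%.*|[A-Za-z].*)?$"
)


def is_valid_water_content(text):
    """Validation: does the string look like '<number> <optional units>'?"""
    return WATER_CONTENT_PATTERN.match(text) is not None


def split_value_unit(text):
    """Split a valid string into (float value, raw unit text), else None.

    The unit text is NOT interpreted; that is the job of a unit parser.
    """
    match = WATER_CONTENT_PATTERN.match(text)
    if match is None:
        return None
    unit = (match.group("unit") or "").strip()
    return float(match.group("value")), unit


if __name__ == "__main__":
    for example in ["75%", ".75", "5 g water / g dry soil", "60% WFPS", "wet"]:
        print(example, is_valid_water_content(example), split_value_unit(example))
```

With this sketch, a string such as "5 g water / g dry soil" passes validation even though a value/unit parser may not recover the compound unit from it.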
Based on discussion at Infrastructure sync meeting, adding to the August sprint |
Will update the submission portal schema to allow validation to pass. However, the chosen solution makes it difficult to parse the results and will need to be revisited. Marking this as the interim fix. See #148 for the next step in correcting this. |
@mslarae13 is the interim fix done? Can this issue be closed? |
yes. water content validates now |
Water content and water content method are 2 MIxS fields used in the NMDC submission template.
Currently, MIxS says
water content method
water content
Water content (in soils and sediment) can be measured in a variety of ways, hence the water content method field.
You can use
All have slightly different formatting.
How do we validate this?