TG2-VALIDATION_DAY_STANDARD #147

ArthurChapman · 2018-08-26T10:41:09Z

TestField	Value
GUID	47ff73ba-0028-4f79-9ce1-ee7008d66498
Label	VALIDATION_DAY_STANDARD
Description	Is the value of dwc:day an integer between 1 and 31 inclusive?
TestType	Validation
Darwin Core Class	dwc:Event
Information Elements ActedUpon	dwc:day
Information Elements Consulted
Expected Response	INTERNAL_PREREQUISITES_NOT_MET if dwc:day is bdq:Empty; COMPLIANT if the value of the field dwc:day is an integer between 1 and 31 inclusive; otherwise NOT_COMPLIANT.
Data Quality Dimension	Conformance
Term-Actions	DAY_STANDARD
Parameter(s)
Source Authority
Specification Last Updated	2023-09-18
Examples	[dwc:day="15": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:day is in range"]
	[dwc:day="32": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:day is not in range"]
Source	TDWG2018 DQIG Meeting; TG2-Gainesville
References
Example Implementations (Mechanisms)	FilteredPush/Kurator:event_date_qc 10.5281/zenodo.596795.
Link to Specification Source Code	event_date_qc DwCEventDQ.validationDayStandard()
Notes	This is part of a group of similar tests (VALIDATION_DAY_INRANGE (8d787cb5-73e2-4c39-9cd1-67c7361dc02e), VALIDATION_STARTDAYOFYEAR_INRANGE (85803c7e-2a5a-42e1-b8d3-299a44cafc46), VALIDATION_ENDDAYOFYEAR_INRANGE (9a39d88c-7eee-46df-b32a-c109f9f81fb8)).

ArthurChapman · 2018-08-30T00:05:35Z

We have changed the description of this from "is not an unambiguous value" to "is not an integer between 1 and 31 inclusive.

chicoreus · 2018-08-30T05:24:15Z

Isn't this a duplicate of #125 ?

chicoreus · 2018-09-07T15:56:48Z

Added guid.

…DESCRIPTION: VALIDATION_DAY_NOTSTANDARD now returns NOT_COMPLIANT for any dwc:day value except an integer in the range 1-31, prerequisites not met no longer thrown internally to this implementation as that is only returned if term is absent.

chicoreus · 2019-03-13T13:39:54Z

Field not present as a prerequisite is a significant barrier for implemenations. An implementation of an atomic test as a method which takes values as parameters e.g. validateDayNotStandard(day), is very likely to be isolated from a larger framework which understands whether or not a term is present in an input data set or not. Bringing "is present" into the test definition conflates areas of concern.

This gets back to the need for a definition of isEmpty - if we define isEmpty to include null values, empty strings, and cases where the term isn't present in the data set then we can consistently define prerequisites as being not empty.

I thus propose that we change the element of the definition "INTERNAL_PREREQUISITES_NOT_MET if the field dwc:day is not present; " to "INTERNAL_PREREQUISITES_NOT_MET if the field dwc:day is EMPTY;", reference a standard definition of EMPTY, and propagate this change to all other test definitions.

tucotuco · 2019-03-26T20:28:15Z

@chicoreus I see what you mean about being able to abstract the test in code be independent of the record structure that it might act on. Defining EMPTY to include when a field isn't present seems an elegant simplification that I don't anticipate would cause any issues. I mean, once it comes to the test implementation, it doesn't much matter of there was a source value that was null or if there was no source field. Making the change would likely simplify all the descriptions that reference 'not present'. Downside: it means a review and rewrite of a bunch of tests - it looks like there might be 67 tests affected. I would not wish that on Lee or Arthur.

Tasilee · 2019-03-27T19:51:33Z

If setting up a definition of EMPTY will consolidate intent, so be it. I am happy to work through the tests and change the Expected response wording.

But how best to do it? My initial reaction is to add a new Issue for EMPTY that can therefore be referenced, otherwise, in relation to this work, it will be hanging?

What would the definition be?

EMPTY:
Field not present
Field contains a string of one or more null characters
Field contains a string of one or more blank characters
...

tucotuco · 2019-03-27T21:41:32Z

Hmm. Not sure a label is necessary since we can find them in a string search using 'EMPTY'. As for a definition, how about: EMPTY: A field that is needed as input is not present, or, the input field is present but there is no value in the field, or the field is present and the value of the field consists entirely of non-printing characters.

…

On Wed, Mar 27, 2019 at 4:51 PM Lee Belbin ***@***.***> wrote: If setting up a definition of EMPTY will consolidate intent, so be it. I am happy to work through the tests and change the Expected response wording. But how best to do it? My initial reaction is to add a new Issue for EMPTY that can therefore be referenced, otherwise, in relation to this work, it will be hanging? What would the definition be? EMPTY: Field not present Field contains a string of one or more null characters Field contains a string of one or more blank characters ... — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#147 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAcP6w4RqyQm7kSnAAbfFyJg23Q-eX39ks5va8vGgaJpZM4WMu13> .

Tasilee · 2019-03-27T23:27:01Z

Thanks @tucotuco. I like the definition but where do we put it so that it is at least as referenceable as the tests and comments?

ArthurChapman · 2019-03-28T00:04:03Z

It goes into the Vocabulary #152

Tasilee · 2019-03-28T00:05:33Z

Thanks @ArthurChapman. Duh, I forgot about that :|. Will do it now.

ArthurChapman · 2019-03-28T00:07:16Z

But note we already have a definition given by @chicoreus. Perhaps we need to discuss more - see previous discussion in #111

Tasilee · 2019-03-28T00:13:46Z

I like @tucotuco 's :) as I can easily understand it, and added it to #111 and the vocab #152. Lets see what @chicoreus says. I'm sure he will find an issue? Ha.

ArthurChapman · 2019-03-28T00:31:19Z

Problem is, @tucotuco doesn't cover everything - like 9999, etc. and some other things that were in @chicoreus version. That is why we need to discuss more fully. @tucotuco and I will look at it more tomorrow.

chicoreus · 2019-03-28T03:56:28Z

I think a general definition of EMPTY on the line of @tucotuco's would work, if accompanied by notes about implementations in different languages (empty string, null (if the language supports it e.g in java, javascript, sql), undefined (e.g. in javascript), a char array of size 0 (e.g. in C)). I'm not quite sure, however, about non-printing characters (if using either an ascii or unicode definition), as those are characters and constitute data, and could represent mistyped values (e.g. something that's supposed to be a small integer that is being represented as a low ascii character). I'd be inclined to treat a non-printing character as non-empty and then let tests downstream fail. Based on the data I've seen in the wild (and @tucotuco has seen plenty more), I'd propose including several strings that are typical string serializations of a database null ("\N", "null", "NULL"). I also put on the table, but am not particularly happy with other common serializations of empty value markers ("**", "9999", etc.). The trouble with these is that some of them might be valid data values for some fields, so they don't seem to fit a generalized definition of empty.

tucotuco · 2019-03-28T12:11:31Z

I agree, non-printing characters are not best included for prerequisites not being met. It would not allow us to report that the content (and there is content) is not compliant, and that is what we really want the recipient to know. I think the same applies for all the string serializations of a database null, now that I think more about it. That is something being shared that is not compliant also.

…

On Thu, Mar 28, 2019 at 12:56 AM Paul J. Morris ***@***.***> wrote: I think a general definition of EMPTY on the line of @tucotuco <https://github.com/tucotuco>'s would work, if accompanied by notes about implementations in different languages (empty string, null (if the language supports it e.g in java, javascript, sql), undefined (e.g. in javascript), a char array of size 0 (e.g. in C)). I'm not quite sure, however, about non-printing characters (if using either an ascii or unicode definition), as those are characters and constitute data, and could represent mistyped values (e.g. something that's supposed to be a small integer that is being represented as a low ascii character). I'd be inclined to treat a non-printing character as non-empty and then let tests downstream fail. Based on the data I've seen in the wild (and @tucotuco <https://github.com/tucotuco> has seen plenty more), I'd propose including several strings that are typical string serializations of a database null ("\N", "null", "NULL"). I also put on the table, but am not particularly happy with other common serializations of empty value markers ("**", "9999", etc.). The trouble with these is that some of them might be valid data values for some fields, so they don't seem to fit a generalized definition of empty. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#147 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAcP65YJtOEQLgYsSQlH83w6ncge3yeyks5vbD1tgaJpZM4WMu13> .

tucotuco · 2019-06-12T15:48:10Z

Following the discussions arising from the event date case study for the BISS paper, I believe that this test should be updated to use the expected response and notes of TG2-VALIDATION_DAY_OUTOFRANGE (#125), which should then be deprecated. Everything else about this test is already up to date with the proposed change.

Tasilee · 2019-08-14T01:09:00Z

@chicoreus: I'm presuming you are suggesting the removal of all phrases like "dwc:xxx is not present"? It does in retrospect seem odd that we have defined EMPTY but not PRESENT.

chicoreus · 2019-08-14T15:39:56Z

@Tasilee Yes, i would very strongly advocate the removal of "is not present" everywhere it occurs. I would further advocate that we ensure that the definition of isEmpty()/EMPTY allows test implementations to treat the absence of a term in a single record as an empty value without making a distinction between the term being present with no value and the term being absent. This allows tests to be implemented as methods with signatures that take a list of terms (e.g. validationDayNotStandard(day, month, year), rather than as methods that take Darwin Core records and internally extract the terms from them (which is what we would be forced into by assessing the presence/absence of terms at the level of each test).

Presence/absence of terms related to the core in a MultiRecord feels like a distinct test related to only a particular subset of data preparation pipelines, and not being a question about the utility of the data for consumers (e.g. all research consumers would care about is the presence of values for dwc:eventDate, while year, month, and day are primarily concerns of upstream providers and aggregators attempting to map disparate schemas onto Darwin Core and then to aggregate disparate mappings onto a common provision of dwc:eventDate).

I will make two claims: (1) Including a test of terms being present as part of the assessment of emptyness of terms at the level of individual tests puts severe engineering constraints on the implementations of those tests (e.g. @tucotuco couldn't have assessed the difference between term present/absent and term value = null in his sql implementation of the subset of TIME tests). And (2) assessing whether terms are present or absent in a particular SingleRecord or MultiRecord is out of scope for the CORE tests as it relates to a subset of use cases around mapping the data that are out of scope of the set of uses for the data that we examined for CORE.

(1) Says we can define a separate test for presence of terms in a data serialization. (2), if accepted, says we can ignore this issue and just drop is present, (2) if rejected says we should define a separate test for the presence/absence of the information elements/darwin core terms we identified as the set relevant to the core tests.

tucotuco · 2019-08-14T15:49:59Z

I think it simplifies everything to include in our EMPTY definition the case where a term is not present and remove what would then be superfluous wording.

Tasilee · 2019-08-14T23:08:36Z

Thanks @chicoreus. I always need to read your comments 4 times to get my head around the concepts. Yes, you all know I am slow by now. Anyway, I think I agree with the conclusions and with @tucotuco. I will need to await clarification of the EXTERNAL_PREREQUISITES question before editing. I don't want to do two passes! Again. I'll leave the definitions to @ArthurChapman and @tucotuco

ArthurChapman · 2019-08-15T01:27:14Z

OK - I will attempt to change the definition of Empty in #152 (and in the BISS Paper)

ArthurChapman · 2019-08-15T01:30:14Z

I have changed the definition of EMPTY to:
A field that is either not present or does not contain any characters or values. Note: A field containing non-printing or other invalid characters or values (including NULL values) may be separately detected.

Tasilee · 2019-08-15T01:48:11Z

Do we need a note about @chicoreus comment on difficulty of implementing a concept of fields that are 'not present'

chicoreus · 2019-08-15T02:22:00Z

That is probably very worth putting into an informative document as part of the standard.

chicoreus · 2019-08-15T03:23:03Z

This issue is still conflating standardization (unambiguously interpreting text strings as integers) with a test of range (does the day exist for the given month and year).

We should have a test validation_day_outofrange(day, month, year) testing the latter (day exists for month and year), and this test #147 validation_day_notstandard(day) paired with #127 amendment_day_standardized(day) handling cases where values like "1st" can be unambigously interpreted as 1 (and for such, we should use the language unambigously interpreted instead of cast. Cast has a specific meaning in typed programing languages. @ArthurChapman somewhere mentioned cast of the string "1.240" representing a decimal number to the integer 1, which is likely to fail when presented with a european format "1,1240" string, and in many implementations of cast of the string to an integer will fail as the string doesn't contain an integer "1" might work, "1.0" is unlikely to. Actual implementations are likely to use parsing of numbers from strings rather than cast, and the expected case for any Darwin Core input values is as data typed as strings rather than as numeric types (given the absence of numeric types in the formal definitions, and the widespread presence of untyped values in csv files in Darwin Core Archives...)..

ArthurChapman · 2019-08-16T00:02:53Z

@chicoreus Check out #125 which was deprecated by @tucotuco . Are you suggesting it be resurected? Check out discussions there

chicoreus · 2019-08-16T02:53:56Z

We need both this and #125. The validation #125 is needed to test whether a value for day exists for the given month and year. It must treat non-integer values as internal prerequisites not met. This test validation #147 (which should take only dwc:day as a parameter and test whether the value of day is an integer) forms a couplet with the amendment #127 (which can propose the replacement of a non standard non-integer value with an integer (e.g. 2nd to 2)). This pairing of a validation and an amendment where the validation tests for exactly what the amendment can correct is critical to measuring improvement of quality. The test (#125) for the (not able to be corrected from these three fields alone, and thus not paired with an amendment) existence of a day value given the month and year is a distinct test, and we need both.

chicoreus · 2019-08-16T02:58:54Z

This test needs to be VALIDATION_DAY_NOTSTANDARD(dwc:day).
Test #125 needs to be VALIDATION_DAY_OUTOFRANGE(dwc:day, dwc:month: dwc:year). Another name for #125 is validation_day_existsformonthyear (not suggesting we adopt that, but it is less confusing).

ArthurChapman · 2019-08-16T04:40:50Z

In the Expected Response. we have replaced "cast" with either "ambiguously interpreted" or "interpreted" depending on contect

chicoreus · 2019-08-16T18:24:32Z

I propose (1) Removing year and month from the information elements. (2) changing the specification from:

INTERNAL_PREREQUISITES_NOT_MET if a) dwc:day is EMPTY, or b) dwc:day can not be unambiguously interpreted as an integer, or c) dwc:day can be interpreted as an integer between 29 and 31 inclusive and dwc:month can not be interpreted as an integer between 1 and 12, or d) dwc:month can be interpreted as the integer 2 and dwc:month can be interpreted as the integer 29 and dwc:year can not be unambiguously interpreted as a valid ISO 8601 year; COMPLIANT e) if the value of the field dwc:day can be interpreted as an integer between 1 and 28 inclusive, or f) dwc:day can be interpreted as an integer between 29 and 30 and dwc:month can be interpreted as one of (4,6,9,11), or g) dwc:day can be interpreted as an integer between 29 and 31 and dwc:month can be interpreted as one of (1,3,5,7,8,10,12), or h) dwc:day can be interpreted as the integer 29 and dwc:month can be interpreted as the integer 2 and dwc:year is a valid leap year (evenly divisible by 400 or (evenly divisible by 4 but not evenly divisible by 100)); otherwise NOT_COMPLIANT

to:

INTERNAL_PREREQUISITES_NOT_MET if dwc:day is EMPTY; COMPLIANT if the value of the field dwc:day is an integer in the between 1 and 31 inclusive; otherwise NOT_COMPLIANT.

chicoreus · 2019-08-16T18:31:41Z

Contrast proposed specification with #125 VALIDATION_DAY_OUTOFRANGE (which checks if the day exists given the month and the year), and compare with the parallel specification in the amendment #127 AMENDMENT_DAY_STANDARDIZED.

Tasilee · 2019-08-18T21:22:30Z

@chicoreus: Your logic makes sense. #125 is doing the complex range checking, while this one is detecting a simpler scenario. I'll edit the Expected Response accordingly as suggested.

chicoreus · 2019-08-19T15:01:25Z

Removing year and month from information elements. Minor edit of specification (s/II/I/ s/in the between/between) from:

IINTERNAL_PREREQUISITES_NOT_MET if dwc:day is EMPTY; COMPLIANT if the value of the field dwc:day is an integer in the between 1 and 31 inclusive; otherwise NOT_COMPLIANT.

to:

INTERNAL_PREREQUISITES_NOT_MET if dwc:day is EMPTY; COMPLIANT if the value of the field dwc:day is an integer between 1 and 31 inclusive; otherwise NOT_COMPLIANT.

…c:day based on latest changes in issues. DESCRIPTION: Implementation of validation day notstandard to match amendment_day_standardized testing only dwc:day and checking for integer in range 1-31, and validation_day_inrange to use day month year and return internal prerequisites not met on month or year only when those values are required to evaluate a day value (29-31).

tucotuco · 2020-06-16T18:21:12Z

@timrobertson100 Noted that this test still showed a dependency on year and month, which we explicitly removed after long deliberations. I have updated the Notes from

"This test must take into account the given month and year, if present, to account for leap years. This is part of a group of similar tests (#125, #130, #131)."

to

"This is part of a group of similar tests (#125, #130, #131)."

to reflect this.

…sts to current specifications. DESCRIPTION: Updating implementation of VALIDATION_STARTDAYOFYEAR_OUTOFRANGE to updated (2022-03-10) specification, adding and fixing unit tests to confirm with current specification. Removing instead of deprecating and using as wrapper previous startDayOfYearInRangeForYear() method, as it takes year instead of eventDate and can't conform to the current standard. Updating implementation and fixing unit tests for VALIDATION_DAY_NOTSTANDARD to conform with current specification.

ArthurChapman · 2023-06-27T02:37:56Z

Changed Note from:

"This is part of a group of similar tests (#125 #130, #131)."
to

"This is part of a group of similar tests (VALIDATION_DAY_INRANGE (8d787cb5-73e2-4c39-9cd1-67c7361dc02e), VALIDATION_STARTDAYOFYEAR_INRANGE )85803c7e-2a5a-42e1-b8d3-299a44cafc46, VALIDATION_ENDDAYOFYEAR_INRANGE (9a39d88c-7eee-46df-b32a-c109f9f81fb8))."

To removed internal GitHub links

Tasilee · 2023-09-18T05:25:18Z

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

ArthurChapman added TG2 TIME Conformance Validation labels Aug 26, 2018

chicoreus mentioned this issue Aug 30, 2018

TG2-VALIDATION_DAY_INRANGE #125

Open

Tasilee added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Aug 30, 2018

Tasilee added Test and removed Test labels Sep 25, 2018

Tasilee added responseDone and removed responseDone labels Oct 30, 2018

chicoreus added the NEEDS WORK label Mar 13, 2019

ArthurChapman mentioned this issue Mar 28, 2019

TG2- Use the word "EMPTY" instead of NULL and provide definition #111

Closed

chicoreus removed the NEEDS WORK label Aug 19, 2019

Tasilee mentioned this issue Jun 17, 2020

TG2 - Note Test dependencies and/or workflows #186

Open

chicoreus added a commit that referenced this issue Oct 1, 2020

Per issue #189 adding test data for #126 and #147.

b9c3bb6

Tasilee mentioned this issue Oct 4, 2020

TG2 - Test Data Framework #189

Closed

Tasilee changed the title ~~TG2-VALIDATION_DAY_NOTSTANDARD~~ TG2-VALIDATION_DAY_STANDARD Mar 22, 2022

ymgan mentioned this issue Feb 15, 2023

eventDate, year, month, day checks iobis/obis-qc#24

Open

17 tasks

chicoreus added the CODED label Jul 5, 2023

chicoreus added the CORE TG2 CORE tests label Sep 18, 2023

TG2-VALIDATION_DAY_STANDARD #147

TG2-VALIDATION_DAY_STANDARD #147

Comments

ArthurChapman commented Aug 26, 2018 • edited by Tasilee Loading

ArthurChapman commented Aug 30, 2018

chicoreus commented Aug 30, 2018

chicoreus commented Sep 7, 2018

chicoreus commented Mar 13, 2019

tucotuco commented Mar 26, 2019

Tasilee commented Mar 27, 2019

tucotuco commented Mar 27, 2019 via email

Tasilee commented Mar 27, 2019

ArthurChapman commented Mar 28, 2019 • edited Loading

Tasilee commented Mar 28, 2019

ArthurChapman commented Mar 28, 2019

Tasilee commented Mar 28, 2019

ArthurChapman commented Mar 28, 2019

chicoreus commented Mar 28, 2019

tucotuco commented Mar 28, 2019 via email

tucotuco commented Jun 12, 2019

Tasilee commented Aug 14, 2019

chicoreus commented Aug 14, 2019

tucotuco commented Aug 14, 2019

Tasilee commented Aug 14, 2019

ArthurChapman commented Aug 15, 2019

ArthurChapman commented Aug 15, 2019

Tasilee commented Aug 15, 2019

chicoreus commented Aug 15, 2019

chicoreus commented Aug 15, 2019

ArthurChapman commented Aug 16, 2019

chicoreus commented Aug 16, 2019

chicoreus commented Aug 16, 2019

ArthurChapman commented Aug 16, 2019

chicoreus commented Aug 16, 2019 • edited Loading

chicoreus commented Aug 16, 2019

Tasilee commented Aug 18, 2019

chicoreus commented Aug 19, 2019

tucotuco commented Jun 16, 2020

ArthurChapman commented Jun 27, 2023

Tasilee commented Sep 18, 2023

ArthurChapman commented Aug 26, 2018 •

edited by Tasilee

Loading

ArthurChapman commented Mar 28, 2019 •

edited

Loading

chicoreus commented Aug 16, 2019 •

edited

Loading