-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Be (even) more lenient #5
Comments
Sounds good. However note that this is not covered by the spec. Refer to opening_hours.js which supports all of those things but will give proper warnings. |
Should this include a closing time of 00:00 as well as 24:00? I've noticed that in a number of nodes and I'm not sure if that is an error in the data, or if this parser isn't handling it properly. |
My comment from https://josm.openstreetmap.de/ticket/18140#comment:17
|
In principle Wd could be supported in a non-conflicting way, but it would need a clear definition (and even if that boils down to "the days considered as work days is defined by local convention". Because now the problem is that the only way to produce a OH grammar conforming output would be to transform it to a list of normal weekdays, so say for DACH: Wd -> Mo,Tu,We,Th,Fr or would that be Wd -> Mo,Tu,We,Th,Fr,Sa ? :-) |
Some ideas:
|
732edc6 adds some translations of "to" for use in non-strict mode. I've had a look at supporting 4 digit times, but it is a bad case of diminishing returns, ~10 changes in 160'000 examples and two of those would be regressions because of mix up with years. As this would need some fiddling around with semantic lookup ahead to fix, I think we shouldn't do this. |
What do you mean with
? |
It changed the parsing of ~10 of the values from our test set (there are more values that use 4 digit times but they continue to fail for other reasons). |
Ah well, isn't the problem with the test set, that it only includes unique values? For example, the opening hours In other words, the question is, how many opening hours use 4 digits? |
I don't think your argument really holds water, because in the end we are testing constructs, not individual values. And I would say that it is just as important to parse Mo-Fr 09:00-17:00 correctly as it is Mo-su 09:00-22:00. My argument was that handling 4 digit times causes issues with -legit- values with year values, and that handling illegal values in this case doesn't seem to be worth the effort because the gains are likely to be very small. |
Let's say there is a city that has 15514 shops and all of these have the opening hours The test set does not test constructs, but unique taggings. This is insofar a difference as testing for the construct, you'd need to test times like 0600-1200, 0601-1200, 0602-1200, 0603-1200 etc. |
If you want to argue semantics, then lets say "constructs in use", the goal was and is to parse all legal constructs. Given that enumerating all of them is not possible, testing all that are actually used (plus additional specific tests) is a reasonable place holder. Weighting is irrelevant as not parsing a legal construct is a failure regardless of it it occurs once or 1'000'000 times. But in any case I might actually add this in, because I've found some more cases which can be handled by this which might make the effort of avoiding trashing years worth while. |
I created this Kotlin script that parses every single instance of an opening hours value in OSM and reports the result. In Java, the code would look very similar. Maybe you are interested in this: https://gist.github.com/westnordost/b43d2e5100e913cdd2eb8d2d2eb2d296 With the current version of OpeningHoursParser, it outputs
|
I just noticed that the retrieved number of opening hours strings by that script is less than one fourth of the actual count of opening hours tagged. Probably just a temporary bug/issue with sophox. |
Potential further improvements could be made by consuming:
See https://github.com/simonpoole/OpeningHoursParser/blob/master/test-data/oh.txt-fail for list of strings that currently fail in non-strict mode.
The text was updated successfully, but these errors were encountered: