match and ignore certain characters in start & end dates #304

Closed
rwelty1889 opened this issue Nov 3, 2021 · 10 comments

Comments

@rwelty1889

What's your idea for a cool feature that would help you use OHM better.

this is a request for very limited support for some EDTF features.

  1. ignore trailing %, ~, and ? characters after dates (they represent approximation and uncertainty, and the timeslider would probably do this anyway with full EDTF support)

  2. ignore leading and trailing / on dates -
    they represent intervals where one end of the range is unknown and just ignoring them is fine for now.
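A minimal sketch of the two rules requested above, written in Python for illustration only. The function name `simplify_date_tag` is hypothetical and this is not OHM's actual tile pipeline code; it only shows how little string handling the request involves.

```python
import re

# Trailing EDTF qualifiers: ? (uncertain), ~ (approximate), % (both).
_TRAILING_QUALIFIERS = re.compile(r"[?~%]+$")


def simplify_date_tag(value):
    """Return the date with the requested EDTF characters ignored.

    Hypothetical helper: strips a leading or trailing "/" (an interval
    with one unknown end), then any trailing ?, ~ or % qualifier.
    """
    text = value.strip().strip("/")
    return _TRAILING_QUALIFIERS.sub("", text)


for raw in ["1834~", "/1890?", "1855%/", "1834-05-01"]:
    print(f"{raw!r} -> {simplify_date_tag(raw)!r}")
```

A "/" between two dates (a full interval such as 1914/1918) is left untouched, since the request only covers the leading and trailing cases.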

this is part of experimentation with lifecycle stuff.

Current workarounds
i could just delete the characters but they are part of the data. i would have to put FIXME as reminders that this needs to be addressed.

Additional info
the three buildings that appear here in OHM do not exist in 1834, but are displayed because OHM does not currently know what to do with the start_date value

@batpad

batpad commented Nov 3, 2021

@rwelty1889 thanks for the issue! We should absolutely aim to support approximation / uncertainty in dates and at least capture that data, even if we are unlikely to be able to process those approximate dates correctly in the vector tiles pipeline, etc. right now.

That said, I am very wary of adding this kind of logic to the vector tile pipeline (cc @geohacker ).

My proposal here would be to add separate start_date_edtf and end_date_edtf tags to objects. I know this is likely a bit painful to do - perhaps there could be an editor feature to process start_date_edtf and automatically populate start_date for eg.?

We do need a pathway to be able to record date approximations so this is an important conversation to have - unfortunately, we will also need to balance the "needs of the renderer" at this point, and currently it is hard to support anything other than fixed start and end dates. Will let @geohacker comment a bit more on the feasibility of doing these string replacements - I'm not a huge fan of adding more complexity to the already slightly brittle SQL functions.

@rwelty1889 would you be open to considering the idea of separate tags for EDTF times?

Going to chew on this a bit more and discuss with @geohacker - we should definitely do something about this, just not certain what the best way forward here is. Thanks again for the ticket @rwelty1889 and happy to talk about this soon.

@rwelty1889
Author

for now, i would consider using the _edtf versions of the tag side by side with the original versions, in order to preserve the symbols. in the long run, we need a coordinated approach that doesn't require duplication (or quasi duplication) of data in the tags. providing some sort of date filter between the date tags and the renderer is probably going to need to be done.

@rwelty1889
Author

just for reference, the start_date page in the wiki documents a lot of stuff about uncertainty and imprecision which so far as i know is rarely if ever supported in OSM. it's also mostly ad hoc, whereas EDTF is an open standard. this link contains my current thoughts and commentary on the issues: start_date issues

@danrademacher
Member

related to #303

@danrademacher
Member

@batpad I wonder if this is something we could add to our paddate functionality. I guess that's where things get brittle, but we already guess at months and days when they are absent, in translating from start_date to start_decdate. It might be possible to enhance that transformation without adding much brittleness to the system. I think this would work for characters like ~, where omitting them leaves a compatible date string, but not for .. in between years or leading / characters, where it would get pretty hard to guess.

Also noting this is related to #15

@rwelty1889
Author

what i asked for in this ticket is minimal. another project i have started but have limited time for is a parser for levels 0 and 1 of EDTF, which i'm writing in ANTLR, meaning that it can target several different implementation languages. that's something that i'll put out there under a reasonably permissive open source license when i get it working, probably a 3 clause BSD or something like that.

@danrademacher
Member

I see on your Wiki that you list existing tags. Is that from an export or other systematic search or just anecdotal? Might not matter. Just curious.

What I am wondering is if, instead of targeting specific strings, we could use a regex to keep only digits and dashes, ignore everything else, and get better results. This would at least not lead to a whole series of string replacements in our vector tile code. But it would also apply to everything, whether or not it improved the output.

I looked at your list and classified them by that:

| current tag | post-filter | Result OK? |
| --- | --- | --- |
| ~1855 | 1855 | Yes |
| 1860s | 1860 | Yes |
| ~1940s | 1940 | Yes |
| 480 BC | 480 | No, needs to be negated |
| before 1855 | 1855 | No, but would appear at a date |
| before 1910-01-20 | 1910-01-20 | No, but would appear at a date |
| after 1823 | 1823 | No, but would appear at a date |
| C18 | 18 | No, would appear at Year 18 |
| mid C14 | 14 | No, would appear at Year 14 |
| late 1920s | 1920 | No, but would appear at a date |
| ~C13 | 13 | No, would appear at Year 13 |
| 1914..1918 | 19141918 | No, would maybe appear 2 million years in future |
| 2008-08-08..2008-08-24 | 2008-08-082008-08-24 | No, would be ignored |
| mid C17..late C17 | 1717 | No, would appear 1717 |

So there are a few places where things are helped and many more where they are not, which means case-by-case filtering. Ideally, informed by TagInfo (forthcoming), we could prioritize parsing the formats that are most common in the existing data.
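For reference, the digits-and-dashes filter discussed in this comment can be sketched in a few lines of Python. The function name `digits_and_dashes` is made up for this example; the real change would live in the SQL date functions, which are not shown in this thread.

```python
import re


def digits_and_dashes(tag_value):
    """Drop every character that is not an ASCII digit or a dash."""
    return re.sub(r"[^0-9-]", "", tag_value)


# A few of the tag values classified in the table above.
for raw in ["~1855", "480 BC", "C18", "1914..1918", "mid C17..late C17"]:
    print(f"{raw!r} -> {digits_and_dashes(raw)!r}")
```

Because the filter is applied blindly, it has no way to tell a helpful cleanup (~1855) from a harmful one (1914..1918), which is what the classification above shows.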

@batpad I'm keen to hear more about your concerns on brittleness in the date functions. My hope has been that we could incrementally add intelligence to our date functions that might start simple with ~1855=1855 but then progressively expand to cover more cases, like 1914..1918=1916, but always resulting in a single start_decdate and single end_decdate
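A rough Python sketch of that incremental idea, under several assumptions: missing months and days are padded to the first of the period, a leading ~ is simply dropped, a range collapses to the midpoint of its two ends, and anything else is rejected. The names `to_decimal_year` and `single_decimal_date` are hypothetical, and the decimal-year formula here is a guess, not the one OHM's date functions use.

```python
import re
from datetime import date

_DATE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def to_decimal_year(text):
    """Convert YYYY, YYYY-MM or YYYY-MM-DD to a decimal year, else None."""
    m = _DATE.match(text)
    if m is None:
        return None
    year = int(m.group(1))
    month = int(m.group(2) or 1)  # assumed padding: January
    day = int(m.group(3) or 1)    # assumed padding: first of the month
    try:
        d = date(year, month, day)
        days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    except ValueError:
        return None  # impossible date, or a year outside datetime's range
    return year + (d.timetuple().tm_yday - 1) / days_in_year


def single_decimal_date(raw):
    """Reduce a tag value to one decimal date, or None if unsupported."""
    text = raw.strip().lstrip("~")
    values = [to_decimal_year(part.strip()) for part in text.split("..")]
    if len(values) > 2 or any(v is None for v in values):
        return None
    return sum(values) / len(values)  # one value, or the midpoint of two


print(single_decimal_date("~1855"))
print(single_decimal_date("1914..1918"))
print(single_decimal_date("before 1855"))
```

Each new format would be added as an explicit case, so unsupported values keep returning None instead of producing a misleading date.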

@rwelty1889
Author

you did this test on the current tagging; i'm proposing abandoning that part of start_date in favor of EDTF, which is the second column in that table. current tagging is mostly ad hoc when it comes to uncertainty, whereas EDTF is standards based.

@1ec5
Member

1ec5 commented Sep 21, 2022

My proposal here would be to add separate start_date_edtf and end_date_edtf tags to objects. I know this is likely a bit painful to do - perhaps there could be an editor feature to process start_date_edtf and automatically populate start_date for eg.?

The wiki had been recommending start_date_edtf and end_date_edtf, but start_date:edtf and end_date:edtf are much more common. I edited the wiki to use colons instead.


@1ec5
Member

1ec5 commented Mar 21, 2024

I don’t think we should implement the workaround proposed in #304 (comment). The start_date:edtf and end_date:edtf keys are documented and well-established at this point. iD validates start_date and end_date according to the basic YYYY-MM-DD format and start_date:edtf and end_date:edtf according to EDTF.

In the future, if we want to fold these subkeys back into start_date and end_date, then we should first implement proper EDTF support in PostgreSQL, based on either @rwelty1889’s EDTF-Parser or one of the existing libraries in other languages. This would enable the tiles to indicate the minimum and maximum possible start/end dates of a feature, so that a stylesheet could vary its opacity or some other property accordingly.
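To illustrate what "minimum and maximum possible dates" could mean, here is a small Python sketch for a narrow subset of EDTF: a single year, year-month, or full date, optionally followed by one ?, ~ or % qualifier. The function `date_bounds` is hypothetical and is not taken from EDTF-Parser or any existing library. EDTF does not say how far an approximate date may drift, so this sketch does not widen the bounds for qualified dates.

```python
import calendar
import re
from datetime import date

_EDTF_DATE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?[?~%]?$")


def date_bounds(edtf):
    """Return (earliest, latest) calendar dates covered by the value.

    Returns None for anything outside the narrow subset handled here.
    """
    m = _EDTF_DATE.match(edtf)
    if m is None:
        return None
    year = int(m.group(1))
    try:
        if m.group(2) is None:
            # Year precision: the whole year is possible.
            return date(year, 1, 1), date(year, 12, 31)
        month = int(m.group(2))
        if m.group(3) is None:
            # Month precision: first to last day of that month.
            last_day = calendar.monthrange(year, month)[1]
            return date(year, month, 1), date(year, month, last_day)
        day = date(year, month, int(m.group(3)))
        return day, day
    except ValueError:
        return None  # e.g. month 13 or day 32


print(date_bounds("1855~"))
print(date_bounds("1900-02"))
print(date_bounds("1834-05-01?"))
```

A renderer could compare the two bounds to decide how confidently to draw a feature at a given point on the timeslider.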

/ref #15 (comment)

@1ec5 1ec5 closed this as completed Mar 21, 2024