Deal better with approximate/uncertain dates #6

cogat · 2017-05-22T05:55:34Z

Currently, a fixed (precision-dependent) and arbitrary amount of fuzziness is added to approximate/uncertain dates. This is used to return a date range that can be used for filtering, but it's a pretty basic interpretation.

Properly, the level of uncertainty would be a curve that ramps continuously from 1.0 ("the given date is a certain match for the EDTF range") to 0.0 ("the given date is certainly not a match").
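
As a rough illustration of what such a curve could look like, here is a minimal sketch of a trapezoidal ramp: confidence is 1.0 inside a strict range, falls linearly to 0.0 at the outer fuzzy bounds, and is 0.0 beyond them. The function name `ramp_confidence`, its parameters, and the one-year padding in the example are assumptions made for illustration only; this is not how the library currently works.

```python
from datetime import date


def ramp_confidence(d, fuzzy_start, strict_start, strict_end, fuzzy_end):
    """Return a confidence in [0.0, 1.0] that date `d` matches the range.

    Hypothetical helper: linear ramps on both sides of a strict range.
    """
    # Inside the strict range: a certain match.
    if strict_start <= d <= strict_end:
        return 1.0
    # At or beyond the outer fuzzy bounds: certainly not a match.
    if d <= fuzzy_start or d >= fuzzy_end:
        return 0.0
    # On the rising ramp before the strict range.
    if d < strict_start:
        return (d - fuzzy_start).days / (strict_start - fuzzy_start).days
    # On the falling ramp after the strict range.
    return (fuzzy_end - d).days / (fuzzy_end - strict_end).days


# "Circa 1929", with an assumed (arbitrary) one year of padding on each side.
circa_1929 = (date(1928, 1, 1), date(1929, 1, 1),
              date(1929, 12, 31), date(1930, 12, 31))

print(ramp_confidence(date(1929, 6, 1), *circa_1929))  # 1.0
print(ramp_confidence(date(1928, 7, 2), *circa_1929))  # 0.5
```

A linear ramp is only one candidate shape; a smoother curve could be swapped in without changing the four-bound representation.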

Open questions:

How are these fuzzy ramps defined? We might get some non-arbitrary results from crowdsourcing interpretations of whether a given date is 'circa' another date, etc. But we're likely to want to find a simple ramping equation.
Are curves significantly more useful than just padding a strict date range? They would allow us to answer queries like "Show me works made around 1929" with results sorted by confidence. Is that a real-world use case?
How can we implement fuzzy matching in a db-efficient format?
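
One possible answer to the last two questions, sketched under stated assumptions: store the four corners of a trapezoidal ramp as plain integer columns, let the database prefilter on the outer bounds with an ordinary indexed range comparison, and compute the confidence only for the surviving rows. The table layout, column names, sample works, and the year-level granularity below are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def confidence(year, fuzzy_start, strict_start, strict_end, fuzzy_end):
    """Trapezoidal confidence for a year (hypothetical helper)."""
    if strict_start <= year <= strict_end:
        return 1.0
    if year <= fuzzy_start or year >= fuzzy_end:
        return 0.0
    if year < strict_start:
        return (year - fuzzy_start) / (strict_start - fuzzy_start)
    return (fuzzy_end - year) / (fuzzy_end - strict_end)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE works (title TEXT, fuzzy_start INTEGER, "
    "strict_start INTEGER, strict_end INTEGER, fuzzy_end INTEGER)"
)
# An index on the outer bounds supports the coarse prefilter.
conn.execute("CREATE INDEX works_fuzzy ON works (fuzzy_start, fuzzy_end)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?, ?, ?)",
    [
        ("Work A", 1927, 1929, 1929, 1931),  # made-up "circa 1929"
        ("Work B", 1920, 1925, 1925, 1930),  # made-up "circa 1925"
        ("Work C", 1935, 1935, 1935, 1935),  # made-up exact date
    ],
)


def works_around(conn, year):
    """Return (title, confidence) pairs for `year`, best match first."""
    rows = conn.execute(
        "SELECT title, fuzzy_start, strict_start, strict_end, fuzzy_end "
        "FROM works WHERE fuzzy_start <= ? AND fuzzy_end >= ?",
        (year, year),
    ).fetchall()
    scored = [(title, confidence(year, *bounds)) for title, *bounds in rows]
    # Drop rows that only touch the outer bound (confidence 0.0).
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(works_around(conn, 1929))  # [('Work A', 1.0), ('Work B', 0.2)]
```

The design choice here is that the database only ever sees a padded strict range, so the ramp costs nothing at query time beyond scoring the rows that pass the prefilter.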

koenedaele · 2017-07-12T11:27:33Z

Hi, thank you for this library. Still checking out EDTF, but it looks like it solves a lot of issues I've been having. I know other implementations exist, but our main backend language is Python (and this seems like one of the more complete implementations anyway).

I have done research on querying fuzzy time intervals that might be of interest: have a look at http://samm.univ-paris1.fr/IMG/pdf/paris2014.pdf or https://www.researchgate.net/publication/266750215_Modelling_Imperfect_Time_in_Datasets. If you can read Dutch, have a look at http://lib.ugent.be/fulltxt/RUG01/001/418/820/RUG01-001418820_2010_0001_AC.pdf. We have implemented this for PostgreSQL in a fairly efficient way: see https://github.com/OnroerendErfgoed/pgSFTI for a native C implementation and https://github.com/koenedaele/pgFTI for a pure SQL implementation (based on PostGIS).

When implementing solutions like these, I have found that the main hindrance is user adoption. Something like fuzzy sets (which my work is based on) is very hard for most art historians and archaeologists to understand. But I'm thinking that capturing information as EDTF and then generating Fuzzy Time Intervals from it might work very well.

I have no good solution for translating "ca. 1905" into a fuzzy time interval. Does that mean probably in 1905, but possibly in 1904 or 1906? Or does it mean somewhere between 1900 and 1910, but probably towards the middle? I think it's inherently contextual and hard to solve for every use case.
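
To make the ambiguity concrete, here is a small sketch that encodes the two readings of "ca. 1905" as two different four-point fuzzy intervals and compares the membership they give the same years. The interval corners, the names `narrow` and `decade`, and the linear ramps are assumptions chosen for illustration; they are not taken from pgFTI, pgSFTI, or the EDTF library.

```python
def membership(year, fuzzy_start, strict_start, strict_end, fuzzy_end):
    """Degree in [0.0, 1.0] to which `year` belongs to the fuzzy interval."""
    if strict_start <= year <= strict_end:
        return 1.0
    if year <= fuzzy_start or year >= fuzzy_end:
        return 0.0
    if year < strict_start:
        return (year - fuzzy_start) / (strict_start - fuzzy_start)
    return (fuzzy_end - year) / (fuzzy_end - strict_end)


# Two hypothetical interpretations of "ca. 1905".
INTERPRETATIONS = {
    # Probably 1905, possibly 1904 or 1906, nothing further out.
    "narrow": (1903, 1905, 1905, 1907),
    # Somewhere in 1900-1910, more likely towards the middle.
    "decade": (1899, 1905, 1905, 1911),
}

for year in (1902, 1904, 1905):
    for name, corners in INTERPRETATIONS.items():
        print(year, name, round(membership(year, *corners), 2))
```

Under these assumed corners, 1902 is excluded entirely by the narrow reading but gets a membership of 0.5 under the decade reading, which is the kind of contextual difference a single default cannot capture.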
