Deal better with approximate/uncertain dates #6

Open
cogat opened this issue May 22, 2017 · 1 comment

Comments

@cogat
Contributor

cogat commented May 22, 2017

Currently, a fixed (precision-dependent) and arbitrary amount of fuzziness is added to approximate/uncertain dates. This is used to return a date range that can be used for filtering, but it's a pretty basic interpretation.

Properly, uncertainty should be modelled as a confidence curve that ramps continuously from 1.0 ("the given date is a certain match for the EDTF range") to 0.0 ("the given date is certainly not a match").
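One simple shape for such a curve is a trapezoid: full confidence inside a core range, and a linear ramp down to zero across a padding zone on either side. The sketch below is only an illustration of that idea. The function name `membership`, its four parameters, and the two-year padding around 1929 are all assumptions, not part of this library.

```python
from datetime import date


def membership(d, support_start, core_start, core_end, support_end):
    """Return a confidence in [0.0, 1.0] that date `d` matches the range.

    Inside [core_start, core_end] the confidence is 1.0; outside
    (support_start, support_end) it is 0.0; in between it ramps linearly.
    """
    x = d.toordinal()
    a, b, c, e = (p.toordinal() for p in
                  (support_start, core_start, core_end, support_end))
    if b <= x <= c:
        return 1.0
    if x <= a or x >= e:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising ramp before the core
    return (e - x) / (e - c)       # falling ramp after the core


# Hypothetical reading of "circa 1929": certain within 1929,
# fading to zero two years on either side.
CIRCA_1929 = (date(1927, 1, 1), date(1929, 1, 1),
              date(1929, 12, 31), date(1931, 12, 31))

for probe in (date(1926, 6, 1), date(1928, 1, 1), date(1929, 6, 1)):
    print(probe, round(membership(probe, *CIRCA_1929), 2))
```

Padding a strict date range is the special case where the ramps are vertical, so a curve like this is a generalisation of the current behaviour rather than a replacement for it.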

Open questions:

  • How are these fuzzy ramps defined? We might get some non-arbitrary results from crowdsourcing interpretations of whether a given date is 'circa' another date, etc. But we're likely to want to find a simple ramping equation.
  • Are curves significantly more useful than just padding a strict date range? Curves would let us answer queries such as "Show me works made around 1929" with results sorted by confidence. Is that a real-world use case?
  • How can we implement fuzzy matching in a db-efficient format?
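To make the last two questions concrete, here is a minimal sketch of one possible approach, not a proposal for the actual implementation. It stores four bounds per work, lets the database do a cheap, indexable range check on the outer bounds, and computes the confidence only for the surviving rows. The table name `works`, its columns, the sample rows, and the four-year padding are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def membership(x, a, b, c, d):
    """Trapezoid: 0 outside (a, d), 1 inside [b, c], linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def works_around(conn, year):
    """Return (title, confidence) pairs for a query year, best match first."""
    # Stage 1: plain range comparison on the outer bounds, which an
    # ordinary index can serve.
    rows = conn.execute(
        "SELECT title, support_start, core_start, core_end, support_end "
        "FROM works WHERE support_start <= ? AND support_end >= ?",
        (year, year),
    ).fetchall()
    # Stage 2: score only the candidates and sort by confidence.
    scored = [(title, membership(year, a, b, c, d))
              for title, a, b, c, d in rows]
    scored = [(title, score) for title, score in scored if score > 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE works (title TEXT, support_start REAL, core_start REAL, "
    "core_end REAL, support_end REAL)"
)
conn.execute("CREATE INDEX works_support ON works (support_start, support_end)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?, ?, ?)",
    [
        ("Work A", 1929, 1929, 1929, 1929),  # dated exactly 1929
        ("Work B", 1927, 1931, 1931, 1935),  # "circa 1931", padded 4 years
        ("Work C", 1928, 1932, 1932, 1936),  # "circa 1932", padded 4 years
        ("Work D", 1950, 1950, 1950, 1950),  # dated exactly 1950
    ],
)

for title, confidence in works_around(conn, 1929):
    print(title, confidence)
```

With only the outer bounds stored, this degrades gracefully to the existing padded-range filter; the inner bounds are needed only when results should be ranked.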

@koenedaele

Hi, thank you for this library. Still checking out EDTF, but it looks like it solves a lot of issues I've been having. I know other implementations exist, but our main backend language is Python (and this seems like one of the more complete implementations anyway).

I have done some research on querying fuzzy time intervals that might be of interest: have a look at http://samm.univ-paris1.fr/IMG/pdf/paris2014.pdf or https://www.researchgate.net/publication/266750215_Modelling_Imperfect_Time_in_Datasets. If you can read Dutch, see also http://lib.ugent.be/fulltxt/RUG01/001/418/820/RUG01-001418820_2010_0001_AC.pdf. We have implemented this for PostgreSQL in a fairly efficient way: see https://github.com/OnroerendErfgoed/pgSFTI for a native C implementation and https://github.com/koenedaele/pgFTI for a pure SQL implementation (based on PostGIS).

When implementing solutions like these, I have found that the main hindrance is user adoption. Something like fuzzy sets (which my work is based on) is very hard for most art historians and archaeologists to understand. But I think that capturing information as EDTF and then generating fuzzy time intervals from it might work very well.

I have no good solution for translating "ca. 1905" to a fuzzy time interval. Does it mean probably in 1905, but possibly in 1904 or 1906? Or does it mean somewhere between 1900 and 1910, but probably closer to the middle? I think it's inherently contextual and hard to solve for every use case.
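One way to keep that contextual choice out of the core logic is to make the interpretation a named, swappable profile. The sketch below encodes the two readings of "ca. 1905" mentioned above as trapezoidal fuzzy intervals. The class `FuzzyYear`, the profile names, and every bound are assumptions chosen for illustration; neither EDTF nor the linked implementations prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyYear:
    """Trapezoidal fuzzy interval over years (all bounds are assumptions)."""
    support_start: float
    core_start: float
    core_end: float
    support_end: float

    def membership(self, year: float) -> float:
        """Return the degree, in [0.0, 1.0], to which `year` belongs."""
        if self.core_start <= year <= self.core_end:
            return 1.0
        if year <= self.support_start or year >= self.support_end:
            return 0.0
        if year < self.core_start:
            return ((year - self.support_start)
                    / (self.core_start - self.support_start))
        return (self.support_end - year) / (self.support_end - self.core_end)


# Two hypothetical readings of "ca. 1905".
CIRCA_PROFILES = {
    # "probably 1905, possibly 1904 or 1906"
    "narrow": FuzzyYear(1903, 1905, 1905, 1907),
    # "somewhere between 1900 and 1910, most likely near the middle"
    "wide": FuzzyYear(1899, 1905, 1905, 1911),
}

for year in (1902, 1904, 1905, 1908):
    scores = {name: round(interval.membership(year), 2)
              for name, interval in CIRCA_PROFILES.items()}
    print(year, scores)
```

Under these assumed bounds, 1902 is excluded by the narrow profile but gets a membership of 0.5 under the wide one, which is exactly the kind of disagreement a per-collection or per-discipline setting would have to resolve.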
