Add documentation page on supported date formats #882

Open
victorlin opened this issue Apr 5, 2022 · 5 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@victorlin
Member

#740 improves the help text for filter's --min-date/--max-date. Similar documentation should also be provided for metadata dates, which are handled slightly differently (e.g. relative dates are not supported). As a starting point, here is some content moved over from an old wiki page:

Overview

Augur supports a variety of date formats:

  • 2018 - Year only
  • 2018.23 - Numerical (floating point)
  • 2018-03 - Year, month (positive only)
  • 2018-03-25 - Year, month, day (ISO 8601 date) (positive only)
  • 2018-03-XX - Year, month, ambiguous day (positive only)
  • 2018-XX-XX - Year, ambiguous month, ambiguous day (positive only)

Generally, this comes down to flavors of numerical or (potentially incomplete) ISO dates.
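
To make the list above concrete, here is a minimal sketch that labels a date string with one of the listed formats. The patterns and the `classify_date` name are assumptions made for illustration; they are not taken from Augur's code, and Augur's real parsing may accept or reject inputs differently (for example, whether year-only and numerical dates may be negative is inferred from the "positive only" notes, not confirmed).

```python
import re
from typing import Optional

# Hypothetical patterns, one per format in the list above.
# These are illustrative only and are not Augur's actual parser.
FORMAT_PATTERNS = [
    ("year", re.compile(r"-?\d+")),
    ("numeric", re.compile(r"-?\d+\.\d+")),
    ("year-month", re.compile(r"\d{4}-\d{2}")),
    ("year-month-day", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("ambiguous day", re.compile(r"\d{4}-\d{2}-XX")),
    ("ambiguous month and day", re.compile(r"\d{4}-XX-XX")),
]


def classify_date(date_string: str) -> Optional[str]:
    """Return the label of the first pattern that matches the whole string."""
    for label, pattern in FORMAT_PATTERNS:
        if pattern.fullmatch(date_string):
            return label
    return None  # not one of the formats listed above


for example in ["2018", "2018.23", "2018-03", "2018-03-25", "2018-03-XX", "2018-XX-XX"]:
    print(example, "->", classify_date(example))
```

Note that the sketch only checks the shape of the string; it does not validate that the month or day is a real calendar value.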

Implementation

Internally, Augur stores dates in numerical format for the following reasons:

  1. Pre-historic dates (BC) are not supported by some implementations of ISO dates, for example Python’s own datetime.
  2. During initial implementation, dates needed to be numerical for some uses (e.g. timetree) and it was easier to just convert to numerical and treat them this way across the board.
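
As a rough illustration of both reasons, the sketch below shows that Python's `datetime` cannot represent years before 1, and converts a calendar date to a floating point year. The `to_numeric` helper is hypothetical and uses one simple convention (fraction of the year elapsed at the start of the day); Augur and TreeTime may use a different convention, so the exact decimals should not be read as Augur's output.

```python
from datetime import MINYEAR, date

# Python's datetime cannot represent BC dates: the smallest allowed year is 1.
print(MINYEAR)

# A plain float has no such limit, so a date such as 500 BC can be stored
# as a negative number (sign convention assumed for illustration).
ancient = -500.0


def to_numeric(d: date) -> float:
    """Hypothetical helper: year plus the fraction of that year elapsed
    at the start of day `d`. Not Augur's actual conversion.
    Does not handle year 9999, where `d.year + 1` is out of range."""
    start_of_year = date(d.year, 1, 1)
    start_of_next_year = date(d.year + 1, 1, 1)
    days_in_year = (start_of_next_year - start_of_year).days
    return d.year + (d - start_of_year).days / days_in_year


print(to_numeric(date(2018, 1, 1)))
print(round(to_numeric(date(2018, 3, 25)), 2))
```

Under this convention, 2018-03-25 lands at roughly 2018.23, which lines up with the numerical example in the overview.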

Related discussions

@huddlej
Contributor

huddlej commented Apr 19, 2022

This is a great idea! We should also consider linking out to examples of ISO 8601 dates, since users may not be familiar with this term (or may not realize that they already know the associated formats). Linking to the ISO 8601 calendar dates and durations sections on Wikipedia would be fine for this.

@victorlin
Member Author

victorlin commented May 6, 2022

Relatedly, I've done a bunch of date parsing work in #854 (see dates.py), but this has yet to be merged.

@j23414
Contributor

j23414 commented May 6, 2022

Do we want to also support and document the following formats?

  • YYYY-MM (Day unknown)
  • YYYY (Month and day unknown)

Fauna's format_date function seems to process them, but I assume it will be superseded by augur's version. I could also see dropping any parenthetical strings "\s(\S.*)" as a pre-processing step that is outside the scope of this issue.

@victorlin
Member Author

@j23414 yeah, that would be 2018 and 2018-03 in the issue description examples.

The current support for those isn't in dates.py, but rather a hidden feature of augur filter's subsampling logic which only applies to the metadata date column during subsampling:

df_dates = metadata['date'].str.split('-', n=2, expand=True)

As a path forward, we should aim to support the same date formats across different use cases via functions in dates.py.
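
To see why splitting on "-" happens to handle 2018 and 2018-03, here is a plain-Python sketch of roughly what the split line above does for a single value. The `split_date` helper is hypothetical and only mimics the padding that the pandas call produces with expand=True; it is not code from augur filter.

```python
from typing import Optional, Tuple


def split_date(date_string: str) -> Tuple[Optional[str], ...]:
    """Split into (year, month, day), padding missing parts with None.

    Hypothetical stand-in for the per-row effect of
    metadata['date'].str.split('-', n=2, expand=True).
    """
    parts = date_string.split("-", 2)  # at most 2 splits, so at most 3 parts
    parts += [None] * (3 - len(parts))
    return tuple(parts)


print(split_date("2018-03-25"))
print(split_date("2018-03"))
print(split_date("2018"))

# A leading minus sign is also treated as a separator, so a negative year
# ends up with an empty "year" part.
print(split_date("-500"))
```

The last call shows one reason a split-based approach only suits positive years: the sign is indistinguishable from the field separator.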

@huddlej
Copy link
Contributor

huddlej commented May 6, 2022

One way to think about incomplete dates like YYYY and YYYY-MM would be to standardize/sanitize these to YYYY-XX-XX and YYYY-MM-XX, respectively, early in a workflow. This could be part of the proposed work in #860.
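
A minimal sketch of that standardization step, assuming four-digit positive years and two-digit months. The `pad_incomplete_date` name is made up for this example and is not an existing Augur function or part of #860.

```python
import re


def pad_incomplete_date(date_string: str) -> str:
    """Pad YYYY to YYYY-XX-XX and YYYY-MM to YYYY-MM-XX.

    Hypothetical helper for illustration. Any other input, including
    complete dates and already-padded dates, is returned unchanged.
    """
    if re.fullmatch(r"\d{4}", date_string):
        return f"{date_string}-XX-XX"
    if re.fullmatch(r"\d{4}-\d{2}", date_string):
        return f"{date_string}-XX"
    return date_string


for raw in ["2018", "2018-03", "2018-03-25", "2018-03-XX"]:
    print(raw, "->", pad_incomplete_date(raw))
```

Running this early in a workflow would mean later steps only need to understand the three-part form.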

Status: Backlog

3 participants