Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix date parsing #839

Merged
merged 3 commits into from
Jan 2, 2020
Merged

Fix date parsing #839

merged 3 commits into from
Jan 2, 2020

Conversation

jameshadfield
Copy link
Member

Problem summary

Recent Ebola work has highlighted some errors in how numerical dates are handled in augur & auspice. Specifically, an isolate had collection date of 2019-12-01 which was represented in auspice as 2019-11-30 due to errors converting between string & numerical date representations.

Context

Note that auspice only receives numerical dates via the dataset JSON, and that augur performs string -> numeric conversion using the TreeTime utility function numeric_date().

numeric_date() as written in Treetime v0.6* contained some bugs. Due to happy coincidence, @rneher made some changes to TreeTime in Treetime #98 today, so I didn’t have to 😄 These changes are reflected in Treetime v0.7* (I’m using commit d4450aac32cfafbb62492554d87e23ee4efa168f for this testing) which can be seen in the results below.

Auspice v2.2.0 contains bugs

The auspice implementation of calendarToNumeric (our utility function to convert between string & numeric formats) contained a few different bugs and produced erroneous results:

————————— 
Augur -> Treetime v0.6.4.1 -> Auspice (result matches input data?)  
————————— 
1899-07-01 -> 1899.498288843258 -> 1899-06-29 (false) 
2000-01-01 -> 2000.002737850787 -> 1999-12-31 (false) 
2016-03-01 -> 2016.167008898015 -> 2016-02-29 (false)
2019-01-01 -> 2019.002737850787 -> 2018-12-31 (false) 
2019-03-01 -> 2019.1642710472279 -> 2019-02-28 (false) 
2019-12-01 -> 2019.9171800136892 -> 2019-11-30 (false) 
2100-01-01 -> 2100.002737850787 -> 2099-12-31 (false)

Similarly, the conversion of calendar dates to numeric didn’t really match either version of numeric_date():

------------------- 
Calendar dates -> numeric (matches Treetime output at v0.6.4.1 // 0.7*?)
-------------------
1899-07-01 -> 1899.5018822724162 (false // false) 
2000-01-01 -> 2000.002737850787 (true // false) 
2016-03-01 -> 2016.167008898015 (true // false) 
2019-01-01 -> 2019.003422313484 (false // false) 
2019-03-01 -> 2019.1649555099248 (false // false) 
2019-12-01 -> 2019.917864476386 (false // false) 
2100-01-01 -> 2100.002737850787 (true // false)

(P.S. View the console to see this output, I’ll remove the commit printing this before merge.)

Code choices taken in this PR

Auspice has used a d3 time scale to convert between dates. While dates are notoriously hard to deal with, for the purposes of our use case (converting between string & numeric formats, and given that we know how TreeTime does it) it is prudent to do this ourselves instead of using external libraries.

Results from this PR:

-------------------
Augur -> Treetime v0.7 -> Auspice (match?)
-------------------
1899-07-01 -> 1899.4972602739726 -> 1899-07-01 (true) 
2000-01-01 -> 2000.0013661202186 -> 2000-01-01 (true) 
2016-03-01 -> 2016.1653005464482 -> 2016-03-01 (true) 
2019-01-01 -> 2019.0013698630137 -> 2019-01-01 (true) 
2019-03-01 -> 2019.16301369863 -> 2019-03-01 (true) 
2019-12-01 -> 2019.9164383561645 -> 2019-12-01 (true) 
2100-01-01 -> 2100.0013698630137 -> 2100-01-01 (true) 

-------------------
Augur -> Treetime v0.6.4.1 -> Auspice (match?)
------------------- 
1899-07-01 -> 1899.498288843258 -> 1899-07-01 (true) 
2000-01-01 -> 2000.002737850787 -> 2000-01-02 (false) 
2016-03-01 -> 2016.167008898015 -> 2016-03-02 (false) 
2019-01-01 -> 2019.002737850787 -> 2019-01-01 (true) 
2019-03-01 -> 2019.1642710472279 -> 2019-03-01 (true) 
2019-12-01 -> 2019.9171800136892 -> 2019-12-01 (true) 
2100-01-01 -> 2100.002737850787 -> 2100-01-01 (true) 

-------------------
Calendar dates -> numeric (matches Treetime output at v0.6.4.1 // 0.7*)
------------------- 
1899-07-01 -> 1899.4972602739726 (false // true) 
2000-01-01 -> 2000.0013661202186 (false // true) 
2016-03-01 -> 2016.1653005464482 (false // true)
2019-01-01 -> 2019.0013698630137 (false // true) 
2019-03-01 -> 2019.16301369863 (false // true)
2019-12-01 -> 2019.9164383561645 (false // true)
2100-01-01 -> 2100.0013698630137 (false // true)

Should this be merged?

From the results above, auspice (as per this PR) now matches TreeTime v0.7*. While it does not match TreeTime 0.6.4.1, which is used in the current augur release, neither does the current auspice release! This PR (as well as TreeTime v0.7) contain the “correct” implementations of numeric dates. I think this should be merged.

Note

We should consider not representing days over, say, 100 years ago as YYYY-MM-DD as this coveys a sense of accuracy which is misleading.

(remove before merge)

tmp
Rewrite this function to "undo" the `numeric_date` function as found in TreeTime v0.7*. This new functionality correctly restores dates when using `numeric_date` from TreeTime v0.7 (as expected) and restores 5/7 test dates converted using TreeTime 0.6.4.1 (current auspice master gets them all wrong).
Same motivations and outcomes as previous commit.
@jameshadfield jameshadfield requested a review from rneher December 23, 2019 05:12
Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'll open an issue in augur regarding an TreeTime update.

@jameshadfield jameshadfield merged commit af1e1a9 into master Jan 2, 2020
@jameshadfield jameshadfield deleted the dates-d3 branch January 2, 2020 20:51
@rneher
Copy link
Member

rneher commented Jan 11, 2020

For some strange reason, calendarToNumeric was called on a number for a non-timetree json that I just generated which results in a cryptic error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants