WIP: Dates #698

Closed
wants to merge 8 commits into from

@aviks
Member

aviks commented Apr 10, 2012

So here is initial support for Date and DateTime objects in Julia. Date arithmetic is currently implemented.


julia> a=now()
10 Apr 2012 16:21:34.488 BST

julia> typeof(a)
DateTime
Methods for generic function DateTime
DateTime(Integer,Integer,Integer,Integer,Integer,Integer,Integer,Integer,Integer,String)

julia> b=convert(Date, a)
10 Apr 2012

julia> c=Date(2012, 4,9)
9 Apr 2012

julia> b-c
1

julia> c+1 == b
true

julia> yday(c)
100

julia> day_of_week(c)
2

There are more things to do of course. First is to implement similar date arithmetic on DateTime. The big one however is date formats and parsing... that's a large piece of work by itself. However, the current functionality is self contained and useful.

There are quite a few tests in tests/date.jl

I've had to read a struct off libc, and so there is some simple but dirty looking C code in support/timefuncs.c/h. Not sure if that is the best way.

Let me know what you guys think about all this.

@aviks
Member Author

aviks commented Apr 10, 2012

Couple of further comments, for the record

  • Most of this functionality is inspired by Ruby, but a bit simplified
  • We only support the Gregorian Calendar

@HarlanH
Contributor

HarlanH commented Apr 10, 2012

I know that @StefanKarpinski has strong opinions about how dates should be dealt with. All I know is that it's a big hairy mess, and that people like the representations used by Joda Time in Java and Lubridate in R. In particular, they make distinctions among instants, intervals, durations, and periods, which can be incredibly useful in dealing with relative date/time offsets, descriptions of time slices, arithmetic on time objects, etc. For statisticians and people in finance, this is incredibly important functionality.

I'm super-glad someone's interested enough to work on this and contribute. I don't think borrowing C's date/time classes is a good idea at all.

@StefanKarpinski
Member

This is great functionality to have. My only strong opinion, actually, is that I think that seconds since the epoch as a Float64 value is a great underlying representation for timestamps. The main reason being that operations on them are super easy and intuitive:

  1. subtract two timestamps to get an interval
  2. add an interval to a timestamp to get another timestamp
  3. there is a single canonical meaning (interpretation is a different matter)
  4. it scales in a very pleasant way:
    • close to the present, it has good resolution: eps(time()) => 2.384185791015625e-7
    • it can represent time well into the past and future, albeit with less resolution

(For more time resolution, using a Float128 might be desirable — 2e-7 isn't really amazing resolution.)

To elaborate on the third point, what I mean is that a single floating-point-since-the-epoch value has a completely unambiguous meaning: the instant when that amount of time has passed since Jan 1st, 1970 UTC. Now interpreting that moment is a different matter and can be ambiguous. That's where different calendar systems give you different results. But at least there's an unambiguous underlying time that is pinpointed.
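To make the resolution point concrete, here is a small sketch in current Julia syntax. The helper name `resolution` and the sample timestamps are illustrative assumptions, not part of this pull request.

```julia
# Spacing between neighbouring Float64 values at a given timestamp,
# measured in seconds since the Unix epoch.
resolution(t::Float64) = eps(t)

t_2012 = 1.334e9   # roughly April 2012 (assumed sample value)
t_far  = 1.0e12    # tens of thousands of years after the epoch

println(resolution(t_2012))   # 2.384185791015625e-7
println(resolution(t_far))    # much coarser, on the order of 1e-4 seconds
```

The spacing doubles each time the timestamp crosses a power of two, which is the "less resolution far from the present" behaviour described above.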

All that being said, I'm by no means an expert on representing and working with times.

@StefanKarpinski
Member

In fact, I can see having times be parameterized by any real type:

type Time{T<:Real}
  time::T
end

type Interval{T<:Real}
  interval::T
end

You could have Time{Int64} if you want second-resolution timestamps, Time{Float64} for 64-bit floating-point timestamps, Time{Rational{Int64}} for rational timestamps, Time{BigInt} or Time{BigFloat} for doing computations way into the future, Time{Float128} for 128-bit floating-point timestamps. Defining operations could just lean on the existing arithmetic promotions:

-(t1::Time, t2::Time) = Interval(t1.time-t2.time)
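
A runnable sketch of the same idea in current Julia syntax, where `struct` replaces the older `type` keyword. The name `TimePoint` is a stand-in chosen here to avoid any clash with `Dates.Time`, and the `+` and `-` methods with an interval are assumptions added for illustration.

```julia
struct TimePoint{T<:Real}
    time::T
end

struct Interval{T<:Real}
    interval::T
end

# Subtracting two time points yields an interval; mixed element types
# are handled by the ordinary arithmetic promotion rules.
Base.:-(t1::TimePoint, t2::TimePoint) = Interval(t1.time - t2.time)

# Shifting a time point by an interval yields another time point.
Base.:+(t::TimePoint, d::Interval) = TimePoint(t.time + d.interval)
Base.:-(t::TimePoint, d::Interval) = TimePoint(t.time - d.interval)

a = TimePoint(1_334_071_294)      # TimePoint{Int64}, whole seconds
b = TimePoint(1_334_071_200.5)    # TimePoint{Float64}

println((a - b).interval)                           # 93.5
println((TimePoint(1//3) + Interval(1//6)).time)    # 1//2
```

Because the fields are parameterized, the same three method definitions cover integer, floating-point, and rational timestamps.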

@HarlanH
Contributor

HarlanH commented Apr 10, 2012

Another library to consider stealing from would be the NumPy datetime64 data type: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html I think this is basically what Stefan is proposing, at first glance.

This said, it's extremely useful to be able to say "January 12th", without reference to a year, as you can in Joda Time/Lubridate, or an interval of "4 months", which works properly when added/subtracted to a timestamp, or a duration which is a pair of timestamps. It may well be possible to leverage these concepts on top of a representation like the one being suggested; I don't know.

@aviks
Member Author

aviks commented Apr 11, 2012

@HarlanH So this current implementation is essentially an instant in JodaTime terms. It supports one duration unit: a day. Therefore, I think other units and classes can be layered over it.

I want to do a financial analytics library in Julia, which obviously can't be done without dates, hence this effort. I think I now have all the primitives I need to implement a business calendar.

@StefanKarpinski I'd started with the idea of storing dates and times as milliseconds since the epoch, which is how Java and Ruby's Time class do it. However, the conversions to and from civil dates turn out to be much more involved. My implementation stores a canonical representation as days (and a fraction of a day for times, coming soon) since the Julian epoch, which is 1/Jan/-4712. You can implement exactly the same API using milli(micro(nano))seconds since 1/1/1970, including pretty much all the functionality of the Python implementation HarlanH refers to above. Anyway, the point is that a single number is the canonical representation. Everything else (except possibly a timezone attribute) that's stored in composite types is effectively a cache for performance, and can be regenerated in constant time.
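
As an illustration of why a day count makes date arithmetic simple, here is a sketch of a civil-date to Julian Day Number conversion in current Julia. It uses a widely known integer formula for the proleptic Gregorian calendar; it is not the code from this pull request, and the function name `julian_day_number` is an assumption.

```julia
# Julian Day Number of a proleptic Gregorian calendar date.
# Assumes the shifted year `y` below is non-negative.
function julian_day_number(year::Integer, month::Integer, day::Integer)
    a = (14 - month) ÷ 12    # 1 for January and February, otherwise 0
    y = year + 4800 - a      # shifted year that starts in March
    m = month + 12a - 3      # March = 0, ..., February = 11
    return day + (153m + 2) ÷ 5 + 365y + y ÷ 4 - y ÷ 100 + y ÷ 400 - 32045
end

b = julian_day_number(2012, 4, 10)
c = julian_day_number(2012, 4, 9)

println(b - c)                            # 1
println(julian_day_number(2000, 1, 1))    # 2451545
```

Once dates are plain day numbers, differences and offsets such as `b - c` or `c + 1` reduce to integer arithmetic.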

Storing julian days as float64 will give us a resolution of 1e-11 seconds

@pao
Member

pao commented Apr 11, 2012

If it's relatively cheap to parametrize the type, it would be nice to have that option. Float64 should be enough even for GPS models (trusting your analysis), though I could imagine a stochastic GPS model potentially wanting more digits to play with to ensure a smooth distribution.

@aviks
Member Author

aviks commented Apr 23, 2012

Just as a heads up, a new version should be ready any time now. Not had much time last week, but back on it now.

@aviks
Member Author

aviks commented Apr 26, 2012

DateTimes are now parameterised by the type of the julian day number they store internally. Dates are an alias to DateTime{Integer}.

Precision for DateTime{Float64} is about 10E-5, so you get at least millisecond accuracy.
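
The order of magnitude can be checked with a short calculation in current Julia. The sample Julian day value is an assumption picked to correspond to the date of this comment.

```julia
jd_now = 2_456_044.0    # Julian day of 26 Apr 2012 (assumed sample value)

step_days    = eps(jd_now)           # spacing of adjacent Float64 values, in days
step_seconds = step_days * 86_400    # the same spacing expressed in seconds

println(step_seconds)    # a few times 1e-5 seconds, well under a millisecond
```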

Comments welcome.

type DateTime{T<:Real}
jd::T
off::Float
zone::ASCIIString

Contributor

To make these packable in arrays, could time-zones be a fixed-length character buffer instead of an arbitrary string?

Member Author

Yes, good idea. A char[3] should do. The zone is actually only a cosmetic field, never used for computations. The offset is the canonical info, but unfortunately it's not enough to display the zone.

Member

There are fewer than 256 timezones, so a Uint8 offset into an array would be best (an immutable one if we had those).

Member

It would also be possible to express timezone offsets as ±15-minute increments, ranging from -48 for UTC-12:00 to +56 for UTC+14:00. That's only 105 possible values, so it can be represented with an Int8 value easily. This avoids needing a lookup table and seems likely to make computation with timezones a lot easier and more immediate.
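
A minimal sketch of that encoding in current Julia. The function names `encode_offset` and `decode_offset` are hypothetical, and the range check simply mirrors the UTC-12:00 to UTC+14:00 bounds mentioned above.

```julia
# Encode a UTC offset, given in minutes, as a count of 15-minute units.
function encode_offset(offset_minutes::Integer)
    offset_minutes % 15 == 0 ||
        throw(ArgumentError("offset must be a multiple of 15 minutes"))
    -12 * 60 <= offset_minutes <= 14 * 60 ||
        throw(ArgumentError("offset outside UTC-12:00 .. UTC+14:00"))
    return Int8(offset_minutes ÷ 15)
end

# Decode back to minutes east of UTC.
decode_offset(units::Int8) = Int(units) * 15

println(encode_offset(-12 * 60))      # -48
println(encode_offset(14 * 60))       # 56
println(encode_offset(5 * 60 + 45))   # 23, i.e. UTC+05:45
```

With this representation, converting between UTC and local time is a single integer multiply and add, with no lookup table involved.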

Member Author

The timezone is actually stored in the off::Float field, as a day fraction.

The zone field was meant to store the string zone abbreviation, since there could potentially be a one to many relationship between time offsets, and named zones.

Thinking some more about it, I think this is a mixing of concerns: the date object should not care what the human-readable zone is. That should be a concern for the date formatting routines. I am therefore minded to remove the string zone field altogether from the type.

Member

Oo. Using an entire 8-byte float to store the time-zone seems excessive. Currently all values are stored in composite types as pointers to heap-allocated values, but the fact that offset is of an abstract type forces that representation. Once Jeff and I get around to implementing immutability of composite types, a field with a concrete type like Float64 or Int8 could be stored in the composite structure. Consider tz::Int8 instead that is the timezone offset from UTC in 15-minute units. Also, making something like a timezone offset subject to the complexities of floating-point arithmetic seems like asking for all kinds of trouble.
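
For reference, current Julia does have immutable composite types, and the effect of concrete field types can be observed directly. The type name `ZonedJulianDay` below is hypothetical and only illustrates the `tz::Int8` suggestion.

```julia
# Hypothetical layout: a Julian day plus a UTC offset in 15-minute units.
struct ZonedJulianDay
    jd::Float64
    tz::Int8
end

# With concrete, plain-data fields the whole struct is plain data,
# so arrays of it can store the elements inline.
println(isbitstype(ZonedJulianDay))    # true

# An abstractly typed field prevents that.
struct LooseJulianDay
    jd::Real
    tz::Integer
end

println(isbitstype(LooseJulianDay))    # false
```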

@JeffreySarnoff
Contributor

a few lessons learned making accurate, high performance datetime stuff

(1) Users care about accuracy, speed and resolution

For important classes of apps (e.g. financial market analysis), it matters that datetime arithmetic is accurate and fast [no, faster].
For other apps (e.g. LAN packet analysis, high energy physics), it matters that datetime arithmetic is accurate, fast, and high resolution [no, higher].

(2) Nobody has requested sub-second resolution with dates before 1900. The astrophysics community has found that it takes a pair of 64-bit values to work with Julian dates at current levels of accuracy. GPS time signals are accurate to within (less than) 32 nanoseconds.

If representation compatible, having a long span datetime and higher resolution modern datetime is one good approach.

(3) With time-of-day, timezones matter (even when they are elided) and their robust, portable use will rely on IANA's time zone database. Daylight saving time means that once a year, an hour is traversed twice -- that ambiguity should be resolved (by user input, perhaps with a default), not presumed.

UTC serves as a 'common denominator' when working with timezones. Multiple conversions are much less error-prone when storing datetimes as, say, (UTC, timechange, isDST, zone) and converting to local time when show()ing it.

Indexing each timezone in the IANA database requires 9 bits. Encoding timechange to/from UT in 15min increments requires another 8 bits (9 if timechange is for Standard Time and a separate bit is used to reflect DST -- a more flexible representation). An implementation might use 16 msbs for maskable timezone, and 16 lsbs for timechange in, say, 4sec units + DST bit.

(4) Leap seconds exist (starting in 1972, may be discontinued 2016).

Most datetime systems ignore them for speed, which is ok only if the user does not care when UTC times are off by a second and the count of seconds from 1972 through 2012 is off by 25.

(5) While floating point is often used, integer types/structures with proper algorithms are safer, faster, and more robust for datetime representation.

(6) datetime users think of timepoints as moments and as boundaries; they think of timespans (durations) and temporal intervals arithmetically.

It helps if the datetime implementation is designed for clean and snappy support of such types [of the sort used in the R package lubridate, JSR-310 has another take].

@StefanKarpinski
Member

@JeffreySarnoff: this is an incredibly useful list of observations. Thank you so much. Is there a solid C library that implements what you describe sufficiently that we could wrap? Do you consider lubridate a good example to look at? What do they get wrong and right? What about other date/time libraries?

@JeffreySarnoff
Contributor

@StefanKarpinski: (thanks) There are available libs that work, but not in the way suggested. Lubridate is full of solutions to prior R date/time hiccups and other date ops, in addition to the type stuff. Rather than send you there, I am happy to write more focused design notes over the next few days: is there some place to deposit a PDF?

@StefanKarpinski
Member

That would be great. The wiki would be a good place for that sort of thing although I'm not entirely sure how to put images up there. Maybe just email a PDF document to the dev list?

@nolta
Member

nolta commented May 5, 2012

Let me also commend Jeff on his excellent post.

The C libraries i'm aware of are:

  • SOFA, for accurate conversion between "astronomer" time (e.g., UTC, UT1) and "physicist" time (e.g., TAI, GPS). Properly handles leap seconds (and even smaller corrections ;).
  • ICU4C, for calendar calculations and string formatting.

@JeffreySarnoff
Contributor

@nolta: (thanks) I had not seen ICU, good to know; SOFA is an old friend.

@StefanKarpinski
Member

SOFA is kind of terrifying.

@StefanKarpinski
Member

Not that it looks bad or anything, it just does sooo much.

@JeffreySarnoff
Contributor

I was too terse. SOFA is wholly inappropriate for Julia's core datetime functionality. I just meant that it is known to me and it is solid: I have used some of their routines to check other work, including spherical trigonometry.

... no worries ... toward the end of the week ... my comment will start: Time for Julia

@JeffreySarnoff
Contributor

Julia's datetime encapsulates calendar, clock, microtimer and time zone information. Encoding compactly in C, that information exceeds 64 bits but does not exceed 128 (integer) bits.

(is this accurate)
Exposing the UInt128 and Int128 types would allow Julia to see and the C. Such type-specialized matching of equivalent bitwidth+type should allow Julia to receive C generated instance valuations transparently.
(if that is so ..)

This should enhance seamless handling of datetime-typed vectors when working with long timeseries. And would create flexibility -- it then becomes easy for Julia to give itself a package like R's zoo, xts.

The datetime realization would use the same C libraries' routines with or without UInt128 and Int128. Absent those types, we have a less facile facility; some of the higher level stuff is best done in Julia and requires the representation handshake to 'just work' seamlessly.

@StefanKarpinski
Member

So it seems like this is a real-life use case for 128-bit integers? @JeffBezanson and I had discussed that at some point and were unable to come up with cases where 64 bits wasn't enough or it wouldn't be more appropriate to just use arbitrary precision integers. I'd still like to allow the programmer to choose how much precision they want, but having the option to use an Int128 for timestamps seems like a good thing. I once tried to add Int128 support to Julia but the effort stalled out. Might be time to try again.

@JeffreySarnoff
Contributor

Time for Julia [a]

(a) do more by doing less

I have re-reviewed some accessible datetime libraries. After dismissing the problematic and the more restrictively licensed, and disqualifying some good ones because they were not C, FORTRAN or C++ libraries, few remained. Use of the IANA Time Zone database is crucial. Available APIs with that facility do not support leap seconds. Julia has a better time with both capabilities.

The most reliable and well-maintained Julia-wrappable datetime API is within ICU (kudos to @nolta). Fortuitously, it provides an encoding which fits well with my proposal. Given (date, time, region) ICU4C would deliver (GMT datetime, timezone), that would be converted into (UTC datetime, timezone index) [leap seconds, fast lookup] then gently folded into a box. To show() it unfold, unleap, show(ccall(ICU)).

The ICU datetime API knows about regions and instants. Offering capabilities found in Joda-Time and Lubridate requires that and more: giving Julia facile understanding of regions, instants, intervals, durations and granules.

For some interested party, other parts of ICU are useful in making an internationalization package. "ICU is a cross-platform Unicode based globalization library. It includes support for locale-sensitive string comparison, date/time/number/currency/message formatting, text boundary detection, character set conversion and so on." - FAQ.

ICU http://site.icu-project.org/, License http://source.icu-project.org/repos/icu/icu/trunk/license.html,
ICU4C http://site.icu-project.org/download/49#TOC-ICU4C-Download.

@JeffreySarnoff
Contributor

Time for Julia [b]

(b) links for Lubridate, Joda-Time, JSR-310:

Joda-Time brought its community a better class logic and mellowness. Lubridate co-opted some of that, broadened arithmetic ops and offered R-centric smoothness. This development arc continues with the work Stephen Colebourne is doing on JSR-310, a new Date and Time API for Java. Even in pre-release form it is a clear step up. Read "Why JSR-310 isn't Joda-Time": http://blog.joda.org/2009/11/why-jsr-310-isn-joda-time_4941.html

For more on Joda-Time and Lubridate: http://joda-time.sourceforge.net/userguide.html, http://www.jstatsoft.org/v40/i03/paper, http://cran.r-project.org/web/packages/lubridate/lubridate.pdf.
For more on JSR-310: http://londonjavacommunity.wordpress.com/2011/08/17/about-jsr-310-a-new-java-datetime-api/, http://iweb.dl.sourceforge.net/project/threeten/presentations/JSR-310-J108-ReviewAdjusted1.pdf, http://threeten.sourceforge.net/apidocs-2012-04-24/.

@JeffreySarnoff
Contributor

Time for Julia [c]

(c) partial solution, for the most part

Some Time Measure Terms

Proleptic: The calendar system obtained by projecting a given calendar system back in time past its actual inception. For example, The Gregorian Calendar was first adopted in 1582, but the proleptic Gregorian calendar can be used to indicate earlier dates. To match such an earlier date with whatever calendar was in use at that earlier time, a conversion is required.

Local Time: The time read from an accurate wall clock.

Local Standard Time: Local Time when Daylight Savings Time is not in effect.

Local Daylight Time: Local Standard Time adjusted by Daylight Saving offset when Daylight Saving Time is in effect.

Timezone: A region where Local Standard Time is obtained by given offset from UTC and wherein Daylight Saving Time, if used, is in effect over given dates.

Second: The SI Second is a consistently measurable duration, roughly equal to 1/86400 of the [inexactly defined] mean solar day. Formally, it is the duration of 9,192,631,770 periods of radiation corresponding to the transition between two hyperfine levels of the ground state of the cesium 133 atom at 0K.
ref: http://www.bipm.org/en/si/si_brochure/chapter2/2-1/second.html

TAI (International Atomic Time): A time scale with unit interval exactly equal to 1 SI second. First measured in 1955, it uses 1958-Jan-01 for an origin. As each step of the TAI clock corresponds to a duration of 1 SI second, TAI does not reflect leapseconds. "It is recommended by the BIPM that systems which cannot handle leapseconds use TAI instead."
ref: http://www.usno.navy.mil/USNO/time/master-clock/leap-seconds

UTC (Coordinated Universal Time): The internationally adopted timebase from which Local Standard Time (and so, Local Daylight Time) is determined. It incorporates leapseconds. Introduced on 1972-Jan-01, it differs from TAI by an integral number of seconds thenceforward (proleptic use requires formulae for 1961-1971).
refs: http://hpiers.obspm.fr/eoppc/bul/bulc/BULLETINC.GUIDE, http://timeanddate.com/time/aboututc.html, http://hpiers.obspm.fr/eoppc/bul/bulc/UTC-TAI.history


These categories of timescale+resolution cover the uses I have seen:
(1) now +/- some centuries, in microseconds [common scale], (2) now +/- some millennia, in seconds [coarse scale], (3) now +/- some decades, in nanoseconds [fine scale], (4) now +/- some years, in picoseconds [maxres scale, specialty]

(perhaps interfacing to an updated, resolution customized version of libtai http://cr.yp.to/libtai.html with a separate interface to the timezone database is worth a look)

If the Julia community feel it important to have a flavor of datetime that performs as fast as possible, that is available for the 'common scale'. Doing so would entail having at least one other (probably two) specializations of generic datetime. Alternatively, one may subsume the four timescale categories in a single type of datetime, each instance using a larger multi-field structure.

One way of realizing the 'common scale' (a specialization of datetime) follows.
...

The Gregorian calendar has a largest natural cycle of 400 years. This cycle (re)started on 1600-Jan-01 and 2000-Jan-01 (and will restart on 2400-Jan-01). It is advantageous to work with this cycle, as it allows more coherence through lower level routines. For 'common scale', covering the two cycles surrounding 2000 offers sufficient breadth. Restricting the resolution to microseconds offers sufficient resolution. There are fewer than 600 timezone identifiers.

The Gregorian calendar has a 400 year cycle of 146097 days. Cardinally labeled as sequential daynumbers, to identify each day in 400 years uses 18 bits. The near cycles are: 1600..1999, 2000..2399. Together they cover 292194 days; day labeling 800 years uses 19 bits. It takes 20 bits to label each of the million microseconds within one second; it takes 30 bits to label each of the billion nanoseconds in a second. A single day has 86400|1 seconds, so 37 bits will label a day of microseconds.

The microsecond count must correspond to TAI or UTC (not local time). Local time is determined from UTC by a timezone-indexed lookup that gives the offset of Local Time from UTC. Covering 292194 days of microseconds requires 55 bits. Allocating 9 bits to hold unique timezone index numbers would suffice. So, 64 bits allow representing the years 1600-2399 in microseconds, with timezone indices.
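
Bit counts of this kind are easy to check mechanically. The sketch below, in current Julia, uses a hypothetical helper `bits_needed` that returns the number of bits required to give each of `n` items its own label.

```julia
# Bits needed to give each of `n` items a distinct label 0, 1, ..., n-1.
bits_needed(n::Integer) = ndigits(n - 1, base = 2)

days_400y  = 146_097
days_800y  = 2 * days_400y
us_per_day = 86_401 * 1_000_000               # allowing for one leap second
us_800y    = days_800y * 86_400 * 1_000_000

for (label, n) in [("days in 400 years", days_400y),
                   ("days in 800 years", days_800y),
                   ("microseconds per second", 1_000_000),
                   ("nanoseconds per second", 1_000_000_000),
                   ("microseconds per day", us_per_day),
                   ("microseconds in 800 years", us_800y)]
    println(rpad(label, 28), bits_needed(n))
end
```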

With the timezone index in the lower order bits, 'common scale' datetimes over different regions would sort correctly (not necessarily a stable sort) without premasking, but safely using the microsecond count requires pre-shifting. With the microsecond count in the lower order bits, masking out the timezone index is forced for most operations -- which may be safer.
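
A sketch of the first layout (timezone index in the low-order bits) in current Julia. The names `pack`, `microseconds`, and `tzindex` are hypothetical, and the choice of an unsigned 64-bit word with a 9-bit index is an assumption made for illustration.

```julia
const TZ_BITS = 9
const TZ_MASK = (UInt64(1) << TZ_BITS) - UInt64(1)

# Microsecond count in the high-order bits, timezone index in the low 9 bits.
function pack(us::Integer, tz::Integer)
    0 <= tz <= TZ_MASK ||
        throw(ArgumentError("timezone index does not fit in $TZ_BITS bits"))
    0 <= us < (UInt64(1) << (64 - TZ_BITS)) ||
        throw(ArgumentError("microsecond count does not fit"))
    return (UInt64(us) << TZ_BITS) | UInt64(tz)
end

microseconds(x::UInt64) = x >> TZ_BITS     # shift the index away
tzindex(x::UInt64) = Int(x & TZ_MASK)      # mask the count away

x = pack(1_000_000, 42)
println(microseconds(x))    # 1000000
println(tzindex(x))         # 42

# An earlier instant sorts first even when its zone index is larger.
println(pack(100, 300) < pack(101, 2))    # true
```

Plain integer comparison orders the packed values by instant first, which is the sorting property described above; any arithmetic on the count needs the shift.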

@JeffreySarnoff
Contributor

A reliable, more current calendar&clock, available-for-use library (needs a separate interface to the IANA timezone database) is http://code.google.com/p/cdatecalc/

@StefanKarpinski
Member

Ok, I'm just trying to make sure I understand the issues here. Can you give an example of how two valid UTC strings can map to the same POSIX time?

@StefanKarpinski
Member

This is relevant, of course: http://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_number

@nolta
Member

nolta commented Dec 1, 2012

Like i said, take a look at the 2nd table, particularly the bold entries.

@StefanKarpinski
Member

Right, ok. So it seems like converting from my proposed Time format to POSIX time would be non-trivial since you'd have to deal with leap seconds. Going from Time values to POSIX would be surjective but not injective, and going from POSIX times to Time values would involve choosing between two valid Time values when the second happened to be a leap second. That seems like the only thing that can be done other than just ignoring leap seconds like NumPy does, which just doesn't sit right with me.
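
The collision can be reproduced with a few lines of current Julia. This sketch applies the POSIX "every day has 86400 seconds" rule to broken-down UTC fields; the function names are hypothetical, and the day-number formula is a common integer formula for the proleptic Gregorian calendar rather than anything from this pull request.

```julia
# Julian Day Number of a proleptic Gregorian calendar date.
function julian_day_number(year::Integer, month::Integer, day::Integer)
    a = (14 - month) ÷ 12
    y = year + 4800 - a
    m = month + 12a - 3
    return day + (153m + 2) ÷ 5 + 365y + y ÷ 4 - y ÷ 100 + y ÷ 400 - 32045
end

const UNIX_EPOCH_JDN = julian_day_number(1970, 1, 1)

# POSIX time from UTC fields: each day counts as exactly 86400 seconds,
# so a leap second gets no value of its own.
function posix_time(y, mo, d, h, mi, s)
    days = julian_day_number(y, mo, d) - UNIX_EPOCH_JDN
    return days * 86_400 + h * 3_600 + mi * 60 + s
end

leap  = posix_time(2012, 6, 30, 23, 59, 60)   # leap second at the end of June 2012
after = posix_time(2012, 7, 1, 0, 0, 0)       # the following midnight

println(leap == after)    # true
println(after)            # 1341100800
```

Two distinct UTC labels map to one POSIX value, so the reverse mapping has to pick one of them.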

@StefanKarpinski
Member

Sorry about not seeing your comment before posting the same link, Mike. Once I refreshed the page I saw it.

@JeffreySarnoff
Contributor

does just using a DateTime with sufficiently precise resolution give you what you want?

No -- although for floating point calcs, using ~2.390625 x the desired results' bitwidth is a really good heuristic.

The 64bit-limited version of our internal timebase is not of the same nature as the representations we may fully support and others with which we may interwork. OTOH all of these considerations are obviated with the 128bit type for most common uses (timing results from CERN over the life of some project is one exception, long-term ephemeris development from multisourced observations is another).

@JeffreySarnoff
Contributor

POSIX time is profoundly mucked up -- requires waders. The first POSIX conversion must be us-->POSIX, then the right way to handle POSIX-->us would, with pain, become clear.

@StefanKarpinski
Member

Quoting @JeffreySarnoff since his post is getting mangled:

does just using a DateTime with sufficiently precise resolution give you what you want?

No -- although for floating point calcs, using ~2.390625 x the desired results' bitwidth is a really good heuristic.

The 64bit-limited version of our internal timebase is not of the same nature as the representations we may fully support and others with which we may interwork. OTOH all of these considerations are obviated with the 128bit type for most common uses (timing results from CERN over the life of some project is one exception, long-term ephemeris development from multisourced observations is another).

I would very much like to have a design that could, e.g., be used for things like analyzing CERN data. I'm willing to go 128-bit, if that's what it takes. 128 bits seems like it should be enough for anything sane. I'm still not understanding this rounding/precision issue at all.

@JeffreySarnoff
Contributor

do go 128 bits -- there is much that is nicer for users and it saves having parallel code

The rounding/precision is not an issue that applies to many cases of any given mapping and unmapping. For all but the most simply related sorts of time, there are transformations where some few specific values cause trouble; in such circumstances the value first given will not be recovered exactly without proactive coupling of arithmetic mannerisms with a few bits to realize an essentially autocorrecting bidirectional map.

@JeffreySarnoff
Contributor

Consider using +/- (2^111 - 1), reserving the remaining most significant 16 bits for now. That allows us to represent the age of the universe in atomic units of time ([image: Inline image 1]).

@StefanKarpinski
Member

I'm pretty weirded out by having 16 bits that we're not doing anything with? What are we going to stuff in there?

@JeffreySarnoff
Contributor

And slightly akimbo, did not view it that way -- an app is not directly accessing the significance of those bits.
Julia developers may put them to good purpose, and I thought it reasonable to protect that opportunity.

@JeffreySarnoff
Contributor

It would be very helpful to have minimal examples of best practice when defining a bitstype that is a subtype of this one.
I have found parameterless bitsubtypes [with respect to abstract supertypes] somewhat more approachable ... still, when stemming from Signed or from a new abstraction with three intermediate abstract types, getting everything working correctly seems more involved than expected.

Frequent trial and more frequent error has dogged my placing of or omitting of the parameter. It has been more an effort in permutation counting than understanding the whys and wherefores of explicit vs implicitly given at each possible location within a function signature.

Seeing the proper approach to dependency ordering for multi-parameterized bitstype dispatch-specific definitions of reinterpret, convert, promote_type, show and [what are the others that should be part of the mix?] would be great.

@nolta
Member

nolta commented Dec 2, 2012

So what's the proposal here? That we use TAI as the internal representation? TAI doesn't exist in any form before 1955. How do we represent earlier dates? Do we instead switch to terrestrial time (TT), of which TAI is a realization? Since people usually specify dates as UT, we'd have to apply a ΔT = TT-UT correction.

@JeffreySarnoff
Contributor

Like TAI's absence from history before 195x, the Gregorian calendar is absent from history before 156x and then remained unadopted everywhere for another ~20 years. The Julian calendar is used only irregularly before 6 CE, and is wholly absent before ~360 BCE (as I recollect). Established calendric frames with desirable properties are not historically covering. For an internal representation, we want the sands of time to be as glass, fused rather than occasionally shifting. [Allegorically, forest rescue teams advise that staying put gives them the best chance to find someone.] Adopting an internal representation that 'stays put' (is used in a proleptic manner) will allow easier coding of interconversions and intrinsically squash some potential inexactness.

The first way is to keep the internally monotonic and uniform day|second|femtosecond|Planck-time counter riveted to the SI second (which itself may be redefined in a few years) and, therewith, the TAI day of 86_400 SI seconds. Use proleptic astronomical [i.e. year number zero exists] Gregorian rules to obtain an internally consistent, perfectly unwindable internal labeling of year month day with SI-seconds+subsecond-tocs of the day.

The second way is to obtain best-in-show DeltaT values from NASA's polynomials (available for years after -2000) http://eclipse.gsfc.nasa.gov/SEcat5/deltatpoly.html and let TAI slip backwards as TT when internalizing time. Externalizing such time uses piecewise rational approximations to invert the quartic, quintic .. polynomials.

@nolta
Member

nolta commented Dec 2, 2012

Could you explain your "first way" in more detail?

@JeffreySarnoff
Contributor

Some simplification does not change the What and Why of the first way.

The internal timebase covers all years -3999 .. +3999 and supports no others.

One Day_SI (defined as 86_400 Seconds_SI [60 Seconds_SI per minute * 60 minutes per hour * 24 hours per day]) is adopted as the nominal unit of internal time (nominal because our time has a heartbeat: uint(beats) times each Day_SI); then there is no conflation of TAI proleptic handling with Gregorian proleptic handling. The temporal perspective that the "first way" evinces applies pari passu.

Here is that. The daynumber computation enfolds the first way. A well known algorithm provides the return trip.

DayNumber.jl
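
For readers without the attachment, here is a minimal sketch of a proleptic Gregorian day-number computation with a year zero, together with its inverse. It is not the contents of DayNumber.jl: the function names `daynumber` and `civil`, the choice of 1970-01-01 as day 0, and the use of the widely circulated era-based civil-date algorithm are assumptions made for illustration, and the syntax is current Julia rather than the 2012 dialect.

```julia
# Sketch only: names and epoch (1970-01-01 == day 0) are illustrative assumptions.
# Years use astronomical numbering, so year 0 exists and precedes year 1.

# Days since 1970-01-01 in the proleptic Gregorian calendar.
function daynumber(y::Integer, m::Integer, d::Integer)
    y   = m <= 2 ? y - 1 : y             # count Jan/Feb as the tail of the prior year
    era = fld(y, 400)                    # 400-year cycle index, floored for negative years
    yoe = y - era * 400                  # year of era, 0..399
    mp  = m > 2 ? m - 3 : m + 9          # month index counted from March, 0..11
    doy = div(153 * mp + 2, 5) + d - 1   # day of the March-based year, 0..365
    doe = yoe * 365 + div(yoe, 4) - div(yoe, 100) + doy   # day of era, 0..146096
    return era * 146097 + doe - 719468
end

# Inverse: day number back to (year, month, day).
function civil(n::Integer)
    z   = n + 719468
    era = fld(z, 146097)
    doe = z - era * 146097
    yoe = div(doe - div(doe, 1460) + div(doe, 36524) - div(doe, 146096), 365)
    doy = doe - (365 * yoe + div(yoe, 4) - div(yoe, 100))
    mp  = div(5 * doy + 2, 153)
    d   = doy - div(153 * mp + 2, 5) + 1
    m   = mp < 10 ? mp + 3 : mp - 9
    y   = yoe + era * 400
    return (m <= 2 ? y + 1 : y, m, d)
end
```

Because every step uses floored division on the 400-year cycle, the same code path serves negative years and year 0 without special cases, which is the "perfectly unwindable" property the first way asks for.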

@JeffreySarnoff
Contributor

These daynumbers are not a translation of Julian dates. Julian dates are tallying days of 86_400 Seconds_SI.
Any sense of commensurability is an illusion, and one reason that most systems read time at a 1st grade level.

@nolta
Member

nolta commented Dec 3, 2012

Every historical date is astronomical. If you're going to represent time as "proleptic TAI", then you're going to have to apply a ΔT correction to convert back and forth.

@JeffreySarnoff
Contributor

    • I sought to convey (let me coin) "proleptic SI" [ PSI ] predilection

We should not choose proleptic use of the historical TAI for our internal timebase. TAI has only been well aligned with the SI second for the past 10+ years. Before this, the same name has a different nature. DeltaT improves the accuracy of distant glimpses, and is of import to many. The sort of interframe shifting and intraframe sliding that DeltaT entails is not naturally self-managed. A cleaner mechanism obtains when the core competency of our internal timeway governs.

@JeffreySarnoff
Contributor

"the first way seen a second way"


The first way is to keep the internally monotonic and uniform day|second|femtosecond|Planck-time counter riveted to the SI second (which itself may be redefined in a few years) and, therewith, the TAI day of 86_400 SI seconds. Use proleptic astronomical [i.e. year number zero exists] Gregorian rules to obtain an internally consistent, perfectly unwindable internal labeling of year month day with SI-seconds+subsecond-tocs of the day.

  • choose resolution independent uniformity over representational diversity
    • such constancy allows one to reach out with ease, and so to greet others
    • the other is a vagabond polyglot, extending tangles
  • all days the same exact duration: 86_400 second_SI
    • and so for every week_SI, hour_SI, minute_SI, and
    • nth second_SI (n=a*..*d & a..d in factor(9192631770))
  • the calendar has year 0, so leapyears are sign symmetric
    • a proleptic Gregorian calendar may be used
    • maybe refined to better match diurnal/nocturnal
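
The sign symmetry claimed for leap years can be checked with a short sketch. The function and constant names are hypothetical; the rule is the ordinary Gregorian one applied proleptically to astronomical year numbers, where year 0 exists. Syntax is current Julia.

```julia
# Sketch only: hypothetical names, current Julia syntax.
const SECONDS_PER_DAY_SI = 86_400   # every internal day has this many SI seconds

# Gregorian leap-year rule applied to astronomical year numbers (..., -1, 0, 1, ...).
isleap_astronomical(y::Integer) =
    (mod(y, 4) == 0 && mod(y, 100) != 0) || mod(y, 400) == 0

# With a year 0 the rule depends only on divisibility, so y and -y always agree.
symmetric = all(isleap_astronomical(y) == isleap_astronomical(-y) for y in 0:3999)
```

Under historical numbering without a year 0, the proleptic leap years before the common era would instead fall on 1 BCE, 5 BCE, and so on, and this symmetry would not hold.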

@JeffreySarnoff
Contributor

As I ready files for public display ( fresh DeltaT, daily specials ) and see something of specific interest ...

# jasQuoRem
#
#    This is a variation of  divmod(n,d) == (div(n,d), mod(n,d))
#    that is used with counts and displacements in units of time
#    for computations that resolve time at multiple resolutions.
#
#    NUMERICAL STABILITY
#    This function is designed for robustness and stability. 
#    If integer division is applied to either returned value
#    (or one directly derived from a returned value), fld()
#    may be safer than div().  Avoid flipping between div()
#    and fld() just for convenience and not by design.
#
#
#    argument     signifying               has domain
#
#    n            a_count                  nonnegative integers
#    n            a_displacement           integers
#    d            a_subsumptive_ratio      positive integers
#
#
# jasQuoRem( tally          , multiresolution )
#            (n>=0 seconds) , SecondsPerDay     --> (days >= 0 , seconds >= 0)
#
# jasQuoRem( shift          , multiresolution )
#            (n>=0 seconds) , SecondsPerDay     --> (days >= 0 , seconds >= 0)
#            (n< 0 seconds) , SecondsPerDay     --> (days <  0 , seconds >= 0)
#
#
# with numbers
#
#   const SecondsPerMinute = 60
#
#   q,r = jasQuoRem(  7, SecondsPerMinute)   #  ( 0,  7)  
#   q*SecondsPerMinute + r ==   7            #    7 seconds is +0 minutes +7 seconds  
#   q,r = jasQuoRem( -7, SecondsPerMinute)   #  (-1, 53)
#   q*SecondsPerMinute + r ==  -7            #   -7 seconds is -1 minute +53 seconds
#                                            #    7 seconds earlier is obtained
#                                            #      by  subtracting 1 minute
#                                            #      and adding 53 seconds
#
#   q,r = jasQuoRem( 67, SecondsPerMinute)   #  ( 1,  7)  
#   q*SecondsPerMinute + r ==  67            #   67 seconds is +1 minute   +7 seconds  
#   q,r = jasQuoRem(-67, SecondsPerMinute)   #  (-2, 53)
#   q*SecondsPerMinute + r == -67            #  -67 seconds is -2 minutes +53 seconds
#                                            #   67 seconds earlier is obtained
#                                            #      by  subtracting 2 minutes
#                                            #      and adding 53 seconds

jasQuoRem(n::Signed, d::Unsigned) = begin q=fld(n,d); (q, (n-(d*q))) end
jasQuoRem(n::Signed, d::Signed  ) = jasQuoRem(n, convert(Unsigned, abs(d)))

@JeffreySarnoff
Contributor

# the remainder is converted back to a signed type so that both results are signed
jasQuoRem(n::Signed, d::Unsigned) = begin q=fld(n,d); (q, convert(Signed,(n-(d*q)))) end
jasQuoRem(n::Signed, d::Signed ) = jasQuoRem(n, convert(Unsigned, abs(d)))

# width-specific methods
jasQuoRem(n::Int64, d::Uint64) = begin q=fld(n,d); (q, convert(Int64,(n-(d*q)))) end
jasQuoRem(n::Int32, d::Uint32) = begin q=fld(n,d); (q, convert(Int32,(n-(d*q)))) end

jasQuoRem(n::Int64, d::Int64 ) = jasQuoRem(n, convert(Unsigned, abs(d)))
jasQuoRem(n::Int32, d::Int32 ) = jasQuoRem(n, convert(Unsigned, abs(d)))
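
In current Julia the floored quotient and nonnegative remainder described for jasQuoRem are available directly as `fldmod`, so the behavior can be sketched without per-width methods. The wrapper name `quorem` and the helper `split_seconds` are hypothetical and are not part of the code posted in this thread.

```julia
# Sketch only: `quorem` and `split_seconds` are illustrative names.
const SECONDS_PER_MINUTE = 60
const SECONDS_PER_HOUR   = 60 * SECONDS_PER_MINUTE
const SECONDS_PER_DAY    = 24 * SECONDS_PER_HOUR

# Floored quotient, with a remainder that always lies in 0:(abs(d) - 1).
quorem(n::Integer, d::Integer) = fldmod(n, abs(d))

# Resolve a signed count of seconds at several resolutions at once.
function split_seconds(n::Integer)
    days, rest       = quorem(n, SECONDS_PER_DAY)
    hours, rest      = quorem(rest, SECONDS_PER_HOUR)
    minutes, seconds = quorem(rest, SECONDS_PER_MINUTE)
    return (days, hours, minutes, seconds)
end
```

Here `quorem(-7, 60)` gives `(-1, 53)` and `quorem(-67, 60)` gives `(-2, 53)`, and `split_seconds(-7)` gives `(-1, 23, 59, 53)`: only the leading field carries the sign, and every finer field stays nonnegative.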

@JeffreySarnoff
Contributor

Carrying timezone information in Julia without a Composite Type requires two parameters. I find three works best.
Without the third, that same value would need to be redetermined each time an object participates in a calculation.
Here is the idea.

abstract Temporal                                   # (?) abstract Temporal <: Signed
abstract CalendaredTime{C}   <: Temporal            # { Cs[1] => :Gregorian, ... }
abstract ZonedTime{C,Z}      <: CalendaredTime{C}   # { Zs[1]=>  "America/New_York", ... }

bitstype N Timeliness{C,Z,I} <: ZonedTime{C,Z}      # I indexes Z's vector of transitions
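
For comparison, here is how the same hierarchy might be spelled in current Julia, where `abstract` became `abstract type ... end` and `bitstype` became `primitive type ... end`. This is a sketch under assumptions, not the tm4julia implementation: the 64-bit width, the use of Symbols for the calendar and zone parameters, and the constructor and accessor names are all chosen for illustration.

```julia
# Sketch only: width, parameter encodings, and accessor names are assumptions.
abstract type Temporal end
abstract type CalendaredTime{C} <: Temporal end
abstract type ZonedTime{C,Z} <: CalendaredTime{C} end

# C = calendar, Z = zone, I = index into Z's vector of transitions.
primitive type Timeliness{C,Z,I} <: ZonedTime{C,Z} 64 end

# Primitive types get no default constructor; build one from a raw tick count.
Timeliness{C,Z,I}(ticks::Int64) where {C,Z,I} = reinterpret(Timeliness{C,Z,I}, ticks)

# Read the payload and the parameters back through dispatch.
ticks(t::Timeliness) = reinterpret(Int64, t)
calendar(::CalendaredTime{C}) where {C} = C
zone(::ZonedTime{C,Z}) where {C,Z} = Z
transition(::Timeliness{C,Z,I}) where {C,Z,I} = I

t = Timeliness{:Gregorian, Symbol("America/New_York"), 1}(Int64(1_000))
```

Each accessor is defined at the most abstract level that carries the parameter it needs, so `calendar(t)` works for any future `CalendaredTime` subtype while `transition(t)` stays specific to the concrete type.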

@GlenHertz
Contributor

Hi,

I want to do some work with dates and want to know if there is any code to use. I couldn't find a module related to dates inside ~/.julia/.

Glen

@ViralBShah
Member

I have a feeling that we have a bunch of Date and Time stuff, but I can't see it in the packages. Would be nice to have it in there.

@johnmyleswhite
Member

There's the Calendar package as well as Stefan's draft DateTime type.

@GlenHertz
Contributor

From what I see in Calendar and tm4julia they look really good. Seems much more intuitive than Python! I'll use Calendar until something better is packaged.

Out of curiosity, does anyone know where Stefan's DateTime lives?

@JeffreySarnoff
Contributor

Glen --
Stefan wrote this as a draft (and exemplar): https://github.com/JuliaLang/julia/blob/master/extras/time.jl
Using Calendar is likely your best bet today; underlying the package is ICU, a very reliable and well maintained library.

I appreciate your good mention of tm4julia. I am progressing better than Sisyphus.
