Time series support in Base? #3524
+1. If it's ironed out enough, I think time series support is a very natural thing to expect from Base.
I really wish that this didn't entail depending on ICU, but we probably should.
Please ping me when this lands, as I should probably update build recipes and Debian package requirements to include […]
How much functionality can we retain without ICU?
I've started digging into it a little bit, and @nolta can speak to it more, but currently […]
@karbarcca It depends on what you mean by "majority of the functionality". Implementing Zulu time in pure Julia would be easy. Adding timezone support, however, would be quite difficult. As I see it, proper timezone handling is Calendar's raison d'être.
@karbarcca I agree with @nolta. Zulu time is implemented in pure Julia in https://github.com/aviks/SimpleDate.jl (the API is different from Calendar's, but that would be a trivial change). Adding timezone support is the missing piece. I had some Julia code to parse the Olson database, but never got around to implementing the conversions. On top of that, one needs leap second support. Another large piece of functionality that ICU provides is date formatting/parsing. All of this is certainly doable, but it is a large chunk of work.
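As a rough illustration of why pure-code Zulu (UTC) time is the easy part, here is a minimal sketch in Python (not Julia, and not SimpleDate.jl's API) that converts proleptic-Gregorian dates to and from a day count since 1970-01-01, following Howard Hinnant's public-domain `days_from_civil`/`civil_from_days` algorithms. The function names are hypothetical, and timezones and leap seconds, the hard parts discussed above, are deliberately left out.

```python
# Sketch: UTC ("Zulu") calendar arithmetic with no ICU dependency.
# Follows Howard Hinnant's days_from_civil / civil_from_days algorithms for the
# proleptic Gregorian calendar. Function names are hypothetical.

def days_from_civil(y: int, m: int, d: int) -> int:
    """Days since 1970-01-01 for the Gregorian date y-m-d."""
    y -= m <= 2                      # treat Jan/Feb as months 13/14 of the previous year
    era = y // 400                   # Python's // floors, so negative years also work
    yoe = y - era * 400              # year of era, [0, 399]
    mp = (m + 9) % 12                # March = 0, ..., February = 11
    doy = (153 * mp + 2) // 5 + d - 1              # day of year, [0, 365]
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day of era, [0, 146096]
    return era * 146097 + doe - 719468             # shift so 1970-01-01 is day 0

def civil_from_days(z: int) -> tuple[int, int, int]:
    """Inverse of days_from_civil: day count -> (year, month, day)."""
    z += 719468
    era = z // 146097
    doe = z - era * 146097
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    y = yoe + era * 400
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153
    d = doy - (153 * mp + 2) // 5 + 1
    m = mp + 3 if mp < 10 else mp - 9
    return (y + (m <= 2), m, d)

def utc_timestamp(y: int, m: int, d: int, hh: int = 0, mm: int = 0, ss: int = 0) -> int:
    """Seconds since the Unix epoch, ignoring leap seconds (the POSIX convention)."""
    return days_from_civil(y, m, d) * 86400 + hh * 3600 + mm * 60 + ss

print(utc_timestamp(2012, 7, 1))  # 1341100800
```

Everything here is integer arithmetic over a fixed calendar, which is why it needs no external data; timezone rules and leap seconds are the parts that depend on externally maintained tables.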
ICU is big, but it is easy to get on Linux and is included in OS X. I believe it also works on Windows. Even if we can reimplement much of what we need in Windows, it will be quite a chore to support. I am in favour of using ICU and bringing Calendar into Base. Over time, we could reduce our dependence on ICU, if circumstances demand.
My vote would be to start out with a simpler feature set in pure Julia (à la SimpleDate.jl) with a clean API that can be expanded over time, and to exclude the ICU dependency. Starting out with SQL-like support (Date, Time, and DateTime types), lubridate-style arithmetic, duration, period, and interval support, and IO support with parsing/formatting would give us a solid foundation that provides a lot of basic functionality. As for leap seconds, we could take ICU's approach and ignore them :) (basically leave it to the operating system to figure out). Timezone support is definitely a bigger chunk of work to do manually, but there are simple ways to include basic functionality (as @aviks mentioned), and I think marking full support as a future feature is reasonable.
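To make the duration/period distinction concrete, here is a minimal Python sketch; the helper `add_months` is hypothetical and is not lubridate's or any Julia package's API. A duration is a fixed amount of elapsed time, while a period is a calendar amount whose length depends on the starting date. Clamping to the last day of the target month is one common convention, assumed here.

```python
# Sketch of lubridate-style "period" vs "duration" arithmetic.
# add_months is a hypothetical helper, not an existing library API.
import calendar
from datetime import date, timedelta

def add_months(d: date, n: int) -> date:
    """Add n calendar months (a period), clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

start = date(2012, 1, 31)
print(add_months(start, 1))        # period, one calendar month: 2012-02-29
print(start + timedelta(days=30))  # duration, exactly 30 days:  2012-03-01
```

The two answers differ because "one month" has no fixed length; keeping periods and durations as separate types lets the caller say which one is meant.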
@karbarcca happy to give you commit access to SimpleDate if you want to run with that codebase.
In general, I like @karbarcca's gung-ho attitude. My major worry is that we can't start using times without leap seconds and then introduce them later unless we tell people that the date/time support is a draft at best.
Thanks @aviks, I've enjoyed going over your code today, and I'm taking a stab at adding features while trying to follow Calendar.jl's conceptual framework to get a working Julia Calendar/Timezone implementation. The thing with leap seconds is that there really isn't much consensus on how to handle them at all. Most languages/applications (including ICU) have taken the approach that they're not going to worry about it and let it be an OS problem. Implementation can be tricky, but we can take a stab at it if we deem it important enough. I found a blog post here with a good walkthrough of considerations, and also came across a slightly hacky way that Google deals with leap seconds. The problem with a lot of "solutions" (hacks, mainly) is that they're all pretty much clever ways to trick servers into dealing with an extra second every once in a while, not a formal API or anything. One thing I thought of was keeping a simple cache of year seconds/milliseconds and basing our date parsing on it. That would let us easily add leap seconds manually as needed and calculate accurate durations/intervals, and it has the added benefit of much faster date parsing. Anyway, I'll plug away a little more and see if I can push something for review.
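The "cache of year seconds" idea could look roughly like the following Python sketch, which is an assumption about the design rather than the code that was eventually written: precompute the POSIX second at which each year starts, then find a timestamp's year with a single binary search. Leap seconds are not handled here; in this scheme they would be accommodated by shifting the cached starts after each inserted second. `LAST_YEAR`, `build_year_starts`, and `year_of` are hypothetical names.

```python
# Sketch: cache the start of every year (in POSIX seconds) and look a timestamp's
# year up by binary search. The cached range and all names are hypothetical.
from bisect import bisect_right

FIRST_YEAR, LAST_YEAR = 1970, 2100  # table starts at the Unix epoch

def is_leap(y: int) -> bool:
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def build_year_starts(last: int) -> tuple[list[int], int]:
    """POSIX second at which each year 1970..last begins, plus the end of the range."""
    starts, t = [], 0
    for y in range(FIRST_YEAR, last + 1):
        starts.append(t)
        t += (366 if is_leap(y) else 365) * 86400
    return starts, t

YEAR_STARTS, RANGE_END = build_year_starts(LAST_YEAR)

def year_of(ts: int) -> tuple[int, int]:
    """Return (year, seconds since that year began) for a POSIX timestamp."""
    if not 0 <= ts < RANGE_END:
        raise ValueError("timestamp outside the cached range")
    i = bisect_right(YEAR_STARTS, ts) - 1
    return FIRST_YEAR + i, ts - YEAR_STARTS[i]

print(year_of(1341100800))  # (2012, 15724800), i.e. 182 days into 2012
```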
I suspect that ignoring leap seconds might be best since:
I think the easiest and least confusing solution would be to define a TAI timezone (or a separate TAI datetime type), and define a method for converting between that and UTC by adding/subtracting the appropriate number of seconds. That way, if you want the length of an interval between t1 and t2, you can choose whether leap seconds are counted:

```julia
t2 - t1                      # duration ignoring leap seconds
TAItime(t2) - TAItime(t1)    # duration accounting for leap seconds
```
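Here is a minimal Python sketch of the TAI-conversion idea. `tai_time` is a hypothetical stand-in for the `TAItime` function proposed above, and the table is an abbreviated excerpt of the IERS leap-second list (only the entries from 2006 on). It is therefore valid only for instants on or after 2006-01-01; a real implementation would load the full, regularly updated list. The returned values are POSIX seconds shifted by TAI-UTC, so their differences count leap seconds, but they are not on the standard TAI epoch.

```python
# Sketch: POSIX/UTC seconds -> uniform (TAI-based) seconds via a leap-second table.
# Abbreviated excerpt of the IERS list; tai_time is a hypothetical name.
from bisect import bisect_right
from datetime import datetime, timezone

def utc_seconds(y: int, m: int, d: int) -> int:
    return int(datetime(y, m, d, tzinfo=timezone.utc).timestamp())

# (POSIX time at which the offset takes effect, TAI - UTC in seconds)
LEAP_TABLE = [
    (utc_seconds(2006, 1, 1), 33),
    (utc_seconds(2009, 1, 1), 34),
    (utc_seconds(2012, 7, 1), 35),
    (utc_seconds(2015, 7, 1), 36),
    (utc_seconds(2017, 1, 1), 37),
]
LEAP_STARTS = [t for t, _ in LEAP_TABLE]

def tai_time(posix_seconds: int) -> int:
    """POSIX seconds plus the TAI-UTC offset in force at that instant."""
    i = bisect_right(LEAP_STARTS, posix_seconds) - 1
    if i < 0:
        raise ValueError("instant precedes the excerpted leap-second table")
    return posix_seconds + LEAP_TABLE[i][1]

t1 = utc_seconds(2012, 6, 30) + 86399  # 2012-06-30T23:59:59Z
t2 = utc_seconds(2012, 7, 1)           # 2012-07-01T00:00:00Z
print(t2 - t1)                         # 1: ignores the leap second 23:59:60
print(tai_time(t2) - tai_time(t1))     # 2: counts it
```

Keeping the plain subtraction leap-second-free and routing leap-second-aware arithmetic through an explicit conversion matches the proposal's aim: existing UTC code is unchanged, and only callers who ask for TAI pay for the table lookup.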
Ok, I just created a new repo with a bunch of stuff I've been working on for the last 2 weeks. Basically, it ended up being a much larger beast than I anticipated, which most of you were probably aware of. :) I think it's a really good start, though, for Date, Periods, TimeZone, and DateTime support in pure Julia. The main influences for the code are as follows:
High-level framework/concepts:
Potentially useful additions:
This is definitely a first draft, and I'm positive there are holes to be plugged and lots of refinement needed. I would really appreciate any questions, critiques, or discussion to push this forward. Sources:
Thanks @karbarcca, this is great. A couple of quick comments while I have more of a play with this:
Ok, I added […] Overall, I'm really pleased with how far the performance has come. @timholy's profiler was a great help (and hopefully it will be on Windows soon?). Performance-wise, […] The remaining performance issues arise when timezones are specified. I'd say it's at an acceptable, working level (compared to the first draft), but it's still 2x-4x slower than Calendar.jl. Right now, the timezone data is serialized in matrices for each timezone in the […] Anyway, it's been a ton of fun working on this stuff, and I've really enjoyed how much I've learned about bitstypes and type parameters through the process; it's definitely expanded my understanding of how Julia works and how much potential there really is in the type system. Feedback welcome!
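One way to store per-timezone data as a table is sketched below in Python; this is an assumption for illustration, not the package's actual serialization format. Each zone maps to sorted rows of (UTC instant at which an offset takes effect, offset in seconds, abbreviation), and a lookup is one binary search. The rows are a hand-entered excerpt of America/New_York around 2013 (DST from 2013-03-10 07:00Z to 2013-11-03 06:00Z); a real table would be generated from the Olson (tz) database. All function and variable names are hypothetical.

```python
# Sketch: per-zone transition table + binary search for the UTC offset at an instant.
# Rows are a hand-entered excerpt; names are hypothetical.
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

def utc_seconds(y: int, mo: int, d: int, h: int = 0) -> int:
    return int(datetime(y, mo, d, h, tzinfo=timezone.utc).timestamp())

# zone -> rows of (UTC instant the offset takes effect, UTC offset in seconds, abbreviation)
ZONE_TRANSITIONS = {
    "America/New_York": [
        (utc_seconds(2012, 11, 4, 6), -5 * 3600, "EST"),
        (utc_seconds(2013, 3, 10, 7), -4 * 3600, "EDT"),
        (utc_seconds(2013, 11, 3, 6), -5 * 3600, "EST"),
    ],
}
# Transition instants per zone, precomputed once so each lookup is a single bisect.
ZONE_STARTS = {name: [row[0] for row in rows] for name, rows in ZONE_TRANSITIONS.items()}

def utc_offset(zone: str, ts: int) -> tuple[int, str]:
    """Offset (seconds) and abbreviation in force in `zone` at POSIX time ts."""
    i = bisect_right(ZONE_STARTS[zone], ts) - 1
    if i < 0:
        raise ValueError("instant precedes the excerpted transitions")
    _, offset, abbrev = ZONE_TRANSITIONS[zone][i]
    return offset, abbrev

def to_local(zone: str, ts: int) -> str:
    """Format ts as local wall-clock time; only the shifted fields are printed."""
    offset, abbrev = utc_offset(zone, ts)
    local = datetime.fromtimestamp(ts, timezone.utc) + timedelta(seconds=offset)
    return f"{local:%Y-%m-%d %H:%M:%S} {abbrev}"

print(to_local("America/New_York", utc_seconds(2013, 7, 4, 16)))  # 2013-07-04 12:00:00 EDT
```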
This is really quite amazing. I am a bit taken for the moment, but will certainly jump into this in the next few days.
Vacations are always a good time to mull things over. I've thought a lot about the Datetime stuff, particularly about timezone/leap second support and how to do it in a way that's both efficient and maintainable. Here's what I'd like to propose:
Feel free to check out the new Datetime2 package (it's really fast!); I'd love to hear everybody's thoughts on the proposal.
I've started using this and think it would be great to have in Base. Getting this finalized will still take some work, but this is very close to the kind of design I'd like to see (as a person without any detailed expertise in time representations).
Also cc: @milktrader
Neat stuff! I have some reservations about the API, however, in particular the way periods are handled. Eager down-conversion, e.g.,
I also don't think splitting the package in two is a great idea.
@nolta, can you elaborate on why you think splitting the package in two is a bad idea? I agree that at first glance it seems unintuitive and a little weird, but I think the advantage I mentioned, having always-up-to-date timezone/leap second information, is a major win. With respect to timezones, every other major datetime package (Joda, Noda, etc.) ships with a static copy of the timezone data and documents a long, complicated download-reformat-recompilation process for manual updates. And for leap seconds, I would argue that we shouldn't support leap seconds in Base under any circumstances. The fact that a new leap second can be introduced within 6 months would quickly render a static release useless (imagine running a server that logs timestamps and expects leap second support). We'd put ourselves in the same camp as Joda/Noda, documenting a manual update process that would surely turn off users. With the package system stabilizing, I think it provides an excellent, and simple, way to deliver updates of timezone/leap second data. As for the period arithmetic, I agree that there's a possible gotcha, but there's also no clear solution that doesn't lose some expected behavior or introduce inconsistencies (e.g. not allowing years/months, but allowing days). I see a few options:
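The period-arithmetic gotcha discussed above can be illustrated with a minimal Python sketch. The original example of eager down-conversion did not survive, so this shows one commonly cited form of the problem, not necessarily the one meant in the thread. The helper `add_months` is hypothetical, and end-of-month clamping is an assumed convention.

```python
# Sketch: why eagerly down-converting calendar periods is lossy.
# add_months is a hypothetical helper; clamping to the month's last day is assumed.
import calendar
from datetime import date, timedelta

def add_months(d: date, n: int) -> date:
    y, m0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

start = date(2012, 1, 1)                    # 2012 is a leap year
print(add_months(start, 12))                # "1 year" kept as a calendar period: 2013-01-01
print(start + timedelta(days=365))          # "1 year" eagerly turned into 365 days: 2012-12-31

# Even lazily applied month periods are not associative under end-of-month clamping:
jan31 = date(2013, 1, 31)
print(add_months(add_months(jan31, 1), 1))  # 2013-03-28
print(add_months(jan31, 2))                 # 2013-03-31
```

The second pair is why there is no fully consistent answer: keeping years/months as lazy calendar periods avoids the first discrepancy but still gives results that depend on how additions are grouped.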
I agree with all of this. But these arguments work equally well with the proposition "we shouldn't merge Datetime into Base". Splitting Datetime and merging part of it into Base is liable to create two tightly coupled modules with different release schedules. So instead, let's not split Datetime up, and leave it as a package. As you say, the package system works great.
This is also what Calendar does, so it's the solution I prefer. But if Datetime remains a package, then you're free to implement the solution you prefer.
I think DateTime is important enough that there should be a single canonical implementation. Imagine a system where DataFrames depends on one kind of datetime object, and @milktrader's TimeSeries package uses a different type of Date. One would be converting between different types of dates all over one's codebase. This arises quite often in Java projects, where one dependent library uses Joda-Time and another uses java.util.Date (at least there are converters available in that case). While one may make such an argument about any facility, I believe datetimes are fundamental enough that this matters a lot. The best (only?) way of ensuring this is to have a solid datetime implementation in Base.
I'm not sure I follow/understand you here. It would probably be clearer if […] Is there some concern or potential misstep I'm missing with this kind of […]
Jacob
It seems like the main concern about putting everything in Base is the lack […]
Have some time this week to jump into this conversation. Great stuff.
@karbarcca Thanks for the details. But I read your plan and think, "This sounds like a hassle. Why bother?"

@aviks I don't really buy this argument. There are lots of important packages not in Base. If Datetime is high quality, people will use it. If interop becomes a problem, we can ask maintainers to switch. Maybe I'm wrong, but my gut feeling is that the benefit of including this in Base is modest at best, and not worth the cost of splitting up the package.
I agree with most of @nolta's points. Preventing fracturing of date/time representations is largely a social issue, and partly a technical issue of having an official date/time package that's good enough that everyone wants to use it instead of rolling their own. The biggest argument to me for having a time representation in Base is that we might want to have functions in Base return objects of that type. The […]
R has some experience iterating through ways of dealing with time, and there is a good man page at […]

I think Base would be well-served to have at least a foundational time-based type. It can have a timezone field that defaults to […]

Being able to plot seasonal monthly birth rates from a DataFrame should be available out of the box (in Base), while those who want more precision and the ability to aggregate across specific time periods can use a package.
So after having a go at splitting the […] So while splitting the package in two seems like a great idea for maintainability in theory, in practice it doesn't seem to be the best solution. I've merged the revised/enhanced codebase of […] I'm happy to help support anything we'd like to include in Base; otherwise, […]
@nolta @StefanKarpinski Well, to me a programming language without date/time support in its standard library seems incomplete, in the same way, I suppose, that a language without BLAS/LAPACK support would seem incomplete to most Julia users. So I would want some kind of date/time module in Base. But, at the end of the day, that's a pretty subjective opinion.
Yes, but things can develop outside and be brought into Base later.
Good point. We haven't gotten beyond 0.2 yet. How about a milestone, say by 0.5?
Re: @staticfloat's thinking outside the box. If DateTime moves into Base, the issue of keeping the leap second and timezone information up to date comes to the forefront. What if there was a package like […]
@ggggggggg, that's a great idea! And we've actually been discussing doing just that over here. I just pushed some changes the other day that should allow us to do this.
Auto-updating anything is not ok. You can't start Julia and have it make network connections you didn't ask for. In general, anything like that should be opt-in, not opt-out.
I think including the data updates in point releases is probably fine.
Good clarification @StefanKarpinski. Yes, we wouldn't be pushing anything automatically, but it could still be as simple as installing an […]
Reopening; this was closed by the accidental bizarro-merge in #7825.
Is it possible to merge Calendar into Base, possibly after some bikeshedding? We had so much discussion about time series support last year, then Calendar became the de facto time series tool without ever entering Base.