Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date queries for a single day do not return tracks within the final hour of that day #2652

Closed
discopatrick opened this issue Aug 10, 2017 · 9 comments
Labels
bug bugs that are confirmed and actionable

Comments

@discopatrick
Copy link
Member

Problem

Date queries for a single day do not return tracks within the final hour of that day, e.g.

$ beet ls added+ added:2017-02-14

What's interesting is that if I print out the SQL clause that is being run, I get this:

('added >= ? AND added < ?', [1487026800, 1487113200])

And if I feed those timestamps into a timestamp-to-datetime converter, I get these dates:
February 13, 2017 11:00:00 PM GMT+00:00
February 14, 2017 11:00:00 PM GMT+00:00

The period actually runs from 11pm to 11pm. This would explain why my query doesn't return tracks from the final hour of the day. Note: GMT/UTC and my local timezone are "the same" in February, so I don't think this is an issue of timezone conversion.

I've done some digging through the code, and the start and end dates of the query appear to be correct as they get passed between the various components of beets - however, one thing did catch my eye - the point at which beets converts a datetime into a timestamp that can be passed to the db in the query.:

https://github.com/beetbox/beets/blob/v1.4.5/beets/dbcore/query.py#L530-L532

        epoch = datetime.fromtimestamp(0)
        delta = date - epoch
        return int(delta.total_seconds())

If I print the epoch variable to the terminal, I see 1970-01-01 01:00:00. Shouldn't this be midnight, and not 1am? Is there something wrong here?

I'll keep digging and working on a pull request, but any help would be much appreciated.

Setup

  • OS: OSX Yosemite 10.10.5
  • Python version: 2.7.13
  • beets version: 1.4.3 (I rolled back to this version to ensure that it wasn't caused by the addition of datetime queries in 1.4.5)
  • Turning off plugins made problem go away (yes/no): Not yet tried.
@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label Aug 10, 2017
@sampsyo
Copy link
Member

sampsyo commented Aug 10, 2017

Well! That's very tricky—thank you for investigating this. In my experience, time- and time-zone-related bugs can be among the most monstrous.

I think you're right to be suspicious of that _to_epoch_time conversion. The idea there is to get a time as a timestamp that matches how timestamps are created to be stored in the database. Namely, in library.py, you can see the DateType class has a parse function that's responsible for doing roughly the same thing based on a string. And Model in dbcore/db.py has a line self.added = time.time() that's also supposed to create a timestamp number. All of these should use timestamps that are on the same "scale" and use the same epoch point for reference, but I suspect that one or more of them is somehow out of sync. Maybe it's worth printing out all of those to check whether they seem to align when they all are engineered to refer to "now"?

FWIW, in my time zone (US Eastern), here's what that epoch returns:

>>> print(datetime.datetime.fromtimestamp(0))
1969-12-31 19:00:00

@discopatrick
Copy link
Member Author

I'll try that, thanks. It might be that we need to consolidate all methods of calculating times into a single helper function.

So this is interesting - for a few years (including the year of 1970) the UK ran an experiment where the timezone was GMT +01:00 for the whole year round - which might explain why the epoch in my timezone is 1am. https://en.wikipedia.org/wiki/British_Summer_Time#Periods_of_deviation

There's some further discussion on this from a Python point of view here: https://stackoverflow.com/questions/40332018/python-able-to-handle-timestamps-timezone-insensitively-except-in-gmt

Timezones in Python are definitely something I want to understand fully, so I'd be happy to take this on.

One approach I've seen used is that you always store timestamps in UTC, and you only convert them to local time on output (e.g. displaying added to the user), or convert back to UTC on input (e.g. when the user queries on added).

The downside to storing in UTC could be that we then lose the context of the local time of that particular datetime. It perhaps doesn't matter so much when it comes to added. But what if one day we wanted beets to keep count of plays of each track, so we could then do a query like "show me all the music I tend to play between 8pm and midnight". We would want to be able to know what the local time was for each play. That's fine if the user always stays in the same timezone, we just keep applying the same conversion... but what if they move country one day, or just happen to travel a lot? I'm thinking way ahead here, but that's just how my mind works!

I think you can save the timezone in a ISO datetime string by appending e.g. +01:00 to the end of the time, denoting how many hours ahead or behind UTC this particular time is. Whether or not you can also do this with timestamps is something I'll need to investigate.

@sampsyo
Copy link
Member

sampsyo commented Aug 11, 2017

Got it; thanks for looking into it!

Your points about keeping local time information are well taken. I think it makes the most sense to (try to) align everything to UTC for now, since that seems to be the most universal. I like the idea of augmenting this with TZ information down the road for the kinds of use cases you posted out.

@discopatrick
Copy link
Member Author

Some good news - we are probably already storing added as a UTC timestamp. I say "probably", because of the following:

We are using time.time() to get the current time to use as the value for added:

self.added = time.time() (https://github.com/beetbox/beets/blob/v1.4.5/beets/dbcore/db.py#L446)

...and, on most systems, time.time() returns a UTC timestamp - although on some systems that may or may not include leap seconds... which I'm going to try to ignore for the moment! https://stackoverflow.com/a/16554944/3293805

I think the problem may stem from where we try to convert a date for a query into a timestamp, and the clue is in the docstring:

def _to_epoch_time(date):
    """Convert a `datetime` object to an integer number of seconds since
    the (local) Unix epoch.
    """

...so we're converting to a local timestamp, when I think we should be converting to a UTC timestamp.

In my timezone (GMT) this has the small but noticeable effect of my queries being 1 hour out (and in fact this gets cancelled out during the summertime BST). However, in other timezones it would have a much larger effect.

Hence, I'm wondering why this bug hasn't been noticed before now? I would like to make sure this is a problem for everyone and not just me, before I start changing the code. @sampsyo (or indeed anyone not in GMT/BST) - could you try to recreate the issue from your timezone, by attempting to query (by date added) some tracks that you know were added in a specific hour, and see if you get back the expected results?

@wisp3rwind
Copy link
Member

FWIW, this started happening to me just after midnight

======================================================================
FAIL: test_single_day_match_fast (test.test_datequery.DateQueryTestRelative)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../test/test_datequery.py", line 200, in test_single_day_match_fast
    self.assertEqual(len(matched), 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
beets: DEBUG: Sending event: database_change
beets: DEBUG: Sending event: database_change
beets: DEBUG: Sending event: database_change
--------------------- >> end captured logging << ---------------------

but only in the Python 2.7 tox environment, not for Python 3.6. Clock is CEST.

(tbh, I didn't read the full thread, I just remembered having seen this when the error popped up)

@sampsyo
Copy link
Member

sampsyo commented Jan 1, 2018

Happy new year! Looks like this same problem just triggered some test problems because we're within a few hours of a new month. See on Travis, for example:
https://travis-ci.org/beetbox/beets/jobs/323702148

The specific problem uncovered here is that, in test_single_month_nonmatch_slow and test_single_month_match_slow, the following two don't match:

  • In DateQueryTestRelative.setUp, we try to set the added field to the current time. This appears to use local time.
  • In DateQuery.match, we use utcfromtimestamp to recover the datetime object from the timestamp. This, needless to say, uses UTC. As a result, the time is several hours later than it should be in my time zone (and Travis's), which moves it into the next month and matches the query incorrectly.

@discopatrick
Copy link
Member Author

discopatrick commented Jan 16, 2018

Hi Adrian, happy new year! How dedicated of you to be watching the builds over NYE :)

I'm afraid I put this problem on the backburner as I needed to spend more time with my actual music rather than the code! Also it was making my head spin...

I will try to come back to it at some point, but if anyone wants to pick this up in the meantime (no pun intended), feel free.

@sampsyo
Copy link
Member

sampsyo commented Jan 16, 2018

Sounds good. It's hard to blame you for focusing on music itself rather than hacking about music. 😃

@sampsyo sampsyo added bug bugs that are confirmed and actionable and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels Feb 21, 2018
sampsyo added a commit that referenced this issue Feb 21, 2018
@sampsyo
Copy link
Member

sampsyo commented Feb 21, 2018

OK; I've finally committed the above-mentioned change to fromtimestamp from utcfromtimestamp, complete with tests that failed before this change.

I convinced myself that this was the right fix using two facts. First, an observation that:

>>> datetime.datetime.fromtimestamp(time.time())

returns a datetime tuple that actually reflects what I think of as the current time. (The utcfromtimestamp returns the time that people in London think it is right now.) Second, recall that this is about query matching, so we've parsed the user's local time queries from strings. We want to represent both the query bounds and the currently-stored matched-against time as local times. Switching to this method gets us that.

Whew! Time zone stuff is rough. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bugs that are confirmed and actionable
Projects
None yet
Development

No branches or pull requests

3 participants