GuestBook bug? 633,795 downloads with no timestamps #3324
Comments
@kcondon - you spurred a quick discussion about this during backlog grooming. You said that this may be as expected and not a bug? Can you leave some details here? Thanks! |
Yes, this is likely due to migration and expected behavior. Leonid would best be able to confirm since I believe he worked on this aspect of migration. |
@landreev - can you take a look and see if this is expected? Thanks! |
So yes, this is expected behavior, in the sense that these are grandfathered-in, older download entries for which we don't have dates/timestamps recorded. (Before we started logging individual downloads, we only had download counters.) There is nothing we can do about it - it's just missing data. However, when we generate access reports or otherwise display this data, we can think about presenting it in some sensible way: instead of listing all these downloads with no recorded times, we should probably just say "plus N downloads were recorded before [earliest download date recorded]; no further information is available about those prehistoric downloads, sorry for the inconvenience." |
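For anyone who wants to try the "plus N downloads" wording, here is a minimal sketch of how such a summary line could be built. It is not Dataverse code: the in-memory SQLite table is a toy stand-in for guestbookresponse, the sample rows are invented, and summarize_downloads is a hypothetical helper name.

```python
import sqlite3

# Toy stand-in for the guestbookresponse table (assumption: only the
# id and responsetime columns matter for this report).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guestbookresponse (id INTEGER PRIMARY KEY, responsetime TEXT)"
)
conn.executemany(
    "INSERT INTO guestbookresponse (id, responsetime) VALUES (?, ?)",
    [
        (1, None),                   # grandfathered entry, no timestamp
        (2, None),                   # grandfathered entry, no timestamp
        (3, "2008-07-31 10:00:00"),
        (4, "2015-04-05 09:30:00"),
    ],
)


def summarize_downloads(conn):
    """Return (dated, undated, earliest_date, summary_line) for a report."""
    (undated,) = conn.execute(
        "SELECT COUNT(*) FROM guestbookresponse WHERE responsetime IS NULL"
    ).fetchone()
    dated, earliest = conn.execute(
        "SELECT COUNT(*), MIN(responsetime) FROM guestbookresponse "
        "WHERE responsetime IS NOT NULL"
    ).fetchone()
    # Keep only the date part; earliest is None if nothing is timestamped.
    earliest_date = earliest[:10] if earliest else None
    line = f"{dated} downloads with recorded times"
    if undated and earliest_date:
        line += (
            f", plus {undated} downloads recorded before {earliest_date}"
            " (no further information is available about those)"
        )
    elif undated:
        line += f", plus {undated} downloads with no recorded time"
    return dated, undated, earliest_date, line


print(summarize_downloads(conn)[3])
```

The undated entries are collapsed into a single count instead of being listed row by row, which is the presentation suggested above.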
I'm measuring the rate of downloads over time for a particular dataverse on Harvard Dataverse, and trying to account for download entries with no timestamps. Thought others digging into this issue might find these details helpful: It seems fair to say that only download entries associated with files migrated from 3.x to 4.0 have no timestamps. The migration happened in April 2015. Among the download entries with no timestamps (i.e. guestbookresponse.responsetime is null), the latest createdate (dvobject.createdate) of the associated files is April 23, 2015.
The earliest download date (guestbookresponse.responsetime) recorded is 2008-07-31. This makes me think that Dataverse was adding timestamps to download entries long before Harvard Dataverse's April 2015 migration to Dataverse 4.0. So instead of "plus N downloads were recorded before [earliest download date recorded]," would it be more accurate to say "plus N downloads were recorded before April 2015"? |
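In case it helps others reproduce these checks, below is a rough sketch of the two lookups described above, run against a toy in-memory SQLite database. The table and column names guestbookresponse.responsetime and dvobject.createdate come from this thread; the datafile_id link column and all sample rows are assumptions made for illustration, so adjust the join to the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dvobject (id INTEGER PRIMARY KEY, createdate TEXT);
    CREATE TABLE guestbookresponse (
        id INTEGER PRIMARY KEY,
        datafile_id INTEGER REFERENCES dvobject(id),  -- assumed link column
        responsetime TEXT
    );
    INSERT INTO dvobject VALUES
        (10, '2014-01-10 00:00:00'),
        (11, '2015-04-23 00:00:00'),
        (12, '2016-06-01 00:00:00');
    INSERT INTO guestbookresponse VALUES
        (1, 10, NULL),
        (2, 11, NULL),
        (3, 10, '2008-07-31 10:00:00'),
        (4, 12, '2016-07-01 12:00:00');
    """
)

# Among entries with no timestamp: how many are there, and what is the
# latest creation date of the files they point to?
undated_count, latest_file_created = conn.execute(
    """
    SELECT COUNT(*), MAX(d.createdate)
    FROM guestbookresponse AS g
    JOIN dvobject AS d ON d.id = g.datafile_id
    WHERE g.responsetime IS NULL
    """
).fetchone()

# Earliest recorded download time (MIN skips NULL values).
(earliest_download,) = conn.execute(
    "SELECT MIN(responsetime) FROM guestbookresponse"
).fetchone()

print(undated_count, latest_file_created, earliest_download)
```

If the latest createdate among the undated entries falls on or before the migration date, that supports the idea that only migrated files carry undated downloads.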
There was tracking of download dates in DVN 3 (i.e. before April 2015), but not from the beginning of the project (i.e. 2006). So saying "before April 2015" doesn't make sense, since we do have recorded ones before that. The ones without timestamps would all be (I think) before the earliest recorded date. |
Spoke with @scolapasta, who suggested also looking at the guestbookresponse.id that the database assigns to each guestbook entry. The id number increases chronologically:
Just to be sure: the earliest timestamp of the guestbook entries whose ids are greater than 1954821 is 2015-04-05 [5]. It seems fair to say that:
[1] select min(id), max(id)
[2] select min(id), max(id)
[3] Copy of database I query was updated April 2, 2018, so data recorded after April 2 isn't included.
[4] select count(*)
[5] select min(responsetime) |
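The footnoted queries are truncated here, so the following is only a guess at the kind of id-range check being described, written against a toy in-memory SQLite table with invented rows. The filters on each query are assumptions, and the cutoff is simply the largest id with no timestamp in the toy data; how 1954821 was chosen is not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guestbookresponse (id INTEGER PRIMARY KEY, responsetime TEXT)"
)
conn.executemany(
    "INSERT INTO guestbookresponse (id, responsetime) VALUES (?, ?)",
    [
        (1, None),
        (2, None),
        (3, "2008-07-31 10:00:00"),  # migrated entry that kept its timestamp
        (4, None),
        (5, "2015-04-05 09:30:00"),  # first entry after the toy cutoff
        (6, "2016-07-01 12:00:00"),
    ],
)

# Id range of entries without a timestamp (assumed filter).
undated_ids = conn.execute(
    "SELECT MIN(id), MAX(id) FROM guestbookresponse WHERE responsetime IS NULL"
).fetchone()

# Id range of entries with a timestamp (assumed filter).
dated_ids = conn.execute(
    "SELECT MIN(id), MAX(id) FROM guestbookresponse WHERE responsetime IS NOT NULL"
).fetchone()

# Toy cutoff: the largest id that has no timestamp.
cutoff_id = undated_ids[1]

(count_after_cutoff,) = conn.execute(
    "SELECT COUNT(*) FROM guestbookresponse WHERE id > ?", (cutoff_id,)
).fetchone()
(earliest_after_cutoff,) = conn.execute(
    "SELECT MIN(responsetime) FROM guestbookresponse WHERE id > ?", (cutoff_id,)
).fetchone()

print(undated_ids, dated_ids, count_after_cutoff, earliest_after_cutoff)
```

Because ids increase chronologically, every entry above the cutoff was created after the last undated one, so the earliest timestamp above the cutoff gives a rough upper bound on when undated entries stopped being written.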
Bug 1
bug 1 update: @kcondon mentioned that these may be downloads ported from the 3.x system
I have a July snapshot of data. I just noticed that 633,795 entries in the table guestbookresponse have NULL responsetime. This is fully 1/3 of the guestbookresponse entries.
In other words, 633,795 file downloads have no timestamp attached (and don't appear in the metrics visualizations).
Both sql statements below give the 633,795 number.
Fix: responsetime NOT NULL
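The two SQL statements referred to in the report are not preserved in this copy of the issue. As a stand-in, and not a reconstruction of the originals, here is a small sketch of two equivalent ways to count entries with a NULL responsetime, plus their share of the table, using a toy in-memory SQLite table with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guestbookresponse (id INTEGER PRIMARY KEY, responsetime TEXT)"
)
conn.executemany(
    "INSERT INTO guestbookresponse (id, responsetime) VALUES (?, ?)",
    [
        (1, None),
        (2, "2015-04-05 09:30:00"),
        (3, "2016-07-01 12:00:00"),
    ],
)

# Way 1: filter explicitly on IS NULL.
(null_by_filter,) = conn.execute(
    "SELECT COUNT(*) FROM guestbookresponse WHERE responsetime IS NULL"
).fetchone()

# Way 2: COUNT(column) skips NULLs, so the difference is the NULL count.
(null_by_difference,) = conn.execute(
    "SELECT COUNT(*) - COUNT(responsetime) FROM guestbookresponse"
).fetchone()

# Pitfall: "= NULL" never matches, because NULL = NULL is not true in SQL.
(null_by_equals,) = conn.execute(
    "SELECT COUNT(*) FROM guestbookresponse WHERE responsetime = NULL"
).fetchone()

(total,) = conn.execute("SELECT COUNT(*) FROM guestbookresponse").fetchone()
null_share = null_by_filter / total

print(null_by_filter, null_by_difference, null_by_equals, null_share)
```

Getting the same number from both forms is a quick sanity check that the count is not an artifact of how the query was written.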