Time/Duration format in extract_metadata #220
Comments
Hi, thanks for the report. The ReadStat library is time-agnostic and simply returns raw numbers alongside the format string; it's up to the client program to make sense of "DATE5", etc. Different formats use different conventions: SAS and Stata use seconds since 1960, certain Stata formats use days since 1960, and SPSS uses seconds since the advent of the Gregorian calendar in 1582 (!). ReadStat is essentially silent on these issues. I believe part of that logic exists in a separate library here: https://github.com/WizardMac/TimeFormatStrings. But this is not used by ReadStat or extract_metadata. Note that all data types are extracted exactly as stored, so it's "just" an issue of formatting. There may be other formats, such as hex or currency, that are not presented as expected.
Hi Evan! Thanks for your prompt and extensive answer. As for date formats, I believe we have implemented the logic to correctly process SPSS/STATA dates in our Go package, which looks like this:

```go
package spss

import (
    "time"
)

// ConvertDate converts an SPSS date into a standard time struct, where `d`
// is the number of seconds since `1582-10-14`
func ConvertDate(d int64) time.Time {
    return time.Unix(d+epochDelta+adjustment, 0).UTC()
}

var (
    // epochDelta contains the number of seconds between 1582-10-14 and 1970-01-01
    //
    // Dates in SPSS are recorded in seconds since October 14, 1582,
    // the date of the beginning of the Gregorian calendar
    epochDelta int64 = -12219379200
    // adjustment is the number of seconds adjusted from the Julian
    // to the Gregorian calendar
    adjustment int64 = 864000
)
```

This works because

```
{
  "type": "SPSS",
  "variables": [
    {
      "type": "DATE",
      "name": "Date1",
      "label": "Date format 1"
    },
    {
      "type": "DATE",
      "name": "Date2",
      "label": "Date format 2"
    },
    {
      "type": "NUMERIC",
      "name": "Heure",
      "label": "Date - heure/seconde",
      "representation": "duration", // Just an example
      "format": "hh:mm" // Just an example
    },
    {
      "type": "STRING",
      "name": "Texte",
      "label": "Blabla"
    }
  ]
}
```
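As an illustration of what a client could do with the example `"representation": "duration"` and `"format": "hh:mm"` fields, here is a small Go sketch. It assumes the raw value is a whole number of seconds and that leftover seconds are truncated, not rounded. `FormatDurationHHMM` is a hypothetical name, not something provided by ReadStat or extract_metadata.

```go
package main

import "fmt"

// FormatDurationHHMM (hypothetical helper) renders a raw duration, assumed to
// be stored as a whole number of seconds, as "hh:mm". Seconds left over after
// the last full minute are dropped, and hours are not wrapped at 24.
func FormatDurationHHMM(seconds int64) string {
	sign := ""
	if seconds < 0 {
		sign = "-"
		seconds = -seconds
	}
	totalMinutes := seconds / 60
	return fmt.Sprintf("%s%02d:%02d", sign, totalMinutes/60, totalMinutes%60)
}
```

With these assumptions, a raw value of 5400 would be shown as `01:30`.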
@evanmiller I've released a new version of our project. Otherwise, I think we can consider this feature done and thus close this issue.
@basgys Sounds good!
Hi WizardMac team!
First of all, thank you very much for your hard work on this open source project.
I've started to build a tool to read SPSS files and I have a problem with time/duration columns. I use readstat/extract_metadata, and extract_metadata extracts the time column as a simple integer. However, when I use WizardMac, it recognises the time column (see screenshot).
Our tool vs WizardMac
extract_metadata output
I tested with both 1.1.4 and 1.1.5.
Source file
date.sav.zip
Time-related PRs
Metadata
Besides time/duration, is there another known data type currently not extracted?
Cheers!