Supporting the new course ID format? #52
Comments
It looks like these two spots in the code expect "/" as the delimiter in the course ID, rather than "+": https://github.com/mitodl/edx2bigquery/blob/master/edx2bigquery/main.py#L28
Supporting the new course ID format was pretty easy with these two changes:
But now the load_course_sql script is looking for a users.csv file. I downloaded the one from the /sysadmin dashboard, but it doesn't seem to be in the right format: I'm now getting an error that seems to indicate the script expects an ID column of type integer.
Is there a sample data directory that shows all the files the scripts expect to find, and the format they should be in?
Hi Nate, try starting with the "waldofy" command. This normalizes the SQL from edX, including renaming files to match HarvardX's standard. edx2bigquery and XAnalytics both stick to the slash-separated course_id format. The funky "opaque keys" v1 "+" format is unsupported because, to this point, it's been unnecessary: all opaque-format course_ids are converted to slash-separated keys during ingestion. See, for example,
in addition to the lines in load_course_sql.py. Note that XAnalytics also depends on the course_id being in slash-separated format. A welcome contribution would be to do this conversion uniformly, e.g. using the opaque keys library.
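For readers following along, the conversion described above can be sketched roughly as below. This is a minimal illustration and not the code edx2bigquery actually uses: the function name `to_slash_course_id` is hypothetical, and it assumes every new-style ID has exactly the form `course-v1:ORG+COURSE+RUN`. A uniform solution would more likely rely on the opaque keys library, as suggested.

```python
def to_slash_course_id(course_id):
    """Convert 'course-v1:ORG+COURSE+RUN' to 'ORG/COURSE/RUN'.

    Hypothetical helper for illustration only. IDs that do not start with
    the 'course-v1:' prefix are returned unchanged, on the assumption that
    they are already slash-separated.
    """
    prefix = "course-v1:"
    if not course_id.startswith(prefix):
        return course_id
    parts = course_id[len(prefix):].split("+")
    if len(parts) != 3:
        # Assumption: a valid new-style ID has exactly org, course, and run.
        raise ValueError("unexpected course ID: {!r}".format(course_id))
    return "/".join(parts)


print(to_slash_course_id("course-v1:MITx+24.00x+2013_SOND"))
```

Applying a helper like this once, at the point where course IDs enter the pipeline, would keep downstream code such as XAnalytics working with the slash-separated format it depends on.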
Thanks Ike. I did run the waldofy command, but I'm still having problems. What would be really helpful is a sample of the directory structure that needs to exist below db_backups, along with a matching edx2bigquery.conf file. In other words, could you go into a working installation of edx2bigquery and run the "tree" command to show which directories and files need to be present for the command to execute properly? I'm getting hung up on the naming, so this would be really useful from a documentation perspective, and would avoid all the trial and error I'm going through.
Does edx2bigquery support the new way of formatting course IDs?
course-v1:MITx+24.00x+2013_SOND
or does it only support the old style way of representing course IDs?
MITx/24.00x/2013_SOND
When I try to load a course using the new format, I get an error:
Note: I'm not actually using that ID but a different ID, but it has the same format as this one.