-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utf8 fixes to csv -> hive upload #4488
Conversation
I think simply using the |
c02fbd7
to
5f991db
Compare
@john-bodley Done. Good call. |
8dc9d42
to
42f4ffa
Compare
superset/db_engine_specs.py
Outdated
@@ -868,16 +868,17 @@ def get_column_names(filepath): | |||
secure_filename(form.csv_file.data.filename) | |||
column_names = get_column_names(upload_path) | |||
schema_definition = ', '.join( | |||
[s + ' STRING ' for s in column_names]) | |||
[s.decode('utf-8') + ' STRING ' for s in column_names]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to use decode
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were right. It is no longer needed after using unicodecsv. Thanks!
|
||
s3 = boto3.client('s3') | ||
location = os.path.join('s3a://', bucket_path, upload_prefix, table_name) | ||
s3.upload_file( | ||
upload_path, 'airbnb-superset', | ||
upload_path, bucket_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
42f4ffa
to
eccd942
Compare
eccd942
to
87f069c
Compare
LGTM |
os.path.join(upload_prefix, table_name, filename)) | ||
sql = """CREATE EXTERNAL TABLE {table_name} ( {schema_definition} ) | ||
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS | ||
TEXTFILE LOCATION '{location}'""".format(**locals()) | ||
TEXTFILE LOCATION '{location}' | ||
tblproperties ('skip.header.line.count'='1')""".format(**locals()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the skip.header.line.count
property will only work in Presto for v0.199 or later per here.
This pr fixes 4 issues.
How was this tested?
Tested on my development machine.
@john-bodley @mistercrunch