When reading non-utf8 stdin, emit a more specific warning; for Python 3.7+, use stdin.reconfigure() #1038

Closed
dannguyen opened this issue Jul 25, 2019 · 2 comments

@dannguyen
Contributor

dannguyen commented Jul 25, 2019

Sorry if this issue is closely related to this one that was closed last year:
Change --encoding error text if tool receives input from stdin #898

The fix for that issue was to have csvkit mention PYTHONIOENCODING in the message for an encoding error:

Your file is not "utf-8" encoded. Please specify the correct encoding with the -e flag or with the PYTHONIOENCODING environment variable. Use the -v flag to see the complete error.

I've skimmed the relevant parts in the source code but haven't yet dug in too deep, so a couple of quick comments/questions:

1. Have the error message be more explicit when stdin is used?

Is it feasible to adjust the warning so that it says something specific about stdin when stdin is the input, especially if the user has set the -e flag? I have to admit that all this time, when piping into a csvkit utility and getting an encoding error, I read "with the -e flag or with the PYTHONIOENCODING environment variable" to mean that I could use either -e or PYTHONIOENCODING. In other words, if -e wasn't working, I assumed it was because I hadn't figured out the proper encoding (though I guess I could have read 'Your file is not "utf-8" encoded' as a sign that csvkit wasn't seeing my -e flag at all).

Something like:

Your file is not "utf-8" encoded. Please specify the correct encoding with the -e flag or with the PYTHONIOENCODING environment variable. Use the -v flag to see the complete error.

Note: if you are reading input from stdin, the -e flag is ignored and you must set the PYTHONIOENCODING variable, e.g.

 $  cat mydata.csv | PYTHONIOENCODING='windows-1252' csvformat

Or:

 $  PYTHONIOENCODING='windows-1252' csvformat < mydata.csv
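A rough sketch of how the message could branch on the input source. The function name `encoding_error_message` and the `reading_stdin` flag are hypothetical and not taken from csvkit's code; the wording simply reuses the message text quoted above.

```python
# Hypothetical helper, not csvkit's actual implementation.
BASE_MESSAGE = (
    'Your file is not "{encoding}" encoded. Please specify the correct '
    'encoding with the -e flag or with the PYTHONIOENCODING environment '
    'variable. Use the -v flag to see the complete error.'
)

STDIN_NOTE = (
    'Note: if you are reading input from stdin, the -e flag is ignored and '
    'you must set the PYTHONIOENCODING variable, e.g.\n\n'
    "    cat mydata.csv | PYTHONIOENCODING='windows-1252' csvformat"
)


def encoding_error_message(encoding, reading_stdin):
    """Build the encoding warning, adding a stdin note only when relevant."""
    message = BASE_MESSAGE.format(encoding=encoding)
    if reading_stdin:
        # Assumption: the caller knows whether its input came from stdin.
        message += '\n\n' + STDIN_NOTE
    return message
```

Calling `encoding_error_message('utf-8', reading_stdin=False)` would return the current message unchanged, so users reading from a file would see no difference.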

2. Automatically configure the encoding for stdin for Pythons 3.7+

I saw that Python 3.7 adds a new stdin method to set its encoding:

sys.stdin.reconfigure(encoding='windows-1252')

I know the 3.7 userbase is probably still a relative minority, but is it worth adding conditional behavior to cli.py that runs when six detects Python 3.7 or later?
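A minimal sketch of what that conditional behavior might look like. The helper name `reconfigure_stream` is hypothetical, and it checks for the `reconfigure` method directly instead of comparing version numbers, which is a swapped-in approach, not something cli.py does today. As far as I can tell from the Python docs, the encoding can only be changed before any data has been read from the stream.

```python
import sys


def reconfigure_stream(stream, encoding):
    """Try to switch a text stream to `encoding`; return True on success.

    Hypothetical helper: on Pythons older than 3.7 (or for streams that are
    not io.TextIOWrapper instances) there is no reconfigure() method, so the
    caller would have to fall back to PYTHONIOENCODING.
    """
    reconfigure = getattr(stream, 'reconfigure', None)
    if encoding is None or reconfigure is None:
        return False
    # Must run before anything has been read from the stream.
    reconfigure(encoding=encoding)
    return True


if __name__ == '__main__':
    # Example: treat piped input as windows-1252, then echo it back.
    if reconfigure_stream(sys.stdin, 'windows-1252'):
        sys.stdout.write(sys.stdin.read())
```

Feature detection avoids a dependency on six for the version check and also covers cases where stdin has been replaced by an object without the method.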

@jpmckinney
Member

Thanks for this! I have scheduled it for the next release (sometime before April 2020).

@calebeaires

calebeaires commented Jul 31, 2020

Same problem here. On Ubuntu 18 with Python 3.8.5, I get these errors.

/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py:400: DeprecationWarning: `formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly
Your file is not "utf-8" encoded. Please specify the correct encoding with the -e flag or with the PYTHONIOENCODING environment variable. Use the -v flag to see the complete error.

Full error

  File "/home/ubunpc/.local/bin/sql2csv", line 8, in <module>
    sys.exit(launch_new_instance())
  File "/home/ubunpc/.local/lib/python3.8/site-packages/csvkit/utilities/sql2csv.py", line 80, in launch_new_instance
    utility.run()
  File "/home/ubunpc/.local/lib/python3.8/site-packages/csvkit/cli.py", line 118, in run
    self.main()
  File "/home/ubunpc/.local/lib/python3.8/site-packages/csvkit/utilities/sql2csv.py", line 65, in main
    rows = connection.execution_options(no_parameters=True).execute(query)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1092, in _execute_text
    ret = self._execute_context(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1184, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1167, in _execute_context
    self.dialect.do_execute_no_params(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 473, in do_execute_no_params
    cursor.execute(statement)
  File "/home/ubunpc/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/home/ubunpc/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 321, in _query
    self._post_get_result()
  File "/home/ubunpc/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 355, in _post_get_result
    self._rows = self._fetch_row(0)
  File "/home/ubunpc/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 328, in _fetch_row
    return self._result.fetch_row(size, self._fetch_type)
  File "/usr/lib/python3.8/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
