Fix: Exception when parsing log #20966 #21053
…X: invalid start byte
File "/opt/work/python395/lib/python3.9/site-packages/airflow/hooks/subprocess.py", line 89, in run_command
    line = raw_line.decode(output_encoding).rstrip()  # raw_line == b'\x00\x00\x00\x11\xa9\x01\n'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 4: invalid start byte
Another alternative is to catch the exception, e.g.
```
line = ''
for raw_line in iter(self.sub_process.stdout.readline, b''):
    try:
        line = raw_line.decode(output_encoding).rstrip()
    except UnicodeDecodeError as err:
        print(err, output_encoding, raw_line)
    self.log.info("%s", line)
```
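For reference, here is a minimal standalone sketch of the try/except idea above. It is not the code in this PR: `decode_lines` is a hypothetical helper, and falling back to `errors="backslashreplace"` when strict decoding fails is an assumption made for illustration, so that the undecodable line is still logged instead of being dropped.

```python
import io


def decode_lines(stream, output_encoding="utf-8"):
    """Yield decoded, right-stripped lines from a binary stream.

    Hypothetical helper: if strict decoding fails, the line is decoded
    again with errors="backslashreplace" so no output is lost.
    """
    for raw_line in iter(stream.readline, b""):
        try:
            line = raw_line.decode(output_encoding)
        except UnicodeDecodeError:
            # Escape the offending bytes instead of raising.
            line = raw_line.decode(output_encoding, errors="backslashreplace")
        yield line.rstrip()


# The second line is the byte string from the traceback above.
stream = io.BytesIO(b"hello\n\x00\x00\x00\x11\xa9\x01\n")
lines = list(decode_lines(stream))
for line in lines:
    print(repr(line))
```

`io.BytesIO` stands in for `self.sub_process.stdout` here, since both expose `readline()` returning bytes.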
Would it be possible to add a test for this?
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Okay, I'll add the test code later.
Static checks need fixing - I recommend installing pre-commit and
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Tests are failing:
  for line in logs:
-     timestamp, message = self.parse_log_line(line.decode('utf-8'))
+     line = line.decode('utf-8', errors="backslashreplace")
+     timestamp, message = self.parse_log_line(line)
      self.log.info(message)
for raw_line in logs:
    line = raw_line.decode('utf-8', errors="backslashreplace")
    timestamp, message = self.parse_log_line(line)
@microhuang can you please fix the test?
@microhuang do you wish to finish the work on this PR?
@microhuang where are you at?
Only the formatting isn't right. It has been almost 3 months without any activity. Is it okay if someone else takes over it?
If @microhuang doesn't wish to finish this PR, you can take over and open a new PR
You could also check out the branch, commit onto it, and open a new PR. The PR would belong to you, but those existing commits on the branch will belong to @microhuang.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
When a task outputs bytes that are not valid in the expected encoding, decoding raises a UnicodeDecodeError. We can either catch the exception or avoid it by passing errors="backslashreplace" to decode(), which replaces each undecodable byte with a backslash escape sequence instead of raising.
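A quick illustration of how the standard `bytes.decode` error handlers behave on the byte string from the traceback. This is only a sketch for comparison; the variable names are made up, and the PR itself uses `backslashreplace` only.

```python
raw_line = b"\x00\x00\x00\x11\xa9\x01\n"

# Strict decoding (the default) raises on the 0xa9 byte.
try:
    raw_line.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as err:
    strict_failed = True
    print("strict:", err)

# Compare the built-in error handlers on the same input.
results = {
    handler: raw_line.decode("utf-8", errors=handler).rstrip()
    for handler in ("ignore", "replace", "backslashreplace")
}
for handler, text in results.items():
    print(f"{handler}: {text!r}")
```

`ignore` silently drops the bad byte and `replace` substitutes U+FFFD, while `backslashreplace` keeps the original byte value visible in the log as `\xa9`, which is usually the most useful option when debugging task output.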
closes: #20966