Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Exception when parsing log #20966 #21053

Closed
wants to merge 11 commits into from
Closed

Conversation

microhuang
Copy link
Contributor

@microhuang microhuang commented Jan 24, 2022

When the task outputs some strange characters, decoding will lead to "UnicodeDecodeError" error due to inconsistent encoding. We can try-catch it or ignore it by specifying the parameter: errors="backslashreplace" .

e.g.

            line = ''
            for raw_line in iter(self.sub_process.stdout.readline, b''):
                try:
                    line = raw_line.decode(output_encoding).rstrip()
                except UnicodeDecodeError as err:
                    print(err, output_encoding, raw_line)
                self.log.info("%s", line)

closes: #20966

…X: invalid start byte

  File "/opt/work/python395/lib/python3.9/site-packages/airflow/hooks/subprocess.py", line 89, in run_command
    line = raw_line.decode(output_encoding).rstrip()            # raw_line ==  b'\x00\x00\x00\x11\xa9\x01\n'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 4: invalid start byte
 Another alternative is: try-catch it. 

e.g.

```
            line = ''
            for raw_line in iter(self.sub_process.stdout.readline, b''):
                try:
                    line = raw_line.decode(output_encoding).rstrip()
                except UnicodeDecodeError as err:
                    print(err, output_encoding, raw_line)
                self.log.info("%s", line)
```
@boring-cyborg boring-cyborg bot added area:core-operators Operators, Sensors and hooks within Core Airflow provider:cncf-kubernetes Kubernetes provider related issues area:providers labels Jan 24, 2022
@uranusjr
Copy link
Member

Would it be possible to add a test for this?

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jan 24, 2022
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@microhuang
Copy link
Contributor Author

Would it be possible to add a test for this?

Okay, I'll add the test code later.

@potiuk
Copy link
Member

potiuk commented Jan 24, 2022

Static checks need fixing - I recmmend installing pre-commit and commit --amend - you will get them fixed for you automatically.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 24, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@eladkal
Copy link
Contributor

eladkal commented Jan 28, 2022

Tests is failing:

airflow/providers/cncf/kubernetes/utils/pod_manager.py:202: error: Incompatible
types in assignment (expression has type "str", variable has type "bytes") 
[assignment]
                        line = line.decode('utf-8', errors="backslashrepla...
                               ^
airflow/providers/cncf/kubernetes/utils/pod_manager.py:203: error: Argument 1
to "parse_log_line" of "PodManager" has incompatible type "bytes"; expected
"str"  [arg-type]
                        timestamp, message = self.parse_log_line(line)
                                                                 ^

Comment on lines 201 to 204
for line in logs:
timestamp, message = self.parse_log_line(line.decode('utf-8'))
line = line.decode('utf-8', errors="backslashreplace")
timestamp, message = self.parse_log_line(line)
self.log.info(message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                for raw_line in logs:
                    line = raw_line.decode('utf-8', errors="backslashreplace")
                    timestamp, message = self.parse_log_line(line)

@eladkal eladkal added this to the Airflow 2.2.4 milestone Feb 1, 2022
@eladkal
Copy link
Contributor

eladkal commented Feb 14, 2022

@microhuang can you please fix the test?

@eladkal
Copy link
Contributor

eladkal commented Mar 17, 2022

@microhuang do you wish to finish the work on this PR?

@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Mar 22, 2022
@ephraimbuddy ephraimbuddy removed this from the Airflow 2.2.5 milestone Mar 22, 2022
@qazi0
Copy link

qazi0 commented Apr 7, 2022

@microhuang where you @?

@microhuang do you wish to finish the work on this PR?

@jakubno
Copy link
Contributor

jakubno commented Apr 20, 2022

Only formatting isn't right. It's almost 3 months without any activity. Is it okay if someone else takes over it?

@eladkal
Copy link
Contributor

eladkal commented Apr 20, 2022

Only formatting isn't right. It's almost 3 months without any activity. Is it okay if someone else takes over it?

If @microhuang doesn't wish to finish this PR, you can take over and open a new PR

@uranusjr
Copy link
Member

You could also check out the branch, commit onto it, and open a new PR. The PR would belong to you, but those existing commits on the branch will belong to @microhuang.

@github-actions
Copy link

github-actions bot commented Jun 5, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 5, 2022
@github-actions github-actions bot closed this Jun 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow area:providers full tests needed We need to run full set of tests for this PR to merge okay to merge It's ok to merge this PR as it does not require more tests provider:cncf-kubernetes Kubernetes provider related issues stale Stale PRs per the .github/workflows/stale.yml policy file type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception when parsing log
8 participants