TypeError: cannot use a string pattern on a bytes-like object #8

Open
AndrewH-Lab49 opened this issue Jun 11, 2021 · 1 comment

AndrewH-Lab49 commented Jun 11, 2021

Using a live connection to my client's Workday, I get the following error:

tap-workday-raas | File "/src/streams/workday-s3/.meltano/extractors/tap-workday-raas/venv/lib/python3.8/site-packages/tap_workday_raas/client.py", line 46, in stream_report
tap-workday-raas | coro.send(chunk)
tap-workday-raas | File "/src/streams/workday-s3/.meltano/extractors/tap-workday-raas/venv/lib/python3.8/site-packages/ijson/backends/python.py", line 39, in Lexer
tap-workday-raas | match = LEXEME_RE.search(buf, pos)
tap-workday-raas | TypeError: cannot use a string pattern on a bytes-like object
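The error comes from Python's `re` module: a pattern compiled from a `str` can only search `str` input, so ijson's pure-Python lexer fails as soon as it receives a raw `bytes` chunk from the HTTP response. Here is a minimal reproduction that does not depend on the tap. The pattern is illustrative only and is not ijson's actual `LEXEME_RE`; `search_chunk` is a hypothetical helper.

```python
import re

# A str pattern, standing in for ijson's LEXEME_RE (illustrative only)
pattern = re.compile(r'[a-z]+')

def search_chunk(chunk):
    """Return the first match in chunk, or the TypeError message on a type clash."""
    try:
        match = pattern.search(chunk, 0)
        return match.group(0) if match else None
    except TypeError as exc:
        return str(exc)

print(search_chunk('abc 123'))                   # str input: works
print(search_chunk(b'abc 123'))                  # bytes input: TypeError message
print(search_chunk(b'abc 123'.decode('utf-8')))  # decoded first: works again
```

This is why decoding each chunk to `str` before `coro.send()` makes the error go away.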

In client.py, on line 46, I replaced

coro.send(chunk)

with

coro.send(chunk.decode(resp.encoding))

and I get:

tap-workday-raas | INFO Done syncing.
meltano | Incremental state has been updated at 2021-06-11 12:33:28.720672.
meltano | Extract & load complete!
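Decoding each chunk separately works as long as no multi-byte character is split across a chunk boundary. With UTF-8 (or any other multi-byte encoding) and an arbitrary chunk size, a split can happen, and `chunk.decode(...)` then raises `UnicodeDecodeError`. The following sketch uses the standard library's incremental decoder, which buffers an incomplete byte sequence until the next chunk arrives. `decode_chunks` is a hypothetical helper, not part of tap-workday-raas, and `send` stands in for `coro.send`.

```python
import codecs

def decode_chunks(chunks, encoding, send):
    """Decode an iterable of byte chunks and pass each str piece to send().

    The incremental decoder keeps a partial multi-byte sequence from the end
    of one chunk and completes it with the start of the next one.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            send(text)
    # final=True flushes the buffer and raises if the input ended mid-character
    tail = decoder.decode(b'', final=True)
    if tail:
        send(tail)

data = '{"name": "Zoë"}'.encode('utf-8')
# Split inside the two-byte UTF-8 encoding of 'ë'
split_at = data.index('ë'.encode('utf-8')) + 1
chunks = [data[:split_at], data[split_at:]]

received = []
decode_chunks(chunks, 'utf-8', received.append)
print(''.join(received))
```

In the tap, `chunks` would be whatever iterable `stream_report` reads from the response. That part of client.py is not shown in this issue.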

I am not convinced that this is the best solution. It might be better to read the XML encoding from the Content-Type header first, and only then fall back to requests' `resp.encoding`. The example above is just meant to be helpful.
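For the header-first approach, the standard library can already parse the `charset` parameter out of a Content-Type value, including quoted values and a missing space after `;`. There is one caveat about the fallback. requests derives `resp.encoding` from the same header. When the header has no charset, `resp.encoding` is `'ISO-8859-1'` for `text/*` types and `None` for types such as `application/xml`. With `None`, `chunk.decode(resp.encoding)` fails with a TypeError of its own. In the sketch below, `charset_from_content_type` is a hypothetical helper, and the final `'utf-8'` default is an assumption rather than documented Workday behavior.

```python
from email.message import Message

def charset_from_content_type(content_type, fallback='utf-8'):
    """Return the lowercased charset parameter of a Content-Type value.

    Falls back to `fallback` when the header is empty or has no charset.
    The 'utf-8' default is an assumption, not documented Workday behavior.
    """
    if not content_type:
        return fallback
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset unquotes and lowercases the parameter value
    return msg.get_content_charset(fallback)

print(charset_from_content_type('text/xml; charset=UTF-8'))
print(charset_from_content_type('application/xml;charset="ISO-8859-1"'))
print(charset_from_content_type('application/xml'))
```

In client.py, this would be called once before the read loop, for example `v_encoding = charset_from_content_type(resp.headers.get('Content-Type'))`.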

I am not sure how this affects the existing unit tests.

I had a hard time working with the unit tests without spending a lot of time on them. For instance, I don't know where tap_tester comes from: it didn't install via pip and wasn't part of the setup process. I also don't believe I have access to the CircleCI Docker image, S3, and so on.

Please excuse me if I missed something.


AndrewH-Lab49 commented Jun 11, 2021

Maybe use something like this to get the encoding from the Content-Type header first?

# Split the Content-Type header value into its parameters on ';'
# keep only the parameter that starts with 'charset='
# take the first (only) match
# remove the 'charset=' prefix

v_encoding_key = 'charset='

try:
    content_headers_list = [p.strip() for p in resp.headers['Content-Type'].split(';')]
    # Slice off the prefix; str.lstrip('charset=') would strip any of those characters, not the prefix
    v_encoding = next(filter(lambda x: x.startswith(v_encoding_key), content_headers_list))[len(v_encoding_key):]
except (KeyError, StopIteration):
    # No Content-Type header, or no charset parameter in it
    v_encoding = resp.encoding

and then

coro.send(chunk.decode(v_encoding))
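A note on removing the `charset=` prefix: `str.lstrip()` treats its argument as a set of characters to strip, not as a literal prefix. It happens to work for `utf-8`, but it eats the start of any charset name that begins with one of the letters in `charset=`. Slicing by the prefix length removes exactly the prefix. `str.removeprefix()` would also work, but it requires Python 3.9+, and the traceback shows Python 3.8. Both helper functions below are hypothetical, for illustration only.

```python
PREFIX = 'charset='

def strip_with_lstrip(param):
    # Strips any leading characters found in 'charset=', not the literal prefix
    return param.lstrip(PREFIX)

def strip_prefix(param):
    # Removes the literal prefix, and only if it is present
    return param[len(PREFIX):] if param.startswith(PREFIX) else param

for param in ['charset=utf-8', 'charset=ascii', 'charset=shift_jis']:
    print(param, '->', strip_with_lstrip(param), 'vs', strip_prefix(param))
```

For `charset=ascii`, the `lstrip` version returns `ii` and the slicing version returns `ascii`.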

AndrewH-Lab49 changed the title from "cannot use a string pattern on a bytes-like object" to "TypeError: cannot use a string pattern on a bytes-like object" on Jun 11, 2021