le.jimmy91: I'm working on a Selenium web scraping project, everything works fine when running locally - however, when I register the flow with Prefect Cloud, I'm not able to launch a Chrome Driver.
It appears to be a serialization problem, since I'm getting a TypeError: cannot pickle '_thread.lock' object.
Has anyone run into a similar problem? Any suggestions would be appreciated!
dylan: Hi <@U018TAB16PR>!
What executor are you using? It sounds like a Task is trying to share an un-pickleable object with another Task. This can happen if you’re trying to achieve some parallelism and also sharing a created Client for something that’s not thread safe.
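The error itself can be reproduced without Selenium or Prefect. Below is a minimal sketch, assuming only that the driver holds a thread lock somewhere in its internals; `FakeDriver` and `try_pickle` are hypothetical names used for illustration, not part of either library.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver: it owns a thread lock."""

    def __init__(self):
        self._lock = threading.Lock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# Plain data pickles without trouble.
print(try_pickle({"url": "https://example.com"}))

# Any object that carries a lock fails, however deeply the lock is nested.
print(try_pickle(FakeDriver()))
```

Anything that has to be pickled (a flow at registration time, or a task result handed to another process) hits the same failure if it references such an object.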
le.jimmy91: I’m using the default LocalExecutor. I’ll give it another go today with the DaskExecutor.
dylan: hmmm
dylan: Would you be comfortable sharing your flow code here?
sarieddine.marwan: <@UKVFX6N3B> - I have faced this in a slightly different context trying to use attrs(https://www.attrs.org/en/stable/examples.html) constructed classes and then register a flow
flow.py
import attr
from prefect import Flow, task

@attr.s(auto_attribs=True, kw_only=True)
class A:
    size: int

@task
def get_size():
    a = A(size=2)
    return a.size

with Flow("test-flow") as flow:
    get_size()

flow.register("test-flows")
traceback:
Traceback (most recent call last):
  File "attr_flow.py", line 19, in <module>
    flow.register("test-flows")
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1608, in register
    registered_flow = client.register(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/client/client.py", line 734, in register
    serialized_flow = flow.serialize(build=build) # type: Any
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1451, in serialize
    self.storage.add_flow(self)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/environments/storage/local.py", line 140, in add_flow
    flow_location = flow.save(flow_location)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1520, in save
    cloudpickle.dump(self, f)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 55, in dump
    CloudPickler(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread._local' object
sarieddine.marwan: given attrs is quite the popular library when it comes to building classes - this is a bit of a disappointment to be honest
sarieddine.marwan: but I realize this has to do with the choice of cloudpickle - and cloudpickle not being able to pickle attrs constructed classes cloudpipe/cloudpickle#320
dylan: Using Local storage and the stored_as_script=True option might solve this issue
sarieddine.marwan: thanks for the tip - my current workaround is to place the class in a utils file and add it to a custom dockerfile using Docker or S3 storage - this way it doesn't have to be pickled ...
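One reason moving the class into an importable module helps: a class that lives in an importable module is pickled by reference (module name plus class name), not by value. The sketch below shows this with the standard library's `pickle` and `fractions.Fraction` as an arbitrary importable class. cloudpickle treats importable modules the same way and only serializes a class by value when it is defined in `__main__` or created dynamically.

```python
import pickle
from fractions import Fraction

# Pickling the class object stores only a reference to it.
payload = pickle.dumps(Fraction)

# The payload names the module and the class, nothing more.
print(b"fractions" in payload, b"Fraction" in payload)

# Unpickling re-imports the module and returns the very same class object,
# so the module must be importable wherever the payload is loaded.
restored = pickle.loads(payload)
print(restored is Fraction)
```

This is also why the utils file has to be shipped to the execution environment (for example, in the Docker image): the reference is resolved by importing the module there.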
le.jimmy91: <@UKVFX6N3B> You genius you. It worked like a charm.
felix.vemmer: <@UKVFX6N3B> first of all thank you so much for helping us! 🙂 Thanks to your help, I was able to resolve my selenium issues by using stored_as_script=True.
However, if I have, for example, an authenticate.py module with some basic common selenium tasks such as:
• create_driver -> returns selenium driver
• login_xyz -> returns selenium driver in logged in state
And I want to import those into another script I get again:
TypeError: cannot pickle '_thread.lock' object
So as long as they are all in one file it’s fine but importing reintroduces the old issue. Any idea on some fix, otherwise I’ll just pile all the code together into one big flow 🙂
Thanks!
chris: <@ULVA73B9P> archive “How to integrate Selenium with Prefect?”
Archived from the Prefect Public Slack Community
le.jimmy91: I'm working on a Selenium web scraping project, everything works fine when running locally - however, when I register the flow with Prefect Cloud, I'm not able to launch a Chrome Driver.
It appears to be a serialization problem, since I'm getting a TypeError: cannot pickle '_thread.lock' object.
Has anyone run into a similar problem? Any suggestions would be appreciated!
dylan: Hi <@U018TAB16PR>!
What executor are you using? It sounds like a Task is trying to share an un-pickleable object with another Task. This can happen if you’re trying to achieve some parallelism and also sharing a created Client for something that’s not thread safe.
le.jimmy91: I’m using the default LocalExecutor. I’ll give it another go today with the DaskExecutor.
dylan: hmmm
dylan: Would you be comfortable sharing your flow code here?
sarieddine.marwan: <@UKVFX6N3B> - I have faced this in a slightly different context trying to use attrs(https://www.attrs.org/en/stable/examples.html) constructed classes and then register a flow
flow.py
traceback:
sarieddine.marwan: given attrs is quite the popular library when it comes to building classes - this is a bit of a disappointment to be honest
sarieddine.marwan: but I realize this has to do with the choice of cloudpickle - and cloudpickle not being able to pickle attrs constructed classes cloudpipe/cloudpickle#320
dylan: Using Local storage and the stored_as_script=True option might solve this issue
dylan: https://docs.prefect.io/api/latest/environments/storage.html#local
dylan: It bypasses almost all of the pickle logic
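A minimal sketch of that suggestion, assuming a pre-2.0 Prefect release where `Local` storage accepts `path` and `stored_as_script`. The helper name `use_script_storage` is hypothetical, and the import location depends on the installed version.

```python
def use_script_storage(flow, script_path):
    """Attach script-based Local storage to an existing Prefect flow.

    Hypothetical helper for illustration; requires a pre-2.0 Prefect release.
    """
    try:
        # Import location in Prefect 0.14+ / 1.x.
        from prefect.storage import Local
    except ImportError:
        # Import location in older 0.13.x releases.
        from prefect.environments.storage import Local

    # script_path is the .py file that defines the flow. With
    # stored_as_script=True the flow is loaded from that file at run time
    # rather than from a cloudpickle dump written at registration.
    flow.storage = Local(path=script_path, stored_as_script=True)
    return flow
```

Typical use would be to call `use_script_storage(flow, "/path/to/flow.py")` before `flow.register("test-flows")`, where the path is a placeholder. The agent that runs the flow needs to be able to read the same file.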
sarieddine.marwan: thanks for the tip - my current workaround is to place the class in a utils file and add it to a custom dockerfile using Docker or S3 storage - this way it doesn't have to be pickled ...
le.jimmy91: <@UKVFX6N3B> You genius you. It worked like a charm.
Registered my flow and ran my agent afterwards.
Screenshot of victory attached.
dylan: Awesome! Glad I could help 😄
felix.vemmer: <@UKVFX6N3B> first of all thank you so much for helping us! 🙂 Thanks to your help, I was able to resolve my selenium issues by using stored_as_script=True.
However, if I have, for example, an authenticate.py module with some basic common selenium tasks such as:
• create_driver -> returns selenium driver
• login_xyz -> returns selenium driver in logged in state
And I want to import those into another script I get again:
TypeError: cannot pickle '_thread.lock' object
So as long as they are all in one file it’s fine but importing reintroduces the old issue. Any idea on some fix, otherwise I’ll just pile all the code together into one big flow 🙂
Thanks!
chris: <@ULVA73B9P> archive “How to integrate Selenium with Prefect?”
Original thread can be found here.