How to integrate Selenium with Prefect? #3669

Closed
marvin-robot opened this issue Nov 15, 2020 · 0 comments
Archived from the Prefect Public Slack Community

le.jimmy91: I'm working on a Selenium web scraping project, everything works fine when running locally - however, when I register the flow with Prefect Cloud, I'm not able to launch a Chrome Driver.

It appears to be a serialization problem, since I'm getting TypeError: cannot pickle '_thread.lock' object.

Has anyone run into a similar problem? Any suggestions would be appreciated!

dylan: Hi @le.jimmy91!

What executor are you using? It sounds like a Task is trying to share an un-pickleable object with another Task. This can happen if you’re trying to achieve some parallelism and also sharing a created Client for something that’s not thread safe.
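
To illustrate the kind of failure described here, the following is a minimal sketch using only the Python standard library. `DriverLike` and `try_pickle` are hypothetical names, not part of Selenium or Prefect; the assumption is that a real driver or client holds a thread lock somewhere inside it, which is what pickle then refuses to serialize.

```python
import pickle
import threading


class DriverLike:
    """Hypothetical stand-in for a client or driver that holds a lock."""

    def __init__(self):
        # Thread locks cannot be pickled, so neither can this object.
        self._lock = threading.Lock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


error = try_pickle(DriverLike())
print(error)  # a TypeError message mentioning the lock object

plain = try_pickle({"size": 2})
print(plain)  # None, since plain data pickles without trouble
```

The exact wording of the message varies between Python versions, but the cause is the same: anything that has to be serialized must not contain a lock, directly or through one of its attributes.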

le.jimmy91: I’m using the default LocalExecutor. I’ll give it another go today with the DaskExecutor.

dylan: hmmm

dylan: Would you be comfortable sharing your flow code here?

sarieddine.marwan: @dylan - I have faced this in a slightly different context, trying to use attrs (https://www.attrs.org/en/stable/examples.html) constructed classes and then register a flow:

flow.py

import attr
from prefect import Flow, task


@attr.s(auto_attribs=True, kw_only=True)
class A:
    size: int


@task
def get_size():
    a = A(size=2)
    return a.size


with Flow("test-flow") as flow:
    get_size()

flow.register("test-flows")

traceback:

Traceback (most recent call last):
  File "attr_flow.py", line 19, in <module>
    flow.register("test-flows")
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1608, in register
    registered_flow = client.register(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/client/client.py", line 734, in register
    serialized_flow = flow.serialize(build=build)  # type: Any
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1451, in serialize
    self.storage.add_flow(self)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/environments/storage/local.py", line 140, in add_flow
    flow_location = flow.save(flow_location)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1520, in save
    cloudpickle.dump(self, f)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 55, in dump
    CloudPickler(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread._local' object

sarieddine.marwan: given that attrs is quite a popular library when it comes to building classes - this is a bit of a disappointment, to be honest

sarieddine.marwan: but I realize this has to do with the choice of cloudpickle - and cloudpickle not being able to pickle attrs constructed classes
cloudpipe/cloudpickle#320

dylan: Using Local storage and the stored_as_script=True option might solve this issue

dylan: https://docs.prefect.io/api/latest/environments/storage.html#local

dylan: It bypasses almost all of the pickle logic

sarieddine.marwan: thanks for the tip - my current workaround is to place the class in a utils file and add it to a custom dockerfile using Docker or S3 storage - this way it doesn't have to be pickled ...

le.jimmy91: @dylan You genius you. It worked like a charm.

from prefect.environments.storage import Local
f.storage = Local(path="path/to/flow.py", stored_as_script=True)

Registered my flow and ran my agent afterwards.

Screenshot of victory attached.

dylan: Awesome! Glad I could help 😄

felix.vemmer: @dylan first of all thank you so much for helping us! 🙂 Thanks to your help, I was able to resolve my Selenium issues by using stored_as_script=True.

However, suppose I have an authenticate.py module with some basic common Selenium tasks such as:
• create_driver -> returns selenium driver
• login_xyz -> returns selenium driver in logged in state
And I want to import those into another script I get again:

TypeError: cannot pickle '_thread.lock' object

So as long as they are all in one file it’s fine, but importing reintroduces the old issue. Any idea for a fix? Otherwise I’ll just pile all the code together into one big flow 🙂

Thanks!
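
A possible way around this, offered as a sketch rather than a confirmed fix from the thread: keep the driver inside a single function so that only plain, picklable data is ever returned. `FakeDriver` below is a hypothetical stand-in for a Selenium driver (it simply holds a lock), and `scrape_titles` with its placeholder return values is an assumption for illustration; `create_driver` and `login_xyz` mirror the helper names mentioned above.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; it holds a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.logged_in = False

    def quit(self):
        """Placeholder for releasing the browser session."""


def create_driver():
    """Helper that builds a driver; called inside a task, never returned from one."""
    return FakeDriver()


def login_xyz(driver):
    """Helper that puts the driver in a logged-in state."""
    driver.logged_in = True
    return driver


def scrape_titles():
    """Create, use, and close the driver within one function body."""
    driver = login_xyz(create_driver())
    try:
        # Placeholder for real scraping; only plain data leaves this function.
        return ["title-1", "title-2"] if driver.logged_in else []
    finally:
        driver.quit()


titles = scrape_titles()
restored = pickle.loads(pickle.dumps(titles))  # plain lists pickle cleanly
```

With this layout the helpers can live in a separate module such as authenticate.py as ordinary functions, and only `scrape_titles` would be wrapped as a task, so the driver itself never has to be serialized.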

chris: @marvin archive “How to integrate Selenium with Prefect?”

Original thread can be found here.
