Hi @b-sai, thanks for trying crawl4ai. You can pass custom metadata as kwargs when triggering your hooks, and then retrieve or modify it inside the hook callback. For example, you can include something like original_url=... in the execute_hook() call before navigation, and then read or update the final URL in the after_goto or on_execution_started hook. The hooks accept arbitrary keyword arguments, so you can pass a dictionary or extra parameters containing the original URL.
For instance:
    async def before_goto_hook(page, context=None, **kwargs):
        # kwargs might contain original_url and session_id etc.
        original_url = kwargs.get("original_url")
        # Store original_url somewhere if needed, or print
        print(f"Original URL: {original_url}")

    async def after_goto_hook(page, context=None, **kwargs):
        original_url = kwargs.get("original_url")
        final_url = page.url
        print(f"Original URL: {original_url}, Final URL: {final_url}")
        # You can return these values or store them globally

    crawler_strategy.set_hook('before_goto', before_goto_hook)
    crawler_strategy.set_hook('after_goto', after_goto_hook)

    # When calling your crawl method:
    await crawler_strategy.execute_hook('before_goto', page, context=context, original_url="http://example.com")
    await page.goto("http://example.com")
    await crawler_strategy.execute_hook('after_goto', page, context=context, original_url="http://example.com")
By doing this, you are free to pass original_url or any other metadata you need through the execute_hook() calls. Each hook gets the kwargs so you can store and retrieve the needed information across the hooks.
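If you want to collect the original and final URL for every link rather than just print them, one option is to keep the hooks as methods on a small object that stores the pairs. The following is only a minimal sketch: `RedirectLog` and `FakePage` are hypothetical names made up for illustration, and `FakePage` merely imitates the two things the hooks rely on from a Playwright page (`goto()` and the `url` attribute) so the example runs without a browser.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RedirectLog:
    """Hypothetical helper: collects original_url -> final_url pairs."""
    results: dict = field(default_factory=dict)

    async def before_goto(self, page, context=None, **kwargs):
        # Remember the URL we were asked to visit; the final URL is unknown yet.
        original_url = kwargs.get("original_url")
        if original_url is not None:
            self.results[original_url] = None

    async def after_goto(self, page, context=None, **kwargs):
        # After navigation, page.url holds wherever the browser ended up.
        original_url = kwargs.get("original_url")
        if original_url is not None:
            self.results[original_url] = page.url


class FakePage:
    """Stand-in for a Playwright page, used only for this demo."""

    def __init__(self):
        self.url = "about:blank"

    async def goto(self, url):
        # Pretend every http:// URL redirects to its https:// version.
        self.url = url.replace("http://", "https://", 1)


async def demo():
    log = RedirectLog()
    page = FakePage()
    for url in ["http://example.com", "http://example.org"]:
        await log.before_goto(page, original_url=url)
        await page.goto(url)
        await log.after_goto(page, original_url=url)
    return log.results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the real crawler you would register `log.before_goto` and `log.after_goto` through `set_hook()` instead of calling them directly, and read `log.results` once the crawl finishes.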
Another approach is to use session_id to maintain state for each URL.
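A rough sketch of what that could look like, assuming each URL is crawled under its own session_id and that the session_id reaches the hooks through kwargs (both are assumptions here, not confirmed behavior). `session_state` and `StubPage` are hypothetical names; `StubPage` only mimics `goto()` and `url` so the example runs on its own.

```python
import asyncio
from collections import defaultdict

# Hypothetical store: session_id -> metadata for the URL crawled in that session.
session_state = defaultdict(dict)


async def before_goto_hook(page, context=None, **kwargs):
    # Assumes session_id and original_url are passed in as keyword arguments.
    session_id = kwargs.get("session_id")
    session_state[session_id]["original_url"] = kwargs.get("original_url")


async def after_goto_hook(page, context=None, **kwargs):
    session_id = kwargs.get("session_id")
    session_state[session_id]["final_url"] = page.url


class StubPage:
    """Stand-in for a Playwright page, used only for this demo."""

    def __init__(self):
        self.url = "about:blank"

    async def goto(self, url):
        # Pretend every URL redirects to the same URL with a trailing slash.
        self.url = url + "/"


async def demo():
    urls = {"s1": "http://example.com", "s2": "http://example.org"}
    for session_id, url in urls.items():
        page = StubPage()
        await before_goto_hook(page, session_id=session_id, original_url=url)
        await page.goto(url)
        await after_goto_hook(page, session_id=session_id)
    return dict(session_state)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keying the state by session_id means the after_goto hook does not need the original URL passed in again; it only needs to know which session it belongs to.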
I have a series of links that I am trying to analyze the redirection for. A simple 301 redirect check is not working, so I am using Playwright.
I know I can use page.url to get the final URL in a hook, but I need a way to track the original URL and the final URL for each link I have.
How can I pass/store this metadata in the hooks?