using js_code and wait_for together is broken in 0.4.22 #350

Open
Udbhav8 opened this issue Dec 15, 2024 · 5 comments

@Udbhav8

Udbhav8 commented Dec 15, 2024

If I pass any js_code to the crawler, it returns this error:
[Screenshot 2024-12-15 at 2 36 33 PM]

I have also explained the issue here.

I think commit 0982c63 broke this.
It probably just needs a null check for response in here. For now I have fixed it by manually copying this file, with the null check added, into my Docker build.

@unclecode
Owner

@Udbhav8 Can you share the code snippet and URL? I can't replicate this error, and with those I can see what is causing it. Right now the following code works well:

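import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

async def main():
    # Configure the browser settings
    browser_config = BrowserConfig()

    # Set run configurations: cache mode, timeout, and the JS to execute
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        js_code="(()=> {console.log('hi');})()",
        log_console=True,
    )

    # Pass the browser config so it is actually used
    async with AsyncWebCrawler(config=browser_config) as crawler: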
        result = await crawler.arun(
            url='https://kidocode.com/',
            config=crawl_config
        )

        if result.success:
            print("Raw Markdown Length:", len(result.markdown_v2.raw_markdown))
            print("Citations Markdown Length:", len(result.markdown_v2.markdown_with_citations))

if __name__ == "__main__":
    asyncio.run(main())

You can check this in Colab here https://colab.research.google.com/drive/1Ge5GvHwwAgM9LtIhjjJIcLGx8VXEKq2V?usp=sharing

@unclecode unclecode self-assigned this Dec 16, 2024
@unclecode unclecode added the bug Something isn't working label Dec 16, 2024
@Udbhav8 Udbhav8 closed this as completed Dec 17, 2024
@Udbhav8 Udbhav8 reopened this Dec 17, 2024
@Udbhav8
Author

Udbhav8 commented Dec 17, 2024

self.crawler_args = {
    "headless": True,
    "remove_overlay_elements": True,
    "verbose": True,
    "always_bypass_cache": True,
    "bypass_cache": True,
    "light_mode": True,
    "user_agent_mode": "random",
    "user_agent_generator_config": {
        "device_type": "mobile",
        "os_type": "android",
    },
}

js_code = """
// Check whether a next page exists and, if so, click through to it
const nextButton = document.querySelector('kendo-pager-next-buttons span[title="Go to the next page"]');
console.log('Next button found:', nextButton);
if (nextButton) {
    nextButton.click();
    console.log('Clicked next button');
} else {
    console.log('No next button found - might be on last page');
}
"""

wait_condition = """() => {
    // First check that the document is ready and navigation is complete
    if (document.readyState !== 'complete') {
        console.log('Document not ready yet:', document.readyState);
        return false;
    }

    // Then check for job cells
    const jobCells = document.querySelectorAll('td[kendogridcell] a[href*="/vendor/jobs/details/"]');
    console.log('Number of job cells found:', jobCells.length);
    return jobCells.length > 0;
}"""

result = await crawler.arun(
    session_id=session_id,
    url="https://app.lotusone.com/#/vendor/jobs",
    js_code=js_code,
    wait_for=f"js:{wait_condition}",
    log_console=True,
)

These are the logs it prints:
[Screenshot 2024-12-16 at 7 07 07 PM]

The page is behind a login, so I will also have to give you the cookies for it. Could you suggest a time when I can send them, so they don't expire before you use them, and where I should send them?

I can also confirm that changing the code in async_crawler_strategy.py to the following worked for me, but now I have to apply this change in my Dockerfile for everything to work as expected:

                await self.execute_hook("before_goto", page, context=context)

                try:
                    response = await page.goto(
                        url,
                        wait_until=config.wait_until,
                        timeout=config.page_timeout,
                    )
                except Error as e:
                    raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{e!s}")

                await self.execute_hook("after_goto", page, context=context)
                if response:
                    status_code = response.status
                    response_headers = response.headers
                else:
                    status_code = 200
                    response_headers = {}
            else:
                status_code = 200
                response_headers = {}
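
The same guard can be pulled out into a small helper, which makes it testable without launching a browser. This is only a minimal sketch: `unpack_response` and `FakeResponse` are hypothetical names that do not exist in crawl4ai, and the 200 fallback simply mirrors the patch above rather than any confirmed upstream behaviour.

```python
from typing import Any, Dict, Optional, Tuple


def unpack_response(response: Optional[Any]) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers), tolerating a missing response.

    Hypothetical helper: it assumes `response` exposes `.status` and
    `.headers`, as in the snippet above, and that it may be None.
    """
    if response is None:
        # Same fallback values as the patch above (an assumption, not a spec).
        return 200, {}
    return response.status, dict(response.headers)


class FakeResponse:
    """Stand-in for a real response object, used only for illustration."""

    status = 404
    headers = {"content-type": "text/html"}
```

Calling `unpack_response(None)` yields the fallback pair, while a real response object passes its own status and headers through unchanged.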

@Udbhav8 Udbhav8 changed the title using js_code is broken in 0.4.22 using js_code and wait_for together is broken in 0.4.22 Dec 17, 2024
@unclecode
Owner

@Udbhav8 Please try to send me a message by Thursday, 19 December, at 2 p.m. Singapore time. Maybe you can create a calendar entry using my email address ([email protected]) so we can align and talk it through. Besides this, I also suggest that you try managing the browser yourself, especially for your case. Below are two links to other issues where I gave very detailed answers, and I believe they will help you a lot. Finally, I do want to keep working on this error. I want to know the situations in which the response is None; that is interesting to me. Before I add an if/else to handle it, I need to know when that happens.

#341 (comment)
#341 (comment)
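
On when the response can be None: Playwright documents that `page.goto` returns null when navigating to `about:blank` or to the same URL with a different hash. The URL in this report is hash-routed (`#/vendor/jobs`) and the call reuses a `session_id`, so that may be the trigger, though this is unconfirmed. A rough sketch of a check for those two documented cases follows. `is_same_document_navigation` is a hypothetical helper, not part of crawl4ai or Playwright.

```python
from urllib.parse import urldefrag


def is_same_document_navigation(current_url: str, target_url: str) -> bool:
    """Guess whether navigating to target_url would yield no response.

    Hypothetical helper covering only the two cases Playwright documents:
    about:blank, and the same URL with a different hash.
    """
    if target_url == "about:blank":
        return True
    current_base, current_fragment = urldefrag(current_url)
    target_base, target_fragment = urldefrag(target_url)
    # Same document, and only the fragment changes.
    return current_base == target_base and current_fragment != target_fragment
```

If the page held by the session is already on `https://app.lotusone.com/`, a call to `https://app.lotusone.com/#/vendor/jobs` would fall into the second case.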

@Udbhav8
Author

Udbhav8 commented Dec 19, 2024

Perfect. I have sent you a meeting invite for exactly that time. I will also email you the storage_state at exactly 2 p.m., so you can take a look even if you aren't able to join the meeting.

@Udbhav8
Author

Udbhav8 commented Dec 19, 2024

@unclecode I have sent you an email with the storage_state object from [email protected].
