AttributeError: 'NoneType' object has no attribute 'find_all' #154

Open
ryanelittle opened this issue Oct 6, 2021 · 10 comments
Labels
bug Something isn't working

@ryanelittle
Contributor

ryanelittle commented Oct 6, 2021

I have been using Court Scraper to scrape OSCN. For counties that do not use DailyFilings, searching for all case numbers in a given year (start_date = 20TK-1-1, end_date = 20TK-12-31) fails with the traceback below instead of returning a list of case numbers.

---> 16         self.results = self.site.search_by_date(
     17             start_date=self.start_date,
     18             end_date=self.end_date

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\site.py in search_by_date(self, start_date, end_date, case_details)
     80         if not start_date:
     81             start_date, end_date = self.current_day, self.current_day
---> 82         results = search_obj.search(start_date, end_date, case_details=case_details)
     83         return results

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search.py in search(self, start_date, end_date, extra_params, case_details)
     44             # Merge any additional query parameters
     45             search_params.update(extra_params)
---> 46             html, basic_case_data = self._run_search(search_params)
     47             # Skip if there were no results for date
     48             if not basic_case_data:

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search.py in _run_search(self, search_params)
     76         html = response.text
     77         page = SearchResultsPage(self.place_id, html)
---> 78         return html, page.results
     79 
     80     @property

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search_results.py in results(self)
     22         results = {}
     23         # Only grab result rows (i.e. skip header)
---> 24         for row in self.soup.table.find_all('tr', class_='resultTableRow'):
     25             case_id_cell, filing_date, case_name, found_party = row.find_all('td')
     26             case_id = case_id_cell.a.text.strip()

AttributeError: 'NoneType' object has no attribute 'find_all'

Looking in the code, I found this note: "Always limit query to a single filing date, to minimize chances of truncate results." I did not expect this behavior based on the documentation. Could the code be changed to behave the same way as DailyFilings? That is, when given a date range, Search would search each date in turn and return the combined results for the whole range.
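
To make the request concrete, here is a minimal sketch of a per-day loop over a date range. The names `iter_days`, `search_date_range`, and `search_one_day` are hypothetical and are not part of Court Scraper; the sketch only assumes that some callable can run a search for a single filing date and return an iterable of results.

```python
from datetime import date, timedelta
from typing import Callable, Iterable, Iterator, List


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def search_date_range(
    search_one_day: Callable[[date], Iterable],
    start: date,
    end: date,
) -> List:
    """Run a single-day search for each day in the range and merge the results.

    search_one_day is a hypothetical callable supplied by the caller.
    """
    results = []
    for day in iter_days(start, end):
        results.extend(search_one_day(day))
    return results
```

With the `Site` object used in this thread, the callable could plausibly wrap `site.search_by_date(start_date=d.isoformat(), end_date=d.isoformat())`, assuming the method accepts ISO date strings such as '2020-03-01'.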

@ryanelittle
Contributor Author

The error persists even when supplying single dates.

@zstumgoren
Member

@ryanelittle Can you share the code or CLI command that is triggering the error?

@ryanelittle
Contributor Author

I am using Site.search_by_date in a custom class. This is my function:

    def get_case_numbers(self, county, start_date, end_date):
        self.county = county
        self.start_date = start_date
        self.end_date = end_date
        self.site = Site(self.county)
        self.results = self.site.search_by_date(
            start_date=self.start_date,
            end_date=self.end_date
        )
        self.case_numbers = []
        for self.result in self.results:
            self.case_numbers.append(self.result.number)

@zstumgoren
Member

@ryanelittle Great. Can you also provide the date ranges you're using? Sounds like it may generally be broken, but I wouldn't mind trying to test with the exact parameters you've tried so far.

@zstumgoren
Member

@ryanelittle Oh, also, if you could supply the value stored in self.county, that'll let me replicate your test.

@ryanelittle
Contributor Author

I tried a few, and none of them worked. I just tried 'ok_atoka' with start_date '2020-03-01' and end_date '2020-03-01', and that did not work either.

@zstumgoren
Member

@ryanelittle The bug appears to be due to the OSCN site now rejecting web requests with the default Python User-Agent supplied by the requests library. This must be new(ish) behavior, since the code was working a few months back when we created it. Anyhow, the site now treats such requests as unauthorized and returns a 403 error page, which does not contain the expected elements and therefore triggers the error we're seeing at the BeautifulSoup layer.
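
One way to surface this failure earlier is to check the HTTP status before handing the page to the parser. The sketch below is only an illustration: `SearchRequestError` and `ensure_search_succeeded` are hypothetical names, not existing Court Scraper code.

```python
class SearchRequestError(RuntimeError):
    """Hypothetical error raised when the site refuses a search request."""


def ensure_search_succeeded(status_code: int, url: str) -> None:
    """Raise a descriptive error unless the response status is 200."""
    if status_code != 200:
        raise SearchRequestError(
            f"Search request to {url} returned HTTP {status_code}; "
            "the response is probably an error page, not a results table."
        )
```

Called with `response.status_code` before the results page is built, a check like this would report the 403 directly instead of failing later with an AttributeError on a missing table.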

Providing a realistic User-Agent header appears to fix the problem. Updating the code in search.py to pass in a User-Agent that mimics a real browser should fix the issue.

In the short term, if you need to press forward on your project, I would just fork and hard-code a User-Agent.
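
As a rough sketch of what hard-coding a User-Agent could look like: the example below uses the standard library's `urllib.request` so that it has no dependencies, whereas the project itself uses requests. The browser string is just an example value, and `build_search_request` is a hypothetical helper, not a function in search.py.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: any current browser string should do; this one is only an example.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/94.0.4606.81 Safari/537.36"
)


def build_search_request(url: str, params: dict) -> Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    full_url = f"{url}?{urlencode(params)}"
    return Request(full_url, headers={"User-Agent": BROWSER_USER_AGENT})
```

The request can then be sent with `urllib.request.urlopen`. With requests, the equivalent is passing `headers={"User-Agent": BROWSER_USER_AGENT}` to `requests.get`.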

@ryanelittle
Contributor Author

Thank you for the fix @zstumgoren.

@zstumgoren zstumgoren added the bug Something isn't working label Oct 7, 2021
@zstumgoren zstumgoren added this to the v0.1.2 milestone Oct 7, 2021
@zstumgoren
Member

@ryanelittle Sure thing. We'll try to ship a proper release to PyPI containing the bug fix in the near future. We'll leave this ticket open until then. Meantime, thanks for bringing it to our attention!

@ryanelittle
Contributor Author

@zstumgoren I've used fake-useragent (https://pypi.org/project/fake-useragent/) to randomize my user agents in the past. It might be a good solution so that Court Scraper doesn't send the same header for everyone who uses it.
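
For illustration, the randomization idea can be sketched without any dependency by picking from a small hard-coded pool. This is a stand-in for fake-useragent, not its API: `USER_AGENT_POOL` and `pick_user_agent` are hypothetical names, and the strings are example values only.

```python
import random
from typing import Optional

# Assumption: a small, hand-maintained pool of example browser strings.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.0 Safari/605.1.15",
]


def pick_user_agent(rng: Optional[random.Random] = None) -> str:
    """Return one User-Agent string chosen at random from the pool."""
    chooser = rng or random
    return chooser.choice(USER_AGENT_POOL)
```

With fake-useragent installed, `UserAgent().random` would play the role of `pick_user_agent()` and avoid maintaining the pool by hand.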
