AttributeError: 'NoneType' object has no attribute 'find_all' #154

ryanelittle · 2021-10-06T21:36:31Z

I have been using Court Scraper to scrape OSCN. Counties that do not use DailyFilings will not return a list of case numbers when searching for all case numbers in a given year (start_date = 20TK-1-1, end_date = 20TK-12-31).

---> 16         self.results = self.site.search_by_date(
     17             start_date=self.start_date,
     18             end_date=self.end_date

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\site.py in search_by_date(self, start_date, end_date, case_details)
     80         if not start_date:
     81             start_date, end_date = self.current_day, self.current_day
---> 82         results = search_obj.search(start_date, end_date, case_details=case_details)
     83         return results

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search.py in search(self, start_date, end_date, extra_params, case_details)
     44             # Merge any additional query parameters
     45             search_params.update(extra_params)
---> 46             html, basic_case_data = self._run_search(search_params)
     47             # Skip if there were no results for date
     48             if not basic_case_data:

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search.py in _run_search(self, search_params)
     76         html = response.text
     77         page = SearchResultsPage(self.place_id, html)
---> 78         return html, page.results
     79 
     80     @property

c:\users\rlitt\code\my-packages\court-scraper\court_scraper\platforms\oscn\pages\search_results.py in results(self)
     22         results = {}
     23         # Only grab result rows (i.e. skip header)
---> 24         for row in self.soup.table.find_all('tr', class_='resultTableRow'):
     25             case_id_cell, filing_date, case_name, found_party = row.find_all('td')
     26             case_id = case_id_cell.a.text.strip()

AttributeError: 'NoneType' object has no attribute 'find_all'

Looking in the code, I found this note: "Always limit query to a single filing date, to minimize chances of truncate results." I did not expect this behavior based on the documentation. Could the code be changed to behave the same way as DailyFilings? I.e., when given a date range, Search would query each date in turn and return the combined results for the whole range?
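
For illustration, here is a minimal sketch of the per-day behavior requested above. It is not Court Scraper's actual implementation: `iter_days`, `search_range`, and `search_one_day` are hypothetical names, and `search_one_day` stands in for whatever performs a single-date query. Dates are assumed to be zero-padded `YYYY-MM-DD` strings.

```python
from datetime import date, timedelta


def iter_days(start_date, end_date):
    """Yield each day from start_date to end_date, inclusive, as YYYY-MM-DD strings.

    Assumes zero-padded ISO dates such as "2020-03-01".
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    for offset in range((end - start).days + 1):
        yield (start + timedelta(days=offset)).isoformat()


def search_range(search_one_day, start_date, end_date):
    """Run a single-date search for every day in the range and merge the results.

    search_one_day is a hypothetical callable that takes one date string
    and returns a list of results for that day.
    """
    results = []
    for day in iter_days(start_date, end_date):
        results.extend(search_one_day(day))
    return results
```

Querying one day at a time keeps each request small, which matches the intent of the note quoted above, at the cost of one request per day in the range.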

ryanelittle · 2021-10-07T14:51:28Z

The error persists even when supplying single dates.

zstumgoren · 2021-10-07T15:52:39Z

@ryanelittle Can you share the code or CLI command that is triggering the error?

ryanelittle · 2021-10-07T15:56:58Z

I am using Site.search_by_date in a custom class. This is my function:

    def get_case_numbers(self, county, start_date, end_date):
        self.county = county
        self.start_date = start_date
        self.end_date = end_date
        self.site = Site(self.county)
        self.results = self.site.search_by_date(
            start_date=self.start_date,
            end_date=self.end_date
        )
        self.case_numbers = []
        for self.result in self.results:
            self.case_numbers.append(self.result.number)

zstumgoren · 2021-10-07T16:01:12Z

@ryanelittle Great. Can you also provide the date ranges you're using? Sounds like it may generally be broken, but I wouldn't mind trying to test with the exact parameters you've tried so far.

zstumgoren · 2021-10-07T16:02:50Z

@ryanelittle oh, also if you could supply the value stored in self.county, that'll let me replicate your test

ryanelittle · 2021-10-07T16:04:05Z

I tried a few. None of them worked. Just tried 'ok_atoka', '2020-03-01', '2020-03-01', did not work.

zstumgoren · 2021-10-07T16:38:47Z

@ryanelittle The bug appears to be due to the OSCN site now rejecting web requests with the default Python User-Agent supplied by the requests library. This must be new(ish) behavior, since the code was working a few months back when we created it. Anyhow, the site now treats such requests as unauthorized and returns a 403 error page, which does not contain the expected elements and therefore triggers the error we're seeing at the BeautifulSoup layer.
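
As a rough sketch of how a failure like this could surface earlier and more clearly, the status code can be checked before any HTML is parsed. `BlockedRequestError` and `check_search_response` are hypothetical names, not part of Court Scraper, and `response` is assumed to expose `status_code` and `text` the way `requests.Response` does.

```python
class BlockedRequestError(RuntimeError):
    """Raised when the server answers with something other than a results page."""


def check_search_response(response):
    """Return the page HTML, or raise a descriptive error for a non-200 reply.

    `response` is assumed to expose `status_code` and `text`,
    as requests.Response does.
    """
    if response.status_code != 200:
        raise BlockedRequestError(
            f"Search request failed with HTTP {response.status_code}; "
            "the body is probably an error page, not search results"
        )
    return response.text
```

With a check like this in place, a 403 would be reported as such instead of as an `AttributeError` on a missing table.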

Providing a realistic User-Agent header appears to fix the problem. Updating the code in search.py to pass in a User-Agent that mimics a real browser should fix the issue.

In the short term, if you need to press forward on your project, I would just fork and hard-code a User-Agent.
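
A minimal sketch of such a hard-coded header, under some assumptions: `fetch_search_page` and `BROWSER_USER_AGENT` are hypothetical names, the User-Agent string is only an illustrative value, and `session` is assumed to be a `requests.Session` or anything with a compatible `.get` method.

```python
# Illustrative browser-like User-Agent (an assumption; any realistic value should do).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/94.0.4606.81 Safari/537.36"
)


def fetch_search_page(session, url, params):
    """GET the search page with an explicit User-Agent header.

    `session` is assumed to be a requests.Session (or an object with a
    compatible .get method accepting params= and headers=).
    """
    headers = {"User-Agent": BROWSER_USER_AGENT}
    return session.get(url, params=params, headers=headers)
```

With the real library this would be called as `fetch_search_page(requests.Session(), url, params)`.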

ryanelittle · 2021-10-07T16:41:40Z

Thank you for the fix @zstumgoren.

zstumgoren · 2021-10-07T16:53:00Z

@ryanelittle Sure thing. We'll try to ship a proper release to PyPI containing the bug fix in the near future. We'll leave this ticket open until then. Meantime, thanks for bringing it to our attention!

ryanelittle · 2021-10-07T18:45:21Z

@zstumgoren I've used fake-useragent (https://pypi.org/project/fake-useragent/) to randomize my useragents in the past. It might be a good solution so court scraper doesn't have the same header for everyone who uses it.
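
The idea can be sketched without the third-party package by picking from a small fixed pool with the standard library. This is a stand-in for fake-useragent, not its API: `USER_AGENT_POOL` and `random_headers` are hypothetical names, and the strings in the pool are illustrative assumptions.

```python
import random

# Illustrative pool; these strings are assumptions, not values taken from fake-useragent.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0",
]


def random_headers(rng=random):
    """Return request headers with a User-Agent picked at random from the pool.

    `rng` can be the random module or a random.Random instance (useful for tests).
    """
    return {"User-Agent": rng.choice(USER_AGENT_POOL)}
```

The returned dict can be passed as the `headers` argument of a request so that not every user of the scraper sends an identical header.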

zstumgoren added the bug label Oct 7, 2021

zstumgoren added this to the v0.1.2 milestone Oct 7, 2021

ryanelittle added a commit that referenced this issue Oct 8, 2021

implement fake_useragent headers to solve #154

0b4a6b3

ryanelittle mentioned this issue Oct 8, 2021

implement fake_useragent headers to solve #154 #155

Open

palewire mentioned this issue Dec 13, 2022

Oklahoma's search_by_date method is broken for counties without a dedicated latest filings page #178

Open
