Fix read_html_live example
I checked the read_html_live() example and saw that the CSS selectors had changed and a cookie consent banner had been added.

This PR changes the read_html_live() example so that it can reject cookies and extract organizations from the new version of the page. Scrolling was needed to force the JSON file download.

I used |>, but I can change my PR to %>% if required.
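
For reviewers who want to try the selector logic without launching a browser, here is a minimal offline sketch. It assumes a hypothetical, heavily simplified stand-in for the Forbes page built with `minimal_html()`; the HTML, the `Example University` row, and the variable names are illustrative and not taken from the real site. It only shows why `length(html_elements(...))` works as a presence check and how the new column selector picks out the organization name.

```r
library(rvest)

# Hypothetical stand-in for the live page: a consent banner plus a one-row
# table. The real page is generated dynamically and is far larger.
page <- minimal_html('
  <button aria-label="Reject All">Reject All</button>
  <div id="top-colleges">
    <table class="ListTable_listTable__-N5U5">
      <tbody>
        <tr><td>1</td><td>Example University</td></tr>
      </tbody>
    </table>
  </div>
')

# html_elements() returns an empty node set when nothing matches, so its
# length can be used to test whether the banner button is present.
banner_seen <- length(html_elements(page, "button[aria-label='Reject All']")) > 0

# The selector from the old example no longer matches anything.
old_rows <- html_elements(page, ".TopColleges2023_tableRow__BYOSU")

# The second cell of each body row holds the organization name
# (an assumption that mirrors the selector used in this PR).
org_names <- page |>
  html_elements("#top-colleges tbody tr td:nth-of-type(2)") |>
  html_text()
```

On the real site the same selectors would be applied to the object returned by `read_html_live()`, after scrolling and dismissing the banner as in the example changed by this PR.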
jrosell authored Oct 17, 2024
1 parent c9be5b8 commit 425088e
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions R/live.R
@@ -27,16 +27,21 @@
#' # When we retrieve the raw HTML for this site, it doesn't contain the
#' # data we're interested in:
#' static <- read_html("https://www.forbes.com/top-colleges/")
#' static %>% html_elements(".TopColleges2023_tableRow__BYOSU")
#' static |> html_elements(".ListTable_listTable__-N5U5")
#'
#' # Instead, we need to run the site in a real web browser, causing it to
#' # download a JSON file and then dynamically generate the html:
#'
#' sess <- read_html_live("https://www.forbes.com/top-colleges/")
#' sess$view()
#' rows <- sess %>% html_elements(".TopColleges2023_tableRow__BYOSU")
#' rows %>% html_element(".TopColleges2023_organizationName__J1lEV") %>% html_text()
#' rows %>% html_element(".grant-aid") %>% html_text()
#' sess$scroll_into_view("#top-colleges")
#' cookies_seen <- length(html_elements(sess, "button[aria-label='Reject All']"))
#' if (cookies_seen) {
#'   sess$click("button[aria-label='Reject All']")
#' }
#' rows <- sess |> html_elements("#top-colleges .ListTable_listTable__-N5U5")
#' rows |>
#'   html_elements("#top-colleges tbody tr td:nth-of-type(2)") |>
#'   html_text()
#' }
read_html_live <- function(url) {
check_installed(c("chromote", "R6"))
