diff --git a/dev/CODE_OF_CONDUCT.html b/dev/CODE_OF_CONDUCT.html
index 5676888..2263752 100644
--- a/dev/CODE_OF_CONDUCT.html
+++ b/dev/CODE_OF_CONDUCT.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/CONTRIBUTING.html b/dev/CONTRIBUTING.html
index 43727cb..90714cc 100644
--- a/dev/CONTRIBUTING.html
+++ b/dev/CONTRIBUTING.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/LICENSE-text.html b/dev/LICENSE-text.html
index b5de806..1cfbb9a 100644
--- a/dev/LICENSE-text.html
+++ b/dev/LICENSE-text.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/LICENSE.html b/dev/LICENSE.html
index f81f2ca..a7aa9a7 100644
--- a/dev/LICENSE.html
+++ b/dev/LICENSE.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/SUPPORT.html b/dev/SUPPORT.html
index 77315f5..f9a869c 100644
--- a/dev/SUPPORT.html
+++ b/dev/SUPPORT.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/articles/index.html b/dev/articles/index.html
index a56fd1d..e7fa369 100644
--- a/dev/articles/index.html
+++ b/dev/articles/index.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/articles/rvest.html b/dev/articles/rvest.html
index 532649d..be654d9 100644
--- a/dev/articles/rvest.html
+++ b/dev/articles/rvest.html
@@ -36,7 +36,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/articles/selectorgadget.html b/dev/articles/selectorgadget.html
index b025216..58feae1 100644
--- a/dev/articles/selectorgadget.html
+++ b/dev/articles/selectorgadget.html
@@ -36,7 +36,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/articles/starwars.html b/dev/articles/starwars.html
index f619095..b5ac9b7 100644
--- a/dev/articles/starwars.html
+++ b/dev/articles/starwars.html
@@ -36,7 +36,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/authors.html b/dev/authors.html
index 623fad4..eb31655 100644
--- a/dev/authors.html
+++ b/dev/authors.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
@@ -80,13 +80,13 @@ <h2 id="citation">Citation</h2>
 
       <p>Wickham H (2024).
 <em>rvest: Easily Harvest (Scrape) Web Pages</em>.
-R package version 1.0.3.9000, https://github.com/tidyverse/rvest, <a href="https://rvest.tidyverse.org/">https://rvest.tidyverse.org/</a>. 
+R package version 1.0.4.9000, https://github.com/tidyverse/rvest, <a href="https://rvest.tidyverse.org/">https://rvest.tidyverse.org/</a>. 
 </p>
       <pre>@Manual{,
   title = {rvest: Easily Harvest (Scrape) Web Pages},
   author = {Hadley Wickham},
   year = {2024},
-  note = {R package version 1.0.3.9000, https://github.com/tidyverse/rvest},
+  note = {R package version 1.0.4.9000, https://github.com/tidyverse/rvest},
   url = {https://rvest.tidyverse.org/},
 }</pre>
     </div>
diff --git a/dev/index.html b/dev/index.html
index fa390cd..1b7735b 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -38,7 +38,7 @@
     
     <a class="navbar-brand me-2" href="index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/news/index.html b/dev/news/index.html
index 7592b9c..cf76f66 100644
--- a/dev/news/index.html
+++ b/dev/news/index.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
@@ -64,6 +64,9 @@ <h6 class="dropdown-header" data-toc-skip>Releases</h6>
 
     <div class="section level2">
 <h2 class="pkg-version" data-toc-text="development version" id="rvest-development-version">rvest (development version)<a class="anchor" aria-label="anchor" href="#rvest-development-version"></a></h2>
+</div>
+    <div class="section level2">
+<h2 class="pkg-version" data-toc-text="1.0.4" id="rvest-104">rvest 1.0.4<a class="anchor" aria-label="anchor" href="#rvest-104"></a></h2><p class="text-muted">CRAN release: 2024-02-12</p>
 <ul><li><p>New <code><a href="../reference/read_html_live.html">read_html_live()</a></code> reads HTML into a real, live, HTML browser, meaning that you can scrape HTML generated by javascript. It returns a <code>LiveHTML</code> object which you can also use to simulate user interactions with the page, like clicking, typing, and scrolling (<a href="https://github.com/tidyverse/rvest/issues/245" class="external-link">#245</a>).</p></li>
 <li><p><code><a href="../reference/html_table.html">html_table()</a></code> discards rows without cells (<a href="https://github.com/epiben" class="external-link">@epiben</a>, <a href="https://github.com/tidyverse/rvest/issues/360" class="external-link">#360</a>).</p></li>
 </ul></div>
diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml
index f4bb024..8899c11 100644
--- a/dev/pkgdown.yml
+++ b/dev/pkgdown.yml
@@ -5,7 +5,7 @@ articles:
   selectorgadget: selectorgadget.html
   rvest: rvest.html
   starwars: starwars.html
-last_built: 2024-02-12T17:14Z
+last_built: 2024-02-12T22:04Z
 urls:
   reference: https://rvest.tidyverse.org/reference
   article: https://rvest.tidyverse.org/articles
diff --git a/dev/reference/LiveHTML.html b/dev/reference/LiveHTML.html
index 0bfdfe8..2aab46d 100644
--- a/dev/reference/LiveHTML.html
+++ b/dev/reference/LiveHTML.html
@@ -28,7 +28,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/google_form.html b/dev/reference/google_form.html
index 755caf5..d758743 100644
--- a/dev/reference/google_form.html
+++ b/dev/reference/google_form.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_attr.html b/dev/reference/html_attr.html
index 94463a4..02d895d 100644
--- a/dev/reference/html_attr.html
+++ b/dev/reference/html_attr.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_children.html b/dev/reference/html_children.html
index 8d99aad..50fce45 100644
--- a/dev/reference/html_children.html
+++ b/dev/reference/html_children.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_element.html b/dev/reference/html_element.html
index d8c5d90..adaddb1 100644
--- a/dev/reference/html_element.html
+++ b/dev/reference/html_element.html
@@ -16,7 +16,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_encoding_guess.html b/dev/reference/html_encoding_guess.html
index 7c4863c..1b5b5cd 100644
--- a/dev/reference/html_encoding_guess.html
+++ b/dev/reference/html_encoding_guess.html
@@ -16,7 +16,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_form.html b/dev/reference/html_form.html
index 4eaae68..c975cbb 100644
--- a/dev/reference/html_form.html
+++ b/dev/reference/html_form.html
@@ -12,7 +12,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_name.html b/dev/reference/html_name.html
index c64ba45..00cf073 100644
--- a/dev/reference/html_name.html
+++ b/dev/reference/html_name.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_table.html b/dev/reference/html_table.html
index 9bc1ce8..d5e4ee5 100644
--- a/dev/reference/html_table.html
+++ b/dev/reference/html_table.html
@@ -12,7 +12,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/html_text.html b/dev/reference/html_text.html
index ddbd47b..6d80d1e 100644
--- a/dev/reference/html_text.html
+++ b/dev/reference/html_text.html
@@ -28,7 +28,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/index.html b/dev/reference/index.html
index 8b580c2..fbc32d7 100644
--- a/dev/reference/index.html
+++ b/dev/reference/index.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/minimal_html.html b/dev/reference/minimal_html.html
index 02863e4..6f2444d 100644
--- a/dev/reference/minimal_html.html
+++ b/dev/reference/minimal_html.html
@@ -10,7 +10,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/read_html.html b/dev/reference/read_html.html
index cbd4f7d..dad4ac5 100644
--- a/dev/reference/read_html.html
+++ b/dev/reference/read_html.html
@@ -24,7 +24,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/read_html_live.html b/dev/reference/read_html_live.html
index f182c6f..dad1d0b 100644
--- a/dev/reference/read_html_live.html
+++ b/dev/reference/read_html_live.html
@@ -32,7 +32,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/reexports.html b/dev/reference/reexports.html
index f7753d8..010e21f 100644
--- a/dev/reference/reexports.html
+++ b/dev/reference/reexports.html
@@ -32,7 +32,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/rename.html b/dev/reference/rename.html
index 082d2f0..4e8ece1 100644
--- a/dev/reference/rename.html
+++ b/dev/reference/rename.html
@@ -50,7 +50,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/repair_encoding.html b/dev/reference/repair_encoding.html
index 1b55296..340216d 100644
--- a/dev/reference/repair_encoding.html
+++ b/dev/reference/repair_encoding.html
@@ -14,7 +14,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/rvest-package.html b/dev/reference/rvest-package.html
index 0981f4a..b1ed5dc 100644
--- a/dev/reference/rvest-package.html
+++ b/dev/reference/rvest-package.html
@@ -12,7 +12,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/reference/session.html b/dev/reference/session.html
index 2f813ac..d9e8be7 100644
--- a/dev/reference/session.html
+++ b/dev/reference/session.html
@@ -36,7 +36,7 @@
     
     <a class="navbar-brand me-2" href="../index.html">rvest</a>
 
-    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.3.9000</small>
+    <small class="nav-text text-danger me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="In-development version">1.0.4.9000</small>
 
     
     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
diff --git a/dev/search.json b/dev/search.json
index 8933d36..47d0de5 100644
--- a/dev/search.json
+++ b/dev/search.json
@@ -1 +1 @@
-[{"path":[]},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. 
Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. 
public apology may requested.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. 
Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to rvest","title":"Contributing to rvest","text":"outlines propose change rvest. detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to rvest","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to rvest","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. 
’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to rvest","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"tidyverse/rvest\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). recommend using usethis::pr_init(\"brief-description-of-change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (i.e. just first header). Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to rvest","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to rvest","text":"Please note rvest project released Contributor Code Conduct. 
contributing project agree abide terms.","code":""},{"path":"https://rvest.tidyverse.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 rvest authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":null,"dir":"","previous_headings":"","what":"Getting help with rvest","title":"Getting help with rvest","text":"Thanks using rvest! filing issue, places explore pieces put together make process smooth possible.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"make-a-reprex","dir":"","previous_headings":"","what":"Make a reprex","title":"Getting help with rvest","text":"Start making minimal reproducible example using reprex package. haven’t heard used reprex , ’re treat! Seriously, reprex make R-question-asking endeavors easier (pretty insane ROI five ten minutes ’ll take learn ’s ). additional reprex pointers, check Get help! section tidyverse site.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"where-to-ask","dir":"","previous_headings":"","what":"Where to ask?","title":"Getting help with rvest","text":"Armed reprex, next step figure ask. 
’s question: start community.rstudio.com, /StackOverflow. people answer questions. ’s bug: ’re right place, file issue. ’re sure: let community help figure ! problem bug feature request, can easily return report . opening new issue, sure search issues pull requests make sure bug hasn’t reported /already fixed development version. default, search pre-populated is:issue is:open. can edit qualifiers (e.g. is:pr, is:closed) needed. example, ’d simply remove is:open search issues repo, open closed.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"what-happens-next","dir":"","previous_headings":"","what":"What happens next?","title":"Getting help with rvest","text":"efficient possible, development tidyverse packages tends bursty, shouldn’t worry don’t get immediate response. Typically don’t look repo sufficient quantity issues accumulates, ’s burst intense activity focus efforts. makes development efficient avoids expensive context switching problems, cost taking longer get back . process makes good reprex particularly important might multiple months initial report start working . can’t reproduce bug, can’t fix !","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"html-basics","dir":"Articles","previous_headings":"","what":"HTML basics","title":"Web scraping 101","text":"HTML stands “HyperText Markup Language” looks like : HTML hierarchical structure formed elements consist start tag (e.g. <tag>), optional attributes (id='first'), end tag (like <\/tag>), contents (everything start end tag). Since < > used start end tags, can’t write directly. Instead use HTML escapes &gt; (greater ) &lt; (less ). since escapes use &, want literal ampersand escape &amp;. 
wide range possible HTML escapes don’t need worry much rvest automatically handles .","code":"<html> <head>   <title>Page title<\/title> <\/head> <body>   <h1 id='first'>A heading<\/h1>   <p>Some text &amp; <b>some bold text.<\/b><\/p>   <img src='myimg.png' width='100' height='100'> <\/body>"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"elements","dir":"Articles","previous_headings":"HTML basics","what":"Elements","title":"Web scraping 101","text":", 100 HTML elements. important : Every HTML page must <html> element, must two children: <head>, contains document metadata like page title, <body>, contains content see browser. Block tags like <h1> (heading 1), <p> (paragraph), <ol> (ordered list) form overall structure page. Inline tags like <b> (bold), <i> (italics), <a> (links) formats text inside block tags. encounter tag ’ve never seen , can find little googling. recommend MDN Web Docs produced Mozilla, company makes Firefox web browser.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"contents","dir":"Articles","previous_headings":"HTML basics","what":"Contents","title":"Web scraping 101","text":"elements can content start end tags. content can either text elements. example, following HTML contains paragraph text, one word bold. Hi! name Hadley. children node refers elements, <p> element one child, <b> element. <b> element children, contents (text “name”). elements, like <img> can’t children. elements depend solely attributes behavior.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"attributes","dir":"Articles","previous_headings":"HTML basics","what":"Attributes","title":"Web scraping 101","text":"Tags can named attributes look like name1='value1' name2='value2'. Two important attributes id class, used conjunction CSS (Cascading Style Sheets) control visual appearance page. 
often useful scraping data page.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"reading-html-with-rvest","dir":"Articles","previous_headings":"","what":"Reading HTML with rvest","title":"Web scraping 101","text":"’ll usually start scraping process read_html(). returns xml_document object ’ll manipulate using rvest functions: examples experimentation, rvest also includes function lets create xml_document literal HTML: Regardless get HTML, ’ll need way identify elements contain data care . rvest provides two options: CSS selectors XPath expressions. ’ll focus CSS selectors ’re simpler still sufficiently powerful scraping tasks.","code":"html <- read_html(\"http://rvest.tidyverse.org/\") class(html) #> [1] \"xml_document\" \"xml_node\" html <- minimal_html(\"   <p>This is a paragraph<p>   <ul>     <li>This is a bulleted list<\/li>   <\/ul> \") html #> {html_document} #> <html> #> [1] <head>\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset ... #> [2] <body>\\n<p>This is a paragraph<\/p>\\n<p>\\n  <\/p>\\n<ul>\\n<li>This is  ..."},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"css-selectors","dir":"Articles","previous_headings":"","what":"CSS selectors","title":"Web scraping 101","text":"CSS short cascading style sheets, tool defining visual styling HTML documents. CSS includes miniature language selecting elements page called CSS selectors. CSS selectors define patterns locating HTML elements, useful scraping provide concise way describing elements want extract. CSS selectors can quite complex, fortunately need simplest rvest, can also write R code complicated situations. four important selectors : p: selects <p> elements. .title: selects elements class “title”. p.special: selects <p> elements class “special”. #title: selects element id attribute equals “title”. Id attributes must unique within document, ever select single element. 
want learn CSS selectors recommend starting fun CSS dinner tutorial referring MDN web docs. Let’s try important selectors simple example: rvest can extract single element html_element() matching elements html_elements(). functions take document css selector: Selectors can also combined various ways using combinators. example, important combinator “ ”, descendant combination, p a selects <a> elements child <p> element. don’t know exactly selector need, highly recommend using SelectorGadget, lets automatically generate selector need supplying positive negative examples browser.","code":"html <- minimal_html(\"   <h1>This is a heading<\/h1>   <p id='first'>This is a paragraph<\/p>   <p class='important'>This is an important paragraph<\/p> \") html %>% html_element(\"h1\") #> {html_node} #> <h1> html %>% html_elements(\"p\") #> {xml_nodeset (2)} #> [1] <p id=\"first\">This is a paragraph<\/p> #> [2] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\".important\") #> {xml_nodeset (1)} #> [1] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\"#first\") #> {xml_nodeset (1)} #> [1] <p id=\"first\">This is a paragraph<\/p>"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"extracting-data","dir":"Articles","previous_headings":"","what":"Extracting data","title":"Web scraping 101","text":"Now ’ve got elements care , ’ll need get data . ’ll usually get data either text contents attribute. , sometimes (’re lucky!), data need HTML table.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"text","dir":"Articles","previous_headings":"Extracting data","what":"Text","title":"Web scraping 101","text":"Use html_text2() extract plain text contents HTML element: Note escaped ampersand automatically converted &; ’ll ever see HTML escapes source HTML, data returned rvest. 
might wonder used html_text2(), since seems give result html_text(): main difference two functions handle white space. HTML, white space largely ignored, ’s structure elements defines text laid . html_text2() best follow rules, giving something similar ’d see browser. Take example contains bunch white space HTML ignores. html_text2() gives expect: two paragraphs text separated blank line. Whereas html_text() returns garbled raw underlying text:","code":"html <- minimal_html(\"   <ol>     <li>apple &amp; pear<\/li>     <li>banana<\/li>     <li>pineapple<\/li>   <\/ol> \") html %>%    html_elements(\"li\") %>%    html_text2() #> [1] \"apple & pear\" \"banana\"       \"pineapple\" html %>%    html_elements(\"li\") %>%    html_text() #> [1] \"apple & pear\" \"banana\"       \"pineapple\" html <- minimal_html(\"<body>   <p>   This is   a   paragraph.<\/p><p>This is another paragraph.      It has two sentences.<\/p> \") html %>%    html_element(\"body\") %>%    html_text2() %>%    cat() #> This is a paragraph. #>  #> This is another paragraph. It has two sentences. html %>%    html_element(\"body\") %>%    html_text() %>%    cat() #>  #>    #>   This is #>   a #>   paragraph.This is another paragraph. 
#>    #>   It has two sentences."},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"attributes-1","dir":"Articles","previous_headings":"Extracting data","what":"Attributes","title":"Web scraping 101","text":"Attributes used record destination links (href attribute <a> elements) source images (src attribute <img> element): value attribute can retrieved html_attr(): Note html_attr() always returns string, may need post-process as.integer()/readr::parse_integer() similar.","code":"html <- minimal_html(\"   <p><a href='https://en.wikipedia.org/wiki/Cat'>cats<\/a><\/p>   <img src='https://cataas.com/cat' width='100' height='200'> \") html %>%    html_elements(\"a\") %>%    html_attr(\"href\") #> [1] \"https://en.wikipedia.org/wiki/Cat\"  html %>%    html_elements(\"img\") %>%    html_attr(\"src\") #> [1] \"https://cataas.com/cat\" html %>%    html_elements(\"img\") %>%    html_attr(\"width\") #> [1] \"100\"  html %>%    html_elements(\"img\") %>%    html_attr(\"width\") %>%    as.integer() #> [1] 100"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"tables","dir":"Articles","previous_headings":"Extracting data","what":"Tables","title":"Web scraping 101","text":"HTML tables composed four main elements: <table>, <tr> (table row), <th> (table heading), <td> (table data). 
’s simple HTML table two columns three rows: tables common way store data, rvest includes handy html_table() converts table data frame:","code":"html <- minimal_html(\"   <table>     <tr>       <th>x<\/th>       <th>y<\/th>     <\/tr>     <tr>       <td>1.5<\/td>       <td>2.7<\/td>     <\/tr>     <tr>       <td>4.9<\/td>       <td>1.3<\/td>     <\/tr>     <tr>       <td>7.2<\/td>       <td>8.1<\/td>     <\/tr>   <\/table>   \") html %>%    html_node(\"table\") %>%    html_table() #> # A tibble: 3 × 2 #>       x     y #>   <dbl> <dbl> #> 1   1.5   2.7 #> 2   4.9   1.3 #> 3   7.2   8.1"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"element-vs-elements","dir":"Articles","previous_headings":"","what":"Element vs elements","title":"Web scraping 101","text":"using rvest, eventual goal usually build data frame, want row correspond repeated unit HTML page. case, generally start using html_elements() select elements contain observation use html_element() extract variables observation. guarantees ’ll get number values variable html_element() always returns number outputs inputs. 
illustrate problem take look simple example constructed using entries dplyr::starwars: try extract name, species, weight directly, end one vector length four two vectors length three, way align : Instead, use html_elements() find element corresponds character, use html_element() extract variable observations: html_element() automatically fills NA elements match, keeping variables aligned making easy create data frame:","code":"html <- minimal_html(\"   <ul>     <li><b>C-3PO<\/b> is a <i>droid<\/i> that weighs <span class='weight'>167 kg<\/span><\/li>     <li><b>R2-D2<\/b> is a <i>droid<\/i> that weighs <span class='weight'>96 kg<\/span><\/li>     <li><b>Yoda<\/b> weighs <span class='weight'>66 kg<\/span><\/li>     <li><b>R4-P17<\/b> is a <i>droid<\/i><\/li>   <\/ul>   \") html %>% html_elements(\"b\") %>% html_text2() #> [1] \"C-3PO\"  \"R2-D2\"  \"Yoda\"   \"R4-P17\" html %>% html_elements(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" \"droid\" html %>% html_elements(\".weight\") %>% html_text2() #> [1] \"167 kg\" \"96 kg\"  \"66 kg\" characters <- html %>% html_elements(\"li\")  characters %>% html_element(\"b\") %>% html_text2() #> [1] \"C-3PO\"  \"R2-D2\"  \"Yoda\"   \"R4-P17\" characters %>% html_element(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" NA      \"droid\" characters %>% html_element(\".weight\") %>% html_text2() #> [1] \"167 kg\" \"96 kg\"  \"66 kg\"  NA data.frame(   name = characters %>% html_element(\"b\") %>% html_text2(),   species = characters %>% html_element(\"i\") %>% html_text2(),   weight = characters %>% html_element(\".weight\") %>% html_text2() ) #>     name species weight #> 1  C-3PO   droid 167 kg #> 2  R2-D2   droid  96 kg #> 3   Yoda    <NA>  66 kg #> 4 R4-P17   droid   <NA>"},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"installation","dir":"Articles","previous_headings":"","what":"Installation","title":"SelectorGadget","text":"install , open page browser, drag following link bookmark bar: 
SelectorGadget.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"use","dir":"Articles","previous_headings":"","what":"Use","title":"SelectorGadget","text":"use , open page want scrape, : Click SelectorGadget entry bookmark bar. Click element want select. SelectorGadget make first guess css selector want. ’s likely bad since one example learn , ’s start. Elements match selector highlighted yellow. Click elements shouldn’t selected. turn red. Click elements selected. turn green. Iterate elements want selected. SelectorGadget isn’t perfect sometimes won’t able find useful css selector. Sometimes starting different element helps.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"example","dir":"Articles","previous_headings":"","what":"Example","title":"SelectorGadget","text":"example, imagine want find names movies listed vignette(\"starwars\"). Start opening https://rvest.tidyverse.org/articles/starwars.html web browser. Click SelectorGadget link bookmarks. SelectorGadget console appear bottom screen, element currently mouse highlighted orange.  Click movie name select . element selected highlighted green. SelectorGadget guesses css selector want (h2 case), highlights matches yellow (see total count equal 7 indicated “Clear” button).  Scroll around document verify selected desired movie titles nothing else. case, looks like SelectorGadget figured first try, can use selector R code: Now let’s try something little challenging: selecting paragraphs movie intro. Start way , opening website using SelectorGadget bookmark, time click first paragraph intro.  obviously selects many elements, click one paragraphs shouldn’t match. turns red indicating element shouldn’t matched.  looks good, convert R code: correct, ’ve lost connection title intro. fix problem need take step back see can find element identifies data one movie. 
carefully hovering, can figure section selector seems job: can get title film: contents intro: pretty common experience — SelectorGadget get started finding useful selectors ’ll often combine code.","code":"library(rvest) html <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\") html %>%    html_element(\"h2\") %>%    html_text2() #> [1] \"The Phantom Menace\" html %>%    html_elements(\".crawl p\") %>%    html_text2() %>%    .[1:4] #> [1] \"Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute.\"                                                                                                                #> [2] \"Hoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo.\"                                                                          #> [3] \"While the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict….\" #> [4] \"There is unrest in the Galactic Senate. Several thousand solar systems have declared their intentions to leave the Republic.\" films <- html %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleas ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nRelease ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-0 ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nRel ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased ... #> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: ... 
films %>%    html_element(\"h2\") %>%    html_text2() #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\" films %>%    html_element(\".crawl\") %>%    html_text2() %>%    .[[1]] %>%    writeLines() #> Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute. #>  #> Hoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo. #>  #> While the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict…."},{"path":"https://rvest.tidyverse.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hadley Wickham. Author, maintainer. . Copyright holder, funder.","code":""},{"path":"https://rvest.tidyverse.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Wickham H (2024). rvest: Easily Harvest (Scrape) Web Pages. R package version 1.0.3.9000, https://github.com/tidyverse/rvest, https://rvest.tidyverse.org/.","code":"@Manual{,   title = {rvest: Easily Harvest (Scrape) Web Pages},   author = {Hadley Wickham},   year = {2024},   note = {R package version 1.0.3.9000, https://github.com/tidyverse/rvest},   url = {https://rvest.tidyverse.org/}, }"},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"Easily Harvest (Scrape) Web Pages","text":"rvest helps scrape (harvest) data web pages. 
designed work magrittr make easy express common web scraping tasks, inspired libraries like beautiful soup RoboBrowser. ’re scraping multiple pages, highly recommend using rvest concert polite. polite package ensures ’re respecting robots.txt hammering site many requests.","code":""},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Easily Harvest (Scrape) Web Pages","text":"","code":"# The easiest way to get rvest is to install the whole tidyverse: install.packages(\"tidyverse\")  # Alternatively, install just rvest: install.packages(\"rvest\")"},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"Easily Harvest (Scrape) Web Pages","text":"page contains tabular data can convert directly data frame html_table():","code":"library(rvest)  # Start by reading a HTML page with read_html(): starwars <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\")  # Then find elements that match a css selector or XPath expression # using html_elements(). In this example, each <section> corresponds # to a different film films <- starwars %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased: 1999 ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleased: 20 ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nReleased: 200 ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-05-25\\n ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nReleased: ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased: 1983 ... #> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: 2015- ...  # Then use html_element() to extract one element per film. 
Here # the title is given by the text inside <h2> title <- films %>%    html_element(\"h2\") %>%    html_text2() title #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\"  # Or use html_attr() to get data out of attributes. html_attr() always # returns a string so we convert it to an integer using a readr function episode <- films %>%    html_element(\"h2\") %>%    html_attr(\"data-id\") %>%    readr::parse_integer() episode #> [1] 1 2 3 4 5 6 7 html <- read_html(\"https://en.wikipedia.org/w/index.php?title=The_Lego_Movie&oldid=998422565\")  html %>%    html_element(\".tracklist\") %>%    html_table() #> # A tibble: 29 × 4 #>    No.   Title                       `Performer(s)`                       Length #>    <chr> <chr>                       <chr>                                <chr>  #>  1 1.    \"\\\"Everything Is Awesome\\\"\" \"Tegan and Sara featuring The Lonel… 2:43   #>  2 2.    \"\\\"Prologue\\\"\"              \"\"                                   2:28   #>  3 3.    \"\\\"Emmett's Morning\\\"\"      \"\"                                   2:00   #>  4 4.    \"\\\"Emmett Falls in Love\\\"\"  \"\"                                   1:11   #>  5 5.    \"\\\"Escape\\\"\"                \"\"                                   3:26   #>  6 6.    \"\\\"Into the Old West\\\"\"     \"\"                                   1:00   #>  7 7.    \"\\\"Wyldstyle Explains\\\"\"    \"\"                                   1:21   #>  8 8.    \"\\\"Emmett's Mind\\\"\"         \"\"                                   2:17   #>  9 9.    \"\\\"The Transformation\\\"\"    \"\"                                   1:46   #> 10 10.   
\"\\\"Saloons and Wagons\\\"\"    \"\"                                   3:38   #> # ℹ 19 more rows"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":null,"dir":"Reference","previous_headings":"","what":"Interact with a live web page — LiveHTML","title":"Interact with a live web page — LiveHTML","text":"construct LiveHTML object read_html_live() interact, like human, using methods described . debugging scraping script particularly useful use $view(), open live preview site, can actually see operations performed real site. rvest provides relatively simple methods scrolling, typing, clicking. richer interaction, probably want use package exposes powerful user interface, like selenider.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Interact with a live web page — LiveHTML","text":"session Underlying chromote session object. expert use .","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Interact with a live web page — LiveHTML","text":"LiveHTML$new() LiveHTML$print() LiveHTML$view() LiveHTML$html_elements() LiveHTML$click() LiveHTML$get_scroll_position() LiveHTML$scroll_into_view() LiveHTML$scroll_to() LiveHTML$scroll_by() LiveHTML$type() LiveHTML$press() LiveHTML$clone()","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Interact with a live web page — LiveHTML","text":"initialize object","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — 
LiveHTML","text":"","code":"LiveHTML$new(url)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"url URL page.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-print-","dir":"Reference","previous_headings":"","what":"Method print()","title":"Interact with a live web page — LiveHTML","text":"Called print()ed","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$print(...)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"... Ignored","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-view-","dir":"Reference","previous_headings":"","what":"Method view()","title":"Interact with a live web page — LiveHTML","text":"Display live view site","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$view()"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-html-elements-","dir":"Reference","previous_headings":"","what":"Method html_elements()","title":"Interact with a live web page — LiveHTML","text":"Extract HTML elements current page.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$html_elements(css, 
xpath)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css, xpath CSS selector xpath expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-click-","dir":"Reference","previous_headings":"","what":"Method click()","title":"Interact with a live web page — LiveHTML","text":"Simulate click HTML element.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$click(css, n_clicks = 1)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. n_clicks Number clicks","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-get-scroll-position-","dir":"Reference","previous_headings":"","what":"Method get_scroll_position()","title":"Interact with a live web page — LiveHTML","text":"Get current scroll position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$get_scroll_position()"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-into-view-","dir":"Reference","previous_headings":"","what":"Method scroll_into_view()","title":"Interact with a live web page — LiveHTML","text":"Scroll selected element view.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web 
page — LiveHTML","text":"","code":"LiveHTML$scroll_into_view(css)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-to-","dir":"Reference","previous_headings":"","what":"Method scroll_to()","title":"Interact with a live web page — LiveHTML","text":"Scroll specified location","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$scroll_to(top = 0, left = 0)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"top, left Number pixels top/left respectively.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-by-","dir":"Reference","previous_headings":"","what":"Method scroll_by()","title":"Interact with a live web page — LiveHTML","text":"Scroll specified amount","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$scroll_by(top = 0, left = 0)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"top, left Number pixels scroll /left/right 
respectively.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-type-","dir":"Reference","previous_headings":"","what":"Method type()","title":"Interact with a live web page — LiveHTML","text":"Type text selected element","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$type(css, text)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. text single string containing text type.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-press-","dir":"Reference","previous_headings":"","what":"Method press()","title":"Interact with a live web page — LiveHTML","text":"Simulate pressing single key (including special keys).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$press(css, key_code, modifiers = character())"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. Set NULL key_code Name key. can see complete list known keys https://pptr.dev/api/puppeteer.keyinput/. modifiers character vector modifiers. 
Must one \"Shift\", \"Control\", \"Alt\", \"Meta\".","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Interact with a live web page — LiveHTML","text":"objects class cloneable method.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$clone(deep = FALSE)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"deep Whether make deep clone.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Interact with a live web page — LiveHTML","text":"","code":"if (FALSE) { # To retrieve data for this paginated site, we need to repeatedly push # the \"Load More\" button sess <- read_html_live(\"https://www.bodybuilding.com/exercises/finder\") sess$view()  sess %>% html_elements(\".ExResult-row\") %>% length() sess$click(\".ExLoadMore-btn\") sess %>% html_elements(\".ExResult-row\") %>% length() sess$click(\".ExLoadMore-btn\") sess %>% html_elements(\".ExResult-row\") %>% length() }"},{"path":"https://rvest.tidyverse.org/dev/reference/google_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Make link to google form given id — google_form","title":"Make link to google form given id — google_form","text":"Make link google form given id","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/google_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make link to google form given id — 
google_form","text":"","code":"google_form(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/google_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make link to google form given id — google_form","text":"x Unique identifier form","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element attributes — html_attr","title":"Get element attributes — html_attr","text":"html_attr() gets single attribute; html_attrs() gets attributes.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element attributes — html_attr","text":"","code":"html_attr(x, name, default = NA_character_)  html_attrs(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element attributes — html_attr","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). name Name attribute retrieve. 
default string used default value attribute exist every element.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element attributes — html_attr","text":"character vector (html_attr()) list (html_attrs()) length x.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element attributes — html_attr","text":"","code":"html <- minimal_html('<ul>   <li><a href=\"https://a.com\" class=\"important\">a<\/a><\/li>   <li class=\"active\"><a href=\"https://c.com\">b<\/a><\/li>   <li><a href=\"https://c.com\">b<\/a><\/li>   <\/ul>')  html %>% html_elements(\"a\") %>% html_attrs() #> [[1]] #>            href           class  #> \"https://a.com\"     \"important\"  #>  #> [[2]] #>            href  #> \"https://c.com\"  #>  #> [[3]] #>            href  #> \"https://c.com\"  #>   html %>% html_elements(\"a\") %>% html_attr(\"href\") #> [1] \"https://a.com\" \"https://c.com\" \"https://c.com\" html %>% html_elements(\"li\") %>% html_attr(\"class\") #> [1] NA       \"active\" NA       html %>% html_elements(\"li\") %>% html_attr(\"class\", default = \"inactive\") #> [1] \"inactive\" \"active\"   \"inactive\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element children — html_children","title":"Get element children — html_children","text":"Get element children","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element children — html_children","text":"","code":"html_children(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element 
children — html_children","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element children — html_children","text":"","code":"html <- minimal_html(\"<ul><li>1<li>2<li>3<\/ul>\") ul <- html_elements(html, \"ul\") html_children(ul) #> {xml_nodeset (3)} #> [1] <li>1<\/li>\\n #> [2] <li>2<\/li>\\n #> [3] <li>3<\/li>  html <- minimal_html(\"<p>Hello <b>Hadley<\/b><i>!<\/i>\") p <- html_elements(html, \"p\") html_children(p) #> {xml_nodeset (2)} #> [1] <b>Hadley<\/b> #> [2] <i>!<\/i>"},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":null,"dir":"Reference","previous_headings":"","what":"Select elements from an HTML document — html_element","title":"Select elements from an HTML document — html_element","text":"html_element() html_elements() find HTML element using CSS selectors XPath expressions. CSS selectors particularly useful conjunction https://selectorgadget.com/, makes easy discover selector need.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Select elements from an HTML document — html_element","text":"","code":"html_element(x, css, xpath)  html_elements(x, css, xpath)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Select elements from an HTML document — html_element","text":"x Either document, node set single node. css, xpath Elements select. 
Supply one css xpath depending whether want use CSS selector XPath 1.0 expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Select elements from an HTML document — html_element","text":"html_element() returns nodeset length input. html_elements() flattens output direct way map output input.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"css-selector-support","dir":"Reference","previous_headings":"","what":"CSS selector support","title":"Select elements from an HTML document — html_element","text":"CSS selectors translated XPath selectors selectr package, port python cssselect library, https://pythonhosted.org/cssselect/. implements majority CSS3 selectors, described https://www.w3.org/TR/2011/REC-css3-selectors-20110929/. exceptions listed : Pseudo selectors require interactivity ignored: :hover, :active, :focus, :target, :visited. 
following pseudo classes work wild card element, *: *:first--type, *:last--type, *:nth--type, *:nth-last--type, *:--type supports :contains(text) can use !=, [foo!=bar] :([foo=bar]) :() accepts sequence simple selectors, just single simple selector.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Select elements from an HTML document — html_element","text":"","code":"html <- minimal_html(\"   <h1>This is a heading<\/h1>   <p id='first'>This is a paragraph<\/p>   <p class='important'>This is an important paragraph<\/p> \")  html %>% html_element(\"h1\") #> {html_node} #> <h1> html %>% html_elements(\"p\") #> {xml_nodeset (2)} #> [1] <p id=\"first\">This is a paragraph<\/p> #> [2] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\".important\") #> {xml_nodeset (1)} #> [1] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\"#first\") #> {xml_nodeset (1)} #> [1] <p id=\"first\">This is a paragraph<\/p>  # html_element() vs html_elements() -------------------------------------- html <- minimal_html(\"   <ul>     <li><b>C-3PO<\/b> is a <i>droid<\/i> that weighs <span class='weight'>167 kg<\/span><\/li>     <li><b>R2-D2<\/b> is a <i>droid<\/i> that weighs <span class='weight'>96 kg<\/span><\/li>     <li><b>Yoda<\/b> weighs <span class='weight'>66 kg<\/span><\/li>     <li><b>R4-P17<\/b> is a <i>droid<\/i><\/li>   <\/ul> \") li <- html %>% html_elements(\"li\")  # When applied to a node set, html_elements() returns all matching elements # beneath any of the inputs, flattening results into a new node set. li %>% html_elements(\"i\") #> {xml_nodeset (3)} #> [1] <i>droid<\/i> #> [2] <i>droid<\/i> #> [3] <i>droid<\/i>  # When applied to a node set, html_element() always returns a vector the # same length as the input, using a \"missing\" element where needed. 
li %>% html_element(\"i\") #> {xml_nodeset (4)} #> [1] <i>droid<\/i> #> [2] <i>droid<\/i> #> [3] NA #> [4] <i>droid<\/i> # and html_text() and html_attr() will return NA li %>% html_element(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" NA      \"droid\" li %>% html_element(\"span\") %>% html_attr(\"class\") #> [1] \"weight\" \"weight\" \"weight\" NA"},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":null,"dir":"Reference","previous_headings":"","what":"Guess faulty character encoding — html_encoding_guess","title":"Guess faulty character encoding — html_encoding_guess","text":"html_encoding_guess() helps handle web pages declare incorrect encoding. Use html_encoding_guess() generate list possible encodings, try using encoding argument read_html(). html_encoding_guess() replaces deprecated guess_encoding().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Guess faulty character encoding — html_encoding_guess","text":"","code":"html_encoding_guess(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Guess faulty character encoding — html_encoding_guess","text":"x character vector.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Guess faulty character encoding — html_encoding_guess","text":"","code":"# A file with bad encoding included in the package path <- system.file(\"html-ex\", \"bad-encoding.html\", package = \"rvest\") x <- read_html(path) x %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célèbre déjà vu.\"  html_encoding_guess(x) #>        encoding language confidence #> 1         UTF-8                1.00 #> 2  windows-1252       fr       0.31 #> 3 
 windows-1250       ro       0.22 #> 4      UTF-16BE                0.10 #> 5      UTF-16LE                0.10 #> 6       GB18030       zh       0.10 #> 7          Big5       zh       0.10 #> 8  windows-1254       tr       0.06 #> 9    IBM424_rtl       he       0.01 #> 10   IBM424_ltr       he       0.01 # Two valid encodings, only one of which is correct read_html(path, encoding = \"ISO-8859-1\") %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célèbre déjà vu.\" read_html(path, encoding = \"ISO-8859-2\") %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célčbre déjŕ vu.\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse forms and set values — html_form","title":"Parse forms and set values — html_form","text":"Use html_form() extract form, set values html_form_set(), submit html_form_submit().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse forms and set values — html_form","text":"","code":"html_form(x, base_url = NULL)  html_form_set(form, ...)  html_form_submit(form, submit = NULL)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse forms and set values — html_form","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). base_url Base url underlying HTML document. default, NULL, uses url HTML document underlying x. form form ... <dynamic-dots> Name-value pairs giving fields modify. Provide character vector set multiple checkboxes set select multiple values multi-select. submit button used submit form? NULL, default, uses first button. string selects button name. 
number selects button using relative position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse forms and set values — html_form","text":"html_form() returns S3 object class rvest_form applied single element. returns list rvest_form objects applied multiple elements document. html_form_set() returns rvest_form object. html_form_submit() submits form, returning httr response can parsed read_html().","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Parse forms and set values — html_form","text":"","code":"html <- read_html(\"http://www.google.com\") search <- html_form(html)[[1]]  search <- search %>% html_form_set(q = \"My little pony\", hl = \"fr\") #> Warning: Setting value of hidden field \"hl\".  # Or if you have a list of values, use !!! vals <- list(q = \"web scraping\", hl = \"en\") search <- search %>% html_form_set(!!!vals) #> Warning: Setting value of hidden field \"hl\".  
# To submit and get result: if (FALSE) { resp <- html_form_submit(search) read_html(resp) }"},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element name — html_name","title":"Get element name — html_name","text":"Get element name","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element name — html_name","text":"","code":"html_name(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element name — html_name","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element name — html_name","text":"character vector length x","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element name — html_name","text":"","code":"url <- \"https://rvest.tidyverse.org/articles/starwars.html\" html <- read_html(url)  html %>%   html_element(\"div\") %>%   html_children() %>%   html_name() #> [1] \"a\"      \"small\"  \"button\" \"div\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse an html table into a data frame — html_table","title":"Parse an html table into a data frame — html_table","text":"algorithm mimics browser , repeats values merged cells every cell cover.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse an html table into a data frame — 
html_table","text":"","code":"html_table(   x,   header = NA,   trim = TRUE,   fill = deprecated(),   dec = \".\",   na.strings = \"NA\",   convert = TRUE )"},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse an html table into a data frame — html_table","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). header Use first row header? NA, use first row consists <th> tags. TRUE, column names left exactly source document, may require post-processing generate valid data frame. trim Remove leading trailing whitespace within cell? fill Deprecated - missing cells tables now always automatically filled NA. dec character used decimal place marker. na.strings Character vector values converted NA convert TRUE. convert TRUE, run type.convert() interpret texts integer, double, NA.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse an html table into a data frame — html_table","text":"applied single element, html_table() returns single tibble. 
applied multiple elements document, html_table() returns list tibbles.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Parse an html table into a data frame — html_table","text":"","code":"sample1 <- minimal_html(\"<table>   <tr><th>Col A<\/th><th>Col B<\/th><\/tr>   <tr><td>1<\/td><td>x<\/td><\/tr>   <tr><td>4<\/td><td>y<\/td><\/tr>   <tr><td>10<\/td><td>z<\/td><\/tr> <\/table>\") sample1 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 2 #>   `Col A` `Col B` #>     <int> <chr>   #> 1       1 x       #> 2       4 y       #> 3      10 z        # Values in merged cells will be duplicated sample2 <- minimal_html(\"<table>   <tr><th>A<\/th><th>B<\/th><th>C<\/th><\/tr>   <tr><td>1<\/td><td>2<\/td><td>3<\/td><\/tr>   <tr><td colspan='2'>4<\/td><td>5<\/td><\/tr>   <tr><td>6<\/td><td colspan='2'>7<\/td><\/tr> <\/table>\") sample2 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 3 #>       A     B     C #>   <int> <int> <int> #> 1     1     2     3 #> 2     4     4     5 #> 3     6     7     7  # If a row is missing cells, they'll be filled with NAs sample3 <- minimal_html(\"<table>   <tr><th>A<\/th><th>B<\/th><th>C<\/th><\/tr>   <tr><td colspan='2'>1<\/td><td>2<\/td><\/tr>   <tr><td colspan='2'>3<\/td><\/tr>   <tr><td>4<\/td><\/tr> <\/table>\") sample3 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 3 #>       A     B     C #>   <int> <int> <int> #> 1     1     1     2 #> 2     3     3    NA #> 3     4    NA    NA"},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element text — html_text","title":"Get element text — html_text","text":"two ways retrieve text element: html_text() html_text2(). html_text() thin wrapper around xml2::xml_text() returns just raw underlying text. 
html_text2() simulates text looks browser, using approach inspired JavaScript's innerText(). Roughly speaking, converts <br /> \"\\n\", adds blank lines around <p> tags, lightly formats tabular data. html_text2() usually want, much slower html_text() simple applications performance important may want use html_text() instead.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element text — html_text","text":"","code":"html_text(x, trim = FALSE)  html_text2(x, preserve_nbsp = FALSE)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element text — html_text","text":"x document, node, node set. trim TRUE trim leading trailing spaces. preserve_nbsp non-breaking spaces preserved? default, html_text2() converts ordinary spaces ease computation. preserve_nbsp TRUE, &nbsp; appear strings \"\\ua0\". often causes confusion prints way \" \".","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element text — html_text","text":"character vector length x","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element text — html_text","text":"","code":"# To understand the difference between html_text() and html_text2() # take the following html:  html <- minimal_html(   \"<p>This is a paragraph.     This another sentence.<br>This should start on a new line\" )  # html_text() returns the raw underlying text, which includes whitespace # that would be ignored by a browser, and ignores the <br> html %>% html_element(\"p\") %>% html_text() %>% writeLines() #> This is a paragraph. 
#>     This another sentence.This should start on a new line  # html_text2() simulates what a browser would display. Non-significant # whitespace is collapsed, and <br> is turned into a line break html %>% html_element(\"p\") %>% html_text2() %>% writeLines() #> This is a paragraph. This another sentence. #> This should start on a new line  # By default, html_text2() also converts non-breaking spaces to regular # spaces: html <- minimal_html(\"<p>x&nbsp;y<\/p>\") x1 <- html %>% html_element(\"p\") %>% html_text() x2 <- html %>% html_element(\"p\") %>% html_text2()  # When printed, non-breaking spaces look exactly like regular spaces x1 #> [1] \"x y\" x2 #> [1] \"x y\" # But aren't actually the same: x1 == x2 #> [1] FALSE # Which you can confirm by looking at their underlying binary # representation: charToRaw(x1) #> [1] 78 c2 a0 79 charToRaw(x2) #> [1] 78 20 79"},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an HTML document from inline HTML — minimal_html","title":"Create an HTML document from inline HTML — minimal_html","text":"Create HTML document inline HTML","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an HTML document from inline HTML — minimal_html","text":"","code":"minimal_html(html, title = \"\")"},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an HTML document from inline HTML — minimal_html","text":"html HTML contents page. 
title Page title (required HTML spec).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create an HTML document from inline HTML — minimal_html","text":"","code":"minimal_html(\"<p>test<\/p>\") #> {html_document} #> <html> #> [1] <head>\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset ... #> [2] <body><p>test<\/p><\/body>"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":null,"dir":"Reference","previous_headings":"","what":"Static web scraping (with xml2) — read_html","title":"Static web scraping (with xml2) — read_html","text":"read_html() works performing HTTP request parsing HTML received using xml2 package. \"static\" scraping operates raw HTML file. works sites, cases need use read_html_live() parts page want scrape dynamically generated javascript. Generally, recommend using read_html() works, faster robust, fewer external dependencies (.e. rely Chrome web browser installed computer.)","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Static web scraping (with xml2) — read_html","text":"","code":"read_html(x, encoding = \"\", ..., options = c(\"RECOVER\", \"NOERROR\", \"NOBLANKS\"))"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Static web scraping (with xml2) — read_html","text":"x Usually string representing URL. See xml2::read_html() options. encoding Specify default encoding document. Unless otherwise specified XML documents assumed UTF-8 UTF-16. document UTF-8/16, lacks explicit encoding directive, allows supply default. ... Additional arguments passed methods. options Set parsing options libxml2 parser. 
Zero RECOVER recover errors NOENT substitute entities DTDLOAD load external subset DTDATTR default DTD attributes DTDVALID validate DTD NOERROR suppress error reports NOWARNING suppress warning reports PEDANTIC pedantic error reporting NOBLANKS remove blank nodes SAX1 use SAX1 interface internally XINCLUDE Implement XInclude substitution NONET Forbid network access NODICT reuse context dictionary NSCLEAN remove redundant namespaces declarations NOCDATA merge CDATA text nodes NOXINCNODE generate XINCLUDE START/END nodes COMPACT compact small text nodes; modification tree allowed afterwards (possibly crash try modify tree) OLD10 parse using XML-1.0 update 5 NOBASEFIX fixup XINCLUDE xml:base uris HUGE relax hardcoded limit parser OLDSAX parse using SAX2 interface 2.7.0 IGNORE_ENC ignore internal document encoding hint BIG_LINES Store big lines numbers text PSVI field","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Static web scraping (with xml2) — read_html","text":"","code":"# Start by reading a HTML page with read_html(): starwars <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\")  # Then find elements that match a css selector or XPath expression # using html_elements(). In this example, each <section> corresponds # to a different film films <- starwars %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleas ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nRelease ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-0 ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nRel ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased ... 
#> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: ...  # Then use html_element() to extract one element per film. Here # we the title is given by the text inside <h2> title <- films %>%   html_element(\"h2\") %>%   html_text2() title #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\"        # Or use html_attr() to get data out of attributes. html_attr() always # returns a string so we convert it to an integer using a readr function episode <- films %>%   html_element(\"h2\") %>%   html_attr(\"data-id\") %>%   readr::parse_integer() episode #> [1] 1 2 3 4 5 6 7"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":null,"dir":"Reference","previous_headings":"","what":"Live web scraping (with chromote) — read_html_live","title":"Live web scraping (with chromote) — read_html_live","text":"read_html() operates HTML source code downloaded server. works websites can fail site uses javascript generate HTML. read_html_live() provides alternative interface runs live web browser (Chrome) background. allows access elements HTML page generated dynamically javascript interact live page clicking buttons typing forms. 
Behind scenes, function uses chromote package, requires copy Google Chrome installed machine.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Live web scraping (with chromote) — read_html_live","text":"","code":"read_html_live(url)"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Live web scraping (with chromote) — read_html_live","text":"url Website url read .","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Live web scraping (with chromote) — read_html_live","text":"read_html_live() returns R6 LiveHTML object. can interact object using usual rvest functions, call methods, like $click(), $scroll_to(), $type() interact live page like human .","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Live web scraping (with chromote) — read_html_live","text":"","code":"if (FALSE) { # When we retrieve the raw HTML for this site, it doesn't contain the # data we're interested in: static <- read_html(\"https://www.forbes.com/top-colleges/\") static %>% html_elements(\".TopColleges2023_tableRow__BYOSU\")  # Instead, we need to run the site in a real web browser, causing it to # download a JSON file and then dynamically generate the html:  sess <- read_html_live(\"https://www.forbes.com/top-colleges/\") sess$view() rows <- sess %>% html_elements(\".TopColleges2023_tableRow__BYOSU\") rows %>% html_element(\".TopColleges2023_organizationName__J1lEV\") %>% html_text() rows %>% html_element(\".grant-aid\") %>% html_text() 
}"},{"path":"https://rvest.tidyverse.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. magrittr %>% xml2 url_absolute","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rename.html","id":null,"dir":"Reference","previous_headings":"","what":"Functions renamed in rvest 1.0.0 — rename","title":"Functions renamed in rvest 1.0.0 — rename","text":"rvest 1.0.0 renamed number functions ensure every function common prefix, matching tidyverse conventions emerged since rvest first created. set_values() -> html_form_set() submit_form() -> session_submit() xml_tag() -> html_name() xml_node() & html_node() -> html_element() xml_nodes() & html_nodes() -> html_elements() (html_node() html_nodes() superseded widely used.) Additionally session related functions gained common prefix: html_session() -> session() forward() -> session_forward() back() -> session_back() jump_to() -> session_jump_to() follow_link() -> session_follow_link()","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rename.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Functions renamed in rvest 1.0.0 — rename","text":"","code":"set_values(form, ...)  submit_form(session, form, submit = NULL, ...)  xml_tag(x)  xml_node(...)  xml_nodes(...)  html_nodes(...)  html_node(...)  back(x)  forward(x)  jump_to(x, url, ...)  follow_link(x, ...)  html_session(url, ...)"},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":null,"dir":"Reference","previous_headings":"","what":"Repair faulty encoding — repair_encoding","title":"Repair faulty encoding — repair_encoding","text":"function deprecated work. 
Instead re-read HTML file correct encoding argument.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Repair faulty encoding — repair_encoding","text":"","code":"repair_encoding(x, from = NULL)"},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Repair faulty encoding — repair_encoding","text":"encoding string actually . NULL, guess_encoding used.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rvest-package.html","id":null,"dir":"Reference","previous_headings":"","what":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","title":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","text":"Wrappers around 'xml2' 'httr' packages make easy download, manipulate, HTML XML.","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/rvest-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","text":"Maintainer: Hadley Wickham hadley@posit.co contributors: Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate a session in web browser — session","title":"Simulate a session in web browser — session","text":"set functions allows simulate user interacting website, using forms navigating page page. Create session session(url) Navigate specified url session_jump_to(), follow link page session_follow_link(). Submit html_form session_submit(). View history session_history() navigate back forward session_back() session_forward(). Extract page contents html_element() html_elements(), get complete HTML document read_html(). 
Inspect HTTP response httr::cookies(), httr::headers(), httr::status_code().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate a session in web browser — session","text":"","code":"session(url, ...)  is.session(x)  session_jump_to(x, url, ...)  session_follow_link(x, i, css, xpath, ...)  session_back(x)  session_forward(x)  session_history(x)  session_submit(x, form, submit = NULL, ...)"},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate a session in web browser — session","text":"url URL, either relative absolute, navigate . ... additional httr config use throughout session. x session. integer select ith link string match first link containing text (case sensitive). css, xpath Elements select. Supply one css xpath depending whether want use CSS selector XPath 1.0 expression. form html_form submit submit button used submit form? NULL, default, uses first button. string selects button name. number selects button using relative position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate a session in web browser — session","text":"","code":"s <- session(\"http://hadley.nz\") s %>%   session_jump_to(\"hadley-wickham.jpg\") %>%   session_jump_to(\"/\") %>%   session_history() #> Warning: Not Found (HTTP 404). #>   https://hadley.nz/ #>   https://hadley.nz/hadley-wickham.jpg #> - https://hadley.nz/  s %>%   session_jump_to(\"hadley-wickham.jpg\") %>%   session_back() %>%   session_history() #> Warning: Not Found (HTTP 404). #> - https://hadley.nz/ #>   https://hadley.nz/hadley-wickham.jpg  # \\donttest{ s %>%   session_follow_link(css = \"p a\") %>%   html_elements(\"p\") #> Navigating to <http://rstudio.com>. 
#> {xml_nodeset (16)} #>  [1] <p class=\"h5\">See you in Seattle August 12-14!<\/p> #>  [2] <p>Securely share data-science applications<br>\\n across your team ... #>  [3] <p>Our code is your code. Build on it. Share it. Improve people’s  ... #>  [4] <p>Take the time and effort out of uploading, storing, accessing,  ... #>  [5] <p class=\"sh4 uppercase mb-[8px] text-blue1\">\\n            Custome ... #>  [6] <p class=\"mt-[8px] body-md-regular text-blue1/[.62]\">\\n            ... #>  [7] <p class=\"mt-[16px] body-md-regular text-neutral-blue62 line-clamp ... #>  [8] <p class=\"mt-[16px] body-md-regular text-neutral-blue62 line-clamp ... #>  [9] <p class=\"description body-lg-regular text-neutral-light/70\" style ... #> [10] <p class=\"body-sm-regular text-blue1/[.62] mt-[25px]\">\\n           ... #> [11] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [12] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [13] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [14] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [15] <p class=\"ui-small uppercase text-blue1\">\\n                    con ... #> [16] <p class=\"body-md-regular body-sm-regular\">We use cookies to bring ... # }"},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-development-version","dir":"Changelog","previous_headings":"","what":"rvest (development version)","title":"rvest (development version)","text":"New read_html_live() reads HTML real, live, HTML browser, meaning can scrape HTML generated javascript. returns LiveHTML object can also use simulate user interactions page, like clicking, typing, scrolling (#245). 
html_table() discards rows without cells (@epiben, #360).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-103","dir":"Changelog","previous_headings":"","what":"rvest 1.0.3","title":"rvest 1.0.3","text":"CRAN release: 2022-08-19 Re-document fix HTML issues .Rd.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-102","dir":"Changelog","previous_headings":"","what":"rvest 1.0.2","title":"rvest 1.0.2","text":"CRAN release: 2021-10-16 Fixes CRAN html_table() converts empty tables empty tibbles (@epiben, #327).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-101","dir":"Changelog","previous_headings":"","what":"rvest 1.0.1","title":"rvest 1.0.1","text":"CRAN release: 2021-07-26 html_table() correctly handles tables cells contain blank values rowspan /colspan, e.g. <td rowspan=\"\"> parsed <td rowspan=1> (@epiben, #323). Fix broken example","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-100","dir":"Changelog","previous_headings":"","what":"rvest 1.0.0","title":"rvest 1.0.0","text":"CRAN release: 2021-03-09","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"new-features-1-0-0","dir":"Changelog","previous_headings":"","what":"New features","title":"rvest 1.0.0","text":"New html_text2() provides natural rendering HTML nodes text, converting <br> “”, removing non-significant whitespace (#175). default, also converts &nbsp; regular spaces, can suppress preserve_nbsp = TRUE (#284). html_table() re-written scratch closely mimic algorithm browsers use parsing tables. mean far fewer tables fails produce output (#63, #204, #215). fill argument deprecated since longer needed. html_table() now returns tibble rather data frame compatible rest tidyverse (#199). performance considerably improved (#237). also gains na.strings argument control values converted NA (#107), convert argument control whether run conversion (#311). 
New html_form_submit() allows submit form directly, without needing create session (#300). rvest now licensed MIT (#287).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"api-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"API changes","title":"rvest 1.0.0","text":"Since 1.0.0 release, included large number API changes make rvest compatible current tidyverse conventions. Older functions deprecated, existing code continue work (albeit new warnings). rvest now imports xml2 rather depending . cleaner avoids attaching xml2 functions ’re less likely use. reduce change breakages, rvest re-exports xml2 functions read_html() url_absolute(), code may now need explicit library(xml2). html_form() now returns object class rvest_form (instead form). Fields within form now class rvest_field, instead variety classes lacking rvest_ prefix. functions working forms common html_form_ prefix: set_values() became html_form_set(). submit_form() renamed session_submit() returns session. html_node() html_nodes() superseded favor html_element() html_elements() since (almost) always return elements, nodes (#298). html_session() now session() returns object class rvest_session (instead session). functions work session objects now common session_ prefix. Long deprecated html(), html_tag(), xml() functions removed. minimal_html() (doesn’t appear used package) arguments flipped make intuitive. guess_encoding() renamed html_encoding_guess() avoid clash stringr::guess_encoding() (#209). repair_encoding() deprecated doesn’t appear work. pluck() longer exported avoid clash purrr::pluck(); need use purrr::map_chr() friends instead (#209). 
xml_tag(), xml_node(), xml_nodes() formally deprecated favor html_ equivalents.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"minor-improvements-and-bug-fixes-1-0-0","dir":"Changelog","previous_headings":"","what":"Minor improvements and bug fixes","title":"rvest 1.0.0","text":"“harvesting web” vignette rewritten focus basics rvest, eliminating screenshots keep installed package svelte possible. ’s also renamed vignette(\"rvest\") since ’s vignette read first. SelectorGadget vignette now web-article, https://rvest.tidyverse.org/articles/articles/selectorgadget.html, can generous screenshots since ’re longer bundled every install package. Together rewrite vignette, means rvest now ~90 Kb instead ~1.1 Mb. uses IMDB eliminated since site explicitly prohibits scraping (#195). session_submit() errors form doesn’t url (#288). New session_forward() function complement session_back(). now allows pick submission button position (#156). ... argument deprecated; please use config instead. html_form_set() can now accept character vectors allowing select multiple checkboxes set select multiple values multi-<select> (#127, help @juba). also uses dynamic dots can use !!! 
list values (#189).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-036","dir":"Changelog","previous_headings":"","what":"rvest 0.3.6","title":"rvest 0.3.6","text":"CRAN release: 2020-07-25 Remove failing example","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-035","dir":"Changelog","previous_headings":"","what":"rvest 0.3.5","title":"rvest 0.3.5","text":"CRAN release: 2019-11-08 Use web archive fix broken example.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-034","dir":"Changelog","previous_headings":"","what":"rvest 0.3.4","title":"rvest 0.3.4","text":"CRAN release: 2019-05-15 Remove unneeded read_xml.response() method (#242).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-033","dir":"Changelog","previous_headings":"","what":"rvest 0.3.3","title":"rvest 0.3.3","text":"CRAN release: 2019-04-11 Fix R CMD check failure submit_request() now checks empty form-field-types select correct submit fields (@rentrop, #159)","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-032","dir":"Changelog","previous_headings":"","what":"rvest 0.3.2","title":"rvest 0.3.2","text":"CRAN release: 2016-06-17 Fixes follow_link() back() correctly manage session history. ’re using xml2 1.0.0, html_node() now return “missing node”. Parse rowspans colspans effectively filling using repetition left right (colspan) top bottom (rowspan) (#111) Updated examples demos website structure changed. Made compatible xml2 0.1.2 1.0.0.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-031","dir":"Changelog","previous_headings":"","what":"rvest 0.3.1","title":"rvest 0.3.1","text":"CRAN release: 2015-11-11 Fix invalid link SSA example. Parse <options> don’t value attribute (#85). 
Remove remaining uses html() favor read_html() (@jimhester, #113).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-030","dir":"Changelog","previous_headings":"","what":"rvest 0.3.0","title":"rvest 0.3.0","text":"CRAN release: 2015-09-23 rvest rewritten take advantage new xml2 package. xml2 provides fresh binding libxml2, avoiding many work-arounds previously needed XML package. Now rvest depends xml2 package, xml functions available, rvest adds thin wrapper html. number functions change names. old versions still work, deprecated removed rvest 0.4.0. html_tag() -> html_name() html() -> read_html() html_node() now throws error matches, warning ’s one match. think make likely fail clearly structure page changes. xml_structure() moved xml2. New html_structure() (also xml2) highlights id class attributes (#78). submit_form() now works forms use GET (#66). submit_request() (hence submit_form()) now case-insensitive, find <input type=SUBMIT> well <input type=\"submit\">. submit_request() (hence submit_form()) recognizes forms <input type=\"image\"> valid form submission button.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-020","dir":"Changelog","previous_headings":"","what":"rvest 0.2.0","title":"rvest 0.2.0","text":"CRAN release: 2015-01-01","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"new-features-0-2-0","dir":"Changelog","previous_headings":"","what":"New features","title":"rvest 0.2.0","text":"html() xml() pass ... httr::GET() can finely control request (#48). Add xml support: parse xml(), work using xml_node(), xml_attr(), xml_attrs(), xml_text() xml_tag() (#24). xml_structure(): new function displays structure (.e. 
tag attribute names) xml/html object (#10).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"bug-fixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"rvest 0.2.0","text":"follow_link() now accepts css xpath selectors. (#38, #41, #42) html() better job dealing encodings (passing problem XML::parseHTML()) instead trying (#25, #50). html_attr() returns default value input NULL (#49) Add missing html_node() method session. html_nodes() now returns empty list elements found (#31). submit_form() converts relative paths absolute URLs (#52). also deals better 0-length inputs (#29).","code":""}]
+[{"path":[]},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. 
Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. 
public apology may requested.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. 
Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://rvest.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired Mozilla’s code conduct enforcement ladder (https://github.com/mozilla/inclusion). answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to rvest","title":"Contributing to rvest","text":"outlines propose change rvest. detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to rvest","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to rvest","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. 
’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to rvest","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"tidyverse/rvest\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). recommend using usethis::pr_init(\"brief-description--change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (.e. just first header). Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to rvest","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://rvest.tidyverse.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to rvest","text":"Please note rvest project released Contributor Code Conduct. 
contributing project agree abide terms.","code":""},{"path":"https://rvest.tidyverse.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 rvest authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":null,"dir":"","previous_headings":"","what":"Getting help with rvest","title":"Getting help with rvest","text":"Thanks using rvest! filing issue, places explore pieces put together make process smooth possible.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"make-a-reprex","dir":"","previous_headings":"","what":"Make a reprex","title":"Getting help with rvest","text":"Start making minimal reproducible example using reprex package. haven’t heard used reprex , ’re treat! Seriously, reprex make R-question-asking endeavors easier (pretty insane ROI five ten minutes ’ll take learn ’s ). additional reprex pointers, check Get help! section tidyverse site.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"where-to-ask","dir":"","previous_headings":"","what":"Where to ask?","title":"Getting help with rvest","text":"Armed reprex, next step figure ask. 
’s question: start community.rstudio.com, /StackOverflow. people answer questions. ’s bug: ’re right place, file issue. ’re sure: let community help figure ! problem bug feature request, can easily return report . opening new issue, sure search issues pull requests make sure bug hasn’t reported /already fixed development version. default, search pre-populated :issue :open. can edit qualifiers (e.g. :pr, :closed) needed. example, ’d simply remove :open search issues repo, open closed.","code":""},{"path":"https://rvest.tidyverse.org/dev/SUPPORT.html","id":"what-happens-next","dir":"","previous_headings":"","what":"What happens next?","title":"Getting help with rvest","text":"efficient possible, development tidyverse packages tends bursty, shouldn’t worry don’t get immediate response. Typically don’t look repo sufficient quantity issues accumulates, ’s burst intense activity focus efforts. makes development efficient avoids expensive context switching problems, cost taking longer get back . process makes good reprex particularly important might multiple months initial report start working . can’t reproduce bug, can’t fix !","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"html-basics","dir":"Articles","previous_headings":"","what":"HTML basics","title":"Web scraping 101","text":"HTML stands “HyperText Markup Language” looks like : HTML hierarchical structure formed elements consist start tag (e.g. <tag>), optional attributes (id='first'), end tag1 (like <\/tag>), contents (everything start end tag). Since < > used start end tags, can’t write directly. Instead use HTML escapes &gt; (greater ) &lt; (less ). since escapes use &, want literal ampersand escape &amp;. 
wide range possible HTML escapes don’t need worry much rvest automatically handles .","code":"<html> <head>   <title>Page title<\/title> <\/head> <body>   <h1 id='first'>A heading<\/h1>   <p>Some text &amp; <b>some bold text.<\/b><\/p>   <img src='myimg.png' width='100' height='100'> <\/body>"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"elements","dir":"Articles","previous_headings":"HTML basics","what":"Elements","title":"Web scraping 101","text":", 100 HTML elements. important : Every HTML page must <html> element, must two children: <head>, contains document metadata like page title, <body>, contains content see browser. Block tags like <h1> (heading 1), <p> (paragraph), <ol> (ordered list) form overall structure page. Inline tags like <b> (bold), <i> (italics), <a> (links) format text inside block tags. encounter tag ’ve never seen , can find little googling. recommend MDN Web Docs produced Mozilla, company makes Firefox web browser.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"contents","dir":"Articles","previous_headings":"HTML basics","what":"Contents","title":"Web scraping 101","text":"elements can content start end tags. content can either text elements. example, following HTML contains paragraph text, one word bold. Hi! name Hadley. children node refers elements, <p> element one child, <b> element. <b> element children, contents (text “name”). elements, like <img> can’t children. elements depend solely attributes behavior.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"attributes","dir":"Articles","previous_headings":"HTML basics","what":"Attributes","title":"Web scraping 101","text":"Tags can named attributes look like name1='value1' name2='value2'. Two important attributes id class, used conjunction CSS (Cascading Style Sheets) control visual appearance page. 
often useful scraping data page.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"reading-html-with-rvest","dir":"Articles","previous_headings":"","what":"Reading HTML with rvest","title":"Web scraping 101","text":"’ll usually start scraping process read_html(). returns xml_document2 object ’ll manipulate using rvest functions: examples experimentation, rvest also includes function lets create xml_document literal HTML: Regardless get HTML, ’ll need way identify elements contain data care . rvest provides two options: CSS selectors XPath expressions. ’ll focus CSS selectors ’re simpler still sufficiently powerful scraping tasks.","code":"html <- read_html(\"http://rvest.tidyverse.org/\") class(html) #> [1] \"xml_document\" \"xml_node\" html <- minimal_html(\"   <p>This is a paragraph<p>   <ul>     <li>This is a bulleted list<\/li>   <\/ul> \") html #> {html_document} #> <html> #> [1] <head>\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset ... #> [2] <body>\\n<p>This is a paragraph<\/p>\\n<p>\\n  <\/p>\\n<ul>\\n<li>This is  ..."},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"css-selectors","dir":"Articles","previous_headings":"","what":"CSS selectors","title":"Web scraping 101","text":"CSS short cascading style sheets, tool defining visual styling HTML documents. CSS includes miniature language selecting elements page called CSS selectors. CSS selectors define patterns locating HTML elements, useful scraping provide concise way describing elements want extract. CSS selectors can quite complex, fortunately need simplest rvest, can also write R code complicated situations. four important selectors : p: selects <p> elements. .title: selects elements class “title”. p.special: selects <p> elements class “special”. #title: selects element id attribute equals “title”. Id attributes must unique within document, ever select single element. 
want learn CSS selectors recommend starting fun CSS dinner tutorial referring MDN web docs. Let’s try important selectors simple example: rvest can extract single element html_element() matching elements html_elements(). functions take document3 css selector: Selectors can also combined various ways using combinators. example, important combinator ” “, descendant combination, p a selects <a> elements child <p> element. don’t know exactly selector need, highly recommend using SelectorGadget, lets automatically generate selector need supplying positive negative examples browser.","code":"html <- minimal_html(\"   <h1>This is a heading<\/h1>   <p id='first'>This is a paragraph<\/p>   <p class='important'>This is an important paragraph<\/p> \") html %>% html_element(\"h1\") #> {html_node} #> <h1> html %>% html_elements(\"p\") #> {xml_nodeset (2)} #> [1] <p id=\"first\">This is a paragraph<\/p> #> [2] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\".important\") #> {xml_nodeset (1)} #> [1] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\"#first\") #> {xml_nodeset (1)} #> [1] <p id=\"first\">This is a paragraph<\/p>"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"extracting-data","dir":"Articles","previous_headings":"","what":"Extracting data","title":"Web scraping 101","text":"Now ’ve got elements care , ’ll need get data . ’ll usually get data either text contents attribute. , sometimes (’re lucky!), data need HTML table.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"text","dir":"Articles","previous_headings":"Extracting data","what":"Text","title":"Web scraping 101","text":"Use html_text2() extract plain text contents HTML element: Note escaped ampersand automatically converted &; ’ll ever see HTML escapes source HTML, data returned rvest. 
might wonder used html_text2(), since seems give result html_text(): main difference two functions handle white space. HTML, white space largely ignored, ’s structure elements defines text laid . html_text2() best follow rules, giving something similar ’d see browser. Take example contains bunch white space HTML ignores. html_text2() gives expect: two paragraphs text separated blank line. Whereas html_text() returns garbled raw underlying text:","code":"html <- minimal_html(\"   <ol>     <li>apple &amp; pear<\/li>     <li>banana<\/li>     <li>pineapple<\/li>   <\/ol> \") html %>%    html_elements(\"li\") %>%    html_text2() #> [1] \"apple & pear\" \"banana\"       \"pineapple\" html %>%    html_elements(\"li\") %>%    html_text() #> [1] \"apple & pear\" \"banana\"       \"pineapple\" html <- minimal_html(\"<body>   <p>   This is   a   paragraph.<\/p><p>This is another paragraph.      It has two sentences.<\/p> \") html %>%    html_element(\"body\") %>%    html_text2() %>%    cat() #> This is a paragraph. #>  #> This is another paragraph. It has two sentences. html %>%    html_element(\"body\") %>%    html_text() %>%    cat() #>  #>    #>   This is #>   a #>   paragraph.This is another paragraph. 
#>    #>   It has two sentences."},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"attributes-1","dir":"Articles","previous_headings":"Extracting data","what":"Attributes","title":"Web scraping 101","text":"Attributes used record destination links (href attribute <a> elements) source images (src attribute <img> element): value attribute can retrieved html_attr(): Note html_attr() always returns string, may need post-process as.integer()/readr::parse_integer() similar.","code":"html <- minimal_html(\"   <p><a href='https://en.wikipedia.org/wiki/Cat'>cats<\/a><\/p>   <img src='https://cataas.com/cat' width='100' height='200'> \") html %>%    html_elements(\"a\") %>%    html_attr(\"href\") #> [1] \"https://en.wikipedia.org/wiki/Cat\"  html %>%    html_elements(\"img\") %>%    html_attr(\"src\") #> [1] \"https://cataas.com/cat\" html %>%    html_elements(\"img\") %>%    html_attr(\"width\") #> [1] \"100\"  html %>%    html_elements(\"img\") %>%    html_attr(\"width\") %>%    as.integer() #> [1] 100"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"tables","dir":"Articles","previous_headings":"Extracting data","what":"Tables","title":"Web scraping 101","text":"HTML tables composed four main elements: <table>, <tr> (table row), <th> (table heading), <td> (table data). 
’s simple HTML table two columns three rows: tables common way store data, rvest includes handy html_table() converts table data frame:","code":"html <- minimal_html(\"   <table>     <tr>       <th>x<\/th>       <th>y<\/th>     <\/tr>     <tr>       <td>1.5<\/td>       <td>2.7<\/td>     <\/tr>     <tr>       <td>4.9<\/td>       <td>1.3<\/td>     <\/tr>     <tr>       <td>7.2<\/td>       <td>8.1<\/td>     <\/tr>   <\/table>   \") html %>%    html_element(\"table\") %>%    html_table() #> # A tibble: 3 × 2 #>       x     y #>   <dbl> <dbl> #> 1   1.5   2.7 #> 2   4.9   1.3 #> 3   7.2   8.1"},{"path":"https://rvest.tidyverse.org/dev/articles/rvest.html","id":"element-vs-elements","dir":"Articles","previous_headings":"","what":"Element vs elements","title":"Web scraping 101","text":"using rvest, eventual goal usually build data frame, want row correspond repeated unit HTML page. case, generally start using html_elements() select elements contain observation use html_element() extract variables observation. guarantees ’ll get number values variable html_element() always returns number outputs inputs. 
illustrate problem take look simple example constructed using entries dplyr::starwars: try extract name, species, weight directly, end one vector length four two vectors length three, way align : Instead, use html_elements() find element corresponds character, use html_element() extract variable observations: html_element() automatically fills NA elements match, keeping variables aligned making easy create data frame:","code":"html <- minimal_html(\"   <ul>     <li><b>C-3PO<\/b> is a <i>droid<\/i> that weighs <span class='weight'>167 kg<\/span><\/li>     <li><b>R2-D2<\/b> is a <i>droid<\/i> that weighs <span class='weight'>96 kg<\/span><\/li>     <li><b>Yoda<\/b> weighs <span class='weight'>66 kg<\/span><\/li>     <li><b>R4-P17<\/b> is a <i>droid<\/i><\/li>   <\/ul>   \") html %>% html_elements(\"b\") %>% html_text2() #> [1] \"C-3PO\"  \"R2-D2\"  \"Yoda\"   \"R4-P17\" html %>% html_elements(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" \"droid\" html %>% html_elements(\".weight\") %>% html_text2() #> [1] \"167 kg\" \"96 kg\"  \"66 kg\" characters <- html %>% html_elements(\"li\")  characters %>% html_element(\"b\") %>% html_text2() #> [1] \"C-3PO\"  \"R2-D2\"  \"Yoda\"   \"R4-P17\" characters %>% html_element(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" NA      \"droid\" characters %>% html_element(\".weight\") %>% html_text2() #> [1] \"167 kg\" \"96 kg\"  \"66 kg\"  NA data.frame(   name = characters %>% html_element(\"b\") %>% html_text2(),   species = characters %>% html_element(\"i\") %>% html_text2(),   weight = characters %>% html_element(\".weight\") %>% html_text2() ) #>     name species weight #> 1  C-3PO   droid 167 kg #> 2  R2-D2   droid  96 kg #> 3   Yoda    <NA>  66 kg #> 4 R4-P17   droid   <NA>"},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"installation","dir":"Articles","previous_headings":"","what":"Installation","title":"SelectorGadget","text":"install , open page browser, drag following link bookmark bar: 
SelectorGadget.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"use","dir":"Articles","previous_headings":"","what":"Use","title":"SelectorGadget","text":"use , open page want scrape, : Click SelectorGadget entry bookmark bar. Click element want select. SelectorGadget make first guess css selector want. ’s likely bad since one example learn , ’s start. Elements match selector highlighted yellow. Click elements shouldn’t selected. turn red. Click elements selected. turn green. Iterate elements want selected. SelectorGadget isn’t perfect sometimes won’t able find useful css selector. Sometimes starting different element helps.","code":""},{"path":"https://rvest.tidyverse.org/dev/articles/selectorgadget.html","id":"example","dir":"Articles","previous_headings":"","what":"Example","title":"SelectorGadget","text":"example, imagine want find names movies listed vignette(\"starwars\"). Start opening https://rvest.tidyverse.org/articles/starwars.html web browser. Click SelectorGadget link bookmarks. SelectorGadget console appear bottom screen, element currently mouse highlighted orange.  Click movie name select . element selected highlighted green. SelectorGadget guesses css selector want (h2 case), highlights matches yellow (see total count equal 7 indicated “Clear” button).  Scroll around document verify selected desired movie titles nothing else. case, looks like SelectorGadget figured first try, can use selector R code: Now let’s try something little challenging: selecting paragraphs movie intro. Start way , opening website using SelectorGadget bookmark, time click first paragraph intro.  obviously selects many elements, click one paragraphs shouldn’t match. turns red indicating element shouldn’t matched.  looks good, convert R code: correct, ’ve lost connection title intro. fix problem need take step back see can find element identifies data one movie. 
carefully hovering, can figure section selector seems job: can get title film: contents intro: pretty common experience — SelectorGadget get started finding useful selectors ’ll often combine code.","code":"library(rvest) html <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\") html %>%    html_element(\"h2\") %>%    html_text2() #> [1] \"The Phantom Menace\" html %>%    html_elements(\".crawl p\") %>%    html_text2() %>%    .[1:4] #> [1] \"Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute.\"                                                                                                                #> [2] \"Hoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo.\"                                                                          #> [3] \"While the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict….\" #> [4] \"There is unrest in the Galactic Senate. Several thousand solar systems have declared their intentions to leave the Republic.\" films <- html %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleas ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nRelease ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-0 ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nRel ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased ... #> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: ... 
films %>%    html_element(\"h2\") %>%    html_text2() #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\" films %>%    html_element(\".crawl\") %>%    html_text2() %>%    .[[1]] %>%    writeLines() #> Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute. #>  #> Hoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo. #>  #> While the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict…."},{"path":"https://rvest.tidyverse.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hadley Wickham. Author, maintainer. . Copyright holder, funder.","code":""},{"path":"https://rvest.tidyverse.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Wickham H (2024). rvest: Easily Harvest (Scrape) Web Pages. R package version 1.0.4.9000, https://github.com/tidyverse/rvest, https://rvest.tidyverse.org/.","code":"@Manual{,   title = {rvest: Easily Harvest (Scrape) Web Pages},   author = {Hadley Wickham},   year = {2024},   note = {R package version 1.0.4.9000, https://github.com/tidyverse/rvest},   url = {https://rvest.tidyverse.org/}, }"},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"Easily Harvest (Scrape) Web Pages","text":"rvest helps scrape (harvest) data web pages. 
designed work magrittr make easy express common web scraping tasks, inspired libraries like beautiful soup RoboBrowser. ’re scraping multiple pages, highly recommend using rvest concert polite. polite package ensures ’re respecting robots.txt hammering site many requests.","code":""},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Easily Harvest (Scrape) Web Pages","text":"","code":"# The easiest way to get rvest is to install the whole tidyverse: install.packages(\"tidyverse\")  # Alternatively, install just rvest: install.packages(\"rvest\")"},{"path":"https://rvest.tidyverse.org/dev/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"Easily Harvest (Scrape) Web Pages","text":"page contains tabular data can convert directly data frame html_table():","code":"library(rvest)  # Start by reading a HTML page with read_html(): starwars <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\")  # Then find elements that match a css selector or XPath expression # using html_elements(). In this example, each <section> corresponds # to a different film films <- starwars %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased: 1999 ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleased: 20 ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nReleased: 200 ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-05-25\\n ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nReleased: ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased: 1983 ... #> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: 2015- ...  # Then use html_element() to extract one element per film. 
Here # the title is given by the text inside <h2> title <- films %>%    html_element(\"h2\") %>%    html_text2() title #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\"  # Or use html_attr() to get data out of attributes. html_attr() always # returns a string so we convert it to an integer using a readr function episode <- films %>%    html_element(\"h2\") %>%    html_attr(\"data-id\") %>%    readr::parse_integer() episode #> [1] 1 2 3 4 5 6 7 html <- read_html(\"https://en.wikipedia.org/w/index.php?title=The_Lego_Movie&oldid=998422565\")  html %>%    html_element(\".tracklist\") %>%    html_table() #> # A tibble: 29 × 4 #>    No.   Title                       `Performer(s)`                       Length #>    <chr> <chr>                       <chr>                                <chr>  #>  1 1.    \"\\\"Everything Is Awesome\\\"\" \"Tegan and Sara featuring The Lonel… 2:43   #>  2 2.    \"\\\"Prologue\\\"\"              \"\"                                   2:28   #>  3 3.    \"\\\"Emmett's Morning\\\"\"      \"\"                                   2:00   #>  4 4.    \"\\\"Emmett Falls in Love\\\"\"  \"\"                                   1:11   #>  5 5.    \"\\\"Escape\\\"\"                \"\"                                   3:26   #>  6 6.    \"\\\"Into the Old West\\\"\"     \"\"                                   1:00   #>  7 7.    \"\\\"Wyldstyle Explains\\\"\"    \"\"                                   1:21   #>  8 8.    \"\\\"Emmett's Mind\\\"\"         \"\"                                   2:17   #>  9 9.    \"\\\"The Transformation\\\"\"    \"\"                                   1:46   #> 10 10.   
\"\\\"Saloons and Wagons\\\"\"    \"\"                                   3:38   #> # ℹ 19 more rows"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":null,"dir":"Reference","previous_headings":"","what":"Interact with a live web page — LiveHTML","title":"Interact with a live web page — LiveHTML","text":"construct LiveHTML object read_html_live() interact, like human, using methods described . debugging scraping script particularly useful use $view(), open live preview site, can actually see operations performed real site. rvest provides relatively simple methods scrolling, typing, clicking. richer interaction, probably want use package exposes powerful user interface, like selendir.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Interact with a live web page — LiveHTML","text":"session Underlying chromote session object. expert use .","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Interact with a live web page — LiveHTML","text":"LiveHTML$new() LiveHTML$print() LiveHTML$view() LiveHTML$html_elements() LiveHTML$click() LiveHTML$get_scroll_position() LiveHTML$scroll_into_view() LiveHTML$scroll_to() LiveHTML$scroll_by() LiveHTML$type() LiveHTML$press() LiveHTML$clone()","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Interact with a live web page — LiveHTML","text":"initialize object","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — 
LiveHTML","text":"","code":"LiveHTML$new(url)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"url URL page.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-print-","dir":"Reference","previous_headings":"","what":"Method print()","title":"Interact with a live web page — LiveHTML","text":"Called print()ed","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$print(...)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"... Ignored","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-view-","dir":"Reference","previous_headings":"","what":"Method view()","title":"Interact with a live web page — LiveHTML","text":"Display live view site","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$view()"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-html-elements-","dir":"Reference","previous_headings":"","what":"Method html_elements()","title":"Interact with a live web page — LiveHTML","text":"Extract HTML elements current page.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$html_elements(css, 
xpath)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css, xpath CSS selector xpath expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-click-","dir":"Reference","previous_headings":"","what":"Method click()","title":"Interact with a live web page — LiveHTML","text":"Simulate click HTML element.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$click(css, n_clicks = 1)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. n_clicks Number clicks","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-get-scroll-position-","dir":"Reference","previous_headings":"","what":"Method get_scroll_position()","title":"Interact with a live web page — LiveHTML","text":"Get current scroll position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$get_scroll_position()"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-into-view-","dir":"Reference","previous_headings":"","what":"Method scroll_into_view()","title":"Interact with a live web page — LiveHTML","text":"Scroll selected element view.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web 
page — LiveHTML","text":"","code":"LiveHTML$scroll_into_view(css)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-to-","dir":"Reference","previous_headings":"","what":"Method scroll_to()","title":"Interact with a live web page — LiveHTML","text":"Scroll specified location","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$scroll_to(top = 0, left = 0)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"top, left Number pixels top/left respectively.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-scroll-by-","dir":"Reference","previous_headings":"","what":"Method scroll_by()","title":"Interact with a live web page — LiveHTML","text":"Scroll specified amount","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$scroll_by(top = 0, left = 0)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"top, left Number pixels scroll /left/right 
respectively.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-type-","dir":"Reference","previous_headings":"","what":"Method type()","title":"Interact with a live web page — LiveHTML","text":"Type text selected element","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$type(css, text)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. text single string containing text type.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-press-","dir":"Reference","previous_headings":"","what":"Method press()","title":"Interact with a live web page — LiveHTML","text":"Simulate pressing single key (including special keys).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$press(css, key_code, modifiers = character())"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"css CSS selector xpath expression. Set NULL key_code Name key. can see complete list known keys https://pptr.dev/api/puppeteer.keyinput/. modifiers character vector modifiers. 
Must one \"Shift\", \"Control\", \"Alt\", \"Meta\".","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Interact with a live web page — LiveHTML","text":"objects class cloneable method.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Interact with a live web page — LiveHTML","text":"","code":"LiveHTML$clone(deep = FALSE)"},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interact with a live web page — LiveHTML","text":"deep Whether make deep clone.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/LiveHTML.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Interact with a live web page — LiveHTML","text":"","code":"if (FALSE) { # To retrieve data for this paginated site, we need to repeatedly push # the \"Load More\" button sess <- read_html_live(\"https://www.bodybuilding.com/exercises/finder\") sess$view()  sess %>% html_elements(\".ExResult-row\") %>% length() sess$click(\".ExLoadMore-btn\") sess %>% html_elements(\".ExResult-row\") %>% length() sess$click(\".ExLoadMore-btn\") sess %>% html_elements(\".ExResult-row\") %>% length() }"},{"path":"https://rvest.tidyverse.org/dev/reference/google_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Make link to google form given id — google_form","title":"Make link to google form given id — 
google_form","text":"","code":"google_form(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/google_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make link to google form given id — google_form","text":"x Unique identifier form","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element attributes — html_attr","title":"Get element attributes — html_attr","text":"html_attr() gets single attribute; html_attrs() gets attributes.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element attributes — html_attr","text":"","code":"html_attr(x, name, default = NA_character_)  html_attrs(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element attributes — html_attr","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). name Name attribute retrieve. 
default string used default value attribute exist every element.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element attributes — html_attr","text":"character vector (html_attr()) list (html_attrs()) length x.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_attr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element attributes — html_attr","text":"","code":"html <- minimal_html('<ul>   <li><a href=\"https://a.com\" class=\"important\">a<\/a><\/li>   <li class=\"active\"><a href=\"https://c.com\">b<\/a><\/li>   <li><a href=\"https://c.com\">b<\/a><\/li>   <\/ul>')  html %>% html_elements(\"a\") %>% html_attrs() #> [[1]] #>            href           class  #> \"https://a.com\"     \"important\"  #>  #> [[2]] #>            href  #> \"https://c.com\"  #>  #> [[3]] #>            href  #> \"https://c.com\"  #>   html %>% html_elements(\"a\") %>% html_attr(\"href\") #> [1] \"https://a.com\" \"https://c.com\" \"https://c.com\" html %>% html_elements(\"li\") %>% html_attr(\"class\") #> [1] NA       \"active\" NA       html %>% html_elements(\"li\") %>% html_attr(\"class\", default = \"inactive\") #> [1] \"inactive\" \"active\"   \"inactive\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element children — html_children","title":"Get element children — html_children","text":"Get element children","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element children — html_children","text":"","code":"html_children(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element 
children — html_children","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_children.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element children — html_children","text":"","code":"html <- minimal_html(\"<ul><li>1<li>2<li>3<\/ul>\") ul <- html_elements(html, \"ul\") html_children(ul) #> {xml_nodeset (3)} #> [1] <li>1<\/li>\\n #> [2] <li>2<\/li>\\n #> [3] <li>3<\/li>  html <- minimal_html(\"<p>Hello <b>Hadley<\/b><i>!<\/i>\") p <- html_elements(html, \"p\") html_children(p) #> {xml_nodeset (2)} #> [1] <b>Hadley<\/b> #> [2] <i>!<\/i>"},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":null,"dir":"Reference","previous_headings":"","what":"Select elements from an HTML document — html_element","title":"Select elements from an HTML document — html_element","text":"html_element() html_elements() find HTML element using CSS selectors XPath expressions. CSS selectors particularly useful conjunction https://selectorgadget.com/, makes easy discover selector need.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Select elements from an HTML document — html_element","text":"","code":"html_element(x, css, xpath)  html_elements(x, css, xpath)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Select elements from an HTML document — html_element","text":"x Either document, node set single node. css, xpath Elements select. 
Supply one css xpath depending whether want use CSS selector XPath 1.0 expression.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Select elements from an HTML document — html_element","text":"html_element() returns nodeset length input. html_elements() flattens output direct way map output input.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"css-selector-support","dir":"Reference","previous_headings":"","what":"CSS selector support","title":"Select elements from an HTML document — html_element","text":"CSS selectors translated XPath selectors selectr package, port python cssselect library, https://pythonhosted.org/cssselect/. implements majority CSS3 selectors, described https://www.w3.org/TR/2011/REC-css3-selectors-20110929/. exceptions listed : Pseudo selectors require interactivity ignored: :hover, :active, :focus, :target, :visited. 
following pseudo classes work wild card element, *: *:first--type, *:last--type, *:nth--type, *:nth-last--type, *:--type supports :contains(text) can use !=, [foo!=bar] :([foo=bar]) :() accepts sequence simple selectors, just single simple selector.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_element.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Select elements from an HTML document — html_element","text":"","code":"html <- minimal_html(\"   <h1>This is a heading<\/h1>   <p id='first'>This is a paragraph<\/p>   <p class='important'>This is an important paragraph<\/p> \")  html %>% html_element(\"h1\") #> {html_node} #> <h1> html %>% html_elements(\"p\") #> {xml_nodeset (2)} #> [1] <p id=\"first\">This is a paragraph<\/p> #> [2] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\".important\") #> {xml_nodeset (1)} #> [1] <p class=\"important\">This is an important paragraph<\/p> html %>% html_elements(\"#first\") #> {xml_nodeset (1)} #> [1] <p id=\"first\">This is a paragraph<\/p>  # html_element() vs html_elements() -------------------------------------- html <- minimal_html(\"   <ul>     <li><b>C-3PO<\/b> is a <i>droid<\/i> that weighs <span class='weight'>167 kg<\/span><\/li>     <li><b>R2-D2<\/b> is a <i>droid<\/i> that weighs <span class='weight'>96 kg<\/span><\/li>     <li><b>Yoda<\/b> weighs <span class='weight'>66 kg<\/span><\/li>     <li><b>R4-P17<\/b> is a <i>droid<\/i><\/li>   <\/ul> \") li <- html %>% html_elements(\"li\")  # When applied to a node set, html_elements() returns all matching elements # beneath any of the inputs, flattening results into a new node set. li %>% html_elements(\"i\") #> {xml_nodeset (3)} #> [1] <i>droid<\/i> #> [2] <i>droid<\/i> #> [3] <i>droid<\/i>  # When applied to a node set, html_element() always returns a vector the # same length as the input, using a \"missing\" element where needed. 
li %>% html_element(\"i\") #> {xml_nodeset (4)} #> [1] <i>droid<\/i> #> [2] <i>droid<\/i> #> [3] NA #> [4] <i>droid<\/i> # and html_text() and html_attr() will return NA li %>% html_element(\"i\") %>% html_text2() #> [1] \"droid\" \"droid\" NA      \"droid\" li %>% html_element(\"span\") %>% html_attr(\"class\") #> [1] \"weight\" \"weight\" \"weight\" NA"},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":null,"dir":"Reference","previous_headings":"","what":"Guess faulty character encoding — html_encoding_guess","title":"Guess faulty character encoding — html_encoding_guess","text":"html_encoding_guess() helps handle web pages declare incorrect encoding. Use html_encoding_guess() generate list possible encodings, try using encoding argument read_html(). html_encoding_guess() replaces deprecated guess_encoding().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Guess faulty character encoding — html_encoding_guess","text":"","code":"html_encoding_guess(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Guess faulty character encoding — html_encoding_guess","text":"x character vector.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_encoding_guess.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Guess faulty character encoding — html_encoding_guess","text":"","code":"# A file with bad encoding included in the package path <- system.file(\"html-ex\", \"bad-encoding.html\", package = \"rvest\") x <- read_html(path) x %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célèbre déjà vu.\"  html_encoding_guess(x) #>        encoding language confidence #> 1         UTF-8                1.00 #> 2  windows-1252       fr       0.31 #> 3 
 windows-1250       ro       0.22 #> 4      UTF-16BE                0.10 #> 5      UTF-16LE                0.10 #> 6       GB18030       zh       0.10 #> 7          Big5       zh       0.10 #> 8  windows-1254       tr       0.06 #> 9    IBM424_rtl       he       0.01 #> 10   IBM424_ltr       he       0.01 # Two valid encodings, only one of which is correct read_html(path, encoding = \"ISO-8859-1\") %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célèbre déjà vu.\" read_html(path, encoding = \"ISO-8859-2\") %>% html_elements(\"p\") %>% html_text() #> [1] \"Émigré cause célčbre déjŕ vu.\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse forms and set values — html_form","title":"Parse forms and set values — html_form","text":"Use html_form() extract form, set values html_form_set(), submit html_form_submit().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse forms and set values — html_form","text":"","code":"html_form(x, base_url = NULL)  html_form_set(form, ...)  html_form_submit(form, submit = NULL)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse forms and set values — html_form","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). base_url Base url underlying HTML document. default, NULL, uses url HTML document underlying x. form form ... <dynamic-dots> Name-value pairs giving fields modify. Provide character vector set multiple checkboxes set select multiple values multi-select. submit button used submit form? NULL, default, uses first button. string selects button name. 
number selects button using relative position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse forms and set values — html_form","text":"html_form() returns S3 object class rvest_form applied single element. returns list rvest_form objects applied multiple elements document. html_form_set() returns rvest_form object. html_form_submit() submits form, returning httr response can parsed read_html().","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/html_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Parse forms and set values — html_form","text":"","code":"html <- read_html(\"http://www.google.com\") search <- html_form(html)[[1]]  search <- search %>% html_form_set(q = \"My little pony\", hl = \"fr\") #> Warning: Setting value of hidden field \"hl\".  # Or if you have a list of values, use !!! vals <- list(q = \"web scraping\", hl = \"en\") search <- search %>% html_form_set(!!!vals) #> Warning: Setting value of hidden field \"hl\".  
# To submit and get result: if (FALSE) { resp <- html_form_submit(search) read_html(resp) }"},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element name — html_name","title":"Get element name — html_name","text":"Get element name","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element name — html_name","text":"","code":"html_name(x)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element name — html_name","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element name — html_name","text":"character vector length x","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_name.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element name — html_name","text":"","code":"url <- \"https://rvest.tidyverse.org/articles/starwars.html\" html <- read_html(url)  html %>%   html_element(\"div\") %>%   html_children() %>%   html_name() #> [1] \"a\"      \"small\"  \"button\" \"div\""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse an html table into a data frame — html_table","title":"Parse an html table into a data frame — html_table","text":"algorithm mimics browser , repeats values merged cells every cell cover.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse an html table into a data frame — 
html_table","text":"","code":"html_table(   x,   header = NA,   trim = TRUE,   fill = deprecated(),   dec = \".\",   na.strings = \"NA\",   convert = TRUE )"},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse an html table into a data frame — html_table","text":"x document (read_html()), node set (html_elements()), node (html_element()), session (session()). header Use first row header? NA, use first row consists <th> tags. TRUE, column names left exactly source document, may require post-processing generate valid data frame. trim Remove leading trailing whitespace within cell? fill Deprecated - missing cells tables now always automatically filled NA. dec character used decimal place marker. na.strings Character vector values converted NA convert TRUE. convert TRUE, run type.convert() interpret texts integer, double, NA.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse an html table into a data frame — html_table","text":"applied single element, html_table() returns single tibble. 
applied multiple elements document, html_table() returns list tibbles.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_table.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Parse an html table into a data frame — html_table","text":"","code":"sample1 <- minimal_html(\"<table>   <tr><th>Col A<\/th><th>Col B<\/th><\/tr>   <tr><td>1<\/td><td>x<\/td><\/tr>   <tr><td>4<\/td><td>y<\/td><\/tr>   <tr><td>10<\/td><td>z<\/td><\/tr> <\/table>\") sample1 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 2 #>   `Col A` `Col B` #>     <int> <chr>   #> 1       1 x       #> 2       4 y       #> 3      10 z        # Values in merged cells will be duplicated sample2 <- minimal_html(\"<table>   <tr><th>A<\/th><th>B<\/th><th>C<\/th><\/tr>   <tr><td>1<\/td><td>2<\/td><td>3<\/td><\/tr>   <tr><td colspan='2'>4<\/td><td>5<\/td><\/tr>   <tr><td>6<\/td><td colspan='2'>7<\/td><\/tr> <\/table>\") sample2 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 3 #>       A     B     C #>   <int> <int> <int> #> 1     1     2     3 #> 2     4     4     5 #> 3     6     7     7  # If a row is missing cells, they'll be filled with NAs sample3 <- minimal_html(\"<table>   <tr><th>A<\/th><th>B<\/th><th>C<\/th><\/tr>   <tr><td colspan='2'>1<\/td><td>2<\/td><\/tr>   <tr><td colspan='2'>3<\/td><\/tr>   <tr><td>4<\/td><\/tr> <\/table>\") sample3 %>%   html_element(\"table\") %>%   html_table() #> # A tibble: 3 × 3 #>       A     B     C #>   <int> <int> <int> #> 1     1     1     2 #> 2     3     3    NA #> 3     4    NA    NA"},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":null,"dir":"Reference","previous_headings":"","what":"Get element text — html_text","title":"Get element text — html_text","text":"two ways retrieve text element: html_text() html_text2(). html_text() thin wrapper around xml2::xml_text() returns just raw underlying text. 
html_text2() simulates text looks browser, using approach inspired JavaScript's innerText(). Roughly speaking, converts <br /> \"\\n\", adds blank lines around <p> tags, lightly formats tabular data. html_text2() usually want, much slower html_text() simple applications performance important may want use html_text() instead.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get element text — html_text","text":"","code":"html_text(x, trim = FALSE)  html_text2(x, preserve_nbsp = FALSE)"},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get element text — html_text","text":"x document, node, node set. trim TRUE trim leading trailing spaces. preserve_nbsp non-breaking spaces preserved? default, html_text2() converts ordinary spaces ease computation. preserve_nbsp TRUE, &nbsp; appear strings \"\\ua0\". often causes confusion prints way \" \".","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get element text — html_text","text":"character vector length x","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/html_text.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get element text — html_text","text":"","code":"# To understand the difference between html_text() and html_text2() # take the following html:  html <- minimal_html(   \"<p>This is a paragraph.     This another sentence.<br>This should start on a new line\" )  # html_text() returns the raw underlying text, which includes whitespace # that would be ignored by a browser, and ignores the <br> html %>% html_element(\"p\") %>% html_text() %>% writeLines() #> This is a paragraph. 
#>     This another sentence.This should start on a new line  # html_text2() simulates what a browser would display. Non-significant # whitespace is collapsed, and <br> is turned into a line break html %>% html_element(\"p\") %>% html_text2() %>% writeLines() #> This is a paragraph. This another sentence. #> This should start on a new line  # By default, html_text2() also converts non-breaking spaces to regular # spaces: html <- minimal_html(\"<p>x&nbsp;y<\/p>\") x1 <- html %>% html_element(\"p\") %>% html_text() x2 <- html %>% html_element(\"p\") %>% html_text2()  # When printed, non-breaking spaces look exactly like regular spaces x1 #> [1] \"x y\" x2 #> [1] \"x y\" # But aren't actually the same: x1 == x2 #> [1] FALSE # Which you can confirm by looking at their underlying binary # representaion: charToRaw(x1) #> [1] 78 c2 a0 79 charToRaw(x2) #> [1] 78 20 79"},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an HTML document from inline HTML — minimal_html","title":"Create an HTML document from inline HTML — minimal_html","text":"Create HTML document inline HTML","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an HTML document from inline HTML — minimal_html","text":"","code":"minimal_html(html, title = \"\")"},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an HTML document from inline HTML — minimal_html","text":"html HTML contents page. 
title Page title (required HTML spec).","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/minimal_html.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create an HTML document from inline HTML — minimal_html","text":"","code":"minimal_html(\"<p>test<\/p>\") #> {html_document} #> <html> #> [1] <head>\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset ... #> [2] <body><p>test<\/p><\/body>"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":null,"dir":"Reference","previous_headings":"","what":"Static web scraping (with xml2) — read_html","title":"Static web scraping (with xml2) — read_html","text":"read_html() works performing HTTP request parsing HTML received using xml2 package. \"static\" scraping operates raw HTML file. works sites, cases need use read_html_live() parts page want scrape dynamically generated javascript. Generally, recommend using read_html() works, faster robust, fewer external dependencies (.e. rely Chrome web browser installed computer.)","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Static web scraping (with xml2) — read_html","text":"","code":"read_html(x, encoding = \"\", ..., options = c(\"RECOVER\", \"NOERROR\", \"NOBLANKS\"))"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Static web scraping (with xml2) — read_html","text":"x Usually string representing URL. See xml2::read_html() options. encoding Specify default encoding document. Unless otherwise specified XML documents assumed UTF-8 UTF-16. document UTF-8/16, lacks explicit encoding directive, allows supply default. ... Additional arguments passed methods. options Set parsing options libxml2 parser. 
Zero RECOVER recover errors NOENT substitute entities DTDLOAD load external subset DTDATTR default DTD attributes DTDVALID validate DTD NOERROR suppress error reports NOWARNING suppress warning reports PEDANTIC pedantic error reporting NOBLANKS remove blank nodes SAX1 use SAX1 interface internally XINCLUDE Implement XInclude substitution NONET Forbid network access NODICT reuse context dictionary NSCLEAN remove redundant namespaces declarations NOCDATA merge CDATA text nodes NOXINCNODE generate XINCLUDE START/END nodes COMPACT compact small text nodes; modification tree allowed afterwards (possibly crash try modify tree) OLD10 parse using XML-1.0 update 5 NOBASEFIX fixup XINCLUDE xml:base uris HUGE relax hardcoded limit parser OLDSAX parse using SAX2 interface 2.7.0 IGNORE_ENC ignore internal document encoding hint BIG_LINES Store big lines numbers text PSVI field","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Static web scraping (with xml2) — read_html","text":"","code":"# Start by reading a HTML page with read_html(): starwars <- read_html(\"https://rvest.tidyverse.org/articles/starwars.html\")  # Then find elements that match a css selector or XPath expression # using html_elements(). In this example, each <section> corresponds # to a different film films <- starwars %>% html_elements(\"section\") films #> {xml_nodeset (7)} #> [1] <section><h2 data-id=\"1\">\\nThe Phantom Menace\\n<\/h2>\\n<p>\\nReleased ... #> [2] <section><h2 data-id=\"2\">\\nAttack of the Clones\\n<\/h2>\\n<p>\\nReleas ... #> [3] <section><h2 data-id=\"3\">\\nRevenge of the Sith\\n<\/h2>\\n<p>\\nRelease ... #> [4] <section><h2 data-id=\"4\">\\nA New Hope\\n<\/h2>\\n<p>\\nReleased: 1977-0 ... #> [5] <section><h2 data-id=\"5\">\\nThe Empire Strikes Back\\n<\/h2>\\n<p>\\nRel ... #> [6] <section><h2 data-id=\"6\">\\nReturn of the Jedi\\n<\/h2>\\n<p>\\nReleased ... 
#> [7] <section><h2 data-id=\"7\">\\nThe Force Awakens\\n<\/h2>\\n<p>\\nReleased: ...  # Then use html_element() to extract one element per film. Here # the title is given by the text inside <h2> title <- films %>%   html_element(\"h2\") %>%   html_text2() title #> [1] \"The Phantom Menace\"      \"Attack of the Clones\"    #> [3] \"Revenge of the Sith\"     \"A New Hope\"              #> [5] \"The Empire Strikes Back\" \"Return of the Jedi\"      #> [7] \"The Force Awakens\"        # Or use html_attr() to get data out of attributes. html_attr() always # returns a string so we convert it to an integer using a readr function episode <- films %>%   html_element(\"h2\") %>%   html_attr(\"data-id\") %>%   readr::parse_integer() episode #> [1] 1 2 3 4 5 6 7"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":null,"dir":"Reference","previous_headings":"","what":"Live web scraping (with chromote) — read_html_live","title":"Live web scraping (with chromote) — read_html_live","text":"read_html() operates HTML source code downloaded server. works websites can fail site uses javascript generate HTML. read_html_live() provides alternative interface runs live web browser (Chrome) background. allows access elements HTML page generated dynamically javascript interact live page clicking buttons typing forms. 
Behind scenes, function uses chromote package, requires copy Google Chrome installed machine.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Live web scraping (with chromote) — read_html_live","text":"","code":"read_html_live(url)"},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Live web scraping (with chromote) — read_html_live","text":"url Website url read .","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Live web scraping (with chromote) — read_html_live","text":"read_html_live() returns R6 LiveHTML object. can interact object using usual rvest functions, call methods, like $click(), $scroll_to(), $type() interact live page like human .","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/read_html_live.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Live web scraping (with chromote) — read_html_live","text":"","code":"if (FALSE) { # When we retrieve the raw HTML for this site, it doesn't contain the # data we're interested in: static <- read_html(\"https://www.forbes.com/top-colleges/\") static %>% html_elements(\".TopColleges2023_tableRow__BYOSU\")  # Instead, we need to run the site in a real web browser, causing it to # download a JSON file and then dynamically generate the html:  sess <- read_html_live(\"https://www.forbes.com/top-colleges/\") sess$view() rows <- sess %>% html_elements(\".TopColleges2023_tableRow__BYOSU\") rows %>% html_element(\".TopColleges2023_organizationName__J1lEV\") %>% html_text() rows %>% html_element(\".grant-aid\") %>% html_text() 
}"},{"path":"https://rvest.tidyverse.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. magrittr %>% xml2 url_absolute","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rename.html","id":null,"dir":"Reference","previous_headings":"","what":"Functions renamed in rvest 1.0.0 — rename","title":"Functions renamed in rvest 1.0.0 — rename","text":"rvest 1.0.0 renamed number functions ensure every function common prefix, matching tidyverse conventions emerged since rvest first created. set_values() -> html_form_set() submit_form() -> session_submit() xml_tag() -> html_name() xml_node() & html_node() -> html_element() xml_nodes() & html_nodes() -> html_elements() (html_node() html_nodes() superseded widely used.) Additionally session related functions gained common prefix: html_session() -> session() forward() -> session_forward() back() -> session_back() jump_to() -> session_jump_to() follow_link() -> session_follow_link()","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rename.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Functions renamed in rvest 1.0.0 — rename","text":"","code":"set_values(form, ...)  submit_form(session, form, submit = NULL, ...)  xml_tag(x)  xml_node(...)  xml_nodes(...)  html_nodes(...)  html_node(...)  back(x)  forward(x)  jump_to(x, url, ...)  follow_link(x, ...)  html_session(url, ...)"},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":null,"dir":"Reference","previous_headings":"","what":"Repair faulty encoding — repair_encoding","title":"Repair faulty encoding — repair_encoding","text":"function deprecated work. 
Instead re-read HTML file correct encoding argument.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Repair faulty encoding — repair_encoding","text":"","code":"repair_encoding(x, from = NULL)"},{"path":"https://rvest.tidyverse.org/dev/reference/repair_encoding.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Repair faulty encoding — repair_encoding","text":"encoding string actually . NULL, guess_encoding used.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/rvest-package.html","id":null,"dir":"Reference","previous_headings":"","what":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","title":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","text":"Wrappers around 'xml2' 'httr' packages make easy download, manipulate, HTML XML.","code":""},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/reference/rvest-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"rvest: Easily Harvest (Scrape) Web Pages — rvest-package","text":"Maintainer: Hadley Wickham hadley@posit.co contributors: Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":null,"dir":"Reference","previous_headings":"","what":"Simulate a session in web browser — session","title":"Simulate a session in web browser — session","text":"set functions allows simulate user interacting website, using forms navigating page page. Create session session(url) Navigate specified url session_jump_to(), follow link page session_follow_link(). Submit html_form session_submit(). View history session_history() navigate back forward session_back() session_forward(). Extract page contents html_element() html_elements(), get complete HTML document read_html(). 
Inspect HTTP response httr::cookies(), httr::headers(), httr::status_code().","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simulate a session in web browser — session","text":"","code":"session(url, ...)  is.session(x)  session_jump_to(x, url, ...)  session_follow_link(x, i, css, xpath, ...)  session_back(x)  session_forward(x)  session_history(x)  session_submit(x, form, submit = NULL, ...)"},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simulate a session in web browser — session","text":"url URL, either relative absolute, navigate . ... additional httr config use throughout session. x session. integer select ith link string match first link containing text (case sensitive). css, xpath Elements select. Supply one css xpath depending whether want use CSS selector XPath 1.0 expression. form html_form submit submit button used submit form? NULL, default, uses first button. string selects button name. number selects button using relative position.","code":""},{"path":"https://rvest.tidyverse.org/dev/reference/session.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simulate a session in web browser — session","text":"","code":"s <- session(\"http://hadley.nz\") s %>%   session_jump_to(\"hadley-wickham.jpg\") %>%   session_jump_to(\"/\") %>%   session_history() #> Warning: Not Found (HTTP 404). #>   https://hadley.nz/ #>   https://hadley.nz/hadley-wickham.jpg #> - https://hadley.nz/  s %>%   session_jump_to(\"hadley-wickham.jpg\") %>%   session_back() %>%   session_history() #> Warning: Not Found (HTTP 404). #> - https://hadley.nz/ #>   https://hadley.nz/hadley-wickham.jpg  # \\donttest{ s %>%   session_follow_link(css = \"p a\") %>%   html_elements(\"p\") #> Navigating to <http://rstudio.com>. 
#> {xml_nodeset (16)} #>  [1] <p class=\"h5\">See you in Seattle August 12-14!<\/p> #>  [2] <p>Securely share data-science applications<br>\\n across your team ... #>  [3] <p>Our code is your code. Build on it. Share it. Improve people’s  ... #>  [4] <p>Take the time and effort out of uploading, storing, accessing,  ... #>  [5] <p class=\"sh4 uppercase mb-[8px] text-blue1\">\\n            Custome ... #>  [6] <p class=\"mt-[8px] body-md-regular text-blue1/[.62]\">\\n            ... #>  [7] <p class=\"mt-[16px] body-md-regular text-neutral-blue62 line-clamp ... #>  [8] <p class=\"mt-[16px] body-md-regular text-neutral-blue62 line-clamp ... #>  [9] <p class=\"description body-lg-regular text-neutral-light/70\" style ... #> [10] <p class=\"body-sm-regular text-blue1/[.62] mt-[25px]\">\\n           ... #> [11] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [12] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [13] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [14] <p class=\"ui-small uppercase text-blue1\">\\n                        ... #> [15] <p class=\"ui-small uppercase text-blue1\">\\n                    con ... #> [16] <p class=\"body-md-regular body-sm-regular\">We use cookies to bring ... # }"},{"path":[]},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-104","dir":"Changelog","previous_headings":"","what":"rvest 1.0.4","title":"rvest 1.0.4","text":"CRAN release: 2024-02-12 New read_html_live() reads HTML real, live, HTML browser, meaning can scrape HTML generated javascript. returns LiveHTML object can also use simulate user interactions page, like clicking, typing, scrolling (#245). 
html_table() discards rows without cells (@epiben, #360).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-103","dir":"Changelog","previous_headings":"","what":"rvest 1.0.3","title":"rvest 1.0.3","text":"CRAN release: 2022-08-19 Re-document fix HTML issues .Rd.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-102","dir":"Changelog","previous_headings":"","what":"rvest 1.0.2","title":"rvest 1.0.2","text":"CRAN release: 2021-10-16 Fixes CRAN html_table() converts empty tables empty tibbles (@epiben, #327).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-101","dir":"Changelog","previous_headings":"","what":"rvest 1.0.1","title":"rvest 1.0.1","text":"CRAN release: 2021-07-26 html_table() correctly handles tables cells contain blank values rowspan /colspan, e.g. <td rowspan=\"\"> parsed <td rowspan=1> (@epiben, #323). Fix broken example","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-100","dir":"Changelog","previous_headings":"","what":"rvest 1.0.0","title":"rvest 1.0.0","text":"CRAN release: 2021-03-09","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"new-features-1-0-0","dir":"Changelog","previous_headings":"","what":"New features","title":"rvest 1.0.0","text":"New html_text2() provides natural rendering HTML nodes text, converting <br> “”, removing non-significant whitespace (#175). default, also converts &nbsp; regular spaces, can suppress preserve_nbsp = TRUE (#284). html_table() re-written scratch closely mimic algorithm browsers use parsing tables. mean far fewer tables fails produce output (#63, #204, #215). fill argument deprecated since longer needed. html_table() now returns tibble rather data frame compatible rest tidyverse (#199). performance considerably improved (#237). also gains na.strings argument control values converted NA (#107), convert argument control whether run conversion (#311). 
New html_form_submit() allows submit form directly, without needing create session (#300). rvest now licensed MIT (#287).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"api-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"API changes","title":"rvest 1.0.0","text":"Since 1.0.0 release, included large number API changes make rvest compatible current tidyverse conventions. Older functions deprecated, existing code continue work (albeit new warnings). rvest now imports xml2 rather depending . cleaner avoids attaching xml2 functions ’re less likely use. reduce change breakages, rvest re-exports xml2 functions read_html() url_absolute(), code may now need explicit library(xml2). html_form() now returns object class rvest_form (instead form). Fields within form now class rvest_field, instead variety classes lacking rvest_ prefix. functions working forms common html_form_ prefix: set_values() became html_form_set(). submit_form() renamed session_submit() returns session. html_node() html_nodes() superseded favor html_element() html_elements() since (almost) always return elements, nodes (#298). html_session() now session() returns object class rvest_session (instead session). functions work session objects now common session_ prefix. Long deprecated html(), html_tag(), xml() functions removed. minimal_html() (doesn’t appear used package) arguments flipped make intuitive. guess_encoding() renamed html_encoding_guess() avoid clash stringr::guess_encoding() (#209). repair_encoding() deprecated doesn’t appear work. pluck() longer exported avoid clash purrr::pluck(); need use purrr::map_chr() friends instead (#209). 
xml_tag(), xml_node(), xml_nodes() formally deprecated favor html_ equivalents.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"minor-improvements-and-bug-fixes-1-0-0","dir":"Changelog","previous_headings":"","what":"Minor improvements and bug fixes","title":"rvest 1.0.0","text":"“harvesting web” vignette rewritten focus basics rvest, eliminating screenshots keep installed package svelte possible. ’s also renamed vignette(\"rvest\") since ’s vignette read first. SelectorGadget vignette now web-article, https://rvest.tidyverse.org/articles/articles/selectorgadget.html, can generous screenshots since ’re longer bundled every install package. Together rewrite vignette, means rvest now ~90 Kb instead ~1.1 Mb. uses IMDB eliminated since site explicitly prohibits scraping (#195). session_submit() errors form doesn’t url (#288). New session_forward() function complement session_back(). now allows pick submission button position (#156). ... argument deprecated; please use config instead. html_form_set() can now accept character vectors allowing select multiple checkboxes set select multiple values multi-<select> (#127, help @juba). also uses dynamic dots can use !!! 
list values (#189).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-036","dir":"Changelog","previous_headings":"","what":"rvest 0.3.6","title":"rvest 0.3.6","text":"CRAN release: 2020-07-25 Remove failing example","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-035","dir":"Changelog","previous_headings":"","what":"rvest 0.3.5","title":"rvest 0.3.5","text":"CRAN release: 2019-11-08 Use web archive fix broken example.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-034","dir":"Changelog","previous_headings":"","what":"rvest 0.3.4","title":"rvest 0.3.4","text":"CRAN release: 2019-05-15 Remove unneeded read_xml.response() method (#242).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-033","dir":"Changelog","previous_headings":"","what":"rvest 0.3.3","title":"rvest 0.3.3","text":"CRAN release: 2019-04-11 Fix R CMD check failure submit_request() now checks empty form-field-types select correct submit fields (@rentrop, #159)","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-032","dir":"Changelog","previous_headings":"","what":"rvest 0.3.2","title":"rvest 0.3.2","text":"CRAN release: 2016-06-17 Fixes follow_link() back() correctly manage session history. ’re using xml2 1.0.0, html_node() now return “missing node”. Parse rowspans colspans effectively filling using repetition left right (colspan) top bottom (rowspan) (#111) Updated examples demos website structure changed. Made compatible xml2 0.1.2 1.0.0.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-031","dir":"Changelog","previous_headings":"","what":"rvest 0.3.1","title":"rvest 0.3.1","text":"CRAN release: 2015-11-11 Fix invalid link SSA example. Parse <options> don’t value attribute (#85). 
Remove remaining uses html() favor read_html() (@jimhester, #113).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-030","dir":"Changelog","previous_headings":"","what":"rvest 0.3.0","title":"rvest 0.3.0","text":"CRAN release: 2015-09-23 rvest rewritten take advantage new xml2 package. xml2 provides fresh binding libxml2, avoiding many work-arounds previously needed XML package. Now rvest depends xml2 package, xml functions available, rvest adds thin wrapper html. number functions change names. old versions still work, deprecated removed rvest 0.4.0. html_tag() -> html_name() html() -> read_html() html_node() now throws error matches, warning ’s one match. think make likely fail clearly structure page changes. xml_structure() moved xml2. New html_structure() (also xml2) highlights id class attributes (#78). submit_form() now works forms use GET (#66). submit_request() (hence submit_form()) now case-insensitive, find <input type=SUBMIT> well <input type=\"submit\">. submit_request() (hence submit_form()) recognizes forms <input type=\"image\"> valid form submission button.","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"rvest-020","dir":"Changelog","previous_headings":"","what":"rvest 0.2.0","title":"rvest 0.2.0","text":"CRAN release: 2015-01-01","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"new-features-0-2-0","dir":"Changelog","previous_headings":"","what":"New features","title":"rvest 0.2.0","text":"html() xml() pass ... httr::GET() can finely control request (#48). Add xml support: parse xml(), work using xml_node(), xml_attr(), xml_attrs(), xml_text() xml_tag() (#24). xml_structure(): new function displays structure (.e. 
tag attribute names) xml/html object (#10).","code":""},{"path":"https://rvest.tidyverse.org/dev/news/index.html","id":"bug-fixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"rvest 0.2.0","text":"follow_link() now accepts css xpath selectors. (#38, #41, #42) html() better job dealing encodings (passing problem XML::parseHTML()) instead trying (#25, #50). html_attr() returns default value input NULL (#49) Add missing html_node() method session. html_nodes() now returns empty list elements found (#31). submit_form() converts relative paths absolute URLs (#52). also deals better 0-length inputs (#29).","code":""}]