Can't select some element. #203

Closed
Tajmirul opened this issue May 30, 2022 · 4 comments

@Tajmirul

I am trying to fetch the title and description of a Vimeo video. I fetched the HTML successfully, but I can't select the description div.

Here is the code:

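import fs from 'fs';
import { parse } from 'node-html-parser';

// Runs inside a route handler; req is assumed to be an Express-style request.
const { videoUrl } = req.body;
const vimeoResponse = await fetch(videoUrl);
const vimeoResponseTxt = await vimeoResponse.text();
const vimeoHtml = parse(vimeoResponseTxt);
// Quote the attribute value: og:title contains a colon.
const title = vimeoHtml.querySelector('meta[property="og:title"]')?.getAttribute('content');
// Dump the whole parsed document to a file to inspect what the parser sees.
const pageHtml = vimeoHtml.innerHTML;

fs.writeFile('vimeo-video.html', pageHtml, error => {
    if (error) console.error(error);
});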

This code fetches the HTML, and the saved HTML contains a div with the class description-wrapper:

                  <div class="clip_details-description description-wrapper iris_desc">
                    <p class="first">Country music legend, Trish Cotton, has something to say.</p>
                    <p>
                      Written by Kyle Kasabian (@kylekasabian) <br />
                      Directed by Derek Mari (@directorderek)<br />
                      Director of Photography: Peter Mickelsen<br />
                      Produced by Derek Mari and Kyle Kasabian<br />
                      Edited by Derek Mari
                    </p>
                    <p>Starring: Alyssa Sabo, Janine Hogan, and Kyle Kasabian</p>
                    <p>
                        Assistant Camera: Casey Schoch<br />
                        Production Sound: David Alvarez<br />
                        Production Assistant: Keith Ahlstrom
                    </p>
                    <p>Music by Morgan Matthews</p>
                    <p>
                      Blink &amp; Miss Productions<br />
                      Bad Cat Films
                    </p>
                  </div>
                </div>

But when I try to select the div with querySelector, it returns null:

const description = vimeoHtml.querySelector('.description-wrapper');
console.log(description); // null

@taoqf
Owner

taoqf commented Jun 1, 2022

I'm afraid I could not find where the problem is. With the HTML snippet from the issue, querySelector finds the div:

const vimeoHtml = parse(`<div class="clip_details-description description-wrapper iris_desc">
                    <p class="first">Country music legend, Trish Cotton, has something to say.</p>
                    <p>
                      Written by Kyle Kasabian (@kylekasabian) <br />
                      Directed by Derek Mari (@directorderek)<br />
                      Director of Photography: Peter Mickelsen<br />
                      Produced by Derek Mari and Kyle Kasabian<br />
                      Edited by Derek Mari
                    </p>
                    <p>Starring: Alyssa Sabo, Janine Hogan, and Kyle Kasabian</p>
                    <p>
                        Assistant Camera: Casey Schoch<br />
                        Production Sound: David Alvarez<br />
                        Production Assistant: Keith Ahlstrom
                    </p>
                    <p>Music by Morgan Matthews</p>
                    <p>
                      Blink &amp; Miss Productions<br />
                      Bad Cat Films
                    </p>
                  </div>
                </div>`);
const description = vimeoHtml.querySelector('.description-wrapper');
console.log(description !== null); // true: the div is found

@wolfie

wolfie commented Feb 10, 2023

I'm not sure if this is exactly related, but this outputs "null" for me with node-html-parser on Node.js 17.4.0:

import { parse } from "node-html-parser";
console.log(
  parse(
    `<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`
  ).querySelector("code")
);

@wolfie

wolfie commented Feb 10, 2023

It seems the bug is in the handling of the PRE tag: the parser assumes PRE can't have child nodes, so the <code> element ends up as plain text:

import { parse } from "node-html-parser";

const convert = root => ({
  tag: root.tagName,
  textContent: root.textContent,
  children: [...root.childNodes].map(convert),
});

const tree = convert(
  parse(`<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`)
);

console.log(JSON.stringify(tree, null, 2));

This outputs:

{
  "tag": null,
  "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
  "children": [
    {
      "tag": "HTML",
      "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
      "children": [
        {
          "tag": "BODY",
          "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
          "children": [
            {
              "tag": "PRE",
              "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
              "children": [
                {
                  "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
                  "children": []
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
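To see why the CODE element disappears, here is a toy parser that reproduces the same tree shape. It is a hypothetical sketch, not node-html-parser's source: toyParse, findFirst and rawTextTags are made-up names, and the regex tokenizer ignores comments, attributes containing ">" and most other HTML edge cases. Tags listed in rawTextTags keep everything up to their closing tag as a single text node, which matches the PRE with one text child in the output above.

```javascript
// Toy model only: NOT node-html-parser's implementation. It shows how
// treating a tag as a "raw text" element hides any markup nested inside it.
function toyParse(html, rawTextTags) {
  const root = { tag: null, children: [] };
  const stack = [root];
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  let last = 0;
  let match;
  while ((match = tagRe.exec(html)) !== null) {
    const parent = stack[stack.length - 1];
    // Text between the previous tag and this one becomes a text node.
    if (match.index > last) {
      parent.children.push({ text: html.slice(last, match.index) });
    }
    const [whole, closing, name] = match;
    const tag = name.toUpperCase();
    if (closing) {
      // Only pop when the close tag matches the currently open element.
      if (stack.length > 1 && parent.tag === tag) stack.pop();
    } else {
      const el = { tag, children: [] };
      parent.children.push(el);
      if (rawTextTags.has(name.toLowerCase())) {
        // Raw text element: everything up to the matching close tag is one text node.
        const closeTag = `</${name.toLowerCase()}>`;
        const end = html.toLowerCase().indexOf(closeTag, tagRe.lastIndex);
        const stop = end === -1 ? html.length : end;
        if (stop > tagRe.lastIndex) {
          el.children.push({ text: html.slice(tagRe.lastIndex, stop) });
        }
        tagRe.lastIndex = end === -1 ? html.length : end + closeTag.length;
      } else if (!whole.endsWith('/>')) {
        // Normal element: later tags and text nest inside it until it closes.
        stack.push(el);
      }
    }
    last = tagRe.lastIndex;
  }
  // Trailing text after the last tag.
  if (last < html.length) {
    stack[stack.length - 1].children.push({ text: html.slice(last) });
  }
  return root;
}

// Depth-first search for the first element with the given upper-case tag name.
function findFirst(node, tag) {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    const found = findFirst(child, tag);
    if (found) return found;
  }
  return null;
}

const html = `<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`;
// With "pre" treated as raw text, <code> is only text, so no CODE element exists.
console.log(findFirst(toyParse(html, new Set(['script', 'style', 'pre'])), 'CODE'));
// Without it, <code> is parsed as a child element of PRE.
console.log(findFirst(toyParse(html, new Set(['script', 'style'])), 'CODE'));
```

The only difference between the two calls is whether "pre" is in the raw-text set, which is the same knob the real library exposes as an option.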

@taoqf
Owner

taoqf commented Aug 17, 2023

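@wolfie try this. It leaves pre out of blockTextElements (pre is in the default list), so the contents of pre are parsed into child nodes: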

parse(html, {
	blockTextElements: {
		script: true,
		noscript: true,
		style: true,
	}
});
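A small sketch of building that option programmatically, assuming the default list shown in the node-html-parser README (script, noscript, style and pre; check the README for your version). The snippet above re-lists script, noscript and style, which suggests a custom blockTextElements object replaces the defaults rather than merging with them. DEFAULT_BLOCK_TEXT_ELEMENTS and blockTextElementsWithout are hypothetical names, not part of the library.

```javascript
// Assumption: defaults as listed in the node-html-parser README; may differ by version.
const DEFAULT_BLOCK_TEXT_ELEMENTS = Object.freeze({
  script: true,
  noscript: true,
  style: true,
  pre: true,
});

// Hypothetical helper: copy the defaults, then drop the given tag names.
function blockTextElementsWithout(...tags) {
  const result = { ...DEFAULT_BLOCK_TEXT_ELEMENTS };
  for (const tag of tags) {
    delete result[tag.toLowerCase()];
  }
  return result;
}

const options = { blockTextElements: blockTextElementsWithout('pre') };
console.log(options);
// With node-html-parser installed, this would be used as:
//   parse(html, options).querySelector('code')
```

Copying the defaults and removing only pre keeps script and style contents as raw text, so only the behavior of pre changes.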

taoqf closed this as completed Aug 17, 2023
taoqf added a commit that referenced this issue Aug 17, 2023