Parsing only <option selected> from a <select> list. #244

Open
thegoatherder opened this issue Jan 7, 2022 · 3 comments
Comments

@thegoatherder

The goal
My HTML contains <select> form controls. The parser extracts the text of every <option> in the menu. I want it to extract only what the control displays, i.e. the <option selected>.

Is there a configuration option that supports this? I can't find one in the docs.

Example:

<div>You have selected:</div>
<select>
   <option>A</option>
   <option>B</option>
   <option selected>C</option>
</select>

Currently this outputs as:

You have selected: A B C

Desired output

You have selected: C

Best attempt
I could preprocess the HTML with a DOM parser to remove the other options from the menu before handing it to html-to-text.

@KillyMXI
Member

KillyMXI commented Jan 7, 2022

I'd suggest creating a custom formatter for select tags.
In a formatter you have access to the child nodes, so you can inspect them and pick whatever you need instead of calling the walk function.

Start from this Readme section.
elem is a DOM Element as parsed by htmlparser2. You can use astexplorer to see how it represents tags and attributes of interest.

Of the built-in formatters, the list and table formatters handle all their child tags on their own, although they are a lot more complex than a formatter for select is going to be.

I haven't implemented formatters for form tags myself yet because it seems a rather rare use case for html-to-text given the amount of code needed to support it. But with requests like this coming in, I may reprioritize it. So thanks for asking.
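
For readers following along, the custom-formatter idea above could be sketched roughly as follows. This is a minimal sketch, not the library's own implementation: it assumes the formatter signature (elem, walk, builder, formatOptions) described in the Readme, a builder that exposes addInline, and htmlparser2-style nodes with type, name, attribs, children and data. The names textOf, selectedOptionFormatter, selectedOption and selectFormatterOptions are hypothetical. It only looks at direct <option> children (no <optgroup> handling) and emits nothing when no option is marked selected.

```javascript
// Recursively collect the text of an htmlparser2-style node (hypothetical helper).
function textOf(node) {
  if (node.type === 'text') {
    return node.data;
  }
  return (node.children || []).map(textOf).join('');
}

// Hypothetical formatter for <select>: instead of calling walk() on all
// children, inspect them and emit only the options carrying `selected`.
function selectedOptionFormatter(elem, walk, builder, formatOptions) {
  const labels = (elem.children || [])
    // Keep direct <option> children, skipping whitespace text nodes.
    .filter((child) => child.type === 'tag' && child.name === 'option')
    // A bare `selected` attribute is parsed as an empty string, so test for presence.
    .filter((option) => option.attribs && 'selected' in option.attribs)
    .map((option) => textOf(option).trim());

  if (labels.length > 0) {
    // Several selected options (multi-select) are joined with a comma.
    builder.addInline(labels.join(', '));
  }
}

// Assumed wiring: register the formatter under a name and bind it to <select>.
const selectFormatterOptions = {
  formatters: { selectedOption: selectedOptionFormatter },
  selectors: [{ selector: 'select', format: 'selectedOption' }]
};
```

The selectFormatterOptions object is meant to be merged into the options you already pass to html-to-text. Because the formatter never calls walk, unselected options are not visited at all.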

@KillyMXI
Member

KillyMXI commented Jan 7, 2022

Alternatively, you can skip all option tags and only display option[selected], thanks to selectors support.

There is no :not() selector support (yet), so it will look like this:

{
  selectors: [
    { selector: 'option', format: 'skip' },
    { selector: 'option[selected]', format: 'inline' }
  ]
}

@thegoatherder
Author

Excellent! Thanks, I’ll give this a go on Monday.
