Add the ability to remove nodes. #41

Open
XAMPPRocky opened this issue Aug 9, 2017 · 2 comments

Comments

@XAMPPRocky

BeautifulSoup lets you remove nodes from a parsed document. This is valuable for stripping certain kinds of elements, and the text they contain, before extracting text.

@utkarshkukreti
Owner

Yes, I would definitely like to have this feature. Unfortunately, this would require major changes in the internals of the crate and I'm not sure what a good design would look like at this point.

Right now `Document` has a vector of `node::Raw`, and `node::Node` holds a reference to the `Document` plus a `usize` index. This means that allowing nodes to be removed or inserted will require some kind of arena-like structure, so that slots freed by removals can be reused by nodes inserted later. We'll also have to stop storing a reference to `Document` in `Node`, so that the `Document` can be mutated while any of its `Node`s exist. We could make `Node` just an index, like the petgraph crate does, but that would make many current APIs more verbose, e.g. `document[node].text()` instead of `node.text()`. Or we could wrap everything in `Rc<RefCell<>>`, but I'd like to avoid that if at all possible.
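A minimal sketch of the arena plus index-handle design described above, assuming nothing about select's real internals: `Raw`, `NodeId` and `Document` here are simplified, hypothetical stand-ins. Removed slots become `None` and are pushed onto a free list that later insertions reuse, and the handle is a plain index (in the spirit of petgraph), so holding one does not borrow the document.

```rust
use std::ops::Index;

/// Hypothetical, simplified stand-in for a stored node.
#[derive(Debug)]
struct Raw {
    name: String,
    text: String,
}

/// Index-only handle: it borrows nothing, so the document stays mutable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

#[derive(Default)]
struct Document {
    slots: Vec<Option<Raw>>, // `None` marks a slot freed by a removal
    free: Vec<usize>,        // freed slot indices, reused by later inserts
}

impl Document {
    fn insert(&mut self, raw: Raw) -> NodeId {
        if let Some(i) = self.free.pop() {
            self.slots[i] = Some(raw);
            NodeId(i)
        } else {
            self.slots.push(Some(raw));
            NodeId(self.slots.len() - 1)
        }
    }

    /// Removes the node and makes its slot available for reuse.
    fn remove(&mut self, id: NodeId) -> Option<Raw> {
        let raw = self.slots.get_mut(id.0)?.take()?;
        self.free.push(id.0);
        Some(raw)
    }

    fn get(&self, id: NodeId) -> Option<&Raw> {
        self.slots.get(id.0)?.as_ref()
    }
}

/// Enables the more verbose `document[node]` call style.
impl Index<NodeId> for Document {
    type Output = Raw;
    fn index(&self, id: NodeId) -> &Raw {
        self.get(id).expect("node was removed")
    }
}

fn main() {
    let mut doc = Document::default();
    let p = doc.insert(Raw { name: "p".into(), text: "hello".into() });
    let ns = doc.insert(Raw { name: "noscript".into(), text: "enable JS".into() });
    println!("{}: {}", doc[p].name, doc[p].text);

    assert!(doc.remove(ns).is_some());
    assert!(doc.get(ns).is_none());

    // The freed slot is reused, so the stale handle `ns` now aliases `span`.
    let span = doc.insert(Raw { name: "span".into(), text: "new".into() });
    assert_eq!(span, ns);
}
```

The last assertion shows the main hazard of plain slot reuse: an old handle silently points at a new node. Pairing each slot with a generation counter that the handle must match is a common way to detect such stale handles.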

I'm open to suggestions!

@sbeckeriv

sbeckeriv commented Oct 15, 2020

Would it be hard to just blank out the contents of the node? Or to have a `node::RemovedNode`?
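The tombstone idea can be sketched as follows, assuming a simplified, hypothetical node payload in which a `Removed` variant plays the role of the proposed `node::RemovedNode`. Removal overwrites the slot in place, so every other index stays valid, and text extraction simply skips tombstoned subtrees.

```rust
/// Hypothetical node payload; `Removed` is the tombstone.
#[derive(Debug)]
enum Data {
    Text(String),
    Element(String, Vec<usize>), // tag name and child indices
    Removed,
}

struct Document {
    nodes: Vec<Data>,
}

impl Document {
    /// Soft delete: the slot stays, so no other index shifts.
    fn remove(&mut self, index: usize) {
        if let Some(slot) = self.nodes.get_mut(index) {
            *slot = Data::Removed;
        }
    }

    /// Concatenates the text under `index`, skipping tombstoned subtrees.
    fn text(&self, index: usize) -> String {
        let mut out = String::new();
        self.collect_text(index, &mut out);
        out
    }

    fn collect_text(&self, index: usize, out: &mut String) {
        match &self.nodes[index] {
            Data::Text(t) => out.push_str(t),
            Data::Element(_name, children) => {
                for &child in children {
                    self.collect_text(child, out);
                }
            }
            Data::Removed => {}
        }
    }
}

fn main() {
    // <body>Hello <noscript>enable JS</noscript>world</body>
    let mut doc = Document {
        nodes: vec![
            Data::Element("body".into(), vec![1, 2, 4]),
            Data::Text("Hello ".into()),
            Data::Element("noscript".into(), vec![3]),
            Data::Text("enable JS".into()),
            Data::Text("world".into()),
        ],
    };
    assert_eq!(doc.text(0), "Hello enable JSworld");
    doc.remove(2);
    assert_eq!(doc.text(0), "Hello world");
}
```

One trade-off: the children of a removed element (index 3 above) still occupy slots; they are merely unreachable, so memory is not reclaimed until the document is rebuilt.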

[edit]
Soft deletes: sbeckeriv@da9b245. I did not read all of the code to understand why this might be a bad idea; it's just a proof of concept for my needs.
[edit]
This is not working as I would expect:

```rust
// `delete()` is the soft-delete method added in the fork; it is not part of select.
for mut node in document.find(select::predicate::Name("noscript")) {
    node.delete();
    dbg!(node);
}
```

`dbg!` shows `deleted` as true for the node here, but when the `text()` function is called later, the node is not marked as deleted.
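One way this symptom can arise, assuming `delete()` sets a flag on the `Node` handle itself: a handle that is just a document reference plus an index is cheap to copy, and `text()` walks freshly created handles, so a flag stored on the copy is lost. The fork's actual `delete()` is not reproduced here; all types below are hypothetical. Keeping the flag in the document (via `Cell`, so it can be set through a shared reference) makes it stick.

```rust
use std::cell::Cell;

#[derive(Debug)]
struct Document {
    texts: Vec<&'static str>,
    removed: Vec<Cell<bool>>, // per-node flag stored in the document
}

/// Hypothetical handle: a document reference plus an index, and `Copy`.
#[derive(Clone, Copy, Debug)]
struct Node<'a> {
    doc: &'a Document,
    index: usize,
    deleted: bool, // flag stored on the handle only
}

impl Document {
    fn new(texts: Vec<&'static str>) -> Self {
        let removed = texts.iter().map(|_| Cell::new(false)).collect();
        Document { texts, removed }
    }

    /// Every call creates new handles with `deleted: false`.
    fn nodes(&self) -> impl Iterator<Item = Node<'_>> + '_ {
        (0..self.texts.len()).map(move |index| Node { doc: self, index, deleted: false })
    }

    fn text(&self) -> String {
        self.nodes()
            .filter(|n| !n.deleted && !n.doc.removed[n.index].get())
            .map(|n| n.doc.texts[n.index])
            .collect()
    }
}

impl Node<'_> {
    /// Lost as soon as this copy of the handle is dropped.
    fn delete_on_handle(&mut self) {
        self.deleted = true;
    }

    /// Persists, because the flag lives in the shared document.
    fn delete_in_document(&self) {
        self.doc.removed[self.index].set(true);
    }
}

fn main() {
    let doc = Document::new(vec!["keep ", "drop ", "keep"]);
    for mut node in doc.nodes() {
        if node.index == 1 {
            node.delete_on_handle();
            assert!(node.deleted); // what `dbg!` would show here
        }
    }
    assert_eq!(doc.text(), "keep drop keep"); // the flag did not survive
    for node in doc.nodes() {
        if node.index == 1 {
            node.delete_in_document();
        }
    }
    assert_eq!(doc.text(), "keep keep");
}
```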
[edit] It might have worked; my document wasn't declared `mut` at first. I have since moved to a local version that takes the indices of the nodes I want to skip and skips them when building the text: sbeckeriv@2bb9c9d#diff-af08c3181737aa5783b96dfd920cd5ef70829f46cd1b697bdb42414c97310e13R143. I moved the function out of my fork and now have a local `text()`.
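The skip-by-index workaround can be sketched without modifying the document at all. This assumes a hypothetical, minimal tree (`Data` and `text_skipping` are illustrative names, not select's API); in practice the skip set would be filled with indices gathered from the real document. A skipped element also hides its whole subtree.

```rust
use std::collections::HashSet;

/// Hypothetical minimal tree: text, or an element with child indices.
enum Data {
    Text(&'static str),
    Element(Vec<usize>),
}

/// Appends the text under `index` to `out`, skipping any node
/// (and its subtree) whose index is in `skip`.
fn text_skipping(nodes: &[Data], index: usize, skip: &HashSet<usize>, out: &mut String) {
    if skip.contains(&index) {
        return;
    }
    match &nodes[index] {
        Data::Text(t) => out.push_str(t),
        Data::Element(children) => {
            for &child in children {
                text_skipping(nodes, child, skip, out);
            }
        }
    }
}

fn main() {
    // <body>Hello <noscript>enable JS</noscript>world</body>
    let nodes = vec![
        Data::Element(vec![1, 2, 4]),
        Data::Text("Hello "),
        Data::Element(vec![3]),
        Data::Text("enable JS"),
        Data::Text("world"),
    ];
    // Index 2 is the <noscript> element.
    let skip = HashSet::from([2usize]);
    let mut out = String::new();
    text_skipping(&nodes, 0, &skip, &mut out);
    assert_eq!(out, "Hello world");
}
```

Because the document is never mutated, this approach sidesteps the borrowing problem described earlier in the thread; the cost is that every text-producing function has to be given the skip set.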
