replace_with fails with Cannot replace one element with another when the element to be replaced is not part of a tree error #141

Open
fabienheureux opened this issue Jan 21, 2022 · 2 comments
Assignees
Labels
bug Something isn't working

fabienheureux commented Jan 21, 2022

Sorry about the title; I am not sure how to name this issue, as it is tied to one specific post.
I am still investigating the cause, but I am posting it now in case some of you have a better idea of where it comes from.

The importer fails on this specific item:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your site. -->
<!-- It contains information about your site's posts, pages, comments, categories, and other content. -->
<!-- You may use this file to transfer that content from one site to another. -->
<!-- This file is not intended to serve as a complete backup of your site. -->
<!-- To import this information into a WordPress site follow these steps: -->
<!-- 1. Log in to that site as an administrator. -->
<!-- 2. Go to Tools: Import in the WordPress admin panel. -->
<!-- 3. Install the "WordPress" importer from the list. -->
<!-- 4. Activate & Run Importer. -->
<!-- 5. Upload this file using the form provided on that page. -->
<!-- 6. You will first be asked to map the authors in this export file to users -->
<!--    on the site. For each author, you may choose to map to an -->
<!--    existing user on the site or to create a new user. -->
<!-- 7. WordPress will then import each of the posts, pages, comments, categories, etc. -->
<!--    contained in this file into your site. -->
<!-- generator="WordPress/4.9.3" created="2021-12-24 12:02" -->
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
   <channel>
      <title>Limonadier</title>
      <link>https://limonadier.net</link>
      <description>Débit de beaux sons</description>
      <pubDate>Fri, 24 Dec 2021 12:02:30 +0000</pubDate>
      <language>fr-FR</language>
      <wp:wxr_version>1.2</wp:wxr_version>
      <wp:base_site_url>https://limonadier.net</wp:base_site_url>
      <wp:base_blog_url>https://limonadier.net</wp:base_blog_url>
      <generator>https://wordpress.org/?v=4.9.3</generator>
      <item>
         <title>Lucy Dacus — No Burden (debut LP)</title>
         <link>https://limonadier.net/lucy-dacus-no-burden-debut-lp/</link>
         <pubDate>Fri, 01 Apr 2016 13:45:31 +0000</pubDate>
         <dc:creator><![CDATA[MNKTMNR]]></dc:creator>
         <guid isPermaLink="false">http://limonadier.net?p=38158</guid>
         <description />
         <content:encoded><![CDATA[

<iframe style="border: 0; width: 100%; height: 42px;" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=278575587/transparent=true/" width="300" height="150" seamless=""><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe>

<iframe style="border: 0; width: 100%; height: 42px;" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300" height="150" seamless=""><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe>
			&nbsp;]]></content:encoded>
         <excerpt:encoded />
         <wp:post_id>38158</wp:post_id>
         <wp:post_date><![CDATA[2016-04-01 15:45:31]]></wp:post_date>
         <wp:post_date_gmt><![CDATA[2016-04-01 13:45:31]]></wp:post_date_gmt>
         <wp:comment_status><![CDATA[closed]]></wp:comment_status>
         <wp:ping_status><![CDATA[closed]]></wp:ping_status>
         <wp:post_name><![CDATA[lucy-dacus-no-burden-debut-lp]]></wp:post_name>
         <wp:status><![CDATA[publish]]></wp:status>
         <wp:post_parent>0</wp:post_parent>
         <wp:menu_order>0</wp:menu_order>
         <wp:post_type><![CDATA[post]]></wp:post_type>
         <wp:post_password />
         <wp:is_sticky>0</wp:is_sticky>
      </item>
   </channel>
</rss>
```
Here is the traceback:
```
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/management/commands/import_xml.py", line 70, in handle
    importer.run(
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 113, in run
    wp_post_id=wordpress_item.cleaned_data.get("wp_post_id")
  File "/usr/local/lib/python3.8/functools.py", line 967, in __get__
    val = self.func(instance)
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 518, in cleaned_data
    "body": self.body_stream_field(self.prefilter_content(self.raw_body)),
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 436, in body_stream_field
    builder.promote_child_tags()
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/block_builder.py", line 58, in promote_child_tags
    promotee.parent.replace_with(promotee)
  File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 266, in replace_with
    raise ValueError(
ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.
```
And here are some logs I added in the `promote_child_tags` method:
```
Promotee <iframe height="150" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300">&lt;a href="http://lucydacus.bandcamp.com/album/no-burden"&gt;No Burden by Lucy Dacus&lt;/a&gt;</iframe>
Parent <p> <br/>
<iframe height="150" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300">&lt;a href="http://lucydacus.bandcamp.com/album/no-burden"&gt;No Burden by Lucy Dacus&lt;/a&gt;</iframe></p>
Parent name p
Removee tags ['p', 'div', 'span']
```
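
The logged parent only contains the second iframe (track=2967876146), which would be consistent with the first iframe having already been promoted out of the same `<p>`, leaving that `<p>` detached from the tree. Below is a minimal sketch of that scenario. It assumes the loop iterates over a precomputed `find_all()` result and calls `promotee.parent.replace_with(promotee)` as the traceback shows; the sample markup and the loop itself are illustrative, not the importer's actual code.

```python
from bs4 import BeautifulSoup

# Assumption: two promotable tags share one removable parent,
# similar to the <p> shown in the logs above.
html = '<p><iframe src="a"></iframe><br/><iframe src="b"></iframe></p>'
soup = BeautifulSoup(html, "html.parser")

removee_tags = ["p", "div", "span"]
errors = []

# find_all() returns its whole result list up front, so the second
# iframe is still visited after the first iteration has detached <p>.
for promotee in soup.find_all("iframe"):
    if promotee.parent.name in removee_tags:
        try:
            promotee.parent.replace_with(promotee)
        except ValueError as exc:
            errors.append(str(exc))

print(errors)  # the message raised on the second iteration
print(soup)    # only the first iframe is left in the tree
```

In this sketch the first iteration succeeds and swaps the `<p>` for the first iframe. The second iteration then fails because the `<p>`, which still holds the second iframe, no longer has a parent.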

Details

Wagtail v2.15.2
I installed wagtail-wordpress-import from the main branch yesterday, so I am using the latest version of this codebase.

fabienheureux commented

One odd thing I noticed: the "parent" is `<p> <br />`, even though these tags are not in the original XML 🤔

nickmoreton added the bug label Jan 22, 2022

nickmoreton commented

Thanks for the report.

I tried importing your XML snippet and it works OK, without console errors or warnings. As expected, I get a single imported page with two `raw_html` blocks, each containing one iframe.

> Something odd I noticed is the fact that <p> <br /> is the "parent" whereas these tags are not even in the original xml 🤔

The `<p> <br />` tags are added during the bleach process.

nickmoreton self-assigned this Jan 27, 2022
ahayzen-kdab added a commit to ahayzen-kdab/wagtail-wordpress-import that referenced this issue Jan 19, 2024
eg

```html
<div>
    <blockquote></blockquote>
    <tag></tag>
    <blockquote></blockquote>
</div>
```

Needs to become
```html
<blockquote></blockquote>
<div>
    <tag></tag>
</div>
<blockquote></blockquote>
```

Before it became
```html
<blockquote></blockquote>
```

This solves the following crash
Cannot replace one element with another when the element to be replaced is not part of a tree

Closes torchbox#141
ahayzen-kdab added a commit to ahayzen-kdab/wagtail-wordpress-import that referenced this issue Jan 22, 2024
ahayzen-kdab added a commit to ahayzen-kdab/wagtail-wordpress-import that referenced this issue Jan 22, 2024
eg

```html
<div>
    <blockquote></blockquote>
    <tag></tag>
    <blockquote></blockquote>
</div>
```

Needs to become
```html
<blockquote></blockquote>
<div>
    <tag></tag>
</div>
<blockquote></blockquote>
```

Previously it became
```html
<blockquote></blockquote>
```

This solves the following crash
ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.

Related torchbox#141
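
The transformation described in this commit message can be sketched with BeautifulSoup roughly as follows. This is not the code from the referenced commit: `promote_preserving_siblings` and its `promote` and `remove` parameters are hypothetical names chosen for illustration, and the sample markup has no whitespace between tags so the result is easy to compare.

```python
from bs4 import BeautifulSoup


def promote_preserving_siblings(soup, promote=("blockquote",), remove=("div",)):
    """Hypothetical helper: lift each promotee out of its parent, keeping siblings."""
    for promotee in soup.find_all(list(promote)):
        parent = promotee.parent
        if parent is None or parent.name not in remove:
            continue
        # Everything after the promotee moves into a fresh copy of the parent.
        trailing = soup.new_tag(parent.name)
        trailing.attrs = dict(parent.attrs)
        for sibling in list(promotee.next_siblings):
            trailing.append(sibling.extract())
        # Place the promotee right after its old parent, then the trailing part.
        parent.insert_after(promotee.extract())
        if trailing.contents:
            promotee.insert_after(trailing)
        # Drop the old parent only if nothing is left inside it.
        if not parent.contents:
            parent.extract()
    return soup


html = "<div><blockquote></blockquote><tag></tag><blockquote></blockquote></div>"
soup = promote_preserving_siblings(BeautifulSoup(html, "html.parser"))
result = str(soup)
print(result)
```

For this input the sketch yields `<blockquote></blockquote><div><tag></tag></div><blockquote></blockquote>`. Because later promotees are moved into a wrapper that is re-inserted into the tree, every parent the loop visits is still attached, so the `ValueError` above is not triggered. With whitespace between tags, a wrapper holding only whitespace may be left behind, which a real implementation would probably want to prune.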
ahayzen-kdab added a commit to ahayzen-kdab/wagtail-wordpress-import that referenced this issue Jan 22, 2024