Wordpress: Shortcodes/scripts in Posts render output in search results #868

Closed
neosin opened this issue Jun 22, 2017 · 11 comments
Labels: confirmed bug, low priority, module:posts (Issues related to the Posts Indexable / Feature.)

@neosin

neosin commented Jun 22, 2017

Shortcodes/scripts in Posts render output in search results.

When a searched post contains a shortcode or script tag, the source of that shortcode or script is rendered in the search results.

Is this a bug, or a setting that needs to be set somewhere?

@ivankristianto
Contributor

@neosin Yes, shortcodes are rendered during indexing, so the search results will output the rendered content as a whole.

@tlovett1
Member

tlovett1 commented Jun 26, 2017

We apply the the_content filter and strip tags before storing the text in ES. Therefore script tags shouldn't be searchable but regular text inside a shortcode should.

'post_content' => $this->prepare_text_content( apply_filters( 'the_content', $post->post_content ) ),
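
To illustrate the behaviour described above, here is a minimal standalone sketch. It assumes only PHP's built-in strip_tags() and a made-up sample string (the markup below is hypothetical, not taken from the plugin). It shows that stripping tags removes the script tags themselves but keeps the text between them.

```php
<?php
// Hypothetical sample: rendered post content with a JSON-LD block,
// similar to what a video shortcode might emit.
$rendered = '<p>Watch the video.</p>'
    . '<script type="application/ld+json">{"@type":"VideoObject","name":"Demo"}</script>';

// strip_tags() drops the <p> and <script> tags but leaves the text inside them.
$indexed = strip_tags( $rendered );

echo $indexed, PHP_EOL;
// Watch the video.{"@type":"VideoObject","name":"Demo"}
```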

@neosin
Author

neosin commented Jun 26, 2017

We use shortcodes to display video with jwplayer and add json schema data. The issue we are running into is that the schema data is being indexed and displayed. The video player is being removed as expected but the schema data is not. We are not sure how to remove it. The javascript json tags are being removed but the schema data between the tags is not. Any suggestions of how to remove the data between the tags?
<script type="application/ld+json">{"@context":"http:\/\/schema.org\/","@type":"VideoObject", "name".....etc </script>

@neosin
Author

neosin commented Jun 29, 2017

so this is a bug?

@ivankristianto
Contributor

Yes, it is supposed to have script tags inside the content indexed.
We will investigate further.

@dosmoc

dosmoc commented Oct 11, 2017

I've run into this issue myself. While the surrounding script tags are stripped by prepare_text_content, the code itself is still put into post_content. It'd be preferable to have the entire element removed. Would it be possible to get a hook inserted before the call to strip_tags in the prepare_text_content function?

@psorensen
Contributor

Hi @dosmoc, you can filter ep_sync_args and apply your own logic to post_content.

add_filter( 'ep_sync_args', 'my_remove_scripts_from_ep', 10, 2 );
function my_remove_scripts_from_ep( $args, $post_id ) {
    // get the post object of the given ID, bail if empty
    $post = get_post( $post_id );
    if ( empty( $post ) ) {
        return $args;
    }

    // do stuff to $post->post_content here, e.g.:
    $post_content = strip_tags( $post->post_content );

    // overwrite the value that will be sent to Elasticsearch
    $args['post_content'] = $post_content;

    return $args;
}
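
A variation on the snippet above that drops whole script elements (tags and body) before stripping the remaining tags could look like the sketch below. The helper name my_strip_script_elements() and the callback name are hypothetical, the regular expression is a simple approximation rather than a full HTML parser, and the ep_sync_args hook name is taken from the comment above, so it may differ in other plugin versions.

```php
<?php
/**
 * Remove <script> elements together with their contents, then strip
 * the remaining tags. Hypothetical helper, not part of ElasticPress.
 */
function my_strip_script_elements( $html ) {
    // Non-greedy match from an opening <script ...> to the nearest </script>.
    $without_scripts = preg_replace( '@<script\b[^>]*>.*?</script>@si', '', (string) $html );

    // preg_replace() returns null on failure; fall back to the input.
    if ( null === $without_scripts ) {
        $without_scripts = (string) $html;
    }

    return trim( strip_tags( $without_scripts ) );
}

function my_remove_script_bodies_from_ep( $args, $post_id ) {
    $post = get_post( $post_id );
    if ( empty( $post ) ) {
        return $args;
    }

    // Render shortcodes first so their output is what gets cleaned.
    $rendered = apply_filters( 'the_content', $post->post_content );

    $args['post_content'] = my_strip_script_elements( $rendered );

    return $args;
}

// Register only when WordPress is loaded, so the helper can be tried standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_sync_args', 'my_remove_script_bodies_from_ep', 10, 2 );
}
```

Keeping the string handling in its own function means it can be checked outside WordPress with a sample string before it is wired into the sync.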

@brandwaffle
Contributor

@neosin closing this as it appears to have an answer. Feel free to re-open if you've got a follow-up!

@brandwaffle
Contributor

Looking over this again I'd agree that we want to exclude any JS content (not just the tags themselves) in a situation like this. Looks like instead of strip_tags() we can call https://developer.wordpress.org/reference/functions/wp_strip_all_tags/.
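
As a rough sketch of that suggestion: wp_strip_all_tags() removes script and style elements together with their contents before stripping the remaining tags. The wrapper name my_plain_text() is hypothetical, and the fallback branch is only a simplified stand-in so the example can run outside WordPress; it is not the plugin's code.

```php
<?php
/**
 * Plain text from HTML. Uses wp_strip_all_tags() when WordPress is loaded;
 * otherwise falls back to a simplified approximation (an assumption of
 * this sketch).
 */
function my_plain_text( $html ) {
    if ( function_exists( 'wp_strip_all_tags' ) ) {
        return wp_strip_all_tags( $html );
    }

    // Fallback: drop <script> and <style> elements with their contents first.
    $text = preg_replace( '@<(script|style)[^>]*?>.*?</\1>@si', '', (string) $html );

    return trim( strip_tags( (string) $text ) );
}

$html = '<style>p{color:red}</style><p>Post content</p><script>console.log("hi");</script>';

echo my_plain_text( $html ), PHP_EOL; // Post content
```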

@brandwaffle brandwaffle reopened this Apr 30, 2018
@felipeelia felipeelia added this to the 4.2.0 milestone Mar 29, 2022
@felipeelia felipeelia self-assigned this Mar 29, 2022
@felipeelia felipeelia added the module:posts Issues related to the Posts Indexable / Feature. label Mar 31, 2022
@felipeelia
Member

ElasticPress 3.0 changed this a bit and then ElasticPress 4.0 added yet another change related to this. Currently, on a website with Instant Results enabled, these three fields are indexed for a specific post:

| Name | Description | Indexed as | Example of usage |
| --- | --- | --- | --- |
| post_content | The raw content of the post, as in the database. | `<!-- wp:paragraph -->\n<p>Post content </p>\n<!-- /wp:paragraph -->` | Currently, it is the field used in regular searches. |
| post_content_filtered | The post content after being filtered by the the_content filter. | `\n<p>Post content </p>\n<script>console.log("entrei");</script>` | Theoretically, one could get this field and output it directly, without processing it with PHP. |
| post_content_plain | Same as post_content_filtered but after a wp_strip_all_tags() call. | `Post content` | Used to display results in Instant Results' modal. |

That means that:

  1. The problem outlined in the issue is no longer applicable, as post_content nowadays really contains only the post content, without any transformation, and is not used to be printed anymore.
  2. If people want to match against filtered content they can apply the solution outlined in this comment
  3. Using option 2 will likely match words like "script" or "console". If that is a problem, it would be possible to use post_content_plain instead (if Instant Results (IR) is enabled).

As some custom solutions may be relying on post_content_filtered to display the post content formatted (with its HTML tags, for example), any adjustment to that would be a breaking change. We could bring post_content_plain to the "regular" fields list, but even that would be a breaking change, as non-IR users would have to do a full sync (sending the new mapping with that field).
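
For sites that do have post_content_plain in their index, searching against it instead of post_content might be sketched as below. This assumes that the ep_search_fields filter is available in the installed ElasticPress version and that its result is not overridden by the Weighting feature. The callback name is hypothetical, and weighted entries such as a field name with a boost suffix are not handled.

```php
<?php
/**
 * Swap post_content for post_content_plain in a list of search fields.
 * Hypothetical helper; nested (array) entries are left untouched.
 */
function my_prefer_plain_content( $fields ) {
    return array_map(
        function ( $field ) {
            return 'post_content' === $field ? 'post_content_plain' : $field;
        },
        (array) $fields
    );
}

// Register only when WordPress is loaded (assumes the ep_search_fields filter).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_search_fields', 'my_prefer_plain_content' );
}
```

With a field list of post_title, post_excerpt and post_content, the helper returns the same list with the last entry replaced by post_content_plain.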

@felipeelia
Member

The explanation and the table were added to https://elasticpress.zendesk.com/hc/en-us/articles/4402857301389.
