Exclude links and email addresses from prominent words analysis #19137

Merged
merged 20 commits into from
Dec 6, 2022

Conversation

@agnieszkaszuba agnieszkaszuba commented Nov 7, 2022

Context

  • URLs and email addresses that are written out as part of the text (e.g. "Check out the Yoast blog: https://yoast.com/seo-blog/" instead of "Check out the Yoast blog") were included in the prominent words analysis, which affected the internal linking suggestions and the prominent words shown in Insights. Since URLs and email addresses aren't really words, they should not be counted as prominent words. The exception is URLs that consist only of a domain name (e.g. yoast.com), since those can be used as words (e.g. "Let's publish this article on yoast.com").

Summary

This PR can be summarized in the following changelog entry:

  • [wordpress-seo-premium] Improves the prominent words section in Insights and the internal linking suggestions by excluding URLs and email addresses from the prominent words.
  • Improves the accuracy of calculating text length in Japanese by excluding all spaces and HTML tags from the character count, and by including domain names.
  • [shopify-seo] Improves the way we calculate text length of Japanese texts, for example by excluding all spaces from the character count. This improves the accuracy of many assessments in Japanese, such as the text length assessment or the sentence length assessment.
  • [shopify-seo] Improves Japanese character count by excluding all spaces from the character count.
  • [shopify-seo] Improves the prominent words section in Insights by excluding URLs and email addresses from the prominent words.
  • [yoastseo] Removes URLs and email addresses from the text before calculating prominent words (for insights and for internal linking).
  • [yoastseo] Improves the regex used to remove URLs from the text so that it matches URLs containing semi-colons, and doesn't match domain names (e.g. yoast.com).
  • [yoastseo] Removes all spaces from the text before counting the number of characters in Japanese texts.
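As a rough illustration of the prominent words entries above, the sketch below strips URLs and email addresses from a text before counting words, while leaving bare domain names in place. The patterns and the names `removeUrlsAndEmails` and `countWords` are assumptions made for this example; they are not the regexes or helpers used in yoastseo.

```javascript
// Hypothetical patterns for illustration only (not the yoastseo regexes).
// A URL must start with a scheme or "www.", so a bare domain like "yoast.com" is kept.
const URL_PATTERN = /(?:https?:\/\/|www\.)[\w\-.~:\/?#[\]@!$&'()*+,;=%]+/gi;
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace emails and URLs with a space so neighbouring words stay separated.
function removeUrlsAndEmails(text) {
  return text.replace(EMAIL_PATTERN, " ").replace(URL_PATTERN, " ");
}

// Very simplified word counter: lowercases, splits on whitespace,
// and trims punctuation from the start and end of each token.
function countWords(text) {
  const counts = new Map();
  const words = removeUrlsAndEmails(text)
    .toLowerCase()
    .split(/\s+/)
    .map((word) => word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ""))
    .filter(Boolean);
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const sample =
  "Check out the Yoast blog: https://yoast.com/seo-blog/ or mail support@example.com. " +
  "Let's publish this article on yoast.com";
console.log([...countWords(sample).keys()]);
```

With this sample, the full URL and the email address are gone from the word list, while yoast.com remains as a word.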

Relevant technical choices:

  • The functionality for removing URLs that was used in the countCharacters.js helper for Japanese is moved to a separate file, since it can be re-used to exclude URLs from the prominent words analysis.
  • The regex is also improved so that it correctly removes URLs containing semi-colons (e.g. https://www.example.com/foo/?bar=baz&inga=42&quux) and so that it doesn't match domain names (e.g. yoast.com). While domain names can be used as URLs, it might be more common to use them as proper nouns in a sentence (for example, "We got so much traffic on yoast.com after the latest release").
  • While working on this issue, a bug was found where 1 character is added to the Japanese text for every HTML tag (that's not at the end/beginning of the string, or following/preceding a full stop). This is because when we remove HTML tags before counting the characters, we replace the HTML tags with spaces. We then remove some of the spaces that are considered redundant (double spaces, spaces at the beginning/end of the string, spaces before/after a full stop). But in Japanese, spaces aren't generally used, so it's better to remove all spaces before counting the characters. This step was added inside the countCharacters.js helper. A technical debt issue was also created to try fixing this further upstream (i.e. inside the stripHTMLTags helper).
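The HTML tag bug described in the last point can be reproduced with a small sketch. It assumes a simplified tag stripper that replaces every tag with a space; `stripTagsToSpaces` and both counting functions are hypothetical stand-ins written for this example, not the actual stripHTMLTags or countCharacters.js code.

```javascript
// Simplified stand-in for a tag-stripping helper: every tag becomes one space.
function stripTagsToSpaces(html) {
  return html.replace(/<[^>]+>/g, " ");
}

// Old behaviour (as described above): only "redundant" spaces are removed.
function countWithRedundantSpacesRemoved(html) {
  const text = stripTagsToSpaces(html)
    .replace(/\s{2,}/g, " ")          // collapse double spaces
    .replace(/\s?([。.])\s?/g, "$1")  // drop spaces before/after a full stop
    .trim();                          // drop spaces at the start/end
  return text.length;
}

// New behaviour: remove all whitespace before counting.
function countWithAllSpacesRemoved(html) {
  return stripTagsToSpaces(html).replace(/\s/g, "").length;
}

const inline = "これは<i>猫</i>です"; // 6 Japanese characters, 2 inline tags
console.log(countWithRedundantSpacesRemoved(inline)); // 8 (one extra per tag)
console.log(countWithAllSpacesRemoved(inline));       // 6
```

For a tag at the start or end of the string, such as `<p>猫です。</p>`, both functions return 4, which matches the exceptions named in the bug description.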

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

You will need the following texts for testing:
Prominent words testing text 1
Prominent words testing text 2
Prominent words testing text 3

Test in WordPress

  • Note that the behaviour of the internal links suggestions is inconsistent, both in the current released versions (19.11 Free and 19.5 Premium) and on this branch (same for trunk). If a certain post/page/CPT doesn’t appear as a suggested link, rerun the SEO optimization and confirm that it appears afterwards. (To do that go to Tools/Yoast Test/Reset Indexables & migrations and then click on the Run SEO Optimization button.) Taxonomies appear ONLY after rerunning SEO optimization. You need to create the term only by entering a title, and then add text when you open the text editor.

  • Activate Yoast SEO Free and Premium (for acceptance tester: make sure to link to the Free branch when building Premium)

Test prominent words for insights

  • Open a new post, paste the first test text in your editor, and publish the post
  • Confirm that the links/emails under the heading Not prominent words do not show up in the list of prominent words
  • Confirm that the words under the heading Prominent words do show up in the list of prominent words

Test internal linking

  • Create and publish two new posts - one with the second test text and one with the third test text
  • Confirm that:
    • In the internal linking suggestions of the first post, the second post is suggested and the third one is not
    • In the internal linking suggestions of the second post, the first post is suggested and the third one is not
    • In the internal linking suggestions of the third post, neither the first nor the second post are suggested

Test prominent words created when clicking the SEO data optimization button

  • Unindex the posts that you have created (documentation on unindexing posts - use the second method)
  • Go to Yoast SEO -> Tools and click on the SEO data optimization button
  • Go to the xxxx_yoast_prominent_words table in the database and confirm that the posts have prominent words, and that URLs and emails are not listed among them (but other words are)

Test Japanese character count

Test whether adding URLs doesn't increase the character count

  • Follow the steps from the Testing the text length assessment section of the PR that excluded URLs from Japanese character count (regression testing)
    • When you are adding URLs to your text, make sure that some of them are directly following and preceding Japanese characters (e.g. ぜらス床ぜらスhttps://example.comぜらス床ぜらス). The Japanese characters should be included in the character count, while the URL shouldn't.
  • Add the following URL to your text and confirm that it doesn't increase the character count in the Text length assessment: https://www.example.com/foo/?bar=baz&inga=42&quux
  • Add yoast.com to your text and make sure that the character count increases by 9 characters.
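The expected results of these steps can be sketched in a few lines. The pattern below is an assumption for illustration (it requires a scheme or "www." and only consumes ASCII URL characters, so adjacent Japanese characters and bare domains are left alone); it is not the regex from the PR, and counting by code points is likewise a choice made for this example.

```javascript
// Hypothetical URL pattern: needs "http(s)://" or "www." and stops at the
// first character that is not a valid ASCII URL character.
const URL_PATTERN = /(?:https?:\/\/|www\.)[\w\-.~:\/?#[\]@!$&'()*+,;=%]+/gi;

// Sketch of a Japanese character count: drop URLs, drop all whitespace,
// then count the remaining characters (by code point in this example).
function countCharacters(text) {
  const withoutUrls = text.replace(URL_PATTERN, "");
  const withoutSpaces = withoutUrls.replace(/\s/g, "");
  return Array.from(withoutSpaces).length;
}

// URL directly between Japanese characters: only the 14 Japanese characters count.
console.log(countCharacters("ぜらス床ぜらスhttps://example.comぜらス床ぜらス")); // 14

// URL with a query string: does not add to the count.
console.log(countCharacters("ぜらス https://www.example.com/foo/?bar=baz&inga=42&quux")); // 3

// A bare domain name is treated as a word: 3 + 9 characters.
console.log(countCharacters("ぜらスyoast.com")); // 12
```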

Test whether adding spaces and HTML tags doesn't increase the character count

Confirm that the character count shown by the Text length assessment does not increase in any of the following cases:

  • When you embed a URL in your text
  • When you add other HTML tags in your text, for example <i> or <span> tags.
  • When you add a space somewhere in your text
  • When you add a line break somewhere in your text

Test upgrade routine

  • Delete the posts that you created with the test texts
  • Install and activate the latest stable version of Free (for acceptance tester: make sure to also unlink the Premium branch from the issue branch and rebuild Premium)
  • Create and publish the posts again
  • Confirm that all the links and emails in the first post show up in the list of prominent words
  • Confirm that in the internal linking suggestions of the first post, both the second and third post are suggested
  • Install and activate the version of Free containing this change (for acceptance tester: make sure to also link the issue branch to the Premium branch and rebuild Premium)
  • Go back to the first post and confirm that the links and emails in the first post do not show up in the list of prominent words
  • Also confirm that in the internal linking suggestions, the second post is suggested but the third one isn't

Test in Shopify

Test prominent words for insights

  • Enable the Insights feature flag by pasting YOAST_INSIGHTS_ACTIVE=1 in your .env.local file
  • Open a new post and paste the first test text in your editor
  • Confirm that the links/emails under the heading Not prominent words do not show up in the list of prominent words
  • Confirm that the words under the heading Prominent words do show up in the list of prominent words

Test Japanese character count

Test whether adding URLs doesn't increase the character count

  • Change your shop language to Japanese (documentation on how to change shop language)
  • Create a new product and add some Japanese text (you can use this generator)
  • Paste a few different URLs into your text and confirm that they do not change the character count according to the Text length assessment
    • When you are adding URLs to your text, make sure that some of them are directly following and preceding Japanese characters (e.g. ぜらス床ぜらスhttps://example.comぜらス床ぜらス). The Japanese characters should be included in the character count, while the URL shouldn't.
  • Add the following URL to your text and confirm that it doesn't increase the character count in the Text length assessment: https://www.example.com/foo/?bar=baz&amp;inga=42&amp;quux
  • Add yoast.com to your text and make sure that the character count increases by 9 characters.

Test whether adding spaces and HTML tags doesn't increase the character count

Confirm that the character count shown by the Text length assessment does not increase in any of the following cases:

  • When you embed a URL in your text
  • When you add other HTML tags in your text, for example <i> or <span> tags.
  • When you add a space somewhere in your text
  • When you add a line break somewhere in your text

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Block/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test the Test prominent words for insight, Test internal linking and Test Japanese character count steps in all post types and in the Block, Classic, Gutenberg, and Elementor editors. The Test prominent words created when clicking the SEO data optimization button and Test upgrade routine parts only need to be tested in one content type and editor.

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

  • Since changes were made to the helper for counting characters in Japanese, all functionality that relies on this helper is affected (for Japanese):
    • Sentence length assessment (the helper is used to calculate sentence length)
    • Paragraph length assessment (the helper is used to calculate paragraph length)
    • Subheading distribution assessment (the helper is used to check the length of the whole text, and the length of the text under each subheading, if there are any).
    • Transition words assessment (only checking applicability of assessment relies on counting characters - in Japanese, it should be applicable for texts with at least 400 characters)
    • Keyword density assessment (only checking applicability of assessment relies on counting characters - in Japanese, it should be applicable for texts with at least 200 characters)
    • Internal linking suggestions (only checking applicability of assessment relies on counting characters - in Japanese, it should be applicable for texts with at least 200 characters)

Testing of these functionalities was not added to the test instructions because the helper is used in exactly the same way for all of them. If counting characters works correctly in the Text length assessment (which looks at the whole text), it is assumed to also work when counting the length of (parts of) the text in other assessments. Still, a quick smoke test to make sure that this functionality works as expected might be a good safety check.
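For a quick smoke test, the Japanese applicability thresholds listed above can be expressed as a small check. This is only a sketch: the threshold values come from the list, while `MIN_CHARACTERS_JA`, `isApplicable`, and the simplified `countCharacters` are hypothetical names that do not mirror the plugin's code.

```javascript
// Minimum character counts for Japanese, taken from the impact check list.
const MIN_CHARACTERS_JA = {
  transitionWords: 400,
  keywordDensity: 200,
  internalLinkingSuggestions: 200,
};

// Simplified stand-in for the character count helper: ignores all whitespace.
function countCharacters(text) {
  return text.replace(/\s/g, "").length;
}

// A feature is applicable when the text has at least the minimum number of characters.
function isApplicable(feature, text) {
  const minimum = MIN_CHARACTERS_JA[feature];
  if (minimum === undefined) {
    throw new Error(`Unknown feature: ${feature}`);
  }
  return countCharacters(text) >= minimum;
}

const text = "猫".repeat(200);
console.log(isApplicable("keywordDensity", text));  // true
console.log(isApplicable("transitionWords", text)); // false
```

Because spaces are ignored by the count, padding a 199-character text with spaces would not make an assessment applicable in this sketch.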

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change.

Quality assurance

  • I have tested this code to the best of my abilities
  • I have added unit tests to verify the code works as intended
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.

Fixes https://yoast.atlassian.net/browse/PC-865, https://yoast.atlassian.net/browse/PRODUCT-851

@agnieszkaszuba agnieszkaszuba added the changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog label Nov 8, 2022
@agnieszkaszuba agnieszkaszuba added the Shopify This PR impacts Shopify. label Nov 9, 2022
@marinakoleva
Contributor

marinakoleva commented Nov 16, 2022

CR done ✅
Testing done apart from the Shopify section.

Specifically, sections Test prominent words for insight and Test internal linking tested in:

  • Posts tested in Block editor, Gutenberg, Classic, Elementor;
  • Pages tested in Block, Classic editor;
  • Custom Post Types: tested in Block editor and Elementor;
  • Taxonomies: tested in Block editor.

Section Test Japanese character count steps tested in:

  • Posts: Block, Classic, Gutenberg, Elementor
  • Pages: Block, Classic, Gutenberg, Elementor
  • Custom Post Types: Block, Classic, Gutenberg, Elementor
  • Taxonomies: Classic
  • Custom taxonomies: Classic

@iolandasequino
Contributor

iolandasequino commented Nov 17, 2022

I tested Test prominent words for insights in Shopify. It works as expected 🎉
It took a long time to build the app, since in the meantime we found out a feature branch we merged is breaking trunk, which I merged into this branch earlier today.

To test this PR, check out the latest working commit (gco 2a3efa5) and then build the app.
Only the Japanese part is left to be tested.

@iolandasequino
Contributor

Findings when testing Japanese in Shopify, some good and some not so good:

  • Adding URLs before or after a Japanese text does not affect character count 🎉
  • This URL: https://www.example.com/foo/?bar=baz&amp;inga=42&amp;quux does not affect character count when pasted after or inside a text 🎉
  • Adding URLs inside a Japanese text does affect character count, making it increase by 1 character 😢
  • Adding yoast.com to the text makes the character count sometimes increase by 10 when the text is typed after a Japanese text and by 9 when inside a Japanese text 😢
  • Pasting yoast.com inside a Japanese text makes the character count increase by 10 and when deleting the text the character count will be left with one character extra compared to before pasting 😢

The results are also somewhat inconsistent, so it would be better to test at least in another environment.

@marinakoleva
Contributor

marinakoleva commented Nov 21, 2022

After testing the issue on Shopify one more time and testing the behaviour in the released version too, there seems to be only one problem left that is specific to this branch: when a URL with formatting is added and then removed, the additional +1 character remains in the character count. In the released version, removing the URL decreases the character count back to what it was before the URL was added.



Otherwise, both in this branch and in the released version:

  • Adding yoast.com to the text makes the character count increase by 10 when the text is typed before or after a Japanese text and by 9 when inside a Japanese text (most of the time, but not a 100% consistent behaviour)
  • adding a URL with formatting adds a character to the character count

@mhkuu
Contributor

mhkuu commented Nov 30, 2022

CR of the new commits: ✅ (confirmed yarn test passes 🎉 ).

@agnieszkaszuba
Contributor Author

I fixed the issue with formatted URLs adding an extra character. And I also couldn't replicate the issue where adding yoast.com sometimes increases the character count by 9 and sometimes by 10 characters. So that might be fixed as well, though you mentioned the behavior was inconsistent so maybe I just got lucky and didn't encounter it 😛

@FAMarfuaty
Contributor

Acceptance test:
I tested these two scenarios both in WordPress and in Shopify (cross-testing with different content types and editors)

  • Test whether adding URLs doesn't increase the character count
  • Test whether adding spaces and HTML tags doesn't increase the character count

Everything works as expected 👍🏽

@FAMarfuaty FAMarfuaty added this to the 19.13 milestone Dec 6, 2022
@FAMarfuaty FAMarfuaty merged commit df0429e into trunk Dec 6, 2022
@FAMarfuaty FAMarfuaty deleted the PC-865-exclude-links-from-prominent-words branch December 6, 2022 11:43
@enricobattocchi enricobattocchi modified the milestones: 19.13, 19.14 Dec 13, 2022