[Bug] Incorrect search behavior if the query contains non-English letters or words. #2140

Closed
ghost opened this issue Jun 9, 2021 · 9 comments · Fixed by #2155
Labels: bug (Something isn't working), good first issue (Good for newcomers), type:server-side

Comments

@ghost

ghost commented Jun 9, 2021

Describe the bug
When I search for videos with a query that contains non-English letters or words, the search returns garbage results instead of results for what I entered.
This happens with every query that contains non-English letters or words, which makes the search almost impossible to use.
Strangely, not all instances are affected, only some of them.
Experimenting with the search filter parameters did not help either; the search results are still garbage.

Instances that are affected by this issue:
invidious.snopyta.org
yewtu.be
invidious.silkky.cloud

Instances that are not affected by this issue:
invidious.048596.xyz
inv.riverside.rocks

Steps to Reproduce
Enter a search query that contains non-English text, for example "Фильмы ужасов" (Russian for "horror films").

Expected behavior:
The search results should show videos that are related, one way or another, to the horror genre.

Actual behavior:
The search results return "garbage" videos that have absolutely nothing to do with the horror genre.

Screenshots
Screenshot 1, taken on yewtu.be (affected by this issue), shows the actual behavior.
Screenshot 2, taken on inv.riverside.rocks (not affected by this issue), shows the expected behavior.

Additional context

  • Browser: Firefox ESR 78.11.
  • OS: Windows 10.
@ghost added the bug label on Jun 9, 2021

@Mennaruuk

Can confirm with Arabic too (باتلفيلد)

@SamantazFox
Member

Thanks for reporting this :) NB: Snopyta seems to work for me.

For the record, here are the versions of the different instances at the time of the report:

invidious.snopyta.org   => 2021.05.26-4a45d10 @ master
yewtu.be                => 2021.06.08-c942a88 @ master
invidious.silkky.cloud  => 2021.06.10-a6e38e2 @ master
invidious.048596.xyz    => 2021.05.13-996dd1a @ serving12
inv.riverside.rocks     => 2021.05.26-4a45d10 @ master
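
To make the comparison easier, the version strings listed above can be split into a build date, a commit hash, and a branch. The following is only a minimal sketch: the `Build` type and the function names are hypothetical helpers for illustration, and the format is inferred from the five strings in this thread, not from any Invidious specification.

```python
from datetime import date
from typing import NamedTuple


class Build(NamedTuple):
    released: date  # date part of the version string
    commit: str     # short commit hash
    branch: str     # branch name after the "@"


def parse_build(text: str) -> Build:
    """Parse a string such as '2021.06.08-c942a88 @ master'."""
    version, branch = (part.strip() for part in text.split("@"))
    stamp, commit = version.split("-", 1)
    year, month, day = (int(number) for number in stamp.split("."))
    return Build(date(year, month, day), commit, branch)


# Version strings copied from the list above.
INSTANCES = {
    "invidious.snopyta.org": "2021.05.26-4a45d10 @ master",
    "yewtu.be": "2021.06.08-c942a88 @ master",
    "invidious.silkky.cloud": "2021.06.10-a6e38e2 @ master",
    "invidious.048596.xyz": "2021.05.13-996dd1a @ serving12",
    "inv.riverside.rocks": "2021.05.26-4a45d10 @ master",
}


def hosts_by_build_date(instances: dict) -> list:
    """Return host names ordered from the oldest build to the newest."""
    return sorted(instances, key=lambda host: parse_build(instances[host]).released)


if __name__ == "__main__":
    for host in hosts_by_build_date(INSTANCES):
        print(parse_build(INSTANCES[host]).released, host)
```

Sorting by build date puts the two June builds last, which makes it easier to check whether a recent change lines up with the affected instances.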

And btw @tenpura-shrimp, your instance is not properly redirecting to HTTPS, and will need to be updated soon if you wanna stay in the official list :)

@ghost
Author

ghost commented Jun 11, 2021

It seems that the problem is related to an attempt to convert the received query from UTF-8 and then send the converted version to YouTube.
As another test, I decided to search for "+100500".
Screenshot 1: search results on instance yewtu.be.
Screenshot 2: search results on instance inv.riverside.rocks.
As you can see in screenshot 1, although videos that partially match the search query are displayed, they have the strange name "%2B100500".
If my memory serves me, "%2B" is the result of converting the "+" sign from UTF-8 by some algorithm.
Because of this, my assumption is that the search behaves incorrectly because, after the instance processes the search request, it sends the query to YouTube not as "+100500" but as "%2B100500", so videos with a similar name appear in the search results.
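
One way the behavior described above could arise is if the query gets encoded twice before it reaches the upstream service, which then decodes it only once. The sketch below simulates that with Python's standard library; it is not the Invidious code (which is not shown in this thread), and `encode_once`, `encode_twice`, and `upstream_sees` are hypothetical names used only for illustration.

```python
from urllib.parse import quote, unquote


def encode_once(query: str) -> str:
    """Encode the query a single time, as a correct client would."""
    return quote(query, safe="")


def encode_twice(query: str) -> str:
    """Encode an already encoded query again (the suspected mistake)."""
    return quote(encode_once(query), safe="")


def upstream_sees(encoded: str) -> str:
    """Simulate an upstream service that decodes the query exactly once."""
    return unquote(encoded)


if __name__ == "__main__":
    for query in ("+100500", "Фильмы ужасов"):
        print("correct:", upstream_sees(encode_once(query)))
        print("doubled:", upstream_sees(encode_twice(query)))
```

With a single encoding step the upstream service recovers the original text. With two, it ends up searching for the literal string "%2B100500", which would be consistent with the video titles seen in screenshot 1.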

@unixfox
Member

unixfox commented Jun 11, 2021

@noname946 Interesting findings.

@SamantazFox Do you think this could be related to the fact that #1985 changed something with the encoding on the search endpoint?
inv.riverside.rocks is running an old version of Invidious (https://inv.riverside.rocks/api/v1/stats) and is unaffected by the issue. But yewtu.be and invidious.snopyta.org are running a newer version and are affected by the issue.

@SamantazFox
Member

> If my memory serves me, "%2B" is the result of converting the "+" sign from UTF-8 by some algorithm.

It's called URL encoding (or percent-encoding) ^^ Basically, some characters (like /@?&=+) are reserved in URLs, and must be converted when they appear as data, in order to respect the URL specification.
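
As a small illustration of the URL encoding described above, here is a sketch using Python's standard `urllib.parse` module. The helper names `build_query_string` and `read_query_string` and the `q` parameter name are assumptions made for this example; they are not taken from the Invidious source.

```python
from urllib.parse import parse_qs, quote, urlencode

# Each reserved character becomes "%" followed by its hexadecimal byte value.
RESERVED_SAMPLE = "/@?&=+"
encoded_chars = {ch: quote(ch, safe="") for ch in RESERVED_SAMPLE}


def build_query_string(search_text: str) -> str:
    """Return a 'q=...' query string with the search text encoded."""
    # urlencode() encodes non-ASCII text as UTF-8 bytes and writes spaces as "+".
    return urlencode({"q": search_text})


def read_query_string(query_string: str) -> str:
    """Decode a 'q=...' query string back into the original search text."""
    return parse_qs(query_string)["q"][0]


if __name__ == "__main__":
    print(encoded_chars)
    print(build_query_string("+100500"))
    print(read_query_string(build_query_string("Фильмы ужасов")))
```

Encoding and decoding are meant to be applied exactly once each, so a query that goes through `build_query_string` and then `read_query_string` comes back unchanged, including non-English text.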

> @SamantazFox Do you think this could be related to the fact that #1985 changed something with the encoding on the search endpoint?

This is very likely, yep... I'll investigate and try a fix tonight or tomorrow.

@SamantazFox
Member

Patch live at https://test.invidious.io

@Mennaruuk @noname946 can you please try it? I'll make a PR in the meantime.

@Mennaruuk

Search results are relevant again!!! Thank you sooo much.

@SamantazFox
Member

@Mennaruuk you're welcome :)

@ghost
Author

ghost commented Jun 13, 2021

@SamantazFox Checked the search for "+100500" and "Фильмы ужасов", now the search results match the expected behavior and there are no more "garbage" videos.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Jul 19, 2021.