wikipedia: results for namespaces/articles with colons in title are surprising #2573

Closed
SnoopJ opened this issue Nov 29, 2023 · 0 comments · Fixed by #2575

SnoopJ commented Nov 29, 2023

Description

The wikipedia plugin supports redirection and "near miss" queries by talking to the action=query part of the MediaWiki API. This produces surprising results when the user's query is namespaced (i.e. SomeNamespace:*) and the target page does not exist: the result is often a page outside the intended namespace.

10:22 <+SnoopJ> a less profane test case: https://en.wikipedia.org/wiki/Category:Spatulas
10:22 <+Sopel> [wikipedia] Spatulas | "A spatula is a broad, flat, flexible blade used to mix, spread and lift material including foods, drugs, plaster and paints. In medical applications, "spatula" may also be used synonymously with tongue depressor.The word spatula derives from the Latin word for a flat piece of wood or splint, a diminutive form of the Latin spatha, meaning 'broadsword', and hence can also refer to a tongue depressor. The words spade (digging […]"
10:22 <+SnoopJ> what's interesting about that one is…
10:22 <+SnoopJ> .wp Category:Spatulas
10:22 <+Sopel> [wikipedia] Category:Spatula (genus) | "" | https://en.wikipedia.org/wiki/Category%3ASpatula_%28genus%29
10:23 <+SnoopJ> …that there *is* a fairly close category, but I guess the remote API is either not sending it, or it's after the other one

(I checked the Special:* namespace as well; it seems all namespaces are affected.)

This is a regression caused by #2414, which introduced the use of urlparse() on a non-URL string. As a result, the namespace is mistaken for a URL scheme. In simple terms, my code for addressing #2412 is holding urlparse() wrong.
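
To illustrate the confusion described above, here is a minimal sketch using only the standard library. It is not the plugin's code; it just passes the two titles from this report to urlparse() to show where the part before the colon ends up.

```python
from urllib.parse import urlparse

# A bare page title is not a URL, but urlparse() still applies URL rules to it:
# everything before the first colon is read as the scheme (and lowercased).
for title in ("Category:Spatulas", "Pitfall:_The_Lost_Expedition"):
    parsed = urlparse(title)
    print(f"scheme={parsed.scheme!r} path={parsed.path!r}")

# scheme='category' path='Spatulas'
# scheme='pitfall' path='_The_Lost_Expedition'
```

Code that only looks at the path of the parse result therefore loses the namespace (or the first half of the title), which matches the "Spatulas" and " The Lost Expedition" results shown in this issue.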

Reproduction steps

  1. Query the bot with a Wikipedia URL or .wp command for a non-existent namespaced page
  2. There is no (2)

Edit: this also affects regular articles with colons in the title, e.g.

<SnoopJ> https://en.wikipedia.org/wiki/Pitfall:_The_Lost_Expedition
<terribot> [wikipedia]  The Lost Expedition | "The Lost Expedition (Russian: Пропавшая экспедиция, romanized: Propavshaya ekspeditsiya) is a 1975 Soviet drama film directed by Venyamin Dorman."

Expected behavior

The namespace should not be ignored. If I ask for Category:Spatulas, I don't want to know about Spatulas.

Relevant logs

No response

Notes

We can urlparse() the entire URL and drop the prefixing /wiki, but there might be a better way to do it that involves slicing, since dropping the prefix is slightly annoying in Python 3.8. Shouldn't be hard to fix, though.
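
A rough sketch of the first approach, for discussion only and not necessarily what the eventual fix does. The helper name title_from_url and the WIKI_PREFIX constant are hypothetical, and the sketch assumes article URLs always use a /wiki/ path prefix. It slices the prefix off by hand because str.removeprefix() is only available from Python 3.9.

```python
from urllib.parse import unquote, urlparse

# Assumption: article URLs look like https://<host>/wiki/<title>
WIKI_PREFIX = "/wiki/"


def title_from_url(url):
    """Return the page title from a full wiki URL (hypothetical helper)."""
    # Parsing the *whole* URL keeps "https" as the scheme, so any colon
    # inside the title stays in the path where it belongs.
    path = urlparse(url).path
    if path.startswith(WIKI_PREFIX):
        # Manual slice instead of str.removeprefix(), which needs Python 3.9+
        path = path[len(WIKI_PREFIX):]
    # Decode percent-escapes such as %3A (":") and %28/%29 (parentheses)
    return unquote(path)


print(title_from_url("https://en.wikipedia.org/wiki/Category:Spatulas"))
print(title_from_url("https://en.wikipedia.org/wiki/Pitfall:_The_Lost_Expedition"))
```

With the full URL parsed, both test cases from this issue keep their prefix: the first prints Category:Spatulas and the second prints Pitfall:_The_Lost_Expedition.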

Sopel version

7693af3

Installation method

pip install

Python version

3.8.18

Operating system

Ubuntu 20.04

IRCd

No response

Relevant plugins

wikipedia

SnoopJ added the Bug and Low Priority labels on Nov 29, 2023
SnoopJ added this to the 8.0.0 milestone on Nov 29, 2023
SnoopJ self-assigned this on Nov 29, 2023
dgw closed this as completed in #2575 on Dec 3, 2023
SnoopJ changed the title from "wikipedia: results for non-existent namespaced articles are surprising" to "wikipedia: results for namespaces/articles with colons in title are surprising" on Dec 15, 2023