The wikipedia plugin supports redirection and "near miss" queries by talking to the action=query part of the MediaWiki API. This has surprising results if the user's query is namespaced (i.e. SomeNamespace:*) and the target page does not exist: often the result is something outside the namespace the user asked for.
10:22 <+SnoopJ> a less profane test case: https://en.wikipedia.org/wiki/Category:Spatulas
10:22 <+Sopel> [wikipedia] Spatulas | "A spatula is a broad, flat, flexible blade used to mix, spread and lift material including foods, drugs, plaster and paints. In medical applications, "spatula" may also be used synonymously with tongue depressor.The word spatula derives from the Latin word for a flat piece of wood or splint, a diminutive form of the Latin spatha, meaning 'broadsword', and hence can also refer to a tongue depressor. The words spade (digging […]"
10:22 <+SnoopJ> what's interesting about that one is…
10:22 <+SnoopJ> .wp Category:Spatulas
10:22 <+Sopel> [wikipedia] Category:Spatula (genus) | "" | https://en.wikipedia.org/wiki/Category%3ASpatula_%28genus%29
10:23 <+SnoopJ> …that there *is* a fairly close category, but I guess the remote API is either not sending it, or it's after the other one
(I checked the Special:* namespace as well; it seems all namespaces are affected.)
This is a regression caused by #2414, which introduced the use of urlparse() on a non-URL string, so the namespace gets mistaken for a URL scheme. In simple terms, my code for addressing #2412 is holding urlparse() wrong.
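A minimal demonstration of the misparse, using only the standard library. The helper name `split_like_the_bug` is made up for illustration and is not the plugin's actual code; it only shows what urlparse() does when handed a bare page title instead of a full URL.

```python
from urllib.parse import urlparse


def split_like_the_bug(title):
    """Hypothetical helper: parse a bare page title as if it were a URL."""
    parsed = urlparse(title)
    # Everything before the first colon is treated as a (lowercased) URL
    # scheme, so only the remainder is left over as the "path".
    return parsed.scheme, parsed.path


for title in ("Category:Spatulas", "Pitfall:_The_Lost_Expedition"):
    print(title, "->", split_like_the_bug(title))
```

If the plugin then uses only the path component as the page title, the namespace (or the part of the title before the colon) is lost, which would match the results shown in the chat logs above.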
Reproduction steps
Query the bot with a Wikipedia URL or .wp command for a non-existent namespaced page
There is no (2)
Edit: this also affects regular articles with colons in the title, e.g.
<SnoopJ> https://en.wikipedia.org/wiki/Pitfall:_The_Lost_Expedition
<terribot> [wikipedia] The Lost Expedition | "The Lost Expedition (Russian: Пропавшая экспедиция, romanized: Propavshaya ekspeditsiya) is a 1975 Soviet drama film directed by Venyamin Dorman."
Expected behavior
The namespace should not be ignored: if I asked for Category:Spatulas, I don't want to know about Spatulas.
Relevant logs
No response
Notes
We can urlparse() the entire URL and drop the prefixing /wiki, but there might be a better way to do it that involves slicing, since dropping the prefix is slightly annoying in Python 3.8. Shouldn't be hard to fix, though.
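A rough sketch of the first option, assuming the article path always starts with `/wiki/`. The function name `title_from_url` and the constant `WIKI_PREFIX` are hypothetical and not taken from the plugin. Slicing is used to drop the prefix because `str.removeprefix()` is only available from Python 3.9.

```python
from urllib.parse import unquote, urlparse

WIKI_PREFIX = "/wiki/"  # assumed article path prefix


def title_from_url(url):
    """Hypothetical helper: extract the page title from a full wiki URL."""
    # Parsing the *entire* URL keeps the real scheme (https) separate, so a
    # colon inside the title stays part of the path.
    path = urlparse(url).path
    if not path.startswith(WIKI_PREFIX):
        return None
    # Slice off the prefix (works on Python 3.8) and decode percent-escapes.
    return unquote(path[len(WIKI_PREFIX):])


print(title_from_url("https://en.wikipedia.org/wiki/Category:Spatulas"))
print(title_from_url("https://en.wikipedia.org/wiki/Pitfall:_The_Lost_Expedition"))
```

This only covers the URL case; a bare `.wp` query is already a title and would not need to go through urlparse() at all.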
SnoopJ changed the title from "wikipedia: results for non-existent namespaced articles are surprising" to "wikipedia: results for namespaces/articles with colons in title are surprising" on Dec 15, 2023
Sopel version
7693af3
Installation method
pip install
Python version
3.8.18
Operating system
Ubuntu 20.04
IRCd
No response
Relevant plugins
wikipedia