Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: documented as calling other plugins when redirects match, but doesn't #2230

Closed
dgw opened this issue Dec 30, 2021 · 8 comments · Fixed by #2433
Closed

url: documented as calling other plugins when redirects match, but doesn't #2230

dgw opened this issue Dec 30, 2021 · 8 comments · Fixed by #2433
Assignees
Labels
Milestone

Comments

@dgw
Copy link
Member

dgw commented Dec 30, 2021

Once upon a time, the url plugin's title_auto feature would check if a link redirected to a URL that should be handled by another plugin, but that appears to no longer be the case since the conversion to use requests. Its documentation and that for the process_urls helper (below) still say this is the case, but the behavior is gone.

sopel/sopel/modules/url.py

Lines 290 to 295 in d36a19d

def title_auto(bot, trigger):
"""
Automatically show titles for URLs. For shortened URLs/redirects, find
where the URL redirects to and show the title for that (or call a function
from another plugin to give more information).
"""

sopel/sopel/modules/url.py

Lines 322 to 329 in d36a19d

def process_urls(bot, trigger, urls):
"""
For each URL in the list, ensure that it isn't handled by another plugin.
If not, find where it redirects to, if anywhere. If that redirected URL
should be handled by another plugin, dispatch the callback for it.
Return a list of (title, hostname) tuples for each URL which is not handled
by another plugin.
"""

I would argue that this is a useful feature, and we should bring it back. For example, our first-party GitHub plugin could get free handling of repo/issue/PR/commit links shortened through git.io if this was still the case. Any shortened URL at present, actually, is only ever going to get the generic title-fetching behavior.

While it's certainly possible to create Yet Another Sopel Plugin that resolves short URLs, it would be fairly annoying to implement—and it wouldn't block the built-in title handling, either.

Note, I don't care if this is part of url, but it makes more sense to put it there than into the bot's core URL handling. We don't necessarily want the core bot making HTTP requests every time it sees a link. Some of the thoughts in this issue were previously expressed in #sopel@libera on 2021-11-30.

@dgw dgw added the Feature label Dec 30, 2021
@dgw dgw added this to the 8.0.0 milestone Dec 30, 2021
@SnoopJ
Copy link
Contributor

SnoopJ commented Mar 23, 2023

I believe the "did any redirects happen?" part of the fix for this issue involves looking at Response.history, but I'm less sure about what to do after that after a glance at the plugin to remind myself of its structure.

A patch may want to add a note to plugin.url to explicitly warn users that they will not receive events for redirects to the URLs they declare this way (since no request is performed)

@SnoopJ SnoopJ self-assigned this Mar 23, 2023
@dgw
Copy link
Member Author

dgw commented Mar 23, 2023

A patch may want to add a note to plugin.url to explicitly warn users that they will not receive events for redirects to the URLs they declare this way (since no request is performed)

That would constitute an inability to re-implement the documented behavior, then, wouldn't it?

"If that redirected URL should be handled by another plugin, dispatch the callback for it."

@Exirel
Copy link
Contributor

Exirel commented Mar 23, 2023

The URL plugin should probably say to request not to follow redirect, and to implement the follow-redirect logic ourselves. This way, we should be able to look at a request, if it's a redirect, ask the bot if a plugin can deal with it, etc. and so on up until there is no redirect and no plugin can handle it.

@dgw
Copy link
Member Author

dgw commented Mar 23, 2023

Or the bot's URL dispatcher can test for redirects itself if no plugin's handler matches? I wouldn't mind decoupling this redirect-handling behavior from url at all. It would be ideal, in fact.

Problem is the url plugin's auto_title handler also counts as "a plugin's handler" that matches everything. But I'd also say that it's better to set the URL dispatcher to ignore url.auto_title than to rely on url to handle redirected links. It's still coupling core behavior to a plugin, but much less obnoxious than the other way around.

@half-duplex
Copy link
Member

If we're considering that, my dream is something like:

  • Sopel detects URL
  • If a plugin configured a callback for that URL, hand it to the plugin and you're done. Otherwise,
  • Sopel performs GET request. If redirect, goto ^
  • Sopel passes the URL and the Response to plugins that have registered for page body processing
    • Plugins may be able to register an xpath or callback for deciding if they're relevant
    • Callbacks can also return E_NOTME or something, which tells Sopel to continue looking for other plugins to handle that page
  • If all registered plugins pass on it (url, xpath, or E_NOTME), it goes to url.py if enabled

This would let plugins like bugzilla or a hypothetical mastodon/activitypub plugin to handle the relevant application regardless of the domain/path it lives at (mastodon.social, nondeterministic.computer, etc etc etc etc etc), and for others that use page data instead of turning a URL into an API call, they could dump it directly through an xml parser or BS4.

Worth it? No idea.

@dgw
Copy link
Member Author

dgw commented Mar 23, 2023

We've talked about that idea before, and I like it. Not this late in 8.0, and it's big enough I think it has to wait for 9.0, but I like it. If you're up for taking your bullets there and fleshing them out a bit into a standalone feature issue, go for it. 😃

FWIW, a mastodon plugin is no longer hypothetical (soon to be adopted under our org, I think).

The question now is, how can we resolve this issue in a relatively low-effort way for 8.0, while planning ahead to replace it in 9?

@SnoopJ
Copy link
Contributor

SnoopJ commented Mar 24, 2023

(semantic warning: longposting)

That would constitute an inability to re-implement the documented behavior, then, wouldn't it?

I think I'm keying mostly off of your suggestion that the patch that implements this should live in url, which I understood as meaning that this plugin should be capable of dealing with redirects, but I think that the answer to your question is "yes" the way I've been thinking about it. I agree with your later reply that the general problem does not make sense for 8.0, it may be enough to get url alone dealing with redirections, and we might learn more about how we want to solve the big problem in pursuing that.

Sopel doing requests on the plugins behalf

* Sopel performs GET request. If redirect, goto ^

I'm a -1 on Sopel doing eager requests, I don't think the core should be performing a request on every matched URL before dispatching to plugins. What should the rules for this one-size-fits all request look like? What if someone links to an .iso etc? There's a lot to worry about, and I don't really think it's the core's business as a general rule. At a minimum, we seem to be talking about paying attention to every URL, which is a pretty big change.

However, reflecting on this suggestion, I began wondering if it makes sense for Sopel to provide a helper for requests that will dispatch to plugins that have a registered pattern matching an eventual redirect. Let's assume https://example.com redirects to some other specific path, and that two plugins have registered patterns for this domain. Does it make sense to do something like this?

@plugin.url("https://example.com")
def pluginA_example(bot, trigger):
    urls = web.search_urls(trigger, exclusion_char=bot.config.url.exclusion_char, clean=True)
    for url in urls:
        # in the servicing of get(), we could dispatch to decorators that match the redirect URL, where appropriate
        response = web.get(url)  # type: requests.Response

    # ...

# …somewhere else, a cousin decoration occurs…
@plugin.url("https://example.com/with/a/specific/path")
def pluginB_example_specific(bot, trigger):
    # ...

Other spellings of this "let the bot do the request" ideas might include a helper for explicitly delegating back to the bot (e.g. bot.url_event(redirect_target)), or some way for plugin authors to explicitly opt in to letting the core do the request for them (e.g. @plugin.url("https://example.com/", do_request_for_me=True)). I think these options are along the same lines as what @Exirel suggested.

I think all of that is definitely well past the bar for 8.0, but it's interesting to think about how to get redirects working cooperatively.

@SnoopJ
Copy link
Contributor

SnoopJ commented Mar 25, 2023

I've filed #2432 as longer-term planning for restoration of this feature, so that for 8.0 we can focus on 'just' fixing the incorrect claim in the docstrings. It's a good idea to dispatch to callbacks who want to know about redirects, but I think we need to think about that some more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants