Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin registration fails, yet Pelican continues unperturbed, resulting in broken site. #3172

Open
2 tasks done
aknrdureegaesr opened this issue Aug 11, 2023 · 14 comments · May be fixed by #3173
Open
2 tasks done

Plugin registration fails, yet Pelican continues unperturbed, resulting in broken site. #3172

aknrdureegaesr opened this issue Aug 11, 2023 · 14 comments · May be fixed by #3173
Milestone

Comments

@aknrdureegaesr
Copy link

  • I have read the Filing Issues and subsequent “How to Get Help” sections of the documentation.
  • I have searched the issues (including closed ones) and believe that this is not a duplicate.
  • Debian GNU/Linux 11 (bullseye):
  • 3.9.2:
  • 4.8.0:
  • (my own):
  • (see below):
  • Link to your site:
  • Link to your source:
  • Link to a Gist with the contents of your settings file:

Issue

I'm running my own plugin that is needed to build my site. That plugin has some error handling at registration time.

Expectation: When the plugin cannot register, the site building stops and Pelican errors out.

Actually seen: When the plugin cannot register, an error is logged, but the site building continues. Pelican reports success to the operating system (exit value 0). But the generated site is broken, as the plugin never ran.

To reproduce, use a plugin like this one that errors out each time:

def register():
    raise RuntimeError("Something is wrong, cannot proceed with site building.")

I found this code in Pelican's __init__.py which I believe is pertinent:

    def init_plugins(self):
        self.plugins = []
        for plugin in load_plugins(self.settings):
            name = get_plugin_name(plugin)
            logger.debug('Registering plugin `%s`', name)
            try:
                plugin.register()
                self.plugins.append(plugin)
            except Exception as e:
                logger.error('Cannot register plugin `%s`\n%s',
                             name, e)

        self.settings['PLUGINS'] = [get_plugin_name(p) for p in self.plugins]
@avaris
Copy link
Member

avaris commented Aug 12, 2023

Expectation: When the plugin cannot register, the site building stops and Pelican errors out.

Well, no. Plugins are not "essential", pelican can run fine without them. If a plugin registration fails, too bad but it's no reason to stop pelican. If you want to stop pelican if an error occurs use --fatal option:

--fatal errors|warnings
                        Exit the program with non-zero status if any errors/warnings encountered. (default: )

@aknrdureegaesr
Copy link
Author

Well, no. Plugins are not "essential", pelican can run fine without them. If a plugin registration fails, too bad but it's no reason to stop pelican.

That's an interesting argument. To exaggerate it: We could replace Pelican with a single line of code exit(0) and it would run fine every single time.

Please, view this from a user's perspective: The user ordered, via some configuration, to have some plugin, presumably (as in my case), as that plugin is needed to correctly generate the site. What sense does it make to for the program to report "success" if that success can be achieved only by ignoring expressed wishes the user has uttered?

Do you think it happens a lot that a user would activate a plugin, but doesn't really care that much whether that plugin is actually doing anything or failing miserably? That's certainly not the way I as a user use software.

What Pelican does now breaks the "principle of least surprise" for most users, those who configure stuff with a view that that configuration isn't ignored. I think people who don't care whether a plugin is actually loaded or not should simply not load it. I don't see what advantage those people get from the present "first-class support for undetermined outcome".

Am I missing something?

@aknrdureegaesr
Copy link
Author

--fatal errors|warnings
Exit the program with non-zero status if any errors/warnings encountered. (default: )

Thank you! It is of course much better to fix the problem at this level rather than just for plugin loading. I changed my pull request #3173 accordingly.

@avaris
Copy link
Member

avaris commented Aug 14, 2023

That's an interesting argument. To exaggerate it: We could replace Pelican with a single line of code exit(0) and it would run fine every single time.

That's a bit hyperbole, isn't it...

What sense does it make to for the program to report "success" if that success can be achieved only by ignoring expressed wishes the user has uttered?

It's not ignored. You get a log message saying there is something wrong with your wishes. Same happens when a content cannot be parsed, i.e. let's say you wrote invalid markdown in on one of the articles, pelican will log an error, skip that and continue with the rest. If the issue is not critical and process can continue then there is no reason to stop, therefore it is a "success". Whether or not the result is what you expected is a different story and changes from case to case. It's up to the user to rectify this issue, that's why it is logged. --fatal even gives you the option to change the exit behavior.

Thank you! It is of course much better to fix the problem at this level rather than just for plugin loading. I changed my pull request #3173 accordingly.

This is a breaking change. Is there any compelling reason for imposing this default behavior to everyone, rather than you setting it for your own use? pelican gives you this option.

@aknrdureegaesr
Copy link
Author

I agree that Pelican gives me the option to get the error handling I want.

I argue the present default value of that option is less than optimal: "To see whether it worked, carefully scrutinize the log." This is broken UX.

Since the inception of the exit value as part of the venerable "pipes and filters" architecture decades ago, people expect differently: "If the exit value is 0, there is no need to wade through the log - all went well." This traditional UX is superior.

So I propose changing the default from --fatal=ignore to --fatal=errors.

This is a breaking change, yes.

But... Do you know, or at least can you envision a user who actually relies on the default behavior as is presently implemented? "Errors are not reported via the exit value, you need to read the log to find out," - Who would prefer that?

Few, if any. So few, if any, would actually experience the break. From my software development experience (I'm a freshly retried software developer, starting in the profession in 1990), these people are either non-existing or in a very tiny minority. Still, they are not abandoned. They can have what they want via --fatal=ignore.

Normal people are better served with the usual, time-proven "fail loudly" way of handling problems.

This indeed is time-proven. "Fail loudly" and the related "fail early" and "fail hard" are not my invention. For reliable software, they are best practice: "If things are going awry, stop what you're doing and cry for help loudly. If you cannot deliver what the user requested, don't pretend you can."

For closing, here are some random resources on "fail loudly", "fail hard", and "fail early".

@avaris
Copy link
Member

avaris commented Aug 14, 2023

Why not make --fatal=warnings the default then? That's another failure state. After all, if everything was fine you wouldn't get any warnings.

We do have different definitions of "failure".

Edit:

Do you know, or at least can you envision a user who actually relies on the default behavior as is presently implemented?

I don't know and honestly irrelevant.

First, backwards compatibility is a thing. You don't marginally change behavior for no good reason, especially when there is a solution for it. And no, "I like this way better" is not a good reason.

Second, we don't mean the same thing with "failure". If the issue is not "broken beyond any recovery", it is not a failure, as such does not require termination. You listed a bad plugin and it wasn't loaded, missing dependency, whatever. If things blow up later because of that, great there is your "non-zero exit code". If things don't blow up and pelican can run without it, where is the "failure" in this? It is no different than you forgetting to add that plugin.

Third, here is another "bad UX": stop immediately at every non-critical issue. So you have to fix one, run it, fix another, run it again, fix another...

@aknrdureegaesr
Copy link
Author

aknrdureegaesr commented Aug 15, 2023

We do have different definitions of "failure".

Could be. Starting to think about it, mine would probably be: "Something the user wanted does not happen." What's yours?

You listed a bad plugin and it wasn't loaded, missing dependency, whatever. If things blow up later because of that, great there is your "non-zero exit code". If things don't blow up and pelican can run without it, where is the "failure" in this?

The failure is: The output of pelican is utterly broken.

My particular homegrown plugin actually changes the HTML generated. Missing that step means the generated HTML is deficient and must under no circumstances get published to the production server.

FWIW: The context here is an automated workflow. I'm a devop kind of person and thus far have found Pelican very well suited to automated workflows. Stuff is regularly published here after trivial text changes without fear. I want to keep the "without fear", also for other people to whom I've recommended Pelican in the past who generally think like I do.

Third, here is another "bad UX": stop immediately at every non-critical issue. So you have to fix one, run it, fix another, run it again, fix another...

Yes. This disadvantage you point out certainly exists. The industry as a whole has mostly agreed to pay this price, as they consider the benefits (increased overall robustness, in particular) of "fail early" usually outweighs this disadvantage. Now I do not here actually propose introducing "fail early" to Pelican, but only "fail loudly". So I suggest we need not discuss "fail early" in detail here. I'm ready to do so if you want me to, but please, let's find a different place for that discussion.

First, backwards compatibility is a thing.

Yes. We may call this change "Release type: major" then, if you want. You asked this done earlier, I reacted in my pull request, but you have removed this now. Which "release type" value do you want?

Why not make --fatal=warnings the default then? That's another failure state. After all, if everything was fine you wouldn't get any warnings.

Could be done. If the Pelican code duly distinguishes between warnings and errors, this need not be done.

A warning isn't an error. If the code senses that something is weird, that's a warning. If the code senses something is wrong, that's an error. The difference between those two is, "this seems weird, but maybe it is something the user foresaw and wants" vs. "we are clearly not doing what the user asked for".

E.g., let us say a user has requested some software, among other things, to paint green all incoming blue foobars. At startup time, the software finds that no green paint is available. In my thinking, this is clearly a warning, but need not be an error. It may still happen that no blue foobars will be encountered during this run. Maybe green paint is expensive right now, and the user purposefully chose to not supply any, as none will be needed. In summary: All may still be well. But encountering a blue foobar and not being able to paint it green for lack of green paint? That is an error, not a warning. Something the user has specifically asked for does not happen. In this case, the exit value should not indicate to the user "all is well".

@avaris
Copy link
Member

avaris commented Aug 15, 2023

Could be. Starting to think about it, mine would probably be: "Something the user wanted does not happen." What's yours?

I wrote it: "broken beyond any recovery"

The failure is: The output of pelican is utterly broken.

For your case. I try not to generalize and guess what every users case could be.

Yes. We may call this change "Release type: major" then, if you want. You asked this done earlier, I reacted in my pull request, but you have removed this now. Which "release type" value do you want?

What I want is to not change behavior needlessly, especially for things that already have solutions. Will you be here to answer all the "my pelican started crashing after upgrading" issues that could come? You want something, it is already there. You can turn on the behavior you want.

E.g., let us say a user has requested some software, among other things, to paint green all incoming blue foobars. At startup time, the software finds that no green paint is available. In my thinking, this is clearly a warning, but need not be an error. It may still happen that no blue foobars will be encountered during this run. Maybe green paint is expensive right now, and the user purposefully chose to not supply any, as none will be needed. In summary: All may still be well. But encountering a blue foobar and not being able to paint it green for lack of green paint? That is an error, not a warning. Something the user has specifically asked for does not happen. In this case, the exit value should not indicate to the user "all is well".

Not an accurate analogy, IMO.

You want to paint 100 foobars.
5 of them are not input quite correctly, lets say slightly disoriented but could be corrected. That's a warning.
5 of them are completely wrong and cannot be painted, but they could be set aside continue with the rest. That's an error.
You have no paint or there is one that completely breaks the machine. That's critical.

You say let's stop at the first wrong foobar. I say 95 of them can still be painted and those 5 could be set aside. They don't "clog" the machine, why stop?

The good thing is, you can configure your the machine to get your desired behavior.

@aknrdureegaesr
Copy link
Author

You say let's stop at the first wrong foobar. I say 95 of them can still be painted and those 5 could be set aside. They don't "clog" the machine, why stop?

This is proper procedure in a mechanical factory (maybe - I'm not an expert on those). I'll argue it is not in software. For software works differently.

One important difference is: There is no "set aside" (at least not in Pelican). There is no coming back to stuff later that has been set aside earlier.

There is input and there is output. And that is all there is to it.

In a regular factory, maybe every single object produced will later be checked for quality. So if one object isn't produced well, the production of that single object can be retried later. So even if one object isn't produced well, it makes sense to nevertheless continue the production run, as you suggest.

But in software like Pelican, the software is asked to do its job and report on completion. There is one report after all output has been generated. There is no API in Pelican for reporting problems with particular files. This is proper, there shouldn't be. But that means there is no "set aside".

There is also no "repair later" within a Pelican run. After the success report has been delivered, the exit value returned as 0, there is no tomorrow. Pelican has run to completion. That's it. There is no second chance to generate anything. There is no way to go back to any pile of "set aside" incomplete objects.

If there is any problem with the output, the entire run indeed is broken. From Pelican's point of view, broken beyond repair. For, other than in a mechanical factory, there is no repair chance. The run has completed. Anything broken, small or big, is broken beyond repair.

It most be reported as such. It is a question of basic honesty.

The only thing a Pelican run can report as broken (beyond repair) is that entire Pelican run itself. Exit code not 0.

That honest report opens the only repair chance we have: Make sure the user is aware of the problem.

Then the user can fix stuff Pelican can't, such as configuration or environment or whatever. After that external fix, the user can rerun Pelican anew.

For this to work, it is paramount that the user is made aware of the problem. Pelican must not lie "all is well" via exit code 0, if in fact an error occurred. That lie decreases the only repair chance we have, and decreases it tremendously.

Instead, Pelican must honestly confess to the user when the output clearly is not what the user expects. When an error occurred, Pelican should do whatever is in its power to make the user distrust the output. Easy enough: Exit value not 0.

After 30+ years of experience as a software developer, I say: This is what the user expects. Not just me. It is universal. People want to know whether the software did or didn't do what it was tasked to do.

For that is the only repair chance in the case of an error: The user may be able to repair what's broken. Pelican certainly isn't. The user may be able to analyze the underlying reason for the failure. Pelican can't. From Pelican's point of view: Any error is "broken beyond repair".

@avaris
Copy link
Member

avaris commented Aug 16, 2023

pelican logs things. It's not because we are ascii-art fans and want to paint terminals. It's feedback to the user including the issues with different levels of severity that have occurred during the run. Information is there, it is reported. Just because you don't like the format of this feedback does not mean "pelican lied to me". You seem to like exaggerating but it's starting to feel dishonest.

@aknrdureegaesr
Copy link
Author

I owe you an apology.

There is a convention I'm used to: "If a program could not do what it was asked to do, it signals that state of affairs via a non-null exit code." Returning 0 after catching an exception breaks that convention. If Pelican were bound by it, I may have a reason for moral outrage and even saying "Pelican lied to me".

But Pelican has never signed that convention. So I am not entitled to such a statement. I apologize.

@aknrdureegaesr
Copy link
Author

aknrdureegaesr commented Aug 18, 2023

The exitvalue-convention

It goes roughly like this:

If a program could not do what it was asked to do, it signals that state of affairs via a non-null exit value.

Informally phrased: "Fail loudly."

The exitvalue-convention is rather universal.

I have given several resources that recommend it.

More importantly: The convention is at the heart of how automation is done, and has been done for many decades. This includes the venerable make, Python solutions like invoke, and many others.

The exitvalue-convention is Python-ish

The convention is automatically followed by Python programs, if only a few (imho, "standard") rules are followed. It basically boils down to:

  • When something is wrong, throw an appropriate error.
  • Catching an error accepts the responsibility to either repair whatever was wrong, or raise another (or the same) error.

To not follow the exitvalue-convention in a Python program, that program basically has to deviate from the beaten path and roll its own error handling. (This is what Pelican does.)

Not following the exitvalue-convention produces bugs.

Pelican comes with pelican-quickstart, which sets up invoke tasks and make targets.

Using either of these in an automated context, e.g., something like invoke build publish or make regenerate publish without intervening manual checking (I did say "automated context"!), comes with a very real danger that broken HTML is being pushed to the production server. This is the user experience this bug report wants to repair.

(I'm personally coming from an "automate all the things" devops background.)

In such an automated context, the errors would be logged all right, but otherwise ignored, leading to a broken production site. So the fact that the errors are somewhere in the log is nice, but does not prevent the problem.

Following the convention has no long-term known disadvantages

Neither of the two of us could thus far come up with an example where not following the exitvalue-convention is actually useful, from any user's perspective.

One clear disadvantage is known: Switching Pelican to follow the convention is a breaking change. This is a short-term disadvantage.

My suggestion

I honestly believe Pelican would become a better software if it followed the convention, even when no special command line switches are given.

I have no problems with the ability to opt-out of the convention. But I think it is a big improvement for Pelican if the convention is followed normally, without the user having to opt into it via a special command line switch.

This is what I suggest. This is what I have been suggesting here the whole time. It is what my PR does.

Alternative

As an alternative (which I do not recommend), Pelican could continue to stay out of the exitvalue-convention.

To mitigate the disadvantages, I would then suggest

  • the fact that Pelican does not follow the convention should be clearly mentioned in the documentation, e.g., https://docs.getpelican.com/en/latest/publish.html
  • and the opt-in --fatal errors into the convention should be used in the tasks.py and Makefile as generated by pelican-quickstart.

Adopting the convention would be better.

@offbyone
Copy link
Contributor

Speaking as a Pelican user with custom plugins myself, I would definitely prefer if my CI would fail visibly if I push a change that causes my custom plugins to break. I can definitely add the --fatal=errors flag, now that I know about it, but honestly I didn't even realize it was an option I would need to set; I had assumed until now that it was the case. I think I've just gotten lucky so far, but I'd prefer the defaults covered my case.

@justinmayer justinmayer added this to the Pelican 5.0 milestone Nov 1, 2023
@justinmayer justinmayer removed the bug label Nov 16, 2023
@egberts
Copy link
Contributor

egberts commented Jul 24, 2024

Expectation: When the plugin cannot register, the site building stops and Pelican errors out.

Well, no. Plugins are not "essential", pelican can run fine without them. If a plugin registration fails, too bad but it's no reason to stop pelican. If you want to stop pelican if an error occurs use --fatal option:

Weird.

If my pelicanconf.py has a PLUGINS = [ ... ], wouldn't I be expecting each and every one of those plugins to successfully initialized (thereby incurring not a warning but a fatal upon plugin initialization failure)???

After all, I did explicitly stated in my PLUGINS = all the ones I wanted (thereby considered ... "essential"), no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants