-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plugin registration fails, yet Pelican continues unperturbed, resulting in broken site. #3172
Comments
Well, no. Plugins are not "essential",
|
That's an interesting argument. To exaggerate it: We could replace Pelican with a single line of code Please, view this from a user's perspective: The user ordered, via some configuration, to have some plugin, presumably (as in my case), as that plugin is needed to correctly generate the site. What sense does it make to for the program to report "success" if that success can be achieved only by ignoring expressed wishes the user has uttered? Do you think it happens a lot that a user would activate a plugin, but doesn't really care that much whether that plugin is actually doing anything or failing miserably? That's certainly not the way I as a user use software. What Pelican does now breaks the "principle of least surprise" for most users, those who configure stuff with a view that that configuration isn't ignored. I think people who don't care whether a plugin is actually loaded or not should simply not load it. I don't see what advantage those people get from the present "first-class support for undetermined outcome". Am I missing something? |
Thank you! It is of course much better to fix the problem at this level rather than just for plugin loading. I changed my pull request #3173 accordingly. |
That's a bit hyperbole, isn't it...
It's not ignored. You get a log message saying there is something wrong with your wishes. Same happens when a content cannot be parsed, i.e. let's say you wrote invalid markdown in on one of the articles,
This is a breaking change. Is there any compelling reason for imposing this default behavior to everyone, rather than you setting it for your own use? |
I agree that Pelican gives me the option to get the error handling I want. I argue the present default value of that option is less than optimal: "To see whether it worked, carefully scrutinize the log." This is broken UX. Since the inception of the exit value as part of the venerable "pipes and filters" architecture decades ago, people expect differently: "If the exit value is 0, there is no need to wade through the log - all went well." This traditional UX is superior. So I propose changing the default from This is a breaking change, yes. But... Do you know, or at least can you envision a user who actually relies on the default behavior as is presently implemented? "Errors are not reported via the exit value, you need to read the log to find out," - Who would prefer that? Few, if any. So few, if any, would actually experience the break. From my software development experience (I'm a freshly retried software developer, starting in the profession in 1990), these people are either non-existing or in a very tiny minority. Still, they are not abandoned. They can have what they want via Normal people are better served with the usual, time-proven "fail loudly" way of handling problems. This indeed is time-proven. "Fail loudly" and the related "fail early" and "fail hard" are not my invention. For reliable software, they are best practice: "If things are going awry, stop what you're doing and cry for help loudly. If you cannot deliver what the user requested, don't pretend you can." For closing, here are some random resources on "fail loudly", "fail hard", and "fail early". |
Why not make We do have different definitions of "failure". Edit:
I don't know and honestly irrelevant. First, backwards compatibility is a thing. You don't marginally change behavior for no good reason, especially when there is a solution for it. And no, "I like this way better" is not a good reason. Second, we don't mean the same thing with "failure". If the issue is not "broken beyond any recovery", it is not a failure, as such does not require termination. You listed a bad plugin and it wasn't loaded, missing dependency, whatever. If things blow up later because of that, great there is your "non-zero exit code". If things don't blow up and Third, here is another "bad UX": stop immediately at every non-critical issue. So you have to fix one, run it, fix another, run it again, fix another... |
Could be. Starting to think about it, mine would probably be: "Something the user wanted does not happen." What's yours?
The failure is: The output of pelican is utterly broken. My particular homegrown plugin actually changes the HTML generated. Missing that step means the generated HTML is deficient and must under no circumstances get published to the production server. FWIW: The context here is an automated workflow. I'm a devop kind of person and thus far have found Pelican very well suited to automated workflows. Stuff is regularly published here after trivial text changes without fear. I want to keep the "without fear", also for other people to whom I've recommended Pelican in the past who generally think like I do.
Yes. This disadvantage you point out certainly exists. The industry as a whole has mostly agreed to pay this price, as they consider the benefits (increased overall robustness, in particular) of "fail early" usually outweighs this disadvantage. Now I do not here actually propose introducing "fail early" to Pelican, but only "fail loudly". So I suggest we need not discuss "fail early" in detail here. I'm ready to do so if you want me to, but please, let's find a different place for that discussion.
Yes. We may call this change "Release type: major" then, if you want. You asked this done earlier, I reacted in my pull request, but you have removed this now. Which "release type" value do you want?
Could be done. If the Pelican code duly distinguishes between warnings and errors, this need not be done. A warning isn't an error. If the code senses that something is weird, that's a warning. If the code senses something is wrong, that's an error. The difference between those two is, "this seems weird, but maybe it is something the user foresaw and wants" vs. "we are clearly not doing what the user asked for". E.g., let us say a user has requested some software, among other things, to paint green all incoming blue foobars. At startup time, the software finds that no green paint is available. In my thinking, this is clearly a warning, but need not be an error. It may still happen that no blue foobars will be encountered during this run. Maybe green paint is expensive right now, and the user purposefully chose to not supply any, as none will be needed. In summary: All may still be well. But encountering a blue foobar and not being able to paint it green for lack of green paint? That is an error, not a warning. Something the user has specifically asked for does not happen. In this case, the exit value should not indicate to the user "all is well". |
I wrote it: "broken beyond any recovery"
For your case. I try not to generalize and guess what every users case could be.
What I want is to not change behavior needlessly, especially for things that already have solutions. Will you be here to answer all the "my pelican started crashing after upgrading" issues that could come? You want something, it is already there. You can turn on the behavior you want.
Not an accurate analogy, IMO. You want to paint 100 foobars. You say let's stop at the first wrong foobar. I say 95 of them can still be painted and those 5 could be set aside. They don't "clog" the machine, why stop? The good thing is, you can configure your the machine to get your desired behavior. |
This is proper procedure in a mechanical factory (maybe - I'm not an expert on those). I'll argue it is not in software. For software works differently. One important difference is: There is no "set aside" (at least not in Pelican). There is no coming back to stuff later that has been set aside earlier. There is input and there is output. And that is all there is to it. In a regular factory, maybe every single object produced will later be checked for quality. So if one object isn't produced well, the production of that single object can be retried later. So even if one object isn't produced well, it makes sense to nevertheless continue the production run, as you suggest. But in software like Pelican, the software is asked to do its job and report on completion. There is one report after all output has been generated. There is no API in Pelican for reporting problems with particular files. This is proper, there shouldn't be. But that means there is no "set aside". There is also no "repair later" within a Pelican run. After the success report has been delivered, the exit value returned as 0, there is no tomorrow. Pelican has run to completion. That's it. There is no second chance to generate anything. There is no way to go back to any pile of "set aside" incomplete objects. If there is any problem with the output, the entire run indeed is broken. From Pelican's point of view, broken beyond repair. For, other than in a mechanical factory, there is no repair chance. The run has completed. Anything broken, small or big, is broken beyond repair. It most be reported as such. It is a question of basic honesty. The only thing a Pelican run can report as broken (beyond repair) is that entire Pelican run itself. Exit code not 0. That honest report opens the only repair chance we have: Make sure the user is aware of the problem. Then the user can fix stuff Pelican can't, such as configuration or environment or whatever. After that external fix, the user can rerun Pelican anew. For this to work, it is paramount that the user is made aware of the problem. Pelican must not lie "all is well" via exit code 0, if in fact an error occurred. That lie decreases the only repair chance we have, and decreases it tremendously. Instead, Pelican must honestly confess to the user when the output clearly is not what the user expects. When an error occurred, Pelican should do whatever is in its power to make the user distrust the output. Easy enough: Exit value not 0. After 30+ years of experience as a software developer, I say: This is what the user expects. Not just me. It is universal. People want to know whether the software did or didn't do what it was tasked to do. For that is the only repair chance in the case of an error: The user may be able to repair what's broken. Pelican certainly isn't. The user may be able to analyze the underlying reason for the failure. Pelican can't. From Pelican's point of view: Any error is "broken beyond repair". |
|
I owe you an apology. There is a convention I'm used to: "If a program could not do what it was asked to do, it signals that state of affairs via a non-null exit code." Returning 0 after catching an exception breaks that convention. If Pelican were bound by it, I may have a reason for moral outrage and even saying "Pelican lied to me". But Pelican has never signed that convention. So I am not entitled to such a statement. I apologize. |
The exitvalue-conventionIt goes roughly like this:
Informally phrased: "Fail loudly." The exitvalue-convention is rather universal.I have given several resources that recommend it. More importantly: The convention is at the heart of how automation is done, and has been done for many decades. This includes the venerable The exitvalue-convention is Python-ishThe convention is automatically followed by Python programs, if only a few (imho, "standard") rules are followed. It basically boils down to:
To not follow the exitvalue-convention in a Python program, that program basically has to deviate from the beaten path and roll its own error handling. (This is what Pelican does.) Not following the exitvalue-convention produces bugs.Pelican comes with Using either of these in an automated context, e.g., something like (I'm personally coming from an "automate all the things" devops background.) In such an automated context, the errors would be logged all right, but otherwise ignored, leading to a broken production site. So the fact that the errors are somewhere in the log is nice, but does not prevent the problem. Following the convention has no long-term known disadvantagesNeither of the two of us could thus far come up with an example where not following the exitvalue-convention is actually useful, from any user's perspective. One clear disadvantage is known: Switching Pelican to follow the convention is a breaking change. This is a short-term disadvantage. My suggestionI honestly believe Pelican would become a better software if it followed the convention, even when no special command line switches are given. I have no problems with the ability to opt-out of the convention. But I think it is a big improvement for Pelican if the convention is followed normally, without the user having to opt into it via a special command line switch. This is what I suggest. This is what I have been suggesting here the whole time. It is what my PR does. AlternativeAs an alternative (which I do not recommend), Pelican could continue to stay out of the exitvalue-convention. To mitigate the disadvantages, I would then suggest
Adopting the convention would be better. |
Speaking as a Pelican user with custom plugins myself, I would definitely prefer if my CI would fail visibly if I push a change that causes my custom plugins to break. I can definitely add the |
Weird. If my After all, I did explicitly stated in my |
Issue
I'm running my own plugin that is needed to build my site. That plugin has some error handling at registration time.
Expectation: When the plugin cannot register, the site building stops and Pelican errors out.
Actually seen: When the plugin cannot register, an error is logged, but the site building continues. Pelican reports success to the operating system (exit value 0). But the generated site is broken, as the plugin never ran.
To reproduce, use a plugin like this one that errors out each time:
I found this code in Pelican's
__init__.py
which I believe is pertinent:The text was updated successfully, but these errors were encountered: