Use LibYAML bindings by default if installed #436
Not sure if there is any interest in doing something like this, or if there is a better way to "automatically" use LibYAML? It seems somewhat relevant to the efforts going on with #303 .. (The failed tests are expected, as they would normally be skipped.)
It's definitely a common "developer quality of life" problem that everyone has to solve, but changing the defaults or tweaking them globally could possibly break things (eg, a pip-installed pyyaml built against an OS-packaged libyaml that later got removed or (worse) updated to an incompatible version would fail incomprehensibly). One idea that might be worth exploring is something like a parallel set of loader/dumper defs that just map to the C versions if available and the pure-Python ones if not (basically what you have here with the |
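A minimal sketch of what such a parallel set of definitions could look like, assuming PyYAML is installed. The names `FastSafeLoader`, `FastSafeDumper`, `fast_safe_load`, and `fast_safe_dump` are hypothetical and not part of PyYAML; the sketch only relies on `yaml.CSafeLoader` and `yaml.CSafeDumper` being absent from the `yaml` namespace when the LibYAML bindings are unavailable.

```python
import yaml

# Hypothetical aliases (not part of PyYAML): use the C classes when the
# LibYAML bindings were importable, otherwise the pure-Python classes.
FastSafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
FastSafeDumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)


def fast_safe_load(stream):
    """Load with whichever safe loader the alias resolved to."""
    return yaml.load(stream, Loader=FastSafeLoader)


def fast_safe_dump(data, **kwargs):
    """Dump with whichever safe dumper the alias resolved to."""
    return yaml.dump(data, Dumper=FastSafeDumper, **kwargs)


doc = fast_safe_load("name: Cave lizard\nhp: [3, 6]\n")
round_tripped = fast_safe_load(fast_safe_dump(doc))
```

Callers opt in by using the aliases explicitly; code that names `yaml.SafeLoader` or `yaml.CSafeLoader` directly is unaffected.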
Thank you for the feedback! I guess my premise was that on a system with libyaml installed and functioning, it doesn't make sense for some code to use pure-Python and some libraries to use the C versions - it would usually be all or nothing, and typically I would want it enabled, unless there was some specific reason I needed to disable it.

Any thoughts on how a parallel set of loaders/dumpers might work? I started with just adding the

Implementation details aside, is it a reasonable feature to be able to replace all usages of the python loader and dumper with the C versions, including in any imported libraries, or is this unlikely to be acceptable in the core pyyaml code?
For better or worse, there are enough observable behavior differences between the libyaml-backed bits and the pure-Python bits that I think it's a bad idea to force existing code to use a different one without an explicit opt-in. I can think of several chunks of pyyaml code I've written that would break if the default impl were swapped to the libyaml-backed classes. What you've added with just the new
Short answer: probably not. Longer answer: At first blush, providing a globally settable default Loader/Dumper (and maybe others) sounds like a good idea. Anytime something like this comes up though, the question should be: "what would happen if two libraries did this?" Unfortunately, there are parts of the pyyaml API that already encourage mutation of global state for configuration and behavioral changes, but we shouldn't be making more of them, and this one in particular has the capability to do real damage. eg, your library sets it to |
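To make the "two libraries" concern concrete, here is a small sketch that uses a stand-in class instead of PyYAML itself. `FakeYaml`, `default_loader`, and the two setup functions are hypothetical names invented for illustration; the point is only that the last writer of a global default wins.

```python
class FakeYaml:
    """Stand-in for a module with a globally settable default loader."""

    default_loader = "PythonLoader"

    @classmethod
    def load(cls, text):
        # Report which loader would have been used for this call.
        return (cls.default_loader, text)


def library_a_setup():
    # Library A wants speed and switches the global default.
    FakeYaml.default_loader = "CLoader"


def library_b_setup():
    # Library B depends on pure-Python behavior and switches it back.
    FakeYaml.default_loader = "PythonLoader"


library_a_setup()
library_b_setup()  # imported later, silently overrides library A

loader_used, _ = FakeYaml.load("x: 1")
```

Whichever library runs its setup last decides the behavior for both, and neither can detect that the other changed it.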
Some pyyaml users prefer the fast performance of libyaml, and are less concerned about minor incompatibilities. There is no existing method to globally enable the libyaml bindings, and different libraries may do things in different ways.

This change provides a way for users to request global replacement of the default Python Loaders and Dumpers with the libyaml bindings, prior to the initial import of the pyyaml library.

To enable in Python3:

    import builtins
    builtins._yaml__use_libyaml_by_default__ = True
    import yaml

To enable in Python2:

    import __builtin__
    __builtin__._yaml__use_libyaml_by_default__ = True
    import yaml

Assuming that libyaml is installed, the result will be that:

    yaml.BaseLoader == yaml.cyaml.CBaseLoader
    yaml.SafeLoader == yaml.cyaml.CSafeLoader
    yaml.FullLoader == yaml.cyaml.CFullLoader
    yaml.UnsafeLoader == yaml.cyaml.CUnsafeLoader
    yaml.Loader == yaml.cyaml.CLoader
    yaml.BaseDumper == yaml.cyaml.CBaseDumper
    yaml.SafeDumper == yaml.cyaml.CSafeDumper
    yaml.Dumper == yaml.cyaml.CDumper

This allows the use of the "sugar" methods (safe_load, full_load, unsafe_load, etc.) as is, benefiting from the performance boost of libyaml if it is installed. This should also provide very broad compatibility with existing code that may explicitly reference specific loaders and dumpers by name.
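A rough sketch of how an import-time check of that flag could work. This is not the code from the pull request: `PySafeLoader`, `CSafeLoader`, and `pick_default_loader` are stand-ins invented here, and only the flag name `_yaml__use_libyaml_by_default__` comes from the description above.

```python
import builtins


class PySafeLoader:
    """Stand-in for the pure-Python loader class."""


class CSafeLoader:
    """Stand-in for the LibYAML-backed loader class."""


def pick_default_loader(c_available=True):
    # The flag lives on the builtins module so that it can be set before
    # the package is imported for the first time.
    use_c = getattr(builtins, "_yaml__use_libyaml_by_default__", False)
    return CSafeLoader if (use_c and c_available) else PySafeLoader


before = pick_default_loader()

builtins._yaml__use_libyaml_by_default__ = True
after = pick_default_loader()
without_bindings = pick_default_loader(c_available=False)

# Clean up so the flag does not leak into other code in this process.
del builtins._yaml__use_libyaml_by_default__
```

Because a real package would evaluate such a check only once, at first import, setting the flag later in the process would have no effect. That is why the description requires setting it before `import yaml`.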
Fair points. Definitely an opt-in would be a requirement, I just don't know what the right mechanics are. The latest update uses

Thinking about this more, the

I had not considered replacing

In terms of "what would happen if two libraries did this?" there are already some problems that would actually go away if we rebind the names early. Here is an example of some code (borrowed from the docs) that works if libyaml is not present, but breaks if libyaml is available. It's not clear to me that either the module or the user code is wrong, or how it can be fixed without messing with the 3rd-party library.

monster.py (3rd-party library that provides a class that is a

    import yaml

    class Monster(yaml.YAMLObject):
        yaml_tag = u'!Monster'

        def __init__(self, name, hp, ac, attacks):
            self.name = name
            self.hp = hp
            self.ac = ac
            self.attacks = attacks

        def __repr__(self):
            return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
                self.__class__.__name__, self.name, self.hp, self.ac, self.attacks)

use_monster.py (user code):

    from monster import *
    from yaml import load, dump
    try:
        from yaml import CLoader as Loader, CDumper as Dumper
    except ImportError:
        from yaml import Loader, Dumper

    m = """
    !Monster
    ac: 16
    attacks: [BITE, HURT]
    hp: [3, 6]
    name: Cave lizard
    """
    print(yaml.load(m, Loader=Loader))

Without libyaml:

With libyaml:
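For context, the difference arises because a `yaml.YAMLObject` subclass registers its tag with the pure-Python loader classes by default, so `CLoader` has no constructor for `!Monster`. The sketch below shows one way user code could register the tag on whichever loader it picked, assuming PyYAML is installed. It is a workaround sketch rather than part of the proposed patch, and the trimmed-down `Monster` class is illustrative only.

```python
import yaml


class Monster(yaml.YAMLObject):
    yaml_tag = "!Monster"

    def __init__(self, name, hp):
        self.name = name
        self.hp = hp


# Pick the C loader when the bindings exist, as user code typically does.
Loader = getattr(yaml, "CLoader", yaml.Loader)

# Register the tag explicitly on the loader that will be used. If that is
# yaml.Loader, this just overwrites the entry YAMLObject already created.
yaml.add_constructor(Monster.yaml_tag, Monster.from_yaml, Loader=Loader)

m = yaml.load("!Monster\nname: Cave lizard\nhp: [3, 6]\n", Loader=Loader)
```

This only helps when the user controls the loading code; it does not resolve the case where two libraries each hard-code a different loader.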
This is at least fixable from an end-user perspective, by not trying to use libyaml at all. However, if you have two libraries that each provide YAMLObject classes, one which can only be constructed with the Python constructors and one that can only be constructed with C constructors, then you might be stuck.

With the proposed patch, this can actually work.

    # Set the flag before anything imports yaml for the first time.
    try:
        import builtins
        builtins._yaml__use_libyaml_by_default__ = True
    except ImportError:
        # Python 2 fallback
        import __builtin__
        __builtin__._yaml__use_libyaml_by_default__ = True

    # Needed for Monster and the yaml name; must come after the flag is set.
    from monster import *
    from yaml import load, dump
    try:
        from yaml import CLoader as Loader, CDumper as Dumper
    except ImportError:
        from yaml import Loader, Dumper

    m = """
    !Monster
    ac: 16
    attacks: [BITE, HURT]
    hp: [3, 6]
    name: Cave lizard
    """
    print(yaml.load(m, Loader=Loader))

The try/except for cyaml checking can be dropped, although leaving it is fine.

    # Set the flag before anything imports yaml for the first time.
    try:
        import builtins
        builtins._yaml__use_libyaml_by_default__ = True
    except ImportError:
        # Python 2 fallback
        import __builtin__
        __builtin__._yaml__use_libyaml_by_default__ = True

    from yaml import Loader, Dumper
    from monster import *

    m = """
    !Monster
    ac: 16
    attacks: [BITE, HURT]
    hp: [3, 6]
    name: Cave lizard
    """
    print(yaml.load(m, Loader=Loader))

I don't love this, but it seems not-too-horrible. It's opt-in, it only kicks in the first time
I think you're missing one of my primary points: a global opt-in shouldn't be on the table, since if two libraries doing that with different needs get combined in a project, somebody loses. The best we can do in a way that doesn't encourage bad behavior is basically what you've already done: provide a new set of aliased types that will use libyaml if present and fall back if not. The way I was using "opt-in" means every caller still needs to opt-in to that behavior by using the new aliases. Old code that's unaware or unwilling won't be affected, and new code that won't work (for whatever reason) can still select a specific type that will work regardless of the choices made by another developer or library.
I guess it depends on how we are defining "need" and "loses". There is definitely code that would break if it were forced to use libyaml, which would be bad. There are probably many libraries that try to use libyaml, and some may be unacceptably slow without it, but I suspect that most of them can fall back to pyyaml. We could categorize and matrix them and see if they could possibly share a common set of bindings.
In most cases, we can find a compromise, although one library may be unhappy (-). Hypothetically, say we have two libraries in a combined project with incompatible requirements (=Py and =C), but everything is working, because each explicitly references the desired bindings. What would happen if the feature were introduced?
I'm definitely approaching this more from an end-user perspective, where I want to influence the behavior of all the libraries I'm importing without any updates to those libraries. It was intended to address the "floating" cases - a way to request libyaml as a best-effort. If you look at it from the library author approach, this is not really a solution at all, since in most cases the library would be setting the option too late.
The complexity with the aliased types is the

    add_constructor(Foo, foo_constructor, Loader=yaml.SafeLoader)
    add_constructor(Foo, foo_constructor, Loader=yaml.CSafeLoader)

This is more likely:

    try:
        from yaml import CSafeLoader as SafeLoader
    except ImportError:
        from yaml import SafeLoader as SafeLoader
    add_constructor(Foo, foo_constructor, Loader=SafeLoader)

And this can already cause breakage, because it's not really possible to know which loader knows how to do the right thing. (Again, I'm mostly considering the use case of importing libraries that define classes of YAMLObjects, and trying to use them all together in a single set of YAML documents that can be dumped and loaded, not libraries that each do their own thing separately.) I think the problem may actually get worse if a third option is introduced, unless we adjust the class definitions to share (or do some sort of cross-lookup).

Anyway ... all that being said, the "global" option feels more appropriate as a monkey-patch outside of pyyaml. How would you feel about moving the function definitions and YAMLObject class out of

    import importlib
    import yaml

    # Rebind the default names to the libyaml-backed classes.
    yaml.Loader = yaml.CLoader
    yaml.Dumper = yaml.CDumper
    # Reload the (proposed) yaml.functions module so it picks up the rebound names.
    importlib.reload(yaml.functions)
Closing the pull request .. if anyone else is interested in this functionality, they can try this out: |
PyYAML users generally want the fast performance of LibYAML.
This change automatically rebinds yaml.Dumper to CDumper, and
yaml.Loader to CLoader, if LibYAML bindings are available.
This allows the use of the "sugar" methods (safe_load, full_load,
unsafe_load, etc.) as is, benefiting from the performance boost of
LibYAML if it is installed.