Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve config file support #2419

Closed
mmerickel opened this issue Mar 31, 2016 · 20 comments
Closed

improve config file support #2419

mmerickel opened this issue Mar 31, 2016 · 20 comments
Milestone

Comments

@mmerickel
Copy link
Member

Pyramid's config file support has been dependent for a long time on PasteDeploy much to the chagrin of many users who hate ini files due to lack of support for types, environment variable support and non-wsgi scripts.

This issue is about coming up with a better design, so let me first layout some goals:

  • Alternate file formats: YAML, TOML, INI, etc. The only stipulation is that one of these formats is capable of supporting PasteDeploy-style files to maintain bw-compat.
  • Environment variables. There are no specific requirements here but each individual format may support some way of injecting environment variables into the config, along with fallback values for missing values.
  • Alternate runners. PasteDeploy is broken into wsgi components, servers and logging. A missing component is an arbitrary cli application. Along with starting wsgi servers, commonly it should be possible to also pull settings for other cli-based applications such as background workers and other scripts. These apps are about defining groups of settings, not about a Python API for defining a command line app such as main(global_conf, **settings).

Looking at what we have right now. We have pserve which is cli app that loads settings from a couple parts of an ini file. 1) It loads a [server] and then 2) it loads a wsgi app which can be composed from other wsgi apps. In a very general sense pserve is just running some code which is loading settings for isolated targets that it composes together. Great. That's pserve.

I think it should be possible to accomplish these goals without completely overhauling what exists but rather just adding a layer of indirection between Pyramid and PasteDeploy as well as implementing some alternate backends for popular formats (YAML).

So, what can we simplify?

PasteDeploy is doing too much. It is a tightly coupled library between an ini file and wsgi components (app, server). We need to remove PasteDeploy from the pipeline as much as possible and focus on converting arbitrary file formats into sections. Each parser should, on its own, provide support for default values as well as values from environment vars.

A good parser can then be written to support the PasteDeploy format that pserve requires.

@inklesspen has put a lot of work into montague and it improves on several of these points including abstracting the file format from the PasteDeploy format. I think the parser itself is where things falls short. An explicit goal I have is to be able to use montague (or simply the underlying parser) to load sections. The WSGI component is a layer on top of this. The difference here is that I'm trying to make it an explicit goal to make the config file itself reusable outside of a WSGI context. This is only semi-possible right now, and a source of endless problems when using PasteDeploy.

  • PasteDeploy modifies the ini parser but does not expose it publicly so no one else can use it.
  • As a result there are a lot of inconsistencies on whether the file is parsed using PasteDeploy or whether it is parsed by the stdlib ConfigParser.
  • Issues involving escaping variables differently between sections.
  • No inheritance or shared settings. PasteDeploy defines some options, but they aren't usable outside of its supported sections.
@mmerickel
Copy link
Member Author

My goal is for pyramid to stop caring about the file format but rather interface with a top-level api that can return it the logging, app, and server configs for a particular file path. The rest is up to you the interface to define. This is not unlike montague, but I would like the added support for loading arbitrarily named sections from the file such that 1) it can be re-parsed for settings without actually loading the app or server and 2) the same parser can be used for other purposes other than just the wsgi apps. The downfall of all approaches right now is that the parser is hidden behind apis that can only load wsgi/server/filter (montague adds logging) objects and every other use-case is ignored (any non-wsgi apps that also need settings).

cc @inklesspen, @bertjwregeer, @msabramo @sontek

@nek4life
Copy link
Contributor

nek4life commented Apr 7, 2016

I don't mind the ini format at all. It's simply and does what it needs to do for the most part. The lack of being able to use environment variables is what drives me nuts That being said alternative config formats like yaml would be very welcome.

@pauleveritt
Copy link
Member

You briefly mention multiple config files. Paste has some support for an include-like function, but it's not well-known and has some sharp edges.

Might that be in scope? That is, the abstraction is capable of loading multiple files, possibly in different formats, and make them sections in a global config abstraction?

@mmerickel
Copy link
Member Author

@pauleveritt What do you mean "in scope"? I'm aware of that feature and it definitely has its limitations but you're right it can be used as a poor-man's inheritance. In my proposal PasteDeploy would basically be used for the ini format, alongside some extra support for using its private parser to pull in other non-wsgi sections so all of that functionality would remain unchanged for bw-compat.

@msabramo
Copy link
Contributor

msabramo commented Apr 7, 2016

What to do about logging?

So pserve is the thing that calls setup_logging, which is a fairly simple wrapper around logging.config.fileConfig. I wonder if there are benefits to Pyramid handling logging or should it be dropped in a Pyramid 2.0?

There is currently an assumption that logging config will exist in the same ini file as the PasteDeploy stuff, so it all gets lumped together which makes it hard to change stuff. It seems a bit weird to me that two different code paths are used for different sections of the same file. In other words, I have a development.ini and mostly it is getting parsed by the PasteDeploy, but the logging stuff is getting parsed by the logging module. So even if PasteDeploy or Montague had some whiz-bang features for doing better variable interpolation or using environment variables, then the logging stuff can't use that. Then you have to explain to users that you can use $(env:DEBUG) or whatever in their ini file BUT not in the logging sections.

I see two options:

  1. Logging goes in a separate file and is limited to the limited set of features that the Python logging and configparser modules give you. This means you are going to have trouble using environment variables to configure logging, which is a very good usage of environment variables -- e.g.: LOGGING_LOGGER_ROOT_LEVEL=DEBUG pserve app.ini
  2. Make Pyramid no longer responsible for logging. The application configures logging however it likes and then can use environment variables or whatever. Pyramid setting up logging for the application is a convenience, but it comes at the price of reduced flexibility. And flexibility is one of the things that people love about Pyramid.

@mmerickel
Copy link
Member Author

The whole point in my discussion up above is that things are currently separated which is causing these issues. PasteDeploy does not control returning a dict of logging values and thus Pyramid must parse the file again itself. I would like to normalize this under a single interface with a single parser. pserve should absolutely be responsible for logging - it's an application which is the entity responsible for applying configuration.

Let me try to be more specific with a concrete api that I'm tossing around here:

class ILoader(Interface):
    def get_wsgi_app(name):
        """ Return a WSGI application by the given name."""

    def get_wsgi_filter(name):
        """ Return a callable that accepts a WSGI application and returns a new WSGI application."""

    def get_wsgi_server(name):
        """ Return a callable that accepts a WSGI application and starts serving requests."""

    def get_app_settings(name):
        """ Return a settings dictionary for an entry point defined with the given name."""

    def setup_logging():
        """ Apply logging configuration to the Python stdlib logging module."""

def get_loader(uri):
    """ Return an implementation of ``ILoader`` that supports the given ``uri``."""

It is then up to the implementation of the loader to define where globals and logging come from. For example, it could load a yaml dict, or it could call logging.config.fileConfig on the file. I just want to remove that stuff from Pyramid core to allow people to define their own loaders. I stress that this is very similar to montague's current interface, except a little bit higher level I think and less specific to wsgi.

@inklesspen
Copy link
Contributor

logging.dictConfig() was added in 2.7. If #2368 goes through, we could decide to simply require that loaders return a dict suitable for passing to dictConfig. (I have implemented code for transforming an INI-based logging config section into such a dict.)

@mmerickel
Copy link
Member Author

I'd prefer pyramid didn't care about dicts but rather just asked the loader to do the job. We are stuck calling fileConfig for ini files, for example. I don't think anyone is wanting to parse the ini logging config into a dict just to satisfy the interface, and I don't see the point anyway.

@msabramo
Copy link
Contributor

msabramo commented Apr 7, 2016

What if an app wants to use a different logging system like Logbook, twiggy, or structlog?

@mmerickel
Copy link
Member Author

What if an app wants to use a different logging system like Logbook, twiggy, or structlog?

I can't really have a discussion about that without someone who cares attempting to implement an example using the interface I provided and then telling me why it doesn't work.

From my perspective (pyramid) it doesn't give a damn what logging you are using, it simply calls setup_logging() and assumes the process is configured. If there's some issue that requires passing in a context object or something else then I'll have to see an example. The proposed interface will work for the expected use case which is configuring a global logging system like the stdlib logging module.

@mmerickel
Copy link
Member Author

(I have implemented code for transforming an INI-based logging config section into such a dict.)

Sorry @inklesspen just realized you said that. I'd still like to stay away from the dict itself and focus on the task that pyramid cares about which is just "setup logging". Is there any reason anyone can see for passing around the dict instead? My problem with the dict is then we are coupled to python stdlib logging by design versus hiding whatever happens behind the function call.

@inklesspen
Copy link
Contributor

pastedeploy is actually three things: an ini file format, a python api, and some entry point types for apps, servers, etc. So far, I've seen a proposal for a replacement for the second item in the list, which would be implemented in a way that allows for the first item to be swapped out. But that still leaves the question of entry points.

The pastedeploy entry point types are a little annoying, because the global conf is passed as a dict while local conf is passed as keyword arguments. Also, there's two different entry point styles for apps and filters, as well as a second entry point style for servers that takes the wsgi app and immediately starts running the server!

Since ideally I should be able to convert a scaffolded app from JSON to YAML config just by changing the loader implementation, I think we need a common set of entry point signatures as well. I also think we shouldn't just blindly reuse pastedeploy's; we deserve to fix the warts.

Off-the-cuff proposal:

def app_factory(local_conf, global_conf):
    return wsgi_app

def filter_factory(local_conf, global_conf):
    return wsgi_filter

def server_factory(local_conf, global_conf):
    return wsgi_server

Simple and boring, which is good. The only real thing we've lost here is Paste's "Composite" app type, which I think is not really used much these days. (The filter_app_factory filter type is just a trivial adjustment from filter_factory.)

There is, unfortunately, one other issue with PasteDeploy's ini syntax, compared to your proposed Loader interface. PasteDeploy provides three ways to get an app with a filter applied to it: the filter-with value within an app's configuration, the filter-app section type, and the pipeline section type. (pipeline is really just an alternative syntax for filter-app.)

If I define the following INI file:

[filter-app:main]
use = lowerify
next = actual

[filter:lowerify]
egg:montague_testapps#caps
method_to_call = lower

[app:actual]
use = egg:montague_testapps#basic_app

I can load the actual app and the lowerify filter separately, or I can load the "app" main, which is the app and filter combined.

On the other hand, compare this INI file:

[app:main]
use = egg:montague_testapps#basic_app
filter-with = upperfilter

[filter:upperfilter]
use = egg:montague_testapps#caps
method_to_call = upper

Here, since I have specified the filter-with argument, PasteDeploy can only load the app with the filter applied, not the app by itself.

In montague, I solved this problem by parsing the INI file into a tree structure, then rewriting it into a pipeline style syntax where each item had its own unique name (possibly a generated name, if necessary). I think that's definitely overkill. Instead, we should choose to limit our level of PasteDeploy bwcompat. Full bwcompat requires you to adhere to design decisions that may have made sense in the early days of WSGI, but make little sense now.

Lastly, I'm not really a fan of setup_logging doing things while all the other loader methods return objects, but I can live with it.

@mmerickel
Copy link
Member Author

pastedeploy is actually three things: an ini file format, a python api, and some entry point types for apps, servers, etc. So far, I've seen a proposal for a replacement for the second item in the list, which would be implemented in a way that allows for the first item to be swapped out. But that still leaves the question of entry points.

All of this is true, but they are PasteDeploy features. They are not features of Pyramid and they are not features that Pyramid uses or depends on. Pyramid core currently cares about 5 APIs and that's it: get_loader, get_wsgi_app, get_wsgi_server and setup_logging, get_settings (for things like [pshell] where general parsing is required). Those obviously aren't the exact APIs but they are definitely the concepts. Nowhere in Pyramid does it care about entry points or ini file formats specifically so those don't belong in this API.

There's 2 steps here:

  1. Define exactly what Pyramid needs so we can get everything else out of Pyramid into another library. This is the goal of my proposed API.
  2. Define exactly how montague can accomplish being the thin layer between pyramid (and other apps) and the actual file / loader. This likely defines some common utilities (providing some methods such as app_factory and some entry point support) and also a standard way to handle the get_loader(config_uri) to lookup possible loaders.

What I'm trying to spec out here is an API that gets AWAY from PasteDeploy as much as possible and focuses on what top-level applications need. That is almost entirely distilled down to get_app_settings and setup_logging, but I'd like to also keep some of the wsgi composibility in the API itself since WSGI is such a composable standard unlike most other applications. Obviously get_app_settings('app:main') abstracts the file format into a dict which is eventually used by get_wsgi_app('main') which would then further use the use statements and any entry points supported by the loader to build the app from those settings. Similarly for a pipeline it would need to get settings for several filters and one app and construct them. Composite is also the same. At the end for all of those we just care about the resulting wsgi app.

We should probably update the loader methods to accept some extra_opts dictionary that can be used similarly to global_conf to enhance the loader's settings. Originally I thought the loader would be solely reponsible for where the global_conf came from but this prevents you from passing in CLI options and such things we'd like to keep.

Entry points and filters are both things that are loader-specific. Now, I know I'm focusing on Pyramid's point of view so let me cross to the other side for a second and focus on the implementation. It would obviously be a good idea for most loaders to support similar entry points and logging configuration so that montague or whatever we end up with would provide some support for things like taking a local_conf and global_conf and returning a filter, or an app, etc using some standard entry points. Then each loader would call into those shared APIs (if it wants!!) but could also do something totally custom and Pyramid simply wouldn't care.

Off-the-cuff proposal:

Where does Pyramid call app_factory(local_conf, global_conf) from? Where does that API accept the file path / config_uri? If I need to come up with a local_conf inside Pyramid then I must parse the file myself, which means Pyramid is now responsible for parsing files! I think you're focusing on details that are out of the scope of "let's define exactly what Pyramid cares about and how to get everything else outside of course and into the implementors hands to do with what they want". If a flexible library exposes those 2 interfaces I proposed then Pyramid can start using it today and the rest of the battle about which format/loader to use can be dealt with outside of Pyramid which I think is what everyone wants.

For the config_uri I have been thinking about an approach inspired by SQLAlchemy urls and its dialect support. Think: pastedeploy:///path/to/development.ini versus mikesyaml:///path/to/foo.yml) where for some common formats (like INI and YAML) we'd define some default such that /path/to/foo.yml was internally mapped to awesomeYamlLoader:///path/to/foo.yml and ini would be mapped to pastedeploy for bwcompat. If we use this get_loader api then we don't have to care, you can do pserve inklesspensTomlLoader://foo.toml#main. Note the fragment can be stripped off as it is now and used as the name... no changes necessary there. This is a separate discussion however. I understand that uri schemes may not be ideal on the CLI, my point is just to show a possible approach to letting the user decide what loader is used to handle the file.

@inklesspen
Copy link
Contributor

My off-the-cuff proposal was intended for Pyramid apps, not Pyramid itself.

Currently, if I have a scaffolded Pyramid app, it contains a function like this:

def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    config = Configurator(settings=settings)
    # do many configurations here
    return config.make_wsgi_app()

That main(global_config, **settings) function is the way it is because it must conform to the PasteDeploy entry point signature. Likewise, the scaffolded app has [paste.app_factory] in its entry points. If you are fine leaving it up to montague or a similar library to specify the entry point signature, ok; I had expected Pyramid would want to have a say in what the entry point signature is.

@mmerickel
Copy link
Member Author

I don't think Pyramid is interested in dictating a common signature. I say this because Pyramid itself advocates example apps without that signature in single-file apps. That is only there for PasteDeploy integration. It does make for a nice standard entry point that I think custom loaders should adopt but when you decide to write your app with a particular loader+format then you conform to its expectations. The sad part is that those apps are no longer loadable by Pyramid's tools like pserve. The advantage here is that Pyramid could still load / use your app if an appropriate loader exists. Imagine a loader that could load the required objects from a Python module like pserve wsgi.py or something.

@sontek
Copy link
Member

sontek commented Apr 17, 2016

The one thing I'd love to see on the pyramid side of things is exposing the things the p* commands needs as properties on the wsgi app.

wsgi_app.routes
wsgi_app.tweens

If pyramid didn't care about the configuration format and exposed everything the commands would need to build a CLI around it then that would allow an ecosystem to grow around it. If someone wants to create a pyramid CLI system today that loads YAML or TOML they also have to re-implement pserve, proutes, ptweens, etc.

Especially since single file apps lose the ability to print their routes right now because the commands require a config file. I'd love to see this:

app.py

app = config.make_wsgi_app()

proutes

python -c 'from app import app;print(app.routes)'

and pserve would just be using standard server:

gunicorn app:app

@sontek
Copy link
Member

sontek commented Apr 17, 2016

This also similar to what projects like flask do: http://flask.pocoo.org/snippets/117/

@mmerickel
Copy link
Member Author

As a status update on this issue, @huntcsg and I started https://github.com/mmerickel/plaster and https://github.com/mmerickel/plaster_pastedeploy at the sprints. I think they're starting to come together nicely but I would very much appreciate some help with some of the finer points. The library is basically a thin wrapper around entry points to allow a loader to be selected via a path or url. Plaster is focused only on lookup, leaving basically everything (including wsgi logic) up to the loaders themselves making the library hopefully very useful for non-Pyramid apps as well. The idea is that Pyramid will document what it expects a loader to implement in order to work. The wsgi-specific methods will be get_wsgi_app, get_wsgi_server, get_wsgi_app_settings, and get_wsgi_server_settings. With these methods and the standard get_settings and setup_logging it should be a very simple search/replace to update Pyramid to use this api in the few places that matter. Anyone may then define a loader with the required methods and expect it to work with their Pyramid app as well as with other apps that can rely on simply get_settings and setup_logging.

I very much expect that a significant amount of the work done in montague_yaml or montague_toml could be converted easily into a plaster loader. The reason for writing plaster instead of modifying montague is because it just takes a very different approach - as badly as I feel about that, it felt unavoidable at this stage. I feel I handled the efforts with montague pretty poorly and I hope that offering up this solution can help keep the ball rolling. We likely wouldn't be even considering replacing this part of Pyramid right now if it weren't for montague.

@jvanasco
Copy link
Contributor

jvanasco commented Jul 9, 2016

Since these exciting changes are forthcoming, I wanted to suggest/request a tangential feature -- a hook/feature to note the configuration file path (and loader type) into the registry or registry.settings when/if that information is available

The desire for this is largely for testing and development purposes.

@mmerickel
Copy link
Member Author

#2985 is open for evaluation. It is a branch of pyramid that allows completely replacing the config file parser. See http://docs.pylonsproject.org/projects/plaster/en/latest/ for more info on how to do that.

If you have a chance to look please let me know how it goes.

@mmerickel mmerickel added this to the 1.9 milestone Apr 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants