Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a register() function for registering custom Readers and Writers #431

Closed
dotsdl opened this issue Sep 12, 2015 · 12 comments
Closed

Add a register() function for registering custom Readers and Writers #431

dotsdl opened this issue Sep 12, 2015 · 12 comments

Comments

@dotsdl
Copy link
Member

dotsdl commented Sep 12, 2015

This topic was originally broached in #429.

Currently, to make a custom Reader or Writer usable by MDAnalysis, it must be manually added to the appropriate dictionary.

I think it better to create a top-level namespace function, perhaps MDAnalysis.register() that takes in readers and writers of any variety and adds them to the appropriate dict. I recently did this to datreant to make custom Treants identifiable by the collections and other built-in machinery, which is needed for objects such as Sim in MDSynthesis.

If Readers/Writers have attributes with all the information included, then registering a custom-built one would be as simple as:

import MDAnalysis as mda

# add my custom reader
mda.register(MyReader)

# add my custom writer
mda.register(MyWriter)

This would avoid Reader/Writer developers to have to manually deal with the dictionaries; they need only make sure their Reader/Writer has all the necessary components to be processed, which is all in one place in the class definition. I think it also important that the register function be exposed at the top of the namespace to make it easy to access (it can live anywhere in the library that's appropriate internally).

The same should also work as painlessly for topology Readers/Writers.

@richardjgowers
Copy link
Member

So this will only work for the current session right? We currently have the ability to pass Classes as kwargs to a Universe, but maybe registering is a better solution

mda.Universe(top, traj, topology_format=Parser, format=Reader)

You'll also need to include the file extension that is associated with the Reader, so maybe something more like:

mda.register({'TRR': TRRReader}) 

But maybe the associated file extension should belong to the Reader class in the first place, in which case the dict of formats to Readers could initially be built as:

_READERS = {readerclass.format: readerclass for readerclass in classlist}

@kain88-de
Copy link
Member

I would like to have a decorator for this.

@register_reader('trr')
class TRRReader(BaseReader)
# ... magic

@register_writer('trr')
class TRRWriter(BaseWriter)
# .. magic

This makes the code very readable and easy to check if a reader is correctly registered. It is still possible to add a custom reader/writer during a session. We can also use this decorator in MDAnalysis itself.

@richardjgowers
Copy link
Member

Oooh I like that. Would get rid of those ugly dicts too. There's already a
bunch of tests for getting correct readers for a format, so it's just the
code that needs doing

On Sat, 12 Sep 2015 14:11 kain88-de [email protected] wrote:

I would like to have a decorator for this.

@register_reader('trr')class TRRReader(BaseReader)# ... magic
@register_writer('trr')class TRRWriter(BaseWriter)# .. magic

This makes the code very readable and easy to check if a reader is
correctly registered. It is still possible to add a custom reader/writer
during a session. We can also use this decorator in MDAnalysis itself.


Reply to this email directly or view it on GitHub
#431 (comment)
.

@richardjgowers
Copy link
Member

I was having a play around this morning...

formats = {}


class AddToFormats(type):
    def __new__(cls, clsname, bases, dct):
        newcls = super(AddToFormats, cls).__new__(cls, clsname, bases, dct)

        if not dct['fmt'] is None:
            formats[dct['fmt']] = newcls

        return newcls


class A(object):
    __metaclass__ = AddToFormats
    fmt=None
    def __init__(self):
        self.val = 'A'


class B(A):
    fmt='magic B'
    def __init__(self):
        self.val = 'B'

Using this, we could make every Reader class created automatically get added to the format dict. But I don't 100% understand metaclasses, so maybe (probably) this is a terrible idea and we can just use a decorator.

With the Readers, there's 2 extra dicts that identify which of them can

  • act as both topology and coordinates (so if given one filename to Universe, applies Parser and Reader to it)
  • support compressed formats

So the decorator will probably end up as

@reader_for('XYZ', does_topology=True, compressed=True)
class XYZReader()

@kain88-de
Copy link
Member

I would prefer the decorator. It should be easier to grasp for Python newbies then meta classes. Another plus for me is that they can be used in the fashion @dotsdl lined out in the beginning.

@dotsdl
Copy link
Member Author

dotsdl commented Sep 13, 2015

@richardjgowers @kain88-de I like the idea that this works as a decorator (and also as a callable function), but I don't like the idea that you have to specify anything but the class you're registering itself as an argument. I think everything about a Reader, including whether it features compression handling or if it also functions as a topology reader, etc., etc., should be determined from hidden attributes in the class itself. That simplifies the registration call, and also puts the full description of the Reader/Writer in the class definition itself.

Basically, right at the top of something like the XYZReader definition you have something like:

@register
class XYZReader(base.Reader):
    _handlertype = 'reader'
    _extensions = ['xyz']
    _compressed = True
    _topology = True

    def __init__(self):
        # ....

with the hidden attributes needed and optional for Readers/Writers of any stripe documented with the register function itself. If the hidden attributes like _compressed and _topology are present and True then the Reader gets placed in the corresponding dictionaries. Likewise, this also makes it easy to define multiple possible extensions for a single Reader/Writer, or to have more complex attributes as needed.

The idea here, I think, is to make life as easy as possible for custom handler developers. I don't think there should be a zoo of registering functions they have to choose from, or a bunch of keyword arguments. I think forcing these identifiers to be in the class definition itself makes them much more explicit, and also allows you to add a slew of Readers/Writers with a single call, like:

import MDAnalysis as mda

mda.register(XYZReader, XYZWriter, MyCustomReader, MyRatherDerivativeWriter, ShameReader)

Some of this might just be my preference for API simplicity and where to put the complexity, though, since it has to go somewhere.

@richardjgowers
Copy link
Member

Ok, decorator it is. The thing I see being a headache is getting all our imports in the right order... I don't like the repetition of 'reader' as an attribute to a Reader, but maybe that's the simplest way to make it work

@kain88-de
Copy link
Member

@dotsdl I would like to have keyword arguments for the decorator.

  1. It is explicit in the decorator declaration what kind of keyword arguments it takes. This makes it easy to check them again. Either in an IDE or the interpreter. The hidden attributes will make that impossible or at least very hard.
  2. A quick look up of the decorator will give me it's default arguments.
  3. These arguments only affect the decorator so they should also be defined in the decorator and not at a separate place.
  4. Variables that are not used by the class itself shouldn't be part of the class.
  5. It will make the decorator itself easier.

I don't see your particular use case as a problem. I doubt people will register a series of Reader/Writers in a single session. In their own python modules we should encourage them to use the decorator directly in the class definition.

@richardjgowers
Copy link
Member

I think I'm leaning towards @kain88-de 's approach, but once you have arguments to the decorator, it's hard to make register work when it's not a decorator.

I'm working off the snippet below. A and B both work fine, but then to use register you have to do register('C type')(C). Which is pretty ugly. Maybe there's a way to work around this by having the wrapper as a class.......

formats = {}
topology_formats = {}

def register(fmt, topology=False):
    def wrapper(cls):
        formats[fmt] = cls
        if topology:
            topology_formats[fmt] = cls
        return cls
    return wrapper


@register('A type')
class A(object):
    def __init__(self):
        self.val = 'A'


@register('B type', topology=True)
class B(object):
    def __init__(self):
        self.val = 'B'


class C(object):
    def __init__(self):
        self.val = 'C'

@kain88-de
Copy link
Member

You can also wrap the decorator

def _register(fmt, topology=False):
    def wrapper(cls):
        formats[fmt] = cls
        if topology:
            topology_formats[fmt] = cls
        return cls
    return wrapper

def register(*args, **kwargs):
    if args:
        fn = args[0]
        args = args[1:]

        return _register(*args, **kwargs)(fn)
    else:
        return _register(*args, **kwargs)

That is what numpy is doing for it's deprecated decorator, see numpy/lib/utils.py.
It allows the decorator to be used as a decorator and as a function.

@dotsdl
Copy link
Member Author

dotsdl commented Sep 14, 2015

@richardjgowers those attributes were just an illustration; many of the ones I used there might not be necessary, or there might be a better way to include them in the class definition. I'm still of the opinion that the class definitions themselves should include all the information needed for the registration mechanism to put them where they need to go, since these pieces of information may be reused by other mechanisms we build later. At the moment, though, since there's only the one mechanism (registration), I think this just comes down to preference.

@orbeckst
Copy link
Member

Realistically speaking, this looks like something for a 0.14 milestone, doesn't it?

@orbeckst orbeckst modified the milestones: 0.13, 1.0 Jan 7, 2016
@richardjgowers richardjgowers self-assigned this Jan 13, 2016
richardjgowers added a commit that referenced this issue Jan 13, 2016
Writer and ProtoReader classes (and all subclasses) now automatically
register the class using the 'format' attribute if present.

Readers which support compressed formats (eg .bz2) can declare this by
setting the 'compressed' attribute to True

Writers which support writing multiple frames can declare this by
setting the 'multiframe' attribute to True
@richardjgowers richardjgowers modified the milestones: 0.13, 1.0 Jan 13, 2016
richardjgowers added a commit that referenced this issue Jan 13, 2016
Fixes Issue #431

TopologyReader, Writer and ProtoReader classes (and all subclasses) now
automatically register the class using the 'format' attribute if present.

Writers which support writing multiple frames can declare this by
setting the 'multiframe' attribute to True
richardjgowers added a commit that referenced this issue Jan 14, 2016
richardjgowers added a commit that referenced this issue Jan 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants