Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

style guide #404

Closed
11 tasks done
orbeckst opened this issue Aug 28, 2015 · 49 comments
Closed
11 tasks done

style guide #404

orbeckst opened this issue Aug 28, 2015 · 49 comments

Comments

@orbeckst
Copy link
Member

With an active and diverse developer community we need to spell out more clearly what we want the code to look like, i.e. we need a Style Guide.

Related questions have come up recently in #392 and #393 but there are also the recurring discussions about length of lines etc., in particular see #216 !

A very basic guideline consists of the editor settings:

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 fileencoding=utf-8
  • only use spaces for indentation
  • one level of indentation equals 4 spaces

Generally, PEP 8 is a good starting point (see #216 )

Either edit this post or add a comment and someone will add an item to the list below.

Questions to settle

Code

  • line length (80 79 chars)
  • indentation (spaces, 4 per level)
  • imports (absolute, aliasing eg import numpy as np import numpy as np #392)
  • capitalization (underscore_methods, CapitalClasses, etc as per pep8)
  • camelCase or underscores (underscores, see goodBye camelCase #389)
  • add style checker to travis (nope, but encourage style helpers)
  • Where does Cython code live Source file locations #444 ?

Testing

Documentation

@richardjgowers
Copy link
Member

I think last time we looked at line length, 80 was far too small, so 100/120 is probably better. I'm also happy to be very flexible about this (for readability).

We looked at automated systems (autopep8, yapf) for style and they kind of work, but still require a manual pass afterwards.

One option is we could add a pep8 check of some sort to travis, This would be a little annoying, but would make sure that the code doesn't gradually deteriorate after the #216 tidy up.

@hainm
Copy link
Contributor

hainm commented Aug 29, 2015

@richardjgowers 100/120 is too long. But just write as long as we can, and like you said, to use autopep8 or yapf to format. Problem solved.

I don't think adding pep8 check is a good idea. Which program you're going to use? I used programs to check pandas' code and there are some failed sections.

@dotsdl
Copy link
Member

dotsdl commented Aug 29, 2015

I think using a pep8 checker is a good idea. I started using one for MDSynthesis a while back, and though I felt it was a bit too strict at first (and painful to fix all the violations), I now consider it invaluable for maintaining consistent and readable style throughout the library. I think I would enjoy diving into MDAnalysis code more if something like this was used to enforce pep8 conventions. Silly inconsistencies in style are distracting, and make it hard for others to contribute.

Also, 80 characters is just fine. I don't think there's really a compelling argument for going over that, since it just makes it harder to review code, either locally or on Github. I do not consider 100 or 120 characters acceptable for readability.

@kain88-de
Copy link
Member

80 column limit

I found that using 80 columns is the best compromise when I work on my laptop and want to have
two documents side by side (14" with FullHD resolution). I think the maximal line length I can then watch without linewrapping is something between 90 and 100. This is also true for people using tiling window managers (or editors emulating them like emacs/vim). In my personal projects I use 80 characters and above that limit when it helps readability. (As a side effect I use short descriptive names and avoid deep nesting of loops by moving things into reusable functions)

Code linters

YAPF seems to me to be better then autopep8 at a first glance, I think I have some time this week to play with the settings some more. But I guess it won't produce perfect results since code-linting in python is just really hard. I haven't defaulted to using code linters yet for my python projects but in my C++ projects I have emacs configured to run clang-format on every save (This is a huge help since I completely stopped to think about getting the formating right myself). Since not everybody is using emacs a simple solution might be an installable git-hook that applies the code-linter to every added file.

I would also be against a pep8 checker on travis. I see pep8 as a guideline I should mostly follow
but sometimes it just isn't a good fit for the code.

btw great talk about the topic
https://www.youtube.com/watch?v=wf-BqAjZb8M

Syntax Checkers

Most modern IDE's for python have warning about PEP8 violations enabled by default (see pycharm) and for other editors there are good plugins. For emacs I use Elpy, Python-mode is similar for vim.

Imports

I'm for using common abbreviations for packages in the python-scientific stack. Most people are
used to them and writing them out is more confusing (at least for me).

@hainm
Copy link
Contributor

hainm commented Sep 1, 2015

I have habit writing free style and then use yapf to reformat. yapf -i my_file.py. :D

@hainm
Copy link
Contributor

hainm commented Sep 1, 2015

and I found putting those lines in the top of the code makes me very distracting.

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 fileencoding=utf-8
``

@orbeckst
Copy link
Member Author

orbeckst commented Sep 1, 2015

On 1 Sep, 2015, at 13:15, Hai Nguyen wrote:

and I found putting those lines in the top of the code makes me very distracting.

-- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 --

vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 fileencoding=utf-8

Is it because you're not using a decent editor? ;-)

(These lines are to stay because I am old fashioned and don't want to rely on automated formatting but have my emacs know what the settings are.)

@hainm
Copy link
Contributor

hainm commented Sep 1, 2015

@orbeckst

I am actually very low tech and old fashioned too, using only vim with very minimal support.
But I bet we can update the default in .vimrc (or .emacs??) too.

this is all in my .vimrc :D

set nocompatible
filetype off

filetype plugin indent on
syntax on

set tabstop=4
set shiftwidth=4
set expandtab
set textwidth=90
syntax off

" autocomplete
imap <Tab> <C-p>
imap <C-l> <C-o>A
imap <C-o> <ESC>
imap <C-k> <C-o>lli

" go to next line in insert mode
imap <C-j> <ESC>o
" the rest of sentence after cursor to the next line
nmap <C-j> i<Enter><ESC>w

set backspace=indent,eol,start

syntax on
colorscheme delek

" adding 4 spaces in normal mode
nmap <C-k> i<space><space><space><space><esc>

" pydoc
" Usage: move cursor to python keyword in normal mode and SHIFT-K
let g:pydoc_open_cmd = 'vsplit'
let g:pydoc_highlight = 0

@orbeckst
Copy link
Member Author

orbeckst commented Sep 1, 2015

On 1 Sep, 2015, at 13:22, Hai Nguyen wrote:

But I bet we can update the default in .vimrc (or .emacs??) too.

I like it publicly stated and semi-enforced whenever one loads the file.

@kain88-de
Copy link
Member

I have another item for the list. commit messages

Having good commit messages makes git blame much nicer to use and finding out the reason for changes is much faster.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 5, 2015

I added it to the list.

On 5 Sep, 2015, at 12:00, kain88-de wrote:

I have a mother item for the list. commit messages

Having good commit messages makes git blame much nicer to use and finding out the reason for changes is much faster.

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

@kain88-de
Copy link
Member

Oh what about the doc-strings? Can we switch to the numpy format for function parameters and return values? It will involve a little work to get it working with sphinx but then the docstrings are just much better readable IMHO.

@richardjgowers
Copy link
Member

http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_numpy.html

        Parameters
        ----------
        param1
            The first parameter.
        param2
            The second parameter.

        Returns
        -------
        bool
            True if successful, False otherwise.

@kain88-de Is this what you mean? Looks a little easier to write than standard Sphinx

@kain88-de
Copy link
Member

Yes this is exactly what I mean. I use this standard for my private libraries. It also gets rendered nicely as a docstring in Ipython and the interpreter in general

@kain88-de
Copy link
Member

Short summary of the discussion so far

Line-Length

pro 80 columns are @hainm , @dotsdl and me only @richardjgowers would like to have longer line.
The arguments pro 80 Column arguments were

  • github PR uses 80 columns
  • small screens and tiling

Code checker

I would suggest flake8 since it runs all common style checkers at once. The check can then be done quickly with

flake8 package/MDAnalysis
flake8 testsuite/MDAnalysisTests

But I wouldn't add this to travis right now. Currently I get about 6000 error messages. To stay sane this can only be fixed with some automated code linter like yapf, autopep8.

Capitalization

I'm not sure what you mean by that @orbeckst. PEP8 has some good guidelines on that.

Add CONTRIBUTING.rst

This will be picked up by github and shown at the top of a new Issue/PR. There we can either include the styleguide or just point to the wiki. I would prefer to have the style-guide there and with
the code.

Numpy Style Docs

I'll see what we have to check to get numpy style docs i addition to the ones we already have and if they can coexist.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 8, 2015

@kain88-de : captitalization was indeed meant to refer to the PEP8: Naming Conventions. I added it because the list did not contain a bullet point "wholesale gospel-style adoption of PEP8", and because I know that it is one of @dotsdl 's pet-peeves that MDAnalysis is still imported as import MDAnalysis (which I like for branding reasons). I support the PEP 8 naming conventions with the exception of the strict stipulation that packages have to be all lower case.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 8, 2015

And I like the idea of having

  • style guide on the wiki
  • style guide as CONTRIBUTING.rst at the top level in the source code.

We can write the wiki in reST so this can be the same file.

@kain88-de
Copy link
Member

It should be possible to link to the CONTRIBUTING.md in the wiki. Then this should be easily done.

@kain88-de
Copy link
Member

I don't care about the import MDAnalysis that much. I always use import MDAnalysis as mda anyway because the full name is way to long. Plus my IPython is configured to load MDAnalysis on every start by itself. Btw IPython packages it's own packages under the IPython namespace so we wouldn't be the only onces with capital letters in the package name.

@richardjgowers
Copy link
Member

If the name bothers @dotsdl maybe we could compromise, we could use mDaNaLySiS, I think it's quite distinctive.

More seriously, I don't mind going to 80 char line length, and I've come round to thinking a flake8 check on travis would be a mistake. Adding a link to autopep8/yapf as recommended tools is probably enough.

@kain88-de I think with the numpy style docs, we'd have to rewrite all existing doc strings in this format. I'm assuming we'd have to tell sphinx (the doc maker thing) that we're using style X, it's probably not smart enough to handle two styles at once?

@hainm
Copy link
Contributor

hainm commented Sep 8, 2015

I've come round to thinking a flake8 check on travis would be a mistake. Adding a link to autopep8/yapf as recommended tools is probably enough.

I agree with @richardjgowers about this point. Even with the big projects like pandas or sklearn, there are still some exceptions in pep8. So it's better to recommend in webpage rather checking on travis.

@kain88-de
Copy link
Member

it's probably not smart enough to handle two styles at once

It should. Napoleon is just a preprocessor. It converts the numpy-docstring into rst before sphinx actually does any work. I'm just fighting a little bit with sphinx to see my changes (but that is rather the fault of our sphinx setup and me not knowing sphinx at all then that of napoleon)

@richardjgowers
Copy link
Member

Hmm yeah in which case I like the idea of moving to numpy and using napolean, provided we can gradually migrate.

I think the final point is how we write tests. My pet hate is how the coordinate reader tests are all unique classes, I much prefer the style:

class _TestCoordinateReader(object):
    # define stuff EVERY reader should do

class TestPDBReader(_TestCoordinateReader):
    # define unique to PDB stuff

Or if the tests are simpler, using a test generator to work through different variations like here

@kain88-de
Copy link
Member

Sure tests like that seem quite sensible. I find this particular test incredible hard to understand (with the two yield statements).

Generally refactoring the tests sounds like a good idea.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 9, 2015

Re: tests with yield (and in particular the one for hydrogen bond analysis selections): I admit that it is not the clearest presentation but it does automatically produce all the tests that we needed. Generally, I am a huge fan of test generators.

Some of the hydrogen bonding tests are undoubtedly of questionable quality or usefulness (see comments in file) and it would be worthwhile to extend them. In fact, that goes for many tests in the analysis module.

Generally refactoring the tests sounds like a good idea.

If the tests are bad (i.e. not actually testing something useful, giving false sense of security), then I fully agree with you. If it's just "looks fugly" then I am looking much more at the opportunity cost of someone bright and productive refactoring code that already does exactly what it is supposed to. Ultimately, it's up to every individual how they want to contribute but if in doubt, instead of beautifying tests, I'd rather increase coverage by writing new tests for code that is not well tested yet.

@kain88-de
Copy link
Member

Generally, I am a huge fan of test generators

I had one of these fail when I refactored the KDTree code. I would say of myself that I'm proficient with python. But from the backtraces these failing test-generators gave me I couldn't figure out what the problem was or where it could be.

But you are right it is better to have more tests then just make them look more beautiful. With the refactoring I was thinking that some tests could be easier to understand, I myself don't fell very secure about tests that I don't understand and consequently don't know what is being tested. The comment of @richardjgowers is also very good to have a defined way to test common functionality of child-classes.

@richardjgowers
Copy link
Member

Ok I think that's everything now. Now we just need to populate the style guide page with all of these definitions and links etc

@richardjgowers
Copy link
Member

So now we're writing more Cython and doing coverage on it (#443) we could look at where we store the code. Currently it's in package/src/, but that's confusing to me, as it means that submodules seem to appear out of nowhere (when created by setup.py). I'd rather put the Cython modules where they live in the installed package. Another point is that Cython is meant to be python-like, so including it alongside shouldn't mean we have 2 languages. So by the same logic, I'd keep our C/Fortran (do we have any?) separate in src

@orbeckst
Copy link
Member Author

Currently, the DCD reader is C and the Gromacs libxdr2 is C.
IIRC, the DCD timeseries code is cython but still uses the C includes (which are not just definitions but contain actual code, something like a library built from .h files).

Do other projects bundle the cython code with the python code, too? If this is standard then I have no problem with that (especially if it makes coverage testing #443 easier).

@orbeckst
Copy link
Member Author

Should we add guide lines on re-writing history, such as squashing commits for PRs to reduce noise in the history (along the lines of "one commit, one working feature or self-contained fix")? Or does this create high hurdles for contributions? Or perhaps it is enough to add it as suggestions.

@kain88-de
Copy link
Member

Source file location

👍 for having the cython source code in the same folder where the actual module (*.so-file) will be.
Of course this means that the cython generated C-files should also live there. As a consequence I would personally like to completely scratch package/src and have all source-code for a module reside in the same folder as the module independent of the language, (IMHO best to have src subfolders there)

This seems to be standart numpy, scipy, scikit-learn. mdtraj

Rewriting History

I personally like rewriting history. I do that a lot on my personal branches. See for example #441 during development I had up to 30+ commits for experimentation. (commit early and often). After everything worked I restructured them with git rebase -i into 12 commit that are self contained and help others understand what happened later. I think the reason I like this and CAN to this is because I have used git for years and I'm also a huge fan of git-blame and tig.

For new comers I wouldn't recommend that. Rebasing without a good working knowledge of git is dangerous. I currently teach colleagues in my group git. I can see first hand again that it is not easy to learn and people unused to vcs have trouble writing good and sensible commit messages.

I'd suggest making this up to the author and mentioning it exists and can help us reviewing PR's. We can later politely ask or edit the history our self before we merge.

Second test package

While we are at the topic of moving files around. Would it be possible to move the tests back into the package tree and kill the secondary package? Having tests in dedicated sub-folders of the modules called test is afaik standard for most python-packages. (I very very rarely see a new core-package without that layout). This would also make testing the library easier. I wouldn't have to install it explicitly in my virtual environment. Sometimes I'm in the test folder and forgot to activate the virtual-environment and then the tests fail because they are run against the current stable install.

To enable users to download test-data we could use an approach similar to seaborn. People could load the data on-demand with MDAnalysis.load_dataset(['xtc','pdb'], cache=True).

I count this as a nice to have change and thought I could throw it in while we are discussing altering the directory layout.

@richardjgowers
Copy link
Member

I think the idea for separate tests was to try and reduce the size of the main package. MDAnalysis is 8mb, the tests are 30mb.

I think part of the reason MDAnalysis is so big (numpy is 3mb for reference) is because it comes with all the docs prebuilt.

I think we've got something like your seaborn example:

from MDAnalysis.tests.datafiles import PDB, XTC
u = mda.Universe(PDB, XTC)

This requires the Tests package be installed

Yeah we should probably squash more (where it's simple). I guess I've been lazy with that.

@orbeckst
Copy link
Member Author

Test data and separate test package

I like having test data and tests together, instead of relying on an internet connection to pull in data that might or might not be out of date (admittedly, I don't know how seaborn does versioning of test data --- do they pull it out of a git repo blob?). Once you bundle test code and data, I don't see any other option but to make it two separate packages.

I also like that we use some real MD trajectories for the tests; we could probably shrink test data a little bit but not down to 1/10, so again it makes it difficult to bundle the data with the library package.

@orbeckst
Copy link
Member Author

Docs

Re: bundled docs. We can certainly discuss not including the docs in the package; I don't know how many people actually use the local docs instead of the online ones.

@orbeckst
Copy link
Member Author

Source file location

I'd be happy to put src folders in the place where the so library files should appear. @kain88-de , open a new issue for changing the location of compiled files.

@orbeckst
Copy link
Member Author

Finally, @kain88-de 's comment on rewriting history sounds pretty sensible to me. In particular, forcing every contributor to rebase nicely might be too much. We could use master as a branch with a wonderfully clean history that we rewrite ourselves but right now that seems more effort than it's worth.

More important: good guidelines on what a good commit message should contain

@kain88-de
Copy link
Member

Test Data and separate package

I like having test data and tests together, instead of relying on an internet connection to pull in data that might or might not be out of date

Inside of the git repository the test and test-data would still be there. So as a developer running the test-suite I wouldn't have to download any files from the Internet. The thing that changes is that we don't over the tests as a package for users anymore. We over that users can download individual files as for the example scripts that we have in the documentation.

PDB, XTC = mda.load_datasets(['PDB', 'XTC'])
u = mda.Universe(PDB, XTC)
# show how a particular function is supposed to be used

So an internet connection would be required to run the examples from a user-perspective. What we do loose with such an approach is that I could verify a package download from PyPi by running the test-suite in the MDAnalysisTests package. None of the other packages I use/know bothers with that, it's just a decision we have to make if we want users to be able to verify somehow that the package they installed works as intended.

Git commit Messages

Actually I think the blog-post from Tim-Pope I linked earlier is a pretty good guideline of what a good commit message is. As I'm teaching colleagues git now I noticed that I can preach to them as much as I want how they should write commit messages it works best if I show them what I mean for a commit of their own. It takes time because I have to do it for each one individually but it is the only thing I found that works. Maybe we could write that a Lab-Book is a good analogy to the git-log.

@orbeckst orbeckst self-assigned this Oct 22, 2015
@orbeckst
Copy link
Member Author

I updated the Style Guide on the wiki. I think it now contains everything discussed here but perhaps someone else (such as the usual suspects @richardjgowers , @kain88-de , @tylerjereddy ) can do a quick read-through and if need be, add missing parts.

Once this issue is closed, we'll declare the Style Guide to be official. I'll send an email to the dev list that the Style Guide is now the law of the land, and from then on code reviews can fully reference it to ask for amendments in line with the Style Guide.

@orbeckst
Copy link
Member Author

Re: @kain88-de 's comment on adding tests back into the package: For right now the Style Guide says that we have the split between package and testsuite; this discussion is really a separate issue (so by all means, open one) and has not been decided so for right now, the status-quo prevails.

@kain88-de
Copy link
Member

Maybe add that we should use the six module for ranges and zip in new code like we do in the tests. So new code would already be Python 3 compatible. We have to add it later anyway so we might just do it now

@orbeckst
Copy link
Member Author

On 22 Oct, 2015, at 13:59, kain88-de wrote:

Maybe add that we should use the six module for ranges and zip in new code like we do in the tests. So new code would already be Python 3 compatible. We have to add it later anyway so we might just do it now

Good idea, can you add an entry + example, please?

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

@kain88-de
Copy link
Member

I'll add the part about python 2 and 3 compatibility.
Otherwise there is the thing about 79 or 80 characters limit left.

Besides of that things look good to me.

@orbeckst
Copy link
Member Author

On 22 Oct, 2015, at 14:20, kain88-de wrote:

Otherwise there is the thing about 79 or 80 characters limit left.

I vaguely recalled that there was a discussion but didn't find a conclusion. You were one of the proponents of 80 chars; I really have no strong feelings either way and it would be fine with me to diverge from PEP8 in that respect.

@kain88-de
Copy link
Member

I'm also ok with either one. I just have to edit the emacs mode-line to a fill-column of 79 so that emacs does the right thing in MDAnalysis.

@kain88-de
Copy link
Member

Oh and the thing with the test. I think we can reevaluate that once we have restructured the tests, until then it's fine right now.

@orbeckst
Copy link
Member Author

Let's settle on the canonical 79 chars (if nothing else, removes one paragraph from the style guide because we can say "like PEP8").

@orbeckst
Copy link
Member Author

Style Guide is up and running and from henceforth the law of the land.

(Need to add the SG as a file CONTRIBUTING.rst or HACKING.rst to the repo... but will raise a separate issue.)

@kain88-de
Copy link
Member

Use contributing.rst that will automatically be picked up by github and shown on PRs and issue reports

@orbeckst
Copy link
Member Author

On 23 Oct, 2015, at 23:11, kain88-de wrote:

Use contributing.rst that will automatically be picked up by github and shown on PRs and issue reports

Thanks, I created #508.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants