style guide #404
I think last time we looked at line length, 80 was far too small, so 100/120 is probably better. I'm also happy to be very flexible about this (for readability). We looked at automated systems (autopep8, yapf) for style and they kind of work, but still require a manual pass afterwards. One option is to add a pep8 check of some sort to Travis. This would be a little annoying, but would make sure that the code doesn't gradually deteriorate after the #216 tidy-up. |
@richardjgowers 100/120 is too long. But we can just write lines as long as we like and, like you said, use autopep8 or yapf to format them. Problem solved. I don't think adding a pep8 check is a good idea. Which program are you going to use? I used such programs to check pandas' code and there are some failed sections. |
I think using a pep8 checker is a good idea. I started using one for MDSynthesis a while back, and though I felt it was a bit too strict at first (and painful to fix all the violations), I now consider it invaluable for maintaining consistent and readable style throughout the library. I think I would enjoy diving into MDAnalysis code more if something like this was used to enforce pep8 conventions. Silly inconsistencies in style are distracting, and make it hard for others to contribute. Also, 80 characters is just fine. I don't think there's really a compelling argument for going over that, since it just makes it harder to review code, either locally or on Github. I do not consider 100 or 120 characters acceptable for readability. |
80 column limit
I found that using 80 columns is the best compromise when I work on my laptop and want to have

Code linters
YAPF seems to me to be better than autopep8 at first glance; I think I have some time this week to play with the settings some more. But I guess it won't produce perfect results, since code linting in Python is just really hard. I haven't defaulted to using code linters yet for my Python projects, but in my C++ projects I have emacs configured to run clang-format on every save. (This is a huge help, since I completely stopped thinking about getting the formatting right myself.) Since not everybody is using emacs, a simple solution might be an installable git hook that applies the code linter to every added file. I would also be against a pep8 checker on Travis. I see pep8 as a guideline I should mostly follow (btw, great talk about the topic).

Syntax checkers
Most modern IDEs for Python have warnings about PEP8 violations enabled by default (see PyCharm), and for other editors there are good plugins. For emacs I use Elpy; Python-mode is similar for vim.

Imports
I'm for using common abbreviations for packages in the Python scientific stack. Most people are |
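The installable git hook idea could look roughly like the sketch below. It assumes YAPF is installed and that the file is saved as `.git/hooks/pre-commit` with a `#!/usr/bin/env python` first line and a final `main()` call (both left out here so the functions can be imported and tested). The function names are hypothetical; nothing like this ships with MDAnalysis.

```python
"""Hypothetical git pre-commit hook: reformat staged Python files with YAPF."""
import subprocess


def staged_python_files(paths):
    """Keep only the Python source files from a list of staged paths."""
    return [path for path in paths if path.endswith(".py")]


def main():
    # Names of added, copied or modified files in the index (deletions skipped).
    output = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        universal_newlines=True)
    files = staged_python_files(output.splitlines())
    if not files:
        return
    # Reformat in place; a non-zero exit from yapf raises and aborts the commit.
    subprocess.check_call(["yapf", "--in-place"] + files)
    # Re-stage so the commit contains the reformatted versions.
    subprocess.check_call(["git", "add"] + files)
```

One caveat of this simple design: `git add` re-stages whole files, so any unstaged edits in those files end up in the commit as well.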
I have a habit of writing free-style and then using
and I found putting those lines at the top of the code very distracting. |
On 1 Sep, 2015, at 13:15, Hai Nguyen wrote:
Is it because you're not using a decent editor? ;-) (These lines are to stay because I am old-fashioned and don't want to rely on automated formatting, but I do want my emacs to know what the settings are.) |
I am actually very low-tech and old-fashioned too, using only vim with very minimal support. This is all in my vim configuration:

```vim
set nocompatible
filetype off
filetype plugin indent on
syntax on
set tabstop=4
set shiftwidth=4
set expandtab
set textwidth=90
syntax off
" autocomplete
imap <Tab> <C-p>
imap <C-l> <C-o>A
imap <C-o> <ESC>
imap <C-k> <C-o>lli
" go to next line in insert mode
imap <C-j> <ESC>o
" the rest of sentence after cursor to the next line
nmap <C-j> i<Enter><ESC>w
set backspace=indent,eol,start
syntax on
colorscheme delek
" adding 4 spaces in normal mode
nmap <C-k> i<space><space><space><space><esc>
" pydoc
" Usage: move cursor to python keyword in normal mode and SHIFT-K
let g:pydoc_open_cmd = 'vsplit'
let g:pydoc_highlight = 0
```
On 1 Sep, 2015, at 13:22, Hai Nguyen wrote:
I like it publicly stated and semi-enforced whenever one loads the file. |
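For reference, a file header that states the settings publicly whenever the file is loaded might look like the sketch below: an Emacs file-variables line plus a vim modeline. The exact values are an assumption based on the 4-space, no-tabs, 79-column conventions discussed in this thread, not the project's verbatim header.

```python
# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; fill-column: 79; coding: utf-8 -*-
# vim: set ft=python ts=4 sts=4 sw=4 et tw=79:
"""Hypothetical module: the two comment lines above are editor mode-lines."""
```

The `coding: utf-8` part also doubles as the Python source-encoding declaration, since Python scans the first two lines for it.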
I have another item for the list: commit messages. Having good commit messages makes git blame much nicer to use, and finding out the reason for changes is much faster. |
I added it to the list. On 5 Sep, 2015, at 12:00, kain88-de wrote:
Oliver Beckstein |
Oh, what about the docstrings? Can we switch to the numpy format for function parameters and return values? It will involve a little work to get it working with Sphinx, but then the docstrings are just much more readable IMHO. |
http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_numpy.html
@kain88-de Is this what you mean? Looks a little easier to write than standard Sphinx |
Yes, this is exactly what I mean. I use this standard for my private libraries. It also gets rendered nicely as a docstring in IPython and the interpreter in general. |
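To make the comparison concrete, here is a short function documented in the numpy style (`Parameters` / `Returns` sections with dashed underlines). The function itself is a hypothetical standalone helper written for illustration, not MDAnalysis code:

```python
def center_of_geometry(coordinates):
    """Compute the center of geometry of a set of coordinates.

    Parameters
    ----------
    coordinates : sequence of (x, y, z) tuples
        Cartesian coordinates of the atoms; must not be empty.

    Returns
    -------
    tuple of float
        The mean position ``(x, y, z)``.
    """
    n_atoms = len(coordinates)
    # zip(*coordinates) groups all x values, then all y values, then all z values
    return tuple(sum(axis) / n_atoms for axis in zip(*coordinates))
```

Even without any Sphinx processing, this reads cleanly with `help(center_of_geometry)` in the interpreter.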
Short summary of the discussion so far:

Line length
@hainm, @dotsdl and I are in favor of 80 columns; only @richardjgowers would like longer lines.

Code checker
I would suggest flake8 since it runs all common style checkers at once. The check can then be done quickly with
But I wouldn't add this to Travis right now. Currently I get about 6000 error messages. To stay sane, this can only be fixed with an automated code linter like yapf or autopep8.

Capitalization
I'm not sure what you mean by that, @orbeckst. PEP8 has some good guidelines on that.

Add CONTRIBUTING.rst
This will be picked up by GitHub and shown at the top of a new issue/PR. There we can either include the style guide or just point to the wiki. I would prefer to have the style guide there and with

Numpy style docs
I'll see what we have to check to get numpy style docs in addition to the ones we already have, and whether they can coexist. |
@kain88-de: capitalization was indeed meant to refer to the PEP8: Naming Conventions. I added it because the list did not contain a bullet point "wholesale gospel-style adoption of PEP8", and because I know that it is one of @dotsdl's pet peeves that MDAnalysis is still imported as |
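PEP 8's naming conventions, which the "Capitalization" item refers to, fit into a few lines. The constant, class and method names below are hypothetical illustrations only, not MDAnalysis code:

```python
MAX_LINE_LENGTH = 79  # constants: UPPER_CASE_WITH_UNDERSCORES


class CoordinateReader(object):  # classes: CapWords
    """Hypothetical class used only to illustrate naming."""

    def __init__(self, n_atoms):
        self.n_atoms = n_atoms  # attributes: lowercase_with_underscores
        self._cache = {}  # leading underscore: internal use only

    def atom_count(self):  # functions and methods: lowercase_with_underscores
        return self.n_atoms
```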
And I like the idea of having
We can write the wiki in reST so this can be the same file. |
It should be possible to link to the CONTRIBUTING.md in the wiki. Then this should be easily done. |
I don't care about the |
If the name bothers @dotsdl maybe we could compromise, we could use mDaNaLySiS, I think it's quite distinctive. More seriously, I don't mind going to 80 char line length, and I've come round to thinking a flake8 check on travis would be a mistake. Adding a link to autopep8/yapf as recommended tools is probably enough. @kain88-de I think with the numpy style docs, we'd have to rewrite all existing doc strings in this format. I'm assuming we'd have to tell sphinx (the doc maker thing) that we're using style X, it's probably not smart enough to handle two styles at once? |
I agree with @richardjgowers on this point. Even in big projects like pandas or sklearn, there are still some exceptions to pep8. So it's better to recommend it on the web page rather than checking on Travis. |
It should. Napoleon is just a preprocessor: it converts the numpy-style docstring into reST before Sphinx actually does any work. I'm just fighting a little bit with Sphinx to see my changes (but that is the fault of our Sphinx setup and of me not knowing Sphinx at all rather than that of napoleon). |
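Because napoleon works as a preprocessor, enabling it is a configuration change. Below is a sketch of the relevant part of a Sphinx `conf.py`, assuming Sphinx 1.3 or later, where napoleon is bundled as `sphinx.ext.napoleon` (before that it was the separate `sphinxcontrib.napoleon` package). The option values are a suggestion, not MDAnalysis's actual configuration:

```python
# conf.py (excerpt)
extensions = [
    "sphinx.ext.autodoc",   # pull docstrings from the code
    "sphinx.ext.napoleon",  # rewrite NumPy-style sections as reST first
]

# Only translate NumPy-style sections. As far as I understand, napoleon leaves
# docstrings without such sections untouched, which is what would allow a
# gradual migration from the existing reST docstrings.
napoleon_numpy_docstring = True
napoleon_google_docstring = False
```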
Hmm, yeah, in which case I like the idea of moving to numpy and using napoleon, provided we can gradually migrate. I think the final point is how we write tests. My pet hate is how the coordinate reader tests are all unique classes; I much prefer the style:

```python
class _TestCoordinateReader(object):
    # define stuff EVERY reader should do
    pass


class TestPDBReader(_TestCoordinateReader):
    # define unique to PDB stuff
    pass
```

Or, if the tests are simpler, using a test generator to work through different variations like here |
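A fuller sketch of the base-class pattern, using a hypothetical `FakeReader` in place of a real coordinate reader: the shared checks live in a mixin that is not a `TestCase` itself, so `unittest` does not collect it on its own, while the leading underscore keeps nose and pytest from picking it up by name. Each format then only supplies its reader and expected values, plus any format-specific tests.

```python
import unittest


class FakeReader(object):
    """Hypothetical stand-in for a coordinate reader (not the MDAnalysis API)."""

    def __init__(self, n_atoms, n_frames):
        self.n_atoms = n_atoms
        self.n_frames = n_frames

    def __len__(self):
        return self.n_frames


class _TestCoordinateReader(object):
    """Checks EVERY reader should pass; subclasses fill in the attributes."""
    reader = None    # reader instance under test
    n_atoms = None   # expected number of atoms
    n_frames = None  # expected number of frames

    def test_n_atoms(self):
        self.assertEqual(self.reader.n_atoms, self.n_atoms)

    def test_n_frames(self):
        self.assertEqual(len(self.reader), self.n_frames)


class TestPDBReader(_TestCoordinateReader, unittest.TestCase):
    reader = FakeReader(n_atoms=10, n_frames=1)
    n_atoms = 10
    n_frames = 1

    def test_single_frame(self):
        # PDB-specific checks go here
        self.assertEqual(self.reader.n_frames, 1)


class TestXTCReader(_TestCoordinateReader, unittest.TestCase):
    reader = FakeReader(n_atoms=10, n_frames=5)
    n_atoms = 10
    n_frames = 5
```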
Sure, tests like that seem quite sensible. I find this particular test incredibly hard to understand (with the two yield statements). Generally, refactoring the tests sounds like a good idea. |
Re: tests with yield (and in particular the one for hydrogen bond analysis selections): I admit that it is not the clearest presentation but it does automatically produce all the tests that we needed. Generally, I am a huge fan of test generators. Some of the hydrogen bonding tests are undoubtedly of questionable quality or usefulness (see comments in file) and it would be worthwhile to extend them. In fact, that goes for many tests in the analysis module.
If the tests are bad (i.e. not actually testing something useful, giving false sense of security), then I fully agree with you. If it's just "looks fugly" then I am looking much more at the opportunity cost of someone bright and productive refactoring code that already does exactly what it is supposed to. Ultimately, it's up to every individual how they want to contribute but if in doubt, instead of beautifying tests, I'd rather increase coverage by writing new tests for code that is not well tested yet. |
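For readers unfamiliar with test generators, here is a minimal nose-style sketch. Each yielded `(function, arguments...)` tuple becomes a separate test, and passing the pattern as the assertion message makes it show up in the failure report, which addresses part of the "hard to debug" concern. The `select_by_name` helper and the atom names are hypothetical toy data, not the hydrogen bond analysis code.

```python
import fnmatch

ATOM_NAMES = ["N", "H", "CA", "HA", "C", "O", "OG", "HG"]  # toy data


def select_by_name(names, pattern):
    """Hypothetical helper: names matching a shell-style pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def check_selection(pattern, expected):
    # The pattern is the assertion message, so a failure names the case.
    assert select_by_name(ATOM_NAMES, pattern) == expected, pattern


def test_selections():
    cases = [("H*", ["H", "HA", "HG"]), ("O*", ["O", "OG"]), ("CA", ["CA"])]
    for pattern, expected in cases:
        yield check_selection, pattern, expected
```

As far as I know, pytest later dropped yield-based tests (in 4.0) in favor of `@pytest.mark.parametrize`, which expresses the same idea declaratively.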
I had one of these fail when I refactored the KDTree code. I would say of myself that I'm proficient with Python, but from the backtraces these failing test generators gave me I couldn't figure out what the problem was or where it could be. But you are right: it is better to have more tests than to just make them look more beautiful. With the refactoring, I was thinking that some tests could be easier to understand; I myself don't feel very secure about tests that I don't understand, because then I don't know what is being tested. The comment of @richardjgowers is also very good: we should have a defined way to test common functionality of child classes. |
Ok I think that's everything now. Now we just need to populate the style guide page with all of these definitions and links etc |
Now that we're writing more Cython and doing coverage on it (#443), we could look at where we store the code. Currently it's in |
Currently, the DCD reader is C and the Gromacs libxdr2 is C. Do other projects bundle the cython code with the python code, too? If this is standard then I have no problem with that (especially if it makes coverage testing #443 easier). |
Should we add guide lines on re-writing history, such as squashing commits for PRs to reduce noise in the history (along the lines of "one commit, one working feature or self-contained fix")? Or does this create high hurdles for contributions? Or perhaps it is enough to add it as suggestions. |
Source file location
👍 for having the Cython source code in the same folder where the actual module (
This seems to be standard in numpy, scipy, scikit-learn. mdtraj

Rewriting history
I personally like rewriting history. I do that a lot on my personal branches. See for example #441: during development I had up to 30+ commits for experimentation (commit early and often). After everything worked I restructured them with
For newcomers I wouldn't recommend that. Rebasing without a good working knowledge of git is dangerous. I currently teach colleagues in my group git, and I can see first-hand again that it is not easy to learn and that people unused to version control have trouble writing good and sensible commit messages. I'd suggest leaving this up to the author and mentioning that it exists and can help us review PRs. We can later politely ask, or edit the history ourselves before we merge.

Second test package
While we are on the topic of moving files around: would it be possible to move the tests back into the package tree and kill the secondary package? Having tests in dedicated sub-folders of the modules called
To enable users to download test data we could use an approach similar to seaborn. People could load the data on demand with
I count this as a nice-to-have change and thought I could throw it in while we are discussing altering the directory layout. |
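An on-demand loader in the spirit of seaborn's `load_dataset()` could be as small as the sketch below: download a file once, cache it locally, and return the path on later calls. The function name, base URL and cache directory are all hypothetical (MDAnalysis provides nothing like this), and the sketch assumes Python 3 for `urllib.request`.

```python
import os
import urllib.request

# Hypothetical location of the test files; MDAnalysis does not provide this.
BASE_URL = "https://example.org/mdanalysis-testdata/"


def load_dataset(filename, cache_dir="~/.mdanalysis_data", base_url=BASE_URL,
                 fetch=urllib.request.urlretrieve):
    """Return a local path to *filename*, downloading it only on first use."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        # fetch(url, destination) matches urllib.request.urlretrieve's signature
        fetch(base_url + filename, path)
    return path
```

Making `fetch` a parameter keeps the caching logic testable without network access.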
I think the idea for separate tests was to try and reduce the size of the main package: MDAnalysis is 8 MB, the tests are 30 MB. I think part of the reason MDAnalysis is so big (numpy is 3 MB for reference) is because it comes with all the docs prebuilt. I think we've got something like your seaborn example:

```python
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB, XTC

u = mda.Universe(PDB, XTC)
```

This requires the tests package to be installed.

Yeah, we should probably squash more (where it's simple). I guess I've been lazy with that. |
Test data and separate test package
I like having test data and tests together, instead of relying on an internet connection to pull in data that might or might not be out of date (admittedly, I don't know how seaborn does versioning of test data: do they pull it out of a git repo blob?). Once you bundle test code and data, I don't see any other option but to make it two separate packages. I also like that we use some real MD trajectories for the tests; we could probably shrink the test data a little bit but not down to 1/10, so again it makes it difficult to bundle the data with the library package. |
Docs
Re: bundled docs. We can certainly discuss not including the docs in the package; I don't know how many people actually use the local docs instead of the online ones. |
Source file location
I'd be happy to put |
Finally, @kain88-de's comment on rewriting history sounds pretty sensible to me. In particular, forcing every contributor to rebase nicely might be too much. We could use
More importantly: good guidelines on what a good commit message should contain |
Test Data and separate package
Inside the git repository the tests and test data would still be there, so as a developer running the test suite I wouldn't have to download any files from the internet. What changes is that we no longer offer the tests as a package for users. Instead, we offer a way for users to download individual files, e.g. for the example scripts that we have in the documentation:

```python
import MDAnalysis as mda

# proposed API, not (yet) part of MDAnalysis
PDB, XTC = mda.load_datasets(['PDB', 'XTC'])
u = mda.Universe(PDB, XTC)
# show how a particular function is supposed to be used
```

So an internet connection would be required to run the examples from a user's perspective. What we do lose with such an approach is that I could verify a package downloaded from PyPI by running the test suite in the MDAnalysisTests package. None of the other packages I use or know bother with that; it's just a decision we have to make if we want users to be able to verify somehow that the package they installed works as intended.

Git commit messages
Actually, I think the blog post from Tim Pope I linked earlier is a pretty good guideline of what a good commit message is. As I'm teaching colleagues git now, I noticed that I can preach to them as much as I want about how they should write commit messages; it works best if I show them what I mean on a commit of their own. It takes time because I have to do it for each one individually, but it is the only thing I found that works. Maybe we could write that a lab book is a good analogy to the git log. |
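Tim Pope's rules of thumb (a summary line of about 50 characters or less, then a blank line, then a body wrapped at about 72 characters) are mechanical enough to check automatically. A minimal sketch follows; the function name and default limits are assumptions for illustration, and nothing like this ships with MDAnalysis:

```python
def check_commit_message(message, summary_max=50, body_width=72):
    """Return a list of problems with *message*; an empty list means it looks fine."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    if not lines[0].strip():
        problems.append("summary line is empty")
    elif len(lines[0]) > summary_max:
        problems.append("summary longer than %d characters" % summary_max)
    # The summary must be separated from the body by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines (line 3 onwards) should be wrapped.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_width:
            problems.append("line %d longer than %d characters"
                            % (number, body_width))
    return problems
```

Such a check cannot judge whether the message explains *why* a change was made, which is the part that really makes git blame useful.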
I updated the Style Guide on the wiki. I think it now contains everything discussed here but perhaps someone else (such as the usual suspects @richardjgowers , @kain88-de , @tylerjereddy ) can do a quick read-through and if need be, add missing parts. Once this issue is closed, we'll declare the Style Guide to be official. I'll send an email to the dev list that the Style Guide is now the law of the land, and from then on code reviews can fully reference it to ask for amendments in line with the Style Guide. |
Re: @kain88-de 's comment on adding tests back into the package: For right now the Style Guide says that we have the split between package and testsuite; this discussion is really a separate issue (so by all means, open one) and has not been decided so for right now, the status-quo prevails. |
Maybe add that we should use the six module for ranges and zip in new code like we do in the tests. So new code would already be Python 3 compatible. We have to add it later anyway so we might just do it now |
On 22 Oct, 2015, at 13:59, kain88-de wrote:
Good idea, can you add an entry + example, please? Oliver Beckstein |
I'll add the part about Python 2 and 3 compatibility. Besides that, things look good to me. |
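A possible style-guide example for the `six` suggestion: importing `range` and `zip` from `six.moves` gives lazy versions on Python 2 (`xrange`, `itertools.izip`) and the builtins on Python 3, so new code behaves the same on both. The `try/except` fallback exists only so this sketch runs where `six` is not installed; the two helper functions are hypothetical.

```python
try:
    # Lazy range/zip on Python 2; plain builtins on Python 3.
    from six.moves import range, zip
except ImportError:
    # Fallback for this sketch only: Python 3 builtins are already lazy.
    pass


def interleave(first, second):
    """Merge two sequences element by element: [a0, b0, a1, b1, ...]."""
    return [item for pair in zip(first, second) for item in pair]


def sum_of_squares(n):
    """Sum of i**2 for i in 0..n-1 without building a list on Python 2."""
    return sum(i * i for i in range(n))
```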
On 22 Oct, 2015, at 14:20, kain88-de wrote:
I vaguely recalled that there was a discussion but didn't find a conclusion. You were one of the proponents of 80 chars; I really have no strong feelings either way and it would be fine with me to diverge from PEP8 in that respect. |
I'm also ok with either one. I just have to edit the emacs mode-line to a fill-column of 79 so that emacs does the right thing in MDAnalysis. |
Oh, and the thing with the tests: I think we can re-evaluate that once we have restructured the tests; until then the current setup is fine. |
Let's settle on the canonical 79 chars (if nothing else, removes one paragraph from the style guide because we can say "like PEP8"). |
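With the limit settled at 79, the check itself is trivial. The sketch below does roughly what flake8 reports as E501 ("line too long"); it is a hypothetical illustration, not flake8's or pycodestyle's code:

```python
MAX_LINE_LENGTH = 79  # the PEP 8 limit agreed on in this thread


def long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than *limit*."""
    return [(number, len(line))
            for number, line in enumerate(source.splitlines(), start=1)
            if len(line) > limit]
```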
Style Guide is up and running and from henceforth the law of the land. (Need to add the SG as a file CONTRIBUTING.rst or HACKING.rst to the repo... but will raise a separate issue.) |
Use CONTRIBUTING.rst; it will automatically be picked up by GitHub and shown on PRs and issue reports. |
On 23 Oct, 2015, at 23:11, kain88-de wrote:
Thanks, I created #508. |
With an active and diverse developer community we need to spell out more clearly what we want the code to look like, i.e. we need a Style Guide.
Related questions have come up recently in #392 and #393 but there are also the recurring discussions about length of lines etc., in particular see #216 !
A very basic guideline consists of the editor settings:
Generally, PEP 8 is a good starting point (see #216 )
Either edit this post or add a comment and someone will add an item to the list below.
Questions to settle

Code
- line length (79 chars, changed from 80)
- common import abbreviations such as import numpy as np (#392)
- add style checker to Travis (nope, but encourage style helpers)

Testing

Documentation