Add: package versioning, pypi / conda publishing & code format pre-commit pages (Reviews Welcome!) #27
Conversation
@lwasser most of my review boxes here are just suggestions. I'll read this more carefully later next week.
* bioconda

<!-- ASK FILIPE about GDAL -->
I literally carry a badge at SciPy that says "don't ask me about gdal" ;-p
😆 @ocefpaf my question HAHAHA since you said i should ask you about it :0 is really just: is it easier to manage spatial packages with conda because gdal can live in a conda / conda-forge repository given it's written in another language? vs pip, which means gdal somehow needs to be installed on the machine already and the code needs to access it? i can ask you on slack too .. i will now laugh about this for the rest of the day.
really i just wanted to try to make sense of why spatial libraries are so much easier to install with conda and always have issues with pip (in my experience). we don't have to talk about gdal specifically LOL
gdal gets a bad rap, but the geospatial stack has many c/c++ packages that make it hard to build consistent wheels. Here are a few non-Python packages, each with at least two Python packages that depend on them:
- proj: pyproj, cartopy
- geos: cartopy, shapely
- gdal: fiona, rasterio
Now, let's talk gdal and qgis, which are not necessarily Python but are used by many Pythonistas. The former has Python bindings and the latter can use Python to automate stuff.
- gdal: see the number of c/c++ deps in https://github.com/conda-forge/gdal-feedstock/blob/main/recipe/meta.yaml#L26-L67

Sure, one can build a stripped-down version of gdal with less format support, but then users will wonder why their gdal doesn't read/write format X.
With conda, or any other complete package manager, all these non-Python dependencies are built consistently with each other, with no ABI incompatibilities.
For wheels, one approach to avoid ABI incompatibilities would be to have a single package "own" the c-lib, and that can be done. See SciTools/cartopy#805 for a proposal on how to solve the shapely/cartopy problem. However, that requires a huge effort and coordination among different projects and developers. It also removes a project's freedom to evolve without depending on what the owner of the c-lib wrapper decides to do. Say cartopy needs a newer geos but shapely is only built with an older version. Now cartopy's development is lagging behind b/c of the decision to move all of its geos wrapper to another lib (this is hypothetical BTW, the coordination to move pygeos->shapely 2.0 and cartopy is going smoothly).
Another difference is the actual management of the c/c++ library versions. With wheels, one gets whatever was used to build them. With a complete package manager, one can pin the c/c++ lib to a certain version (or range). Say I need fiona built with gdal 3.6.2: I would either have to build it myself or use the conda version. BTW, do you even know what gdal version was used to build the fiona wheel? That is not very transparent and requires one to dig a bit into the wheel build or docs. With a complete package manager, one only needs to list the installed packages.
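(As a quick illustration of that transparency point, here is a minimal sketch, assuming `fiona` and `rasterio` are installed; both packages expose the GDAL version they were built against:)

```python
# Minimal sketch: inspect which GDAL your installed packages were built against.
# Assumes fiona and rasterio are installed; both expose this metadata.
import fiona
import rasterio

print("fiona was built against GDAL", fiona.__gdal_version__)
print("rasterio was built against GDAL", rasterio.__gdal_version__)
```

With a conda environment, the same information is also visible directly in the package listing.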
QGIS... Well, it needs a consistent Python distribution, and installing your own that is compatible with what was used to build QGIS is a nightmare. Why? B/c QGIS depends on gdal and another ton of c/c++ libs.
I'm probably rambling by now so I'll stop ;-p
> is really just - is it easier to manage spatial packages with conda because gdal can live in a conda / conda-forge repository given it's written in another language? vs pip which means somehow gdal needs to be installed on a machine already and the code needs to access it?
Uwe Korn wrote a great blog post on the complexities of building Python packages which depend on multiple compiled dependencies. Well worth a read.
I think the spatial analytics stack is one of those "python" analytics stacks that are incredibly difficult to build Python wheels for which Just Work. The other analytics stacks where conda is far easier to support and more likely to work are, IME, the pyarrow stack and anything GPU related.
@all-contributors please add @ocefpaf for code and design
I've put up a pull request to add @ocefpaf! 🎉
Generally those steps are:

1. Fork the staged recipes conda-forge GitHub repository
1. Create a new recipe using the `grayskull` tool
is there a tutorial on doing this? otherwise we should create one!
I would just point to the docs above. No need to duplicate them here.
excellent!! will add the link - thank you both!
oh that link is already there, but i linked to the section about generating a recipe on that page.
so is there a todo here? or is this good enough with just the link that's there?
<!--todo: add as resource https://docs.conda.io/projects/conda/en/latest/glossary.html -->

pyOpenSci requires that your package has a distribution that can be installed from a public community repository such as PyPI or a conda channel such as `bioconda` or `conda-forge` in the Anaconda cloud.
Now, this is tricky and maybe you should ignore me on this one. However, Anaconda cloud is one option and, like PyPI, folks can mirror the conda channels elsewhere. Up until recently these mirrors were either private or not a good option. However, more and more we are seeing efforts to de-centralize the channels from one cloud.
TL;DR maybe just don't mention Anaconda cloud. I would not go into the multiple options b/c, like for PyPI, those who know they exist don't need an explanation. Those who don't are happy with the defaults in their installation.
Suggested change: drop "in the Anaconda cloud", i.e.:
> pyOpenSci requires that your package has a distribution that can be installed from a public community repository such as PyPI or a conda channel such as `bioconda` or `conda-forge`.
excellent. this makes sense. i didn't know that you could mirror channels elsewhere. how does that even work if you are a conda-forge maintainer but there is a mirror somewhere else? how do the channels stay in sync?
your package.
```

### What is Anaconda Cloud and Conda?
If you decide on removing the Anaconda cloud explanation, this page will be simplified and we can remove lots of text.
With that said, maybe it is worth mentioning conda and its variants here: mamba and micromamba.
Just noting that this is a little concerning to me; removing Anaconda from a document describing conda is going to make this more confusing to users. As of right now, there is no alternative cloud host for conda packages, and it would be better to elaborate on that than to not mention it at all.
Mamba and micromamba fetch files and package metadata from Anaconda Cloud, even if there are attempts to migrate elsewhere via the OCI mirror system the mamba maintainers are working on and that the conda maintainers plan to adopt as well. But these options are simply not there for the regular end user at the moment.
ok i did ask a question about the mirrors as i was confused by that. i never knew mirrors were an option. IF you are saying they are not an option right now, i might prefer to leave them out, because i am also getting a bit confused thinking about mirrors of conda-forge and what that means for me as a user.
i could potentially still remove anaconda cloud as suggested above, simply because it's probably too much information for a user anyway. i think mentioning it once might be good enough? and this idea that there is an anaconda cloud that hosts a bunch of channels: the defaults channel, which users can't add to on their own, and then conda-forge, which is preferred and can be added to (and was created for scientists to upload their packages to, given anaconda oversees what ends up in the default cloud channel).
pls let me know if my simple description seems accurate or way off...
And more importantly you are likely wondering how to pick the right repository to publish your Python package.

The answer to both questions relates to dependency conflicts.
I would remove this. The truth is that this is more culture/use-case related than about conflicts. If all my work is in pure Python land there is no reason to use conda. If I need to mix languages and manage the non-python dependency versions for my python wrappers, then I need conda.
TL;DR I would not explain all that, to avoid confusion. I would just say: try to publish on both; if you cannot, publish on the one that will reach most of your users.
I concur; Python packages by definition should be published to PyPI by default, and only in a second step prepared for distribution via the Anaconda Distribution, conda-forge, etc. Of course, for some Python-based projects, their non-Python dependencies make it impossible or very hard to be published on PyPI in the first place.
My guess is that this document is catering to beginners and intermediate users, though.
ok i hear you both! thank you. a question related to this.
i think where this section was coming from is somewhat how i teach (noting that my work when teaching has always been heavily spatial focused). In my experience, when i tried to start with just installing tools from pypi, i quickly dug myself into a dependency conflict hole. Once i learned about channels and using conda-forge my life became infinitely easier. so i tried to emphasize to my students to attempt to only install from a single channel IF POSSIBLE.
given that experience, it seems to me as if we might want to encourage everyone in the science community to just cross-publish on conda-forge to avoid such issues, and to explain to them how mixing channels can cause issues.
for instance, i worked on a package that had rasterio as a dependency and so we always had install issues UNTIL we submitted to conda-forge. Does this make sense?
i'm trying to find a middle ground, but i also really trust both of you, as you know WAY more than i do about this space as conda maintainers.
### Managing Python package dependency conflicts

Python environments can encounter conflicts because Python tools can be installed from different repositories. Broadly speaking, Python environments have a smaller chance of dependency conflicts when the tools are installed from the same package repository. Thus environments that contain packages installed from both pip and conda are more likely to yield dependency conflicts.

Similarly, installing packages from the default anaconda channel mixed with the conda-forge channel can also lead to dependency conflicts.

Many install packages directly from the conda defaults channel. However, because this channel is managed by Anaconda, the packages available on it are limited and not frequently updated. Conda-forge was, in fact, created to tackle this issue of scientific packaging not being on the default Anaconda channel.

Conda-forge allows scientists to add any package to the conda-forge channel. Thus, by installing packages using conda-forge, you reduce the risk of conflicts in your local environments.
@lwasser this is a great explanation but too advanced, IMO, for this page. I would remove it. Most conflicts happen due to bad use of the tools, and that is not pyOpenSci's responsibility to solve. We, conda-forge, PyPA, etc., should build better docs.
happy to remove! Filipe, just a question.
i think it relates to comments above as well. when you introduced me to conda-forge and the issues of channel mixing, it removed a lot of the frustration that i encountered creating environments with many spatial tools!
so just from a teaching perspective, i've always emphasized: use conda and conda-forge! and i've tried to get students to avoid channel mixing when possible.
Has this changed? i certainly don't want pyopensci getting involved in conflicts and would always want to lean on conda-forge, PyPA etc for that work! But i am also just a bit confused about the conflicts, as i always thought it was just good practice to avoid channel mixing. many thanks!
i have this now:
> The conda-forge channel was created to complement the defaults channel. It allows anyone to submit a package to be published in the channel. Thus, the conda-forge channel ensures that a broad suite of user-developed community packages can be installed from conda.
### Take-aways: If you can, publish on both PyPI and conda-forge to accommodate more users of your package

The take-away here for maintainers is that if you anticipate users wanting to use conda to manage their local environments (which many do), you should consider publishing to both PyPI and the conda-forge channel (*more on that below*).
IMO this is perfect.
ok good.. so i think if this is perfect, my thought process generally is ok. you just wanted me to avoid talking too much about conflicts and such here, to avoid confusing users trying to publish their packages, perhaps?
# Creating New Versions of Your Python Package |
Maybe mention CalVer? I don't want to make this overly complex. It seems too big already.
i totally could! i need to do some digging into calver as i've never used it myself. i'll do that tomorrow most likely and see if i can summarize it briefly highlighting why and when someone might want to use it. Then link to more information about it.
I've stumbled over this on Mastodon and was surprised that there wasn't an outreach attempt to the conda maintainers. I've left a few comments now and think I have a better idea of what this document is trying to achieve and how the pyOpenSci process works in terms of collaboration. @lwasser Please let me know if I can help clarify any questions you may have!
Posted on https://fosstodon.org/@[email protected]/110105787481684634 to keep the discussion together :)
I'm not sure if I agree with the statement here: "However, because this channel is managed by Anaconda, the packages available on it are limited and not frequently updated."
This makes it sound as if Anaconda is not willing to update these packages, or as if there is something wrong with a company maintaining packages versus a community like conda-forge. The Anaconda Distribution focuses on users that are looking for stability and predictability, via packages that are vetted with consistent processes and quality. conda-forge replaces most of this with automation and delegation to volunteers, a system that has other benefits such as being able to manage more projects at once.
+1. The infrequent updates are all about stability of the ecosystem vs the latest. Think Debian stable (defaults) vs Testing (conda-forge). (Testing is a terrible name there; it should be "latest and possibly unstable, use with caution.")
Yeah, agreed, it's no surprise this is a common pattern for Linux distributions as well
ok no problem. i can update this! Let me run by you what i was trying to say... then we can get the words right!
my general understanding of the default channel is that it's limited, in that adding new packages will ONLY happen if Anaconda decides to add them. Whereas with conda-forge, anyone can submit a recipe to be added.
but i think what you read there is that i was suggesting they do not maintain / keep the distribution current, and i totally understand why - it needs to be rewritten.
so what do you think about something like this:
> Many install packages directly from the conda defaults channel. However, because this channel is managed by Anaconda, the packages available on it are limited to those that Anaconda decides should be core to a stable distribution. The conda-forge channel was created to complement the defaults channel. The conda-forge channel ensures that a broad suite of user-developed community packages can be installed from conda.
Once your package is on the conda-forge channel, maintaining it is simple. Every time that you push a new version of your package to PyPI, it will kick off a continuous integration build that updates your package in the conda-forge repository. Once that build is complete, you will get a notification to review the update. You can merge the pull request for that update once you are happy with it.
I would clarify that they'll get a notification from GitHub and not from the channel to set expectations correctly.
IMO we should remove most of this and link to the docs upstream.
i get it - but i couldn't find a simple description of what maintaining a conda-forge repo entails. i found this - https://conda-forge.org/docs/maintainer/adding_pkgs.html#maintainer-role - but i just wanted a simple, easy-to-read explanation to show users that it's not a huge maintenance burden. in fact, most of the time it's super fast and easy (unless you are a package like scikit-image that has a challenging build!)
i want to try to find that balance of linking to docs but also not sending users all over the place, as we have multiple workflows that they need to understand.
so i thought a few sentences here on the general process would be good. but if there is another doc page that explains this, that would be great to add too. let me know what you think!
Hi there, just a few comments here and there :)
pyOpenSci recommends that you follow the [Python PEP 440](https://peps.python.org/pep-0440) which recommends using [semantic versioning guidelines](https://www.python.org/dev/peps/pep-0440/#semantic-versioning) when assigning release values to new versions of your Python package.
PEP 440 doesn't actually recommend SemVer here; it just lists SemVer, CalVer (called "Olson database versioning" in there), and versions derived from version control information, in relation to their compatibility with the version spec. Given the empty promises of SemVer, I would strongly recommend not suggesting this to beginners.
CalVer is a much more meaningful way to express iterative software releases, especially for beginners, since it uses a very well-known "semantic" system: time.
oh really? ok i'm learning something here. if we consider this i'll need to talk to someone, or one of you, more about how you'd then set up versioning with calver and best practices.
The other note is that we were told by PyPA folks to avoid linking to the peps directly in general. so maybe i just need to do some more work on this section?
1. It helps your users (which might include other developers that depend on your package) understand the extent of changes to a package.
2. It helps your development team make decisions about when to bump a package version based on standard rules.
3. Consistent version increases following semver rules mean that the values of your package version explain the extent of the changes made in the code base from version to version. Thus your package version numbers become "expressive" in the same way that naming code variables well can [make code expressive](https://medium.com/@daniel.oliver.king/writing-expressive-code-b69ef7a5a2fa).
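(To make those "standard rules" concrete, here is a toy sketch of SemVer-style bumps; illustrative only, not a real release tool:)

```python
def bump(version: str, part: str) -> str:
    """Toy SemVer-style bump helper (illustrative only, not a real release tool)."""
    major, minor, patch = map(int, version.split("."))
    if part == "major":  # breaking API change
        return f"{major + 1}.0.0"
    if part == "minor":  # new, backward-compatible feature
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # backward-compatible bug fix

assert bump("1.2.3", "major") == "2.0.0"
assert bump("1.2.3", "minor") == "1.3.0"
assert bump("1.2.3", "patch") == "1.2.4"
```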
This has all been debunked as antipatterns I'm afraid: https://hynek.me/articles/semver-will-not-save-you/
I disagree with that. While SemVer is not bulletproof, trying to follow it does help. Also, CalVer is not an ideal substitute: https://jacobtomlinson.dev/posts/2023/sometimes-i-regret-using-calver/
TL;DR we just need to note that options exist and try to follow one that fits PEP 440 at least.
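(For what "fits PEP 440" means in practice, a small sketch, assuming the `packaging` library is installed; both SemVer-style and CalVer-style strings parse as valid, comparable PEP 440 versions:)

```python
# Both SemVer-style and CalVer-style strings are valid PEP 440 versions.
# Requires the `packaging` library (pip install packaging).
from packaging.version import Version

semver_style = Version("1.2.3")     # major.minor.patch
calver_style = Version("2023.4.1")  # year.month.patch

print(semver_style.major, semver_style.minor, semver_style.micro)  # 1 2 3
print(calver_style < Version("2023.10.0"))  # True: segments compare numerically
```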
CalVer is an alternative that works better for beginners, who lack the intuition and experience to look out for API changes and backward compatibility, which is the basis for making SemVer work. They easily fall into the trap of Hyrum's Law.
FTR, the blog post you link quotes the Conda Enhancement Proposal that I co-wrote about creating a predictable release schedule for conda (something the conda community has requested for years), saying:
> Much of that CEP goes on to describe how releases should be created bi-monthly but when I read it I didn't see much info about the pros/cons of removing semantics from the versioning.
We were not "removing semantics" from conda releases, but acknowledged that if you're doing SemVer wrong, like conda, you might as well switch to time-based releases. It's easy to get it wrong; there were 150 conda 4.x releases! I think the weblog misunderstands the CEP further and makes generalizing arguments:
> But due to using CalVer it isn't transparent to the user community what implications each release has for them.
Conda doesn't squeeze the deprecation policy into version strings anymore, but has a clear deprecation policy with enough information available in the release notes for every release. That's more complete for non-trivial software like conda.
Granted, this is well out of the scope of this documentation IIUC, so I'll stop with the topic here.
ok wow - i really need to chat more with some folks about this to digest it and rewrite this section. MANY MANY thanks for this. calver came up in another review as well but i can't remember the context right now. but it seems like more serious research is needed for this page, so i'll prioritize that.
yeah the hard thing is that, as your package grows, you will realise that any change to the API poses a backward-compatibility concern. Some folks are very "creative" when it comes to using our stuff 😅
e.g. we have discussions on SciPy about returned shapes, order of parameters, precision up to n decimals, etc. Some things help (for the order, recommend using keyword-only arguments as much as possible; this is a life saver down the road. Scikit-learn recently transitioned their whole API!), but some other things are more subtle (it's a bug in a specific setup, but fixing it introduces a bunch of other breaking changes for everyone).
What is important to me is to describe what all that means for your package and what your deprecation policy/process is. This way your users are in the know when they update.
e.g. with things like FastAPI, I know I have to use exact pins because it happened a few times that they broke the public API with a minor digit update. And some libraries are extremely conservative, like SciPy.
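(A minimal sketch of what such a deprecation path can look like in code; `old_name`/`new_name` are hypothetical, not from any real library:)

```python
import warnings

def new_name(x):
    return x * 2

def old_name(x):
    # Hypothetical deprecation path: warn for at least one release cycle
    # before removing the old API, and say what to use instead.
    warnings.warn(
        "old_name() is deprecated and will be removed in version 3.0; "
        "use new_name() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name(x)
```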
* hatch &
* hatch_vcs plugin for hatchling
* setuptools-scm
* python-semantic-version
I would focus only on hatch for simplicity's sake (it's a PyPA project with extensibility and standards in mind).
As context: the main problem in the Python ecosystem is the explosion of tools over the past years while packaging standards were solidified.
Hatch is the most recent (like PDM) iteration of that endless loop. That's also the reason the official packaging tutorial focuses on pipenv (that was the most recent one at the time), but mentions a number of other tools.
Capturing the still-evolving Python packaging ecosystem is really tricky, and you can't optimize on documenting everything.
The good news is that hatch-vcs is simply a wrapper for setuptools_scm, and you can do the same with hatch-vcs as with setuptools_scm (possibly more, because of the cool integration into hatch).
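(For what it's worth, the setuptools_scm behaviour underneath hatch-vcs can be previewed directly; a sketch, assuming it runs inside a git repository that has at least one version tag:)

```python
# Ask setuptools_scm which version string it derives from git metadata.
# Assumes this runs inside a git repo with at least one version tag.
from setuptools_scm import get_version

print(get_version())  # e.g. "1.2.3" on a tag, "1.2.4.dev3+g1a2b3c4" between tags
```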
> Capturing the still evolving Python packaging ecosystem is really tricky, and you can't optimize on documenting everything.

+1, this is something I've been trying to do here: document less, add more links and suggestions pointing to the places where these tools are being documented.
ok so i would love for you to check out this page in the guide - https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-build-tools.html - i've actually spent a LOT of time researching this, and i keep coming back to PDM. Please know that this all comes from a place of respect; i'm not arguing, just catching you up on what is in my brain around these tools!
- hatch is technically NOT a PyPA project. After talking with PyPA folks and people on Python discourse: PyPA accepts packages if people want to add them to PyPA, but PyPA doesn't necessarily support a single package and also intentionally won't support a single tool!
- i really like the hatch maintainer and have spent lots of time playing with hatch. :)
- the core thing that hatch is missing right now is support for other backends. please see this comment from Ralf Gommers about this issue. i tested it. I CAN use PDM with meson-python but not hatch right now. Ofek did tell me supporting different back ends was coming.
Talking with the scientific-python community and folks like Stefán and Ralf, it would be best to support and suggest a tool that supports both pure python and more complex builds. it just makes life a bit easier when we document and create tutorials. And if the package becomes more complex, they could also just swap out a back-end. so i've been leaning towards PDM because it's right now the ONLY tool that supports that.
i do like hatch. i personally had a harder time getting started with it because of the documentation. but that is not a reason to not use it. it's more about the backends being a dead end for the scientific community right now. many core projects are moving to meson-python. in that pr i talked with poetry devs, pdm's dev reviewed, ofek from hatch, and flit's maintainer as well.
so i just wanted to let you know where i am with tools. i'm still very open to your feedback, but i've also gotten some other feedback that doesn't support hatch (right now).
the other issue with existing docs (i agree we should link more, not recreate) is they are mostly not user friendly. but that's another topic altogether 😄
anyway, open to your thoughts here. just wanted you to know where i'm coming from at this point.
### Black and Blue

[Black](https://black.readthedocs.io/en/stable/) and [Blue](https://blue.readthedocs.io/en/latest/) (which wraps around Black) are code formatters. Both Black and Blue will automagically (and *unapologetically*) fix spacing issues and ensure code format is consistent throughout your package. Black and Blue also generally adhere to PEP 8 style guidelines with some exceptions. A few examples of those exceptions are below:

* Black defaults to a line length of 88 (79 + 10%) rather than the 79 character `PEP 8` specification. However, line length is a setting that can be manually overwritten in your Black configuration.
* Black and Blue will not adjust line length in your comments or docstrings.
* Neither tool will review and fix import order (you need *isort* to do that - see below).

Blue addresses a few format decisions in Black that some maintainers do not like. [You can compare the differences here](https://blue.readthedocs.io/en/latest/#so-what-s-different) and decide which tool you prefer!

```{tip}
If you are interested in seeing how Black will format your code, you can
use the [Black playground](https://black.vercel.app/)
```

Using a code formatter like Black or Blue will leave you more time to work on code function rather than worry about format.
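(To show what "automagically" means in practice, a small before/after sketch; recent Black versions format roughly this way, give or take configuration:)

```python
# Before: inconsistent spacing and quoting.
def greet(name = 'world'):
    print( 'hello, %s'%name )

# After running Black: normalized spacing, and double quotes by default.
def greet(name="world"):
    print("hello, %s" % name)
```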
I would suggest to only mention black here and not introduce the readers to the useless bike-shedding that resulted in blue.
It doesn't really help them, and sets them up for endless discussions in their own teams about the virtues of some minor stylistic differences. But the main point of code formatters is to stop the discussion and just focus on the code's value.
It's important for the next generation of Python developers to not waste so much time with this.
This is also one of the reasons why black has been maintained by the developer-in-residence at the Python Software Foundation: to clarify the intention of the de facto standard. Łukasz has a good talk about this: https://www.youtube.com/watch?v=esZLCuWs_2Y
ok this is a great suggestion. @jezdez do you have any thoughts on just using ruff? it seems like many are moving to it instead. but also, because it has so many tools packaged into it, it requires a lot of settings (it seems to me), so i've been unsure. but it really seems to be more common these days.
i did just remove blue!
FYI Ruff is complementary to Black. Ruff alone is not enough, and it's actually a design choice from the author to not duplicate all rules. So it's more a replacement for things like flake8 and other pure linters.
So yes, I would personally recommend using Ruff+Black (pandas is doing this, for instance).
BTW it's worth having a look at the hooks they use: https://github.com/pandas-dev/pandas/blob/main/.pre-commit-config.yaml
[Pre-commit.ci](https://pre-commit.ci) is a bot that may become your new best friend. This bot, when set up on a repo, can be configured to do the following:
This makes me happy; pre-commit has been such a productivity boost for the projects I'm involved with
oh yay! i've found the same. I added it to our small team's workflow and the other maintainer is very happy using it. it also makes cleaning up our contributions so much easier!! 🎉
oh i also have @ocefpaf to thank for introducing me to the bot!!
I understand that things fly under the radar sometimes, but I did comment about this in multiple conda and conda-forge meetings. Folks just did not seem interested in helping out with the review.
Huh, I must have been absent from these, glad we found each other this way!
hi @jezdez @dhirschfeld @ocefpaf ! 👋 i'm so sorry that some of this work caught some of y'all by surprise. I need you all to know that i am a HUGE supporter of conda / conda-forge and have always seen @ocefpaf as this expert guiding force in my work. Filipe, in my early years of teaching python, guided me to conda and i've never turned back from using it after that! i've taught with conda, and now mamba as well, for years. and i even have some lessons on installing python with conda for scientists that i plan to push out soon :)

I've been operating with a: _invite a few folks who i know to review, in hopes the word gets spread, because i just don't know everyone yet!_ i feel like i may have missed an opportunity to better inform the conda community about our work. i started this pr months ago, but the packaging guide pages just published were so much work - talking with so many people (and moderating arguments in some cases) - that i just lost track of this pr, and figured i'd pick back up on it once the other work was done. working on small sections at a time is much easier for me to manage given the number of reviewers that we get. So when you see a small section published, you're getting a small taste of a much bigger picture! If there is some better way for me to interface with the conda community, please let me know. i'm happy to jump into a meeting or post somewhere.

Related to the PSF and PyPA: i've been working with @pradyun to make sure what we are doing aligns with other efforts. I also have been posting in the Python discourse and chatting a bit with Paul Moore, Brett, and others there online (who i'm just starting to "meet" via those online conversations). so folks generally are aware of our work, and we hope to align wherever we can with other work - but not everyone, of course! please know i have only the best intentions here! but also, i'm getting to know people in the ecosystem as i work!! so i did not intend to "surprise" anyone that is already super invested in supporting core infrastructure (such as conda maintainers)!! but it did indeed happen, so my apologies for that.

thank you all for being here and for reviewing for us. i'll start working on this section again soon and will ask questions as i do that work. I am happy to answer questions and interact as we further flesh this out. The audience is beginner to intermediate users here. And specifically, we support scientists. we also have an open peer review process for software and are partnered with JOSS in that effort. We are all about building a supportive, diverse community and want to include everyone as we develop sound resources for packaging in the scientific ecosystem. ✨
I've put up a pull request to add @dhirschfeld! 🎉
Thanks for doing that, great write-up with a lot of valuable info. I have a few suggestions 😃
    rev: 22.12.0
    hooks:
      - id: black
        language_version: python3.8
A nitpick to update the version to something more recent 😉
oh a good nitpick!! good eye - thank you :)
ok everyone - i am merging this now! we can definitely make some updates in the future to these new pages. but the conda pages are the pages that most people are interested in!! and this has been open since january (due to my schedule :( )! so i'll merge and we can plan to review again in the future as needed.
This PR is now open for review!