-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation Plan: Switch Python package management away from Pipenv #286
Comments
I've just learned about It follows existing Python conventions closely, so may be more usable for the catalogue (which relies on requirements.txt files in a way that it can't use Pipenv for now). However, it's just some tools around pip, so doesn't manage virtual environments. That's an advantage Poetry and PDM have (and Pipenv as well), the venv management is transparent and you don't have to think about it. I'd be nice to document the trade offs here. |
Add initial query values Handle media reset on query change only in `search` page Update src/utils/prepare-search-query-params.js Co-authored-by: sarayourfriend <[email protected]>
* Remove tsv_to_postgres_loader_overwrite * Remove overwrite tests & extra steps
We have not prioritised this so far, which makes sense because of the large scale disruption to at least two parts of the stack (the API and ingestion server) and many other smaller sub-projects. However, two developments suggest we should be considering it with increased priority.
|
That makes a lot of sense to me, Dhruv. We could even go ahead and do this scoped just to the API, and assume the ingestion server is a non-issue long-term due to the removal project you linked. I'll update the title of this to represent that. There are some really simple solutions we could use, given we're just an application and not a package, and don't necessarily need pyproject.toml for the API. However, Poetry and other tools might have better support for cross-referencing packages in a monorepo context, which would be nice if we pulled certain shared utilities out. There are ways around that, though. |
I'm quite pleased with how Poetry works in my personal projects, and I believe @AetherUnbound is too. I would prefer to not have to learn another toolchain with The implementation plan should list down some concrete requirements and good-to-haves that we expect from the final solution (including but not limited to targeted package updates, good monorepo/workspace support, integration with Renovate/Dependabot etc.) and those will help us decide the best tool for the job. |
Might be good to iron those out before the actual plan, so that the plan can focus on determining what meets the requirements, rather than detemrining the requirements on top of that. They're separate discussions. Let's just go with Poetry if y'all are happy with it. It's enormously popular and works well in my experience as well. |
Noting that while the ingestion-server is on its way out, the indexer-worker (which will live in the catalog) will still be Pipenv based (assuming #4026 is not changed significantly before approval). |
Was asked to come share my experience now with multiple workflows. Before going too far, my current standard for my development is just use That said, I keep playing with these tools so can maybe speak to potential failure modes: pip-tools I generally like. It's specifically a suite of tools built on top of a pip based workflow. Poetry I've only used a little. My current big complaint is that it strongly wants the package name to match the project name, but that might not be a problem for you. The good of poetry is its about as close to NPM for Python as I've encountered. I think if you have folks who are weaker on python, the instructions are much clearer than the pip-tools pile. It also allows dependency grouping. I don't think either properly solve the monorepo problem. I have to maintain a mono-repo at work and my actual process for things I'm not needing editable, is cding from directory to directory and doing a build of each (or a full install because it's "easier") and moving back to what I'm working on to pip install the thing I'm working on. |
Thanks for sharing that, @pathunstrom. I wasn't aware Poetry didn't follow PEP 621! It makes sense given its context, but as we're only moving to it after it is settled, I think it would be good to choose a tool that does follow it. I use hatch extensively and like it, but the lack of lock files make it inappropriate for applications. I've used PDM before, and it has, I think, all the features we want:
I think that makes it an appropriate choice. It sees active development and is supported by Renovate through it's generic PEP 621 support. It also has an opt-in centralised installation cache to speed up re-installations and reduce internet usage like PNPM. @dhruvkb what do you think? @WordPress/openverse-maintainers? |
I didn't want to form any opinions before trying out all the possible tools. So I tried them all, from toy projects for new tools like PDM https://github.com/dhruvkb/pdm-mono and Rye https://github.com/dhruvkb/rye-mono to a full scale migration attempt for Poetry #4107, which is familiar. I really liked Rye mainly for its speed. It is incredibly fast! It's still not a good fit because it's very unstable (pre-v1), misses important features that we need (like cross-platform lockfiles) and is still largely maintained by one person. It's also very ideologically different and tries to do a lot more that isn't applicable to us (like managing Python installations). PDM was also very good, but I could not wrap my head around the monorepo setup. For example, try as I might I could not figure out how to lock dev dependencies for the sub-packages. I expected This got me wondering why we even need monorepo support in our Python projects. Such a setup puts all our packages in one virtualenv and makes it harder to get a project specific lockfile that we can use inside Docker. So I gave Poetry a try without a monorepo setup. It this case, the API defines a path dependency on a package in Poetry has good venv management, targeted dependency upgrades, support from Renovate and Dependabot, cross-platform lockfiles, support for build backends (I don't know why we would need that) and good CLI experience. It's not PEP-621 compliant but I personally would take that tradeoff for a mature, polished experience that's suited for applications, works quite well for libraries and produces understandable dependency chains. These are just my observations, I would like to hear your thoughts after you take a look at #4107. |
I was interested in monorepo support so that we could easily cross-reference packages locally and have dependencies locked between them. For example, if we abstracted code out to be shared by the catalog, API, or indexer workers (e.g., Elasticsearch mappings, other utils, etc), I thought a monorepo would help with that. In fact, it isn't necessary. We would use an editable local dependency for that anyway. Regarding these two from Rye:
We don't need this, though. Our lock files only matter for Linux, because that's where we run the application, and we can generate them in the docker images themselves (and probably should!). We do not run our apps on our local machines ever and we shouldn't try to support that (there are bigger more complex issues we'd need to solve for it). If this is a significant issue for us using Rye, I would highly encourage documenting why cross-platform lock files are relevant to our project. If lock files were relevant for isolated package developments, it'd be a different question, because that generally does happen locally. However, Python packages do not use lock files (and cannot because of how package dependencies are incorporated into a single resolved dependency list) so it is irrelevant.
I disagree this doesn't apply to us. I find it a pretty big pain to need to manage Python installations locally for Openverse, when every other Python project I work on uses a more recent Python version and uses package managers that download Python distributions for you. This is a really nice feature, and especially when we get around to writing Python packages, it makes testing with multiple Python versions (including pypy) trivially easy. This is also pretty common now because https://pypi.org/project/pbs-installer/ makes it basically a non-issue. Rye, PDM, and Hatch all use that or the backing library and in my experience both PDM and Hatch work perfectly for this. For package development, this is a must have QoL feature. I would go back to
We need this for building Python packages, which we have expressed intentions to do. Poetry sounds fine to me. I place a high value on PEP-621 compliance, especially if we are choosing a new tool to use in 2024, but if that is too high a bar, and if the tools available for achieving that are otherwise unsuitable, then fine. It's probably less relevant for applications, but that least me to the next point... We need to keep in mind that we do want to develop Python packages, not just apps. Whatever solution we choose should either work for both or not prevent us from using another for packages if appropriate. I say go for it with Poetry, so long as we can 100% use rye/hatch/whatever for package development, for packages that will get referenced locally by Poetry as local editable dependencies. I don't see why not PEP-621 for packages, especially when lock files are irrelevant there. |
Problem
Pipenv was a good start, but it has some issues that are difficult to work around:
Description
It would be great to have a concise RFC to investigate what it would take to switch from Pipenv to Poetry or PDM. The RFC should make a recommendation for one or the other and describe the trade-offs between them.
Please evaluate the effort required for such in the API and openverse "meta" repositories.
Please also evaluate the effect on any Dockerfiles and other deployment artifacts.
Additional context
WordPress Make Slack conversation here: https://wordpress.slack.com/archives/C02012JB00N/p1659363874533649
You can register for a Make Slack account here: https://chat.wordpress.org
Implementation
The text was updated successfully, but these errors were encountered: