Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3 Upgrade (minimal) #577

Merged
merged 23 commits into from
Oct 10, 2024
Merged

Python 3 Upgrade (minimal) #577

merged 23 commits into from
Oct 10, 2024

Conversation

offbyone
Copy link
Contributor

@offbyone offbyone commented Oct 7, 2024

Note: This PR is built on the work started by @JacobCoffee in #569

A minimal Python 2 to 3 upgrade for planet. This is not a complete rewrite, it is the code changes necessary to allow the existing sequential feed updater to run using Python 3.12 rather than 2.... well, probably 2. Looking at this code I suspect it was Python 1, mostly.1

Notable Changes

  • replaced the templating system with jinja2
  • replaced the HTML santization library with markupsafe
  • unvendored feedparser
  • unvendored stdlib logging.
  • added a pyproject.toml
  • added at least a small "this thing sort of works" test
  • version 4.0

Notable Unchanges

  • the caching system is unchanged and still based on shelve
  • no parallelization has been added
  • only the least-invasive code changes necessary to generate the planet HTML were made

Work left to do

JacobCoffee and others added 20 commits July 22, 2024 17:32
- removed the compat_logging, since logging is in the stdlib
- removed sanitize in favor of markupsafe
- starting to add typing
- fixed boolean flags in many places
- switch from hand-parsed args to argparse
- switch from htmltmpl to jinja2

TODO:

- core templates need to be converted to jinja2
- cache keys are still failing
The 2to3 conversion changed the `has_key` to `in` but in this instance,
the class overrides `__contains__` in a way that changes the semantics
of it.
- use markupsafe instead of the stripHtml mini-class
- fix sorting of news items
- handle out of bound dates (terribly)
@ewdurbin
Copy link
Member

ewdurbin commented Oct 8, 2024

Hi @offbyone! Thanks for picking up the torch here!

Is this in a state that generates usable output, or does it still have some ways to go before being (manually) testable?

@offbyone
Copy link
Contributor Author

offbyone commented Oct 8, 2024

It generates useful input on first run, I'm still having some issues with caching. I expect to have time to dig into that this week.

@Mariatta
Copy link
Member

Mariatta commented Oct 8, 2024

Thank you @offbyone for the great work.
I'm not super well-versed with infra/devops kind of thing, but wondering if this is something we can like have deploy preview build kind of thing?

I'm also wishing we could automate the check for valid rss using GH actions or as part of CI. Would that be possible to add? Should I open an issue about it?

@offbyone
Copy link
Contributor Author

offbyone commented Oct 9, 2024

Opening the issue can't hurt; worst-case, it won't happen, but if it makes sense to do, we can.

@Mariatta
Copy link
Member

Mariatta commented Oct 9, 2024

Thanks. I've opened #579 and #580

- switch to argparse
- fix a couple of minor 2-3 bugs
- slightly nicer logging of a couple of exceptions
@offbyone offbyone changed the base branch from py2-to-3 to master October 10, 2024 06:45
@offbyone
Copy link
Contributor Author

Okay, I'd call this ready to review.

I don't know much about how it's currently running or deployed, although I can make some inferences based on the current code. I'm happy to sysadmin that shit if the owners of the host want me to.

ewdurbin added a commit to python/psf-salt that referenced this pull request Oct 10, 2024
ewdurbin added a commit to python/psf-salt that referenced this pull request Oct 10, 2024
@ewdurbin ewdurbin mentioned this pull request Oct 10, 2024
ewdurbin added a commit to python/psf-salt that referenced this pull request Oct 10, 2024
planet: deploy planet from a py2 specific branch

temporary measure while python/planet#577 is in progress
@ewdurbin ewdurbin changed the base branch from master to py3 October 10, 2024 12:35
@ewdurbin
Copy link
Member

Merging this into a new py3 branch to temporarily deploy it alongside the current deployment.

@ewdurbin ewdurbin merged commit daff861 into python:py3 Oct 10, 2024
@ewdurbin
Copy link
Member

https://planetpython.org/3/ is now running and updating along side the main instance.

The initial result is a little confusing, I guess we'll see how it behaves over time?

@offbyone
Copy link
Contributor Author

No, I think I see what's happening there.

There are two things that I'll want to address, one is full text feeds and the other is something that's possibly stateful. The DB has a field - hidden -- that it seems to infer from something about how "new" the feed is. Since I don't have the original DB, it's possible there are feeds not showing up in the new one that might otherwise be there. If you can get me the contents of /srv/planetpython.org/ and /srv/cache from the old runner, I can see about getting things consistent.

@offbyone
Copy link
Contributor Author

Oh, man... yeah, there are some bugs in that output now that I can see it beside the current planet. I'll put together some fixes by the weekend (evenings are a bit slammed for me this week)

@ewdurbin
Copy link
Member

@offbyone
Copy link
Contributor Author

Can you tell me which version of python (and, possibly, OS version and dbm library version) are involved in generating that? I hope it won't matter, but I might need to spin up a replica of the original to use those caches.

@JacobCoffee
Copy link
Member

JacobCoffee commented Oct 10, 2024

@offbyone
Python3 is 3.12.2
Ubuntu 18.04.6 (behind due to Python2 impl)

@ewdurbin
Copy link
Member

ee@planet:~$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
ee@planet:~$ uname -a
Linux planet 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ee@planet:~$ python
Python 2.7.17 (default, Mar  8 2023, 18:40:28) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

@ewdurbin
Copy link
Member

@offbyone 3.12.2

That's the preview deploy, the state is from the original.

@ewdurbin
Copy link
Member

Ah, @offbyone do you want the cache/output for the new version or the original?

@offbyone
Copy link
Contributor Author

Honestly, "both is good" is even better. That way I can apples-to-apples it.

@ewdurbin
Copy link
Member

for the new thing: https://planetpython.org/cache3-1728570557.tgz

$ python3.12 
Python 3.12.2 (main, Oct 10 2024, 13:10:02) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

@offbyone
Copy link
Contributor Author

Okay, so I have some good news and some bad news.

The good news is that I can definitely get the ordering issues sorted.

The bad news is that as far as I can tell, I can't use the cache from the v2 version with v3, to retain the "hidden" state from the old feed cache. I'm looking into seeing how I can get that status, because it would radically change the feed if we lose it.

@offbyone
Copy link
Contributor Author

I'm going to see if I can figure that one out somehow. I suspect I'm missing something with the weird ways dbm loads files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants