Refactor xopen #143
This is quite a big PR and I’m not so comfortable reviewing it as-is. I’d rather review many PRs with few commits rather than 1 PR with many commits. Do you think you can split this up a bit? You already have the three branches.

Regarding Git history, I noticed that there are a lot of "Fix [something]" commits and many commits that do not pass CI. I call these "half commits" because they are not self contained. Ideally, no commit should break CI. Half commits lead to a noisy Git history and make

I don’t want to ask you to perfectly clean up the history here because it is more work to do this afterwards than to do it while working. That said, here are a couple of strategies that I use for manipulating Git history so that it looks nice. You probably do some of these, but I thought I’d just write down what came to my mind.

Let’s say I work on refactor B and realize I need to do refactor A beforehand. I either start a new branch or use

I very often use

I try to run all tests after each commit. (Since this is a bit boring, I sometimes don’t, but I also regret it sometimes.) Local tests should be run on the clean repository, so I use

When I realize that the most recent commit does not pass CI, I fix the problem and use

If I notice that a commit earlier than the immediately preceding one is broken, I make a "fix" commit, then use rebase to squash the broken and "fix" commit.

I make lots of temporary commits that I mark with "fix ..." or "wip ..." so that I will remember to clean them up later. I use interactive rebase (
Thanks for the feedback. I see where you are coming from. I will do this in separate PRs with a cleaner git history. Since the individual changes are not so big (less than 200 lines), I can put each of them in a separate commit. I really should be more mindful of my git behavior. Currently I spam quite a lot of commits. Something to work on for 2024!
Created #145. I will close this, as this PR will not be pulled as is.
fixes #141
So I started by making `PipedCompressionReader` and `PipedCompressionWriter` binary. This refactor saved quite a few lines, and after that I just kept going.
Changes:
The `_open_format` functions only return binary streams. The `xopen` function wraps these streams in an `io.TextIOWrapper` if the requested stream is a text stream. The advantage of this model is that text encoding arguments do not have to be passed through the application. (Change compared to main.)

As you can see, the changes are iterative, so the definitive version is in this branch. I made the other branches to make it easier to look at each change in isolation, for your reviewing comfort.
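The binary-streams-plus-wrapper model described in the changes above can be sketched roughly as follows. This is a minimal illustration and not the actual xopen code: `_open_gz_binary` and `open_maybe_text` are hypothetical names, and gzip from the standard library stands in for any compression format.

```python
import gzip
import io
from typing import BinaryIO, Optional, Union


def _open_gz_binary(path: str, mode: str) -> BinaryIO:
    # Hypothetical stand-in for an internal format opener.
    # It never deals with text: it always returns a binary stream.
    return gzip.open(path, mode + "b")


def open_maybe_text(
    path: str,
    mode: str = "r",
    encoding: str = "utf-8",
    errors: Optional[str] = None,
    newline: Optional[str] = None,
) -> Union[BinaryIO, io.TextIOWrapper]:
    # Hypothetical public entry point (not the real xopen signature).
    if mode not in ("r", "rt", "rb", "w", "wt", "wb", "a", "at", "ab"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # The format opener only needs to know r/w/a.
    stream = _open_gz_binary(path, mode[0])
    if "b" in mode:
        return stream
    # Text handling lives in exactly one place, so encoding arguments
    # never have to be threaded through the format-specific openers.
    return io.TextIOWrapper(
        stream, encoding=encoding, errors=errors, newline=newline
    )
```

Closing the returned `io.TextIOWrapper` also closes the wrapped binary stream, so callers can use the result in a `with` block in either mode.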
Before: `__init__.py`: 1296 lines.
After: `__init__.py`: 734 lines.

The `xopen` API is unchanged. The `test_xopen` file contains no changes. Internally I made quite a lot of backwards-incompatible changes. This is only problematic if people were using our internals before. (I only ever used `xopen`, so I don't think this is very likely.) Nevertheless, signaling this with a version 2.0.0 wouldn't hurt. I made `_PipedCompressionProgram` start with an `_` so that it is clearly signaled as internal and not part of the public API.

EDIT: This is purely solving technical debt, so it is not necessary to review this quickly. Maybe we should make a new release first with all the zstandard changes?