-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor _read_into_buffer #47
Conversation
src/dnaio/_core.pyx
Outdated
self.record_start = 0 | ||
self.file = file | ||
if buffer_size < 1: | ||
raise ValueError("Starting buffer size too small") | ||
|
||
def __dealloc__(self): | ||
if self.buffer != NULL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, you don’t need to explicitly check for a null pointer because PyMem_Free
will just not do anything if it gets passed one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven’t gone through everything in detail, but I like the individual improvements. I’ll trust the tests to have caught any problems this time. I added a single comment, but that’s essentially just a FYI and you don’t need to change it if you prefer. Please merge this one yourself!
Your test suite is pretty thorough, which is why I dared to refactor like this in the first place. Thanks for the review! |
When making #39 _read_into_buffer was begging for a refactoring, but that would mess up the diff immensely.
So here is my refactor that I did over the weekend. I realize that I have been bombarding this project with PRs, but the good news is: this is the last one. Everything I have to add is now a PR or an issue.
This refactor:
self.c_buf
->self.buffer
.self.buf
andself.buf_view
are now gone. There is now only a C buffer. The original code had three representations of the same buffer and I thought that was a bit confusing.self.bufend
->self.bytes_in_buffer
. A more clear description of the variable meaning.PyMem_Realloc
instead of creating a new bytesarray object when the buffer is too small. This uses less code and is easier to understand.memmove
instead of bytearray semantics. This is equally understandable.self.record_start
is now set to 0 just below the memmove part. This makes the code easier to follow. Previously it was all the way down at the end of the function because its value was used by the EOF checks.The one disadvantage is that
readinto
cannot be used for C buffers, so this usesread
andmemcpy
. This does not matter for speed because io.BufferedReader and gzip.GzipFile use the same implementation of readinto (and these cover >99% of our input use cases). It is a bit more verbose though than readinto, but quite understandable. The advantage is that we read afilechunk
and that we can usefilechunk_size
to see if we reached EOF. That is a very idiomatic python thing to do.There are no speed advantages or disadvantages to this PR. It does the same thing semantically as the old code. There is however a size advantage:
This is because cython does not generate all sorts of generic memoryview and bytearray method code.