-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ipfsapi dependency to new name, adapt ipwb tests to Py3 #609
Conversation
Fix client invocation per the new API module's docs. Fixes #608
Travis fails because the tests are still written for Py2 despite Py3 compatibility. I'll update Travis to use Py3 then fix some tests before we should merge this. This should ultimately help us close #51, too. |
Updated to use xenial in 139e4ca, as required by TravisCI for testing in Python 3.7 but when Travis tries to pull the Python 3.7 binary, it still seems to have the 14.04 reference in the URI. |
Oops, still had the old dist listed further down in .travisci. |
I am encountered the rabbit hole of our tests where Py3 expects bytes and Py2 considers a string. We'll work in this PR to resolve it and hopefully complete the Py2->3 process in doing so. |
@@ -52,7 +52,7 @@ | |||
|
|||
DEBUG = False | |||
|
|||
IPFS_API = ipfsapi.Client(IPFSAPI_HOST, IPFSAPI_PORT) | |||
IPFS_API = ipfsapi.Client(f"/dns/{IPFSAPI_HOST}/tcp/{IPFSAPI_PORT}/http") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is time to retire IPFSAPI_HOST
and IPFSAPI_PORT
and replace them with a single config called APFSAPI_ADDR
that will hold the multaddr value as described in the new API. The rationale behind this is the fact that the formation of address now depends on the format of the host. This change assumes that the IPFSAPI_HOST
will be a hostname, but if it's an IP address, then this will render invalid. Consolidating complete ADDR in one string will make sure that the user modifies everything accordingly (and any misconfiguration is users' fault, not of the system).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibnesayeed I agree that the semantics are off on HOST vs. IP. Let's update that in a separate optimization request with respect to #349, where it will be most useful. Please see #610. Until then, for the sake of integrating with the new version of the module, let's keep HOST and PORT for now, despite the lack of semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have made this change in this PR, but if you would prefer doing it in yet another PR, that is alright too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in #610 will require additional parsing logic and assuring that they have been applied across more files. The ticket ought to be representative of the need to do it and will hopefully allow us to build some test cases around it, which seems like it would be scope creep for this PR/issue.
@@ -1,7 +1,7 @@ | |||
# Number of entries in CDXJ == number of response records in WARC | |||
|
|||
import pytest | |||
import testUtil as ipwbTest | |||
from . import testUtil as ipwbTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change in the import style seems unrelated to this particular PR as it is more about Py3 support. If this is a dependency, then these changes should be made independently and merged prior to this PR. Consider this comment for the rest of the changes in test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did not originally intend of your code review of this, but because I used f-strings above, which are a Py3 feature, and the tests are still in Py2, we could not have the original changes in this PR without updating the tests to also be Py3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with Py3 changes and understand the dependency situation here, but my primary concern was the title is not reflective of changes. That's why I suggested that we make these Py3 changes in a separate PR, merge it in this branch, and then test this branch. However, if that is too much work, we can go ahead with this one as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibnesayeed I had started to do this...3 months ago in the issue-51-tests branch. The effort was never completed and I ended up redoing some of that here with additional fixes for the test.
I have updated the title of the PR to better reflect the changes. 😃
A current holdup in validating the tests is string replacement of a placeholder within a WARC. This is used to produce a unique WARC, which should generate a unique IPFS multihash, at least for testing purposes. The previous logic was tailored to Py2. In partially adapting the tests to Py3, the approach coded to replace the string within a string does not work. Distilling this down to a code snippet, as derived from
This comes down to the ol' string vs. bytes approach. @ibnesayeed, can you recommend a fix for this? I am a little short on cycles but would like to get this PR merged at some point for the sake of maintainability a la Py3 adaptations. |
A solution might lie in changing |
I would suggest using both find and replace strings as bytes and call replace before decoding the content. |
That's the approach I'm looking into per above, @ibnesayeed. The conversion has other side effects in the tests, which I am hoping to adapt. Thanks! |
cd6fa76 fixes the bytes/string randomization issue but Travis still reports that some tests in |
One issue exists in the reported error:
would return a list but returns an iterable in Py3. Using EDIT: or, as I did in 5f30101, convert the iterables to lists and keep the same testing logic. 😄 |
Codecov Report
@@ Coverage Diff @@
## master #609 +/- ##
==========================================
- Coverage 24.3% 24.14% -0.17%
==========================================
Files 6 6
Lines 1218 1222 +4
Branches 182 184 +2
==========================================
- Hits 296 295 -1
- Misses 901 905 +4
- Partials 21 22 +1
Continue to review full report at Codecov.
|
pip3 installing 5f30101 then
This is reminiscent of ipfs-shipyard/py-ipfs-http-client#137 . We may want to retest ipfshttpclient with a zero length string to see if the issue resurfaced. |
A quick manual test with verbose logging shows that the trouble is from the binary froggie images, as expected. |
Above is invoked when the indexer calls |
I encountered this related ticket: ipfs-shipyard/py-ipfs-http-client#184 ...which seems to pin the problem on urllib3. Using a virtual environment instead:
produces
but still precedes this with
while indexing. EDIT: While inspecting this closer, the IPFS hash for the image body is |
I isolated and replicated the behavior with:
In manipulating the PNG's bytes, I noticed an issue that might be the source of the problem. When I removed any byte or manipulated the byte string in some way, no exception was thrown. If I reran the same code twice, the From this I gather that with binary data, when the hash is being retrieved from IPFS upon I have yet to isolate what it is about the byte string that causes this issue, as HTML in a byte string does not. |
Perhaps it's the size of the payload being pushed to IPFS? If I restart the IPFS daemon, it seems to allow the payload to be pushed again without issue. This non-deterministic behavior might be better explained by how the API module is pushing content using the daemon. Will report to the module repo. EDIT: Reported the issue at ipfs-shipyard/py-ipfs-http-client#187 |
@ibnesayeed Per my latest comment in ipfs-shipyard/py-ipfs-http-client#187, using a more updated go-ipfs fixes the broken pipe issue. |
Thanks for figuring this out. I think we should subscribe to the Release notifications of |
I was aware that there was a new release of go-ipfs but was unsure if ipfsapi supported it due to explicit blacklisting of go-ipfs 0.4.20. As an aside, the Py3-based tests in this PR pass now. I will do some further sanity checks and we can hopefully get an ipwb release out in the next day and close the relevant GitHub issue. 🚀 |
Fix semantics of host/port as @ibnesayeed recommended in #609
Fixes #608