-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Avoid requesting data from s3 buckets from our docs #56762
DOC: Avoid requesting data from s3 buckets from our docs #56762
Conversation
Make consistent with other s3 bucket URL examples and avoid doc build error when problem with s3 url.
Make example consistent with other code block examples
/preview |
doc/source/whatsnew/v2.3.0.rst
Outdated
@@ -206,6 +206,7 @@ Styler | |||
|
|||
Other | |||
^^^^^ | |||
- Bug when building html documentation from ``doc\source\user_guide\io.rst`` no longer calls S3 bucket URL (:issue:`56592`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a whatsnew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Will resubmit shortly.
doc/source/user_guide/io.rst
Outdated
|
||
df = pd.read_xml( | ||
pd.read_xml( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please write down the output?
Thanks @JackCollins91 for the help with this. The bucket does actually exist, the error inside a docker seems to be related to connectivity. The idea is that regardless of it working, it'd still be nice to not have our docs getting those files from s3 at every build. What you did to achieve that is correct, replacing the I think we want to do this for every block that access s3 buckets, not only one single block. You can find the output you need to add to the examples in the rendered docs: https://pandas.pydata.org/docs/dev/user_guide/io.html Let me know if you have any question. |
Thanks @datapythonista . Thanks for the guidance. I'll update the PR shortly. |
For each S3 bucket code block, ideally we show what the output would be, but without making an actual call. Unfortunately, for several of the S3 buckets, there are issues with the code, which we must fix in another commit or PR. For now, the two S3 examples that do work, we edit to make the code block show what the output would have been if it had run successfully. Find details on issues in conversation on PR pandas-dev#56592
Code still doesn't run, but at least unmatched } is no longer the issue.
Hi @phofl and @datapythonista, The original S3 example which was causing the issue for #56592 is now fixed and displays the intended output as requested, so I believe this PR now fixes the original issue. I have attached a pdf file (not sure of the etiquette here - hope that solves any dependency issues) of a preview of the output now. However, in trying to extend the solution to all S3 example codes in the doc, I ran into some issues I'd appreciate advice on how the pandas community would prefer these be managed.
They are as follows. I can suggest options 1) Leave this PR as is, at least it fixes the simple S3 issues. or 2) try and fix all errors, but I might require some assistance to check all the S3 buckets are still ok and the authorizations needed therein. Error messages as follows. 1/3
2/3
3/3
|
Preview of the fixed output and the errors when the S3 bucket code blocks run in ipython. |
I think the output should be the same as in this page: Do you mind to send a link inside that page to the example with the very long dataframe, and one of the failing cases, to see how is it shown now please? |
Hi @datapythonista thanks for looking this over. I'm sorry but I'm not sure I understand your request. Do you want the PR to make the new io.html page look exactly the same as the existing one? I can do that, but only one of the S3 bucket examples would then show what the output looks like. If that's ok, I can do so. I am not sure what kind of 'link' you are requesting above. If you mean you want to see what the io page will look like after this PR is merged, I have that here... And if you want to see a page preview that shows what the S3 related errors are in the page, that can be seen here Let me know if this is what you wanted or if I have misunderstood. Sorry for inconvenience. Cheers |
Sorry, what I would like is to see how the example with the big dataframe looks now, before merging this PR (I don't know which example it is). Same for one of the failing ones. My understanding is that what we want is to simply stop running code that access s3 buckets. If the dataframes aren't been displayed now, I would expect to keep that. If I missed the point and we want to start showing output in examples that currently don't, I'd personally do this in two steps. Replacing in this PR ipython blocks by code-block blocks and keeping exactly the same output. And displaying the content we want in a separate PR. Let me know if I'm misunderstanding things. @mroeschke I think you are the one who wanted to stop running code that access s3. Does it make sense what I say? |
Thanks for the clarification @datapythonista . Regarding the large DataFrame (happy to revert if not desired) Regarding the failing code blocks, I have listed the failing output as code blocks in the comment above - is that sufficient? Regarding the code block which caused the original issue #56592 Before (issue is it makes an S3 call): Hopefully I've provided the info you wanted in a convenient way, but let me know if any issues. Sorry I am new to pandas community open dev. |
avoids unnecessary file change in PR
Why do you prefer your version in your versions and not what's in the original? I'm unsure what you are trying to achieve sorry |
Hi @datapythonista , You requested this to be the case for every S3 bucket example, but right now it is only possible for the two listed above. Is this a reasonable motivation for the PR? |
Yes, absolutely, your changes are great. I think there is only one thing when we are not understanding each other. What I'm suggesting is to not change the output. If there is output being displayed now, to continue to display it. If there is not, to continue to not display it. There was a comment earlier on the reviews about showing the output. What I understood is that the changes you made at that point stopped showing the output that previously existed. My understanding is that in your changes now you start showing output in places where we currently don't show it. The output that you are showing as you say is a bit tricy, dataframes are huge and you then show the columns. My last question was whybdo you think introducing this output is better than continuing to not show output. Is it just because in the review comment you were asked to show the output. Or is it because younpersonally thing showing the columns is better than not showing anything? My suggestion is that in this PR you just focus on making the changes to stop running the examples that query s3. This should be non-controversial and we can get it merged quickly. Changing the output is way more opinionated and will take more discussions. I would do that if you want to do it in a follow up PR. Does this make sense? |
Rollback changes to one of the examples (out of scope)
Hi @datapythonista , this makes sense to me now. Thanks so much for taking the time to clarify. You have correctly assessed the PR and yes, I only added in the large DF outputs because I thought there was a desire for consistency among S3 examples. However, I now understand the preference is to keep the document the same as before, with the sole change being that now S# calls are not made. Code chunks that previously had no output should continue to do so. The same for code chunks which display output. I will have made a commit now that does exactly this. Let me know if any issues and thanks again :) |
doc/source/user_guide/io.rst
Outdated
... "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml", | ||
... xpath=".//journal-meta", | ||
...) | ||
>>> df.head(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the file contain only one row already? I see that now we simply show df
and the output is the same. Am I missing something?
>>> df.head(1) | |
>>> df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it's redundant. Will revise now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'll let @mroeschke have a look before merging.
What is the plan regarding the other blocks that access s3 buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Yeah if interested in following up, another PR turning all examples in our docs that access s3 buckets to code-block
s woul be appreciated
Thanks @JackCollins91 |
…56762) * Update io.rst Make consistent with other s3 bucket URL examples and avoid doc build error when problem with s3 url. * Update io.rst Make example consistent with other code block examples * Update v2.3.0.rst * immitating interactive mode For each S3 bucket code block, ideally we show what the output would be, but without making an actual call. Unfortunately, for several of the S3 buckets, there are issues with the code, which we must fix in another commit or PR. For now, the two S3 examples that do work, we edit to make the code block show what the output would have been if it had run successfully. Find details on issues in conversation on PR pandas-dev#56592 * Update io.rst Code still doesn't run, but at least unmatched } is no longer the issue. * Update v2.3.0.rst avoids unnecessary file change in PR * Update io.rst Rollback changes to one of the examples (out of scope) * Update io.rst * Update io.rst --------- Co-authored-by: JackCollins1991 <[email protected]>
doc/source/whatsnew/v2.3.0.rst
file if fixing a bug or adding a new feature.ipython code chunk would make call to S3 bucket URL. Most often harmless and therefore not easy to replicate, but some users reported error when building html documentation (see issue #56592) when there was some access issue to the S3 bucket URL. Decision was to change to code block to avoid calls.