[SPARK-20360][PYTHON] reprs for interpreters #17662
Thanks for working on this @rgbkrk, having better Jupyter/nteract support seems useful.
Never mind what I just said, I was context switching between working on JavaScript and Python -- I introduced the
The branch was force-pushed from 4feb079 to 25f0ce6.
Cool, should we add something similar to `SparkSession`?
Happy to! What would you like to see?
Should it just defer to the session's `SparkContext`?
For starters, the version and the Spark UI link like this are OK; maybe not too many things :)
python/pyspark/context.py (outdated)
</div>
""".format(
    spark_version=self.version,
    spark_ui_url=self.uiWebUrl,
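For readers new to IPython's rich display hooks, here is a minimal, self-contained sketch of the pattern under review: one object exposing both `__repr__` (plain text) and `_repr_html_` (HTML) built with `str.format`. `FakeSparkContext` is a hypothetical stand-in so the example runs without Spark. The real `SparkContext` does have `version`, `uiWebUrl`, `master` and `appName`, but the markup PySpark actually emits may differ from this one.

```python
import html


class FakeSparkContext:
    """Hypothetical stand-in for pyspark.SparkContext (no JVM needed)."""

    def __init__(self, version, ui_web_url, master, app_name):
        self.version = version
        self.uiWebUrl = ui_web_url
        self.master = master
        self.appName = app_name

    def __repr__(self):
        # Plain-text form, used by consoles and frontends without HTML support.
        return "<SparkContext master={master} appName={appName}>".format(
            master=self.master,
            appName=self.appName,
        )

    def _repr_html_(self):
        # IPython calls this hook to build the "text/html" representation.
        # Values are escaped here because app names are user-controlled strings.
        return """
        <div>
            <p><b>SparkContext</b></p>
            <p><a href="{ui_url}">Spark UI</a></p>
            <dl>
              <dt>Version</dt><dd><code>v{version}</code></dd>
              <dt>Master</dt><dd><code>{master}</code></dd>
              <dt>AppName</dt><dd><code>{app_name}</code></dd>
            </dl>
        </div>
        """.format(
            ui_url=html.escape(self.uiWebUrl),
            version=html.escape(self.version),
            master=html.escape(self.master),
            app_name=html.escape(self.appName),
        )


sc = FakeSparkContext("2.2.0", "http://localhost:4040", "local[*]", "demo")
print(repr(sc))  # <SparkContext master=local[*] appName=demo>
```

Escaping with `html.escape` is a defensive choice made in this sketch only; it is not taken from the PR.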
Thanks for this change @rgbkrk. I am kind of new to Python and would like to understand the significance of the "," at the end.
You're allowed to have trailing commas in Python. It's purely aesthetic; the primary benefit is for the next coder. Every line has the same format, so adding entries to the end is the same as adding them to the middle.
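A short illustration of the trailing-comma point: Python accepts a trailing comma after call arguments and inside list, dict and tuple displays. The only place it changes meaning is a one-element tuple, shown at the end. The names and values below are illustrative only.

```python
# Trailing comma after the last keyword argument of a call.
message = "{spark_version} at {spark_ui_url}".format(
    spark_version="2.2.0",
    spark_ui_url="http://localhost:4040",  # appending another argument later is a one-line diff
)

# Trailing comma in a list display.
settings = [
    "spark.app.name",
    "spark.master",
]

# The one case where the comma matters: it is what makes a one-element tuple.
single = ("spark.master",)      # tuple of length 1
not_a_tuple = ("spark.master")  # just a parenthesized string

print(message)  # 2.2.0 at http://localhost:4040
```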
Thanks for the explanation @rgbkrk. That makes sense. I was asking because the `__repr__` method you added doesn't have the "," at the end.
Thanks, the `__repr__` declaration now has matching code style.
After this PR, I'll be more than happy to add representations for RDDs, Spark DataFrames, and the whole lot. Keep encouraging me and you'll have a happy contributor. 😄
python/pyspark/sql/session.py (outdated)
@@ -221,6 +221,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                or SparkSession._instantiatedSession._sc._jsc is None:
            SparkSession._instantiatedSession = self

    def _repr_html_(self):
        return self.sparkContext._repr_html_()
As @felixcheung suggested, I think it might make sense to include some extra Spark SQL-specific things.
I think the catalog implementation type would be useful here (it can be very useful for understanding why Hive UDFs are not working); you can get it from the session config with the `spark.sql.catalogImplementation` key. The current database would also help; you can get it from the session's catalog object. The catalog implementation is especially useful since we currently do a "fallback" from Hive-supported to non-Hive-supported sessions, and the user might not have noticed if they launched in Jupyter, where the log messages are a bit more obscure (something I've been meaning to work on in #17298, but I've gotten a bit distracted).
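The two values suggested above can be read through public `SparkSession` APIs: `spark.conf.get(key)` and `spark.catalog.currentDatabase()`. The sketch below wraps them in a small helper; `sql_summary` and the stub objects are hypothetical and exist only so the example runs without a JVM. With a real session, you would pass the `SparkSession` itself.

```python
from types import SimpleNamespace


def sql_summary(spark):
    """Hypothetical helper collecting SQL-specific details for a notebook repr.

    `spark` is expected to behave like pyspark.sql.SparkSession; only
    spark.conf.get(key) and spark.catalog.currentDatabase() are used.
    """
    return {
        # "hive" when Hive support is active, "in-memory" otherwise.
        "catalogImplementation": spark.conf.get("spark.sql.catalogImplementation"),
        # Database that unqualified table names resolve against.
        "currentDatabase": spark.catalog.currentDatabase(),
    }


# Hypothetical stubs standing in for a real session's conf and catalog.
class _StubConf:
    def __init__(self, values):
        self._values = values

    def get(self, key):
        return self._values[key]


class _StubCatalog:
    def currentDatabase(self):
        return "default"


stub_session = SimpleNamespace(
    conf=_StubConf({"spark.sql.catalogImplementation": "in-memory"}),
    catalog=_StubCatalog(),
)
summary = sql_summary(stub_session)
print(summary)
```

If a session that was expected to have Hive support reports `in-memory` here, the fallback described above has happened.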
It might also make sense to return a different URL (e.g. to the SQL page rather than the default page, which takes people to the Jobs section), but this is minor and likely less useful than the other things.
OK, I've set the `catalogImplementation` here now. This seems like something we could put in the SparkContext HTML repr as well.
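A minimal sketch of a session-level `_repr_html_` that adds the catalog implementation and then defers to the context's HTML, as discussed above. `FakeContext`, `FakeConf` and `FakeSession` are hypothetical stand-ins; in PySpark the corresponding attributes are `session.sparkContext` and `session.conf`.

```python
class FakeContext:
    """Hypothetical stand-in for SparkContext; only the HTML hook matters here."""

    def _repr_html_(self):
        return "<div><p><b>SparkContext</b></p></div>"


class FakeConf:
    """Hypothetical stand-in for SparkSession.conf (a RuntimeConfig)."""

    def __init__(self, values):
        self._values = values

    def get(self, key):
        return self._values[key]


class FakeSession:
    def __init__(self, spark_context, conf):
        self.sparkContext = spark_context
        self.conf = conf

    def _repr_html_(self):
        # Session-level header first, then reuse the context's HTML rather than duplicating it.
        return """
        <div>
            <p><b>SparkSession - {catalog}</b></p>
            {sc_html}
        </div>
        """.format(
            catalog=self.conf.get("spark.sql.catalogImplementation"),
            sc_html=self.sparkContext._repr_html_(),
        )


session = FakeSession(FakeContext(), FakeConf({"spark.sql.catalogImplementation": "hive"}))
print(session._repr_html_())
```

Delegating keeps the version and UI link in one place, so a later change to the context's repr shows up in the session's repr automatically.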
> It might also make sense to return a different URL link (e.g. to the SQL page rather than the default page which takes people to the Jobs section) but this is minor and likely less useful than the other things.

I don't know enough about the internals to tell which URL(s) you'd want here. What properties or calls should I use for the SQL page?
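One possible answer to the URL question, offered as an assumption rather than a confirmed API: in Spark 2.x the web UI serves the SQL tab under the `/SQL/` path of the same base URL that `uiWebUrl` returns, so the link can be derived by string concatenation. `sql_tab_url` is a hypothetical helper, and its `None` check is a defensive choice for callers that have no UI URL.

```python
def sql_tab_url(ui_web_url):
    """Hypothetical helper: link to the Spark UI's SQL tab.

    Assumes the SQL tab lives at <ui_web_url>/SQL/.
    Returns None when no UI URL is available.
    """
    if not ui_web_url:
        return None
    return ui_web_url.rstrip("/") + "/SQL/"


print(sql_tab_url("http://192.168.1.5:4040"))  # http://192.168.1.5:4040/SQL/
```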
Thanks for working on this @rgbkrk. I have a few minor suggestions to make the SparkSession representation a bit clearer, but it looks useful already :)
I like it. I'm going to leave this for a bit and see if anyone has any comments overnight :)
This looks good to me. Since we've just cut the 2.2 branch but not yet built an RC, and this PR is already outstanding, I'm going to merge it into master and 2.2 so we can get faster feedback (and maybe pique some other people's interest).
## What changes were proposed in this pull request?

Establishes a very minimal `_repr_html_` for PySpark's `SparkContext`.

## How was this patch tested?

nteract:

![screen shot 2017-04-17 at 3 41 29 pm](https://cloud.githubusercontent.com/assets/836375/25107701/d57090ba-2385-11e7-8147-74bc2c50a41b.png)

Jupyter:

![screen shot 2017-04-17 at 3 53 19 pm](https://cloud.githubusercontent.com/assets/836375/25107725/05bf1fe8-2386-11e7-93e1-07a20c917dde.png)

Hydrogen:

![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png)

Author: Kyle Kelley <[email protected]>

Closes #17662 from rgbkrk/repr.

(cherry picked from commit f654b39)
Signed-off-by: Holden Karau <[email protected]>
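The screenshots show the same object rendered as HTML in three different frontends. A simplified model of why one hook suffices: the IPython kernel collects every representation an object offers into a MIME bundle keyed by type, and each frontend displays the richest type it supports. `mime_bundle`, `pick` and `Demo` below are hypothetical and heavily reduced; this is not IPython's actual formatter code.

```python
class Demo:
    """Hypothetical object offering both plain-text and HTML representations."""

    def __repr__(self):
        return "<Demo>"

    def _repr_html_(self):
        return "<b>Demo</b>"


def mime_bundle(obj):
    """Reduced model of a kernel building a display bundle for `obj`."""
    bundle = {"text/plain": repr(obj)}  # plain text is always available
    html_hook = getattr(obj, "_repr_html_", None)
    if callable(html_hook):
        html = html_hook()
        if html is not None:  # a hook may return None to decline
            bundle["text/html"] = html
    return bundle


def pick(bundle, supported=("text/html", "text/plain")):
    """Frontend side: choose the first supported type, richest first."""
    for mime in supported:
        if mime in bundle:
            return mime, bundle[mime]
    raise ValueError("no displayable representation")


print(pick(mime_bundle(Demo())))       # ('text/html', '<b>Demo</b>')
print(pick(mime_bundle(object()))[0])  # text/plain
```

A text-only console corresponds to calling `pick` with `supported=("text/plain",)`, which falls back to `__repr__`.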