[SPARK-20360][PYTHON] reprs for interpreters #17662

rgbkrk · 2017-04-17T21:10:49Z

What changes were proposed in this pull request?

Establishes a very minimal _repr_html_ for PySpark's SparkContext.
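
For readers unfamiliar with the hook: Jupyter-style frontends look for a `_repr_html_` method on the object a cell evaluates to and render the returned HTML instead of the plain `repr`. The sketch below uses a hypothetical stand-in class (`FakeSparkContext`, not part of PySpark) with made-up values, and its markup is only illustrative, not necessarily the markup this patch emits.

```python
class FakeSparkContext(object):
    """Hypothetical stand-in for SparkContext; the values are made up."""

    def __init__(self, version, ui_web_url):
        self.version = version
        self.uiWebUrl = ui_web_url

    def _repr_html_(self):
        # Rich frontends call this method, if present, and render the HTML it returns
        return """
        <div>
            <p><b>SparkContext</b></p>
            <p><a href="{spark_ui_url}">Spark UI</a></p>
            <p>Version: v{spark_version}</p>
        </div>
        """.format(
            spark_version=self.version,
            spark_ui_url=self.uiWebUrl,
        )


sc = FakeSparkContext("2.2.0", "http://localhost:4040")
html = sc._repr_html_()
print(html)
```

In a notebook, leaving `sc` as the last expression in a cell would be enough for the frontend to pick up the HTML representation.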

How was this patch tested?

nteract:

Jupyter:

Hydrogen:

holdenk · 2017-04-17T21:15:57Z

Thanks for working on this @rgbkrk, having better IJupyter/nteract support seems useful.
Jenkins, OK to test.

rgbkrk · 2017-04-17T21:27:42Z

I'm tempted to switch this to <p> tags for the header, based on how the hydrogen styling comes out by default. It then looks like this:

~~Should I be tearing the $ off the front of the version string that's handed back? What's canonical, what's good for the user?~~

rgbkrk · 2017-04-17T21:29:20Z

Never mind what I just said, I was context switching between working on JavaScript and Python -- I introduced the $ in the format string. Silly me.
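
A plausible reconstruction of the mix-up, assuming the format string briefly carried a JavaScript-style `${...}` placeholder (the exact string from the earlier commit is not shown in this thread): Python's `str.format` treats `$` as a literal character, so it ends up in the output.

```python
# JavaScript template literals use ${name}; Python's str.format uses {name}.
# The version value here is made up for illustration.
js_style = "v${spark_version}".format(spark_version="2.2.0")
py_style = "v{spark_version}".format(spark_version="2.2.0")

print(js_style)  # v$2.2.0
print(py_style)  # v2.2.0
```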

felixcheung · 2017-04-17T21:41:24Z

cool, should we add something similar to SparkSession?

rgbkrk · 2017-04-17T21:41:42Z

Happy to! What would you like to see?

rgbkrk · 2017-04-17T21:43:18Z

Should it just defer to the session's spark context?

felixcheung · 2017-04-17T21:48:03Z

for starter, version and spark UI like this
I'm sure we could add appname, session catalog current database, or conf like spark.sql.catalogImplementation

ok, maybe not too many things :)

vundela · 2017-04-17T22:05:02Z

python/pyspark/context.py

+        </div>
+        """.format(
+            spark_version=self.version,
+            spark_ui_url=self.uiWebUrl,


Thanks for this change @rgbkrk. I am kind of a newbie to Python and would like to understand the significance of the "," at the end.

rgbkrk · 2017-04-17

You're allowed to have trailing commas in Python. It's purely aesthetic -- the primary benefit is for the next coder. Every line has the same format and adding entries to the end is the same as adding them to the middle.

vundela · 2017-04-17

Thanks for the explanation @rgbkrk. That makes sense. The reason for asking is that the __repr__ method you added doesn't have the "," at the end.

rgbkrk · 2017-04-18

Thanks, the __repr__ declaration now has matching code style.
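
The trailing-comma point in this thread can be checked directly. This is a small sketch with made-up values: both calls produce the same string, and the only difference is how a later diff looks when another argument is appended.

```python
# With a trailing comma, appending an argument later touches only the new line
with_trailing = "{spark_version} / {spark_ui_url}".format(
    spark_version="2.2.0",
    spark_ui_url="http://localhost:4040",
)

# Without it, the last existing line must also be edited to add a comma
without_trailing = "{spark_version} / {spark_ui_url}".format(
    spark_version="2.2.0",
    spark_ui_url="http://localhost:4040"
)

print(with_trailing == without_trailing)  # True
```

One caution: this applies to argument lists and collection literals. A trailing comma after a single parenthesized expression, as in `("a",)`, creates a tuple.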

rgbkrk · 2017-04-17T22:42:49Z

Went ahead and put master and appName, since those seemed sensible for knowing if you were on yarn, local[*], etc.

rgbkrk · 2017-04-17T22:47:27Z

After this PR I'll be more than happy to do some representations for RDDs, Spark DataFrames, and the whole lot. Keep encouraging me and you'll have a happy contributor. 😄

holdenk · 2017-04-17T23:18:17Z

python/pyspark/sql/session.py

@@ -221,6 +221,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                or SparkSession._instantiatedSession._sc._jsc is None:
            SparkSession._instantiatedSession = self

+    def _repr_html_(self):
+        return self.sparkContext._repr_html_()


As @felixcheung suggested I think it might make sense to include some extra Spark SQL specific things.

I think the catalog implementation type (which you can get from the session config with the spark.sql.catalogImplementation key) would be useful here, since it can help in understanding why Hive UDFs are not working, along with the current database (which you can get from the session's catalog object). The catalog implementation is especially useful since we currently do a "fallback" from Hive-supported to non-Hive-supported, and the user might not have noticed if they launched in Jupyter, where the log messages are a bit more obscure (something I've been meaning to work on in #17298, but I've gotten a bit distracted).

It might also make sense to return a different URL link (e.g. to the SQL page rather than the default page which takes people to the Jobs section) but this is minor and likely less useful than the other things.

rgbkrk · 2017-04-18

Ok, I've set the catalogImplementation here now. This seems like something we could put in the SparkContext HTML repr as well.

rgbkrk · 2017-04-18

> It might also make sense to return a different URL link (e.g. to the SQL page rather than the default page which takes people to the Jobs section) but this is minor and likely less useful than the other things.

I don't know enough internals here to tell what URL(s) you'd want here. What properties or calls should I make for the SQL page?
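
To make the suggestion in this thread concrete, here is a minimal sketch of a session-level `_repr_html_` that shows the catalog implementation and then defers to the context's HTML. Both classes are hypothetical stand-ins (not PySpark classes), `conf` is a plain dict rather than a real runtime config, the `"in-memory"` default is an assumption, and the markup is illustrative rather than what was merged.

```python
class FakeSparkContext(object):
    """Hypothetical stand-in for SparkContext."""

    def _repr_html_(self):
        return "<div><p><b>SparkContext</b></p></div>"


class FakeSparkSession(object):
    """Hypothetical stand-in for SparkSession; conf is a plain dict here."""

    def __init__(self, spark_context, conf):
        self.sparkContext = spark_context
        self.conf = conf

    def _repr_html_(self):
        # Session-specific detail first, then the context's own HTML
        return """
        <div>
            <p><b>SparkSession - {catalog_impl}</b></p>
            {sc_html}
        </div>
        """.format(
            # Assumed default when the key is unset
            catalog_impl=self.conf.get("spark.sql.catalogImplementation", "in-memory"),
            sc_html=self.sparkContext._repr_html_(),
        )


spark = FakeSparkSession(
    FakeSparkContext(),
    {"spark.sql.catalogImplementation": "hive"},
)
html = spark._repr_html_()
print(html)
```

With a real session, the same values would presumably come from `spark.conf.get("spark.sql.catalogImplementation")` and, for the current database, `spark.catalog.currentDatabase()`.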

holdenk · 2017-04-17T23:19:57Z

Thanks for working on this @rgbkrk, I have a few minor suggestions for the SparkSession representation to be a bit clearer, but it looks useful already :)

holdenk · 2017-04-18T03:23:37Z

I like it, I'm going to leave this for a bit and see if anyone has any comments overnight :)

holdenk · 2017-04-18T19:34:19Z

This looks good to me. Since we've just cut the 2.2 branch but not yet built an RC, and this PR is already outstanding, I'm going to merge this into master & 2.2 so we can have some faster feedback on this (and maybe pique some other people's interest).

## What changes were proposed in this pull request?

Establishes a very minimal `_repr_html_` for PySpark's `SparkContext`.

## How was this patch tested?

nteract:

![screen shot 2017-04-17 at 3 41 29 pm](https://cloud.githubusercontent.com/assets/836375/25107701/d57090ba-2385-11e7-8147-74bc2c50a41b.png)

Jupyter:

![screen shot 2017-04-17 at 3 53 19 pm](https://cloud.githubusercontent.com/assets/836375/25107725/05bf1fe8-2386-11e7-93e1-07a20c917dde.png)

Hydrogen:

![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png)

Author: Kyle Kelley <[email protected]>

Closes #17662 from rgbkrk/repr.

(cherry picked from commit f654b39)
Signed-off-by: Holden Karau <[email protected]>


rgbkrk mentioned this pull request Apr 17, 2017

Improving Jupyter for Spark jupyter/jupyter#212

Closed

rgbkrk force-pushed the repr branch 2 times, most recently from 4feb079 to 25f0ce6 Compare April 17, 2017 21:37

rgbkrk force-pushed the repr branch from 06013a3 to d58ed65 Compare April 17, 2017 21:46

vundela reviewed Apr 17, 2017


holdenk reviewed Apr 17, 2017


rgbkrk added 4 commits April 18, 2017 09:09

[SPARK-20360][PYTHON] Create _repr_html_ for SparkContext

dfd4c3b

[SPARK-20360][PYTHON] REPResent SparkContext

1c60087

[SPARK-20360][PYTHON] Cleaner SparkContext HTML

a2acd97

[SPARK-20360][PYTHON] Provide catalogImplementation in output

75c9880

rgbkrk force-pushed the repr branch from d82bb3e to 75c9880 Compare April 18, 2017 16:10

rgbkrk changed the title ~~[SPARK-20360][PYTHON] Create _repr_html_ for SparkContext~~ [SPARK-20360][PYTHON] reprs for interpreters Apr 18, 2017

asfgit closed this in f654b39 Apr 18, 2017

rgbkrk deleted the repr branch April 18, 2017 19:40
