
Needed: DataLab integration with Google BigTable, Google DataProc (Spark) #41

Open
joshreuben456 opened this issue Nov 25, 2016 · 1 comment


@joshreuben456

We use Jupyter notebooks to access BigTable data like so:

from google.cloud import bigtable
from google.cloud import happybase

# Connect through the HappyBase compatibility layer
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)

# scan() yields (row_key, row_data) pairs as bytes
for key, row in table.scan():
    ...

(we then convert the results into pandas DataFrames)
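
Concretely, the conversion looks roughly like the sketch below (it assumes cell values are UTF-8 strings; the `row_key` column name is our own convention):

import pandas as pd

# Flatten each scanned row into a dict, decoding the bytes keys/values
records = []
for key, row in table.scan():
    record = {col.decode(): val.decode() for col, val in row.items()}
    record['row_key'] = key.decode()
    records.append(record)

df = pd.DataFrame(records)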

Regarding Datalab and Dataproc integration: Jupyter-Spark integration (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is an established pattern in data science, so how can we leverage Datalab notebooks over Spark jobs running on Dataproc (e.g. stepwise PySpark job definitions, visualising job results)? A sketch of what we mean follows.
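
To make the ask concrete: inside a Datalab notebook we would want to build a PySpark job up cell by cell, roughly like this (assuming a SparkContext `sc` is already wired to the Dataproc cluster; the bucket path is hypothetical):

import pandas as pd

# Step 1: load input from Cloud Storage into an RDD
lines = sc.textFile('gs://my-bucket/events/*.csv')

# Step 2: a stepwise transformation, inspectable cell by cell
counts = (lines
          .map(lambda line: line.split(',')[0])
          .map(lambda key: (key, 1))
          .reduceByKey(lambda a, b: a + b))

# Step 3: pull a small sample back to the driver and visualise it
df = pd.DataFrame(counts.take(20), columns=['key', 'count'])
df.plot(kind='bar', x='key', y='count')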

Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter clusters notebook extension in Datalab?
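
For reference, the pattern we want to reproduce is the standard IPython Parallel one, sketched below (it assumes a controller and engines are already running, e.g. via `ipcluster start -n 4`):

import ipyparallel as ipp

# Connect to the running controller and address all engines at once
rc = ipp.Client()
view = rc[:]

def square(x):
    return x * x

# Fan the function out across the engines and gather the results
results = view.map_sync(square, range(16))
print(results)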

@chmeyers
Contributor

For Datalab/Dataproc integration, take a look at:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/datalab
This is not yet completely documented, but the engineering work is in place.
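If it helps anyone trying this early: initialization actions are passed at cluster-creation time, so the rough shape is the command below (the cluster name and exact script path are illustrative; check the repo above for the canonical one):

gcloud dataproc clusters create my-datalab-cluster \
    --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh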
