
0.7.4 OrientDB in Jupyter

Daniel Smith edited this page Jan 20, 2017 · 1 revision

# TAP 0.7.4 Using OrientDB with Jupyter

Starting with TAP 0.7.4, you can access OrientDB from a Jupyter Notebook. This page shows you how to do that.

## Connecting OrientDB with Jupyter

  1. Create an instance of OrientDB in TAP via Services > Marketplace. See Creating a service instance.

  2. After the OrientDB instance has been created, download the OrientDB keys to use in the Jupyter notebook. Here are the steps to do that:

    a. In the TAP Console, navigate to Services > Instances.

    OrientDB in Jupyter Screen 1

    b. Locate your OrientDB instance and click on it. TAP displays the Create Key button.

    OrientDB in Jupyter Screen 2

    OrientDB in Jupyter Screen 3

    c. Click Create Key, enter a name for your key, and click the Add button.

    OrientDB in Jupyter Screen 4

    d. The key name is displayed along with a + Add to exports option. Click + Add to exports to add these OrientDB keys to the export queue.

    OrientDB in Jupyter Screen 5

    e. Scroll up and click the Export keys button at the top right of the screen to export the keys.

    OrientDB in Jupyter Screen 6

    The Export keys button exports all the keys in the export queue. Typically, these are the keys for just one service. To export keys for multiple services, first create keys for each desired service and add them to the export queue, then click the Export keys button.

    f. Scroll down to see the OrientDB keys in JSON format. Click the Download JSON file button to download the keys as a JSON file, so you can copy/paste them into your Jupyter notebook.

    OrientDB in Jupyter Screen 7
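Instead of copying the connection settings by hand, you could read the downloaded JSON file directly in the notebook. The sketch below assumes a hypothetical file layout in which the settings sit under `{"orientdb": [{"credentials": {"hostname": ..., "port": ..., "password": ...}}]}`; the function names and key names are illustrative, so open your downloaded file and adjust them to match its actual structure.

```python
import json

def parse_orientdb_settings(keys, service_label="orientdb"):
    """Return (hostname, port, root_password) from a parsed keys dict.

    HYPOTHETICAL layout: {"orientdb": [{"credentials": {"hostname": ...,
    "port": ..., "password": ...}}]}. Adjust the keys to match your file.
    """
    # Take the credentials of the first exported instance for this service
    credentials = keys[service_label][0]["credentials"]
    # The port is returned as a string, matching the portnumber variable below
    return credentials["hostname"], str(credentials["port"]), credentials["password"]

def load_orientdb_settings(path, service_label="orientdb"):
    """Read the downloaded keys file and extract the connection settings."""
    with open(path) as f:
        return parse_orientdb_settings(json.load(f), service_label)
```

For example, `hostname, portnumber, root_password = load_orientdb_settings("orientdb_keys.json")`, where the file name is a placeholder for wherever you saved the download.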

## Export/Import a Graph to/from an OrientDB Database

These steps assume you have already exported your OrientDB instance keys, as previously described. They also assume you have already created a Jupyter notebook.

See Creating a Jupyter Notebook Instance if you are new to working with Jupyter notebooks.

  1. Import the spark-tk library into your Jupyter notebook and create a spark-tk context.

```python
import sparktk as tk

# Create the spark-tk context used by all the calls below
tc = tk.TkContext()
```

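  2. The `create_orientdb_conf` API creates the connection configuration for the OrientDB container. Copy the required settings from the previously downloaded OrientDB instance keys (connection settings) for use with this API.

```python
# Connection settings copied from the downloaded OrientDB instance keys
hostname = "localhost"
portnumber = "xxxx"  # replace with the port from your instance keys
root_password = "rxkp094rbtvbli6d"

# Arguments: host, port, database user name, database password, root password
orient_conf = tc.graph.create_orientdb_conf(hostname, portnumber, "admin", "admin", root_password)
orient_conf
```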

  3. The `export_to_orientdb` API creates an OrientDB database with the specified name and exports the spark-tk graph to it. The API returns summary statistics for the exported data, as shown in the example that follows.

  4. The `import_orientdb_graph` API imports the graph from the named OrientDB database into a spark-tk graph, as shown in the example that follows.
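Before calling `create_orientdb_conf`, it can be worth confirming that the notebook can reach the OrientDB host and port copied from the instance keys. The helper below is a generic Python check, not part of spark-tk or TAP, and `port_is_reachable` is a hypothetical name. A successful TCP connection only shows that something is listening on that port; it does not validate the credentials.

```python
import socket

def port_is_reachable(hostname, port, timeout=3.0):
    """Return True if a TCP connection to hostname:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again immediately
        with socket.create_connection((hostname, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # OSError: refused or timed-out connection, DNS failure
        # ValueError: port is not an integer (e.g. the "xxxx" placeholder)
        # OverflowError: port outside the range 0-65535
        return False
```

For example, `port_is_reachable(hostname, portnumber)` returns `False` while the placeholder `"xxxx"` is still in place.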

## Example

The code snippets here show a graph being exported to the OrientDB database, then imported back.

Start with the following vertex and edge frames:

```python
# Vertex frame: one row per person; "id" is the vertex identifier
v = tc.frame.create([("a", "Alice", 34, "female"),
                     ("b", "Bob", 36, "male"),
                     ("c", "Charlie", 30, "male"),
                     ("d", "David", 29, "male"),
                     ("e", "Esther", 32, "female"),
                     ("f", "Fanny", 36, "female")], ["id", "name", "age", "gender"])

# Edge frame: "src" and "dst" refer to vertex ids; "relationship" labels each edge
e = tc.frame.create([("a", "b", "friend"),
                     ("b", "c", "follow"),
                     ("c", "b", "follow"),
                     ("f", "c", "follow"),
                     ("e", "f", "follow"),
                     ("e", "d", "friend"),
                     ("d", "a", "friend"),
                     ("a", "e", "friend")], ["src", "dst", "relationship"])
```
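Before building the graph, it can help to confirm that every edge's `src` and `dst` refer to an existing vertex `id`. The plain-Python sketch below runs on the same tuples as the `v` and `e` frames, outside spark-tk; `dangling_edges` is a hypothetical helper for an optional sanity check, not a step spark-tk requires.

```python
# Same sample rows as the v and e frames, as plain Python tuples
vertices = [("a", "Alice", 34, "female"), ("b", "Bob", 36, "male"),
            ("c", "Charlie", 30, "male"), ("d", "David", 29, "male"),
            ("e", "Esther", 32, "female"), ("f", "Fanny", 36, "female")]
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
         ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
         ("d", "a", "friend"), ("a", "e", "friend")]

def dangling_edges(vertex_rows, edge_rows):
    """Return the edges whose src or dst is not one of the vertex ids."""
    vertex_ids = {row[0] for row in vertex_rows}  # position 0 is "id"
    return [edge for edge in edge_rows
            if edge[0] not in vertex_ids or edge[1] not in vertex_ids]

print(dangling_edges(vertices, edges))  # []
```

An empty list means every edge connects two known vertices; any returned tuples point at ids missing from the vertex frame.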

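Create a spark-tk graph from the two frames and display its vertices and edges:

```python
# Build the spark-tk graph from the vertex frame v and the edge frame e
graph = tc.graph.create(v, e)

# Show the vertices and edges of the underlying GraphFrame
graph.graphframe.vertices.show()
graph.graphframe.edges.show()
```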

Export the graph to the OrientDB database:

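```python
# Export the graph to a new OrientDB database named "Demo".
# Vertex types are taken from the "gender" column and edge types from the
# "relationship" column; db_properties overrides OrientDB database defaults.
# In Jupyter, the returned summary statistics are displayed below the cell.
graph.export_to_orientdb(orient_conf,
                         db_name="Demo",
                         vertex_type_column_name="gender",
                         edge_type_column_name="relationship",
                         batch_size=1000,
                         db_properties={"db.validation": "false"})
```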
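In the export call, `vertex_type_column_name="gender"` and `edge_type_column_name="relationship"` name the columns whose values classify vertices and edges. Assuming each distinct value becomes its own OrientDB vertex or edge type (confirm this against the spark-tk documentation), the plain-Python sketch below previews which types the sample graph would produce and how many records each would hold. `type_counts` is a hypothetical helper.

```python
from collections import Counter

# Same sample rows as the v and e frames, as plain Python tuples
vertices = [("a", "Alice", 34, "female"), ("b", "Bob", 36, "male"),
            ("c", "Charlie", 30, "male"), ("d", "David", 29, "male"),
            ("e", "Esther", 32, "female"), ("f", "Fanny", 36, "female")]
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
         ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
         ("d", "a", "friend"), ("a", "e", "friend")]

def type_counts(rows, type_index):
    """Count rows per distinct value in the column at position type_index."""
    return Counter(row[type_index] for row in rows)

vertex_types = type_counts(vertices, 3)  # position 3 is "gender"
edge_types = type_counts(edges, 2)       # position 2 is "relationship"
print(dict(vertex_types))  # {'female': 3, 'male': 3}
print(dict(edge_types))    # {'friend': 4, 'follow': 4}
```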

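Import the data from OrientDB back into a spark-tk graph:

```python
# Import the "Demo" database into a new spark-tk graph
orient_graph = tc.graph.import_orientdb_graph(orient_conf,
                                              db_name="Demo",
                                              db_properties={"db.validation": "false"})

# Inspect the imported vertices and edges
orient_graph.graphframe.vertices.show()
orient_graph.graphframe.edges.show()
```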
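To confirm that the round trip preserved the data, one option is to collect rows from the original and imported graphs and compare them as multisets on the columns you care about. The sketch below is plain Python and `same_rows` is a hypothetical helper; it assumes the imported graph keeps the original column names, so adjust `columns` if it does not. Extra columns and row order are ignored.

```python
from collections import Counter

def same_rows(original, imported, columns):
    """Return True if both lists of row dicts hold the same values on
    `columns`, ignoring row order and any other columns."""
    def project(rows):
        # A Counter of tuples compares the rows as a multiset
        return Counter(tuple(row[name] for name in columns) for row in rows)
    return project(original) == project(imported)
```

In the notebook, the inputs could come from `[r.asDict() for r in graph.graphframe.vertices.collect()]` and the same expression on `orient_graph`, compared with `columns=["id", "name", "age", "gender"]`.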

The `db_properties` parameter is optional; use it to override the default OrientDB database settings. For more information on the available database properties, see http://orientdb.com/docs/2.1/Configuration.html

To access OrientDB Studio from TAP, see https://github.com/trustedanalytics/platform-wiki-0.7/wiki/OrientDB#accessing-the-orientdb-dashboard
