title	slug	date	keyword	license
How to use Apache Gravitino Python client	/how-to-use-gravitino-python-client	2024-05-09	Gravitino Python client	This software is licensed under the Apache License version 2.

Apache Gravitino Python client

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, and provides users with unified metadata access for data and AI assets.

The Gravitino Python client helps data scientists manage metadata using Python.

Usage guidance

You can use the Gravitino Python client library with Spark, PyTorch, TensorFlow, Ray, and plain Python environments.

First, you must have a Gravitino server set up and running. Refer to the document How to install Gravitino to build the Gravitino server from source code and install it locally.
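Before using the client, it can help to confirm that the server is reachable. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:8090` and the `/api/version` path are assumptions about a default local installation, and `gravitino_server_is_up` is a hypothetical helper name, not part of the Gravitino Python client.

```python
import urllib.error
import urllib.request


def gravitino_server_is_up(base_url="http://localhost:8090", timeout=3.0):
    """Return True if an HTTP GET against the assumed version endpoint succeeds.

    Both the default base URL and the "/api/version" path are assumptions;
    adjust them to match your deployment.
    """
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Calling `gravitino_server_is_up()` returns `False` rather than raising when nothing is listening, so it can be used as a simple guard at the top of a notebook.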

Apache Gravitino Python client API

pip install apache-gravitino

Manage metalake using Gravitino Python API
Manage fileset metadata using Gravitino Python API

Apache Gravitino Fileset Example

We offer a playground environment to help you quickly understand how to use the Gravitino Python client to manage non-tabular data on HDFS via a Fileset in Gravitino. Refer to the document How to use the playground to launch a Gravitino server, HDFS, and a Jupyter notebook environment in your local Docker environment.

After the playground Docker environment has started, open http://localhost:8888/lab/tree/gravitino-fileset-example.ipynb in the browser and run the example.

The gravitino-fileset-example contains the following code snippets:

Install the HDFS Python client.
Create an HDFS client to connect to HDFS and run some test operations.
Install the Gravitino Python client.
Initialize the Gravitino admin client and create a Gravitino metalake.
Initialize the Gravitino client and list metalakes.
Create a Gravitino catalog with type Catalog.Type.FILESET and provider hadoop.
Create a Gravitino schema with its location pointing to an HDFS path, and use the HDFS client to check that the schema location was created in HDFS.
Create a fileset of type Fileset.Type.MANAGED, and use the HDFS client to check that the fileset location was created in HDFS.
Drop the Fileset.Type.MANAGED fileset and check that the fileset location was deleted from HDFS.
Create a fileset of type Fileset.Type.EXTERNAL with its location pointing to an existing HDFS path.
Drop the Fileset.Type.EXTERNAL fileset and check that the fileset location was not deleted from HDFS.
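The difference between the two fileset types in the steps above is what happens to the storage location on drop. The toy model below illustrates that behavior in memory only. It is not the Gravitino API: `ToyFilesetCatalog`, `FilesetType`, and the paths are hypothetical names chosen for illustration, and a plain Python set stands in for HDFS.

```python
from dataclasses import dataclass, field
from enum import Enum


class FilesetType(Enum):
    """Stand-in for the MANAGED / EXTERNAL distinction (not the Gravitino enum)."""
    MANAGED = "managed"
    EXTERNAL = "external"


@dataclass
class ToyFilesetCatalog:
    """In-memory model: `storage` plays the role of the set of HDFS paths."""
    storage: set = field(default_factory=set)
    filesets: dict = field(default_factory=dict)

    def create_fileset(self, name, fileset_type, location):
        # A managed fileset creates its own location. In this toy model an
        # external fileset only records a location assumed to exist already.
        if fileset_type is FilesetType.MANAGED:
            self.storage.add(location)
        self.filesets[name] = (fileset_type, location)

    def drop_fileset(self, name):
        fileset_type, location = self.filesets.pop(name)
        # Only a managed fileset removes its location when dropped.
        if fileset_type is FilesetType.MANAGED:
            self.storage.discard(location)


# Hypothetical paths; the external location exists before the fileset does.
catalog = ToyFilesetCatalog(storage={"/demo/external"})
catalog.create_fileset("managed_fs", FilesetType.MANAGED, "/demo/managed")
catalog.create_fileset("external_fs", FilesetType.EXTERNAL, "/demo/external")
catalog.drop_fileset("managed_fs")
catalog.drop_fileset("external_fs")
print(sorted(catalog.storage))  # ['/demo/external']
```

After both drops, only the external location remains, which mirrors the checks the notebook performs with the HDFS client.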

How to develop the Apache Gravitino Python client

You can use any IDE to develop the Gravitino Python client. Open the client-python module project directly in the IDE.

Prerequisites

Python 3.8+
Refer to How to build Gravitino to prepare the necessary build environment.
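A quick way to confirm the Python prerequisite is to compare the interpreter version against the stated minimum. This is a small sketch; `meets_minimum_python` is a hypothetical helper, not something shipped with Gravitino.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # from the prerequisites above


def meets_minimum_python(version_info=None):
    """Return True if the given (or running) interpreter is at least 3.8."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= MINIMUM_PYTHON


if not meets_minimum_python():
    print("Python 3.8 or newer is required, found", sys.version.split()[0])
```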

Build and test

Clone the Gravitino project.

git clone git@github.com:apache/gravitino.git

Build the Gravitino Python client module
```
./gradlew :clients:client-python:build
```

Run unit tests

./gradlew :clients:client-python:test -PskipITs

Run integration tests

Because the Python client connects to a Gravitino server to run integration tests, the build automatically runs the ./gradlew compileDistribution -x test command to compile the Gravitino project into the distribution directory. When you run integration tests through a Gradle command or an IDE, the Gravitino integration test framework (integration_test_env.py) starts and stops the Gravitino server automatically.
```
./gradlew :clients:client-python:test
```
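The unit-test and integration-test invocations differ only by the `-PskipITs` property. If you drive the build from a Python script, a thin wrapper along these lines may be convenient. It is a sketch under the assumption that it runs from a checkout containing `./gradlew`; both function names are hypothetical.

```python
import subprocess


def python_client_test_command(skip_integration_tests=False):
    """Build the Gradle argument list for the client-python test task."""
    command = ["./gradlew", ":clients:client-python:test"]
    if skip_integration_tests:
        # Same flag as in the unit-test command: skip the integration tests.
        command.append("-PskipITs")
    return command


def run_python_client_tests(repo_root, skip_integration_tests=False):
    """Run the tests from `repo_root` and return the Gradle exit code."""
    completed = subprocess.run(
        python_client_test_command(skip_integration_tests),
        cwd=repo_root,
        check=False,
    )
    return completed.returncode
```

For example, `run_python_client_tests(".", skip_integration_tests=True)` runs only the unit tests from the current directory.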

Distribute the Gravitino Python client module

./gradlew :clients:client-python:distribution

Deploy the Gravitino Python client to https://pypi.org/project/apache-gravitino/
```
./gradlew :clients:client-python:deploy
```

Resources

Official website: https://gravitino.apache.org/
Project home on GitHub: https://github.com/apache/gravitino/
Playground with Docker: https://github.com/apache/gravitino-playground
User documentation: https://datastrato.ai/docs/
Videos on YouTube: https://www.youtube.com/@Datastrato
Slack Community: https://the-asf.slack.com#gravitino

License

Gravitino is licensed under the Apache License Version 2.0. See the LICENSE file for details.

ASF Incubator disclaimer

Apache Gravitino is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
