[#596] feat(doc): Add doc for Hadoop access (#602)
### What changes were proposed in this pull request?
 - Add the `How to access Hadoop` section
 - Add a doc on setting up the runtime environment
 - Revise some content

### Why are the changes needed?
Document a workaround for being unable to specify the Hadoop username.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
local test
mchades authored Oct 26, 2023
1 parent 972dec9 commit f813a52
Showing 4 changed files with 18 additions and 4 deletions.
2 changes: 1 addition & 1 deletion conf/gravitino-env.sh.template
@@ -10,4 +10,4 @@
# export GRAVITINO_HOME
# export GRAVITINO_CONF_DIR
# export GRAVITINO_LOG_DIR # Where log files are stored. PWD by default.
# export GRAVITINO_MEM # Gravitino jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
# export GRAVITINO_MEM # Gravitino jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
12 changes: 12 additions & 0 deletions docs/gravitino-server-config.md
@@ -47,3 +47,15 @@ The following table lists the configuration items in the `gravitino.conf` file.
|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|---------------|
| `gravitino.catalog.cache.evictionIntervalMs` | The interval in milliseconds to evict the catalog cache, default 3600000ms(1h) | `3600000` | 0.1.0 |
| `gravitino.catalog.classloader.isolated` | Whether to use an isolated classloader for catalog, if it's true, all catalog-related libraries and configurations will be loaded by an isolated classloader NOT by AppClassLoader. Default value is `true` | `true` | 0.1.0 |

## How to set up runtime environment variables

Gravitino server also supports setting up runtime environment variables by editing the `gravitino-env.sh` file, which is located in the `conf` directory.

### How to access Hadoop

Currently, due to the absence of a comprehensive user permission system, Gravitino can only use a single username for
Hadoop access. Please ensure that the user starting the Gravitino server has Hadoop (HDFS, YARN, etc.) access
permissions; otherwise, you may encounter a `Permission denied` error. There are two ways to resolve this error:
* Grant the Gravitino startup user the required permissions in Hadoop.
* Specify an authorized Hadoop username in the `HADOOP_USER_NAME` environment variable before starting the Gravitino server.
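
As a minimal sketch of the second option, the variable could be exported in `conf/gravitino-env.sh` or in the shell that launches the server. The username `hdfs` is a placeholder chosen for illustration, not a value taken from this document. Note that Hadoop generally honors `HADOOP_USER_NAME` only under simple authentication, not on Kerberos-secured clusters.

```shell
# Hypothetical username: replace `hdfs` with a user that has the
# required Hadoop (HDFS, YARN, etc.) permissions in your cluster.
export HADOOP_USER_NAME="hdfs"

# Show the value that a server started from this shell would inherit.
echo "HADOOP_USER_NAME=${HADOOP_USER_NAME}"
```

The export must happen before the Gravitino server starts, because the process reads its environment only at startup.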
4 changes: 3 additions & 1 deletion docs/iceberg-rest-service.md
@@ -1,5 +1,5 @@
---
title: "How to setup Gravitino Iceberg REST server"
title: "How to set up Gravitino Iceberg REST server"
date: 2023-10-18T09:03:20-08:00
license: "Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2."
@@ -12,6 +12,8 @@ Gravitino Iceberg REST Server follows the [Iceberg REST API specification](https
* Supports the Iceberg REST API defined in Iceberg 1.3.1, including all namespace and table interfaces. The `Token`, `ReportMetrics`, and `Config` interfaces are not supported yet.
* Works as a catalog proxy, supporting HiveCatalog and JdbcCatalog for now.
* Built with Iceberg `1.3.1`, which means the Iceberg table format version is `1` by default.
* When writing to HDFS, the Gravitino Iceberg REST server can only operate as the HDFS user specified at startup and
does not yet support proxying to other HDFS users. See *How to access Hadoop* in the document *How to customize Gravitino server configurations* (`gravitino-server-config.md`) for more details.

## How to start the Gravitino Iceberg REST server

4 changes: 2 additions & 2 deletions docs/integration-test.md
@@ -95,7 +95,7 @@ Before running the tests, make sure Docker is installed.
Additionally, the Gravitino Server and third-party data source Docker runtime environments will use certain ports. Ensure that these ports are not already in use:

- Gravitino Server: Port `8090`
- Hive Docker runtime environment: Ports are `22`, `7180`, `8088`, `8888`, `9000`, `9083`, `10000`, `10002`, `50070`, and `50075`
- Hive Docker runtime environment: Ports are `22`, `7180`, `8088`, `8888`, `9000`, `9083`, `10000`, `10002`, `50010`, `50070`, and `50075`
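
One way to check the ports listed above before running the tests is sketched below. It is an illustration rather than part of the project's tooling: the helper name `busy_ports` is hypothetical, and the check relies on bash's `/dev/tcp` redirection, so it has to run under bash. It only detects listeners reachable on `127.0.0.1`.

```shell
#!/usr/bin/env bash
# Ports listed above for the Gravitino Server and the Hive Docker environment.
PORTS="8090 22 7180 8088 8888 9000 9083 10000 10002 50010 50070 50075"

# Hypothetical helper: print each given port that accepts a TCP
# connection on localhost (meaning something is already listening).
busy_ports() {
  local port
  for port in "$@"; do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "${port}"
    fi
  done
}

busy=$(busy_ports ${PORTS})
if [ -n "${busy}" ]; then
  echo "Ports already in use:" ${busy}
else
  echo "All required ports are free."
fi
```

A service bound only to a non-loopback interface would not be reported, so treat an all-clear result as a quick sanity check rather than a guarantee.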

## Debugging Gravitino Server and Integration Tests

@@ -118,7 +118,7 @@ To debug the Gravitino Server and integration tests, you have two modes: `embedd
- If you only debug integration test code, no setup is needed; you can debug directly.
- If you need to debug Gravitino server code, follow these steps:
  - Enable the `GRAVITINO_DEBUG_OPTS` environment variable in the `distribution/package/conf/gravitino-env.sh` file to enable remote JVM debugging
  - Manually start the Gravitino Server using the `./distribution/package/bin/gravitino-server.sh start` command
  - Manually start the Gravitino Server using the `./distribution/package/bin/gravitino.sh start` command
  - Select `gravitino.server.main` module classpath in the `Remote JVM Debug` to attach the Gravitino Server process and debug it
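
The first step above might look like the following sketch. The option string is the standard JVM JDWP agent syntax, but the exact value shipped in `gravitino-env.sh` is not shown in this document, so the port `5005` and the `suspend=n` setting are assumptions; if the file already contains a commented-out `GRAVITINO_DEBUG_OPTS` line, prefer uncommenting that one.

```shell
# Assumed value: standard JDWP agent options for remote debugging.
# Port 5005 is a placeholder; use any free port and enter the same
# port in the IDE's `Remote JVM Debug` configuration.
export GRAVITINO_DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"

echo "GRAVITINO_DEBUG_OPTS=${GRAVITINO_DEBUG_OPTS}"

# Afterwards, start the server with the command from the steps above:
#   ./distribution/package/bin/gravitino.sh start
```

With `suspend=n` the server starts without waiting for a debugger; switching to `suspend=y` would make the JVM pause until the IDE attaches, which helps when debugging startup code.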

## Running on GitHub Actions