[#4467] docs: Refactor the security document and add access control page #4496

Merged: 12 commits, Aug 15, 2024

Contributor

@jerqi jerqi commented Aug 13, 2024

What changes were proposed in this pull request?

Refactor the security document and add access control page.

Why are the changes needed?

Fix: #4467

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Just docs.

@jerqi jerqi changed the title [#4467] docs: Refactor the security document and add access control [#4467] docs: Refactor the security document and add access control page Aug 13, 2024

Catalogs are under the metalake. Catalogs represent different kinds of data sources.

Gravitino supports Hive, Iceberg, MySQL, PostgreSQL, Hadoop, and Kafka catalogs.
Contributor

I think you don't have to mention the supported catalogs here, we will add more catalogs, so it needs to be updated continuously. Removing this sentence, so we don't have to update.

Contributor Author

OK, I remove this.

Comment on lines 61 to 63
Schemas are under the catalog.

There are tables, topics, or filesets under the schema.
Contributor

I suggest you add an image showing the hierarchical structure of metadata objects here.

Contributor Author

Added.

Comment on lines 193 to 195
The `deny` condition takes precedence over the `allow` condition. If a role has the `allow` condition and `deny` condition at the same time.

The user won't be able to use the privilege.
Contributor

You shouldn't separate into two paragraphs, the sentence is broken.

Contributor Author

OK, I will remove the blank line.

-H "Content-Type: application/json" -d '{
"roleNames": ["role1"]
}' http://localhost:8090/api/metalakes/test/permissions/users/user1/revoke

Contributor

Remove this empty line in here and other places.

Contributor Author

OK.
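
For readers following the thread, the revoke request quoted above can be sketched in Python. This is only an illustration: the `build_revoke_request` helper is hypothetical, the `PUT` method is an assumption (the quoted snippet does not show the HTTP method), and the request is built but never sent.

```python
import json
import urllib.request


def build_revoke_request(base_url, metalake, user, role_names, method="PUT"):
    """Build (but do not send) a request that revokes roles from a user.

    The URL layout and the `roleNames` body key come from the quoted snippet;
    the default HTTP method is an assumption.
    """
    url = f"{base_url}/api/metalakes/{metalake}/permissions/users/{user}/revoke"
    body = json.dumps({"roleNames": role_names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = build_revoke_request("http://localhost:8090", "test", "user1", ["role1"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Serializing the body with `json.dumps` guarantees quoted keys, which a hand-written payload can easily get wrong.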

@jerryshao
Contributor

Can you please add a chapter about security in the index.md.

@jerqi
Contributor Author

jerqi commented Aug 13, 2024

> Can you please add a chapter about security in the index.md.

I updated the index.

Comment on lines 182 to 184
For example, if you give a user the `SELECT_TABLE` privilege on a catalog, then the user

will be able to select (read) all tables in that catalog.
Contributor

This one-line sentence will break into two lines according to your current way of writing markdown.

Contributor

OK, I checked other places, too. Fix them by the way.
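
The inheritance rule in the quoted sentence (a privilege granted on a catalog applies to every table under it) can be sketched as follows. This is a minimal illustration, not Gravitino's implementation: the `grants` dictionary, the `ancestors` and `has_privilege` helpers, and the dot-separated object names are assumptions made for the example.

```python
def ancestors(full_name):
    """Yield the object's own name, then each parent name.

    'catalog1.schema1.table1' -> 'catalog1.schema1.table1',
    'catalog1.schema1', 'catalog1'
    """
    parts = full_name.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])


def has_privilege(grants, privilege, full_name):
    """Return True if the privilege is granted on the object or any parent."""
    return any(privilege in grants.get(name, set()) for name in ancestors(full_name))


# Hypothetical grant table: SELECT_TABLE is granted once, at the catalog level.
grants = {"catalog1": {"SELECT_TABLE"}}

print(has_privilege(grants, "SELECT_TABLE", "catalog1.schema1.table1"))  # True
print(has_privilege(grants, "SELECT_TABLE", "catalog2.schema1.table1"))  # False
```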


If parent securable object has the same privilege name with different condition, the parent securable privilege will still take effect.

For example, securable metalake object allows to use the catalog, but securable catalog denys to use the catalog, the user isn't able to use the catalog.
Contributor

denies.

Contributor

Fix.


Simple mode is the default authentication option of the server.

For the client side, if it doesn't set the authentication explicitly, it will use anonymous to access the server.
Contributor

I'm confused about this. If the user doesn't set authentication, simple mode is used; is the user then anonymous, or taken from GRAVITINO_USER?

Contributor Author

anonymous

Contributor

if users do not set the authentication explicitly, they will use the simple mode to access the Gravitino server and the corresponding user name is anonymous.

Contributor

the logic is too odd

Contributor Author

Many systems have similar logic. If the user doesn't enable authentication, the user is treated as anonymous.

Contributor

Yes, but you said the default authentication is simple mode, in which the user is not anonymous.

Contributor Author

If a request doesn't carry a credential, simple mode treats it as anonymous.


:::info

Gravitino only supports authorization and doesn't support metadata authentication.
Contributor

Could you explain this more clearly?

Contributor Author

@jerqi jerqi Aug 14, 2024

I don't know how to make it clearer. Do you have any suggestions?

Contributor

"doesn't support metadata authentication"

Why do we mention authentication in the access control chapter?

Furthermore, I believe you need to add more words to make the sentence more natural. for example

Gravitino only supports authorization for secureable objects, when it comes to authentication. Gravitino doesn't support metadata authentication.

Contributor

What is metadata authentication? And as far as I know, Gravitino doesn't support authorization.

Contributor Author

Authorization means writing privileges to the underlying system.
Metadata authentication means checking privileges on metadata to decide whether an operation is allowed.

Contributor

I think you should add this to the document to clarify it.

Contributor Author

Added.

`COLUMN`, `FILESET`, `TOPIC`, `ROLE`, `METALAKE`. A metadata object combines a `type` and a
dot-separated `name`. For example, a `CATALOG` object has a name "catalog1" with type
"CATALOG", a `SCHEMA` object has a name "catalog1.schema1" with type "SCHEMA", a `TABLE`
object has a name "catalog1.schema1.table1" with type "TABLE".
Contributor

Can you provide an example of metalake metadata objects?

Contributor Author

Done.

Every securable object resides within a logical container in a hierarchy of containers.

The top container is the metalake. You can understand that metalake a customer organization.

Contributor

Please format the markdown file here. I don't think it will look very well.

image

Contributor Author

What's your suggestion?

Contributor

Can you put them into paragraphs, currently each sentence is a paragraph.

Contributor Author

I feel that making every sentence its own paragraph is clearer. If they are merged into one paragraph, it becomes too long.

Contributor Author

I merged them.


The relationship of the concepts is as below.

![user_group_relationshi_image](../assets/user-group.png)
Contributor

What's GroupMappingService?

Contributor

Besides, please refine the picture like

Role Table1 reviewer -> Role: Table1 reviewer
Role Fileset 3 scientist -> Role: Fileset3 scientist.

Contributor

I think you may need to add the following contents: Securable objects consist of metadata object and a set of privileges for the securable objects.

Contributor Author

> What's GroupMappingService?

I will change it to external user system.


### Add a user

External systems such as LDAP and SCIM manage the users.
Contributor

What is Scim?

Contributor Author

It's a popular user system.


## Authentication

Apache Gravitino supports three kinds of authentication mechanisms: simple,OAuth and Kerberos.
Contributor

Space before OAuth

Contributor Author

Done.


For the client side, if it doesn't set the authentication explicitly, it will use anonymous to access the server.

If the client sets the simple mode, it will use the environment variable `GRAVITINO_USER` as the user.
Contributor

GRAVITINO_USER in the client-server....

Contributor Author

Done.

Contributor

Should we support setting a custom user name?

Contributor Author

Sure, Hadoop has similar behaviour.


If the client sets the simple mode, it will use the environment variable `GRAVITINO_USER` as the user.

If the environment variable `GRAVITINO_USER` isn't set, the client uses the user of the machine that sends requests.
Contributor

What is the meaning of "user of the machine"? Is it the user currently logged into the client machine?

Contributor Author

Corrected.
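
To make the behaviour discussed in this thread concrete, here is a small sketch of how a client might pick the user name. It is an illustration under stated assumptions, not the Gravitino client code: `resolve_user`, its parameters, and the use of `getpass.getuser()` for "the user logged into the client machine" are all hypothetical.

```python
import getpass
import os

ANONYMOUS = "anonymous"


def resolve_user(auth_mode=None, env=None, login_user=getpass.getuser):
    """Pick the user name a client would send, per the rules quoted above.

    - No authentication configured: the user is "anonymous".
    - Simple mode: use GRAVITINO_USER if set, otherwise the OS login user.
    """
    env = os.environ if env is None else env
    if auth_mode is None:
        return ANONYMOUS
    if auth_mode == "simple":
        # An empty GRAVITINO_USER is treated the same as an unset one here.
        return env.get("GRAVITINO_USER") or login_user()
    raise ValueError(f"mode not covered by this sketch: {auth_mode}")


print(resolve_user())                                      # anonymous
print(resolve_user("simple", {"GRAVITINO_USER": "alice"}))  # alice
```

Passing `env` and `login_user` as parameters keeps the function easy to test without touching the real environment.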

@jerqi jerqi requested review from FANNG1 and qqqttt123 August 15, 2024 05:57

:::info

Gravitino only supports authorization for secureable objects, when it comes to authentication.
Contributor

What's the meaning of "when it comes to authentication", this sentence is hard to understand.

Contributor

What I want to express: the second sentence, "Gravitino doesn't support metadata authentication.", is about authentication, while the first sentence is only about authorization. The sentences in this passage are unnaturally connected.


A metadata object to which access can be granted. Unless allowed by a grant, access is denied.
Every securable object resides within a logical container in a hierarchy of containers.
The top container is the metalake. You can understand that metalake a customer organization.
Contributor

What is the meaning of "You can understand"?

Contributor Author

Removed.


The relationship of the concepts is as below.

![user_group_relationshi_image](../assets/user-group.png)
Contributor

"relationship"

Contributor Author

Fixed.

|-------------|---------------------------|---------------------|
| ManageUsers | Metalake | Add or remove users |


Contributor

Remove one blank line.

Contributor Author

Removed.


:::


Contributor

Remove this one blank line here.

Contributor Author

Removed.

</Tabs>

## Group Operation

Contributor

Remove this one blank line here.

Contributor Author

Removed.

6. You can refer to the [Configurations](gravitino-server-config.md) and append the configurations to the conf/gravitino.conf.

```text
gravitino.authenticator = oauth
```
Contributor

I remember this key has been replaced with gravitino.authenticators. Am I correct?

Contributor Author

We don't merge this document.

Contributor

I don't understand what you mean by "We don't merge this document."

Contributor Author

#4489 This pull request.


Gravitino supports Kerberos mode.

For the server side, users should set `gravitino.authenticator` as `kerberos` and give
Contributor

ditto


First, users need to ensure that the external OAuth 2.0 server is correctly configured and supports Bearer JWT.

Then, on the server side, users should set `gravitino.authenticator` as `oauth` and give
Contributor

ditto


If a parent securable object has the same privilege name with a different condition, the securable object won't override the parent object's privilege.
For example, if the securable metalake object allows using the catalog, but the securable catalog denies using the catalog, the user isn't able to use the catalog.
If securable metalake object denies to use the catalog, but securable catalog allows to use the catalog, the user isn't able to use the catalog, too.
Contributor

catalog, too -> catalog too.

Contributor Author

? why?
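
The two examples in the quoted paragraph both follow from a "deny wins" evaluation over the object and its parents. The sketch below illustrates that reading only; the `rules` layout, the tuple paths, the `is_allowed` helper, and the `USE_CATALOG` privilege name are assumptions made for the example, not Gravitino's API.

```python
def is_allowed(rules, privilege, path):
    """Evaluate a privilege on `path` and all of its parents.

    `rules` maps a path tuple to {privilege: "allow" | "deny"}. A single
    "deny" anywhere in the chain wins; without any "allow", access is denied.
    """
    conditions = []
    for depth in range(len(path), 0, -1):
        condition = rules.get(path[:depth], {}).get(privilege)
        if condition is not None:
            conditions.append(condition)
    if "deny" in conditions:
        return False
    return "allow" in conditions


catalog = ("metalake1", "catalog1")

# Metalake allows, catalog denies: the user cannot use the catalog.
rules_a = {("metalake1",): {"USE_CATALOG": "allow"}, catalog: {"USE_CATALOG": "deny"}}
# Metalake denies, catalog allows: the user still cannot use the catalog.
rules_b = {("metalake1",): {"USE_CATALOG": "deny"}, catalog: {"USE_CATALOG": "allow"}}
# Only the metalake allows: the grant is inherited by the catalog.
rules_c = {("metalake1",): {"USE_CATALOG": "allow"}}

print(is_allowed(rules_a, "USE_CATALOG", catalog))  # False
print(is_allowed(rules_b, "USE_CATALOG", catalog))  # False
print(is_allowed(rules_c, "USE_CATALOG", catalog))  # True
print(is_allowed({}, "USE_CATALOG", catalog))       # False
```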


A metadata object to which access can be granted. Unless allowed by a grant, access is denied.
Every securable object resides within a logical container in a hierarchy of containers.
The top container is the metalake. You can understand that metalake a customer organization.
Contributor

You can view metalake as a xxxxx.

The word understand is not elegant.

Contributor Author

I can remove this sentence.

@jerqi jerqi requested review from yuqi1129 and FANNG1 August 15, 2024 07:28

### User

A user identity recognized by Gravitino. An external user system, rather than Gravitino, manages users.
Contributor

Excessive empty space before External.

Contributor Author

Done.

`COLUMN`, `FILESET`, `TOPIC`, `ROLE`, `METALAKE`. A metadata object combines a `type` and a
dot-separated `name`. For example, a `CATALOG` object has a name "catalog1" with type
"CATALOG", a `SCHEMA` object has a name "catalog1.schema1" with type "SCHEMA", a `TABLE`
object has a name "catalog1.schema1.table1" with type "TABLE". `METALAKE` object has a name "metalake1".
Contributor

a METALAKE object.

Contributor Author

Done.
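
As a side note for reviewers, the naming scheme in the quoted paragraph (a `type` plus a name whose levels are joined with dots) could be modelled roughly like this. The `MetadataObject` class, the `make_object` helper, and the `LEVELS` table are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Number of name levels for the types used in the quoted examples.
# Types missing from this table are not length-checked in this sketch.
LEVELS = {"CATALOG": 1, "SCHEMA": 2, "TABLE": 3}


@dataclass(frozen=True)
class MetadataObject:
    type: str
    name: str

    def parent_name(self):
        """Return the parent's name, or None for a single-level name."""
        head, sep, _ = self.name.rpartition(".")
        return head if sep else None


def make_object(obj_type, *parts):
    """Join the name levels with dots after a basic length check."""
    expected = LEVELS.get(obj_type)
    if expected is not None and len(parts) != expected:
        raise ValueError(f"{obj_type} expects {expected} name levels, got {len(parts)}")
    return MetadataObject(obj_type, ".".join(parts))


table = make_object("TABLE", "catalog1", "schema1", "table1")
print(table.name)           # catalog1.schema1.table1
print(table.parent_name())  # catalog1.schema1
print(make_object("METALAKE", "metalake1").name)  # metalake1
```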


Every metadata object has an owner. The owner could be a user or group.
The owner has all the privileges of the metadata object.
The owner could be transferred to another user or group.
Contributor

Please optimize these three sentences that start with The owner. It's too tedious to have the same starting points.

Contributor Author

Done.

@FANNG1
Contributor

FANNG1 commented Aug 15, 2024

LGTM

@jerqi jerqi requested a review from yuqi1129 August 15, 2024 08:43
Contributor

@yuqi1129 yuqi1129 left a comment

LGTM

@jerryshao jerryshao merged commit 4e7b846 into apache:main Aug 15, 2024
14 checks passed
jerqi added a commit to qqqttt123/gravitino that referenced this pull request Aug 15, 2024
…trol page (apache#4496)
Successfully merging this pull request may close these issues.

[Improvement] Add the document about access control
5 participants