[#2179] Improvement: Improve security when creating and dropping schemas and tables #2335

Merged
merged 2 commits into from
Jun 18, 2024

Conversation

zivali
Contributor

@zivali zivali commented Feb 24, 2024

What changes were proposed in this pull request?

Improve security when creating and dropping schemas and tables.

This PR adds the following checks for identifier names using the capability framework:

  • Regex check
    • As a best practice, spaces should be avoided in database names. In this PR, database names that include a space are considered illegal.
  • String length check, since SQL injection payloads usually require longer strings
    • MySQL: at most 64 characters
    • PostgreSQL: at most 63 characters
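The two checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `NameCheckSketch`, the method `isLegalName`, and the constant names are hypothetical. The character class comes from the regex quoted later in the review, but this sketch uses `+` instead of `*` so that the empty string is also rejected, which is an assumption of the sketch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the regex and length checks described in the PR text.
class NameCheckSketch {
    // Limits as stated in the PR description.
    static final int MYSQL_MAX_LENGTH = 64;
    static final int POSTGRESQL_MAX_LENGTH = 63;

    // \w matches [a-zA-Z0-9_], \p{L} matches letters from any language,
    // and $ inside a character class is a literal dollar sign.
    // Spaces, semicolons, quotes and similar characters do not match.
    static final Pattern ALLOWED = Pattern.compile("^[\\w\\p{L}$]+$");

    // Returns true only if the name is short enough and uses allowed characters.
    // Note: length() counts UTF-16 code units, which is a simplification.
    static boolean isLegalName(String name, int maxLength) {
        return name != null
            && name.length() <= maxLength
            && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLegalName("sales_2024", MYSQL_MAX_LENGTH));      // true
        System.out.println(isLegalName("my db", MYSQL_MAX_LENGTH));           // false (space)
        System.out.println(isLegalName("x; DROP TABLE t", MYSQL_MAX_LENGTH)); // false
        System.out.println(isLegalName("a".repeat(64), POSTGRESQL_MAX_LENGTH)); // false (too long)
    }
}
```

Rejecting the name before any SQL string is built means a payload such as `x; DROP TABLE t` never reaches statement generation.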

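We refer to the specifications of the earliest database versions that Gravitino currently supports:

  • PostgreSQL identifier rules: https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
  • MySQL identifier naming: https://dev.mysql.com/doc/refman/5.7/en/identifiers.html
  • MySQL identifier length limit: https://dev.mysql.com/doc/refman/5.7/en/identifier-length.html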

Why are the changes needed?

Fix: #2179

Does this PR introduce any user-facing change?

Add name identifier checks before attempting to create or drop schemas and tables.

How was this patch tested?

Add IT tests.

@yuqi1129
Contributor

@zivali
Can you also survey the table naming conventions besides the database name?

@zivali
Contributor Author

zivali commented Feb 27, 2024

@zivali Can you also survey the table naming conventions besides the database name?

Absolutely! Do you mean we should apply similar checks in JdbcTableOperations (MysqlTableOperations / SqliteTableOperations / PostgreSqlTableOperations)?

P.S. I modified the PR per the comments. Thanks for helping with the review!

@simarjeetss

hey @justinmclean, is this improvement done? If not, may I look into this?

@justinmclean
Member

@simarjeetss Thanks for offering to help, but as this is almost complete, I would suggest you find another issue to work on.

@simarjeetss

@simarjeetss Thanks for offering to help, but as this is almost complete, I would suggest you find another issue to work on.

Sure, thanks for letting me know!

@yuqi1129
Contributor

yuqi1129 commented Mar 7, 2024

@jerryshao
Is this related to the issue #1652? If so, should we temporarily merge it?

@justinmclean
Member

I would say it is not related to #1652. Might consider merging this as it is an overall improvement to the existing code?

@justinmclean
Member

@zivali Are you still interested in working on this? No issue if you can't, it would just be good to know.

@zivali
Contributor Author

zivali commented Apr 14, 2024

@justinmclean Hi, I am still working on it. Will update here when finished. Thank you!

@mchades
Contributor

mchades commented Apr 14, 2024

I'm introducing a catalog capability framework (#2819). I think this should be one of the catalog capabilities: the name specification capability.

If you want, you can continue working on top of #2819, or wait until it is merged.

@zivali
Contributor Author

zivali commented Apr 15, 2024

@mchades Got it. I will integrate this PR with #2819. Thanks for the info!

// dollar signs
// \w matches [a-zA-Z0-9_]
// \p{L} matches any kind of letter from any language
if (!databaseName.matches("^[\\w\\p{L}$]*$")) {
Contributor

Is this a rule from experience? Or is it an official MySQL constraint?

Contributor

Oh, I just saw the description in your PR; it's an official MySQL limitation.

Contributor Author

Yes, these are the most basic permitted characters in unquoted identifiers for MySQL. I think most DB users use only these characters. However, other characters can also be used with quoted identifiers. Do you think we should include those characters as well?

Contributor

I think we can start with stricter restrictions and gradually relax them as needed.
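For context on the quoted-identifier case mentioned in this thread, here is a minimal sketch of how a MySQL identifier is typically quoted: wrapped in backticks, with any embedded backtick doubled. The class and method names are hypothetical, and this PR does not take this route; it rejects such names up front instead.

```java
// Hypothetical sketch: quoting a MySQL identifier with backticks.
// This is NOT what the PR implements; the PR rejects names outside the regex.
class QuotedIdentifierSketch {
    // Wraps the name in backticks and doubles any backtick inside it,
    // which is how MySQL escapes the quote character within an identifier.
    static String quoteMysqlIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        System.out.println(quoteMysqlIdentifier("a&b"));    // `a&b`
        System.out.println(quoteMysqlIdentifier("we`ird")); // `we``ird`
    }
}
```

Starting with the strict allowlist, as agreed above, avoids having to reason about quoting rules for every supported database.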

@mchades
Contributor

mchades commented Apr 29, 2024

The catalog name specification capability is now available; you can rebase onto the main branch and proceed with the PR.

@zivali zivali marked this pull request as draft June 4, 2024 15:44
@zivali zivali force-pushed the dev branch 2 times, most recently from 635678b to 488b8e2 Compare June 15, 2024 07:13
@zivali zivali changed the title [#2179] Improvement: Potential SQL injection point in generateDropDat… [#2179] Improvement: Improve security when creating and dropping schemas and tables Jun 15, 2024
@zivali zivali closed this Jun 15, 2024
@zivali zivali reopened this Jun 15, 2024
@zivali zivali closed this Jun 16, 2024
@zivali zivali reopened this Jun 16, 2024
@zivali zivali closed this Jun 16, 2024
@zivali zivali reopened this Jun 16, 2024
@zivali zivali marked this pull request as ready for review June 17, 2024 11:13
@zivali
Contributor Author

zivali commented Jun 17, 2024

@mchades @justinmclean Could you help review again when you have time? Thank you!

@mchades mchades self-requested a review June 17, 2024 14:24
// The constraints of the name spec may be more strict than underlying catalog,
// and for compatibility reasons, we only apply case-sensitive capabilities here.
return dispatcher.dropTable(applyCaseSensitive(ident, Capability.Scope.TABLE, dispatcher));
return dispatcher.dropTable(normalizeNameIdentifier(ident));
Contributor

After these changes, can I drop a MySQL table named a&b through Gravitino?

Contributor Author

No, as the current regex pattern in this PR does not try to match the character &.
Should we add & to the regex pattern?

Contributor

No, this is just my assumption.

We can gradually relax the strategy after encountering real needs, or make this rule configurable when the demand is frequent.
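One way the "configurable rule" idea could look is sketched below. Everything here is an assumption for illustration: the class `ConfigurableNameRuleSketch` and its constructor are hypothetical and are not part of Gravitino's capability framework. The default pattern is the one quoted earlier in this review; the relaxed pattern simply adds `&` to the character class.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a name rule whose regex can be supplied by configuration.
class ConfigurableNameRuleSketch {
    // Pattern quoted in the review thread: word characters, letters, dollar signs.
    static final String DEFAULT_PATTERN = "^[\\w\\p{L}$]*$";

    private final Pattern pattern;

    ConfigurableNameRuleSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // True if the whole name matches the configured pattern.
    boolean accepts(String name) {
        return pattern.matcher(name).matches();
    }

    public static void main(String[] args) {
        ConfigurableNameRuleSketch strict = new ConfigurableNameRuleSketch(DEFAULT_PATTERN);
        // A relaxed rule that additionally allows '&' (a single '&' is literal in a Java character class).
        ConfigurableNameRuleSketch relaxed = new ConfigurableNameRuleSketch("^[\\w\\p{L}$&]*$");

        System.out.println(strict.accepts("a&b"));  // false
        System.out.println(relaxed.accepts("a&b")); // true
    }
}
```

With the strict default, a table named `a&b` is rejected before any drop statement is generated, which matches the answer given in this thread.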

Contributor

@mchades mchades left a comment

LGTM, thanks for your contributions!

@mchades mchades merged commit 8e7293e into apache:main Jun 18, 2024
33 checks passed
shaofengshi pushed a commit to shaofengshi/gravitino that referenced this pull request Jun 24, 2024
…g schemas and tables (apache#2335)

### What changes were proposed in this pull request?

Improve security when creating and dropping schemas and tables.

This PR adds the following checks for identifier names using the
capability framework
- Regex check
- As a best practice, it's generally advised to avoid including spaces
in database names. In this PR, database names that include space will be
considered illegal.
- String length check, since SQL injection usually requires using longer
string
    - Mysql: at most 64 characters
    - Postgresql: at most 63 characters

We refer to specifications of the earliest version of DB that gravitino
currently supports:
- Postgresql identifier rules:
https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
- Mysql identifier naming:
https://dev.mysql.com/doc/refman/5.7/en/identifiers.html
- Mysql identifier length limit:
https://dev.mysql.com/doc/refman/5.7/en/identifier-length.html
### Why are the changes needed?

Fix: apache#2179 

### Does this PR introduce _any_ user-facing change?
Add name identifier checks before attempting to create or drop schemas
and tables.

### How was this patch tested?
Add IT tests.
Development

Successfully merging this pull request may close these issues.

[Improvement] Potential SQL injection point in generateDropDatabaseSql
5 participants