This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Anonymize sensitive data in queries exposed to RestSqlAction logs #419

Merged
13 commits merged on Apr 10, 2020

Conversation

chloe-zh
Member

@chloe-zh chloe-zh commented Apr 9, 2020

Issue #, if available:

Description of changes:

  • Created a rewrite rule for sensitive data
  • Created a sensitive data anonymizer
  • Changed the log message for incoming requests so that it logs the request with sensitive data removed
  • Added unit tests (UT)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@penghuo
Contributor

penghuo commented Apr 9, 2020

  1. Do we need a setting to control this feature?
  2. How could this solution be used in other places where logging is needed?

@chloe-zh
Member Author

chloe-zh commented Apr 9, 2020

1. Do we need a setting to control this feature?

2. How could this solution be used in other places where logging is needed?

Good point!

  1. We can add a setting to switch this masking on and off if necessary. That needs more discussion.
  2. This PR removes the query data from the logs but retains the query pattern. Users and developers can still see the incoming requests in the ES logs, but the exact query is not exposed to the remote host, which protects sensitive information. As you mentioned, we can add an option to turn this off if logging full queries is needed.
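
To illustrate the second point, here is a minimal sketch of "keep the pattern, drop the data". It is not the code in this PR: per the commit notes, the PR rebuilds the query string from the AST, while this sketch swaps in plain regular expressions. The class name `QueryMaskSketch`, the method `maskLiterals`, and the placeholder words are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: masks literal values in a SQL string before logging.
// Assumption: a regex pass stands in for the AST-based rewrite used by the PR.
final class QueryMaskSketch {
    // Single-quoted SQL string literals, with '' treated as an escaped quote.
    private static final Pattern STRING_LITERAL = Pattern.compile("'(?:[^']|'')*'");
    // Standalone integers or decimals, not digits embedded in identifiers such as col1.
    private static final Pattern NUMBER_LITERAL =
            Pattern.compile("(?<![\\w.])\\d+(?:\\.\\d+)?(?![\\w.])");

    static String maskLiterals(String sql) {
        // Mask strings first so that digits inside quoted values are already gone.
        String masked = STRING_LITERAL.matcher(sql).replaceAll("'string_literal'");
        return NUMBER_LITERAL.matcher(masked).replaceAll("number");
    }

    public static void main(String[] args) {
        String query = "SELECT name FROM accounts WHERE ssn = '123-45-6789' AND age > 30";
        // Prints: SELECT name FROM accounts WHERE ssn = 'string_literal' AND age > number
        System.out.println(maskLiterals(query));
    }
}
```

The logged line still shows which clauses and operators the request used, which is usually enough for debugging, while the literal values never reach the log.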

@abbashus
Contributor

abbashus commented Apr 9, 2020

Open to discussion:

I assume this is more work, but instead of masking, why not anonymize the identifiers and literals? That way we will not lose the semantics.
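
One possible reading of this suggestion, as a rough sketch only: give every distinct identifier or literal a stable placeholder, so that two references to the same column still look the same in the log. Everything here is hypothetical, including the class name `StableAnonymizerSketch`, the placeholder scheme (`id_N`, `num_N`, `'str_N'`), and the deliberately short keyword list. Anything not in that list, function names included, is treated as an identifier.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replaces identifiers and literals with stable placeholders.
final class StableAnonymizerSketch {
    // Assumption: a tiny keyword list, just enough for the example query.
    private static final Set<String> KEYWORDS = Set.of(
            "SELECT", "FROM", "WHERE", "AND", "OR", "AS", "GROUP", "BY", "ORDER", "LIMIT");

    // One token per match: a quoted string, a number, or a bare word.
    private static final Pattern TOKEN = Pattern.compile(
            "'(?:[^']|'')*'|\\b\\d+(?:\\.\\d+)?\\b|\\b[A-Za-z_]\\w*\\b");

    static String anonymize(String sql) {
        Map<String, String> aliases = new HashMap<>();
        Matcher matcher = TOKEN.matcher(sql);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String token = matcher.group();
            String replacement;
            if (KEYWORDS.contains(token.toUpperCase(Locale.ROOT))) {
                replacement = token; // keywords are kept verbatim
            } else {
                replacement = aliases.get(token);
                if (replacement == null) {
                    // First time this token is seen: assign the next placeholder number.
                    int n = aliases.size() + 1;
                    if (token.startsWith("'")) {
                        replacement = "'str_" + n + "'";
                    } else if (Character.isDigit(token.charAt(0))) {
                        replacement = "num_" + n;
                    } else {
                        replacement = "id_" + n;
                    }
                    aliases.put(token, replacement);
                }
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: SELECT id_1 FROM id_2 WHERE id_1 = 'str_3' OR id_4 = 'str_3'
        System.out.println(anonymize(
                "SELECT name FROM accounts WHERE name = 'alice' OR owner = 'alice'"));
    }
}
```

Compared with replacing every value by the same word, this keeps the relationships between tokens visible (the selected column is the one being filtered), at the cost of a per-query lookup table.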

Member

@dai-chen dai-chen left a comment

Just a minor comment. Thanks for your fix!

@chloe-zh chloe-zh changed the title Mask sensitive data in queries exposed to RestSqlAction logs Anonymize sensitive data in queries exposed to RestSqlAction logs Apr 10, 2020
Contributor

@penghuo penghuo left a comment

Thanks.

@abbashus abbashus merged commit 45fa29c into opendistro-for-elasticsearch:master Apr 10, 2020
chloe-zh added a commit to chloe-zh/sql that referenced this pull request Apr 29, 2020
…endistro-for-elasticsearch#419)

* remove sensitive data from queries for logging

* Added rule to mask sensitive data from es logs

* Applied API in SQLUtils to rebuild query string from AST; replace data masks with anonymous words

* Inlined log message; added doc for new rule
penghuo pushed a commit to penghuo/sql that referenced this pull request May 2, 2020
…endistro-for-elasticsearch#419)

* remove sensitive data from queries for logging

* Added rule to mask sensitive data from es logs

* Applied API in SQLUtils to rebuild query string from AST; replace data masks with anonymous words

* Inlined log message; added doc for new rule

(cherry picked from commit 45fa29c)
penghuo added a commit that referenced this pull request May 5, 2020
* Update the opendistro sql 1.4.0 release notes (#359)


(cherry picked from commit d9fe9dc)

* adding DATETIME cast support (#310)

Adding full support for CAST(), and adding it as a function. Fixed Datetime casting to be UTC-timezone default. 

(cherry picked from commit 68b971f)

* Documentation for simple query (#366)

* Add doc test for simple query

* Add to index

* Fix format

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add documentation

* Add rdd

* Add rdd

* Add rdd

* Add rdd

* Fix order by example

* Add more diagrams

* Add more diagrams

* Skip doctest in gradlew build

(cherry picked from commit 57b379c)

* Return Correct Type Information for Fields  (#365)

Return correct type information for fields in JDBC format and changed the return type of trig functions to return DOUBLE. 

(cherry picked from commit 9769c30)

* Report date data as a standardized format (#367)

* expose date fields using a standardized date format

* move date format logic to new class

* add tests for DateTimeFormatter

* added some negative tests

* style fixes

* testing locale fixes

* addressed code review comments and build failures

* post-merge fix

* additional fixes

* get CAST alias info from result set class, rather than cluster state

* remove unused import

* remove unused import

* reduce duplication & reference enum value

* setting default timezone to UTC while parsing date values

* add support for custom & multiple formats

* add case for Kibana flights date data with T but no time field

(cherry picked from commit 7c57d72)

* Integration test with external ES cluster (#374)

* Fix build issue by migrating to RestTestCase

* Fix query IT

* Fix subquery IT

* Fix csv formatter and SQL functions IT

* Fix aggregation and delete IT

* Fix date IT

* Fix jdbc IT

* Fix show and metadata IT

* Fix subquery IT

* Fix date functions UT

* Fix test data setup and cleanup issue

* Fix correctness IT

* Fix doctest IT

* Add JavaDoc on base class

* Address PR comments

* Address PR comments

(cherry picked from commit 56a2f44)

* Bug fix, return object type for field which has implicit object datatype when describe the table (#377)

* Bug fix, return object type for field which has implicit object datatype when describe the table

(cherry picked from commit c30e342)

* Pagination doc (#379)

* Add design details in doc

* Added images for pagination doc

* Add image links and fix formatting

* Add cursor setting details

* Add Salient Points section in Detailed Design section

(cherry picked from commit ac8a020)

* Handle the elasticsearch exceptions in JDBC formatted outputs (#362)

* Caught ES exception

* Added details in errMsgs to enrich the behavior; added IT

* Handled cases where ES exceptions are wrapped up; added default fetching details method

* Added factory method to construct ErrorMessage; extended exception type for ErrorMessage

* Added UT for ErrorMessageFactory

* addressed comments

(cherry picked from commit 71aba38)

* Modified the wording of exception messages and created the troubleshooting page (#372)

* Edit the wording of exception messages

* Created troubleshooting page

* Revised troubleshooting page

* addressed comments

(cherry picked from commit a4afc75)

* Sql CI/CD (#384)

* Create CD.yml

* Create CI.yml

(cherry picked from commit fa1d4a0)

* FIX: field function name letter case preserved in select with group by (#381)

* FIX: Method field name

* REF: parser logic

* RMV: remove unused function

* STY: unused imports

* STY: unused import

(cherry picked from commit ad8ad3a)

* Fix broken LICENSE link in README.md (#394)

* Fix broken License link

* Reverting NOTICE name change

(cherry picked from commit 62e36c9)

* New SQL cluster settings endpoint (#400)

* Add new _sql/settings endpoint, and logic to only affect opendistro.sql settings

* Add integration tests

* Change endpoint HTTP method to PUT

* Update Settings doc

(cherry picked from commit f1d538f)

* Bug Fix, add support for strict_date_optional_time (#412)


(cherry picked from commit 9ed430f)

* Invalidate HTTP GET method (#414)

* Removed http GET method; added integTest

* Removed GET method in doc and doctest

(cherry picked from commit 56464f0)

* More docs in reference manual and add architecture doc (#417)

* Initial commit for PartiQL and complex query docs

* Initial commit for PartiQL and complex query docs

* Initial commit for PartiQL and complex query docs

* Ignore multi-query for now because of bug

* Add test index mappings

* Fix partiql doc

* Bypass LEFT JOIN for now

* Add doc for JOINs

* Add doc for JOINs

* Add doc for JOINs

* Add more cases for PartiQL

* Add more cases for PartiQL

* Add multi-line support

* Add multi-line support

* Multi-line all complex queries

* Multi-line all complex queries

* Add doc for full text search

* Add doc for full text search

* Add doc for metadata query

* Add doc for multi match query

* Add doc for delete statement

* Remove join explain

* Add RDD

* Add RDD

* Print test data for PartiQL

* Print test data for PartiQL

* Change titles

* Add architecture doc

* Add docs for SQL functions

* Update index

* Update docs

* Update docs

* Update docs

* Address PR comments

(cherry picked from commit 6457b0b)

* Bug fix, support subquery in from doesn't have alias (#418)


(cherry picked from commit 5e5f485)

* Anonymize sensitive data in queries exposed to RestSqlAction logs (#419)

* remove sensitive data from queries for logging

* Added rule to mask sensitive data from es logs

* Applied API in SQLUtils to rebuild query string from AST; replace data masks with anonymous words

* Inlined log message; added doc for new rule

(cherry picked from commit 45fa29c)

* Bug fix, ignore the term query rewrite if there is no index found (#425)

* Bug fix, ignore the term query rewrite if there is no index found

* move IsEqualIgnoreCaseAndWhiteSpace to MatcherUtils

(cherry picked from commit 2bfe326)

* Simple Query Cursor support (#390)

* Add  integration tests to be passed

* Add cluster settings for cursor - enabled, fetch_size, keep_alive

* Add fetch_size and cursor params. fetch_size validation

* new SqlRequest constructor for cursor

* Add logic to open scroll based on settings, fetch_size and limit values

* Add cursor close endpoint

* Remove date formatting changes

* Fix unit and integ tests, Ignored date format tests for a while, synced previous cursor changes

* Add cursor generation

* Add test helper methods

* Cursor close API

* Remove commented code and add partial date formatting change

* Add error metrics when not able to close cursor

* Add indexname and fieldAliasMap to cursor context

* Remove ignored test cases affected by date formatting changes

* Remove unneeded interface, refactor CursorType enum

* Remove logs, unneeded fields, comments, refactor

* Disable cursor by default

* Fix cursor for parameterized request, add integration test for same

* LIMIT changes

* Changes to handle different LIMIT cases

* Add default cursor metrics

* Add integration test on explain cursor

* Update monitoring, settings and endpoint docs

* Refactor cursor classes to separate package

* Add Lombok for DefaultCursor

* Add unit test for DefaultCursor

* Update doc

* Unit tests, bug fix , refactoring

(cherry picked from commit bf4810d)

* [BugFix] Enforce AVG to return double data type (#437)

* Enforce AVG to return double. Update UT, add IT.

(cherry picked from commit 5870bbf)

* Bug fix, count(distinct field) should translate to cardinality aggregation (#442)


(cherry picked from commit b3df480)

* Fix CSV injection issue (#447)

* Sanitize to avoid CSV injection

* Add IT

(cherry picked from commit 78d4158)

* [BugFix] mock LocalClusterState settings in QueryPlanner base class (#446)


(cherry picked from commit 6525dca)

* Escape comma for CSV header and all queries (#456)


(cherry picked from commit ced1cd5)

* Bug Fix, support using aggregation function in order by clause (#452)

* Bug Fix, support using aggregation function in order by clause

* address comments

(cherry picked from commit 30b76ce)

* Remove CI and change release workflow based on tags (#457)

(cherry picked from commit 9758598)

Co-authored-by: David Cui
Co-authored-by: Chen Dai
Co-authored-by: Jordan Wilson
Co-authored-by: Abbas Hussain
Co-authored-by: Chloe
Co-authored-by: Rishabh Singh
Co-authored-by: Qi Chen
Co-authored-by: Zhongnan Su

4 participants