Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added design for core refactor #331

Open
wants to merge 1 commit into
base: integ-core-refactor-docs
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions docs/dev/core-refactor.md

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, rename, core refactor is too common.

Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# 1. Overview

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this file a bit more specifically? I think core-refactor doesn't describe the changes or objectives accurately enough.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an abstract, title or intro.


Improve extensibility of the SQL plugin, OpenSearch functions need to be moved from `core` module and moved to the `opensearch` module. This will prevent other data sources from being able to parse queries with OpenSearch specific functions like `MATCH_QUERY`.

# 2. Requirements

Resolve https://github.com/opensearch-project/sql/issues/811

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this resolve the entire issue? According to opensearch-project#811 (comment), I believe this only resolves core functions. But does it also refactor the parser and pull out the push-down functions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I think we need to list exactly what we are resolving as part of issue opensearch-project#811.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add more info. Probably, describe a user story with this feature implemented: plugin for plugin case. It also add dependency clean up and extensibility.


## 2.1 Use case

1. Using SQL or PPL with OpenSearch data source. Users should be able to use all available functions that is supported by SQL, PPL, and OpenSearch.
2. Using SQL or PPL with data sources other than OpenSearch (Prometheus, for example). Users should be able to use implemented SQL and PPL functions.

## 2.2 Functional change

- Using data sources other than OpenSearch, users will encounter a `SyntaxCheckException` when they query with OpenSearch specific functions. (Some of these functions can be found in https://opensearch.org/docs/latest/search-plugins/sql/sql/functions/)
- The Parser will have a different token to categorize OpenSearch functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a architectural design element, not functional. The users won't see this, nor is this a requirement.

- Core module will no longer contain OpenSearch functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the #1 goal, right?
Everything is a by-product of getting the functions out of the core module.


# 3. Design

## 3.1 Datasource specific parser

Core will maintain its parser, however OpenSearch functions will be under a new token called `userDefinedFunctions`. This new token will encompass datasource specific functions. Each datasource module will have its own parser which will be called once the core module analyzer visits this `userDefinedFunction` node in the generated parse tree. Then, the datasource module will parse this function and with its own analyzer, add these function nodes to the parse tree.

```mermaid
sequenceDiagram
participant Core Parser
participant Core Analyzer
participant #8249;DataSource#8250; Parser
participant #8249;DataSource#8250; Analyzer
participant QueryPlanFactory

Core Parser ->> Core Analyzer: #8249;DataSource#8250; specific query
Core Analyzer ->> #8249;DataSource#8250; Parser: #8249;DataSource#8250; specific query
#8249;DataSource#8250; Parser ->> #8249;DataSource#8250; Analyzer: #8249;DataSource#8250; specific query
#8249;DataSource#8250; Analyzer ->> #8249;DataSource#8250; Parser: ParserTree
#8249;DataSource#8250; Parser ->> Core Analyzer: ParserTree
Core Analyzer ->> Core Parser: ParserTree
Core Parser ->> QueryPlanFactory: ParserTree
```

```mermaid
flowchart LR
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the preferred diagram here? Sequence or flowchart?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are fine

A[User] -- query --> B[Core Parser]
B -- query --> C[Core Analyzer]
C --> D{DS specific?}
D --> |True| E[DS Parser]
E -- query --> F[DS Analyzer]
F -- ParserTree --> G[QueryPlanFactory]
D --> |False\nParserTree| G
```

If the function for the above is `MATCH_QUERY`, OpenSearch will be the DataSource.

## 3.2 Datasource specific LogicalPlanOptimizer

Rules in core LogicalPlanOptimizer should not include Datasource specific push down rules. For example, PUSH_DOWN_NESTED will be removed from the core LogicalPlanOptimizer. When optimizing a list of rules will be requested from the datasource module (This list can be empty. Prometheus would return an empty list currently).
```mermaid
sequenceDiagram
participant Planner
participant Core LogicalPlanOptimizer
participant #8249;DataSource#8250; LogicalPlanOptimizer

Planner ->> Core LogicalPlanOptimizer: create()
Core LogicalPlanOptimizer -->> Planner: general rules list
Planner ->> Core LogicalPlanOptimizer: optimize()
Core LogicalPlanOptimizer ->> #8249;DataSource#8250; LogicalPlanOptimizer: getList()
#8249;DataSource#8250; LogicalPlanOptimizer -->> Core LogicalPlanOptimizer: Datasource specific rules list
Core LogicalPlanOptimizer -->> Planner: LogicalPlan
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe:

  1. discovered DS interfaces, for example:
    • parse for UDF
    • parse for identifier
    • analyze, optimize, etc
  2. API to plug/attach new DS on runtime
  3. how parser with UDF should work
  4. algorithm of resolution of a DS for a query with examples; consider tricky examples, when UDF or identifier exists in multiple DS
  5. override policy and workflow (in case if a DS overrides a built-in function), or if it is prohibited - how it is checked
  6. Query processing flow and response if UDF/identifier isn't resolved