[FR] Add ES|QL Custom Library for Rule Support #3134

Closed
wants to merge 43 commits into from
Changes from 42 commits
Commits
c90ab9d
load unsupported rule type from schema
Mikaayenson Jun 29, 2023
f92b34f
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Jul 20, 2023
b245d5b
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Aug 1, 2023
f589ad4
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Aug 11, 2023
7ad3012
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Aug 24, 2023
7887392
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Aug 24, 2023
40a8e64
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Sep 11, 2023
172aa04
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Sep 22, 2023
c4af6f8
initial development for ESQL
terrancedejesus Sep 26, 2023
0a9e0bb
updated workflow to view-rule correctly
terrancedejesus Sep 26, 2023
538fc8c
Merge branch 'main' of github.com:elastic/detection-rules
Mikaayenson Nov 6, 2023
db43a33
Merge branch 'main' into esql-dev
Mikaayenson Nov 11, 2023
9b1eda2
Merge branch 'main' into esql-dev
Mikaayenson Nov 27, 2023
8c4a6ca
Add initial semantic validation and core logic
Mikaayenson Nov 27, 2023
b0ce5e9
Merge branch 'main' into esql-dev
Mikaayenson Nov 27, 2023
2e3d0c0
update grammar and add license headers
Mikaayenson Nov 27, 2023
ed76f4f
light cleanup
Mikaayenson Nov 27, 2023
c930c1b
refactor base esql methods and remove extra antlr4 files
Mikaayenson Nov 28, 2023
1e08afd
use packaged integrations
Mikaayenson Nov 28, 2023
652167b
updating walker; bug in utils.get_node with recursion
terrancedejesus Nov 30, 2023
62ca601
Merge branch 'esql-dev' of github.com:elastic/detection-rules into es…
Mikaayenson Nov 30, 2023
c81455d
update recursion to get multiple nodes
Mikaayenson Nov 30, 2023
49251aa
added event.dataset schema validation
terrancedejesus Dec 1, 2023
bb54c20
capture syntax errors and lint
Mikaayenson Dec 1, 2023
3b9128c
addressed TODOs; added more TODOs
terrancedejesus Dec 1, 2023
1662378
adding changes from #3297
terrancedejesus Dec 1, 2023
ba20d47
Merge branch 'main' into esql-dev
terrancedejesus Dec 1, 2023
e30e8c9
remove duplicate class
Mikaayenson Dec 1, 2023
2f03d86
adding metadata and stats semantic validation
terrancedejesus Dec 1, 2023
f2e2616
Add more type check support and lint
Mikaayenson Dec 1, 2023
17c9dd3
Merge branch 'esql-dev' of github.com:elastic/detection-rules into es…
Mikaayenson Dec 1, 2023
ee40a53
update flake linting
Mikaayenson Dec 1, 2023
e6f8b19
small cleanup
Mikaayenson Dec 1, 2023
1c9b91c
add more support for type checking
Mikaayenson Dec 1, 2023
b5e1672
small cleanup
Mikaayenson Dec 1, 2023
776b1dc
add support for related integrations
Mikaayenson Dec 1, 2023
d1fdd67
add initial esql unit tests
Mikaayenson Dec 4, 2023
e91d294
remove java related logic
Mikaayenson Dec 4, 2023
df5d442
refactor esql logic to support versioned grammars, parsers, and liste…
Mikaayenson Dec 5, 2023
ecb83e6
update devtool commands to use new paths
Mikaayenson Dec 5, 2023
182af10
lint
Mikaayenson Dec 5, 2023
09cf26c
lint and docstrings
Mikaayenson Dec 5, 2023
7c5199d
small cleanup
Mikaayenson Dec 5, 2023
2 changes: 1 addition & 1 deletion .github/workflows/pythonpackage.yml
Original file line number Diff line number Diff line change
@@ -27,7 +27,7 @@ jobs:

- name: Python Lint
run: |
python -m flake8 tests detection_rules --ignore D203 --max-line-length 120
python -m flake8 tests detection_rules esql --ignore D203 --max-line-length 120 --exclude="esql/generated/*.py"

- name: Python License Check
run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -5,7 +5,7 @@ repos:
rev: 5.0.4
hooks:
- id: flake8
args: ['--ignore=D203,C901,E501,W503', '--max-line-length=120','--max-complexity=10', '--statistics']
args: ['--ignore=D203,C901,E501,W503', '--max-line-length=120','--max-complexity=10', '--statistics', '--exclude=esql/generated/*.py']
exclude: '^rta|^kql'
- repo: https://github.com/PyCQA/bandit
rev: 1.7.4
2 changes: 1 addition & 1 deletion Makefile
@@ -40,7 +40,7 @@ license-check: $(VENV) deps
.PHONY: lint
lint: $(VENV) deps
@echo "LINTING"
$(PYTHON) -m flake8 tests detection_rules --ignore D203 --max-line-length 120
$(PYTHON) -m flake8 tests detection_rules esql --ignore D203 --max-line-length 120 --exclude="esql/generated/*.py"

.PHONY: test
test: $(VENV) lint pytest
75 changes: 75 additions & 0 deletions detection_rules/devtools.py
@@ -58,6 +58,7 @@
from .version_lock import VersionLockFile, default_version_lock

RULES_DIR = get_path('rules')
ESQL_DIR = get_path('esql')
GH_CONFIG = Path.home() / ".config" / "gh" / "hosts.yml"
NAVIGATOR_GIST_ID = '1a3f65224822a30a8228a8ed20289a89'
NAVIGATOR_URL = 'https://ela.st/detection-rules-navigator'
@@ -1488,3 +1489,77 @@ def guide_plugin_to_rule(ctx: click.Context, rule_path: Path, save: bool = True)
updated_rule.save_toml()

return updated_rule


@dev_group.group('esql')
def esql_group():
"""Commands for managing ESQL library."""


@esql_group.command('pull-grammar')
@click.option('--token', required=True, prompt=get_github_token() is None,
default=get_github_token(), help='GitHub personal access token.')
@click.option('--version', required=True, help='Version of the ESQL grammar to pull (e.g., 8.11.0).')
@click.pass_context
def pull_grammar(ctx: click.Context, token: str, version: str, branch: str = 'esql/lang'):
"""Pull the ESQL grammar from the specified repository."""
github = GithubClient(token)
client = github.authenticated_client
repo_instance = client.get_repo('elastic/elasticsearch-internal')

formatted_version = f"v{'_'.join(version.split('.'))}"
grammar_dir = Path(ESQL_DIR) / "grammar" / formatted_version
grammar_dir.mkdir(parents=True, exist_ok=True)

for filename, path in definitions.ELASTICSEARCH_ESQL_GRAMMAR_PATHS.items():
try:
file_content = repo_instance.get_contents(path, ref=branch).decoded_content.decode("utf-8")

# Write content to file
with open(grammar_dir / filename, 'w') as file:
file.write(file_content)

click.echo(f"Successfully downloaded {filename}.")

except Exception as e:
click.echo(f"Failed to download {filename}. Error: {e}")


@esql_group.command('build-parser')
@click.option('--version', required=True, help='Version of the ESQL grammar to build (e.g., 8.11.0).')
@click.pass_context
def build_parser(ctx: click.Context, version: str):
"""Build the ESQL parser using ANTLR."""
# Define paths
formatted_version = f"v{'_'.join(version.split('.'))}"
grammar_dir = Path(ESQL_DIR) / 'grammar' / formatted_version
lexer_file = grammar_dir / 'EsqlBaseLexer.g4'
parser_file = grammar_dir / 'EsqlBaseParser.g4'
output_dir = Path(ESQL_DIR) / 'generated' / formatted_version

# Ensure files exist
if not lexer_file.exists() or not parser_file.exists():
click.echo("Error: Required grammar files are missing.")
return

# Create the output directory if it doesn't exist
output_dir.mkdir(parents=True, exist_ok=True)

# Use the antlr4 binary installed with the python dependencies to generate parser and lexer
cmd_common = [
"antlr4",
"-Dlanguage=Python3",
"-o", str(output_dir)
]
cmd_lexer = cmd_common + [str(lexer_file)]
cmd_parser = cmd_common + [str(parser_file)]

try:
subprocess.run(cmd_lexer, check=True, cwd=grammar_dir)
subprocess.run(cmd_parser, check=True, cwd=grammar_dir)

# Create __init__.py file in the generated directory
(output_dir / '__init__.py').touch()
click.echo("ES|QL parser and lexer generated successfully.")
except subprocess.CalledProcessError:
click.echo("Failed to generate ES|QL parser and lexer.")
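
Note on the `build-parser` command above: the version-to-directory mapping and the `antlr4` argument list can be sketched as two small pure functions, which makes them easy to check without running ANTLR. This is a minimal sketch rather than the PR's code; `format_version` and `antlr_command` are hypothetical helper names, and it assumes an `antlr4` executable is available on the PATH, as the comment in the diff suggests.

```python
from pathlib import Path
from typing import List


def format_version(version: str) -> str:
    """Turn a dotted version such as '8.11.0' into a directory name such as 'v8_11_0'."""
    return f"v{'_'.join(version.split('.'))}"


def antlr_command(grammar_file: Path, output_dir: Path) -> List[str]:
    """Build the argument list for generating Python 3 sources from one grammar file.

    Hypothetical helper: it mirrors the cmd_common + [file] pattern in the diff.
    """
    return ["antlr4", "-Dlanguage=Python3", "-o", str(output_dir), str(grammar_file)]


if __name__ == "__main__":
    version_dir = format_version("8.11.0")
    grammar = Path("esql") / "grammar" / version_dir / "EsqlBaseLexer.g4"
    generated = Path("esql") / "generated" / version_dir
    print(antlr_command(grammar, generated))
```

Keeping the command construction separate from `subprocess.run` means the lexer and parser invocations differ only in the grammar file passed in.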
24 changes: 24 additions & 0 deletions detection_rules/ecs.py
@@ -19,6 +19,7 @@

from .utils import (DateTimeEncoder, cached, get_etc_path, gzip_compress,
load_etc_dump, read_gzip, unzip)
from .integrations import get_integration_schema_data

ECS_NAME = "ecs_schemas"
ECS_SCHEMAS_DIR = get_etc_path(ECS_NAME)
@@ -321,3 +322,26 @@ def get_endpoint_schemas() -> dict:
for f in existing:
schema.update(json.loads(read_gzip(f)))
return schema


def get_combined_schemas(data, meta, package_integrations, indices=None):
"""Get the combined schemas for ECS, Beats, and integrations."""
# TODO: Revisit and update to account for all validator classes
# validate_integration methods have redundant code
current_stack_version = ""
combined_schema = {}
for integration_schema_data in get_integration_schema_data(data, meta, package_integrations):
integration_schema = integration_schema_data['schema']
stack_version = integration_schema_data['stack_version']

if stack_version != current_stack_version:
# reset the combined schema for each stack version
current_stack_version = stack_version
combined_schema = {}

if indices:
for index_name in indices:
integration_schema.update(**flatten(get_index_schema(index_name)))
integration_schema.update(**flatten(get_endpoint_schemas()))
combined_schema.update(**integration_schema)
return combined_schema
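
Note on `get_combined_schemas` above: because the combined schema is reset whenever the stack version changes, the function returns only the fields accumulated for the last stack version yielded. The sketch below reproduces that control flow with plain dictionaries. `combine_schemas` and the sample field names are hypothetical, and it assumes entries arrive grouped by stack version, which the diff does not state.

```python
from typing import Dict, Iterable


def combine_schemas(entries: Iterable[dict], extra: Dict[str, str]) -> Dict[str, str]:
    """Merge per-integration schemas, restarting whenever the stack version changes."""
    current_stack_version = ""
    combined: Dict[str, str] = {}
    for entry in entries:
        if entry["stack_version"] != current_stack_version:
            # A new stack version discards everything merged so far.
            current_stack_version = entry["stack_version"]
            combined = {}
        schema = dict(entry["schema"])  # copy so the caller's data is not mutated
        schema.update(extra)            # stand-in for the index and endpoint schemas
        combined.update(schema)
    return combined


if __name__ == "__main__":
    entries = [
        {"stack_version": "8.11.0", "schema": {"aws.cloudtrail.event_type": "keyword"}},
        {"stack_version": "8.12.0", "schema": {"okta.actor.id": "keyword"}},
    ]
    print(combine_schemas(entries, {"process.name": "keyword"}))
```

If the intent is to validate against every supported stack version, the caller would need one schema per version rather than only the last one.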
37 changes: 20 additions & 17 deletions detection_rules/rule.py
@@ -604,20 +604,6 @@ def validates_query_data(self, data, **kwargs):
raise ValidationError("Alert suppression is only valid for query rule type.")


@dataclass(frozen=True)
class ESQLRuleData(QueryRuleData):
"""ESQL rules are a special case of query rules."""
type: Literal["esql"]
language: Literal["esql"]
query: str

@validates_schema
def validate_esql_data(self, data, **kwargs):
"""Custom validation for esql rule type."""
if data.get('index'):
raise ValidationError("Index is not valid for esql rule type.")


@dataclass(frozen=True)
class MachineLearningRuleData(BaseRuleData):
type: Literal["machine_learning"]
@@ -730,6 +716,20 @@ def interval_ratio(self) -> Optional[float]:
return interval / self.max_span


@dataclass(frozen=True)
class ESQLRuleData(QueryRuleData):
"""ESQL rules are a special case of query rules."""
type: Literal["esql"]
language: Literal["esql"]
query: str

@validates_schema
def validate_esql_data(self, data, **kwargs):
"""Custom validation for esql rule type."""
if data.get('index'):
raise ValidationError("Index is not valid for esql rule type.")


@dataclass(frozen=True)
class ThreatMatchRuleData(QueryRuleData):
"""Specific fields for indicator (threat) match rule."""
@@ -979,6 +979,7 @@ def _convert_add_related_integrations(self, obj: dict) -> None:
"""Add restricted field related_integrations to the obj."""
field_name = "related_integrations"
package_integrations = obj.get(field_name, [])
event_dataset = []

if not package_integrations and self.metadata.integration:
packages_manifest = load_integrations_manifests()
@@ -988,8 +989,10 @@ def _convert_add_related_integrations(self, obj: dict) -> None:
if (isinstance(self.data, QueryRuleData) or isinstance(self.data, MachineLearningRuleData)):
if (self.data.get('language') is not None and self.data.get('language') != 'lucene') or \
self.data.get('type') == 'machine_learning':
if isinstance(self.data, ESQLRuleData):
event_dataset = list(set(self.data.validator.event_datasets))
package_integrations = self.get_packaged_integrations(self.data, self.metadata,
packages_manifest)
packages_manifest, event_dataset)

if not package_integrations:
return
@@ -1096,9 +1099,9 @@ def compare_field_versions(min_stack: Version, max_stack: Version) -> bool:

@classmethod
def get_packaged_integrations(cls, data: QueryRuleData, meta: RuleMeta,
package_manifest: dict) -> Optional[List[dict]]:
package_manifest: dict, datasets: list = []) -> Optional[List[dict]]:
packaged_integrations = []
datasets = set()
datasets = set(datasets)

if data.type != "esql":
for node in data.get('ast', []):
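
Note on the `get_packaged_integrations` change above: the new `datasets` parameter seeds the set that the non-ES|QL branch then extends from the AST. A minimal sketch of that seeding pattern follows. `seed_datasets` is a hypothetical name, and the `None` default is a suggestion of this note rather than what the diff uses (the diff's `= []` default is only safe because the list is copied into a set immediately).

```python
from typing import Iterable, Optional, Set


def seed_datasets(found_in_query: Iterable[str],
                  datasets: Optional[Iterable[str]] = None) -> Set[str]:
    """Merge caller-supplied event datasets with those discovered in the query."""
    merged = set(datasets or [])  # copy, so the caller's collection is never mutated
    merged.update(found_in_query)
    return merged


if __name__ == "__main__":
    # Datasets collected by the ES|QL listener, plus one found elsewhere.
    print(sorted(seed_datasets(["aws.cloudtrail"], ["okta.system", "aws.cloudtrail"])))
```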
114 changes: 104 additions & 10 deletions detection_rules/rule_validators.py
@@ -4,19 +4,28 @@
# 2.0.

"""Validation logic for rules containing queries."""
import sys
from functools import cached_property
from typing import List, Optional, Union, Tuple
from semver import Version
from io import StringIO
from typing import List, Optional, Tuple, Union

import eql
from antlr4 import Parser, ParserRuleContext, ParseTreeWalker
from semver import Version

import kql
from esql.errors import ESQLErrorListener, ESQLSyntaxError
from esql.iesql_listener import ESQLListenerFactory, IESQLListener
from esql.iesql_parser import ESQLParserFactory
from esql.utils import get_node, pretty_print_tree

from . import ecs, endgame
from .integrations import get_integration_schema_data, load_integrations_manifests
from .integrations import (get_integration_schema_data,
load_integrations_manifests)
from .misc import load_current_package_version
from .rule import (EQLRuleData, QueryRuleData, QueryValidator, RuleMeta,
TOMLRuleContents, set_eql_config)
from .schemas import get_stack_schemas
from .rule import QueryRuleData, QueryValidator, RuleMeta, TOMLRuleContents, EQLRuleData, set_eql_config

EQL_ERROR_TYPES = Union[eql.EqlCompileError,
eql.EqlError,
@@ -348,22 +357,107 @@ def validate_rule_type_configurations(self, data: EQLRuleData, meta: RuleMeta) -

class ESQLValidator(QueryValidator):
"""Validate specific fields for ESQL query event types."""
field_list = []
event_datasets = []
min_stack_version = None

@cached_property
def ast(self):
def ast(self) -> ParserRuleContext:
"""Return an AST."""
return None
# Capture stderr
original_stderr = sys.stderr
sys.stderr = StringIO()

try:
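# Attach the custom error listener before parsing; otherwise its error list is always empty
_ = self.error_listener
tree = self.parser.singleStatement()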

# Check for errors captured by the custom error listener
if self.error_listener.errors:

# Check for additional errors (like predictive errors usually printed to stderr)
stderr_output = sys.stderr.getvalue()
error_message = "\n".join(self.error_listener.errors)
raise ESQLSyntaxError(f"\n\n{stderr_output}{error_message}")
finally:
# Restore the original stderr
sys.stderr = original_stderr
return tree

@cached_property
def listener(self) -> IESQLListener:
"""Return a listener instance."""
if Version.parse(self.min_stack_version) >= Version.parse("8.11.0"):
version = self.min_stack_version
return ESQLListenerFactory.getListener(version)
else:
raise ValueError(f"Unsupported ES|QL grammar version {self.min_stack_version}")

def run_walker(self, ctx_class=None):
"""Walk the AST with the listener."""
generic_walker = ParseTreeWalker() # TODO: Do we need to create a new walker each time?
tree = self.ast

if ctx_class:
ctx_objs = get_node(tree, ctx_class)
if not ctx_objs:
# warning message
print(f"Warning: Could not find {ctx_class} in {tree}")
# raise ESQLSyntaxError(f"Could not find {ctx_class} in {tree}")
if ctx_objs:
for ctx_obj in ctx_objs:
generic_walker.enterRule(self.listener, ctx_obj)
generic_walker.exitRule(self.listener, ctx_obj)
else:
generic_walker.walk(self.listener, tree)

@cached_property
def parser(self) -> Parser:
"""Return a parser instance."""
return ESQLParserFactory.createParser(self.query, self.min_stack_version)

@cached_property
def error_listener(self) -> ESQLErrorListener:
"""Return an error listener instance."""

# Attach the custom error listener
self.parser.removeErrorListeners() # TODO: Should we remove default error listeners?
error_listener = ESQLErrorListener()
self.parser.addErrorListener(error_listener)
return error_listener

@cached_property
def unique_fields(self) -> List[str]:
"""Return a list of unique fields in the query."""
# return empty list for ES|QL rules until ast is available (friendlier than raising error)
# raise NotImplementedError('ES|QL query parsing not yet supported')
return []
# de-duplicate the fields collected by the listener
return list(set(self.field_list))

def validate(self, data: 'QueryRuleData', meta: RuleMeta) -> None:
"""Validate an ESQL query while checking TOMLRule."""
# temporarily override to NOP until ES|QL query parsing is supported

self.min_stack_version = meta.min_stack_version
if Version.parse(meta.min_stack_version) < Version.parse("8.11.0"):
raise ESQLSyntaxError(f"Rule {data.rule_id} requires a min_stack_version of 8.11.0 or later for ES|QL.")

# Traverse the AST to extract event datasets
# TODO: Do we want to error if no event datasets are found?
self.run_walker(self.parser.BooleanDefaultContext) # TODO: Walk entire tree?

tree = self.ast
pretty_print_tree(tree)

# get event datasets
self.event_datasets = self.listener.event_datasets
self.field_list = self.listener.field_list
# TODO: Pass unique field list to required fields workflow

# Create an instance of the listener with schema
packages_manifest = load_integrations_manifests()
package_integrations = TOMLRuleContents.get_packaged_integrations(data, meta, packages_manifest,
self.event_datasets)
combined_schemas = ecs.get_combined_schemas(data, meta, package_integrations)
self.listener.schema = combined_schemas
self.run_walker()
print("Validation completed successfully.")


def extract_error_field(exc: Union[eql.EqlParseError, kql.KqlParseError]) -> Optional[str]:
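
Note on the `ast` property above: the manual swap of `sys.stderr` can also be written with `contextlib.redirect_stderr`, which restores the stream even when parsing raises. The sketch below shows the pattern with stand-ins only. `parse_with_captured_stderr`, `QuerySyntaxError`, and the `errors` list are hypothetical substitutes for the parser call, `ESQLSyntaxError`, and the custom listener's error list. It assumes the error list is already being populated while `parse` runs.

```python
import sys
from contextlib import redirect_stderr
from io import StringIO
from typing import Callable, List


class QuerySyntaxError(Exception):
    """Hypothetical stand-in for ESQLSyntaxError."""


def parse_with_captured_stderr(parse: Callable[[], object], errors: List[str]) -> object:
    """Run a parse callable, capturing stderr and raising if any errors were recorded."""
    buffer = StringIO()
    with redirect_stderr(buffer):  # sys.stderr is restored on exit, even on exceptions
        tree = parse()
    if errors:
        raise QuerySyntaxError(f"\n\n{buffer.getvalue()}" + "\n".join(errors))
    return tree


if __name__ == "__main__":
    recorded: List[str] = []

    def noisy_parse() -> str:
        # Simulates a parser that writes to stderr and reports to a listener.
        print("line 1:0 mismatched input", file=sys.stderr)
        recorded.append("syntax error at 1:0")
        return "tree"

    try:
        parse_with_captured_stderr(noisy_parse, recorded)
    except QuerySyntaxError as exc:
        print(f"caught: {exc}")
```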
6 changes: 6 additions & 0 deletions detection_rules/schemas/definitions.py
@@ -33,6 +33,12 @@
"allow_sample": (Version.parse('8.6.0'), None),
"elasticsearch_validate_optional_fields": (Version.parse('7.16.0'), None)
}
ELASTICSEARCH_ESQL_GRAMMAR_PATHS = {
"EsqlBaseLexer.g4": "x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4",
"EsqlBaseParser.g4": "x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4",
"EsqlBaseLexer.tokens": "x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.tokens",
"EsqlBaseParser.tokens": "x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.tokens"
}
NON_DATASET_PACKAGES = ['apm', 'endpoint', 'system', 'windows', 'cloud_defend', 'network_traffic']
NON_PUBLIC_FIELDS = {
"related_integrations": (Version.parse('8.3.0'), None),
4 changes: 4 additions & 0 deletions esql/__init__.py
@@ -0,0 +1,4 @@
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License
# 2.0; you may not use this file except in compliance with the Elastic License
# 2.0.