Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation of the YAML structure of query files #660

Merged
merged 36 commits into from
Sep 26, 2023

Conversation

FlorianBracq
Copy link
Collaborator

Goal of the PR

Add through library jsonschema an automated validation of the query files.

Details

The structure of a query file is defined in the JSON file; .schemas\queries.json

For now, the validation is fairly lenient, but more checks can be added, such as ensuring that when a key is used, a value is assigned to it.
I've noticed a few existing queries defining a key "metadata" without setting any metadata for instance; should it be considered invalid?

Also, it integrates nicely with VSCode as you can (after installing the VSCode extension "YAML" by RedHat) configure your editor to automatically highlight YAML structure errors, or autocomplete by adding in the workspace's settings.json:

    "yaml.schemas": {
        ".schemas/queries.json": [
            "/msticpy/data/queries/**/*.yaml"
        ]
    },

@ianhelle
Copy link
Contributor

ianhelle commented May 8, 2023

Hey Florian - weird coincidence. I've been playing around with Github CopilotX to generate code for a couple of things. One of the things was UI for creating queries with an object model. Separately I was looking at using Pydantic to validate the msticpyconfig.yaml. I think we could maybe combine these (the query object mode + pydantic) to generate the json schema for the checker.
I love the VSCode aspect - I'd experimented with this before but got a bit overwhelmed by the complexity of specifying the JSON schema. I think it would be nice to have this also for msticpyconfig.yaml.

One thing with changing the type from "str" to "string" is that we use the type name to format the replace param in queries so I think we'll need to update the code to maybe handle either type name.

@FlorianBracq
Copy link
Collaborator Author

Generating a similar schema for the msticpyconfig file should not that hard if you can provide a 'complete' example of all the options. That's what stopped me here as I was not able to find that list and was afraid it would break something?

Another point I was not sure about was when should this validation be executed?

  • from vscode with the user 'manually opting-in' by configuring his/her workspace sounded good
  • from the unitary tests sounded reasonable too
  • when importing msticpy to confirm all queries are valid sounded possible, but I was not entirely sure it would be a good idea?

Happy to receive your feedback on this!

@ianhelle
Copy link
Contributor

Maybe we should have talked about this....to avoid a bit of duplication. I started down the json schema route and created a schema for msticpyconfig but then went to have a look at Pydantic (Ryan is a big fan). On the whole this seems much easier to create and maintain since you define the schema in terms of Python classes.
I have a couple of relevant branches:

As far as when to do the validation.

  • For queries, it doesn't make sense to do this before importing them (triggered by loading a specific driver/provider type) but it does make sense then. In all likelihood the code would crash when trying to read a bad query file but maybe not. There is some simple validation done in the query reader file.
  • For the msticpyconfig - it makes sense to do this when reading it (i.e. when importing msticpy) but I also wanted to make a more verbose checker for helping diagnose user issues (e.g. you could have a syntantically-correct file but no Sentinel workspaces defined, which would obv be a problem if you were trying to connect to Sentinel).

I'm now hesitant about doing the pydantic thing for query files since it duplicates this PR. Even if we went the pydantic route it would be good to generate json schemas for both file types for use in vscode and elsewhere.

Copy link
Contributor

@ianhelle ianhelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.
I have a PR open for a query editor implemented in ipywidgets. This has a pydantic data model for the query structure. Not sure if this might be in conflict with some of the requirements of the json schema.

.schemas/queries.json Outdated Show resolved Hide resolved
Copy link
Contributor

@ianhelle ianhelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments.

tests/common/test_pkg_config.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ianhelle ianhelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed a couple of items last time.
Looks good otherwise though (better than good :-) )

tests/common/test_pkg_config.py Show resolved Hide resolved
tests/common/test_pkg_config.py Outdated Show resolved Hide resolved
@ianhelle ianhelle merged commit 6cd72dc into microsoft:main Sep 26, 2023
17 checks passed
@FlorianBracq FlorianBracq deleted the yaml-validation branch June 26, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants