Validate REST specs against schema #55117
Conversation
A JSON schema was recently introduced for the REST API specification (elastic#54252). This PR introduces a 3rd party validation tool to ensure that the REST specification conforms to the schema. The task is applied to the 3 projects that contain REST API specifications. The plugin wires this task into the precommit task, and should be considered part of the public API of the build tools for any plugin developer to contribute their plugin's specification. An ignore parameter has been introduced for the task to allow specific files to be ignored from the validation. The ignored files in this PR will soon get issues logged and a link so they can be fixed. Closes elastic#54314
Pinging @elastic/es-core-infra (:Core/Infra/Build)
@elasticmachine update branch
@elasticmachine run elasticsearch-ci/1
Left some comments.
Also, one issue we have here is that this is all very coupled to the notion of at most two source sets (main and test). Once we start splitting these things up this assumption is going to break. We already have plans to split out YAML tests into their own source set, how will that affect this?
buildSrc/src/main/java/org/elasticsearch/gradle/precommit/ValidateRestSpecTask.java
buildSrc/src/main/java/org/elasticsearch/gradle/precommit/ValidateRestSpecTask.java
public void validate() throws IOException {
    // load schema
    JsonSchema jsonSchema;
    FileTree jsonSchemas = Util.getJavaMainSourceResources(
We should not directly access the schema from the task implementation. We should do this in the plugin and pass them in as an input to the task. Two reasons for this:
- This logic simply shouldn't exist in the task. This "wiring" should be done by plugins. This makes the tasks more reusable and composable.
- The schema isn't tracked as an input, which means that if it changes we won't correctly flag this task as out of date.
Done, and thanks for the feedback. I think the result is much cleaner.
I changed the task so that it is a generic JSON validator and moved the ES-specific location logic to the plugin.
private static final String SCHEMA_PROJECT = ":rest-api-spec";
private static final String DOUBLE_STARS = "**"; // checkstyle thinks this is an empty javadoc statement, so string concat instead
private static final String JSON_SPEC_PATTERN_INCLUDE = DOUBLE_STARS + "/rest-api-spec/api/" + DOUBLE_STARS + "/*.json";
All this stuff should be in the plugin. These are conventions specific to our project. We should be able to easily reuse this task to validate something else. Right now that would be impossible due to these conventions being hard-coded into the task implementation.
Agreed, and done.
@InputFiles
public FileTree getInputDir() {
    // if null, results in NO-SOURCE
    return Util.getJavaTestAndMainSourceResources(
Again, this should be in the plugin. We should just add a setInputFiles(FileCollection files) setter to which we pass exactly what we want to validate. Then the plugin can do all this source set stuff. Also the plugin should confirm these source sets exist (or apply the plugin that creates them) so things don't explode.
done.
        sb.append(System.lineSeparator());
    });
});
throw new JsonSchemaException(sb.toString());
Let's avoid smashing the output of this task into an exception message. Instead let's follow the example of other core Gradle tasks, which is to simply log the errors in the task using the logger and then throw an exception whose message says something like "There were errors, refer to task output.". We can (and should) include a summary in the exception (ex: "Validation failed: 3 files contained 8 violations"), but details should be in task output.
For example, here's a checkstyle error: https://gradle-enterprise.elastic.co/s/ixyvqw455k6m6/failure?openFailures=WzBd&openStackTraces=WzFd#top=0
Done. A validation error looks like:
> Task :rest-api-spec:validateRestSpec FAILED
[validate JSON][ERROR][cat.thread_pool.json][$.cat.thread_pool.params.size.options[0]: does not match the regex pattern ^[a-zA-Z_]+$]
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':rest-api-spec:validateRestSpec'.
> Error validating JSON. See the report at: file:/Users/jakelandis/workspace/8x/elasticsearch/rest-api-spec/build/reports/validateJson.txt
JSON validation failed: 1 files contained 1 violations
The report looks like:
Schema: /Users/jakelandis/workspace/8x/elasticsearch/rest-api-spec/src/main/resources/schema.json
----------Validation Errors-----------
/Users/jakelandis/workspace/8x/elasticsearch/rest-api-spec/src/main/resources/rest-api-spec/api/cat.thread_pool.json: $.cat.thread_pool.params.size.options[0]: does not match the regex pattern ^[a-zA-Z_]+$
I'm not sure we need both the report file and the log output, what do you think? To me the log output is the priority, as that's the only convenient way to debug errors in CI. And if we have all the info we need in the console, do we need to create the report file?
Sorry if my example sent you down that path. I was mainly getting at the fact that individual errors are reported in the task output and the exception message simply refers to that output.
I think there is marginal benefit to having a report. It allows me to use the fully qualified names for the files without polluting the log output with long names. Also, this validation as-is should feel familiar since it is like checkstyle (not sure if that is good or bad).
I have a minor preference to keep the report.
I have a minor preference to keep the report.
SGTM 👍
If we are going to keep the report, let's just go ahead and ditch the success marker then. No point in having two different outputs. In the scenario where we find no errors, we can just say as much in the report file. So let's get rid of that and the need to extend PrecommitTask.
done. 6090fba
@@ -75,4 +84,91 @@ public static URI getBuildSrcCodeSource() {
        throw new GradleException("Error determining build tools JAR location", e);
    }
}
public static Function<FileTree, FileTree> filterByPatterns( |
This is a bit of an awkward way of defining a filter. Instead of creating a function that takes a FileTree and returns another FileTree, why not model the filter as an Action<? super PatternFilterable>? Then your methods that take this could just look like:
getJavaMainSourceResources(project, spec -> spec.include("**/foo/**"))
This also avoids the need to create a PatternSet via PatternSetFactory, which is technically an internal API.
Also would alleviate the need for a bunch of null-checking logic on the filter.
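The callback style the reviewer suggests can be sketched with plain Java. The `Action` and `PatternSet` types below are simplified stand-ins for Gradle's `org.gradle.api.Action` and `PatternFilterable`, not the real API, and the pattern strings are only examples:

```java
import java.util.ArrayList;
import java.util.List;

public class ActionStyleFilter {
    // Simplified stand-in for org.gradle.api.Action<T>
    @FunctionalInterface
    interface Action<T> {
        void execute(T target);
    }

    // Simplified stand-in for Gradle's PatternFilterable
    static class PatternSet {
        final List<String> includes = new ArrayList<>();
        final List<String> excludes = new ArrayList<>();
        PatternSet include(String pattern) { includes.add(pattern); return this; }
        PatternSet exclude(String pattern) { excludes.add(pattern); return this; }
    }

    // The caller passes a lambda that configures the filter in place, so the
    // receiving side needs no null checks and no PatternSetFactory lookup.
    static PatternSet resourcesMatching(Action<? super PatternSet> filter) {
        PatternSet spec = new PatternSet();
        filter.execute(spec);
        return spec;
    }

    public static void main(String[] args) {
        PatternSet spec = resourcesMatching(
            s -> s.include("**/rest-api-spec/api/**/*.json").exclude("**/_common.json"));
        System.out.println(spec.includes + " " + spec.excludes);
    }
}
```

The caller's lambda mutates the spec rather than transforming a value, which is why the null-checking logic mentioned above disappears: an empty lambda is a valid "no filter".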
Cool, thanks for the pointer. Updated to use Action<? super PatternFilterable> and I think it is much cleaner now.
private static final String SCHEMA_PROJECT = ":rest-api-spec";
private static final String DOUBLE_STARS = "**"; // checkstyle thinks this is an empty javadoc statement, so string concat instead
private static final String JSON_SPEC_PATTERN_INCLUDE = DOUBLE_STARS + "/rest-api-spec/api/" + DOUBLE_STARS + "/*.json";
private static final String JSON_SPEC_PATTERN_EXCLUDE = DOUBLE_STARS + "/_common.json";
Does this file ever get validated?
Nope. This is a hard-coded exception to the JSON schema validation.
thanks @mark-vieira for the review. This should be good to go again.
With the latest round of changes, I have separated the concerns of validation and setup: the setup is handled by the plugin, and the validation by the task. When YAML test source sets are introduced, the plugin will add that source set to the set of inputs to look at, and no changes should be required in the task. So instead of just main and test, the plugin will also include the YAML tests (or whatever they will be called). FWIW, splitting the YAML tests into their own source sets is up next on my todo list.
private File jsonSchema;
private FileTree inputFiles;
// no need to rebuild the JsonSchema multiple times per build
private static ConcurrentHashMap<File, JsonSchema> cache = new ConcurrentHashMap<>(1);
oops... this isn't effective across multiple tasks. Will remove this, or move out to the plugin.
Let's just ditch it. These tasks are already pretty quick and the risk of daemon memory leaks here is probably not worth the performance benefit. We can come back later and micro-optimize if we get some time.
done e06e434
Left some more comments.
 */
public class ValidateJsonAgainstSchemaTask extends PrecommitTask {

    private static final ObjectMapper mapper = new ObjectMapper();
Let's remove static here. I realize there's some overhead in allocating these, but the potential for weird memory leaks in the daemon is probably not worth it. The Gradle daemon is long-lived and also caches classloaders. Static references can cause weird issues with leaking classloaders, which can cause daemons to eventually OOM.
        throw new UncheckedIOException(e);
    }
});
ConcurrentHashMap<File, Set<String>> errors = new ConcurrentHashMap<>();
There's no concurrency that I can see here. We are simply validating each input file serially on the task execution thread. Also, we should probably use a map implementation that maintains insertion order, like LinkedHashMap, so that in cases with multiple errors the ordering makes sense (all errors for the same file in the same place, in line order).
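The ordering point can be illustrated with a stdlib-only sketch (the file names and messages below are made up, and this is not the PR's actual code): a `LinkedHashMap` keyed by file, with a `LinkedHashSet` per file, replays errors grouped by file in the order they were reported:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrderedErrors {
    // Collect errors keyed by file, preserving both the order files were
    // visited and the order errors were reported within each file.
    static List<String> report(List<String[]> findings) {
        Map<String, Set<String>> errors = new LinkedHashMap<>();
        for (String[] f : findings) {
            errors.computeIfAbsent(f[0], k -> new LinkedHashSet<>()).add(f[1]);
        }
        List<String> lines = new ArrayList<>();
        errors.forEach((file, msgs) -> msgs.forEach(m -> lines.add(file + ": " + m)));
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> findings = List.of(
            new String[] {"cat.thread_pool.json", "line 30: bad pattern"},
            new String[] {"cat.thread_pool.json", "line 50: bad type"},
            new String[] {"indices.get.json", "line 12: missing field"});
        report(findings).forEach(System.out::println);
    }
}
```

With a `HashMap`/`HashSet` the same code could interleave files and reorder messages, which is exactly the report-readability problem the reviewer is flagging.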
ConcurrentHashMap<File, Set<String>> errors = new ConcurrentHashMap<>();
// incrementally evaluate input files
inputChanges.getFileChanges(getInputFiles()).forEach(fileChange -> {
    File file = fileChange.getFile();
Keep in mind here that fileChange could be a removed file. In which case we'd fail here, as we'd try to validate a file that has been deleted. We should filter those out.
@Incremental
@InputFiles
public FileTree getInputFiles() {
I think we might be able to simplify things by just making this a FileCollection instead of a FileTree. There's no reason to keep a tree structure here, as ignores are done on a file name basis and file paths are completely irrelevant for the purposes of validation.
This considerably simplifies things like adding input files from completely disparate directories. You can do this with file trees, but it ends up creating a UnionFileTree, which can have some funky behavior.
private void maybeLogAndCollectError(Set<ValidationMessage> messages, ConcurrentHashMap<File, Set<String>> errors, File file) {
    for (ValidationMessage message : messages) {
        String error = String.format("[validate JSON][ERROR][%s][%s]", file.getName(), message.toString());
There's no need to use String.format here, as the Logger.error() method supports formatting strings using {} as a placeholder.
String error = String.format("[validate JSON][ERROR][%s][%s]", file.getName(), message.toString());
getLogger().error(error);
try {
    errors.computeIfAbsent(file, k -> new HashSet<>())
Same as above, let's use a set implementation that maintains ordering so we don't potentially print an error for line 50 before one on line 30 in the report.
    Files.write(getSuccessMarker().toPath(), new byte[] {}, StandardOpenOption.CREATE);
} else {
    // build output and throw exception
    Files.writeString(getErrorReport().toPath(), String.format("Schema: %s", jsonSchemaOnDisk), StandardOpenOption.CREATE);
I feel like we should probably just create a single PrintWriter and do all this stuff in one go. Each call to Files.write* is going to have to open and then close a stream on that file. Let's just do this once.
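The single-stream idea can be sketched as follows. The report layout mimics the one shown earlier in the thread, but `writeReport` and its parameters are illustrative, not the PR's actual method:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

public class ReportWriter {
    // Open the report file once and write everything through a single stream,
    // instead of re-opening the file with a Files.write* call per section.
    static void writeReport(Path report, Path schema, Map<String, List<String>> errors) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(report))) {
            out.println("Schema: " + schema);
            out.println("----------Validation Errors-----------");
            errors.forEach((file, msgs) -> msgs.forEach(m -> out.println(file + ": " + m)));
        } // try-with-resources flushes and closes the stream exactly once
    }

    public static void main(String[] args) throws IOException {
        Path report = Files.createTempFile("validateJson", ".txt");
        writeReport(report, Path.of("schema.json"),
            Map.of("cat.thread_pool.json", List.of("$.cat...: does not match")));
        Files.readAllLines(report).forEach(System.out::println);
        Files.deleteIfExists(report);
    }
}
```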
getLogger().error(error);
try {
    errors.computeIfAbsent(file, k -> new HashSet<>())
        .add(String.format("%s: %s", file.getCanonicalFile(), message.toString()));
Is it strictly necessary to resolve the canonical file path here? I mean, if folks are working in some kind of linked workspace folder, wouldn't it be friendlier to report that path instead of the resolved one?
I don't think it matters too much. Are you advocating for an absolute path, or a relative path?
Absolute path, but we don't have to resolve the canonical path. It's costly, and requires catching an IOException, vs just using the given absolute path.
@Override
public void apply(Project project) {
    if (jsonSchema == null) {
Firstly, project configuration is never done in parallel, so there's no need for the thread safety stuff here. Second, I think this is overly complex. Instead of all this nastiness to try and find the file, let's just go ahead and have the project configure its location. It's one file, and it's rarely if ever going to move. Doing a bunch of work at configuration time to scan for this file seems unnecessary.
I moved the work into the task configuration ... is that what you meant? e9dd891
(I think this is still configuration-time work... but since it is a registered task, it won't do any work unless needed?)
Not exactly, although this is better since we now defer the work unless the task is actually executed. I mean we should remove this "search" altogether and just do something along the lines of validateJson.jsonSchema = file('rest-api-spec/src/main/resources/schema.json') in the project build script, or even hard-code it in the plugin for now (since it's already tightly coupled to our project).
My reasoning being, there is exactly one file, we always know where it'll be and it rarely moves. Why all the work to dynamically locate it?
OK, I understand now. I hard-coded it in the plugin. 1768c6d
@@ -109,3 +110,15 @@ testClusters.integTest {
    extraConfigFile nodeCert.name, nodeCert
    extraConfigFile 'roles.yml', file('src/test/resources/roles.yml')
}

validateRestSpec {
    ignore 'searchable_snapshots.mount.json'
What's the reason to ignore these files? Do they fail validation? If so, shouldn't we fix the failures so it doesn't break the clients team?
Yeah, these will need to be fixed. I plan to log issues to fix them once this gets merged.
Once we get this fixed I'd argue for removing the ability to ignore. The purpose of doing validation here is so that the clients folks don't have to do it downstream. There's not much point in validating these twice, on both the producer and consumer sides.
Going forward we need to make it such that a change that doesn't conform to the schema fails checks and effectively cannot be merged. This way the clients team can ditch this on their end, simplifying their process.
Implemented requested changes on e9dd891 ... however, I have one question I wasn't sure about.
Two minor comments, otherwise latest changes LGTM 👍
Map<File, Set<String>> errors = new LinkedHashMap<>();
// incrementally evaluate input files
StreamSupport.stream(inputChanges.getFileChanges(getInputFiles()).spliterator(), false)
    .filter(f -> ChangeType.REMOVED.equals(f.getChangeType()) == false)
nit: I think just writing this as f.getChangeType() != ChangeType.REMOVED is simpler, yes? Does our stance on using == false in the codebase mean avoiding != as well, in cases where using .equals() is unnecessary?
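The removed-file filtering discussed above can be sketched with local stand-ins for Gradle's `org.gradle.work.ChangeType` and `FileChange` (the real incremental API differs in detail, and the file names below are made up):

```java
import java.util.List;
import java.util.stream.Collectors;

public class SkipRemoved {
    // Stand-ins for Gradle's org.gradle.work.ChangeType and FileChange.
    enum ChangeType { ADDED, MODIFIED, REMOVED }

    record FileChange(String file, ChangeType changeType) {}

    // Deleted files cannot be validated, so drop them from the incremental
    // change set before running the schema check on each remaining file.
    static List<String> filesToValidate(List<FileChange> changes) {
        return changes.stream()
            .filter(c -> c.changeType() != ChangeType.REMOVED)
            .map(FileChange::file)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileChange> changes = List.of(
            new FileChange("a.json", ChangeType.MODIFIED),
            new FileChange("b.json", ChangeType.REMOVED),
            new FileChange("c.json", ChangeType.ADDED));
        System.out.println(filesToValidate(changes));
    }
}
```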
Not sure ... I think I am just used to == false now :) ... changed it to !=
    return inputFiles;
}

public void setInputFiles(FileTree inputFiles) {
Setter should be FileCollection as well. Also the field should be updated.
@mark-vieira - thanks so much for the thorough review! I am learning so much about Gradle and how it should be modeled.
@elasticmachine update branch
@elasticmachine update branch
Thank you for the work and review that went into this PR, @jakelandis and @mark-vieira!
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.
The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit task, and should be
considered part of the public API of the build tools for any plugin
developer to contribute their plugin's specification.
An ignore parameter has been introduced for the task to allow specific
files to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.
Closes #54314