Use CUE to define schema for files #391

Closed
wants to merge 13 commits into from
25 changes: 11 additions & 14 deletions README.md
@@ -52,26 +52,26 @@ An Elastic Package specification describes:
1. the folder structure of packages and expected files within these folders; and
2. the structure of the expected files' contents.

There may be multiple versions of specifications. At the root of this repository is a `versions` folder. In this folder you will find sub-folders for each active major version of the specification, e.g. `versions/1`, `versions/2`, etc. Read more in the _Specification Versioning_ section below.

Within each major version folder, there must be a `spec.yml` file. This file is the entry point for the specification for a package's contents. It describes the folder structure of packages and expected files within these folders (this is point 1. above). The specification is expressed using a schema similar to [JSON Schema](https://json-schema.org/), but with a couple of differences:
- The `type` field can be either `folder` or `file`,
- A new field, `contents` is introduced to (recursively) describe the contents of folders (i.e. when type == folder), and
- The specification is written as YAML for readability.
In the spec folder there is a `spec.yml` file. This file is the entry point for the specification for a package's contents. It describes the folder structure of packages and expected files within these folders (this is point 1. above). The specification is expressed using a schema similar to [JSON Schema](https://json-schema.org/), but with a couple of differences:
- The `type` field can be either `folder` or `file`,
- A new field, `contents`, is introduced to (recursively) describe the contents of folders (i.e. when type == folder), and
- The specification is written as YAML for readability.
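
The recursive `type`/`contents` idea can be sketched in Go as follows. This is only an illustration: the `item` struct, the `paths` helper and the sample tree are hypothetical and do not reflect the actual types or spec files in this repository.

```go
package main

import "fmt"

// item is a hypothetical, simplified model of one entry in spec.yml.
// Only the two fields discussed above are modelled.
type item struct {
	Name     string
	Type     string // "folder" or "file"
	Contents []item // only meaningful when Type == "folder"
}

// paths flattens the recursive description into the list of expected paths.
func paths(prefix string, it item) []string {
	p := it.Name
	if prefix != "" {
		p = prefix + "/" + it.Name
	}
	out := []string{p}
	if it.Type == "folder" {
		for _, child := range it.Contents {
			out = append(out, paths(p, child)...)
		}
	}
	return out
}

func main() {
	// Sample tree, made up for the example.
	root := item{Name: "integration", Type: "folder", Contents: []item{
		{Name: "manifest.yml", Type: "file"},
		{Name: "docs", Type: "folder", Contents: []item{
			{Name: "README.md", Type: "file"},
		}},
	}}
	for _, p := range paths("", root) {
		fmt.Println(p)
	}
}
```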

Expected package files, e.g. `manifest.yml`, themselves have a structure to their contents. This structure is described in specification files using JSON Schema (this is point 2. above). These specification files are also written as YAML for readability.

Note that the specification files primarily define the structure (syntax) of a package's contents. To a limited extent they may also define some semantics, e.g. enumeration values for certain fields. Richer semantics, however, will need to be expressed as validation code.

# Specification Versioning

As mentioned above, package specifications are versioned. Versions follow the [semantic versioning](https://semver.org/) scheme. In the context of package specifications, this means the following.
Package specifications are versioned. Versions follow the [semantic versioning](https://semver.org/) scheme. In the context of package specifications, this means the following.

* Packages must specify the specification version they are using. This is done via the `format_version` property in the package's root `manifest.yml` file. The value of `format_version` must conform to the semantic versioning scheme.

* Specifications are organized under the `versions` folder located at the root of this repository. The `versions` folder will contain a sub-folder for each **major version** of the specification, e.g. `versions/1`, `versions/2`, etc.

* Within each major version folder, there is a `spec.yml` file. It contains a root-level property called `version` which specifies the complete, current version of the specification. The value of `version` conforms to the semantic versioning scheme.
* Specifications are defined by rules in the spec; some attributes or files will only be available since, or until, a given version.

* Note that the latest version, and _only the latest_ version, of the specifications may include a pre-release suffix, e.g. `1.4.0-alpha1`. This indicates that this version is still under development and may be changed multiple times. Once the pre-release suffix is removed, however, the specification at that version becomes immutable. Further changes must follow the process outlined below in _Changing a Specification_.
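
As a rough illustration of what "conforms to the semantic versioning scheme" means for a value such as `format_version`, here is a minimal parser sketch using only the standard library. The `version` type and `parseVersion` function are hypothetical names for this example; the sketch is deliberately simplified (it ignores build metadata and the finer rules of the semver grammar) and is not the validation used by this project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical holder for the parts of a semantic version.
type version struct {
	Major, Minor, Patch int
	Prerelease          string // empty for released versions
}

// parseVersion splits "MAJOR.MINOR.PATCH[-PRERELEASE]" into its parts.
// Simplified on purpose: build metadata ("+...") is not handled.
func parseVersion(s string) (version, error) {
	var v version
	core := s
	if i := strings.Index(s, "-"); i >= 0 {
		core, v.Prerelease = s[:i], s[i+1:]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q: expected MAJOR.MINOR.PATCH", s)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return version{}, fmt.Errorf("invalid version %q: bad component %q", s, p)
		}
		nums[i] = n
	}
	v.Major, v.Minor, v.Patch = nums[0], nums[1], nums[2]
	return v, nil
}

func main() {
	v, err := parseVersion("1.4.0-alpha1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A non-empty pre-release suffix marks a version still under development.
	fmt.Printf("%+v underDevelopment=%t\n", v, v.Prerelease != "")
}
```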

@@ -80,10 +80,7 @@ As mentioned above, package specifications are versioned. Versions follow the [s
* Consider the **latest** version of the specification. Say it is `x.y.z`. It will be located under the `versions/x` folder, where `x` is the major version of the specification.
* Now consider a proposal to change the specification in some way. The version number of the changed specification must be determined as follows:
* If the proposed change makes the specification stricter than it is at `x.y.z`, the new version number will be `X.0.0`, where `X = x + 1`. That is, we bump up the major version.
* Add a new folder named `versions/X`, where `X` is the new major version number.
* The changed specification — in its entirety — must be added to the new version folder.
* Set the root-level `version` property in the specification's root `spec.yml` file to `X.0.0`.
* Start a new `CHANGELOG.yml` file at the root of the `versions/X` folder, add a section for `X.0.0` and make an entry under it explaining your change. If there are multiple changes, please add multiple entries under the new section.
* Add a constraint for the new attributes or rules, so they only apply on version >= X.0.0.
* If the proposed change makes the specification looser than it is at `x.y.z`, the new version number will be `x.Y.0`, where `Y = y + 1`. That is, we bump up the minor version. Note that adding new, but optional, constraints to a specification is a change that makes a specification looser.
* Apply the proposed changes to the existing specification under the `versions/x` folder, where `x` is the major version number of the specification being changed.
* Set the root-level `version` property in the specification's root `spec.yml` file to `x.Y.0`.
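
The two bump rules visible in this hunk can be summarised with a small sketch. The `change` type and `nextVersion` function are hypothetical, and only the "stricter" and "looser" cases described above are covered; any other kind of change is left untouched here because this part of the README does not describe it.

```go
package main

import "fmt"

// change is a hypothetical classification of a proposed spec change.
type change int

const (
	stricter change = iota // makes the specification stricter
	looser                 // makes the specification looser
)

// nextVersion applies the rules above to the latest version x.y.z.
func nextVersion(major, minor, patch int, c change) (int, int, int) {
	switch c {
	case stricter:
		// Stricter: bump the major version, X.0.0 with X = x + 1.
		return major + 1, 0, 0
	case looser:
		// Looser: bump the minor version, x.Y.0 with Y = y + 1.
		return major, minor + 1, 0
	default:
		// Other kinds of change are not covered by this sketch.
		return major, minor, patch
	}
}

func main() {
	M, m, p := nextVersion(1, 4, 2, stricter)
	fmt.Printf("stricter: %d.%d.%d\n", M, m, p)
	M, m, p = nextVersion(1, 4, 2, looser)
	fmt.Printf("looser:   %d.%d.%d\n", M, m, p)
}
```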
87 changes: 87 additions & 0 deletions code/go/internal/cueschema/errors.go
@@ -0,0 +1,87 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

package cueschema

import (
"fmt"
"regexp"
"strconv"
"strings"

cueerrors "cuelang.org/go/cue/errors"
"cuelang.org/go/cue/token"
)

var (
reEmptyDisjunctionErr = regexp.MustCompile(`^(.*): (\d+) errors in empty disjunction`)
reConflictingValuesErr = regexp.MustCompile(`^(.*): conflicting values (.*) and (.*)`)
reRegexpDoesNotMatchErr = regexp.MustCompile(`^(.*): invalid value (.*) \(out of bound =~"(.*)"\)`)
)

// validationErrors transforms cue errors into more human-friendly errors.
func validationErrors(filename string, err error) []error {
var result []error
for i, errs := 0, cueerrors.Errors(err); i < len(errs); i++ {
e := errs[i]
if m := reEmptyDisjunctionErr.FindStringSubmatch(e.Error()); len(m) > 0 {
n, _ := strconv.Atoi(string(m[2]))
builder := validationErrorBuilder{
Filename: filename,
Field: m[1],
Conflicts: errs[i+1 : i+n],
}
err := builder.Build()
result = append(result, err)
i += n
continue
}

if m := reRegexpDoesNotMatchErr.FindStringSubmatch(e.Error()); len(m) > 0 {
fieldName := m[1]
pattern := m[3]
err := fmt.Errorf("field %s: Does not match pattern '%s'", fieldName, pattern)
result = append(result, err)
continue
}

pos := positionForError(e)
err := fmt.Errorf("%s:%d:%d: %s", filename, pos.Line(), pos.Column(), e)
result = append(result, err)
}
return result
}

type validationErrorBuilder struct {
Filename string
Field string
Conflicts []cueerrors.Error
}

func (b *validationErrorBuilder) Build() error {
pos := positionForError(b.Conflicts[0])
var expected []string
var found string
for _, conflict := range b.Conflicts {
m := reConflictingValuesErr.FindStringSubmatch(conflict.Error())
if m == nil {
// Not a "conflicting values" error; skip it instead of panicking on m[2].
continue
}
expected = append(expected, m[2])
if found == "" {
found = m[3]
}
}
return fmt.Errorf("%s:%d:%d: %s: found %s, expected one of: %s",
b.Filename, pos.Line(), pos.Column(), b.Field,
found, strings.Join(expected, ", "),
)
}

func positionForError(err cueerrors.Error) token.Pos {
for _, pos := range err.InputPositions() {
// YAML filename is empty.
if pos.Filename() == "" {
return pos
}
}
return err.Position()
}
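
To make the regex-based rewriting in `errors.go` easier to follow, here is a standalone sketch of how `reConflictingValuesErr` splits a message into field, expected value and found value. The `parseConflict` helper and the sample message are assumptions made for illustration; the exact wording of CUE's error messages may differ between versions, which is also why the no-match case is handled explicitly.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as reConflictingValuesErr in errors.go.
var reConflictingValues = regexp.MustCompile(`^(.*): conflicting values (.*) and (.*)`)

// parseConflict is a hypothetical helper: it extracts the field name, the
// expected value and the found value from a "conflicting values" message.
// ok is false when the message does not match the pattern.
func parseConflict(msg string) (field, expected, found string, ok bool) {
	m := reConflictingValues.FindStringSubmatch(msg)
	if m == nil {
		// FindStringSubmatch returns nil on no match; indexing it would panic.
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	// Sample message, written by hand for the example.
	msg := `format_version: conflicting values "1.0.0" and 1`
	if field, expected, found, ok := parseConflict(msg); ok {
		fmt.Printf("%s: found %s, expected %s\n", field, found, expected)
	}
	if _, _, _, ok := parseConflict("some unrelated error"); !ok {
		fmt.Println("unrelated messages are left for the generic path")
	}
}
```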
155 changes: 155 additions & 0 deletions code/go/internal/cueschema/loader.go
@@ -0,0 +1,155 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

package cueschema

import (
"bytes"
"fmt"
"io/fs"
"io/ioutil"
"os"
"path/filepath"
"strings"

"cuelang.org/go/cue"
"cuelang.org/go/cue/cuecontext"
"cuelang.org/go/cue/load"
cueyaml "cuelang.org/go/pkg/encoding/yaml"

spec "github.com/elastic/package-spec"
ve "github.com/elastic/package-spec/code/go/internal/errors"
"github.com/elastic/package-spec/code/go/internal/spectypes"
"github.com/elastic/package-spec/code/go/internal/yamlschema"
)

// FileSchemaLoader implements schema loading from CUE definitions.
type FileSchemaLoader struct{}

// NewFileSchemaLoader creates a new FileSchemaLoader for CUE definitions.
func NewFileSchemaLoader() *FileSchemaLoader {
return &FileSchemaLoader{}
}

// Load loads a schema from a CUE file in the given filesystem.
func (f *FileSchemaLoader) Load(fsys fs.FS, schemaPath string, options spectypes.FileSchemaLoadOptions) (spectypes.FileSchema, error) {
parts := strings.SplitN(schemaPath, "#", 2)

filePath := parts[0]
definition := ""
if len(parts) > 1 {
definition = "#" + parts[1]
}

d, err := fs.ReadFile(fsys, filePath)
if err != nil {
return nil, err
}

spec, err := loadSpec(d, options)
if err != nil {
return nil, fmt.Errorf("failed to load instance with spec: %w", err)
}

if definition != "" {
spec = spec.LookupDef(definition)
if err := spec.Err(); err != nil {
return nil, fmt.Errorf("failed to find CUE definition %q in %s: %w", definition, filePath, err)
}
}

return &FileSchema{spec, options}, nil
}

// FileSchema is a schema for a file.
type FileSchema struct {
spec cue.Value
options spectypes.FileSchemaLoadOptions
}

// Validate validates that the given file complies with the schema.
func (s *FileSchema) Validate(fsys fs.FS, filePath string) ve.ValidationErrors {
d, err := fs.ReadFile(fsys, filePath)
if err != nil {
return ve.ValidationErrors{err}
}

d, err = yamlschema.ConvertYAMLToJSON(d)
if err != nil {
return ve.ValidationErrors{err}
}

expr, err := cueyaml.Unmarshal(d)
if err != nil {
return ve.ValidationErrors{
fmt.Errorf("failed to parse yaml file %q: %w", filePath, err),
}
}

v := s.spec.Context().BuildExpr(expr, cue.Filename(filePath))
v = v.Unify(s.spec)
errs := v.Validate(cue.Concrete(true))
if errs != nil {
return ve.ValidationErrors(validationErrors(filePath, errs))
}

return nil
}

func loadSpec(specBytes []byte, options spectypes.FileSchemaLoadOptions) (cue.Value, error) {
// This is a hack till https://github.com/cue-lang/cue/issues/607 is solved.
tmpDir, err := os.MkdirTemp("", "package-spec-")
if err != nil {
return cue.Value{}, fmt.Errorf("failed to create tmp dir: %w", err)
}
defer os.RemoveAll(tmpDir)

files := []string{
"cue.mod/module.cue",
"definitions.cue",
}

specFS := spec.FS()
for _, f := range files {
d, err := fs.ReadFile(specFS, f)
if err != nil {
return cue.Value{}, fmt.Errorf("failed to read %q: %w", f, err)
}
dstPath := filepath.Join(tmpDir, f)
if err := os.MkdirAll(filepath.Dir(dstPath), 0755); err != nil {
return cue.Value{}, fmt.Errorf("failed to create directory for %q: %w", dstPath, err)
}
err = ioutil.WriteFile(dstPath, d, 0644)
if err != nil {
return cue.Value{}, fmt.Errorf("failed to write %q for copy of definitions: %w", dstPath, err)
}
}

var specBuffer bytes.Buffer
specBuffer.Write(specBytes)
if sv := options.SpecVersion; sv != nil {
// This way of injection is a bit hacky, but tags require that all defined tags are used.
specBuffer.WriteString("\n")
fmt.Fprintf(&specBuffer, "spec_version_major: %d\n", sv.Major())
fmt.Fprintf(&specBuffer, "spec_version_minor: %d\n", sv.Minor())
fmt.Fprintf(&specBuffer, "spec_version_patch: %d\n", sv.Patch())
fmt.Fprintf(&specBuffer, "spec_version_prerelease: \"%s\"\n", sv.Prerelease())
fmt.Fprintf(&specBuffer, "spec_version: \"%s\"\n", sv.String())
}

specFilePath := filepath.Join(tmpDir, "spec.cue")
err = ioutil.WriteFile(specFilePath, specBuffer.Bytes(), 0644)
if err != nil {
return cue.Value{}, fmt.Errorf("failed to write %q for copy of spec: %w", specFilePath, err)
}

instances := load.Instances([]string{specFilePath}, &load.Config{
Dir: tmpDir,
})
if len(instances) != 1 {
return cue.Value{}, fmt.Errorf("only 1 instance expected, found %d", len(instances))
}

cueCtx := cuecontext.New()
v := cueCtx.BuildInstance(instances[0])
return v, v.Err()
}
20 changes: 20 additions & 0 deletions code/go/internal/cueschema/loader_test.go
@@ -0,0 +1,20 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

package cueschema

import (
"testing"

spec "github.com/elastic/package-spec"
"github.com/elastic/package-spec/code/go/internal/spectypes"
"github.com/stretchr/testify/require"
)

func TestLoadIntegrationManifest(t *testing.T) {
loader := NewFileSchemaLoader()
options := spectypes.FileSchemaLoadOptions{}
_, err := loader.Load(spec.FS(), "integration/manifest.spec.cue", options)
require.NoError(t, err)
}
43 changes: 43 additions & 0 deletions code/go/internal/mixedloader/loader.go
@@ -0,0 +1,43 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

package mixedloader

import (
"fmt"
"io/fs"
"path"
"strings"

"github.com/elastic/package-spec/code/go/internal/cueschema"
"github.com/elastic/package-spec/code/go/internal/spectypes"
"github.com/elastic/package-spec/code/go/internal/yamlschema"
)

// FileSchemaLoader can load schemas from different formats based on the extension of the file.
type FileSchemaLoader struct {
cueschema *cueschema.FileSchemaLoader
yamlschema *yamlschema.FileSchemaLoader
}

// NewFileSchemaLoader builds a new FileSchemaLoader.
func NewFileSchemaLoader() *FileSchemaLoader {
return &FileSchemaLoader{
cueschema: cueschema.NewFileSchemaLoader(),
yamlschema: yamlschema.NewFileSchemaLoader(),
}
}

// Load loads a schema from a file in the given filesystem. It uses a different decoder depending on the
// extension of the file.
func (f *FileSchemaLoader) Load(fs fs.FS, schemaPath string, options spectypes.FileSchemaLoadOptions) (spectypes.FileSchema, error) {
parts := strings.SplitN(schemaPath, "#", 2)
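// path.Ext returns only the final extension (e.g. ".yml" for both "spec.yml"
// and "manifest.spec.yml"), so matching on ".yml" and ".cue" is sufficient.
switch path.Ext(parts[0]) {
case ".yml":
return f.yamlschema.Load(fs, schemaPath, options)
case ".cue":
return f.cueschema.Load(fs, schemaPath, options)
}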
return nil, fmt.Errorf("not implemented loading for %q (decided by extension: %q)", schemaPath, path.Ext(parts[0]))
}
8 changes: 7 additions & 1 deletion code/go/internal/spec_test.go
@@ -14,6 +14,12 @@ import (

func TestBundledSpecsForIntegration(t *testing.T) {
fs := spec.FS()
_, err := fs.Open("1/integration/spec.yml")
_, err := fs.Open("integration/spec.yml")
require.NoError(t, err)
}

func TestBundledSpecsForInput(t *testing.T) {
fs := spec.FS()
_, err := fs.Open("input/spec.yml")
require.NoError(t, err)
}