Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema for configuration file with yamale #4084

Merged
merged 30 commits into from
Jun 8, 2018
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
*.db
*.rdb
*.egg-info
*.log
*.pyc
Expand Down
187 changes: 187 additions & 0 deletions readthedocs/buildconfig/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
"""Validator for the RTD configuration file."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As general feedback, I'm not sure this needs to be broken out into a separate django application path. I'd implement this next to similar code so that it was easy to find. I believe doc_builder/ already has similar code around our config file.

See note below though. I don't think this should be application level yet (pending a design discussion around usage of yamale)


from __future__ import division, print_function, unicode_literals

from os import path

import six
import yamale
from yamale.validators import DefaultValidators, Validator

V2_SCHEMA = path.join(
path.dirname(__file__),
'../rtd_tests/fixtures/spec/v2/schema.yml'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have code outside tests that references tests fixtures. This module should probably live under rtd_tests somewhere instead, as we've only decided to use the library for testing and validating our thoughts around our spec for now.

)

ALL = 'all'

DEFAULT_VALUES = {
'formats': [],
'conda': None,
'build': {
'image': '2.0',
},
'python': {
'version': '3.6',
'requirements': None,
'install': None,
'extra_requirements': [],
'system_packages': False,
},
'sphinx': {
'configuration': None,
'fail_on_warning': False,
},
'mkdocs': {
'configuration': None,
'fail_on_warning': False,
},
'submodules': {
'include': [],
'exclude': [],
'recursive': False,
},
'redirects': None,
}

CONSTRAINTS = {
'Documentation type': {
'unique': ['sphinx', 'mkdocs'],
'default': 'sphinx',
},
'Submodules': {
'unique': ['submodules.include', 'submodules.exclude'],
'default': 'submodules.include',
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults and constraints should probably live where we are actually parsing the configuration file, as I don't think we're testing defaults yet. Testing defaults and constraints will happen at the parsing level, this PR is working on the spec level.



def flatten(dic, keep_key=False, position=None):
"""
Returns a flattened dictionary

Given a dictionary of nested dictionaries, it returns
a dictionary with all nested keys at the top level.

For example

{
'key': 'value',
'nested': {
'subkey': 'value'
},
}

Becomes

{
'key': value,
'nested': {...},
'nested.subkey': 'value'
}
"""
child = {}

for key, value in dic.items():
if isinstance(key, six.string_types):
key = key.replace('.', '_')
if position:
item_position = '{}.{}'.format(position, key)
else:
item_position = key

if isinstance(value, dict):
child.update(
flatten(dic[key], keep_key, item_position)
)
if keep_key:
child[item_position] = value
else:
child[item_position] = value

return child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the underlying need for flattening the dictionary? This seems like it could be an unnecessary step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes it easy to manipulate and check the keys, also is better to describe constraints, for example

https://github.com/stsewd/readthedocs.org/blob/yaml-schema-yamale/readthedocs/buildconfig/__init__.py#L176-L180

We don't have to use recursion or several nested bucles to do things like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear if this is still required. It seems like a confusing way to do a nested dict merge. I think if we are implementing defaults through yamale there is a tighter integration rather than keeping this data in a dictionary.



class PathValidator(Validator):

"""
Path validator

Checks if the given value is a string and a existing
file.
"""

tag = 'path'
constraints = []
configuration_file = '.'

def _is_valid(self, value):
if isinstance(value, six.string_types):
file_ = path.join(
path.dirname(self.configuration_file),
value
)
return path.exists(file_)
return False


class BuildConfig(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't played with yamale, it's not clear how much of this class is needed to validate our specification in unit tests. If this is more for implementing yamale in our application, we should break this out and save it.


"""Wrapper object to validate to configuration file."""

_schema = V2_SCHEMA
_default_values = DEFAULT_VALUES
_constraints = CONSTRAINTS

def __init__(self, configuration_file):
self.configuration_file = configuration_file
self.data = yamale.make_data(self.configuration_file)
self.defaults = flatten(self._default_values, keep_key=True)
self.constraints = self._constraints
self.schema = yamale.make_schema(
self._schema,
validators=self._get_validators()
)

def _get_validators(self):
"""Custom validators for yamale."""
validators = DefaultValidators.copy()
PathValidator.configuration_file = self.configuration_file
validators[PathValidator.tag] = PathValidator
return validators

def check_constraints(self):
"""
Check constraints between keys.

Such as relations of uniquiness and set default values for those.
"""
for subject, constraint in self.constraints.items():
present_keys = [
key
for key in constraint['unique']
if key in self.data[0]
]
default_key = constraint.get('default')
if not present_keys and default_key:
self.data[0][default_key] = self.defaults[default_key]
elif len(present_keys) > 1:
raise ValueError(
'{subject} can not have {keys} at the same time'.format(
subject=subject,
keys=constraint['unique'],
)
)

def set_defaults(self):
"""Set default values to the currently processed data."""
for key, value in self.defaults.items():
if key not in self.data[0]:
self.data[0][key] = value

def validate(self):
"""Validate the current configuration file."""
self.check_constraints()
self.set_defaults()
yamale.validate(self.schema, self.data)
return self.data[0]
96 changes: 96 additions & 0 deletions readthedocs/rtd_tests/fixtures/spec/v2/schema.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
# Read the Docs configuration file

# The version of the spec to be use
version: enum('2')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this spec will be used only on the version 2, if there is another version, we will need to create another schema.

Also, I was thinking that we can use this to validate the current v1 API. But I'm not sure if that's worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think we can keep v1 as it is.


# Formats of the documentation to be built
# Default: []
formats: any(list(enum('htmlzip', 'pdf', 'epub')), enum('all'), required=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only valid for sphinx, I wonder if we can move this to the sphinx key, or maybe we want to support this #1939? (still no pdf for mkdocs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only valid for sphinx for now. We might have other engines that produce a subset of these later. I'm fine with this being top level


# Configuration for Conda support
conda: include('conda', required=False)

# Configuration for the documentation build process
build: include('build', required=False)

# Configuration of the Python environment to be used
python: include('python', required=False)

# Configuration for sphinx documentation
sphinx: include('sphinx', required=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the Documentation type I have added a sphinx key. We want to still support mkdocs, right? If so, do you prefer the documentation type key back or having a mkdocs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use this to extend the build and support things like #1139

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. It does seem reasonable to have a sphinx and mkdocs section, and just always default to Sphinx unless we see a mkdocs section or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two different sections could be useful be extended in the future.

In case this works as Eric mentioned, we will need a check to to accept one or the other, but not both.


# Configuration for mkdocs documentation
mkdocs: include('mkdocs', required=False)

# Submodules configuration
submodules: include('submodules', required=False)

# Redirects for the current version to be built
# Key/value list, represent redirects of type `type`
# from url -> to url
# Default: null
redirects: map(enum('page'), map(str(), str()), required=False)

---

conda:
# The path to the Conda environment file from the root of the project
environment: path()

build:
# The build docker image to be used
# Default: '2.0'
image: enum('1.0', '2.0', 'latest', required=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per readthedocs/readthedocs-docker-images#62, this will be changing:

  • The default should be latest
  • stable is also an allowed tag
  • Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)
  • We have versions 3.0 and 4.0rc1 (soon to be 4.0), we could have a 5.0 before this spec changes at all. I don't know how we plan to handle this (update the spec as images are released, and don't bump the version number of the spec?). At least we should include current images here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need to update the spec without increasing the version number. If there is a v3 spec we can stop updating the images there and add new ones in the most recent spec.

Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)

But it is is our docs https://docs.readthedocs.io/en/latest/yaml-config.html#build-image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's changing. We don't want to support numeric versions directly anymore. See the discussion in readthedocs/readthedocs-docker-images#62 for more background.

I think we just need to update the spec without increasing the version number.

Yup, agreed. I think this just needs our current images then.


python:
# The Python version
# Default: '3.6'
version: enum('2', '2.7', '3', '3.5', '3.6', required=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do here. This is dependent on the docker image. Our current latest image (our default image will soon be latest) has 2.7, 3.3, 3.4, 3.5, and 3.6. The 4.0 image has 2.7, 3.5, 3.6, and should probably have 3.7 soon.

I think verifying the python version list is a config parsing level validation, we don't necessarily need to worry about validating across fields as part of this exercise. Perhaps here, for our spec discussion, we just put all possible python versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps here, for our spec discussion, we just put all possible python versions?

I'd say yes


# The path to the requirements file from the root of the project
# Default: null
requirements: path(required=False)

# Install the project using python setup.py install or pip
# Default: null
install: enum('pip', 'setup.py', required=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first breaking change :), I think is more simple to do this, instead of having a boolean field for each

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the extra requirements field just works when the project is installed using pip, but setup.py doesn't allow this too? 🤔


# Extra requirements sections to install in addition to the package dependencies
# Default: []
extra_requirements: list(str(), required=False)

# Give the virtual environment access to the global site-packages dir
# Default: false
system_packages: bool(required=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't 100% sure if this should belong in the python key

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly want this option to go away, but I don't know if we can do it quite yet. I think it can go in python though.


sphinx:
# The path to the conf.py file
# Default: rtd will try to find it
configuration: path(required=False)

# Add the -W option to sphinx-build
# Default: false
fail_on_warning: bool(required=False)

mkdocs:
# The path to the mkdocs.yml file
# Default: rtd will try to find it
configuration: path(required=False)

# Add the --strict optio to mkdocs build
# Default: false
fail_on_warning: bool(required=False)


submodules:
# List of submodules to be included
# Default: []
include: any(list(str()), enum('all'), required=False)

# List of submodules to be ignored
# Default: []
exclude: any(list(str()), enum('all'), required=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to mark these include and exclude options mutually exclusive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nop, there isn't :/, I'll see if there is a way to implement a custom validator to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine we'll probably need some kind double validation:

  • First validation is whether it passes the schema
  • Second will be whether the values are actually correct in RTD (eg. a redirect with a nonsense string, or mutually exclusive options (eg. sphinx & mkdocs) and similar. Something like field-level and model-level form validation.


# Do a recursive clone?
# Default: false
recursive: bool(required=False)
Loading