Create simple non-XML OSCAL development language #203

Closed
redhatrises opened this issue Jun 18, 2018 · 27 comments
Labels
help wanted · Scope: Modeling (Issues targeted at development of OSCAL formats) · User Story

Comments

@redhatrises
Contributor

User Story:

As an OSCAL developer, maintainer, new member, potential member, new user, etc., I want a simple non-XML OSCAL specification development language that makes it easy to contribute to, extend, and use the OSCAL spec. Taking the multiple sources of feedback from SCAP community users about the difficulty of the SCAP specification(s) as an example, the development language of the specification should be non-XML, to enable broader usage and acceptance of the specification and of its development.

Goals:

In addition to making OSCAL easy to consume, the goal is to make it easy and quick to contribute to the development of the OSCAL specification, and to increase OSCAL's visibility and use in the community.

Dependencies:

Nothing obvious

Acceptance Criteria

  • A simple development language such as YAML, Markdown, Python Dictionaries, or something else that makes it simple to advance the development and acceptance of the OSCAL specification to broader audiences than just government compliance audiences.
@trevor-vaughan

Personally, I have no issue with XML at all, and it's actually a great language for transforms into multiple output formats without requiring additional processing.

The issue with SCAP was not XML. The difficulty was that there were no useful GUIs created for manipulating the XML successfully.

To enable broad adoption, it has to be easy to use, regardless of the underlying language. For instance, I can't imagine security officers hand editing YAML on a daily basis.

This can't just be for developers, this needs to be for everyone.

That said, the XML can be carefully constrained in such a way that a markup language can be used to have an 'easy format' (which appears to be what is happening with the JSON markup) and the XML is used for the authoritative source. This way you don't have to reinvent includes or anything else.

I would also highly discourage the use of a specific programming language since different systems have different requirements that may, or may not, be able to be modified.

@david-waltermire
Contributor

@trevor-vaughan We have an OSCAL JSON format as an XML alternative. I have nothing against the development of a YAML-to-JSON pipeline, if someone wants to volunteer to work on this issue. I have marked this issue as "Help Wanted" to indicate this.

Personally, I believe the way to broad adoption is through tooling. If we want OSCAL to be usable for a layperson, then we need tooling that takes the need to write content in any format out of the picture. YAML is still a developer language. Its syntax is easier than XML or JSON, but it still requires a developer-level understanding of working with data formats.

This project is focused on data formats and is programming language agnostic. XML, JSON, and YAML can be created and processed by most widely used languages out there. I don't see a reason to be restrictive in this regard.

Agree totally with your points on XML. To your point about SCAP, the lack of GUIs could be the result of SCAP being based on complex formats. The underlying format complexities made creating GUI tools difficult. IMHO, creating simpler formats is a key step towards more tooling.

@redhatrises
Contributor Author

@david-waltermire-nist this request is to make this repo (not the resultant schema content) non-XML. Sorry for the confusion, as I don't think that I was clear originally.

> The issue with SCAP was not XML. The difficulty was that there were no useful GUIs created for manipulating the XML successfully.

This is actually not true either. Multiple feedback sources have expressed this difficulty for SCAP and explicitly attributed it to SCAP being in XML. XML also makes tooling much harder to create, hence the lack of useful GUIs and tools.

> This can't just be for developers, this needs to be for everyone.

It for sure does need to be for everyone; however, the content has to be simple for non-GUI AND GUI formats. XML is really not that, especially when we talk about not using XML for machine language (which it shouldn't be used for).

> For instance, I can't imagine security officers hand editing YAML on a daily basis.

From my experience at multiple operations sites across the military, government agencies, and system integrators, they do hand edit, and they want that capability. However, as you said, it cannot be limited to hand editing; GUI tools are needed as well. Tooling is important, but the base format has to be simpler; otherwise the lack of adoption, tooling, and GUIs will persist.

Anyway, this issue is to take what is in this repo (not people developing) and make the content non-XML for faster development purposes as well as have this repo automatically generate the schema in the various output formats like JSON, YAML, XML, new_format, etc.

@david-waltermire
Contributor

@redhatrises I don't think that we need to make the repo non-XML to address your concerns.

A key philosophy of this project is to be multi-format. Our priority has been to make equivalent formats for all of our models in XML/XHTML and JSON/Markdown. Both formats are first-class OSCAL representations, allowing seamless translation between the two. This provides for the following:

  1. An implementer can choose which format they want to support, and OSCAL content can be easily translated from the opposite format.
  2. We can use best of breed tools for data production and transformation. XML has much more capability in this space vs. JSON.
  3. Using mechanical, automated processes, we can provide Catalog and Profile content in both formats. This allows for easy creation and maintenance of examples in one format, and then automatic translation to the other.
  4. Documentation on the OSCAL models can be largely shared between the two formats.

We use XML at NIST as our development format, not because we prefer it, but because we have resources and good tooling to develop using it. Other contributors can use JSON or another format if they prefer. Given the ability to translate between the two (and perhaps others), I don't think it matters in the end. We are close to having the ability for the repo to automatically generate the schema in the various output formats like JSON, XML, and new formats like YAML.

We haven't picked a preferred format for OSCAL yet. If we did, my preference would probably be for JSON, given the momentum in the industry around its use. My preference is to defer on picking a preferred format for the time being until we better understand as a community how OSCAL will be used. What the community prefers will become crystal clear in time.

As I mentioned before, we are open to supporting YAML as well. We would need to work out if YAML could be a 3rd first-class citizen (alongside XML and JSON), or a useful subset. At the moment, the sprint team doesn't have the bandwidth to add a new format into the mix. @redhatrises would you be willing to contribute work on this? We would be happy to add you to our sprint team if you are willing to take on regular work on the topic. If not, you are also welcome to work on issues on your own time.

@trevor-vaughan

@david-waltermire-nist As an aside, the only reason I like YAML over JSON is the ability to add comments. It's also why I don't mind simple, well-formatted XML.

Simple XML isn't an issue; it's when teams try to drag the world in that things become a mess.

@redhatrises
Contributor Author

> At the moment, the sprint team doesn't have the bandwidth to add a new format into the mix. @redhatrises would you be willing to contribute work on this? We would be happy to add you to our sprint team if you are willing to take on regular work on the topic. If not, you are also welcome to work on issues on your own time.

@david-waltermire-nist sorry for the late reply on this, but I would definitely be willing to contribute work on this as well as join the sprint team. Let me know what I need to do.

@david-waltermire
Contributor

david-waltermire commented Jun 28, 2018 via email

@redhatrises
Contributor Author

@david-waltermire-nist sent you an email.

@redhatrises
Contributor Author

@david-waltermire-nist bump... wanted to check in to see about the meeting invites for the sprint team.

@anweiss
Contributor

anweiss commented Oct 4, 2018

@redhatrises are you thinking of some sort of high-level DSL, so to speak?

@redhatrises
Contributor Author

> @redhatrises are you thinking of some sort of high-level DSL, so to speak?

@anweiss could be for sure. More so thinking something like:

OSCAL development spec language (yaml for example) -> "build system" (python for example)
    -> {oscal.xml, oscal.json, oscal.yaml, ...} official OSCAL spec in different languages
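
The pipeline outlined above could look roughly like the following in Python. This is only a minimal sketch, not the project's build system: the `SPEC` dictionary stands in for a parsed YAML development file (for example, what PyYAML's `yaml.safe_load` would return), and the names `SPEC`, `to_json`, `to_xml`, and `build`, as well as the `spec`, `model`, and `field` element names, are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parsed YAML development file
# (e.g. what yaml.safe_load would return); not an actual OSCAL definition.
SPEC = {
    "catalog": {
        "description": "A collection of controls.",
        "fields": {
            "id": {"type": "str", "required": True},
            "title": {"type": "str", "required": True},
        },
    }
}


def to_json(spec):
    """Render the development spec as a JSON document."""
    return json.dumps(spec, indent=2, sort_keys=True)


def to_xml(spec):
    """Render the same spec as XML, one element per model and field."""
    root = ET.Element("spec")
    for model_name, model in spec.items():
        model_el = ET.SubElement(root, "model", name=model_name)
        ET.SubElement(model_el, "description").text = model["description"]
        for field_name, props in model["fields"].items():
            ET.SubElement(
                model_el,
                "field",
                name=field_name,
                type=props["type"],
                required=str(props.get("required", False)).lower(),
            )
    return ET.tostring(root, encoding="unicode")


def build(spec):
    """Produce every output format from the single source spec."""
    return {"oscal.json": to_json(spec), "oscal.xml": to_xml(spec)}


if __name__ == "__main__":
    for filename, text in build(SPEC).items():
        print(f"--- {filename} ---")
        print(text)
```

The point of the design is that contributors edit only the single source definition, and each output format is just another renderer added to `build`.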

@anweiss
Contributor

anweiss commented Oct 9, 2018

Gotcha. So the metaschema aims to provide for this to a certain extent. However, what I think you're after is more for vendors and system owners to more easily describe their implementation in a high-level, human-readable language (e.g. yaml), which can then be processed into formal OSCAL XML and/or JSON .... somewhat akin to how the OpenSCAP SSG works, no?

@shawndwells

@anweiss exactly! Creating SCAP by hand was an incredible inhibitor, so the shorthand to create XCCDF now looks like this:

https://github.com/ComplianceAsCode/content/blob/master/applications/openshift/etcd/etcd_cert_file/rule.yml

documentation_complete: true

title: 'Ensure That The etcd Client Certificate Is Correctly Set'

description: |-
    To ensure the <tt>etcd</tt> service is serving TLS to clients,
    edit the <tt>etcd</tt> configuration file
    <tt>/etc/etcd/etcd.conf</tt> on the master and adding a certificate
    to <tt>ETCD_CERT_FILE</tt>:
    <pre>ETCD_CERT_FILE /etc/etcd/server.crt</pre>
rationale: |-
    Without cryptographic integrity protections, information can be
    altered by unauthorized users without detection.
severity: medium

references:
    cis: 1.5.1

ocil_clause: 'the etcd client certificate is not configured'

ocil: |-
    Run the following command on the master node(s):
    <pre>$ grep ETCD_CERT_FILE /etc/etcd/etcd.conf</pre>
    Verify that there is a certificate configured.

A build system (Python scripts) assembles the individual pieces into a SCAP specification-correct datastream. Once this shorthand format was implemented, humans could start easily creating content. This was key to growing the content developer community.
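
As a rough illustration of that assembly step, the sketch below turns one parsed rule (a Python dictionary holding a few of the keys from the rule.yml above) into an XCCDF-style `Rule` element. It is not the ComplianceAsCode build code: the function name `rule_to_xccdf`, the `rule_id` argument, and the simplified, namespace-free element layout are assumptions made for illustration, and the output is not a complete or schema-valid SCAP datastream.

```python
import xml.etree.ElementTree as ET

# A few keys from the rule.yml above, as a YAML parser would load them.
RULE = {
    "title": "Ensure That The etcd Client Certificate Is Correctly Set",
    "rationale": (
        "Without cryptographic integrity protections, information can be\n"
        "altered by unauthorized users without detection."
    ),
    "severity": "medium",
    "references": {"cis": "1.5.1"},
}


def rule_to_xccdf(rule_id, rule):
    """Build a simplified, namespace-free XCCDF-style Rule element."""
    rule_el = ET.Element("Rule", id=rule_id, severity=rule["severity"])
    ET.SubElement(rule_el, "title").text = rule["title"]
    ET.SubElement(rule_el, "rationale").text = rule["rationale"]
    for source, value in rule.get("references", {}).items():
        # Hypothetical mapping: one <reference> per entry, source kept as an attribute.
        ET.SubElement(rule_el, "reference", source=source).text = str(value)
    return rule_el


if __name__ == "__main__":
    element = rule_to_xccdf("etcd_cert_file", RULE)
    print(ET.tostring(element, encoding="unicode"))
```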

@redhatrises
Contributor Author

> Gotcha. So the metaschema aims to provide for this to a certain extent. However, what I think you're after is more for vendors and system owners to more easily describe their implementation in a high-level, human-readable language (e.g. yaml), which can then be processed into formal OSCAL XML and/or JSON .... somewhat akin to how the OpenSCAP SSG works, no?

That is one of the outcomes for sure but not the intention of this issue. Understood about the metaschema, but the bigger intention is to facilitate an easier onboarding and development of the schema itself from outside contributors and vendors. Starting to hear more and more from US Gov integrators and others how they would love to contribute and help to define and possibly extend the schema but won't because infrastructure is XML. So, more so trying to suggest an idea that could be accepted or modified to the benefit of everyone.... not saying, of course, that it has to be accepted or even desired by the project.

@anweiss
Contributor

anweiss commented Oct 9, 2018

@shawndwells ok cool!

@redhatrises agreed in that the XML-based metaschema may be a hindrance for folks wanting to contribute to and/or extend the schema itself. @wendellpiez I wonder if we can provide equivalent .yaml representations of the metaschemas that are a bit more approachable to the community? The metaschema model itself is pretty simplistic and a .yaml equivalent would be a nice way of putting the metaschema design notes into something a bit more tangible for folks that are looking to enhance/extend the schemas. Open to any thoughts/suggestions though.

@redhatrises
Contributor Author

Here's a draft sample of a possible yaml representation of a catalog that could be used:

catalog:
  name: Catalog
  description: |-
    A collection of controls. Catalogs may use <code>section</code> to subdivide the textual contents of a catalog.

  xmlns:
    # "default" is an assumed key name; the draft attached the URI directly
    # to xmlns, which is not valid YAML once a nested "type" follows it.
    default: "http://csrc.nist.gov/ns/oscal/1.0"
    type: str
  id:
    required: True
    type: str
  model-version:
    type: str
  title:
    type: str
    required: True
  declarations:
    name: Declarations
    description: "Either a reference to a declarations file, or a set of declarations"
    href:
      type: str
  references:
    name: References
    description: "A group of reference descriptions"
    id:
      type: str
    refs:
    - id:
        # nested under id, matching the other field definitions
        type: str
      citations:
      - name: Citations
        description: "Citation of a resource"
        id:
          type: str
        href:
          type: str
        value:
          type: str
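
One way to see how a definition like this draft could drive tooling is a small checker that reads the `type` and `required` entries and tests an instance against them. The sketch below is an illustration under stated assumptions, not an existing OSCAL or YAML validator: `CATALOG_DEF` copies only three fields from the draft, and `TYPE_MAP`, `validate`, the sample instances, and the error messages are all hypothetical.

```python
# Three field definitions taken from the draft above, as a YAML parser
# would load them (YAML 1.1 loaders read "True" as a boolean).
CATALOG_DEF = {
    "id": {"required": True, "type": "str"},
    "model-version": {"type": "str"},
    "title": {"type": "str", "required": True},
}

# Hypothetical mapping from the draft's type names to Python types.
TYPE_MAP = {"str": str, "int": int, "bool": bool}


def validate(instance, definition):
    """Return a list of error messages; an empty list means the instance passed."""
    errors = []
    for name, rules in definition.items():
        if name not in instance:
            if rules.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        expected = TYPE_MAP[rules["type"]]
        if not isinstance(instance[name], expected):
            errors.append(f"field {name} should be of type {rules['type']}")
    for name in instance:
        if name not in definition:
            errors.append(f"unknown field: {name}")
    return errors


if __name__ == "__main__":
    good = {"id": "example-catalog", "title": "Example Catalog"}
    bad = {"title": 53, "owner": "nobody"}
    print(validate(good, CATALOG_DEF))
    print(validate(bad, CATALOG_DEF))
```

A checker this small covers only presence and scalar types; cardinality, nesting, and mixed content would each need their own rules.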

@secautobuilder

@redhatrises How do you envision controls, subcontrols, and parameters to be included in this?

@redhatrises
Contributor Author

> @redhatrises How do you envision controls, subcontrols, and parameters to be included in this?

This is only a sample of the catalog, but it would be similar to the references key. I am toying with a couple of different implementations at the moment and will report back with something more concrete.

@wendellpiez
Contributor

This is interesting. Do we have a processor capable of validating instances against such a schema? I am not against YAML; indeed, I guess I like it as well as or better than JSON. My question has more to do with its expressiveness with respect to questions such as labeling, typing, cardinality, how to handle arbitrary mixed content, and so on. One reason for the metaschema is that it gives us a framework within which to explore these issues before having to make a commitment.

FWIW I don't think there is (or must be) a perfect syntax for every level of the stack here. I could easily see preferring XML for catalog production and maintenance, JSON and XML for (faceted) catalog and profile publication, and a mix of YAML/md for the raw interfaces to users in an implementation layer (with sufficient control of the format to make it capable of pipelining into a syntax-neutral back end).

Bottom line is, I would love to see running code performing useful operations in whatever format; I think there can be a mix. (Indeed I take it as axiomatic that one reason to share a common format is so that people can do their own thing at home.) Then too, if one purpose for the metaschema is to provide information for a return trip from JSON to XML, we ought to be able to do the same thing with YAML data.

@anweiss
Contributor

anweiss commented Oct 16, 2018

At the end of the day, I think this is going to come down to how the artifacts are being produced and edited. If the data is being compiled by hand, then yaml certainly needs to be considered as an option as human readability was one of the original design goals of the yaml spec in the first place. But if the data is being automatically generated and consumed by external tooling, then XML or JSON will likely be preferred for that data exchange. If we can properly address #200, then there should be no reason why we couldn't allow for a yaml representation of data with constraints on the functionality one gets.

@wendellpiez
Contributor

My point was that while I am interested in cleaner/nicer/better syntaxes in general, in theory, and in principle, I am less interested in practice and day-to-day in specifications without tooling. Show me how you do the validations over YAML and I'm with you. There are plenty of XML folks who don't like XSD either, but it delivers on certain functional requirements like gateway validation of data instances.

I see three general functional areas for schemas:

  • validation - supporting queries returning "valid/not valid" where the "valid" property is meaningful and verifiable, can be (is) specified externally to tooling, and supports predictability of conforming instances for processing in general and in unseen environments
  • documentation - providing context and info for developers and yes sometimes users
  • tools configuration (either compile-time or runtime) for supporting certain functionalities e.g. mappings or user interfaces

On the XML stack, various schema technologies have been delivering all three since prehistory (aka SGML). And we are already using the metaschema architecture for all three, by way of XSD, Schematron and XSLT. Indeed @anweiss you are driving tools in Golang with it, so how hard can it be? Transpose metaschema to the internal model of your choice: why is the syntax a blocker?

So, definitely let's experiment and develop new approaches, but keep an eye on context also. Do these tools exist for YAML or do we propose to develop them? (Or would a YAML schema be parsed into an XML data object and treated as if it had been XML all along? Heh.)

@anweiss
Contributor

anweiss commented Oct 17, 2018

YAML can be validated with JSON schema ... example here includes a list of YAML validation options and tooling -> https://json-schema-everywhere.github.io/yaml. However, any superset features of YAML that don't exist in JSON (e.g. references, anchors, extends, etc) must be processed into equivalent JSON first before JSON schema can be used to validate the data.

@david-waltermire
Contributor

10/25/2018 Status

@redhatrises Will start to investigate representing the SP 800-53 rev4 catalog in YAML.

@redhatrises
Contributor Author

Draft SP800-53 catalog. Doesn't pass yamllint yet.

sp800-53_catalog.tar.gz

@david-waltermire
Contributor

11/1/2018 Status

Gabe will create a [WIP] labeled pull request with these interim artifacts.

@iMichaela
Contributor

11/08/2018

This issue will be resolved when the PR #262 is reviewed and accepted.

@iMichaela
Contributor

PR #262 was approved and merged. Under #262, the first draft of the SP 800-53 rev4 catalog represented in YAML was generated. The conversion was done from JSON. The team is encouraged to provide additional feedback on #262.

@david-waltermire added the "Scope: Modeling" label (Issues targeted at development of OSCAL formats) on May 9, 2019

8 participants