
Dataset Versioning in Catalogs #961

Open · 3 tasks
ohsh6o opened this issue Jun 11, 2021 · 16 comments
Labels: Aged (issues older than 2023-01-01), enhancement, question, User Story
@ohsh6o (Contributor) commented Jun 11, 2021

User Story:

As an OSCAL developer, I want to explicitly identify which dataset (e.g. NIST 800-53 or ISO-27001) and which version of that dataset (respectively 4.0 and 5.1; ISO/IEC 27001:2013 and ISO/IEC 27001:2018) is the source of a catalog or resolved profile catalog, without relying on human interpretation of semantic context or on external file and directory naming.

Goals:

When processing catalogs in software, especially beyond the pre-existing oscal-content resources and the SP 800-53 baselines, ETL pipelines cannot rely on explicit file and directory names. When introspecting the content of an OSCAL catalog or resolved profile catalog, the closest one can come to determining that the content is, say, "800-53 Revision 5, version 5.1, from NIST specifically" is to parse file names or free-form title text values. This is achievable, but runs counter to the objectives of OSCAL as structured, machine-readable content.

First-Order Question and Goal

First-order problem: within a given document, how do we determine the origin (provenance) of the referenced source material, and that origin document's version?

  • We currently have //metadata/version
  • There may not be a defined way to determine provenance
  • document-id exists, but may not be sufficient
  • Another open question: “Is this document 800-53?” “Or 800-53 Rev 5?” “Is it ISO-27001?” “ISO/IEC 27000:2018 or ISO/IEC 27000:2013?”
  • Maybe we need a series or dataset prop to clarify questions like “is this document 800-53?”
    • Combine it with version to answer questions like “Or 800-53 Rev 5?” and “ISO/IEC 27000:2018 or ISO/IEC 27000:2013?”

Second-Order Question and Goal

Second-order question: in a resolved profile catalog, how do I know the provenance of the profile from which that catalog was resolved?

Dependencies:

Acceptance Criteria

  • All OSCAL website and readme documentation affected by the changes in this issue have been updated. Changes to the OSCAL website can be made in the docs/content directory of your branch.
  • A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
  • The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.

{The items above are general acceptance criteria for all User Stories. Please describe anything else that must be completed for this issue to be considered resolved.}

ohsh6o pushed a commit to ohsh6o/OSCAL that referenced this issue Jun 11, 2021
@ohsh6o (Contributor, Author) commented Jun 11, 2021

We discussed this today in the model meeting, and a few pieces of feedback and other impressions on this work came up.

  • Is this a deficiency in the current metadata model: yes, no, or maybe?
  • Can we just add additional prop elements to metadata for dataset-name, dataset-version, and the like?
  • @rgauss had some interesting insights into potentially refining this further to match a Maven-like POM structure with group, artifactId, and version.
    • More interesting still: could this be the first part of an open public catalog/profile registry? Going further with Maven-like metadata would take it a step further.
  • @mosi-k-platt had some interesting insights into the applicability here for UCF, but I know of them more with respect to accredited mapping of compliance controls; I would like to hear more about how this fits in.
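For reference, the Maven POM analogy raised above uses coordinates like the following (a standard Maven dependency stanza; the mapping of OSCAL content onto these coordinates is purely illustrative):

```xml
<!-- Maven coordinates: group, artifact, version -->
<dependency>
    <groupId>gov.nist.csrc</groupId>      <!-- who publishes the dataset -->
    <artifactId>sp800-53</artifactId>     <!-- which dataset -->
    <version>5.1</version>                <!-- which version of it -->
</dependency>
```

The appeal of this shape is that registries and dependency tooling can resolve, cache, and compare artifacts purely from the coordinate triple.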

@david-waltermire (Contributor):

@ohsh6o Do you have an update to this proposal based on your notes above?

@ohsh6o (Contributor, Author) commented Jul 9, 2021

> @ohsh6o Do you have an update to this proposal based on your notes above?

As of the last model meeting we had, I believe the consensus was to keep as close to the current v1.0.0 models (with an o:prop) as possible. The main issue is the profile resolution pipeline changes. I presume we can discuss this in the meeting today, @david-waltermire-nist? :-)

@david-waltermire (Contributor):

@ohsh6o Can you create a concrete change proposal listing each property to add, with a corresponding definition for each? We talked about organization name (or party reference), dataset-name, and dataset-version at the 7/9/2021 model review.

How do we point to the dataset-source? Maybe by a reference to a backmatter resource? This would allow a cryptographic hash to be included.

How do we handle multiple source datasets? Perhaps by multiple backmatter references?

@ohsh6o Will draft an updated proposal that will address this. @david-waltermire-nist will assist.
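One possible shape for answering both questions in the existing v1.0.0 back-matter model (UUID, URL, and digest below are placeholders): each source dataset gets its own resource, and the rlink can carry a hash for integrity checking.

```xml
<back-matter>
    <!-- sketch: one resource per source dataset; multiple resources
         would cover multiple source datasets -->
    <resource uuid="11111111-2222-4333-8444-555555555555">
        <title>NIST SP 800-53 Revision 5 Catalog</title>
        <rlink href="https://example.org/NIST_SP-800-53_rev5_catalog.xml"
               media-type="application/xml">
            <!-- placeholder digest: lets consumers verify the referenced source -->
            <hash algorithm="SHA-256">0000000000000000000000000000000000000000000000000000000000000000</hash>
        </rlink>
    </resource>
</back-matter>
```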

@ohsh6o (Contributor, Author) commented Jul 23, 2021

Following up on this comment in anticipation of tomorrow's meeting, I am going to recommend:

  • Add a link/@rel="dataset-source" for catalogs and profiles and point it to one or more back-matter/resources.
  • Add the following props on the back-matter/resources to match previously discussed expectations:
    • @name="dataset" @class="collection" to represent a grouping, such as how NIST has Special Publications (SP) versus Internal Reports (IR).
    • @name="dataset" @class="name" to give the specific dataset a particular name.
    • @name="dataset" @class="version" to give the dataset a logical version, such as "5" for Revision 5.
    • @name="dataset" @class="organization", which I am torn over: either a UUID referencing the relevant /metadata/party/@uuid, or reverse-DNS notation to allow optionally grouping by the organization that provided the source material. I prefer the latter, since it might allow building primitives similar to dependency/package-management tools.

This recommendation would allow for 0 to ∞ dataset-source elements and, with recommended changes to profile resolution, would permit rudimentary tagging of dataset provenance for one or more sources, optionally tracing it through successive hops of resolution (though that can also be retrieved from a previous hop, if preferred).

<resource uuid="example-uuid">
    <prop name="dataset" class="collection" value="Special Publication"/>
    <prop name="dataset" class="name" value="800-53"/>
    <prop name="dataset" class="version" value="5"/>
    <prop name="dataset" class="organization" value="gov.nist.csrc"/>
</resource>
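Under this recommendation, the catalog's metadata would then point at that resource via the proposed rel value (a sketch; dataset-source is the value proposed above, not a currently defined link relation):

```xml
<metadata>
    <!-- ...existing metadata fields... -->
    <link rel="dataset-source" href="#example-uuid"/>
</metadata>
```

The fragment href resolves to the back-matter resource carrying the dataset props, so provenance travels with the document itself.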

I have prepared some example code and a presentation for tomorrow.

@ohsh6o (Contributor, Author) commented Aug 20, 2021

Very interested in how this will figure into the profile resolution updates with #954. It appears this pertains to the Second Order Question and Goal in this issue.

@david-waltermire-nist and @wendellpiez , can we make time in the coming week to discuss feedback on this proposal? I went on leave around the time Dave got back, and I was not sure when we could pick up the technical feedback from your end and work towards realizing this into implementation.

@wendellpiez (Contributor):

@ohsh6o let's not confuse metadata describing an entity "in the world" (such as a person, place, thing or document) such as "Rev 5 of SP800-53 as published by NIST", with requirements for traceability in the stricter sense, that when an OSCAL catalog is inspected, it can be seen (in applicable cases) to reference a 'document' (or 'serialized instance') (somewhere else) that "turns out" to be a profile that produces that catalog.

Over and above this, whatever metadata you choose to put into either your catalog(s) or your profile(s), as OSCAL instances, is perfectly fine. But such metadata addresses a different set of requirements (even if still a requirement for 'traceability' in a broader sense).

So a FedRAMP profile might well have to say "I am based on Pub X" (with a link) and also, you might want a catalog produced by that profile to be able to point back to the profile, just for traceability/validability.

@ohsh6o (Contributor, Author) commented Aug 20, 2021

> @ohsh6o let's not confuse metadata describing an entity "in the world" (such as a person, place, thing or document) such as "Rev 5 of SP800-53 as published by NIST", with requirements for traceability in the stricter sense, that when an OSCAL catalog is inspected, it can be seen (in applicable cases) to reference a 'document' (or 'serialized instance') (somewhere else) that "turns out" to be a profile that produces that catalog.

Hence I called it a second order question. :-)

> Over and above this, whatever metadata you choose to put into either your catalog(s) or your profile(s), as OSCAL instances, is perfectly fine. But such metadata addresses a different set of requirements (even if still a requirement for 'traceability' in a broader sense).

> So a FedRAMP profile might well have to say "I am based on Pub X" (with a link) and also, you might want a catalog produced by that profile to be able to point back to the profile, just for traceability/validability.

More generally, why do I keep pushing for this? I would like the second-order question addressed so we can have graphs of how people derive catalogs and profiles from one another, just like GitHub.

[Screenshot: "Screen Shot 2021-08-20 at 2 49 10 PM"]

And then, once I have that, I want to be able to filter and collect all those that are notionally based on the same dataset: give me all the graphs of separate, unrelated catalogs that believe they are notionally derived from 800-53, OSCAL or not. I think advancing this requires the dataset props and the provenance linkage, so that people can build tools this way even when the catalogs are not distinctly related to each other. I envision tools for analyzing public catalogs this way; internal tooling that is more scoped will have the same need as well.

But to be clear, I added a comment to ask what deficiencies there are in the proposal of the dataset properties and how to move that forward, not to discuss the implications of the relationship between #961 and #954.

@wendellpiez (Contributor):

@ohsh6o given what you are saying you would like to accomplish or enable, I think the requirements here are actually pretty open-ended. Especially since I also think there are other approaches to assessing and understanding provenance (actual, purported, assumed or inferred) than those that accept claims made in the metadata at face value (however useful that info might be). Given this, as usual, I am inclined to a minimalistic approach. So the question 'what have I left out?' may not be all that useful; the question should be 'do I have what I need for now?'

@aj-stein-nist (Contributor):

Moving to Sprint 61.

@aj-stein-nist (Contributor):

FedRAMP PMO expressed interest in a feature similar to this. Since this is in the sprint, I will discuss expectations and timelines with them at the next sync meeting, as this is a smallish change.

@GaryGapinski:

One shortcoming noticed in FedRAMP OSCAL usage is that neither metadata/version nor metadata/oscal-version within a system-security-plan is currently sufficient to identify the related version of NIST SP 800-53 (as adopted by FedRAMP) with which the SSP was prepared. It is unlikely that metadata/version could be used to infer the 800-53 version, since its meaning is something unrelated.

So some other manner of association will be necessary. import-profile appears inadequate as its target will have analogous shortcomings.
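A sketch of that shortcoming (all values illustrative): in an SSP, metadata/version tracks the SSP document itself and metadata/oscal-version tracks the OSCAL release, so only the free-form file name of the imported profile hints at the 800-53 revision.

```xml
<system-security-plan uuid="99999999-8888-4777-8666-555555555555">
    <metadata>
        <title>Example System SSP</title>
        <version>0.3</version>               <!-- version of this SSP, not of 800-53 -->
        <oscal-version>1.0.4</oscal-version> <!-- OSCAL release, not 800-53 either -->
    </metadata>
    <!-- the only hint at "rev4" is the file name embedded in the URL -->
    <import-profile href="https://example.org/FedRAMP_rev4_MODERATE-baseline_profile.xml"/>
</system-security-plan>
```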

@wendellpiez (Contributor):

Gary, this makes sense. The open-endedness of the requirement is not (really) a reason not to do it.

What would be best, or possibly some combination?

Let's limit it to the SP 800-53 set of catalogs. There are various ways this could be done: tagging to GitHub; tagging to the UUID of the referenced source catalog (too brittle?); tagging to certain metadata found in the source catalog; tagging to a nominal version (a "best available Rev 5" kind of thing). Just to name a few.

Also, it occurs to me these data points are useful in at least two different ways: one, for nominal traceability; two, as a statement of intention (as to how a profile should be used). Are these the same and could they be collapsed, or do we need both?

Finally: should this be an OSCAL thing, or is it really a FedRAMP problem to solve using metadata/prop, since their needs will be different from those of other consumers?

In a consuming organization, this could be done either at the boundary, or internally. Again it depends on what need is being met. There might be features available on both sides of the fence.

@Compton-US (Contributor):

@aj-stein-nist since you have this one on the priority list, I might have a little additional information to share that relates (loosely), but might benefit from a common approach across models. We should chat sometime.

@aj-stein-nist (Contributor):

> @aj-stein-nist since you have this one on the priority list, I might have a little additional information to share that relates (loosely), but might benefit from a common approach across models. We should chat sometime.

Can you set aside 15-30 minutes for us to meet and discuss next week in the sprint? Thanks!

@aj-stein-nist aj-stein-nist removed their assignment Feb 2, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in NIST OSCAL Work Board Apr 20, 2023
@aj-stein-nist aj-stein-nist reopened this Apr 20, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in NIST OSCAL Work Board Apr 20, 2023
@aj-stein-nist aj-stein-nist removed this from the OSCAL 1.1.0 milestone Jul 27, 2023
@aj-stein-nist aj-stein-nist moved this from In Progress to Todo in NIST OSCAL Work Board Sep 20, 2023
@aj-stein-nist aj-stein-nist added this to the Future milestone Sep 28, 2023
@Arminta-Jenkins-NIST Arminta-Jenkins-NIST added the Aged A label for issues older than 2023-01-01 label Nov 2, 2023
@Arminta-Jenkins-NIST (Contributor):

At the 11/30 triage meeting, the team decided that this ticket is closable. We will leave the ticket open for one week (until 12/7/23) to hear any objections or comments.

Projects: NIST OSCAL Work Board — Status: Todo
Development: no branches or pull requests
7 participants