
Dataset Versioning in Catalogs #961

Open · 3 tasks
ohsh6o opened this issue Jun 11, 2021 · 16 comments
Labels: Aged (issues older than 2023-01-01), enhancement, question, User Story
@ohsh6o (Contributor) commented Jun 11, 2021

User Story:

As an OSCAL developer, I want to explicitly identify which dataset (e.g. NIST 800-53 or ISO-27001) and which version of that dataset (respectively 4.0 and 5.1; ISO/IEC 27001:2013 and ISO/IEC 27001:2018) is the source of a catalog or resolved profile catalog, without relying on human interpretation of semantic context or on external file and directory naming.

Goals:

When processing catalogs in software, especially beyond the pre-existing oscal-content resources and the SP 800-53 baselines, ETL pipelines cannot rely on explicit file and directory names. When introspecting the content of an OSCAL catalog or resolved profile catalog, the closest one can come to determining that the content is, say, "800-53 Revision 5, version 5.1, from NIST specifically" is to parse file names or free-form title text values. This is achievable, but runs counter to the objectives of OSCAL as structured, machine-readable content.

First-Order Question and Goal

First-order problem: within a given document, how do we determine the origin (provenance) of the referenced source material, and that origin document's version?

  • We currently have //metadata/version
  • There may not be a defined way to determine provenance
  • document-id exists, but may not be sufficient
  • Another open question: “Is this document 800-53?” “Or 800-53 Rev 5?” “Is it ISO-27001?” “ISO/IEC 27000:2018 or ISO/IEC 27000:2013?”
  • Maybe we need a series or dataset prop to clarify questions like “is this document 800-53?”
    • Combine it with version to answer questions like “Or 800-53 Rev 5?” and “ISO/IEC 27000:2018 or ISO/IEC 27000:2013?”

Second-Order Question and Goal

Second-order question: in a resolved profile catalog, how do I know the provenance of the profile from which that catalog was resolved?

Dependencies:

Acceptance Criteria

  • All OSCAL website and readme documentation affected by the changes in this issue have been updated. Changes to the OSCAL website can be made in the docs/content directory of your branch.
  • A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
  • The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.

{The items above are general acceptance criteria for all User Stories. Please describe anything else that must be completed for this issue to be considered resolved.}

ohsh6o pushed a commit to ohsh6o/OSCAL that referenced this issue Jun 11, 2021
@ohsh6o (Contributor, Author) commented Jun 11, 2021

We discussed this today in the model meeting, and a few pieces of feedback and other impressions on this work came up.

  • Is this a deficiency in the current metadata model: yes, no, or maybe?
  • Can we just add additional prop elements to metadata for dataset-name, dataset-version, and the like?
  • @rgauss had some interesting insights into potentially refining this further to match a Maven-like POM structure with group, artifactId, and version.
    • More interesting still: could this be the first part of an open public catalog/profile registry? Going further with Maven-like metadata would take it a step further.
  • @mosi-k-platt had some interesting insights into the applicability here for UCF, but I know of them more with respect to accredited mapping of compliance controls; I would like to hear more about how this fits in.
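For reference, the Maven POM analogy raised above uses coordinates like the following (a standard Maven dependency stanza; the mapping of OSCAL content onto these coordinates is purely illustrative):

```xml
<!-- Maven coordinates: group, artifact, version -->
<dependency>
    <groupId>gov.nist.csrc</groupId>      <!-- who publishes the dataset -->
    <artifactId>sp800-53</artifactId>     <!-- which dataset -->
    <version>5.1</version>                <!-- which version of it -->
</dependency>
```

The appeal of this shape is that registries and dependency tooling can resolve, cache, and compare artifacts purely from the coordinate triple.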

@david-waltermire (Contributor):

@ohsh6o Do you have an update to this proposal based on your notes above?

@ohsh6o (Contributor, Author) commented Jul 9, 2021

> @ohsh6o Do you have an update to this proposal based on your notes above?

As of the last model meeting we had, I believe the consensus was to keep as close to the current v1.0.0 models (with an o:prop) as possible. The main issue is the profile resolution pipeline changes. I presume we can discuss this in the meeting today, @david-waltermire-nist? :-)

@david-waltermire (Contributor):

@ohsh6o Can you create a concrete change proposal listing each property to add, with a corresponding definition for each? We talked about organization name (or party reference), dataset-name, and dataset-version at the 7/9/2021 model review.

How do we point to the dataset-source? Maybe by a reference to a backmatter resource? This would allow a cryptographic hash to be included.

How do we handle multiple source datasets? Perhaps by multiple backmatter references?

@ohsh6o Will draft an updated proposal that will address this. @david-waltermire-nist will assist.
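One possible shape for answering both questions in the existing v1.0.0 back-matter model (UUID, URL, and digest below are placeholders): each source dataset gets its own resource, and the rlink can carry a hash for integrity checking.

```xml
<back-matter>
    <!-- sketch: one resource per source dataset; multiple resources
         would cover multiple source datasets -->
    <resource uuid="11111111-2222-4333-8444-555555555555">
        <title>NIST SP 800-53 Revision 5 Catalog</title>
        <rlink href="https://example.org/NIST_SP-800-53_rev5_catalog.xml"
               media-type="application/xml">
            <!-- placeholder digest: lets consumers verify the referenced source -->
            <hash algorithm="SHA-256">0000000000000000000000000000000000000000000000000000000000000000</hash>
        </rlink>
    </resource>
</back-matter>
```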

@ohsh6o (Contributor, Author) commented Jul 23, 2021

Following up on this comment in anticipation of tomorrow's meeting, I am going to recommend:

  • Add a link/@rel="dataset-source" for catalogs and profiles and point it to one or more back-matter/resources.
  • Add the following props on the back-matter/resources to match previously discussed expectations:
    • @name="dataset" @class="collection" to represent a grouping, such as how NIST has Special Publications (SP) versus Internal Reports (IR).
    • @name="dataset" @class="name" to give the specific dataset a particular name.
    • @name="dataset" @class="version" to give the dataset a logical version, such as "5" for Revision 5.
    • @name="dataset" @class="organization", which I am torn over: either a UUID referencing the relevant /metadata/party/@uuid, or reverse-DNS notation to allow optionally grouping by the organization that provided the source material. I prefer the latter, since it might allow building primitives similar to dependency/package-management tools.

This recommendation would allow for 0 to ∞ dataset-source elements and, with recommended changes to profile resolution, would permit rudimentary tagging of dataset provenance for one or more sources, optionally tracing it through successive hops of resolution (though that can also be retrieved from a previous hop, if preferred).

<resource uuid="example-uuid">
    <prop name="dataset" class="collection" value="Special Publication"/>
    <prop name="dataset" class="name" value="800-53"/>
    <prop name="dataset" class="version" value="5"/>
    <prop name="dataset" class="organization" value="gov.nist.csrc"/>
</resource>
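Under this recommendation, the catalog's metadata would then point at that resource via the proposed rel value (a sketch; dataset-source is the value proposed above, not a currently defined link relation):

```xml
<metadata>
    <!-- ...existing metadata fields... -->
    <link rel="dataset-source" href="#example-uuid"/>
</metadata>
```

The fragment href resolves to the back-matter resource carrying the dataset props, so provenance travels with the document itself.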

I have prepared some example code and a presentation for tomorrow.

@ohsh6o (Contributor, Author) commented Aug 20, 2021

Very interested in how this will figure into the profile resolution updates with #954. It appears this pertains to the Second Order Question and Goal in this issue.

@david-waltermire-nist and @wendellpiez , can we make time in the coming week to discuss feedback on this proposal? I went on leave around the time Dave got back, and I was not sure when we could pick up the technical feedback from your end and work towards realizing this into implementation.

@wendellpiez (Contributor):

@ohsh6o let's not confuse metadata describing an entity "in the world" (such as a person, place, thing or document) such as "Rev 5 of SP800-53 as published by NIST", with requirements for traceability in the stricter sense, that when an OSCAL catalog is inspected, it can be seen (in applicable cases) to reference a 'document' (or 'serialized instance') (somewhere else) that "turns out" to be a profile that produces that catalog.

Over and above this, whatever metadata you choose to put into either your catalog(s) or your profile(s), as OSCAL instances, is perfectly fine. But such metadata addresses a different set of requirements (even if still a requirement for 'traceability' in a broader sense).

So a FedRAMP profile might well have to say "I am based on Pub X" (with a link) and also, you might want a catalog produced by that profile to be able to point back to the profile, just for traceability/validability.

@ohsh6o (Contributor, Author) commented Aug 20, 2021

> @ohsh6o let's not confuse metadata describing an entity "in the world" (such as a person, place, thing or document) such as "Rev 5 of SP800-53 as published by NIST", with requirements for traceability in the stricter sense, that when an OSCAL catalog is inspected, it can be seen (in applicable cases) to reference a 'document' (or 'serialized instance') (somewhere else) that "turns out" to be a profile that produces that catalog.

Hence I called it a second order question. :-)

> Over and above this, whatever metadata you choose to put into either your catalog(s) or your profile(s), as OSCAL instances, is perfectly fine. But such metadata addresses a different set of requirements (even if still a requirement for 'traceability' in a broader sense).

> So a FedRAMP profile might well have to say "I am based on Pub X" (with a link) and also, you might want a catalog produced by that profile to be able to point back to the profile, just for traceability/validability.

More generally, why do I keep pushing for this? I would like the second-order question addressed so we can have graphs of how people derive catalogs and profiles from one another, just like GitHub.

[Screenshot: "Screen Shot 2021-08-20 at 2 49 10 PM"]

And then, once I have that, I want to be able to filter and collect all those that are notionally based on the same dataset: give me all the graphs of separate, unrelated catalogs that believe they are notionally derived from 800-53, OSCAL or not. I think advancing this requires the dataset props and the provenance linkage, so that people can build tools this way even when the catalogs are not distinctly related to each other. I envision tools for analyzing public catalogs this way; internal tooling that is more scoped will have the same need as well.

But to be clear, I added a comment to ask what deficiencies there are in the proposal of the dataset properties and how to move that forward, not to discuss the implications of the relationship between #961 and #954.

@wendellpiez (Contributor):

@ohsh6o given what you are saying you would like to accomplish or enable, I think the requirements here are actually pretty open-ended. Especially since I also think there are other approaches to assessing and understanding provenance (actual, purported, assumed or inferred) than those that accept claims made in the metadata at face value (however useful that info might be). Given this, as usual, I am inclined to a minimalistic approach. So the question 'what have I left out?' may not be all that useful; the question should be 'do I have what I need for now?'

@aj-stein-nist (Contributor):

Moving to Sprint 61.

@aj-stein-nist (Contributor):

FedRAMP PMO expressed interest in a feature similar to this. Since this is in the sprint, I will discuss expectations and timelines with them at the next sync meeting, as this is a smallish change.

@GaryGapinski:

One shortcoming noticed in FedRAMP OSCAL usage is that neither metadata/version nor metadata/oscal-version within a system-security-plan is currently sufficient to identify the related version of NIST SP 800-53 (as adopted by FedRAMP) with which the SSP was prepared. It is unlikely that metadata/version could be used to infer the 800-53 version, since its meaning is something unrelated.

So some other manner of association will be necessary. import-profile appears inadequate as its target will have analogous shortcomings.
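A sketch of that shortcoming (all values illustrative): in an SSP, metadata/version tracks the SSP document itself and metadata/oscal-version tracks the OSCAL release, so only the free-form file name of the imported profile hints at the 800-53 revision.

```xml
<system-security-plan uuid="99999999-8888-4777-8666-555555555555">
    <metadata>
        <title>Example System SSP</title>
        <version>0.3</version>               <!-- version of this SSP, not of 800-53 -->
        <oscal-version>1.0.4</oscal-version> <!-- OSCAL release, not 800-53 either -->
    </metadata>
    <!-- the only hint at "rev4" is the file name embedded in the URL -->
    <import-profile href="https://example.org/FedRAMP_rev4_MODERATE-baseline_profile.xml"/>
</system-security-plan>
```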

@wendellpiez (Contributor):

Gary, this makes sense. The open-endedness of the requirement is not (really) a reason not to do it.

What would be best, or possibly some combination?

Let's limit it to the SP 800-53 set of catalogs. There are various ways this could be done: tagging to GitHub; tagging to the UUID of the referenced source catalog (too brittle?); tagging to certain metadata found in the source catalog; tagging to a nominal version (a "best available Rev 5" kind of thing). Just to name a few.

Also, it occurs to me these data points are useful in at least two different ways: one, for nominal traceability; two, as a statement of intention (as to how a profile should be used). Are these the same and could they be collapsed, or do we need both?

Finally: should this be an OSCAL thing, or is it really a FedRAMP problem to solve using metadata/prop, since their needs will be different from those of other consumers?

In a consuming organization, this could be done either at the boundary, or internally. Again it depends on what need is being met. There might be features available on both sides of the fence.

@Compton-US (Contributor):

@aj-stein-nist since you have this one on the priority list, I might have a little additional information to share that relates (loosely), but might benefit from a common approach across models. We should chat sometime.

@aj-stein-nist (Contributor):

> @aj-stein-nist since you have this one on the priority list, I might have a little additional information to share that relates (loosely), but might benefit from a common approach across models. We should chat sometime.

Can you set aside 15-30 minutes for us to meet and discuss next week in the sprint? Thanks!

@aj-stein-nist aj-stein-nist removed their assignment Feb 2, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in NIST OSCAL Work Board Apr 20, 2023
@aj-stein-nist aj-stein-nist reopened this Apr 20, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in NIST OSCAL Work Board Apr 20, 2023
@aj-stein-nist aj-stein-nist removed this from the OSCAL 1.1.0 milestone Jul 27, 2023
@aj-stein-nist aj-stein-nist moved this from In Progress to Todo in NIST OSCAL Work Board Sep 20, 2023
@aj-stein-nist aj-stein-nist added this to the Future milestone Sep 28, 2023
@Arminta-Jenkins-NIST Arminta-Jenkins-NIST added the Aged A label for issues older than 2023-01-01 label Nov 2, 2023
@Arminta-Jenkins-NIST (Contributor):

At the 11/30 triage meeting, the team decided that this ticket is closable. We will leave the ticket open for one week (until 12/7/23) to hear any objections or comments.

Projects: NIST OSCAL Work Board — Status: Todo
Development: no branches or pull requests
7 participants